==================
DataArray accessor
==================
.. _xarray: http://xarray.pydata.org

.. ipython:: python
    :suppress:

    import numpy as np
    import pandas as pd
    import xarray as xr

``cerbere`` provides an accessor to the xarray_ ``DataArray`` class, called
``cb``. This accessor offers a set of attributes and methods that enrich
those provided natively by xarray_.


Standard attributes
===================

The ``cb`` accessor gives access to standard variable attributes, based on CF
and other conventions, through a set of properties. For instance:

.. ipython:: python

    # create a DataArray
    da = xr.DataArray(data=100., dims=[],
        attrs={'standard_name': 'water_temperature'})

    # import cerbere
    import cerbere

    # get the standard_name attribute with the `standard_name` property of
    # the cerbere `cb` accessor
    stdname = da.cb.standard_name

    # or set (overwrite) this attribute on an existing DataArray
    da.cb.standard_name = 'sea_surface_temperature'


Using named attributes, instead of a free-form dictionary such as the ``attrs``
property of the xarray_ ``DataArray`` class, helps data producers improve the
consistency of their datasets (no variant names, no typos, ...), and helps data
users write generic code (the caller can rely on a fixed set of properties).
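
A minimal sketch of how such named properties can be layered over the
``attrs`` dictionary, using xarray's public ``xr.register_dataarray_accessor``
decorator. The accessor name ``demo`` and the ``DemoAccessor`` class are
hypothetical and only cover ``standard_name``; this illustrates the general
pattern, not ``cerbere``'s actual implementation of ``cb``.

.. code-block:: python

    import xarray as xr


    # hypothetical accessor name, chosen so as not to clash with cerbere's "cb"
    @xr.register_dataarray_accessor("demo")
    class DemoAccessor:
        """Expose a CF attribute of a DataArray as a named property."""

        def __init__(self, xarray_obj):
            self._obj = xarray_obj

        @property
        def standard_name(self):
            # None when the attribute is not set
            return self._obj.attrs.get("standard_name")

        @standard_name.setter
        def standard_name(self, value):
            # writes through to the underlying attrs dictionary
            self._obj.attrs["standard_name"] = value


    da = xr.DataArray(100.0, attrs={"standard_name": "water_temperature"})
    da.demo.standard_name  # 'water_temperature'
    da.demo.standard_name = "sea_surface_temperature"
    da.attrs["standard_name"]  # 'sea_surface_temperature'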


Science dtype
=============

The ``science_dtype`` attribute preserves the "true" data type of the quantity
stored in an array. It was introduced as a workaround for the dtype changes
that xarray applies in some cases, as illustrated below:


.. ipython:: python

    # create a DataArray from an int32 numpy array
    da = xr.DataArray(np.arange(10, dtype=np.int32))

    # the DataArray keeps the dtype (int32) of the initial numpy array
    da


Now let's create a DataArray from a numpy MaskedArray instead, and see what
happens to the array dtype and masked values:

.. ipython:: python

    arr = np.ma.masked_greater(np.arange(10, dtype=np.int32), 5)
    arr.set_fill_value(999)
    da = xr.DataArray(arr)

    # the DataArray was converted to float and the masked values to NaN
    da


Converting back to a numpy MaskedArray still returns a float array: the
original dtype of the array (int32) was lost.

.. ipython:: python

    da.to_masked_array()


``cerbere`` provides a DataArray constructor, ``new_array``, that prevents this
by storing the original dtype (in ``science_dtype``) and fill value (999):

.. ipython:: python
    :okwarning:

    # create the DataArray with the cerbere constructor
    da = cerbere.new_array(arr)

    # convert it back with the to_masked_array method of the cb accessor
    da.cb.to_masked_array()

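The mechanism can be sketched with plain numpy and xarray: record the original
dtype and fill value as attributes when wrapping the MaskedArray, and use them
to rebuild it later. The helper functions and the ``original_dtype`` /
``original_fill_value`` attribute names below are hypothetical; this is a
sketch of the idea under those assumptions, not the code behind
``cerbere.new_array`` or ``cb.to_masked_array``.

.. code-block:: python

    import numpy as np
    import xarray as xr


    def masked_to_dataarray(arr):
        """Wrap a MaskedArray as a float DataArray, recording dtype and fill value."""
        # masked cells become NaN, as xarray does, but the metadata is kept
        data = arr.astype(np.float64).filled(np.nan)
        return xr.DataArray(data, attrs={
            "original_dtype": str(arr.dtype),       # hypothetical attribute name
            "original_fill_value": arr.fill_value,  # hypothetical attribute name
        })


    def dataarray_to_masked(da):
        """Rebuild a MaskedArray with the recorded dtype and fill value."""
        dtype = np.dtype(da.attrs["original_dtype"])
        fill_value = da.attrs["original_fill_value"]
        mask = np.isnan(da.values)
        # replace NaN with the fill value before casting back to the original dtype
        data = np.where(mask, fill_value, da.values).astype(dtype)
        return np.ma.MaskedArray(data, mask=mask, fill_value=fill_value)


    arr = np.ma.masked_greater(np.arange(10, dtype=np.int32), 5)
    arr.set_fill_value(999)
    restored = dataarray_to_masked(masked_to_dataarray(arr))
    restored.dtype       # int32 again, as in the original array
    restored.fill_value  # the original fill value, 999
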

Subsetting with isel
====================

``cerbere`` extends the ``isel`` method of xarray through a redefinition of
this method in the ``cb`` accessor, which accepts additional arguments.

When extracting a subset from a DataArray beyond its limits, padding can be
applied to return a new DataArray of the expected size. Let's look at this
example:


.. ipython:: python
    :okwarning:

    # create a DataArray from an int32 numpy array
    da = xr.DataArray(np.arange(10, dtype=np.int32), dims=['lat'])

    # extract a subset within the array limits: the output array has the
    # expected size of 5
    subset = da.isel(lat=slice(0, 5))
    subset

    # now extract a subset beyond the array limits: xarray silently
    # truncates the output array, which now has a size of 2
    subset = da.isel(lat=slice(8, 13))
    subset

    # with the cerbere isel method, the output DataArray has a size of 5,
    # with padded values beyond the initial array limit
    subset = da.cb.isel(lat=slice(8, 13), padding=True)
    subset

    # this works with negative indices too
    subset = da.cb.isel(lat=slice(-2, 3), padding=True)
    subset

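For readers who want to see what padding involves, here is a minimal sketch of
a padded selection written with plain numpy and xarray. The ``padded_isel``
function is hypothetical (it is not part of ``cerbere`` or xarray): it handles
a single dimension with a step of 1, drops coordinates, and treats negative
start indices as positions before the first element. That last point is an
assumption about the semantics of the ``slice(-2, 3)`` case, not a description
of ``cerbere``'s code.

.. code-block:: python

    import numpy as np
    import xarray as xr


    def padded_isel(da, dim, start, stop):
        """Return indices [start, stop) of ``da`` along ``dim``, NaN-padded.

        Positions before 0 or at/after the dimension size are filled with NaN,
        so the result always has ``stop - start`` elements along ``dim``
        (assumes ``start <= stop``).
        """
        size = da.sizes[dim]
        # number of requested cells before the first / after the last element
        pad_before = max(0, min(stop, 0) - start)
        pad_after = max(0, stop - max(start, size))
        # the part of the window that overlaps the array, clipped to its bounds
        lo = min(max(start, 0), size)
        hi = min(max(stop, 0), size)
        inner = da.isel({dim: slice(lo, hi)})
        # cast to float so that NaN can mark the padded cells
        widths = [(0, 0)] * da.ndim
        widths[da.get_axis_num(dim)] = (pad_before, pad_after)
        values = np.pad(inner.values.astype(np.float64), widths,
                        mode="constant", constant_values=np.nan)
        # coordinates are dropped for simplicity; dims and attrs are kept
        return xr.DataArray(values, dims=da.dims, attrs=da.attrs)


    da = xr.DataArray(np.arange(10, dtype=np.int32), dims=["lat"])
    padded_isel(da, "lat", 8, 13)   # values 8, 9, then three NaN
    padded_isel(da, "lat", -2, 3)   # two NaN, then values 0, 1, 2

Padding the clipped window with ``np.pad`` keeps the output size equal to the
requested window length, at the cost of the float conversion discussed below.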

Note that, when padding, the array dtype is changed to float, as xarray would
do with a numpy MaskedArray (see the `Science dtype`_ section above). This can
be avoided either by preserving the original array dtype with the
``as_science_dtype`` keyword (NaNs are then replaced with fill values), or by
returning the result as a numpy MaskedArray with the ``as_masked_array``
keyword:

.. ipython:: python
    :okwarning:

    # preserving the original data type
    subset = da.cb.isel(lat=slice(-2, 3), padding=True, as_science_dtype=True)
    subset

    # returning the result as a MaskedArray
    subset = da.cb.isel(lat=slice(-2, 3), padding=True, as_masked_array=True)
    subset
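
A similar conversion can be sketched with xarray's ``fillna`` and ``astype``
methods and numpy's ``np.ma.masked_equal``, starting from a NaN-padded float
result. The fill value of 999 is an assumption borrowed from the
`Science dtype`_ example. This approach would also mask any genuine data value
equal to the fill value, which is why fill values are normally chosen outside
the valid data range. It only illustrates the idea and is not how
``as_science_dtype`` or ``as_masked_array`` are implemented.

.. code-block:: python

    import numpy as np
    import xarray as xr

    # a NaN-padded float result, such as a padded selection returns
    padded = xr.DataArray([np.nan, np.nan, 0.0, 1.0, 2.0], dims=["lat"])
    fill_value = 999  # assumed fill value for the original int32 data

    # back to the original dtype, padded cells set to the fill value
    as_int = padded.fillna(fill_value).astype(np.int32)

    # or as a MaskedArray of the original dtype, padded cells masked
    as_masked = np.ma.masked_equal(as_int.values, fill_value)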