==================
DataArray accessor
==================

.. _xarray: http://xarray.pydata.org

.. ipython:: python
    :suppress:

    import numpy as np
    import pandas as pd
    import xarray as xr

``cerbere`` provides an accessor to the xarray_ ``DataArray`` class, called
``cb``. This accessor offers a set of attributes and methods that enrich those
provided natively by xarray_.

Standard attributes
===================

The ``cb`` accessor gives access to standard variable attributes, based on CF
and other conventions, through a set of properties. For instance:

.. ipython:: python

    # create a DataArray
    da = xr.DataArray(data=100., dims=[], attrs={'standard_name': 'water_temperature'})

    # import cerbere
    import cerbere

    # get the standard_name attribute with the `standard_name` property of
    # the cerbere `cb` accessor
    stdname = da.cb.standard_name

    # or set this attribute through the same property
    da.cb.standard_name = 'sea_surface_temperature'

Using named attributes, instead of a free dictionary as in the ``attrs``
property of the xarray_ ``DataArray`` class, helps data producers keep datasets
consistent (no diverging names, variants or typos) and lets data users write
more generic code (the caller can rely on a fixed set of properties).

Science dtype
=============

The ``science_dtype`` attribute preserves the "true" data type of the quantity
stored in an array. It was introduced as a workaround for the dtype changes
made by xarray, as illustrated below:

.. ipython:: python

    # let's init a DataArray from an integer numpy array.
    da = xr.DataArray(np.arange(10, dtype=np.int32))

    # The created DataArray is of the same type as the initial numpy array
    da

Now let's create a DataArray from a numpy MaskedArray instead, and see what
happens to the array dtype and masked values:

.. ipython:: python

    arr = np.ma.masked_greater(np.arange(10, dtype=np.int32), 5)
    arr.set_fill_value(999)
    da = xr.DataArray(arr)

    # The created DataArray was changed to float and the masked values to NaNs
    da

Converting back to a numpy MaskedArray still returns a float array: the
original dtype of the array (int32) is lost:

.. ipython:: python

    da.to_masked_array()

``cerbere`` provides a DataArray constructor that prevents this by storing the
original dtype (in ``science_dtype``) and the fill value (999):

.. ipython:: python
    :okwarning:

    # creates the DataArray with the cerbere constructor
    da = cerbere.new_array(arr)

    # using the cerbere to_masked_array accessor function instead
    da.cb.to_masked_array()

Subsetting with isel
====================

``cerbere`` extends the ``isel`` method of xarray by redefining it in the
``cb`` accessor with additional arguments.

When extracting a subset that reaches beyond the limits of a DataArray, padding
can be applied to return a new DataArray of the expected size. Let's look at
this example:

.. ipython:: python
    :okwarning:

    # let's init a DataArray from an integer numpy array.
    da = xr.DataArray(np.arange(10, dtype=np.int32), dims=['lat'])

    # extracting a subset within the array limits. The output array has an
    # expected size of 5
    subset = da.isel(lat=slice(0, 5))
    subset

    # now extracting a subset beyond the array limits. xarray automatically
    # trims the output array which has now a size of 2
    subset = da.isel(lat=slice(8, 13))
    subset

    # using now the cerbere isel method, we get an output array of size 5
    # with padded values beyond the initial array limit
    subset = da.cb.isel(lat=slice(8, 13), padding=True)
    subset

    # this works with negative indices too
    subset = da.cb.isel(lat=slice(-2, 3), padding=True)
    subset

Note that when padding, the array dtype is changed to float here, as xarray
would normally do with a numpy MaskedArray (see the `Science dtype`_ section
above). This can be avoided either by preserving the original array dtype
(NaNs are then replaced with fill values) with the ``as_science_dtype``
keyword, or by returning the result as a numpy MaskedArray with the
``as_masked_array`` keyword:

.. ipython:: python
    :okwarning:

    # preserving the original data type
    subset = da.cb.isel(lat=slice(-2, 3), padding=True, as_science_dtype=True)
    subset

    # returning the result as a MaskedArray
    subset = da.cb.isel(lat=slice(-2, 3), padding=True, as_masked_array=True)
    subset
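
To see what padding involves, the behaviour shown above can be approximated
with plain xarray. The sketch below is not how ``cerbere`` implements
``cb.isel``: ``padded_isel`` is a hypothetical helper written for illustration
only, and it assumes that the requested window overlaps the array at least
partly. It clips the slice to the valid index range, then relies on
``DataArray.pad``, whose default constant mode fills the added positions with
NaN and promotes integer data to float:

.. code-block:: python

    import numpy as np
    import xarray as xr


    def padded_isel(da, dim, start, stop):
        """Hypothetical helper (not part of cerbere): select [start, stop)
        along ``dim`` and pad with NaN wherever the window leaves the array."""
        size = da.sizes[dim]
        if start >= stop or start >= size or stop <= 0:
            raise ValueError("the window must overlap the array")
        # clip the requested window to the valid index range
        lo, hi = max(start, 0), min(stop, size)
        inner = da.isel({dim: slice(lo, hi)})
        # pad what was clipped; the default constant mode fills with NaN and
        # promotes integer data to float
        return inner.pad({dim: (lo - start, stop - hi)})


    da = xr.DataArray(np.arange(10, dtype=np.int32), dims=['lat'])
    beyond_end = padded_isel(da, 'lat', 8, 13)      # 8, 9, then three NaNs
    before_start = padded_isel(da, 'lat', -2, 3)    # two NaNs, then 0, 1, 2

As in the ``cerbere`` example, the result is a float array. This is the dtype
change that the ``as_science_dtype`` and ``as_masked_array`` keywords are meant
to avoid.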