================
Dataset accessor
================

.. _xarray: http://xarray.pydata.org

.. ipython:: python
    :suppress:

    from datetime import datetime
    import numpy as np
    import pandas as pd
    import xarray as xr

``cerbere`` provides an accessor to the xarray_ ``Dataset`` class, called
``cb``. This accessor offers a set of attributes and methods that enrich those
provided natively by xarray_.

Harmonization
=============

``cerbere`` harmonizes ``Dataset`` objects holding Earth observation data by
enforcing CF and other conventions. This covers consistent naming of
coordinate variables and dimensions, consistent naming of variable and global
attributes, and similar types and conventions for coordinate data. It also
tries to fill in some generic standard attributes defined by these
conventions.

Harmonization rules include:

* consistent naming of latitude, longitude and time as ``lat``, ``lon``,
  ``time``
* detection of X, Y, Z, T axis coordinates and dimensions
* detection of instance dimensions for collections of discrete features
* conversion of longitudes to the [-180, 180] range, unless the
  ``longitude180`` global option is set to False with
  ``cerbere.set_options(longitude180=False)``
* consistent naming (inferred from the data) of global attributes such as
  ``time_coverage_start``, ``time_coverage_end``, ...

Given an xarray `Dataset`, ``cerbere`` makes some guesses to provide a more
harmonized version of this dataset through the ``cfdataset`` property of the
``cb`` accessor. For instance:

.. ipython:: python

    # create a Dataset
    dst = xr.Dataset(
        {'myvar': (['LATITUDE', 'LONGITUDE', 'Z'], np.ones(shape=(160, 360, 3)))},
        coords={
            'TIME': (['TIME'], [datetime(2018, 1, 1)], {'units': 'seconds since 2001-01-01 00:00:00'}),
            'LATITUDE': (['LATITUDE'], np.arange(-80, 80, 1)),
            'LONGITUDE': (['LONGITUDE'], np.arange(-180, 180, 1)),
            'DEPTH': (['Z'], np.arange(5, 20, 5))},
        attrs={
            'start_date': '2018-01-01 00:00:00',
            'stop_date': '2018-02-01 00:00:00'}
    )

    import cerbere

    # get a cerberized version with renamed variables, dimensions and
    # attributes, and with complementary CF attributes added
    cfdst = dst.cb.cfdataset
    cfdst

Now let's see what happened to the above dataset when its cerbere-compliant
version was retrieved through the ``cb.cfdataset`` property.

Spatiotemporal coordinates
==========================

`TIME`, `LATITUDE` and `LONGITUDE` were renamed to `time`, `lat` and `lon`,
the names used consistently within ``cerbere``. Users can safely use these
harmonized variable names whenever reading and manipulating Earth observation
data with ``cerbere``. Other variants of coordinate naming are recognized; if
the `time`, `lat`, `lon` variables cannot be found in the file, an error is
raised, and such datasets should be handled through a specific reader.

The spatiotemporal coordinates can be listed with the ``cf_coords`` method,
which provides the mapping between the CF coordinate reference and the
internal naming in the dataset after ``cerbere`` harmonization.

.. ipython:: python

    # get the spatiotemporal coordinates
    cfdst.cb.cf_coords

Note the `time`, `lat`, `lon` naming convention in ``cerbere``, matching the
`time`, `latitude`, `longitude` CF standard names.

``cerbere`` does not rename the vertical coordinate, as a single name would be
misleading given the many ways of expressing this quantity (depth, altitude,
pressure, sigma level, ...). It can however be accessed in a unified way
through the ``vertical`` property:

.. ipython:: python

    # get the spatiotemporal vertical coordinate
    cfdst.cb.vertical

Similarly, ``time``, ``latitude`` and ``longitude`` can be used to access the
other spatiotemporal coordinates.

Spatiotemporal axes
===================

The CF axis dimensions corresponding to spatiotemporal information (`X`, `Y`,
`Z`, `T`) have been detected and can be listed with the ``cf_axis_dims``
method:

.. ipython:: python

    # get the dimension for each CF axis
    cfdst.cb.cf_axis_dims

Coordinates that depend only on a single spatiotemporal axis are referred to
as `axis coordinates`. The `axis` attribute is set by ``cerbere`` for the
corresponding coordinate variables. They can be listed with the
``cf_axis_coords`` method:

.. ipython:: python

    # get the axis coordinates
    cfdst.cb.cf_axis_coords

    # axis attribute value for time, added by cerbere
    cfdst['time'].attrs['axis']

Note that axis coordinate variables can be individually retrieved using their
CF standard axis name, ``X``, ``Y``, ``Z`` or ``T`` (``None`` is returned if
the coordinate does not exist or was not identified as such by ``cerbere``):

.. ipython:: python

    cfdst.cb.X

Spatiotemporal attributes
=========================

``cerbere`` also harmonizes some attributes. In the above example,
`start_date` and `stop_date` were detected as aliases of the more standard
`time_coverage_start` and `time_coverage_end` (from the ACDD convention),
which can be accessed with the properties ``time_coverage_start`` and
``time_coverage_end``:

.. ipython:: python

    cfdst.cb.time_coverage_start
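
The alias detection described above can be pictured with a small standalone
sketch. It does not use ``cerbere`` and is not its implementation: the alias
table ``ATTR_ALIASES`` and the helper ``harmonize_attrs`` are hypothetical
names introduced here for illustration only, and the real list of recognized
aliases may differ.

```python
# Hypothetical alias table: non-standard attribute name -> ACDD name.
# Illustration only; this is NOT cerbere's actual alias list.
ATTR_ALIASES = {
    "start_date": "time_coverage_start",
    "stop_date": "time_coverage_end",
}


def harmonize_attrs(attrs):
    """Return a copy of ``attrs`` with known aliases renamed.

    An attribute already stored under its standard name is never
    overwritten by the value of an alias.
    """
    harmonized = {}
    # first pass: keep every attribute that is not a known alias
    for name, value in attrs.items():
        if name not in ATTR_ALIASES:
            harmonized[name] = value
    # second pass: map aliases to their standard name, unless already set
    for name, value in attrs.items():
        if name in ATTR_ALIASES:
            harmonized.setdefault(ATTR_ALIASES[name], value)
    return harmonized


attrs = {
    "start_date": "2018-01-01 00:00:00",
    "stop_date": "2018-02-01 00:00:00",
    "title": "demo",
}
harmonized = harmonize_attrs(attrs)
print(harmonized["time_coverage_start"])
```

Returning a new dictionary rather than renaming keys in place leaves the
original attributes untouched, in the same spirit as ``cfdataset`` giving
access to a harmonized version of the dataset.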