========
Overview
========

.. _Xarray: http://xarray.pydata.org
.. _BackendEntryPoint: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html
.. _flags: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#flags

.. |feature| replace:: :mod:`~cerbere.feature`

History
=======

``cerbere`` was created in 2010 at Ifremer / CERSAT as a unified Python API to
manipulate any type of spatio-temporal observations, which can be read from
many existing storage formats using the same set of functions. It also provided
classes for specific types of observation features.

Many of these aspects are now handled by other packages, such as Xarray_, and
``cerbere`` has evolved accordingly, building on these existing contributions
and refining them.

So...why do we need Cerbere?
============================

When using different Earth Observation products, one immediately faces several
painful issues:

* Earth Observation data products come in a variety of formats, self-described
  or not: NetCDF, HDF, GRIB, BUFR, agency-specific formats (at ESA, Eumetsat,
  ...), weird binary storage or, worse, mixed binary/ASCII formats. Xarray_
  handles some of them, but only a limited number.

* Even when using a widely used format like NetCDF, readable with Xarray_,
  these products follow a variety of conventions for representing the same
  information, whether it is data (time, lat, lon, ...) or metadata (time
  coverage start or end, spatial footprint, ...).

* Even when based on the same convention (such as the Climate and Forecast
  convention, CF), data products come with different or incomplete
  interpretations of it, and still show significant discrepancies that prevent
  using them in a generic way. The user has to make subtle adjustments to
  variable or attribute names, which wastes time and complicates code that
  could otherwise be generic and accept different types of input.

* Besides, Xarray_ has a few shortcomings, notably in the way it handles masked
  arrays (variables with missing values), in particular for integer variables,
  which are often used in observation data for ancillary fields (flags, sensor
  ancillary fields, ...).

Concept
=======

``cerbere`` now relies on Xarray_ for its internal data structure but goes
further in terms of harmonization of feature description and metadata. It
follows standards such as the CF convention and other community efforts (ACDD,
GHRSST) for naming coordinates, dimensions and frequently used attributes, and
adds a true typology of common Earth Observation patterns. This is required to
implement generic software, truly independent from the choices of the various
data producers.

We assume there is no practical reason why data corresponding to the same
sampling pattern (feature) should be represented differently, or should use
different names for coordinates, dimensions, etc. ``cerbere`` therefore further
extends Xarray_ by pre-defining the names, dimensions and main attributes
required to properly describe an Earth Observation feature.

Having a set of predefined templates for each feature type makes it possible to
write once and for all the commonly used generic operations applied to such
objects, such as display, extraction of values, remapping or resampling and,
most of all for data producers, saving observation data consistently with the
same format and metadata content.

``cerbere`` is the middleware between the domain-agnostic Xarray_ and generic
usage of Earth Observation data.

How does it work?
=================

Users familiar with Xarray_ will easily adapt to the additional features
provided by ``cerbere``.

``cerbere`` specializes the xarray ``DataArray`` and ``Dataset`` classes
through accessors to implement particular attributes and behaviours, and adds a
collection of new classes implementing particular data features, gathered in
the |feature| subpackage.
It also comes with numerous contrib packages providing xarray backend engines
to read additional formats, or to further harmonize the content of the
different datasets that can be read with Xarray_.

``cerbere`` accessors
---------------------

``cerbere`` provides an accessor, called ``cb``, to both the Xarray_
``DataArray`` and ``Dataset`` classes. Accessors are the recommended way of
specializing xarray classes.

The ``cb`` accessor for ``DataArray`` objects offers a set of attributes and
methods that enrich the API provided natively by Xarray_. It includes
additional standard attributes (``units``, ``standard_name``, ...), helper
functions to handle flags_ variables, conversion of data back and forth to
numpy ``MaskedArray``, better preservation of the science data type than
Xarray_ offers, ...

The ``cb`` accessor for ``Dataset`` objects offers a set of attributes and
methods that enrich the API provided natively by Xarray_. It harmonizes the
naming of the ``Dataset`` object coordinates and the expression of time and
longitude, provides helper functions to subset these objects, ...

``backend`` and ``reader`` classes
----------------------------------

Some datasets deviate largely from the CF/Cerbere format and content
recommendations, and it may be very complex (and too specific) to guess and
handle all these particular cases in the harmonization performed by the
``cerbere`` accessors.

Xarray_ provides the BackendEntryPoint_ subclassing mechanism to handle new
format engines (in addition to those readily available, such as ``netcdf4``).
It is the preferred way to implement access to formats previously unknown to
the Xarray_ ecosystem (BUFR, EPS, ...). We define some BackendEntryPoint_
classes in contrib packages.
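
As an illustration of the BackendEntryPoint_ mechanism described above, here is
a minimal sketch of a custom engine. It is not one of the ``cerbere`` contrib
backends: the ``ToyTextBackend`` class, the ``.toy`` extension, the one value
per line file layout and the variable name are all assumptions made up for this
example. Only ``BackendEntryPoint``, ``open_dataset`` and ``guess_can_open``
come from xarray itself.

.. code-block:: python

    import numpy as np
    import xarray as xr
    from xarray.backends import BackendEntryPoint


    class ToyTextBackend(BackendEntryPoint):
        """Hypothetical engine: reads one float per line from a ``.toy`` file."""

        description = "Toy one-column text format (illustration only)"
        open_dataset_parameters = ("filename_or_obj", "drop_variables")

        def open_dataset(self, filename_or_obj, *, drop_variables=None):
            # Read the values as a 1D array, even if the file has one line only.
            values = np.loadtxt(filename_or_obj, ndmin=1)
            # The variable name and units are assumptions for this sketch.
            ds = xr.Dataset(
                {"sea_surface_temperature": ("time", values, {"units": "kelvin"})}
            )
            if drop_variables:
                ds = ds.drop_vars(drop_variables, errors="ignore")
            return ds

        def guess_can_open(self, filename_or_obj):
            # Claim only files carrying the made-up ``.toy`` extension.
            return str(filename_or_obj).endswith(".toy")

With recent xarray versions, such a class can be passed directly to
``xr.open_dataset`` through the ``engine`` argument, for instance
``xr.open_dataset("sample.toy", engine=ToyTextBackend)``, or registered as an
entry point so that it can be selected by name.
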
For datasets that can be opened with existing Xarray_ engines but require
significant transformation and postprocessing to meet the CF/Cerbere
requirements, dedicated ``reader`` classes can be written.

Many additional contrib packages are already available in the ``cerbere``
ecosystem. They are provided independently, to avoid stacking unnecessary
classes or dependencies in your environment, so you have to install the ones
you need in addition to the ``cerbere`` core package. The complete list of
existing ``reader`` classes, and their compatibility with known EO products,
is given in :doc:`compatibility`.

If no ``reader`` class can handle a particular dataset, a new corresponding
``reader`` class must be written. Refer to :doc:`writing_reader` to write your
own ``reader`` class.

Last, ``reader`` classes are registered internally when installed (through
Python entry points); they can be dynamically discovered and loaded thanks to a
guessing mechanism based on the file patterns associated with a given reader
class. ``cerbere`` provides an ``open_dataset`` function (similar to the
Xarray_ one) that opens a file and transparently fetches the proper reader for
the user:

.. code-block:: python

    import cerbere

    # detects it is a GHRSST file, fetches the correct reader and returns a
    # CF/Cerbere compliant Xarray Dataset object.
    ds = cerbere.open_dataset('./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc')
    ds

``feature`` classes
-------------------

The |feature| module provides a set of classes implementing each known
observation pattern, using a unique representation for each of them, inspired
by the CF convention and other works.
It therefore extends the above category to "typed geospatial" Xarray_
``Dataset`` objects, placing further restrictions on the content and naming of
a set of data, so that similar patterns of observation are represented and
accessed in the same way. This allows generic manipulation of observation data
as well as specific handling or display functions. It is a higher level of
abstraction above Xarray_ ``Dataset`` classes, fitting data into common
templates.

A feature object can easily be created from an Xarray_ ``Dataset`` or with the
``open_feature`` function. With the above example, which corresponds to a
satellite swath, we would create a ``Swath`` object as follows:

.. code-block:: python

    import cerbere
    import cerbere.feature.cswath

    # detects it is a GHRSST file, fetches the correct reader and returns a
    # CF/Cerbere compliant Xarray Dataset object.
    ds = cerbere.open_dataset('./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc')

    # wraps the harmonized Dataset into a Swath feature
    swath = cerbere.feature.cswath.Swath(ds)
    swath

or more simply, with ``open_feature``:

.. code-block:: python

    import cerbere

    # detects it is a GHRSST file, fetches the correct reader and returns a
    # CF/Cerbere compliant Swath feature object.
    swath = cerbere.open_feature(
        './samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc',
        'Swath')
    swath

Currently managed features include:

* :class:`~cerbere.feature.cgrid.Grid`
* :class:`~cerbere.feature.cswath.Swath`
* :class:`~cerbere.feature.cimage.Image`
* :class:`~cerbere.feature.cgridtimeseries.GridTimeSeries`
* :class:`~cerbere.feature.ctrajectory.Trajectory`
* :class:`~cerbere.feature.cpointtimeseries.PointTimeSeries`
* :class:`~cerbere.feature.cpoint.Point`
* :class:`~cerbere.feature.cprofile.Profile`
* :class:`~cerbere.feature.ctimeseries.TimeSeries`
* :class:`~cerbere.feature.ctimeseriesprofile.TimeSeriesProfile`
* :class:`~cerbere.feature.ctrajectoryprofile.TrajectoryProfile`

For more features and more details on their properties, refer to the
documentation of the |feature| subpackage.