========
Overview
========
.. _Xarray: http://xarray.pydata.org
.. _BackendEntryPoint: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html
.. _flags: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#flags

.. |feature| replace:: :mod:`~cerbere.feature`


History
=======

``cerbere`` was created in 2010 at Ifremer / CERSAT as a unified Python API to
manipulate any type of spatio-temporal observation, read from many existing
storage formats through the same set of functions. It also provided classes
for specific types of observation features.

Many of these aspects are now handled by other packages, such as Xarray_, and
``cerbere`` has evolved accordingly, taking advantage of these existing
contributions and refining them.


So...why do we need Cerbere?
============================

When working with different Earth Observation products, one immediately faces
several painful issues:

Earth Observation data products come in a variety of formats, self-describing
or not: NetCDF, HDF, GRIB, BUFR, agency-specific formats (at ESA, Eumetsat, ...),
obscure binary storage or, worse, mixed binary/ASCII formats. Xarray_ handles
only a limited number of them.

Even when using a widely used format like NetCDF, readable with Xarray_, these
products follow a variety of conventions for representing the same
information, whether it is data (time, lat, lon, ...) or metadata
(time coverage start or end, spatial footprint, ...). And even when based on
the same convention (such as the Climate and Forecast convention, CF), data
products come with different or incomplete interpretations of it, and still
show significant discrepancies that prevent using them in a generic way.
Users have to make subtle adjustments to variable or attribute names, which
wastes time and complicates code that could otherwise be generic and accept
different types of inputs.

Besides, Xarray_ has a few shortcomings, notably in the way it handles masked
arrays (variables with missing values), in particular for integer variables,
which are often used in observation data for ancillary fields (flags, sensor
ancillary fields, ...).
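
The integer case can be illustrated with a minimal sketch that uses only
Xarray_ and numpy. The variable name ``quality_level`` and its values are made
up for this example. Default CF decoding replaces fill values with NaN and
therefore promotes the integer variable to a floating point type, whereas a
numpy ``MaskedArray`` keeps the integer type and tracks missing values in a
separate mask:

.. code-block:: python

    import numpy as np
    import xarray as xr

    # Hypothetical ancillary variable stored as int8 with a fill value.
    raw = xr.Dataset(
        {
            "quality_level": (
                ("row", "cell"),
                np.array([[0, 5], [-128, 3]], dtype="int8"),
                {"_FillValue": np.int8(-128)},
            )
        }
    )

    # Default CF decoding masks fill values with NaN, which requires a
    # floating point type: the original integer type is lost.
    decoded = xr.decode_cf(raw)
    decoded_dtype = decoded["quality_level"].dtype

    # A numpy MaskedArray keeps the integer type and stores the missing
    # values in a separate boolean mask.
    masked = np.ma.masked_equal(raw["quality_level"].values, -128)
    masked_dtype = masked.dtype
    n_masked = int(masked.mask.sum())

    print(decoded_dtype, masked_dtype, n_masked)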

Concept
=======

``cerbere`` now relies on Xarray_ for its internal data structure but goes
further in terms of harmonization of feature description and metadata, following
standards such as the CF convention and other community efforts (ACDD, GHRSST)
for naming coordinates, dimensions and frequently used attributes, and
adding a true typology of common Earth Observation patterns. This is required
to implement generic software that is truly independent of the choices made by
the various data producers. We assume there is no practical reason why data
corresponding to the same sampling pattern (feature) should be represented
differently, or should use different names for coordinates, dimensions, etc.
``cerbere`` therefore extends Xarray_ by predefining the names,
dimensions and main attributes required to properly describe an Earth
Observation feature. Having a set of predefined templates for each feature
type makes it possible to write, once and for all, the generic operations
commonly applied to such objects, such as display, extraction of values,
remapping or resampling, and (most of all for data producers) saving
observation data consistently to the same format and metadata content.

``cerbere`` is the middleware between the domain-agnostic Xarray_ and generic
usage of Earth Observation data.


How does it work?
=================

Users familiar with Xarray_ will easily adapt to the additional features
provided by ``cerbere``.

``cerbere`` specializes the xarray DataArray and Dataset classes through
accessors to implement particular attributes and behaviours, and adds a
collection of new classes implementing particular data features gathered in
the |feature| subpackage. It also comes with numerous contrib packages
providing xarray backend engines to read additional formats or further
harmonize the content of the different datasets that can be read with Xarray_.

``cerbere`` accessors
---------------------

``cerbere`` provides an accessor, called ``cb``, to both Xarray_
``DataArray`` and ``Dataset`` classes. Accessors are the way Xarray_
recommends to specialize its classes.
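
For readers unfamiliar with the mechanism, here is a minimal sketch of how an
accessor is registered with Xarray_. The accessor name ``demo`` and the
``lon_range`` property are hypothetical and chosen for illustration only; they
are not part of ``cerbere``, which registers its own accessor under the name
``cb``:

.. code-block:: python

    import numpy as np
    import xarray as xr


    # "demo" is a hypothetical accessor name used for this sketch only.
    @xr.register_dataset_accessor("demo")
    class DemoAccessor:
        def __init__(self, xarray_obj):
            # xarray passes the Dataset the accessor is attached to.
            self._obj = xarray_obj

        @property
        def lon_range(self):
            """Return the (min, max) of the ``lon`` coordinate."""
            lon = self._obj["lon"]
            return float(lon.min()), float(lon.max())


    ds = xr.Dataset(coords={"lon": ("x", np.array([-10.0, 0.0, 25.0]))})

    # The accessor is available as an attribute of any Dataset.
    lon_min, lon_max = ds.demo.lon_range
    print(lon_min, lon_max)

The ``Dataset`` class itself is left untouched: the extra behaviour lives in a
separate namespace, which is why this approach is preferred over subclassing.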

The ``cb`` :doc:`accessor <dataarray>` for ``DataArray`` objects offers a
set of attributes and methods that enrich the API provided natively by Xarray_.
It includes additional standard attributes (``units``, ``standard_name``,
...), helper functions to handle flags_ variables, and methods to convert data
back and forth to numpy ``MaskedArray`` while preserving the science data type
better than Xarray_ does.
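
To give an idea of what handling a CF flags_ variable involves, the sketch
below decodes a bit-mask flag with plain Xarray_ and numpy, using the standard
``flag_masks`` and ``flag_meanings`` attributes. The ``flag_is_set`` helper,
the flag names and the values are hypothetical; this is not the API of the
``cb`` accessor:

.. code-block:: python

    import numpy as np
    import xarray as xr

    # Made-up flag variable following the CF flag_masks convention.
    flags = xr.DataArray(
        np.array([0, 1, 2, 3], dtype="uint8"),
        dims="obs",
        attrs={
            "flag_masks": np.array([1, 2], dtype="uint8"),
            "flag_meanings": "cloud land",
        },
    )


    def flag_is_set(variable, meaning):
        """Return a boolean DataArray, True where the named flag bit is set."""
        meanings = variable.attrs["flag_meanings"].split()
        mask = variable.attrs["flag_masks"][meanings.index(meaning)]
        return (variable & mask) != 0


    cloudy = flag_is_set(flags, "cloud").values.tolist()
    over_land = flag_is_set(flags, "land").values.tolist()
    print(cloudy, over_land)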

The ``cb`` :doc:`accessor <dataset>` for ``Dataset`` objects offers a set of
attributes and methods that enrich the API provided natively by Xarray_. It
harmonizes the naming of the ``Dataset`` object coordinates and the
expression of time and longitude, and provides helper functions to subset
these objects.


``backend`` and ``reader`` classes
----------------------------------

Some datasets deviate largely from the CF/Cerbere format and
content recommendations, and it may be very complex (and too specific) to
guess and handle all these particular cases in the harmonization performed by
``cerbere`` accessors.

Xarray_ provides the BackendEntryPoint_ subclassing mechanism to handle new
format engines (in addition to those readily available, such as ``netcdf4``).
It is the preferred way to implement access to formats previously unknown to
the Xarray_ ecosystem (BUFR, EPS, ...). We define some BackendEntryPoint_
classes in contrib packages.
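
As a rough sketch of this mechanism, the example below defines a backend for a
made-up format (one float per line in a ``.txt1d`` text file). The
``TextColumnBackend`` class, the file extension and the variable names are
hypothetical and do not correspond to any existing ``cerbere`` contrib
package; only the ``BackendEntryPoint`` interface comes from Xarray_:

.. code-block:: python

    import os
    import tempfile

    import numpy as np
    import xarray as xr
    from xarray.backends import BackendEntryPoint


    class TextColumnBackend(BackendEntryPoint):
        """Hypothetical backend for a one-column text format."""

        description = "Illustrative reader for a made-up text format"
        open_dataset_parameters = ["filename_or_obj", "drop_variables"]

        def open_dataset(self, filename_or_obj, *, drop_variables=None):
            # Read all values and expose them as a single 1D variable.
            values = np.loadtxt(filename_or_obj, ndmin=1)
            ds = xr.Dataset({"value": (("time",), values)})
            if drop_variables:
                ds = ds.drop_vars(drop_variables, errors="ignore")
            return ds

        def guess_can_open(self, filename_or_obj):
            # Lets xarray select this engine from the file name alone.
            return str(filename_or_obj).endswith(".txt1d")


    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sample.txt1d")
        with open(path, "w") as f:
            f.write("1.0\n2.5\n4.0\n")

        # The backend class can be passed directly as the engine.
        ds = xr.open_dataset(path, engine=TextColumnBackend)
        values = ds["value"].values.tolist()

    print(values)

A backend distributed in a package would normally be declared under the
``xarray.backends`` entry point group so that it can be selected by name.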

Datasets that can be opened with existing Xarray_ engines, but require
significant transformation and postprocessing to harmonize them to CF/Cerbere
requirements, are handled by dedicated ``reader`` classes. Many additional
contrib packages (provided independently to avoid stacking unnecessary classes
or dependencies in your environment) are already available in the ``cerbere``
ecosystem. Install the ones you need in addition to the ``cerbere`` core
package.

The complete list of existing ``reader`` classes, and their compatibility with
known EO products, is given in :doc:`compatibility`.

If no ``reader`` class can handle a particular dataset, a new corresponding
``reader`` class must be written. Refer to :doc:`writing_reader` to write
your own ``reader`` class.

Last, ``reader`` classes are registered internally when installed (through
Python entry points); they can be dynamically discovered and loaded thanks to a
guessing mechanism based on the file patterns associated with a given reader
class. ``cerbere`` provides an ``open_dataset`` function (similar to the
Xarray_ one) that opens a file while transparently fetching the proper reader
for the user:

.. code-block:: python

    import cerbere

    # detects it is a GHRSST file, fetches the correct reader and returns a
    # CF/Cerbere compliant Xarray Dataset object.
    ds = cerbere.open_dataset('./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc')
    ds
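
The pattern-based guessing described above can be pictured with the following
simplified sketch. The registry, the reader names and the regular expressions
are all hypothetical; the actual ``cerbere`` implementation and its registered
patterns may differ:

.. code-block:: python

    import re
    from pathlib import Path

    # Hypothetical registry mapping reader names to file name patterns.
    # More specific patterns are listed first.
    READER_PATTERNS = {
        "ghrsst_reader": re.compile(r"^\d{14}-.+-L2P_GHRSST-.+\.nc$"),
        "generic_netcdf_reader": re.compile(r"^.+\.nc$"),
    }


    def guess_reader(path):
        """Return the first reader whose pattern matches the file name."""
        name = Path(path).name
        for reader, pattern in READER_PATTERNS.items():
            if pattern.match(name):
                return reader
        return None


    sample = (
        "./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-"
        "20190719021531-v02.0-fv01.0.nc"
    )
    print(guess_reader(sample))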


``feature`` classes
-------------------

The |feature| module provides a set of classes implementing each known
observation pattern, using a unique representation for each of them, inspired
by the CF convention and other works. It therefore extends the above category to
"typed geospatial" xarray_ ``Dataset`` objects, providing further restrictions
on the content and naming of a set of data so that similar patterns of
observation are represented and accessed in the same way, allowing generic
manipulation of observation data and specific handling or display functions.
It is a higher level of abstraction above xarray_ ``Dataset`` classes,
fitting data into common templates.

A feature object can easily be created from an xarray_ ``Dataset`` or using the
``open_feature`` function. With the above example, which corresponds to a
satellite swath, we would create a ``Swath`` object as follows:

.. code-block:: python

    import cerbere
    import cerbere.feature.cswath

    # detects it is a GHRSST file, fetches the correct reader and returns a
    # CF/Cerbere compliant Xarray Dataset object.
    ds = cerbere.open_dataset('./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc')
    swath = cerbere.feature.cswath.Swath(ds)
    swath

or more simply, with ``open_feature``:

.. code-block:: python

    import cerbere

    # detects it is a GHRSST file, fetches the correct reader and returns a
    # CF/Cerbere compliant Swath feature object.
    swath = cerbere.open_feature(
        './samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc',
        'Swath')
    swath

Currently managed features include:

 * :class:`~cerbere.feature.cgrid.Grid`
 * :class:`~cerbere.feature.cswath.Swath`
 * :class:`~cerbere.feature.cimage.Image`
 * :class:`~cerbere.feature.cgridtimeseries.GridTimeSeries`
 * :class:`~cerbere.feature.ctrajectory.Trajectory`
 * :class:`~cerbere.feature.cpointtimeseries.PointTimeSeries`
 * :class:`~cerbere.feature.cpoint.Point`
 * :class:`~cerbere.feature.cprofile.Profile`
 * :class:`~cerbere.feature.ctimeseries.TimeSeries`
 * :class:`~cerbere.feature.ctimeseriesprofile.TimeSeriesProfile`
 * :class:`~cerbere.feature.ctrajectoryprofile.TrajectoryProfile`

For more features and more details on their properties, refer to the
:doc:`features` section.