Overview

History

cerbere was created in 2010 at Ifremer / CERSAT as a unified Python API for manipulating any type of spatio-temporal observation, readable from many existing storage formats through the same set of functions. It also provided classes for specific types of observation features.

Many of these aspects are now handled by other packages, such as Xarray, and cerbere has evolved accordingly, taking advantage of these existing contributions and refining them.

So…why do we need Cerbere?

When using different Earth Observation products, one immediately faces several painful issues:

Earth Observation data products come in a variety of formats, self-describing or not: NetCDF, HDF, GRIB, BUFR, agency-specific formats (at ESA, Eumetsat, …), unusual binary layouts or, worse, mixed binary/ASCII formats. Xarray handles some of them, but only a limited number.

Even with a widely used format like NetCDF, readable with Xarray, these products follow a variety of conventions for representing the same information, whether it is data (time, lat, lon, …) or metadata (time coverage start or end, spatial footprint, …). And even when based on the same convention, such as the Climate and Forecast (CF) convention, data products come with different or incomplete interpretations of it, and still show significant discrepancies that prevent using them in a generic way. Users must then tweak variable or attribute names, which wastes time and complicates code that could otherwise be generic and accept different types of input.

Besides, Xarray has a few shortcomings, notably in the way it handles masked arrays (variables with missing values), in particular for integer variables, which are often used in observation data for ancillary fields (flags, sensor ancillary fields, …).

Concept

cerbere now builds on Xarray for its internal data structure but goes further in harmonizing feature description and metadata. It follows standards such as the CF convention and other community efforts (ACDD, GHRSST) for naming coordinates, dimensions and frequently used attributes, and adds a true typology of common Earth Observation patterns. This is required to implement generic software that is truly independent of the choices made by the various data producers. We assume there is no practical reason why data corresponding to the same sampling pattern (feature) should be represented differently, or use different names for coordinates, dimensions, etc. cerbere therefore extends Xarray by predefining the names, dimensions and main attributes required to properly describe an Earth Observation feature. Having a set of predefined templates for each feature type makes it possible to write, once and for all, the generic operations commonly applied to such objects, such as display, extraction of values, remapping or resampling and, most of all for data producers, saving observation data consistently with the same format and metadata content.

cerbere is the middleware between domain-agnostic Xarray and generic usage of Earth Observation data.

How does it work?

Users familiar with Xarray will easily adapt to the additional features provided by cerbere.

cerbere specializes the xarray DataArray and Dataset classes through accessors to implement particular attributes and behaviours, and adds a collection of new classes implementing particular data features gathered in the feature subpackage. It also comes with numerous contrib packages providing xarray backend engines to read additional formats or further harmonize the content of the different datasets that can be read with Xarray.

cerbere accessors

cerbere provides an accessor, called cb, to both the Xarray DataArray and Dataset classes. Accessors are the recommended way of specializing xarray classes.
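To illustrate the idea behind accessors, here is a minimal plain-Python sketch of the pattern: a namespace object attached to a host class under a short name and created lazily on first access. It is not xarray's implementation (xarray exposes the mechanism through xarray.register_dataset_accessor and xarray.register_dataarray_accessor), and it is not cerbere's code either. FakeDataset, register_accessor, CbAccessor and its platform property are hypothetical names used for illustration only.

```python
class _AccessorDescriptor:
    """Create the accessor lazily and cache it on the host instance."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: return the accessor class.
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Cache on the instance so later lookups bypass this descriptor.
        obj.__dict__[self._name] = accessor
        return accessor


def register_accessor(name, host_cls):
    """Decorator attaching the decorated class to `host_cls` under `name`."""
    def decorator(accessor_cls):
        setattr(host_cls, name, _AccessorDescriptor(name, accessor_cls))
        return accessor_cls
    return decorator


class FakeDataset:
    """Hypothetical stand-in for an xarray Dataset."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)


@register_accessor("cb", FakeDataset)
class CbAccessor:
    """Hypothetical accessor exposing extra helpers under `.cb`."""

    def __init__(self, dataset):
        self._ds = dataset

    @property
    def platform(self):
        # Illustrative helper: read a global attribute with a default.
        return self._ds.attrs.get("platform", "unknown")


ds = FakeDataset({"platform": "Sentinel-3A"})
print(ds.cb.platform)
```

The point of the pattern is that the host class is left untouched: everything added by the extension lives under a single, clearly named attribute.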

The cb accessor for DataArray objects offers a set of attributes and methods that enrich the API provided natively by Xarray. It includes additional standard attributes (units, standard_name, …), helper functions to handle flag variables, conversion of data back and forth to numpy MaskedArray, better preservation of the science data type than Xarray offers, …
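As an illustration of what handling a flag variable involves, the sketch below decodes a bit-field value using the CF attributes flag_masks and flag_meanings. It is plain Python, not the cerbere API: decode_flags is a hypothetical helper, and the masks and meanings are invented for the example.

```python
def decode_flags(value, flag_masks, flag_meanings):
    """Return the meanings whose bit mask is fully set in `value`.

    `flag_meanings` is a blank-separated string, as in the CF convention.
    """
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_masks):
        raise ValueError("flag_masks and flag_meanings differ in length")
    return [
        name
        for mask, name in zip(flag_masks, meanings)
        if (value & mask) == mask
    ]


# Attributes as they could appear on a flag variable (invented values).
attrs = {
    "flag_masks": [1, 2, 4, 8],
    "flag_meanings": "land ice cloud glint",
}

# 5 = 1 + 4, so the "land" and "cloud" bits are set.
print(decode_flags(5, attrs["flag_masks"], attrs["flag_meanings"]))
```

Keeping such variables as integers matters here: a bitwise test is only meaningful on an integer type, which is why preserving the original data type is useful for flag fields.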

The cb accessor for Dataset objects offers a set of attributes and methods that enrich the API provided natively by Xarray. It harmonizes the naming of the Dataset coordinates and the expression of time and longitude, and provides helper functions to subset these objects, …
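The kind of harmonization described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the alias table, the target names (lat, lon, time) and the choice of the [-180, 180) longitude range are assumptions made for the example, and harmonize_names and wrap_longitude are hypothetical helpers, not cerbere functions.

```python
# Hypothetical alias table: common spellings mapped to a single name.
COORD_ALIASES = {
    "latitude": "lat",
    "Latitude": "lat",
    "longitude": "lon",
    "Longitude": "lon",
    "Time": "time",
}


def harmonize_names(variables):
    """Return a copy of `variables` with known aliases renamed."""
    return {
        COORD_ALIASES.get(name, name): values
        for name, values in variables.items()
    }


def wrap_longitude(lon):
    """Express a longitude in degrees east within [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


# A product using its own spellings and a 0..360 longitude convention.
raw = {
    "Latitude": [10.0, 20.0],
    "Longitude": [190.0, 359.0],
    "sst": [290.1, 291.4],
}

clean = harmonize_names(raw)
clean["lon"] = [wrap_longitude(value) for value in clean["lon"]]
print(clean)
```

Once every product exposes the same names and the same longitude convention, downstream code such as subsetting by bounding box no longer needs per-product branches.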

backend and reader classes

Some datasets deviate largely from the CF/Cerbere format and content recommendations, and it may be very complex (and too specific) to guess and handle all these particular cases in the harmonization performed by cerbere accessors.

Xarray provides the BackendEntryPoint subclassing mechanism to handle new format engines (in addition to those readily available, such as netcdf4). It is the preferred way to implement access to formats previously unknown to the Xarray ecosystem (BUFR, EPS, …). We define some BackendEntryPoint classes in contrib packages.

For datasets that can be opened with existing Xarray engines but require significant transformation and postprocessing to meet CF/Cerbere requirements, reader classes can be written. Many additional contrib packages (provided separately to avoid stacking unnecessary classes or dependencies in your environment) are already available in the cerbere ecosystem. Install the ones you need in addition to the cerbere core package.

The complete list of existing reader classes, and their compatibility with known EO products, is given in Complementary readers.

If no reader class can handle a particular dataset, a new corresponding reader class must be written. Refer to Writing a new reader class to write your own reader class.

Lastly, reader classes are registered internally when installed (through Python entry points); they can be dynamically discovered and loaded thanks to a guessing mechanism based on the file patterns associated with a given reader class. cerbere provides an open_dataset function (similar to the Xarray one) that opens a file while transparently fetching the proper reader for the user:

import cerbere

# detects it is a GHRSST file, fetches the correct reader and returns a
# CF/Cerbere compliant Xarray Dataset object.
ds = cerbere.open_dataset('./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc')
ds
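The guessing mechanism based on file patterns can be pictured with the following minimal sketch. It does not reproduce cerbere's registry, which relies on Python entry points: the READER_PATTERNS table, the reader names, the regular expressions and the guess_reader function are all assumptions made for illustration.

```python
import os
import re

# Hypothetical registry: reader name -> file name pattern it claims.
# More specific patterns come first, since the first match wins.
READER_PATTERNS = {
    "GHRSST": re.compile(r"\d{14}-.+_GHRSST-.+\.nc"),
    "GenericNetCDF": re.compile(r".+\.nc"),
}


def guess_reader(path):
    """Return the first reader name whose pattern matches the file name."""
    filename = os.path.basename(path)
    for name, pattern in READER_PATTERNS.items():
        if pattern.fullmatch(filename):
            return name
    return None


sample = (
    "./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-"
    "20190719021531-v02.0-fv01.0.nc"
)
print(guess_reader(sample))
print(guess_reader("./samples/unknown_product.nc"))
print(guess_reader("./samples/forecast.grib"))
```

In this sketch the order of the table matters: a catch-all pattern placed first would hide the more specific readers.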

feature classes

The feature module provides a set of classes implementing each known observation pattern, using a unique representation for each of them, inspired by the CF convention and other works. It therefore extends the harmonized Dataset objects described above into “typed geospatial” xarray Dataset objects, placing further restrictions on the content and naming of a set of data so that similar patterns of observation are represented and accessed in the same way. This allows generic manipulation of observation data as well as specific handling or display functions. It is a higher level of abstraction above xarray Dataset classes, fitting data into common templates.
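The notion of fitting data into a common template can be sketched as a simple conformance check. This is a plain-Python illustration, not cerbere's feature classes: FeatureTemplate and check_template are hypothetical, and the coordinate and dimension names given for each template are placeholders chosen for the example rather than cerbere's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureTemplate:
    """Names a dataset must expose to qualify as a given feature type."""

    name: str
    required_coords: tuple
    dims: tuple


# Placeholder templates: the names below are assumptions for this example.
SWATH = FeatureTemplate("Swath", ("lat", "lon", "time"), ("row", "cell"))
GRID = FeatureTemplate("Grid", ("lat", "lon", "time"), ("lat", "lon"))


def check_template(template, coords, dims):
    """Return a list of problems; the list is empty when the data fits."""
    problems = [
        f"missing coordinate: {name}"
        for name in template.required_coords
        if name not in coords
    ]
    problems += [
        f"missing dimension: {name}"
        for name in template.dims
        if name not in dims
    ]
    return problems


# A dataset that fits the swath template, and one that does not fit a grid.
print(check_template(SWATH, {"lat", "lon", "time"}, ("row", "cell")))
print(check_template(GRID, {"lat", "lon"}, ("y", "x")))
```

Because every object of a given feature type passes the same check, generic operations can rely on those names without inspecting each product.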

A feature object can easily be created from an xarray Dataset or with the open_feature function. With the GHRSST file opened earlier, which corresponds to a satellite swath, we would create a Swath object as follows:

import cerbere
import cerbere.feature

# detects it is a GHRSST file, fetches the correct reader and returns a
# CF/Cerbere compliant Xarray Dataset object.
ds = cerbere.open_dataset('./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc')
swath = cerbere.feature.cswath.Swath(ds)
swath

or more simply, with open_feature:

import cerbere

# detects it is a GHRSST file, fetches the correct reader and returns a
# CF/Cerbere compliant Swath feature object.
swath = cerbere.open_feature(
    './samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc',
    'Swath')
swath

Currently managed features include:

  • Grid

  • Swath

  • Image

  • GridTimeSeries

  • Trajectory

  • PointTimeSeries

  • Point

  • Profile

  • TimeSeries

  • TimeSeriesProfile

  • TrajectoryProfile

For more features and more details on their properties, refer to the features section.