Overview

Concept

cerbere provides a unified Python API to manipulate any type of spatio-temporal observations, which can be read from many existing storage formats using the same set of functions. It also provides classes for specific types of observation features.

Many of these aspects are now handled by other packages, such as xarray. cerbere relies on xarray for its internal data structure but goes further in normalizing feature description and metadata, following standards such as the CF convention and other community efforts, and adding a true typology of common observation patterns. This is required to implement generic software that is truly independent of the choices made by the various data producers. We assume that there is no practical reason why data corresponding to the same sampling pattern (feature) should be represented differently or use different names for coordinates, dimensions, etc. cerbere further extends xarray by predefining the names, dimensions and main attributes required to properly describe an Earth observation feature. Having a predefined template for each feature type makes it possible to write, once and for all, the generic operations commonly applied to such objects, such as display, extraction of values, remapping or resampling, and saving to the same format conventions.

cerbere provides a collection of classes quite similar to the Dataset class (and actually based on it). They are divided into two main subpackages: dataset and feature.

Note

This new version of cerbere is now based on xarray for its data structure. All the internal xarray objects can be accessed: users familiar with xarray can use these objects while benefiting from the improved normalization of the data and from the dataset classes provided for different formats. The cerbere API also provides some class methods matching the xarray API, so that handling its objects feels like handling xarray objects, but this is not pure inheritance.

dataset

dataset classes are used to read the content of a file in any format into a generic and standardized representation, stored within an xarray Dataset object, normalizing in the process any spatial and temporal information that may be contained in the file. They essentially build a ‘geospatial’ xarray Dataset object.

So…why do we need several dataset classes?

When using different Earth observation products, one immediately faces various painful issues:

  • products come in a variety of formats, self-described or not: NetCDF, HDF, GRIB, BUFR, agency-specific formats (at ESA, Eumetsat,…), weird binary storage or, worse, mixed binary/ASCII formats… xarray handles some of them, but only a limited number. Non-self-described formats require a specific class for each different product.

  • even with a widely used format like netCDF, products come in a variety of conventions for representing the same information, whether it is data (time, lat, lon, …) or metadata (coverage start or end time, spatial footprint, …). Even when they are based on the same convention (such as the Climate and Forecast convention, CF), data producers interpret it differently, and significant discrepancies remain between products that prevent using them in a generic way. Users must then apply subtle transformations to the information content, which is a waste of time.
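As a minimal sketch of the kind of normalization described above, the following plain-Python snippet maps producer-specific coordinate names onto one common vocabulary. The alias table and the function name `normalize_names` are hypothetical and chosen for illustration only; they are not part of cerbere.

```python
# Hypothetical alias table: producer-specific names -> one common vocabulary.
# These aliases are illustrative and are not taken from cerbere.
ALIASES = {
    "lat": {"lat", "latitude", "nav_lat"},
    "lon": {"lon", "longitude", "nav_lon"},
    "time": {"time", "t", "obs_time"},
}


def normalize_names(names):
    """Return a {original_name: standard_name} mapping for known aliases."""
    renaming = {}
    for name in names:
        for standard, aliases in ALIASES.items():
            if name.lower() in aliases:
                renaming[name] = standard
                break
    return renaming


# Two products describing the same information with different names.
product_a = ["Latitude", "Longitude", "time", "sst"]
product_b = ["nav_lat", "nav_lon", "obs_time", "sea_surface_temperature"]

renaming_a = normalize_names(product_a)
renaming_b = normalize_names(product_b)
print(renaming_a)
print(renaming_b)
```

Names that are not recognized as coordinates (here the data variables) are left out of the mapping. With xarray, such a mapping could be passed to `Dataset.rename` to obtain consistently named coordinates.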

cerbere comes with a few built-in dataset classes, but most of them are provided through plugin packages, which you install only if you need them. Built-in classes include:

  • Dataset

  • NCDataset

  • GHRSSTNCDataset

The complete list of existing dataset classes, and their compatibility with known EO products, is given in Datasets.

If no dataset class exists for a particular format, a new one must be written by inheriting from Dataset. Refer to Writing a new dataset class to write your own.

Note

The dataset package can be used independently, as a unified API to read the content of any data file.

Note

Field objects are similar to variables in netCDF or DataArray in xarray. A field consists of:

  • an attached variable describing the geophysical quantity provided by the field (together with a few descriptive attributes such as standard name, etc.)

  • attributes further documenting the provided observation values (units,…) similar to the variable attributes in netCDF

  • an array of values (observations)

  • an optional array of quality flags (one for each observation value)

  • an optional array of quality history (one for each observation value) documenting the reason why a value was flagged
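The composition of a field listed above can be pictured with a small dataclass. This is only an illustrative stand-in: the class name `FieldSketch`, its attribute names and the convention that a flag of 0 means "good" are assumptions made for this sketch, not cerbere's actual Field class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FieldSketch:
    """Illustrative stand-in for a field; not cerbere's actual Field class."""

    name: str                                    # geophysical quantity identifier
    standard_name: str                           # descriptive attribute of the quantity
    values: List[float]                          # observation values
    attrs: Dict[str, str] = field(default_factory=dict)  # e.g. units
    quality_flags: Optional[List[int]] = None    # one flag per observation value
    quality_history: Optional[List[int]] = None  # one reason code per observation value

    def __post_init__(self):
        # Optional arrays, when present, must match the values one to one.
        n = len(self.values)
        for label in ("quality_flags", "quality_history"):
            extra = getattr(self, label)
            if extra is not None and len(extra) != n:
                raise ValueError(f"{label} must have {n} elements, got {len(extra)}")

    def valid_values(self, max_flag=0):
        """Return values whose flag is <= max_flag (assumes 0 means good)."""
        if self.quality_flags is None:
            return list(self.values)
        return [v for v, q in zip(self.values, self.quality_flags) if q <= max_flag]


sst = FieldSketch(
    name="sst",
    standard_name="sea_surface_temperature",
    values=[290.1, 291.4, 275.0],
    attrs={"units": "K"},
    quality_flags=[0, 0, 2],
    quality_history=[0, 0, 1],
)
print(sst.valid_values())
```

Keeping the flags and their history next to the values lets generic code filter observations without knowing how a given producer encodes quality.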

feature

The feature module provides a set of classes implementing each known observation pattern, using a unique representation for each of them, inspired by the CF convention and other works. It therefore extends the above category to “typed geospatial” xarray Dataset objects, placing further restrictions on the content of a set of data so that similar observation patterns are represented and accessed in the same way, which allows generic manipulation of observation data as well as specific handling or display functions. It is a higher level of abstraction above the dataset classes, fitting data into common templates.

Currently managed features include:

  • Grid

  • Swath

  • Image

  • GridTimeSeries

  • Trajectory

  • PointTimeSeries

  • Point

  • Profile

  • TimeSeries

  • TimeSeriesProfile

  • TrajectoryProfile

Note

A feature requires the dataset package to read data from, or write data to, a specific format (or format convention).

The classes provided in the feature module and listed above correspond to the main sampling patterns commonly used for Earth Observation data. Whenever possible, they follow the recommendations of the Climate and Forecast (CF) convention.

The following table describes the dimensions and spatio-temporal coordinate (geolocation) fields associated with each feature in feature:

cerbere main features

Feature                 Dims [size]   Coords [dims]             Fields [dims]
----------------------  ------------  ------------------------  -----------------------
Swath                   row (y)       time (row, cell)          <name> (row, cell)
                        cell (x)      lat (row, cell)
                                      lon (row, cell)

Image                   time (1)      time (time)               <name> (row, cell)
                        row (y)       lat (row, cell)
                        cell (x)      lon (row, cell)

Grid                    time (1)      time (time)               <name> (y, x)
                        y (y)         lat (y, x)
                        x (x)         lon (y, x)

GridTimeSeries          time (t)      time (time)               <name> (time, lat, lon)
                        y (y)         lat (y, x)
                        x (x)         lon (y, x)

Point [1]               obs (x)       time (obs)                <name> (obs)
                                      lat (obs)
                                      lon (obs)
                                      depth/alt (obs) [opt]

Trajectory [2]          time (t)      time (time)               <name> (time)
                                      lat (time)
                                      lon (time)
                                      depth/alt (time) [opt]

Profile [3]             profile ()    time ()                   <name> (z)
                        z (z)         lat ()
                                      lon ()
                                      alt/depth (z)

TimeSeries [4]          station (1)   time (time)               <name> (time)
                        time (t)      lat (time)
                                      lon (time)
                                      depth/alt (time) [opt]

TrajectoryProfile [5]   profile (n)   time (profile)            <name> (profile, z)
                        z (z)         lat (profile)
                                      lon (profile)
                                      alt/depth (profile, z)

TimeSeriesProfile [6]   profile (n)   time (profile)            <name> (profile, z)
                        z (z)         lat ()
                                      lon ()
                                      alt/depth (profile, z)

CF references

[1] CF Point featureType
[2] CF Trajectory featureType
[3] CF Profile featureType
[4] CF TimeSeries featureType
[5] CF TrajectoryProfile featureType
[6] CF TimeSeriesProfile featureType
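To illustrate how such templates enable generic code, here is a minimal sketch that checks a dataset layout against three rows of the table above (Swath, Grid and Trajectory). The `TEMPLATES` dictionary and the `check_feature` function are hypothetical helpers written for this example; cerbere's own feature classes may perform this validation differently.

```python
# Dimension and coordinate layout copied from three rows of the feature table.
TEMPLATES = {
    "Swath": {
        "dims": ("row", "cell"),
        "coords": {
            "time": ("row", "cell"),
            "lat": ("row", "cell"),
            "lon": ("row", "cell"),
        },
    },
    "Grid": {
        "dims": ("time", "y", "x"),
        "coords": {"time": ("time",), "lat": ("y", "x"), "lon": ("y", "x")},
    },
    "Trajectory": {
        "dims": ("time",),
        "coords": {"time": ("time",), "lat": ("time",), "lon": ("time",)},
    },
}


def check_feature(feature, dims, coords):
    """Return a list of mismatches between a dataset layout and a template.

    dims:   iterable of dimension names found in the data
    coords: {coordinate name: tuple of dimension names}
    """
    template = TEMPLATES[feature]
    problems = []
    for dim in template["dims"]:
        if dim not in dims:
            problems.append(f"missing dimension '{dim}'")
    for name, expected in template["coords"].items():
        if name not in coords:
            problems.append(f"missing coordinate '{name}'")
        elif tuple(coords[name]) != expected:
            problems.append(
                f"coordinate '{name}' has dims {tuple(coords[name])}, expected {expected}"
            )
    return problems


# A layout that matches the Swath template but not the Grid one.
swath_dims = ["row", "cell"]
swath_coords = {
    "time": ("row", "cell"),
    "lat": ("row", "cell"),
    "lon": ("row", "cell"),
}
print(check_feature("Swath", swath_dims, swath_coords))
print(check_feature("Grid", swath_dims, swath_coords))
```

Because every feature of a given type shares the same dimension and coordinate names, a check like this (and any operation built on top of it) only has to be written once per feature type.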

Special features

Additional types of features, representing particular cases of the main features above, are also available. They follow the CF rules defined for these particular cases, when applicable.

Special features

Feature                    Dims [size]   Coords [dims]     Fields [dims]
-------------------------  ------------  ----------------  -------------------
UniZTrajectoryProfile [7]  profile (n)   time (profile)    <name> (profile, z)
                           z (z)         lat (profile)
                                         lon (profile)
                                         depth/alt (z)

CylindricalGrid            time (1)      time (time)       <name> (lat, lon)
                           lat (y)       lat (lat, lon)
                           lon (x)       lon (lat, lon)

CF references

[7] CF TrajectoryProfile featureType where all the profiles have the same set of vertical coordinates
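The difference between a TrajectoryProfile, whose vertical coordinate has dimensions (profile, z), and a UniZTrajectoryProfile, whose vertical coordinate has dimension (z) only, can be sketched as a simple test on the vertical levels. The function name `shared_vertical_axis` is a hypothetical helper for illustration, not a cerbere function.

```python
def shared_vertical_axis(depth_by_profile):
    """Return the common vertical axis if all profiles use the same levels, else None.

    depth_by_profile: one sequence of vertical levels per profile,
    i.e. a (profile, z) layout.
    """
    if not depth_by_profile:
        return None
    first = list(depth_by_profile[0])
    if all(list(levels) == first for levels in depth_by_profile[1:]):
        return first
    return None


same_levels = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
mixed_levels = [[0, 10, 20], [0, 15, 30]]

print(shared_vertical_axis(same_levels))   # one axis can describe every profile
print(shared_vertical_axis(mixed_levels))  # levels differ between profiles
```

When a common axis exists, the two-dimensional vertical coordinate can be reduced to a single one-dimensional coordinate along z.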

Collection features

Additional types of features include collections: these are special features that group features of the same type into a single dataset along an extra axis, whose name depends on the type of the grouped features.

They are described in the CF convention and include:

  • orthogonal multidimensional collection of features (OMDCollection)

  • incomplete multidimensional collection of features (IMDCollection)

  • contiguous ragged array collection of features (CRACollection)
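The following plain-Python sketch contrasts two of the representations listed above for features of unequal length: the contiguous ragged array, which stores all values in one flat array together with a per-feature count, and the incomplete multidimensional array, which pads each feature to a common length. The helper names and the use of `None` as fill value are assumptions made for this example; they do not reflect how the cerbere collection classes are implemented. The orthogonal case, where all features share the same coordinate along the element axis, needs no packing and is not shown.

```python
def to_contiguous_ragged(features):
    """Pack variable-length features into one flat list plus per-feature counts."""
    values = [v for feature in features for v in feature]
    row_size = [len(feature) for feature in features]
    return values, row_size


def from_contiguous_ragged(values, row_size):
    """Rebuild the individual features from the flat list and the counts."""
    features, start = [], 0
    for size in row_size:
        features.append(values[start:start + size])
        start += size
    return features


def to_incomplete_multidim(features, fill_value=None):
    """Pad every feature to the longest length, giving a rectangular layout."""
    width = max((len(f) for f in features), default=0)
    return [list(f) + [fill_value] * (width - len(f)) for f in features]


# Three profiles with a different number of levels each.
profiles = [[10.0, 9.5, 9.1], [12.2], [11.0, 10.4]]

values, row_size = to_contiguous_ragged(profiles)
print(values, row_size)
print(from_contiguous_ragged(values, row_size))
print(to_incomplete_multidim(profiles))
```

The ragged form avoids storing fill values at the cost of an extra count array, while the padded form keeps a regular two-dimensional shape that is simpler to index.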