Overview
History
cerbere was created in 2010 at Ifremer / CERSAT as a unified Python API to
manipulate any type of spatio-temporal observations, which can be read from
many existing storage formats using the same set of functions. It also
provided classes for specific types of observation features.
Many of these aspects are now handled by other packages, such as Xarray, and
cerbere has evolved accordingly, taking advantage of these existing
contributions and refining them.
So…why do we need Cerbere?
Anyone working with several Earth Observation products immediately runs into a number of painful issues:
Earth Observation data products come in a variety of formats, self-described or not: NetCDF, HDF, GRIB, BUFR, agency-specific formats (at ESA, Eumetsat, …), weird binary storage or, worse, mixed binary/ASCII formats… Xarray does handle some of them, but only a limited number.
Even with a widely used format like NetCDF, readable with Xarray, these products follow a variety of conventions for representing the same information, whether it is data (time, lat, lon, …) or metadata (time coverage start or end, spatial footprint, …). And even when based on the same convention (such as the Climate and Forecast convention, CF), data products come with different or incomplete interpretations of it, and still show significant discrepancies that prevent using them in a generic way. The user has to tweak variable or attribute names, which wastes time and complicates code that could otherwise be generic and accept different types of input.
Besides, Xarray has a few shortcomings, notably in the way it handles masked arrays (variables with missing values), in particular for integer variables, which are often used in observation data for ancillary fields (flags, sensor ancillary fields, …).
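The integer limitation mentioned above can be reproduced with plain Xarray and NumPy. The following is a minimal sketch: the variable name and the choice of -1 as missing value are made up for illustration. Masking an integer array in Xarray promotes it to a floating point type, because NaN is used as the missing value marker, whereas a NumPy masked array keeps the integer type.

```python
import numpy as np
import xarray as xr

# Hypothetical quality-flag variable stored as 8-bit integers,
# where -1 marks a missing value (illustrative choice).
raw = np.array([0, 1, -1, 3], dtype="int8")
flags = xr.DataArray(raw, dims="cell", name="quality_flag")

# Masking with Xarray replaces missing values by NaN, which forces
# a promotion of the whole array to a floating point type.
masked_xr = flags.where(flags != -1)

# A NumPy masked array keeps the original integer type and tracks
# missing values in a separate boolean mask.
masked_np = np.ma.masked_equal(raw, -1)

print(masked_xr.dtype)  # a floating point dtype
print(masked_np.dtype)  # int8
```

For flag variables this promotion matters, since bitwise operations on flag masks are only defined for integer types.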
Concept
cerbere now relies on Xarray for its internal data structure but goes
further in terms of harmonization of feature description and metadata, following
standards such as the CF convention and other community efforts (ACDD, GHRSST)
for naming coordinates, dimensions and frequently used attributes, and
adding a true typology of common Earth Observation patterns. This is required
to implement generic software, truly independent from the choices of the various
data producers. We assume there is no practical reason why data
corresponding to the same sampling pattern (feature) should be represented
differently, or should have different names for coordinates, dimensions, etc.
cerbere therefore extends Xarray further by pre-defining the names,
dimensions and main attributes required to properly describe an Earth
observation feature. Having a set of predefined templates for each feature
type makes it possible to write once and for all the commonly used generic
operations applied to such objects, such as display, extraction of values,
remapping or resampling, and - most of all for data producers - saving
observation data consistently with the same format and metadata content.
cerbere is the middleware between domain-agnostic Xarray and generic
usage of Earth Observation data.
How does it work?
Users familiar with Xarray will easily adapt to the additional features
provided by cerbere.
cerbere specializes the xarray DataArray and Dataset classes through
accessors to implement particular attributes and behaviours, and adds a
collection of new classes implementing particular data features gathered in
the feature subpackage. It also comes with numerous contrib packages
providing xarray backend engines to read additional formats or further
harmonize the content of the different datasets that can be read with Xarray.
cerbere accessors
cerbere provides an accessor, called cb, to both the Xarray DataArray
and Dataset classes. Accessors are the recommended way of specializing
xarray classes.
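For readers new to the accessor mechanism, here is a minimal sketch of how an accessor is attached to DataArray with Xarray's own register_dataarray_accessor decorator. The accessor name demo and the DemoAccessor class are hypothetical and unrelated to the cb accessor; they only illustrate the mechanism, not cerbere's implementation.

```python
import xarray as xr


# "demo" is a hypothetical accessor name chosen for this sketch; it is
# unrelated to the cb accessor that cerbere registers.
@xr.register_dataarray_accessor("demo")
class DemoAccessor:
    def __init__(self, xarray_obj):
        # Xarray passes the DataArray the accessor is attached to.
        self._obj = xarray_obj

    @property
    def units(self):
        """Return the 'units' attribute, or None when it is absent."""
        return self._obj.attrs.get("units")

    @property
    def standard_name(self):
        """Return the 'standard_name' attribute, or None when absent."""
        return self._obj.attrs.get("standard_name")


sst = xr.DataArray(
    [290.1, 291.4],
    dims="cell",
    attrs={"units": "kelvin", "standard_name": "sea_surface_temperature"},
)
print(sst.demo.units)  # kelvin
```

Once registered, the accessor is available on every DataArray in the session, which is how a package can add behaviour without subclassing Xarray classes.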
The cb accessor for DataArray objects offers a
set of attributes and methods that enrich the API provided natively by Xarray.
It includes, among other things, additional standard attributes (units,
standard_name, ...) and helper functions to handle flag variables and to
transform data back and forth to numpy MaskedArray, and it better preserves
the science data type than Xarray does.
The cb accessor for Dataset objects offers a set of
attributes and methods that enrich the API provided natively by Xarray. Among
other things, it harmonizes the naming of the Dataset object coordinates and
the expression of time and longitude, and provides helper functions to subset
these objects.
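To give an idea of what such harmonization involves, the sketch below renames coordinates and rewraps longitudes using plain Xarray only. The harmonize function, the alias table and the target longitude range are assumptions made for this illustration; they do not describe how the cb accessor is implemented.

```python
import numpy as np
import xarray as xr


def harmonize(ds):
    """Illustrative harmonization, not cerbere's implementation."""
    # Map a few commonly seen coordinate names to short CF-style names.
    aliases = {"latitude": "lat", "longitude": "lon"}
    renames = {old: new for old, new in aliases.items() if old in ds.variables}
    ds = ds.rename(renames)
    # Express longitudes in [-180, 180) instead of [0, 360).
    lon = ds["lon"]
    wrapped = ((lon.values + 180.0) % 360.0) - 180.0
    return ds.assign_coords(lon=(lon.dims, wrapped))


ds = xr.Dataset(
    {"sst": (("cell",), np.array([290.0, 291.0, 292.0]))},
    coords={
        "latitude": (("cell",), np.array([10.0, 11.0, 12.0])),
        "longitude": (("cell",), np.array([10.0, 190.0, 350.0])),
    },
)
out = harmonize(ds)
print(out["lon"].values)  # longitudes become 10, -170 and -10
```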
backend and reader classes
Some datasets deviate largely from the CF/Cerbere format and
content recommendations, and it may be very complex (and too specific) to
guess and handle all these particular cases in the harmonization performed by
cerbere accessors.
Xarray provides the BackendEntryPoint subclassing mechanism to handle new format engines (in addition to those readily available, such as netcdf4). It is the preferred way to implement access to formats previously unknown to the Xarray ecosystem (BUFR, EPS, …). We define some BackendEntryPoint classes in contrib packages.
For datasets that can be opened with existing Xarray engines but require
significant transformation and postprocessing to meet CF/Cerbere
requirements, reader classes can be written. Many additional contrib
packages (provided independently to avoid stacking unnecessary classes or
dependencies in your environment) are already available in the cerbere
ecosystem. Install them in addition to the cerbere core package if you
need them.
The complete list of existing reader classes, and their compatibility with
known EO products, is given in Complementary readers.
If no reader class can handle a particular dataset, a new corresponding
reader class must be written. Refer to Writing a new reader class to write
your own reader class.
Last, reader classes are registered internally when installed (through
Python EntryPoint); they can be dynamically discovered and loaded thanks to a
guessing mechanism based on the file patterns associated with a given reader
class. cerbere provides an open_dataset function (similar to the
Xarray one) that opens a file and transparently fetches the proper reader
for the user:
import cerbere
# detects it is a GHRSST file, fetches the correct reader and returns a
# CF/Cerbere compliant Xarray Dataset object.
ds = cerbere.open_dataset('./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc')
ds
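The idea of guessing a reader from file patterns can be sketched with the standard library alone. Everything in this example is hypothetical: the registry, the reader names and the regular expressions are invented for illustration, and a plain dictionary stands in for the entry point discovery that cerbere actually relies on.

```python
import re

# Hypothetical registry mapping a reader name to a file name pattern.
# In cerbere the readers are discovered through Python entry points;
# a plain dictionary stands in for that mechanism here.
READER_PATTERNS = {
    "ghrsst": re.compile(r".*-L(2P|3[CSU]|4)_GHRSST-.*\.nc$"),
    "toy_text": re.compile(r".*\.toy$"),
}


def guess_reader(filename):
    """Return the name of the first reader whose pattern matches, or None."""
    for name, pattern in READER_PATTERNS.items():
        if pattern.match(filename):
            return name
    return None


fname = (
    "20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-"
    "20190719021531-v02.0-fv01.0.nc"
)
print(guess_reader(fname))  # ghrsst
```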
feature classes
The feature module provides a set of classes implementing each known
observation pattern, using a unique representation for each of them, inspired
by the CF convention and other works. It extends the harmonized datasets
described above to "typed geospatial" xarray Dataset objects, providing
further restrictions on the content and naming of a set of data so that
similar patterns of observation are represented and accessed in the same way,
allowing generic manipulation of observation data and specific handling or
display functions. It is a higher level of abstraction above xarray Dataset
classes, fitting data into common templates.
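The notion of fitting data into a template can be pictured with a small wrapper that checks a Dataset against a list of expected dimensions and coordinates. The SwathSketch class below is a hypothetical stand-in written for this illustration: the dimension names row and cell and the required coordinates are assumptions, and it does not reproduce the actual feature classes.

```python
import numpy as np
import xarray as xr


class SwathSketch:
    """Hypothetical stand-in for a feature class (not cerbere's Swath)."""

    # Assumed template: two spatial dimensions and three coordinates.
    required_dims = ("row", "cell")
    required_coords = ("lat", "lon", "time")

    def __init__(self, ds):
        missing_dims = [d for d in self.required_dims if d not in ds.dims]
        missing_coords = [c for c in self.required_coords if c not in ds.coords]
        if missing_dims or missing_coords:
            raise ValueError(
                f"not a swath: missing dims {missing_dims}, "
                f"missing coords {missing_coords}"
            )
        self.ds = ds

    def extract(self, rows, cells):
        """Generic subsetting, valid for any dataset fitting the template."""
        return self.ds.isel(row=rows, cell=cells)


shape = (3, 4)
times = np.array(
    ["2019-07-19T00:01:10", "2019-07-19T00:01:11", "2019-07-19T00:01:12"],
    dtype="datetime64[ns]",
)
ds = xr.Dataset(
    {"sst": (("row", "cell"), np.full(shape, 290.0))},
    coords={
        "lat": (("row", "cell"), np.zeros(shape)),
        "lon": (("row", "cell"), np.zeros(shape)),
        "time": (("row",), times),
    },
)
subset = SwathSketch(ds).extract(slice(0, 2), slice(1, 3))
print(subset["sst"].shape)  # (2, 2)
```

Because the template fixes the names, a method such as extract can be written once and reused for every product that fits the same pattern.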
A feature object can easily be created from an xarray Dataset or using the
open_feature function. With the above example, which corresponds to a
satellite swath, we would create a Swath object as follows:
import cerbere
import cerbere.feature
# detects it is a GHRSST file, fetches the correct reader and returns a
# CF/Cerbere compliant Xarray Dataset object.
ds = cerbere.open_dataset('./samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc')
swath = cerbere.feature.cswath.Swath(ds)
swath
or more simply, with open_feature:
import cerbere
# detects it is a GHRSST file, fetches the correct reader and returns a
# CF/Cerbere compliant Swath feature object.
swath = cerbere.open_feature(
    './samples/20190719000110-MAR-L2P_GHRSST-SSTskin-SLSTRA-20190719021531-v02.0-fv01.0.nc',
    'Swath')
swath
Currently managed features include:
Grid
Swath
Image
GridTimeSeries
Trajectory
PointTimeSeries
Point
Profile
TimeSeries
TimeSeriesProfile
TrajectoryProfile
For more features and more details on their properties, refer to the features section.