Format profiles
cerbere was designed to ease data management tasks. It can be used to
convert data files from one format to another, or to reformat data to
another formatting convention matching a specific project's requirements, with
minimal coding effort. For some examples of such operations, refer also to
format_profile.
When saving the content of a dataset class object, the output
file is formatted following the default settings and conventions implemented in
the save() method of this class.
The format can also be refined and customized through an external format profile
file that can be passed on when saving a dataset. It provides the directives to
properly format a dataset, following a given convention or set of default
settings. In particular, it can define:
the list of global metadata attributes (and default value)
the list of field metadata attributes (and default value) such as units, standard name, comment, description, reference,…
the encoding parameters used when writing the data on disk, such as (for a NetCDF writer) scale factor, add offset, number of significant digits, compression,…
Let’s format, for instance, some data to the GHRSST format (as defined in the GDS 2.1 document). We define these requirements in a profile file as follows:
---
# Defines the list and default values of the global attributes of a Cerbere new feature
attributes:
    # Description
    id:
    naming_authority: org.ghrsst
    title:
    summary:
    cdm_data_type:
    keywords: Oceans > Ocean Temperature > Sea Surface Temperature
    acknowledgement: "Please acknowledge the use of these data with the following statement: these data were produced by the Centre de Recherche et d'Exploitation Satellitaire (CERSAT), at IFREMER, Plouzane (France)"
    processing_level:
    metadata_link:
    comment:
    file_quality_level:
    # Observation
    platform:
    platform_type:
    instrument:
    instrument_type:
    band:
    # Conventions
    Conventions: CF 1.7, ACDD 1.3, ISO 8601
    Metadata_Conventions: Climate and Forecast (CF) 1.7, Attribute Convention for Data Discovery (ACDD) 1.3
    standard_name_vocabulary: NetCDF Climate and Forecast (CF) Metadata Convention
    keywords_vocabulary: NASA Global Change Master Directory (GCMD) Science Keywords
    format_version: GDSv1.2
    gds_version_id:
    platform_vocabulary: CEOS mission table
    instrument_vocabulary: CEOS instrument table
    # Authorship
    institution: Institut Francais de Recherche et d'Exploitation de la Mer (Ifremer) Centre de Recherche et d'Exploitation satellitaire (CERSAT)
    institution_abbreviation: Ifremer/CERSAT
    project: Group for High Resolution Sea Surface Temperature (GHRSST)
    program: CMEMS
    license: GHRSST protocol describes data use as free and open.
    publisher_name: CERSAT
    publisher_url: http://cersat.ifremer.fr
    publisher_email: cersat@ifremer.fr
    publisher_institution: Ifremer
    publisher_type: institution
    creator_name: CERSAT
    creator_url: http://cersat.ifremer.fr
    creator_email: cersat@ifremer.fr
    creator_type: institution
    creator_institution: Ifremer
    contributor_name:
    contributor_role:
    references:
    # Traceability
    processing_software: Telemachus 1.0
    product_version: 3.0
    netcdf_version_id:
    uuid:
    history:
    source:
    source_version:
    date_created:
    date_modified:
    date_issued:
    date_metadata_modified:
    # BBox
    geospatial_lat_min:
    geospatial_lat_max:
    geospatial_lat_units: degrees
    geospatial_lon_min:
    geospatial_lon_max:
    geospatial_lon_units: degrees
    geospatial_bounds:
    geospatial_bounds_crs: WGS84
    # Resolution
    spatial_resolution:
    geospatial_lat_resolution:
    geospatial_lon_resolution:
    # Temporal
    time_coverage_start:
    time_coverage_end:
    time_coverage_resolution:
fields:
    lat:
        standard_name: latitude
        units: degrees_north
        valid_range: -90, 90
        comment: geographical coordinates, WGS84 projection
        coordinates: lon lat
    lon:
        standard_name: longitude
        units: degrees_east
        valid_range: -180., 180
        comment: geographical coordinates, WGS84 projection
    time:
        long_name: reference time of sst file
        standard_name: time
    sea_surface_temperature:
        long_name: sea surface foundation temperature
        standard_name: sea_surface_foundation_temperature
        units: kelvin
        valid_range: -2., 50.
    sst_dtime:
        long_name: time difference from reference time
        units: seconds
        valid_range: -86400, 86400
        comment: time plus sst_dtime gives each measurement time
    solar_zenith_angle:
        long_name: solar zenith angle
        units: angular_degree
        valid_range: 0, 180
        comment: the solar zenith angle at the time of the SST observations
    sses_bias:
        long_name: SSES bias estimate
        units: kelvin
        valid_range: -2.54, 2.54
        comment: Bias estimate derived using the techniques described at http://www.ghrsst.org/SSES-Description-of-schemes.html
    sses_standard_deviation:
        long_name: SSES standard deviation
        valid_range: 0., 2.54
        comment: Standard deviation estimate derived using the techniques described at http://www.ghrsst.org/SSES-Description-of-schemes.html
    quality_level:
        long_name: quality level of SST pixel
        valid_range: 0, 5
        flag_meanings: no_data bad_data worst_quality low_quality acceptable_quality best_quality
        flag_values: 0, 1, 2, 3, 4, 5
        comment: These are the overall quality indicators and are used for all GHRSST SSTs
    or_latitude:
        units: degrees_north
        valid_range: -80., 80
        long_name: original latitude of the SST value
        standard_name: latitude
    or_longitude:
        units: degrees_east
        valid_range: -180., 180.
        long_name: original longitude of the SST value
        standard_name: longitude
    or_number_of_pixels:
        long_name: original number of pixels from the L2Ps contributing to the SST value
        valid_range: -32767, 32767
    satellite_zenith_angle:
        long_name: satellite zenith angle
        units: angular_degree
        comment: the satellite zenith angle at the time of the SST observations
        valid_min: 0
        valid_max: 90
    adjusted_sea_surface_temperature:
        long_name: adjusted collated sea surface temperature
        standard_name: sea_surface_subskin_temperature
        units: kelvin
        comment: bias correction using a multi-sensor reference field
        valid_min: -300
        valid_max: 4500
encoding:
    lat:
        dtype: float32
        least_significant_digit: 3
    lon:
        dtype: float32
        least_significant_digit: 3
    sea_surface_temperature:
        dtype: int16
        _FillValue: -32768
        scale_factor: 0.01
        add_offset: 273.15
    sst_dtime:
        _FillValue: -2147483648
        add_offset: 0
        scale_factor: 1
        dtype: int32
    solar_zenith_angle:
        _FillValue: -128
        add_offset: 90.
        scale_factor: 1.
    quality_level:
        _FillValue: -128
        dtype: byte
    sses_bias:
        _FillValue: -128
        dtype: byte
        add_offset: 0.
        scale_factor: 0.02
    sses_standard_deviation:
        _FillValue: -128
        dtype: byte
        add_offset: 2.54
        scale_factor: 0.02
    or_latitude:
        dtype: int16
        _FillValue: -32768
        add_offset: 0.
        scale_factor: 0.01
        units: degrees_north
    or_longitude:
        dtype: int16
        _FillValue: -32768
        add_offset: 0.
        scale_factor: 0.01
    or_number_of_pixels:
        dtype: byte
        _FillValue: -32768
        add_offset: 0
        scale_factor: 1
    satellite_zenith_angle:
        dtype: byte
        _FillValue: -128
        add_offset: 0.
        scale_factor: 1.
    adjusted_sea_surface_temperature:
        dtype: int16
        _FillValue: -32768
        add_offset: 273.15
        scale_factor: 0.01
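To make the encoding section more concrete, the sketch below shows how scale_factor, add_offset and _FillValue are conventionally applied when a NetCDF writer packs floating point values into integers (the usual CF packing rule: stored = (value - add_offset) / scale_factor). The helpers pack and unpack are hypothetical names written for this illustration only; they are not part of cerbere, which delegates this step to its NetCDF writer. The numbers come from the sea_surface_temperature entry of the profile above.

```python
import math


def pack(value, scale_factor, add_offset, fill_value):
    """Convert a physical value to the integer stored on disk."""
    # Missing data is written as the fill value.
    if value is None or math.isnan(value):
        return fill_value
    # CF packing rule: remove the offset, then divide by the scale factor.
    return round((value - add_offset) / scale_factor)


def unpack(stored, scale_factor, add_offset, fill_value):
    """Convert an integer read from disk back to a physical value."""
    if stored == fill_value:
        return None
    return stored * scale_factor + add_offset


# Encoding of sea_surface_temperature in the profile above (int16 on disk).
sst_encoding = {"scale_factor": 0.01, "add_offset": 273.15, "fill_value": -32768}

stored = pack(293.15, **sst_encoding)        # a 20 degrees Celsius SST, in kelvin
restored = unpack(stored, **sst_encoding)    # back to kelvin, at 0.01 K resolution
missing = pack(float("nan"), **sst_encoding)  # missing data becomes the fill value
```

With these settings a temperature is stored with a resolution of 0.01 K, and the chosen dtype bounds the range of values that can be represented, which is why the dtype, scale factor and offset of a field have to be chosen together.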
This profile file can be passed on to the NetCDF-dedicated dataset object,
provided by the cerbere.dataset.ncdataset.NCDataset class, when saving the
object content to disk:
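# create a dataset object
from cerbere.dataset.ncdataset import NCDataset
dst = NCDataset()
# save it in a NetCDF file with the NCDataset class, using the above profile.
# Assumption: the profile path is given through a `profile` keyword of save();
# 'ghrsst_profile.yaml' is a hypothetical file name for the profile above.
dst.save('test.nc', profile='ghrsst_profile.yaml')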
Note that the attributes already defined in the dataset object are not overridden by the default values in the profile file. Attributes not defined in the dataset or feature object to be saved will fall back to their default value defined in the format profile.
# create a NCDataset dataset object and fill in some attributes
# save it, using above profile
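The fallback rule described above can be pictured as a simple merge of two mappings. The following is a minimal sketch of that rule only, not cerbere's actual implementation: resolve_attributes is a hypothetical helper, and the attribute values set on the dataset are made up for the example.

```python
def resolve_attributes(dataset_attrs, profile_defaults):
    """Return the attributes that would end up in the saved file.

    Attributes already set on the dataset win; the profile only fills in
    the attributes the dataset does not define.
    """
    resolved = dict(profile_defaults)   # start from the profile defaults
    resolved.update(dataset_attrs)      # dataset values take precedence
    return resolved


# A few defaults taken from the `attributes` section of the profile above.
profile_defaults = {
    "naming_authority": "org.ghrsst",
    "program": "CMEMS",
    "title": None,  # declared in the profile, but with no default value
}

# Attributes filled in on the dataset before saving (illustrative values).
dataset_attrs = {
    "title": "my SST product",
    "program": "my program",
}

resolved = resolve_attributes(dataset_attrs, profile_defaults)
for name, value in sorted(resolved.items()):
    print(f"{name}: {value}")
```

Here title and program keep the values set on the dataset, while naming_authority, which the dataset does not define, falls back to the default from the profile.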