============================
Reading or creating datasets
============================

.. |dataset| replace:: :mod:`~cerbere.dataset`
.. |Dataset| replace:: :class:`~cerbere.dataset.dataset.Dataset`
.. |NCDataset| replace:: :class:`~cerbere.dataset.ncdataset.NCDataset`
.. _xarray: http://xarray.pydata.org

Reading from a file
===================

The |dataset| package and other contribution packages provide classes that
read EO data products and standardize their format and content. Each EO data
product type should have a corresponding class in |dataset| that reads its
content. Some of these classes, such as |NCDataset| for CF compliant NetCDF
files, can read a wide range of EO products sharing similar format
conventions. Each class derives from the main |Dataset| base class and
inherits all its methods.

To read data from a file, first instantiate an object of the corresponding
|dataset| class, specifying the path to this file.

For instance, let's create a dataset object from a Mercator Ocean Model file
(test file available at
ftp://ftp.ifremer.fr/ifremer/cersat/projects/cerbere/test_data/NCDataset/mercatorpsy4v3r1_gl12_hrly_20200219_R20200210.nc
). It is a CF compliant NetCDF file, so we can use the |NCDataset| class:

>>> import cerbere
>>> # instantiate the dataset object with the file path as argument
>>> dst = cerbere.open_dataset(
...     'NCDataset', "mercatorpsy4v3r1_gl12_hrly_20200219_R20200210.nc")

or, directly importing the |NCDataset| class:

>>> from cerbere.dataset.ncdataset import NCDataset
>>> # instantiate the dataset object with the file path as argument
>>> dst = NCDataset("mercatorpsy4v3r1_gl12_hrly_20200219_R20200210.nc")

Print the dataset description:

>>> print(dst)

A Dataset can also be created from a list of files.

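A list of files such as the one mentioned above is often assembled from a
directory with a glob pattern. The sketch below is only an illustration based
on the Python standard library: the helper name ``list_product_files`` and the
default pattern are hypothetical and not part of cerbere, and the way the list
is handed to a dataset class is left as a comment because the exact call is not
shown in this section.

```python
from pathlib import Path


def list_product_files(directory, pattern="*.nc"):
    """Return the sorted paths (as strings) of the files matching `pattern`.

    Hypothetical helper for illustration; not a cerbere function.
    """
    return sorted(str(p) for p in Path(directory).glob(pattern) if p.is_file())


# Assumed usage (directory name made up for illustration):
#   files = list_product_files("./mercator_data")
# The resulting list would then be given to the dataset class in place of a
# single file path, as the sentence above describes.
```

Sorting the paths keeps the file order deterministic, which matters when the
file names encode time.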
Creating a new dataset
======================

A |Dataset| object (or an object of any class inheriting from it in the
|dataset| package) can be created in memory, without any pre-existing file.

A |Dataset| object can be created in different ways:

* from an xarray_ :class:`~xarray.Dataset` object
* using xarray_ ``data_vars``, ``coords``, ``attrs`` arguments
* from a dict, using xarray_ syntax (as in xarray_ :meth:`from_dict`)
* from another cerbere |dataset| object

Creating a Dataset from an xarray_ :class:`~xarray.Dataset` object
------------------------------------------------------------------

The xarray_ :class:`~xarray.Dataset` object must have latitude, longitude and
time coordinates with valid **cerbere** names (``lat``, ``lon``, ``time``):

>>> import xarray as xr
>>> import numpy as np
>>> from cerbere.dataset.dataset import Dataset
>>> xrobj = xr.Dataset(
...     coords={
...         'lat': np.arange(0, 10, 0.1),
...         'lon': np.arange(5, 15, 0.1),
...         'time': np.full((100,), np.datetime64('2010-02-03'))
...     },
...     data_vars={'myvar': (('time',), np.ones(100))}
... )
>>> dst = Dataset(xrobj)

Creating a dataset from a dictionary
------------------------------------

This uses the same syntax as xarray_ :meth:`from_dict` (see:
http://xarray.pydata.org/en/stable/generated/xarray.Dataset.from_dict.html#xarray.Dataset.from_dict
), with the arguments provided as a dictionary.

The provided dict must have latitude, longitude and time coordinates with
valid **cerbere** names (``lat``, ``lon``, ``time``, optionally ``z``):

>>> from cerbere.dataset.dataset import Dataset
>>> import numpy as np
>>> from datetime import datetime
>>> dst = Dataset(
...     {'time': {'dims': ('time',), 'data': [datetime(2018, 1, 1)]},
...      'lat': {'dims': ('lat',), 'data': np.arange(-80, 80, 1)},
...      'lon': {'dims': ('lon',), 'data': np.arange(-180, 180, 1)},
...      'myvar': {'dims': ('lat', 'lon',),
...                'data': np.ones(shape=(160, 360))}
...     }
... )
>>> print(dst)
Dataset: Dataset
Feature Dims :
   . lat : 160
   . lon : 360
   . time : 1
Other Dims :
Feature Coordinates :
   . time  (time: 1)
   . lat  (lat: 160)
   . lon  (lon: 360)
Other Coordinates :
Fields :
   . myvar  (lat: 160, lon: 360)
Global Attributes :
   . time_coverage_start 2018-01-01 00:00:00
   . time_coverage_end 2018-01-01 00:00:00

Another syntax accepted by xarray_ provides explicit coordinates
(``coords``), fields (``data_vars``), dimensions (``dims``) and global
attributes (``attrs``), which, again, have to be passed as a dictionary to the
|Dataset| constructor:

>>> dst = Dataset({
...     'coords': {
...         'time': {'dims': ('time',), 'data': [datetime(2018, 1, 1)],
...                  'attrs': {'units': 'seconds since 2001-01-01 00:00:00'}},
...         'lat': {'dims': ('lat',), 'data': np.arange(-80, 80, 1)},
...         'lon': {'dims': ('lon',), 'data': np.arange(-180, 180, 1)}},
...     'attrs': {'gattr1': 'gattr_val'},
...     'dims': ('time', 'lon', 'lat'),
...     'data_vars': {'myvar': {'dims': ('lat', 'lon',),
...                             'data': np.ones(shape=(160, 360))}}}
... )
>>> print(dst)
Dataset: Dataset
Feature Dims :
   . lat : 160
   . lon : 360
   . time : 1
Other Dims :
Feature Coordinates :
   . time  (time: 1)
   . lat  (lat: 160)
   . lon  (lon: 360)
Other Coordinates :
Fields :
   . myvar  (lat: 160, lon: 360)
Global Attributes :
   . gattr1 gattr_val
   . time_coverage_start 2018-01-01 00:00:00
   . time_coverage_end 2018-01-01 00:00:00

**cerbere** :class:`~cerbere.dataset.field.Field` objects can also be mixed
in:

>>> from cerbere.dataset.field import Field
>>> field = Field(
...     np.ones(shape=(160, 360)),
...     'myvar',
...     dims=('lat', 'lon',),
...     attrs={'myattr': 'attr_val'}
... )
>>> dst = Dataset(
...     {'time': {'dims': ('time',), 'data': [datetime(2018, 1, 1)]},
...      'lat': {'dims': ('lat',), 'data': np.arange(-80, 80, 1)},
...      'lon': {'dims': ('lon',), 'data': np.arange(-180, 180, 1)},
...      'myvar': field
...     }
... )
>>> print(dst)
Dataset: Dataset
Feature Dims :
   . lat : 160
   . lon : 360
   . time : 1
Other Dims :
Feature Coordinates :
   . time  (time: 1)
   . lat  (lat: 160)
   . lon  (lon: 360)
Other Coordinates :
Fields :
   . myvar  (lat: 160, lon: 360)
Global Attributes :
   . time_coverage_start 2018-01-01 00:00:00
   . time_coverage_end 2018-01-01 00:00:00

Creating a dataset from xarray_ arguments
-----------------------------------------

This uses the same syntax as xarray_ (see:
http://xarray.pydata.org/en/stable/data-structures.html#dataset ). The
provided ``coords`` must contain latitude, longitude and time coordinates with
valid **cerbere** names (``lat``, ``lon``, ``time``, optionally ``z``), and
the same goes for dimensions:

>>> dst = Dataset(
...     {'myvar': (['lat', 'lon'], np.ones(shape=(160, 360)))},
...     coords={
...         'time': (['time'], [datetime(2018, 1, 1)],
...                  {'units': 'seconds since 2001-01-01 00:00:00'}),
...         'lat': (['lat'], np.arange(-80, 80, 1)),
...         'lon': (['lon'], np.arange(-180, 180, 1))
...     },
...     attrs={'gattr1': 'gattr_val'}
... )
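
The creation paths above all require the ``lat``, ``lon`` and ``time`` names.
A quick check on the coordinate mapping can make a missing name obvious before
the object is built. This is a minimal sketch in plain Python:
``REQUIRED_COORDS`` and ``missing_coords`` are hypothetical names, not part of
cerbere, and the check only looks at key names, not at shapes or values.

```python
REQUIRED_COORDS = ("lat", "lon", "time")  # names required by the text above


def missing_coords(coords):
    """Return the required coordinate names absent from the `coords` mapping.

    Hypothetical helper for illustration; not a cerbere function.
    """
    return [name for name in REQUIRED_COORDS if name not in coords]


# Example mapping in the (dims, data) tuple form, with 'lon' left out on purpose
coords = {
    "time": (["time"], ["2018-01-01"]),
    "lat": (["lat"], list(range(-80, 80))),
}
print(missing_coords(coords))  # → ['lon']
```

An empty list means the three required names are present; the optional ``z``
coordinate is deliberately not checked.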