openeo_udf.api package

Submodules

openeo_udf.api.collection_base module

OpenEO Python UDF interface

class openeo_udf.api.collection_base.CollectionBase(id: str, extent: Union[openeo_udf.api.spatial_extent.SpatialExtent, NoneType] = None, start_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType] = None, end_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType] = None)[source]

Bases: object

This is the base class for raster and vector collection tiles. It implements start time, end time and spatial extent handling.

Some basic tests:

>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0, height=10, width=10)
>>> coll = CollectionBase(id="test", extent=extent)
>>> print(coll)
id: test
extent: top: 100
bottom: 0
right: 100
left: 0
height: 10
width: 10
start_times: None
end_times: None
>>> import pandas
>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0, height=10, width=10)
>>> dates = [pandas.Timestamp('2012-05-01')]
>>> starts = pandas.DatetimeIndex(dates)
>>> dates = [pandas.Timestamp('2012-05-02')]
>>> ends = pandas.DatetimeIndex(dates)
>>> rdc = CollectionBase(id="test", extent=extent,
...                      start_times=starts, end_times=ends)
>>> "extent" in rdc.extent_to_dict()
True
>>> rdc.extent_to_dict()["extent"]["left"] == 0
True
>>> rdc.extent_to_dict()["extent"]["right"] == 100
True
>>> rdc.extent_to_dict()["extent"]["top"] == 100
True
>>> rdc.extent_to_dict()["extent"]["bottom"] == 0
True
>>> rdc.extent_to_dict()["extent"]["height"] == 10
True
>>> rdc.extent_to_dict()["extent"]["width"] == 10
True
>>> import json
>>> json.dumps(rdc.start_times_to_dict())
'{"start_times": ["2012-05-01T00:00:00"]}'
>>> json.dumps(rdc.end_times_to_dict())
'{"end_times": ["2012-05-02T00:00:00"]}'
>>> ct = CollectionBase(id="test")
>>> ct.set_extent_from_dict({"top": 53, "bottom": 50, "right": 30, "left": 24, "height": 0.01, "width": 0.01})
>>> ct.set_start_times_from_list(["2012-05-01T00:00:00"])
>>> ct.set_end_times_from_list(["2012-05-02T00:00:00"])
>>> print(ct)
id: test
extent: top: 53
bottom: 50
right: 30
left: 24
height: 0.01
width: 0.01
start_times: DatetimeIndex(['2012-05-01'], dtype='datetime64[ns]', freq=None)
end_times: DatetimeIndex(['2012-05-02'], dtype='datetime64[ns]', freq=None)
check_data_with_time()[source]

Check if the start and end date vectors have the same size as the data

end_times

Returns the end time vector

Returns:End time vector
Return type:pandas.DatetimeIndex
end_times_to_dict() → Dict[source]

Convert the end times vector into a dictionary representation that can be converted to JSON

Returns:The end times vector
Return type:dict
extent

Return the spatial extent

Returns:The spatial extent
Return type:SpatialExtent
extent_to_dict() → Dict[source]

Convert the extent into a dictionary representation that can be converted to JSON

Returns:The spatial extent
Return type:dict
get_end_times() → Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType][source]

Returns the end time vector

Returns:End time vector
Return type:pandas.DatetimeIndex
get_extent() → openeo_udf.api.spatial_extent.SpatialExtent[source]

Return the spatial extent

Returns:The spatial extent
Return type:SpatialExtent
get_start_times() → Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType][source]

Returns the start time vector

Returns:Start time vector
Return type:pandas.DatetimeIndex
set_end_times(end_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType])[source]

Set the end times vector

Parameters:end_times (pandas.DatetimeIndex) -- The end times vector
set_end_times_from_list(end_times: List)[source]

Set the end times vector from a list of ISO 8601 time stamps

Parameters:end_times (list) -- A list of ISO 8601 time stamps that defines the end times vector
set_extent(extent: openeo_udf.api.spatial_extent.SpatialExtent)[source]

Set the spatial extent

Parameters:extent (SpatialExtent) -- The spatial extent with resolution information, must be of type SpatialExtent
set_extent_from_dict(extent: Dict)[source]

Set the spatial extent from a dictionary

Parameters:extent (dict) -- The dictionary with the layout of the JSON SpatialExtent definition
set_start_times(start_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType])[source]

Set the start times vector

Parameters:start_times (pandas.DatetimeIndex) -- The start times vector
set_start_times_from_list(start_times: List)[source]

Set the start times vector from a list of ISO 8601 time stamps

Parameters:start_times (list) -- A list of ISO 8601 time stamps that defines the start times vector
start_times

Returns the start time vector

Returns:Start time vector
Return type:pandas.DatetimeIndex
start_times_to_dict() → Dict[source]

Convert the start times vector into a dictionary representation that can be converted to JSON

Returns:The start times vector
Return type:dict

openeo_udf.api.datacube module

OpenEO Python UDF interface

class openeo_udf.api.datacube.DataCube(array: xarray.core.dataarray.DataArray)[source]

Bases: object

This class is a hypercube representation of multi-dimensional data. It stores an xarray.DataArray and provides methods to convert it into the HyperCube JSON representation.

>>> array = xarray.DataArray(numpy.zeros(shape=(2, 3)), coords={'x': [1, 2], 'y': [1, 2, 3]}, dims=('x', 'y'))
>>> array.attrs["description"] = "This is an xarray with two dimensions"
>>> array.name = "testdata"
>>> h = DataCube(array=array)
>>> d = h.to_dict()
>>> d["id"]
'testdata'
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}]
>>> d["description"]
'This is an xarray with two dimensions'
>>> new_h = DataCube.from_dict(d)
>>> d = new_h.to_dict()
>>> d["id"]
'testdata'
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}]
>>> d["description"]
'This is an xarray with two dimensions'
>>> array = xarray.DataArray(numpy.zeros(shape=(2, 3)), coords={'x': [1, 2], 'y': [1, 2, 3]}, dims=('x', 'y'))
>>> h = DataCube(array=array)
>>> d = h.to_dict()
>>> d["id"]
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}]
>>> "description" not in d
True
>>> new_h = DataCube.from_dict(d)
>>> d = new_h.to_dict()
>>> d["id"]
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}]
>>> "description" not in d
True
>>> array = xarray.DataArray(numpy.zeros(shape=(2, 3)))
>>> h = DataCube(array=array)
>>> d = h.to_dict()
>>> d["id"]
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[]
>>> "description" not in d
True
>>> new_h = DataCube.from_dict(d)
>>> d = new_h.to_dict()
>>> d["id"]
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[]
>>> "description" not in d
True
array

Return the xarray.DataArray that contains the data and dimension definition

Returns:The xarray.DataArray that contains the data and dimension definition
Return type:xarray.DataArray
static from_data_collection(data_collection) → List[DataCube][source]

Create data cubes from a data collection

Parameters:data_collection -- The data collection to create the data cubes from
Returns:A list of data cubes
static from_dict(hc_dict: Dict) → openeo_udf.api.datacube.DataCube[source]

Create a DataCube from a Python dictionary that was created from the JSON definition of the DataCube

Parameters:hc_dict (dict) -- The dictionary that contains the data cube definition
Returns:A new DataCube object
Return type:DataCube
get_array() → xarray.core.dataarray.DataArray[source]

Return the xarray.DataArray that contains the data and dimension definition

Returns:The xarray.DataArray that contains the data and dimension definition
Return type:xarray.DataArray
id
set_array(array: xarray.core.dataarray.DataArray)[source]

Set the xarray.DataArray that contains the data and dimension definition

This function will check that the provided array is an xarray.DataArray and raise an Exception otherwise

Parameters:array (xarray.DataArray) -- The xarray.DataArray that contains the data and dimension definition
to_data_collection()[source]
to_dict() → Dict[source]

Convert this hypercube into a dictionary that can be converted into a valid JSON representation

Returns:HyperCube as a dictionary
Return type:dict
>>> example = {
...     "id": "test_data",
...     "data": [
...         [
...             [0.0, 0.1],
...             [0.2, 0.3]
...         ],
...         [
...             [0.0, 0.1],
...             [0.2, 0.3]
...         ]
...     ],
...     "dimension": [{"name": "time", "unit": "ISO:8601", "coordinates":["2001-01-01", "2001-01-02"]},
...                   {"name": "X", "unit": "degree", "coordinates":[50.0, 60.0]},
...                   {"name": "Y", "unit": "degree"},
...                  ]
... }

openeo_udf.api.feature_collection module

OpenEO Python UDF interface

class openeo_udf.api.feature_collection.FeatureCollection(id: str, data: geopandas.geodataframe.GeoDataFrame, start_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType] = None, end_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType] = None)[source]

Bases: openeo_udf.api.collection_base.CollectionBase

A feature collection that represents a subset or a whole feature collection where single vector features may have time stamps assigned.

Some basic tests:

>>> from shapely.geometry import Point
>>> import geopandas
>>> p1 = Point(0,0)
>>> p2 = Point(100,100)
>>> p3 = Point(100,0)
>>> pseries = [p1, p2, p3]
>>> data = geopandas.GeoDataFrame(geometry=pseries, columns=["a", "b"])
>>> data["a"] = [1,2,3]
>>> data["b"] = ["a","b","c"]
>>> fct = FeatureCollection(id="test", data=data)
>>> print(fct)
id: test
start_times: None
end_times: None
data:    a  b         geometry
0  1  a      POINT (0 0)
1  2  b  POINT (100 100)
2  3  c    POINT (100 0)
>>> import json
>>> json.dumps(fct.to_dict())
'{"id": "test", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature",
"properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
{"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point",
"coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"},
"geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}'
>>> p1 = Point(0,0)
>>> pseries = [p1]
>>> data = geopandas.GeoDataFrame(geometry=pseries, columns=["a", "b"])
>>> data["a"] = [1]
>>> data["b"] = ["a"]
>>> dates = [pandas.Timestamp('2012-05-01')]
>>> starts = pandas.DatetimeIndex(dates)
>>> dates = [pandas.Timestamp('2012-05-02')]
>>> ends = pandas.DatetimeIndex(dates)
>>> fct = FeatureCollection(id="test", start_times=starts, end_times=ends, data=data)
>>> print(fct)
id: test
start_times: DatetimeIndex(['2012-05-01'], dtype='datetime64[ns]', freq=None)
end_times: DatetimeIndex(['2012-05-02'], dtype='datetime64[ns]', freq=None)
data:    a  b     geometry
0  1  a  POINT (0 0)
>>> import json
>>> json.dumps(fct.to_dict())
'{"id": "test", "start_times": ["2012-05-01T00:00:00"], "end_times": ["2012-05-02T00:00:00"],
"data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature",
"properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}]}}'
>>> fct = FeatureCollection.from_dict(fct.to_dict())
>>> json.dumps(fct.to_dict())
'{"id": "test", "start_times": ["2012-05-01T00:00:00"], "end_times": ["2012-05-02T00:00:00"],
"data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature",
"properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}]}}'
data

Return the geopandas.GeoDataFrame that contains the geometry column and any number of attribute columns

Returns:A data frame that contains the geometry column and any number of attribute columns
Return type:geopandas.GeoDataFrame
static from_dict(fct_dict: Dict)[source]

Create a feature collection from a Python dictionary that was created from the JSON definition of the FeatureCollection

Parameters:fct_dict (dict) -- The dictionary that contains the feature collection definition
Returns:A new FeatureCollection object
Return type:FeatureCollection
get_data() → geopandas.geodataframe.GeoDataFrame[source]

Return the geopandas.GeoDataFrame that contains the geometry column and any number of attribute columns

Returns:A data frame that contains the geometry column and any number of attribute columns
Return type:geopandas.GeoDataFrame
set_data(data: geopandas.geodataframe.GeoDataFrame)[source]

Set the geopandas.GeoDataFrame that contains the geometry column and any number of attribute columns

This function will check that the provided data is a geopandas.GeoDataFrame and raise an Exception otherwise

Parameters:data (geopandas.GeoDataFrame) -- A GeoDataFrame with geometry column and attribute data
to_dict() → Dict[source]

Convert this FeatureCollection into a dictionary that can be converted into a valid JSON representation

Returns:FeatureCollection as a dictionary
Return type:dict

openeo_udf.api.machine_learn_model module

OpenEO Python UDF interface

class openeo_udf.api.machine_learn_model.MachineLearnModelConfig(framework: str, name: str, description: str, path: Union[str, NoneType] = None, md5_hash: Union[str, NoneType] = None)[source]

Bases: object

This class represents a machine learning model. The model will be loaded at construction, based on the machine learning framework.

The following frameworks are supported:
  • sklearn models that are created with sklearn.externals.joblib
  • pytorch models that are created with torch.save
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.externals import joblib
>>> model = RandomForestRegressor(n_estimators=10, max_depth=2, verbose=0)
>>> path = '/tmp/test.pkl.xz'
>>> dummy = joblib.dump(value=model, filename=path, compress=("xz", 3))
>>> m = MachineLearnModelConfig(framework="sklearn", name="test",
...                       description="Machine learn model", path=path)
>>> m.get_model()
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=2,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)
>>> m.to_dict()
{'description': 'Machine learn model', 'name': 'test', 'framework': 'sklearn', 'path': '/tmp/test.pkl.xz', 'md5_hash': None}
>>> d = {'description': 'Machine learn model', 'name': 'test', 'framework': 'sklearn',
...      'path': '/tmp/test.pkl.xz', "md5_hash": None}
>>> m = MachineLearnModelConfig.from_dict(d)
>>> m.get_model()
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=2,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)
>>> import torch
>>> import torch.nn as nn
>>> model = nn.Module
>>> path = '/tmp/test.pt'
>>> torch.save(model, path)
>>> m = MachineLearnModelConfig(framework="pytorch", name="test",
...                       description="Machine learn model", path=path)
>>> m.get_model()
<class 'torch.nn.modules.module.Module'>
>>> m.to_dict()
{'description': 'Machine learn model', 'name': 'test', 'framework': 'pytorch', 'path': '/tmp/test.pt', 'md5_hash': None}
>>> d = {'description': 'Machine learn model', 'name': 'test', 'framework': 'pytorch',
...      'path': '/tmp/test.pt', "md5_hash": None}
>>> m = MachineLearnModelConfig.from_dict(d)
>>> m.get_model()
<class 'torch.nn.modules.module.Module'>
static from_dict()[source]
get_model()[source]

Get the loaded machine learning model. This function will return None if the model was not loaded

Returns:The loaded model, or None if no model was loaded
load_model()[source]

Load the machine learning model from the path or md5 hash.

Supported models:
  • sklearn models that are created with sklearn.externals.joblib
  • pytorch models that are created with torch.save

to_dict() → Dict[source]

openeo_udf.api.run_code module

OpenEO Python UDF interface

openeo_udf.api.run_code.load_module_from_string(code: str)[source]

Experimental: load a UDF module from source code, avoiding loading the same module more than once, so that caching inside the UDF works.

Parameters:code (str) -- The UDF source code
Returns:The loaded module
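
A hedged usage sketch of the caching behaviour described above (module_a and module_b are hypothetical names; the exact return type is not documented, but under memoization on the code string both calls yield the same object):

>>> from openeo_udf.api.run_code import load_module_from_string
>>> module_a = load_module_from_string("answer = 42")
>>> module_b = load_module_from_string("answer = 42")
>>> module_a is module_b
True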

openeo_udf.api.run_code.run_legacy_user_code(dict_data: Dict) → Dict[source]

Run the user-defined Python code on legacy data

Parameters:dict_data -- The UDF request object with code and legacy data organized in a dictionary
Returns:The UDF result as a dictionary

openeo_udf.api.run_code.run_udf_model_user_code(udf_model: openeo_udf.server.data_model.udf_schemas.UdfRequestModel) → openeo_udf.api.udf_data.UdfData[source]

Run the user-defined Python code

Parameters:udf_model -- The UDF request object with code and data collection
Returns:The processed UdfData object

openeo_udf.api.run_code.run_user_code(code: str, data: openeo_udf.api.udf_data.UdfData) → openeo_udf.api.udf_data.UdfData[source]
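
No docstring is provided. As a hedged sketch, assuming run_user_code looks up one of the signatures defined in openeo_udf.api.udf_signatures in the given code (here apply_datacube), a call might look like this:

>>> from openeo_udf.api.run_code import run_user_code
>>> from openeo_udf.api.tools import create_datacube
>>> from openeo_udf.api.udf_data import UdfData
>>> code = (
...     "from openeo_udf.api.datacube import DataCube\n"
...     "def apply_datacube(cube, context):\n"
...     "    # Add 1 to every cell; all dimensions are retained\n"
...     "    return DataCube(array=cube.get_array() + 1.0)\n"
... )
>>> data = UdfData(proj={"EPSG": 4326}, datacube_list=[create_datacube(name="example", value=0.0)])
>>> result = run_user_code(code=code, data=data)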

openeo_udf.api.spatial_extent module

OpenEO Python UDF interface

class openeo_udf.api.spatial_extent.SpatialExtent(top: float, bottom: float, right: float, left: float, height: Union[float, NoneType] = None, width: Union[float, NoneType] = None)[source]

Bases: object

The axis aligned spatial extent of a collection tile

Some basic tests:

>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0, height=10, width=10)
>>> print(extent)
top: 100
bottom: 0
right: 100
left: 0
height: 10
width: 10
>>> extent.to_index(50, 50)
(5, 5)
>>> extent.to_index(0, 0)
(0, 10)
>>> extent.to_index(100, 0)
(0, 0)
>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0)
>>> print(extent)
top: 100
bottom: 0
right: 100
left: 0
height: None
width: None
>>> p = extent.as_polygon()
>>> print(p)
POLYGON ((0 100, 100 100, 100 0, 0 0, 0 100))
>>> from shapely.wkt import loads
>>> p = loads("POLYGON ((0 100, 100 100, 100 0, 0 0, 0 100))")
>>> extent = SpatialExtent.from_polygon(p)
>>> print(extent)
top: 100.0
bottom: 0.0
right: 100.0
left: 0.0
height: None
width: None
>>> extent.contains_point(50, 50)
True
>>> extent.contains_point(150, 50)
False
>>> extent.contains_point(25, 25)
True
>>> extent.contains_point(101, 101)
False
>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0)
>>> extent.as_polygon() == extent.as_polygon()
True
>>> diff = extent.as_polygon() - extent.as_polygon()
>>> print(diff)
GEOMETRYCOLLECTION EMPTY
>>> extent_1 = SpatialExtent(top=80, bottom=10, right=80, left=10)
>>> extent_2 = SpatialExtent(top=100, bottom=0, right=100, left=0)
>>> extent_1.as_polygon() == extent_2.as_polygon()
False
>>> extent_2.as_polygon().contains(extent_2.as_polygon())
True
as_polygon() → shapely.geometry.polygon.Polygon[source]

Return the extent as a shapely.geometry.Polygon to perform comparison operations with other extents, such as equality and intersection tests

Returns:The polygon representing the spatial extent
Return type:shapely.geometry.Polygon
contains_point(top: float, left: float) → bool[source]

Return True if the provided coordinate is located in the spatial extent, False otherwise

Parameters:
  • top (float) -- The top (northern) coordinate of the point
  • left (float) -- The left (western) coordinate of the point
Returns:True if the coordinates are contained in the extent, False otherwise
Return type:bool

static from_dict(extent: Dict)[source]

Create a SpatialExtent from a python dictionary that was created from the JSON definition of the SpatialExtent

Parameters:extent (dict) -- The dictionary that contains the spatial extent definition
Returns:A new SpatialExtent object
Return type:SpatialExtent
static from_polygon(polygon: shapely.geometry.polygon.Polygon) → openeo_udf.api.spatial_extent.SpatialExtent[source]

Convert a polygon with rectangular shape into a spatial extent

Parameters:polygon (shapely.geometry.Polygon) -- The polygon that should be converted into a spatial extent
Returns:The spatial extent
Return type:SpatialExtent
to_dict() → Dict[source]

Return the spatial extent as a dict that can be easily converted into JSON

Returns:Dictionary representation
Return type:dict
to_index(top: float, left: float) → Tuple[int, int][source]

Convert the provided coordinate into the (x, y) index of the extent cell that contains it, based on the extent's width and height resolution

Parameters:
  • top (float) -- The top (northern) coordinate
  • left (float) -- The left (western) coordinate
Returns:The (x, y) index
Return type:tuple(int, int)

openeo_udf.api.structured_data module

OpenEO Python UDF interface

class openeo_udf.api.structured_data.StructuredData(description, data, type)[source]

Bases: object

This class represents structured data that is produced by a UDF and cannot be represented as a RasterCollectionTile or FeatureCollectionTile, for example the result of a statistical computation. The data is self-descriptive and supports the basic types dict/map, list and table.

The data field contains the UDF-specific values (argument or return) as dict, list or table:

  • A dict can be as complex as required by the UDF
  • A list must contain simple data types, for example {"list": [1,2,3,4]}
  • A table is a list of lists with a header, for example {"table": [["id","value"],
    [1, 10], [2, 23], [3, 4]]}
>>> table = [("col_1", "col_2"), (1, 2), (2, 3)]
>>> st = StructuredData(description="Table output", data=table, type="table")
>>> st.to_dict()
{'description': 'Table output', 'data': [('col_1', 'col_2'), (1, 2), (2, 3)], 'type': 'table'}
>>> values = [1,2,3,4]
>>> st = StructuredData(description="List output", data=values, type="list")
>>> st.to_dict()
{'description': 'List output', 'data': [1, 2, 3, 4], 'type': 'list'}
>>> key_value_store = dict(a=1, b=2, c=3)
>>> st = StructuredData(description="Key-value output", data=key_value_store, type="dict")
>>> st.to_dict()
{'description': 'Key-value output', 'data': {'a': 1, 'b': 2, 'c': 3}, 'type': 'dict'}
static from_dict()[source]
to_dict() → Dict[source]

openeo_udf.api.tools module

OpenEO Python UDF interface

openeo_udf.api.tools.create_datacube(name: str, value: float, shape: Tuple = (3, 2, 2), dims: Tuple = ('t', 'x', 'y')) → openeo_udf.api.datacube.DataCube[source]

Create a datacube from shape and dimension parameters. The number of shape elements and dimension names must be equal.
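
A minimal usage sketch (the asserted shape follows directly from the parameters):

>>> from openeo_udf.api.tools import create_datacube
>>> cube = create_datacube(name="example", value=1.0, shape=(2, 3), dims=('x', 'y'))
>>> cube.get_array().shape
(2, 3)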

openeo_udf.api.udf_data module

OpenEO Python UDF interface

class openeo_udf.api.udf_data.UdfData(proj: Dict = None, datacube_list: Union[typing.List[openeo_udf.api.datacube.DataCube], NoneType] = None, feature_collection_list: Union[typing.List[openeo_udf.api.feature_collection.FeatureCollection], NoneType] = None, structured_data_list: Union[typing.List[openeo_udf.api.structured_data.StructuredData], NoneType] = None, ml_model_list: Union[typing.List[openeo_udf.api.machine_learn_model.MachineLearnModelConfig], NoneType] = None, metadata: openeo_udf.server.data_model.metadata_schema.MetadataModel = None)[source]

Bases: object

The class that stores the arguments for a user defined function (UDF)

Some basic tests:

>>> from shapely.geometry import Point
>>> import geopandas
>>> import numpy, pandas
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.externals import joblib
>>> data = numpy.zeros(shape=(1,1,1))
>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0, height=10, width=10)
>>> starts = pandas.DatetimeIndex([pandas.Timestamp('2012-05-01')])
>>> ends = pandas.DatetimeIndex([pandas.Timestamp('2012-05-02')])
>>> p1 = Point(0,0)
>>> p2 = Point(100,100)
>>> p3 = Point(100,0)
>>> pseries = [p1, p2, p3]
>>> data = geopandas.GeoDataFrame(geometry=pseries, columns=["a", "b"])
>>> data["a"] = [1,2,3]
>>> data["b"] = ["a","b","c"]
>>> C = FeatureCollection(id="C", data=data)
>>> D = FeatureCollection(id="D", data=data)
>>> udf_data = UdfData(proj={"EPSG":4326}, feature_collection_list=[C, D])
>>> model = RandomForestRegressor(n_estimators=10, max_depth=2, verbose=0)
>>> path = '/tmp/test.pkl.xz'
>>> dummy = joblib.dump(value=model, filename=path, compress=("xz", 3))
>>> m = MachineLearnModelConfig(framework="sklearn", name="test",
...                       description="Machine learn model", path=path)
>>> udf_data.append_machine_learn_model(m)
>>> print(udf_data.get_feature_collection_by_id("C"))
id: C
start_times: None
end_times: None
data:    a  b         geometry
0  1  a      POINT (0 0)
1  2  b  POINT (100 100)
2  3  c    POINT (100 0)
>>> print(udf_data.get_feature_collection_by_id("D"))
id: D
start_times: None
end_times: None
data:    a  b         geometry
0  1  a      POINT (0 0)
1  2  b  POINT (100 100)
2  3  c    POINT (100 0)
>>> print(len(udf_data.get_feature_collection_list()) == 2)
True
>>> print(udf_data.ml_model_list[0].path)
/tmp/test.pkl.xz
>>> print(udf_data.ml_model_list[0].framework)
sklearn
>>> import json
>>> json.dumps(udf_data.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {}, "server_context": {}, "datacubes": [], "feature_collection_list": [{"id": "C", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}, {"id": "D", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}], "structured_data_list": [], "machine_learn_models": [{"description": "Machine learn model", "name": "test", "framework": "sklearn", "path": "/tmp/test.pkl.xz", "md5_hash": null}]}'
>>> udf = UdfData.from_dict(udf_data.to_dict())
>>> json.dumps(udf.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {}, "server_context": {}, "datacubes": [], "feature_collection_list": [{"id": "C", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}, {"id": "D", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}], "structured_data_list": [], "machine_learn_models": [{"description": "Machine learn model", "name": "test", "framework": "sklearn", "path": "/tmp/test.pkl.xz", "md5_hash": null}]}'
>>> sd_list = StructuredData(description="Data list", data={"list":[1,2,3]}, type="list")
>>> sd_dict = StructuredData(description="Data dict", data={"A":{"B": 1}}, type="dict")
>>> udf = UdfData(proj={"EPSG":4326}, structured_data_list=[sd_list, sd_dict])
>>> json.dumps(udf.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {}, "server_context": {}, "datacubes": [], "feature_collection_list": [], "structured_data_list": [{"description": "Data list", "data": {"list": [1, 2, 3]}, "type": "list"}, {"description": "Data dict", "data": {"A": {"B": 1}}, "type": "dict"}], "machine_learn_models": []}'
>>> array = xarray.DataArray(numpy.zeros(shape=(2, 3)), coords={'x': [1, 2], 'y': [1, 2, 3]}, dims=('x', 'y'))
>>> array.attrs["description"] = "This is an xarray with two dimensions"
>>> array.name = "testdata"
>>> h = DataCube(array=array)
>>> udf_data = UdfData(proj={"EPSG":4326}, datacube_list=[h])
>>> udf_data.user_context = {"kernel": 3}
>>> udf_data.server_context = {"reduction_dimension": "t"}
>>> udf_data.user_context
{'kernel': 3}
>>> udf_data.server_context
{'reduction_dimension': 't'}
>>> print(udf_data.get_datacube_by_id("testdata").to_dict())
{'id': 'testdata', 'data': [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], 'dimensions': [{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}], 'description': 'This is an xarray with two dimensions'}
>>> json.dumps(udf_data.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {"kernel": 3}, "server_context": {"reduction_dimension": "t"}, "datacubes": [{"id": "testdata", "data": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], "dimensions": [{"name": "x", "coordinates": [1, 2]}, {"name": "y", "coordinates": [1, 2, 3]}], "description": "This is an xarray with two dimensions"}], "feature_collection_list": [], "structured_data_list": [], "machine_learn_models": []}'
>>> udf = UdfData.from_dict(udf_data.to_dict())
>>> json.dumps(udf.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {}, "server_context": {}, "datacubes": [{"id": "testdata", "data": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], "dimensions": [{"name": "x", "coordinates": [1, 2]}, {"name": "y", "coordinates": [1, 2, 3]}], "description": "This is an xarray with two dimensions"}], "feature_collection_list": [], "structured_data_list": [], "machine_learn_models": []}'
append_datacube(datacube: openeo_udf.api.datacube.DataCube)[source]

Append a DataCube to the list

It will be automatically added to the dictionary of all datacubes

Parameters:datacube (DataCube) -- The DataCube to append
append_feature_collection(feature_collection_tile: openeo_udf.api.feature_collection.FeatureCollection)[source]

Append a feature collection to the list

It will be automatically added to the dictionary of all feature collections

Parameters:feature_collection_tile (FeatureCollection) -- The feature collection to append
append_machine_learn_model(machine_learn_model: openeo_udf.api.machine_learn_model.MachineLearnModelConfig)[source]

Append a machine learning model to the list

Parameters:machine_learn_model (MachineLearnModelConfig) -- The MachineLearnModelConfig object to append
append_structured_data(structured_data: openeo_udf.api.structured_data.StructuredData)[source]

Append a structured data object to the list

Parameters:structured_data (StructuredData) -- The StructuredData object to append
datacube_list

Get the datacube list

del_datacube_list()[source]

Delete all datacubes

del_feature_collection_list()[source]

Delete all feature collection tiles

del_ml_model_list()[source]

Delete all machine learn models

del_structured_data_list()[source]

Delete all structured data entries

feature_collection_list

Get all feature collections as list

Returns:The list of feature collections
Return type:list[FeatureCollection]
static from_dict(udf_dict: Dict)[source]

Create a UdfData object from a Python dictionary that was created from the JSON definition of the UdfData class

Parameters:udf_dict (dict) -- The dictionary that contains the udf data definition
Returns:A new UdfData object
Return type:UdfData
static from_udf_data_model(udf_model) → UdfData[source]

TODO: Must be implemented

Parameters:udf_model --

Returns:

get_datacube_by_id(id: str) → Union[openeo_udf.api.datacube.DataCube, NoneType][source]

Get a datacube by its id

Parameters:id (str) -- The datacube id
Returns:the requested datacube or None if not found
Return type:DataCube
get_datacube_list() → Union[typing.List[openeo_udf.api.datacube.DataCube], NoneType][source]

Get the datacube list

get_feature_collection_by_id(id: str) → Union[openeo_udf.api.feature_collection.FeatureCollection, NoneType][source]

Get a feature collection by its id

Parameters:id (str) -- The vector tile id
Returns:the requested feature collection or None if not found
Return type:FeatureCollection
get_feature_collection_list() → Union[typing.List[openeo_udf.api.feature_collection.FeatureCollection], NoneType][source]

Get all feature collections as list

Returns:The list of feature collections
Return type:list[FeatureCollection]
get_ml_model_list() → Union[typing.List[openeo_udf.api.machine_learn_model.MachineLearnModelConfig], NoneType][source]

Get all machine learning models

Returns:A list of MachineLearnModelConfig objects
Return type:list[MachineLearnModelConfig]
get_structured_data_list() → Union[typing.List[openeo_udf.api.structured_data.StructuredData], NoneType][source]

Get all structured data entries

Returns:A list of StructuredData objects
Return type:list[StructuredData]
metadata
ml_model_list

Get all machine learning models

Returns:A list of MachineLearnModelConfig objects
Return type:list[MachineLearnModelConfig]
server_context

Return the server context that is passed from the backend to the UDF server for runtime configuration

set_datacube_list(datacube_list: List[openeo_udf.api.datacube.DataCube])[source]

Set the datacube list

If datacube_list is None, then the list will be cleared

Parameters:datacube_list (List[DataCube]) -- A list of DataCube objects
set_feature_collection_list(feature_collection_list: Union[typing.List[openeo_udf.api.feature_collection.FeatureCollection], NoneType])[source]

Set the feature collection list

If feature_collection_list is None, then the list will be cleared

Parameters:feature_collection_list (list[FeatureCollection]) -- A list of FeatureCollection objects
set_ml_model_list(ml_model_list: Union[typing.List[openeo_udf.api.machine_learn_model.MachineLearnModelConfig], NoneType])[source]

Set the list of machine learning models

If ml_model_list is None, then the list will be cleared

Parameters:ml_model_list (list[MachineLearnModelConfig]) -- A list of MachineLearnModelConfig objects
set_structured_data_list(structured_data_list: Union[typing.List[openeo_udf.api.structured_data.StructuredData], NoneType])[source]

Set the list of structured data

If structured_data_list is None, then the list will be cleared

Parameters:structured_data_list (list[StructuredData]) -- A list of StructuredData objects
structured_data_list

Get all structured data entries

Returns:A list of StructuredData objects
Return type:list[StructuredData]
to_dict() → Dict[source]

Convert this UdfData object into a dictionary that can be converted into a valid JSON representation

Returns:UdfData object as a dictionary
Return type:dict
user_context

Return the user context that was passed to the run_udf function

openeo_udf.api.udf_signatures module

This module defines a number of function signatures that can be implemented by UDFs. Both the function name and the argument types can be used by the backend to validate whether the provided UDF is compatible with the calling context of the process graph in which it is used.

openeo_udf.api.udf_signatures.apply_datacube(cube: openeo_udf.api.datacube.DataCube, context: Dict) → openeo_udf.api.datacube.DataCube[source]

Map a DataCube to another DataCube. Depending on the context in which this function is used, the DataCube dimensions have to be retained or can be changed. For instance, in the context of a reducing operation along a dimension, that dimension will have to be reduced to a single value. In the context of a 1-to-1 mapping operation, all dimensions have to be retained.

Parameters:
  • cube -- A DataCube object
  • context -- A dictionary containing user context.
Returns:A DataCube object
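
For illustration, a minimal UDF that satisfies this signature (a sketch; the scaling factor taken from the context is a hypothetical example, not part of the API):

>>> from typing import Dict
>>> from openeo_udf.api.datacube import DataCube
>>> def apply_datacube(cube: DataCube, context: Dict) -> DataCube:
...     # 1-to-1 mapping: scale every value; all dimensions are retained
...     factor = context.get("factor", 1.0)
...     return DataCube(array=cube.get_array() * factor)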

openeo_udf.api.udf_signatures.apply_timeseries(series: pandas.core.series.Series, context: Dict) → pandas.core.series.Series[source]

Process a timeseries of values, without changing the time instants. This can for instance be used for smoothing or gap-filling. TODO: do we need geospatial coordinates for the series?

Parameters:
  • series -- A Pandas Series object with a date-time index.
  • context -- A dictionary containing user context.
Returns:A Pandas Series object with the same date-time index.
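
For illustration, a gap-filling UDF matching this signature (a sketch; pandas' time-based interpolation requires the date-time index mentioned above):

>>> import pandas
>>> from typing import Dict
>>> def apply_timeseries(series: pandas.Series, context: Dict) -> pandas.Series:
...     # Fill gaps by time-weighted interpolation; the time instants are unchanged
...     return series.interpolate(method="time")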

openeo_udf.api.udf_wrapper module

openeo_udf.api.udf_wrapper.apply_timeseries(series: pandas.core.series.Series, context: Dict) → pandas.core.series.Series[source]

Do something with the timeseries

Parameters:
  • series -- A Pandas Series object
  • context -- A dictionary containing user context
Returns:A Pandas Series object

openeo_udf.api.udf_wrapper.apply_timeseries_generic(udf_data: openeo_udf.api.udf_data.UdfData, callback: Callable = <function apply_timeseries>)[source]

Implements the UDF contract by calling a user-provided time series transformation function (apply_timeseries). Multiple bands are currently handled separately; another approach could provide a dataframe with a timeseries for each band.

Parameters:
  • udf_data -- The UdfData object containing the data to process
  • callback -- The function that is applied to each timeseries; defaults to apply_timeseries
Returns:The processed UdfData object
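
A hedged sketch of a custom callback (the rolling-mean smoothing is a hypothetical example; udf_data is assumed to be a UdfData object with at least one datacube):

>>> import pandas
>>> from typing import Dict
>>> def smooth(series: pandas.Series, context: Dict) -> pandas.Series:
...     # Hypothetical callback: rolling mean over up to three samples
...     return series.rolling(window=3, min_periods=1).mean()
>>> # result = apply_timeseries_generic(udf_data, callback=smooth)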

Module contents