openeo_udf.api package
Submodules
openeo_udf.api.collection_base module
OpenEO Python UDF interface
class openeo_udf.api.collection_base.CollectionBase(id: str, extent: Union[openeo_udf.api.spatial_extent.SpatialExtent, NoneType] = None, start_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType] = None, end_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType] = None)
Bases: object
This is the base class for raster and vector collection tiles. It implements start time, end time and spatial extent handling.
Some basic tests:
>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0, height=10, width=10)
>>> coll = CollectionBase(id="test", extent=extent)
>>> print(coll)
id: test
extent: top: 100
bottom: 0
right: 100
left: 0
height: 10
width: 10
start_times: None
end_times: None

>>> import pandas
>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0, height=10, width=10)
>>> dates = [pandas.Timestamp('2012-05-01')]
>>> starts = pandas.DatetimeIndex(dates)
>>> dates = [pandas.Timestamp('2012-05-02')]
>>> ends = pandas.DatetimeIndex(dates)
>>> rdc = CollectionBase(id="test", extent=extent,
...                      start_times=starts, end_times=ends)
>>> "extent" in rdc.extent_to_dict()
True
>>> rdc.extent_to_dict()["extent"]["left"] == 0
True
>>> rdc.extent_to_dict()["extent"]["right"] == 100
True
>>> rdc.extent_to_dict()["extent"]["top"] == 100
True
>>> rdc.extent_to_dict()["extent"]["bottom"] == 0
True
>>> rdc.extent_to_dict()["extent"]["height"] == 10
True
>>> rdc.extent_to_dict()["extent"]["width"] == 10
True

>>> import json
>>> json.dumps(rdc.start_times_to_dict())
'{"start_times": ["2012-05-01T00:00:00"]}'
>>> json.dumps(rdc.end_times_to_dict())
'{"end_times": ["2012-05-02T00:00:00"]}'

>>> ct = CollectionBase(id="test")
>>> ct.set_extent_from_dict({"top": 53, "bottom": 50, "right": 30, "left": 24, "height": 0.01, "width": 0.01})
>>> ct.set_start_times_from_list(["2012-05-01T00:00:00"])
>>> ct.set_end_times_from_list(["2012-05-02T00:00:00"])
>>> print(ct)
id: test
extent: top: 53
bottom: 50
right: 30
left: 24
height: 0.01
width: 0.01
start_times: DatetimeIndex(['2012-05-01'], dtype='datetime64[ns]', freq=None)
end_times: DatetimeIndex(['2012-05-02'], dtype='datetime64[ns]', freq=None)
check_data_with_time()
Check if the start and end date vectors have the same size as the data.

end_times
Return the end time vector.
Returns: End time vector
Return type: pandas.DatetimeIndex

end_times_to_dict() → Dict
Convert the end times vector into a dictionary representation that can be converted to JSON.
Returns: The end times vector
Return type: dict

extent
Return the spatial extent.
Returns: The spatial extent
Return type: SpatialExtent

extent_to_dict() → Dict
Convert the extent into a dictionary representation that can be converted to JSON.
Returns: The spatial extent
Return type: dict

get_end_times() → Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType]
Return the end time vector.
Returns: End time vector
Return type: pandas.DatetimeIndex

get_extent() → openeo_udf.api.spatial_extent.SpatialExtent
Return the spatial extent.
Returns: The spatial extent
Return type: SpatialExtent

get_start_times() → Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType]
Return the start time vector.
Returns: Start time vector
Return type: pandas.DatetimeIndex

set_end_times(end_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType])
Set the end times vector.
Parameters: end_times (pandas.DatetimeIndex) -- The end times vector

set_end_times_from_list(end_times: Dict)
Set the end times vector from a list of ISO 8601 strings, as used in the JSON end times vector definition.
Parameters: end_times (list) -- The list of end times from the JSON definition

set_extent(extent: openeo_udf.api.spatial_extent.SpatialExtent)
Set the spatial extent.
Parameters: extent (SpatialExtent) -- The spatial extent with resolution information, must be of type SpatialExtent

set_extent_from_dict(extent: Dict)
Set the spatial extent from a dictionary.
Parameters: extent (dict) -- The dictionary with the layout of the JSON SpatialExtent definition

set_start_times(start_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType])
Set the start times vector.
Parameters: start_times (pandas.DatetimeIndex) -- The start times vector

set_start_times_from_list(start_times: Dict)
Set the start times vector from a list of ISO 8601 strings, as used in the JSON start times vector definition.
Parameters: start_times (list) -- The list of start times from the JSON definition

start_times
Return the start time vector.
Returns: Start time vector
Return type: pandas.DatetimeIndex
openeo_udf.api.datacube module
OpenEO Python UDF interface

class openeo_udf.api.datacube.DataCube(array: xarray.core.dataarray.DataArray)
Bases: object

This class is a hypercube representation of multi-dimensional data. It stores an xarray.DataArray and provides methods to convert it to and from the HyperCube JSON representation.
>>> array = xarray.DataArray(numpy.zeros(shape=(2, 3)), coords={'x': [1, 2], 'y': [1, 2, 3]}, dims=('x', 'y'))
>>> array.attrs["description"] = "This is an xarray with two dimensions"
>>> array.name = "testdata"
>>> h = DataCube(array=array)
>>> d = h.to_dict()
>>> d["id"]
'testdata'
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}]
>>> d["description"]
'This is an xarray with two dimensions'

>>> new_h = DataCube.from_dict(d)
>>> d = new_h.to_dict()
>>> d["id"]
'testdata'
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}]
>>> d["description"]
'This is an xarray with two dimensions'

>>> array = xarray.DataArray(numpy.zeros(shape=(2, 3)), coords={'x': [1, 2], 'y': [1, 2, 3]}, dims=('x', 'y'))
>>> h = DataCube(array=array)
>>> d = h.to_dict()
>>> d["id"]
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}]
>>> "description" not in d
True

>>> new_h = DataCube.from_dict(d)
>>> d = new_h.to_dict()
>>> d["id"]
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}]
>>> "description" not in d
True

>>> array = xarray.DataArray(numpy.zeros(shape=(2, 3)))
>>> h = DataCube(array=array)
>>> d = h.to_dict()
>>> d["id"]
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[]
>>> "description" not in d
True

>>> new_h = DataCube.from_dict(d)
>>> d = new_h.to_dict()
>>> d["id"]
>>> d["data"]
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
>>> d["dimensions"]
[]
>>> "description" not in d
True
array
Return the xarray.DataArray that contains the data and dimension definition.
Returns: The xarray.DataArray that contains the data and dimension definition
Return type: xarray.DataArray

static from_data_collection(data_collection) → List[DataCube]
Create data cubes from a data collection.
Parameters: data_collection -- The data collection to convert into data cubes
Returns: A list of data cubes

static from_dict(hc_dict: Dict) → openeo_udf.api.datacube.DataCube
Create a DataCube from a python dictionary that was created from the JSON definition of the HyperCube.
Parameters: hc_dict (dict) -- The dictionary that contains the hypercube definition
Returns: A new DataCube object
Return type: DataCube

get_array() → xarray.core.dataarray.DataArray
Return the xarray.DataArray that contains the data and dimension definition.
Returns: The xarray.DataArray that contains the data and dimension definition
Return type: xarray.DataArray

id

set_array(array: xarray.core.dataarray.DataArray)
Set the xarray.DataArray that contains the data and dimension definition.
This function will check if the provided data is an xarray.DataArray and raise an Exception otherwise.
Parameters: array -- The xarray.DataArray that contains the data and dimension definition

to_dict() → Dict
Convert this data cube into a dictionary that can be converted into a valid JSON representation.
Returns: The data cube as a dictionary
Return type: dict

>>> example = {
...     "id": "test_data",
...     "data": [
...         [
...             [0.0, 0.1],
...             [0.2, 0.3]
...         ],
...         [
...             [0.0, 0.1],
...             [0.2, 0.3]
...         ]
...     ],
...     "dimension": [
...         {"name": "time", "unit": "ISO:8601", "coordinates": ["2001-01-01", "2001-01-02"]},
...         {"name": "X", "unit": "degree", "coordinates": [50.0, 60.0]},
...         {"name": "Y", "unit": "degree"},
...     ]
... }
openeo_udf.api.feature_collection module
OpenEO Python UDF interface

class openeo_udf.api.feature_collection.FeatureCollection(id: str, data: geopandas.geodataframe.GeoDataFrame, start_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType] = None, end_times: Union[pandas.core.indexes.datetimes.DatetimeIndex, NoneType] = None)
Bases: openeo_udf.api.collection_base.CollectionBase
A feature collection that represents a subset or a whole feature collection where single vector features may have time stamps assigned.
Some basic tests:
>>> from shapely.geometry import Point
>>> import geopandas
>>> p1 = Point(0,0)
>>> p2 = Point(100,100)
>>> p3 = Point(100,0)
>>> pseries = [p1, p2, p3]
>>> data = geopandas.GeoDataFrame(geometry=pseries, columns=["a", "b"])
>>> data["a"] = [1,2,3]
>>> data["b"] = ["a","b","c"]
>>> fct = FeatureCollection(id="test", data=data)
>>> print(fct)
id: test
start_times: None
end_times: None
data:    a  b         geometry
0  1  a      POINT (0 0)
1  2  b  POINT (100 100)
2  3  c    POINT (100 0)
>>> import json
>>> json.dumps(fct.to_dict())
'{"id": "test", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}'

>>> p1 = Point(0,0)
>>> pseries = [p1]
>>> data = geopandas.GeoDataFrame(geometry=pseries, columns=["a", "b"])
>>> data["a"] = [1]
>>> data["b"] = ["a"]
>>> dates = [pandas.Timestamp('2012-05-01')]
>>> starts = pandas.DatetimeIndex(dates)
>>> dates = [pandas.Timestamp('2012-05-02')]
>>> ends = pandas.DatetimeIndex(dates)
>>> fct = FeatureCollection(id="test", start_times=starts, end_times=ends, data=data)
>>> print(fct)
id: test
start_times: DatetimeIndex(['2012-05-01'], dtype='datetime64[ns]', freq=None)
end_times: DatetimeIndex(['2012-05-02'], dtype='datetime64[ns]', freq=None)
data:    a  b     geometry
0  1  a  POINT (0 0)

>>> import json
>>> json.dumps(fct.to_dict())
'{"id": "test", "start_times": ["2012-05-01T00:00:00"], "end_times": ["2012-05-02T00:00:00"], "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}]}}'

>>> fct = FeatureCollection.from_dict(fct.to_dict())
>>> json.dumps(fct.to_dict())
'{"id": "test", "start_times": ["2012-05-01T00:00:00"], "end_times": ["2012-05-02T00:00:00"], "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}]}}'
data
Return the geopandas.GeoDataFrame that contains the geometry column and any number of attribute columns.
Returns: A data frame that contains the geometry column and any number of attribute columns
Return type: geopandas.GeoDataFrame

static from_dict(fct_dict: Dict)
Create a feature collection from a python dictionary that was created from the JSON definition of the FeatureCollection.
Parameters: fct_dict (dict) -- The dictionary that contains the feature collection definition
Returns: A new FeatureCollection object
Return type: FeatureCollection

get_data() → geopandas.geodataframe.GeoDataFrame
Return the geopandas.GeoDataFrame that contains the geometry column and any number of attribute columns.
Returns: A data frame that contains the geometry column and any number of attribute columns
Return type: geopandas.GeoDataFrame

set_data(data: geopandas.geodataframe.GeoDataFrame)
Set the geopandas.GeoDataFrame that contains the geometry column and any number of attribute columns.
This function will check if the provided data is a geopandas.GeoDataFrame and raise an Exception otherwise.
Parameters: data (geopandas.GeoDataFrame) -- A GeoDataFrame with geometry column and attribute data
openeo_udf.api.machine_learn_model module
OpenEO Python UDF interface

class openeo_udf.api.machine_learn_model.MachineLearnModelConfig(framework: str, name: str, description: str, path: Union[str, NoneType] = None, md5_hash: Union[str, NoneType] = None)
Bases: object

This class represents a machine learning model. The model is loaded at construction, based on the machine learning framework.

The following frameworks are supported:
- sklearn models that are created with sklearn.externals.joblib
- pytorch models that are created with torch.save
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.externals import joblib
>>> model = RandomForestRegressor(n_estimators=10, max_depth=2, verbose=0)
>>> path = '/tmp/test.pkl.xz'
>>> dummy = joblib.dump(value=model, filename=path, compress=("xz", 3))
>>> m = MachineLearnModelConfig(framework="sklearn", name="test",
...                             description="Machine learn model", path=path)
>>> m.get_model()
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=2,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)
>>> m.to_dict()
{'description': 'Machine learn model', 'name': 'test', 'framework': 'sklearn', 'path': '/tmp/test.pkl.xz', 'md5_hash': None}
>>> d = {'description': 'Machine learn model', 'name': 'test', 'framework': 'sklearn',
...      'path': '/tmp/test.pkl.xz', "md5_hash": None}
>>> m = MachineLearnModelConfig.from_dict(d)
>>> m.get_model()
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=2,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Module
>>> path = '/tmp/test.pt'
>>> torch.save(model, path)
>>> m = MachineLearnModelConfig(framework="pytorch", name="test",
...                             description="Machine learn model", path=path)
>>> m.get_model()
<class 'torch.nn.modules.module.Module'>
>>> m.to_dict()
{'description': 'Machine learn model', 'name': 'test', 'framework': 'pytorch', 'path': '/tmp/test.pt', 'md5_hash': None}
>>> d = {'description': 'Machine learn model', 'name': 'test', 'framework': 'pytorch',
...      'path': '/tmp/test.pt', "md5_hash": None}
>>> m = MachineLearnModelConfig.from_dict(d)
>>> m.get_model()
<class 'torch.nn.modules.module.Module'>
get_model()
Get the loaded machine learning model. This function will return None if the model was not loaded.
Returns: The loaded model
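For illustration, a minimal sketch of using a loaded model for prediction. The path and the two-feature input below are assumptions for the example, and the regressor must have been fitted before it was saved:

import numpy
from openeo_udf.api.machine_learn_model import MachineLearnModelConfig

# Hypothetical path; a real model must have been fitted and dumped with joblib.
m = MachineLearnModelConfig(framework="sklearn", name="test",
                            description="Machine learn model",
                            path="/tmp/test.pkl.xz")
model = m.get_model()
if model is not None:  # get_model() returns None if no model was loaded
    # Hypothetical prediction on two samples with two features each
    prediction = model.predict(numpy.array([[1.0, 2.0], [3.0, 4.0]]))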
openeo_udf.api.run_code module
OpenEO Python UDF interface

openeo_udf.api.run_code.load_module_from_string(code)
Experimental -- avoid loading the same UDF module more than once, to make caching inside the UDF work.
Parameters: code -- The UDF source code as a string
Returns: The loaded module

openeo_udf.api.run_code.run_legacy_user_code(dict_data: Dict) → Dict
Run the user defined python code on legacy data.
Parameters: dict_data -- The UDF request object with code and legacy data organized in a dictionary
Returns: The processed data as a dictionary
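A minimal sketch of invoking the legacy runner. The layout of the request dictionary below (the "code"/"source" keys and the data payload) is an assumption for illustration only; consult the UDF server's request schema for the actual layout:

from openeo_udf.api.run_code import run_legacy_user_code

# Illustrative UDF source; the function body is a placeholder.
udf_code = """
def my_udf(udf_data):
    pass
"""

# Assumed request layout: code under "code"/"source", inputs in the
# UdfData dictionary form shown elsewhere in this document.
request = {
    "code": {"source": udf_code},
    "data": {"proj": {"EPSG": 4326}, "datacubes": []},
}
result = run_legacy_user_code(dict_data=request)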
openeo_udf.api.spatial_extent module
OpenEO Python UDF interface

class openeo_udf.api.spatial_extent.SpatialExtent(top: float, bottom: float, right: float, left: float, height: Union[float, NoneType] = None, width: Union[float, NoneType] = None)
Bases: object

The axis-aligned spatial extent of a collection tile.
Some basic tests:
>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0, height=10, width=10)
>>> print(extent)
top: 100
bottom: 0
right: 100
left: 0
height: 10
width: 10
>>> extent.to_index(50, 50)
(5, 5)
>>> extent.to_index(0, 0)
(0, 10)
>>> extent.to_index(100, 0)
(0, 0)

>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0)
>>> print(extent)
top: 100
bottom: 0
right: 100
left: 0
height: None
width: None
>>> p = extent.as_polygon()
>>> print(p)
POLYGON ((0 100, 100 100, 100 0, 0 0, 0 100))

>>> from shapely.wkt import loads
>>> p = loads("POLYGON ((0 100, 100 100, 100 0, 0 0, 0 100))")
>>> extent = SpatialExtent.from_polygon(p)
>>> print(extent)
top: 100.0
bottom: 0.0
right: 100.0
left: 0.0
height: None
width: None
>>> extent.contains_point(50, 50)
True
>>> extent.contains_point(150, 50)
False
>>> extent.contains_point(25, 25)
True
>>> extent.contains_point(101, 101)
False

>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0)
>>> extent.as_polygon() == extent.as_polygon()
True
>>> diff = extent.as_polygon() - extent.as_polygon()
>>> print(diff)
GEOMETRYCOLLECTION EMPTY

>>> extent_1 = SpatialExtent(top=80, bottom=10, right=80, left=10)
>>> extent_2 = SpatialExtent(top=100, bottom=0, right=100, left=0)
>>> extent_1.as_polygon() == extent_2.as_polygon()
False
>>> extent_2.as_polygon().contains(extent_1.as_polygon())
True
as_polygon() → shapely.geometry.polygon.Polygon
Return the extent as a shapely.geometry.Polygon so that comparison operations (equality, intersection, and so on) can be performed against other extents.
Returns: The polygon representing the spatial extent
Return type: shapely.geometry.Polygon

contains_point(top: float, left: float) → shapely.geometry.point.Point
Return True if the provided coordinate is located in the spatial extent, False otherwise.
Parameters:
- top (float) -- The top (y) coordinate of the point
- left (float) -- The left (x) coordinate of the point
Returns: True if the coordinates are contained in the extent, False otherwise
Return type: bool

static from_dict(extent: Dict)
Create a SpatialExtent from a python dictionary that was created from the JSON definition of the SpatialExtent.
Parameters: extent (dict) -- The dictionary that contains the spatial extent definition
Returns: A new SpatialExtent object
Return type: SpatialExtent

static from_polygon(polygon: shapely.geometry.polygon.Polygon) → openeo_udf.api.spatial_extent.SpatialExtent
Convert a polygon with rectangular shape into a spatial extent.
Parameters: polygon (shapely.geometry.Polygon) -- The polygon that should be converted into a spatial extent
Returns: The spatial extent
Return type: SpatialExtent

to_dict() → Dict
Return the spatial extent as a dict that can be easily converted into JSON.
Returns: Dictionary representation
Return type: dict
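For illustration, a sketch of a round trip between a SpatialExtent and its dictionary representation, assuming to_dict() returns the same flat layout that from_dict() and set_extent_from_dict() accept:

from openeo_udf.api.spatial_extent import SpatialExtent

extent = SpatialExtent(top=53, bottom=50, right=30, left=24, height=0.01, width=0.01)
d = extent.to_dict()
restored = SpatialExtent.from_dict(d)
assert restored.to_dict() == d  # the dictionary layout survives the round trip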
openeo_udf.api.structured_data module
OpenEO Python UDF interface

class openeo_udf.api.structured_data.StructuredData(description, data, type)
Bases: object

This class represents structured data that is produced by a UDF and cannot be represented as a RasterCollectionTile or FeatureCollectionTile, for example the result of a statistical computation. The data is self-descriptive and supports the basic types dict/map, list and table.

The data field contains the UDF-specific values (argument or return) as dict, list or table:
- A dict can be as complex as required by the UDF
- A list must contain simple data types, for example {"list": [1, 2, 3, 4]}
- A table is a list of lists with a header, for example {"table": [["id", "value"], [1, 10], [2, 23], [3, 4]]}
>>> table = [("col_1", "col_2"), (1, 2), (2, 3)]
>>> st = StructuredData(description="Table output", data=table, type="table")
>>> st.to_dict()
{'description': 'Table output', 'data': [('col_1', 'col_2'), (1, 2), (2, 3)], 'type': 'table'}

>>> values = [1,2,3,4]
>>> st = StructuredData(description="List output", data=values, type="list")
>>> st.to_dict()
{'description': 'List output', 'data': [1, 2, 3, 4], 'type': 'list'}

>>> key_value_store = dict(a=1, b=2, c=3)
>>> st = StructuredData(description="Key-value output", data=key_value_store, type="dict")
>>> st.to_dict()
{'description': 'Key-value output', 'data': {'a': 1, 'b': 2, 'c': 3}, 'type': 'dict'}
openeo_udf.api.tools module
OpenEO Python UDF interface

openeo_udf.api.udf_data module
OpenEO Python UDF interface

class openeo_udf.api.udf_data.UdfData(proj: Dict = None, datacube_list: Union[List[openeo_udf.api.datacube.DataCube], NoneType] = None, feature_collection_list: Union[List[openeo_udf.api.feature_collection.FeatureCollection], NoneType] = None, structured_data_list: Union[List[openeo_udf.api.structured_data.StructuredData], NoneType] = None, ml_model_list: Union[List[openeo_udf.api.machine_learn_model.MachineLearnModelConfig], NoneType] = None, metadata: openeo_udf.server.data_model.metadata_schema.MetadataModel = None)
Bases: object
The class that stores the arguments for a user defined function (UDF)
Some basic tests:
>>> from shapely.geometry import Point
>>> import geopandas
>>> import numpy, pandas
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.externals import joblib
>>> data = numpy.zeros(shape=(1,1,1))
>>> extent = SpatialExtent(top=100, bottom=0, right=100, left=0, height=10, width=10)
>>> starts = pandas.DatetimeIndex([pandas.Timestamp('2012-05-01')])
>>> ends = pandas.DatetimeIndex([pandas.Timestamp('2012-05-02')])
>>> p1 = Point(0,0)
>>> p2 = Point(100,100)
>>> p3 = Point(100,0)
>>> pseries = [p1, p2, p3]
>>> data = geopandas.GeoDataFrame(geometry=pseries, columns=["a", "b"])
>>> data["a"] = [1,2,3]
>>> data["b"] = ["a","b","c"]
>>> C = FeatureCollection(id="C", data=data)
>>> D = FeatureCollection(id="D", data=data)
>>> udf_data = UdfData(proj={"EPSG":4326}, feature_collection_list=[C, D])
>>> model = RandomForestRegressor(n_estimators=10, max_depth=2, verbose=0)
>>> path = '/tmp/test.pkl.xz'
>>> dummy = joblib.dump(value=model, filename=path, compress=("xz", 3))
>>> m = MachineLearnModelConfig(framework="sklearn", name="test",
...                             description="Machine learn model", path=path)
>>> udf_data.append_machine_learn_model(m)
>>> print(udf_data.get_feature_collection_by_id("C"))
id: C
start_times: None
end_times: None
data:    a  b         geometry
0  1  a      POINT (0 0)
1  2  b  POINT (100 100)
2  3  c    POINT (100 0)
>>> print(udf_data.get_feature_collection_by_id("D"))
id: D
start_times: None
end_times: None
data:    a  b         geometry
0  1  a      POINT (0 0)
1  2  b  POINT (100 100)
2  3  c    POINT (100 0)
>>> print(len(udf_data.get_feature_collection_list()) == 2)
True
>>> print(udf_data.ml_model_list[0].path)
/tmp/test.pkl.xz
>>> print(udf_data.ml_model_list[0].framework)
sklearn

>>> import json
>>> json.dumps(udf_data.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {}, "server_context": {}, "datacubes": [], "feature_collection_list": [{"id": "C", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}, {"id": "D", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}], "structured_data_list": [], "machine_learn_models": [{"description": "Machine learn model", "name": "test", "framework": "sklearn", "path": "/tmp/test.pkl.xz", "md5_hash": null}]}'

>>> udf = UdfData.from_dict(udf_data.to_dict())
>>> json.dumps(udf.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {}, "server_context": {}, "datacubes": [], "feature_collection_list": [{"id": "C", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}, {"id": "D", "data": {"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", "properties": {"a": 1, "b": "a"}, "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}}, {"id": "1", "type": "Feature", "properties": {"a": 2, "b": "b"}, "geometry": {"type": "Point", "coordinates": [100.0, 100.0]}}, {"id": "2", "type": "Feature", "properties": {"a": 3, "b": "c"}, "geometry": {"type": "Point", "coordinates": [100.0, 0.0]}}]}}], "structured_data_list": [], "machine_learn_models": [{"description": "Machine learn model", "name": "test", "framework": "sklearn", "path": "/tmp/test.pkl.xz", "md5_hash": null}]}'

>>> sd_list = StructuredData(description="Data list", data={"list":[1,2,3]}, type="list")
>>> sd_dict = StructuredData(description="Data dict", data={"A":{"B": 1}}, type="dict")
>>> udf = UdfData(proj={"EPSG":4326}, structured_data_list=[sd_list, sd_dict])
>>> json.dumps(udf.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {}, "server_context": {}, "datacubes": [], "feature_collection_list": [], "structured_data_list": [{"description": "Data list", "data": {"list": [1, 2, 3]}, "type": "list"}, {"description": "Data dict", "data": {"A": {"B": 1}}, "type": "dict"}], "machine_learn_models": []}'

>>> array = xarray.DataArray(numpy.zeros(shape=(2, 3)), coords={'x': [1, 2], 'y': [1, 2, 3]}, dims=('x', 'y'))
>>> array.attrs["description"] = "This is an xarray with two dimensions"
>>> array.name = "testdata"
>>> h = DataCube(array=array)
>>> udf_data = UdfData(proj={"EPSG":4326}, datacube_list=[h])
>>> udf_data.user_context = {"kernel": 3}
>>> udf_data.server_context = {"reduction_dimension": "t"}
>>> udf_data.user_context
{'kernel': 3}
>>> udf_data.server_context
{'reduction_dimension': 't'}
>>> print(udf_data.get_datacube_by_id("testdata").to_dict())
{'id': 'testdata', 'data': [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], 'dimensions': [{'name': 'x', 'coordinates': [1, 2]}, {'name': 'y', 'coordinates': [1, 2, 3]}], 'description': 'This is an xarray with two dimensions'}
>>> json.dumps(udf_data.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {"kernel": 3}, "server_context": {"reduction_dimension": "t"}, "datacubes": [{"id": "testdata", "data": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], "dimensions": [{"name": "x", "coordinates": [1, 2]}, {"name": "y", "coordinates": [1, 2, 3]}], "description": "This is an xarray with two dimensions"}], "feature_collection_list": [], "structured_data_list": [], "machine_learn_models": []}'

>>> udf = UdfData.from_dict(udf_data.to_dict())
>>> json.dumps(udf.to_dict())
'{"proj": {"EPSG": 4326}, "user_context": {}, "server_context": {}, "datacubes": [{"id": "testdata", "data": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], "dimensions": [{"name": "x", "coordinates": [1, 2]}, {"name": "y", "coordinates": [1, 2, 3]}], "description": "This is an xarray with two dimensions"}], "feature_collection_list": [], "structured_data_list": [], "machine_learn_models": []}'
append_datacube(datacube: openeo_udf.api.datacube.DataCube)
Append a DataCube to the list.
It will be automatically added to the dictionary of all datacubes.
Parameters: datacube (DataCube) -- The DataCube to append

append_feature_collection(feature_collection_tile: openeo_udf.api.feature_collection.FeatureCollection)
Append a feature collection to the list.
It will be automatically added to the dictionary of all feature collections.
Parameters: feature_collection_tile (FeatureCollection) -- The feature collection to append

append_machine_learn_model(machine_learn_model: openeo_udf.api.machine_learn_model.MachineLearnModelConfig)
Append a machine learning model to the list.
Parameters: machine_learn_model (MachineLearnModelConfig) -- A MachineLearnModelConfig object

append_structured_data(structured_data: openeo_udf.api.structured_data.StructuredData)
Append a structured data object to the list.
Parameters: structured_data (StructuredData) -- A StructuredData object

datacube_list
Get the datacube list.

feature_collection_list
Get all feature collections as a list.
Returns: The list of feature collections
Return type: list[FeatureCollection]

static from_dict(udf_dict: Dict)
Create a UdfData object from a python dictionary that was created from the JSON definition of the UdfData class.
Parameters: udf_dict (dict) -- The dictionary that contains the udf data definition
Returns: A new UdfData object
Return type: UdfData

static from_udf_data_model(udf_model) → UdfData
TODO: Must be implemented.
Parameters: udf_model -- The UDF data model

get_datacube_by_id(id: str) → Union[openeo_udf.api.datacube.DataCube, NoneType]
Get a datacube by its id.
Parameters: id (str) -- The datacube id
Returns: The requested datacube or None if not found
Return type: DataCube

get_datacube_list() → Union[List[openeo_udf.api.datacube.DataCube], NoneType]
Get the datacube list.

get_feature_collection_by_id(id: str) → Union[openeo_udf.api.feature_collection.FeatureCollection, NoneType]
Get a feature collection by its id.
Parameters: id (str) -- The feature collection id
Returns: The requested feature collection or None if not found
Return type: FeatureCollection

get_feature_collection_list() → Union[List[openeo_udf.api.feature_collection.FeatureCollection], NoneType]
Get all feature collections as a list.
Returns: The list of feature collections
Return type: list[FeatureCollection]

get_ml_model_list() → Union[List[openeo_udf.api.machine_learn_model.MachineLearnModelConfig], NoneType]
Get all machine learning models.
Returns: A list of MachineLearnModelConfig objects
Return type: list[MachineLearnModelConfig]

get_structured_data_list() → Union[List[openeo_udf.api.structured_data.StructuredData], NoneType]
Get all structured data entries.
Returns: A list of StructuredData objects
Return type: list[StructuredData]

metadata

ml_model_list
Get all machine learning models.
Returns: A list of MachineLearnModelConfig objects
Return type: list[MachineLearnModelConfig]

server_context
Return the server context that is passed from the backend to the UDF server for runtime configuration.

set_datacube_list(datacube_list: List[openeo_udf.api.datacube.DataCube])
Set the datacube list.
If datacube_list is None, then the list will be cleared.
Parameters: datacube_list (List[DataCube]) -- A list of DataCube objects

set_feature_collection_list(feature_collection_list: Union[List[openeo_udf.api.feature_collection.FeatureCollection], NoneType])
Set the feature collection list.
If feature_collection_list is None, then the list will be cleared.
Parameters: feature_collection_list (list[FeatureCollection]) -- A list of FeatureCollection objects

set_ml_model_list(ml_model_list: Union[List[openeo_udf.api.machine_learn_model.MachineLearnModelConfig], NoneType])
Set the list of machine learning models.
If ml_model_list is None, then the list will be cleared.
Parameters: ml_model_list (list[MachineLearnModelConfig]) -- A list of MachineLearnModelConfig objects

set_structured_data_list(structured_data_list: Union[List[openeo_udf.api.structured_data.StructuredData], NoneType])
Set the list of structured data.
If structured_data_list is None, then the list will be cleared.
Parameters: structured_data_list (list[StructuredData]) -- A list of StructuredData objects

structured_data_list
Get all structured data entries.
Returns: A list of StructuredData objects
Return type: list[StructuredData]

to_dict() → Dict
Convert this UdfData object into a dictionary that can be converted into a valid JSON representation.
Returns: The UdfData object as a dictionary
Return type: dict

user_context
Return the user context that was passed to the run_udf function.
openeo_udf.api.udf_signatures module
This module defines a number of function signatures that can be implemented by UDFs. Both the name of the function and the argument types can be used by the backend to validate whether the provided UDF is compatible with the calling context of the process graph in which it is used.

openeo_udf.api.udf_signatures.apply_datacube(cube: openeo_udf.api.datacube.DataCube, context: Dict) → openeo_udf.api.datacube.DataCube
Map a DataCube to another DataCube. Depending on the context in which this function is used, the DataCube dimensions have to be retained or can be changed. For instance, in the context of a reducing operation along a dimension, that dimension will have to be reduced to a single value. In the context of a 1-to-1 mapping operation, all dimensions have to be retained.
Parameters:
- cube -- A DataCube object
- context -- A dictionary containing user context
Returns: A DataCube object
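For illustration, a minimal sketch of a UDF implementing this signature as a 1-to-1 mapping that rescales every value. The context key "factor" is an assumption for the example, not part of the interface:

from typing import Dict

from openeo_udf.api.datacube import DataCube


def apply_datacube(cube: DataCube, context: Dict) -> DataCube:
    # 1-to-1 mapping: all dimensions of the input cube are retained.
    factor = context.get("factor", 2.0)  # "factor" is a hypothetical context key
    return DataCube(array=cube.get_array() * factor)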
openeo_udf.api.udf_signatures.apply_timeseries(series: pandas.core.series.Series, context: Dict) → pandas.core.series.Series
Process a timeseries of values without changing the time instants. This can, for instance, be used for smoothing or gap-filling. TODO: do we need geospatial coordinates for the series?
Parameters:
- series -- A Pandas Series object with a date-time index
- context -- A dictionary containing user context
Returns: A Pandas Series object with the same datetime index
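For illustration, a minimal sketch of a UDF implementing this signature with gap-filling; the time instants of the series are preserved:

from typing import Dict

import pandas


def apply_timeseries(series: pandas.Series, context: Dict) -> pandas.Series:
    # Fill gaps by time-weighted interpolation; the datetime index and
    # the time instants are kept unchanged.
    return series.interpolate(method="time")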
openeo_udf.api.udf_wrapper module

openeo_udf.api.udf_wrapper.apply_timeseries(series: pandas.core.series.Series, context: Dict) → pandas.core.series.Series
Do something with the timeseries.
Parameters:
- series -- A Pandas Series object with a date-time index
- context -- A dictionary containing user context
Returns: A Pandas Series object

openeo_udf.api.udf_wrapper.apply_timeseries_generic(udf_data: openeo_udf.api.udf_data.UdfData, callback: Callable = <function apply_timeseries>)
Implements the UDF contract by calling a user-provided time series transformation function (apply_timeseries). Multiple bands are currently handled separately; another approach could provide a dataframe with a timeseries for each band. The usage sketch below shows how a custom callback plugs in.
Parameters:
- udf_data -- The UdfData object with the data to process
- callback -- The time series transformation function to apply
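For illustration, a sketch of plugging a custom transformation into apply_timeseries_generic; the smoothing window size is an arbitrary choice for the example:

from typing import Dict

import pandas

from openeo_udf.api.udf_data import UdfData
from openeo_udf.api.udf_wrapper import apply_timeseries_generic


def smooth(series: pandas.Series, context: Dict) -> pandas.Series:
    # Rolling-mean smoothing; the original datetime index is kept.
    return series.rolling(window=3, min_periods=1).mean()


def my_udf(udf_data: UdfData) -> None:
    # apply_timeseries_generic applies the callback to each individual
    # time series in the UdfData object, band by band.
    apply_timeseries_generic(udf_data, callback=smooth)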