DataCube construction¶
The load_collection
process¶
The most straightforward way to start building your openEO data cube is through the load_collection
process.
As mentioned earlier, this is provided by the
load_collection()
method
on a Connection
object,
which produces a DataCube
instance.
For example:
cube = connection.load_collection("SENTINEL2_TOC")
While this should cover the majority of use cases,
there some cases
where one wants to build a DataCube
object
from something else or something more than just a simple load_collection
process.
Construct DataCube from process¶
Through user-defined processes one can encapsulate
one or more load_collection
processes and additional processing steps in a single
reusable user-defined process.
For example, imagine a user-defined process “masked_s2”
that loads an openEO collection “SENTINEL2_TOC” and applies some kind of cloud masking.
The implementation details of the cloud masking are not important here,
but let’s assume there is a parameter “dilation” to fine-tune the cloud mask.
Also note that the collection id “SENTINEL2_TOC” is hardcoded in the user-defined process.
We can now construct a data cube from this user-defined process
with datacube_from_process()
as follows:
cube = connection.datacube_from_process("masked_s2", dilation=10)
# Further processing of the cube:
cube = cube.filter_temporal("2020-09-01", "2020-09-10")
Note that datacube_from_process()
can be
used with all kind of processes, not only user-defined processes.
For example, while this is not exactly a real EO data use case,
it will produce a valid openEO process graph that can be executed:
>>> cube = connection.datacube_from_process("mean", data=[2, 3, 5, 8])
>>> cube.execute()
4.5
Construct a DataCube from JSON¶
openEO process graphs are typically stored and published in JSON format. Most notably, user-defined processes are transferred between openEO client and back-end in a JSON structure roughly like in this example:
{
"id": "evi",
"parameters": [
{"name": "red", "schema": {"type": "number"}},
{"name": "blue", "schema": {"type": "number"}},
...
],
"process_graph": {
"sub": {"process_id": "subtract", "arguments": {"x": {"from_parameter": "nir"}, "y": {"from_parameter": "red"}}},
"p1": {"process_id": "multiply", "arguments": {"x": 6, "y": {"from_parameter": "red"}}},
"div": {"process_id": "divide", "arguments": {"x": {"from_node": "sub"}, "y": {"from_node": "sum"}},
...
It is possible to construct a DataCube
object that corresponds with this
process graph with the Connection.datacube_from_json
method.
It can be given one of:
a raw JSON string,
a path to a local JSON file,
an URL that points to a JSON resource
The JSON structure should be one of:
a mapping (dictionary) like the example above with at least a
"process_graph"
item, and optionally a"parameters"
item.a mapping (dictionary) with
{"process_id": ...}
items
Some examples¶
Load a DataCube
from a raw JSON string, containing a
simple “flat graph” representation:
raw_json = '''{
"lc": {"process_id": "load_collection", "arguments": {"id": "SENTINEL2_TOC"}},
"ak": {"process_id": "apply_kernel", "arguments": {"data": {"from_node": "lc"}, "kernel": [[1,2,1],[2,5,2],[1,2,1]]}, "result": true}
}'''
cube = connection.datacube_from_json(raw_json)
Load from a raw JSON string, containing a mapping with “process_graph” and “parameters”:
raw_json = '''{
"parameters": [
{"name": "kernel", "schema": {"type": "array"}, "default": [[1,2,1], [2,5,2], [1,2,1]]}
],
"process_graph": {
"lc": {"process_id": "load_collection", "arguments": {"id": "SENTINEL2_TOC"}},
"ak": {"process_id": "apply_kernel", "arguments": {"data": {"from_node": "lc"}, "kernel": {"from_parameter": "kernel"}}, "result": true}
}
}'''
cube = connection.datacube_from_json(raw_json)
Load directly from a local file or URL containing these kind of JSON representations:
# Local file
cube = connection.datacube_from_json("path/to/my_udp.json")
# URL
cube = connection.datacube_from_json("https://example.com/my_udp.json")
Parameterization¶
When the process graph uses parameters, you must specify the desired parameter values
at the time of calling Connection.datacube_from_json
.
For example, take this simple toy example of a process graph that takes the sum of 5 and a parameter “increment”:
raw_json = '''{"add": {
"process_id": "add",
"arguments": {"x": 5, "y": {"from_parameter": "increment"}},
"result": true
}}'''
Trying to build a DataCube
from it without specifying parameter values will fail
like this:
>>> cube = connection.datacube_from_json(raw_json)
ProcessGraphVisitException: No substitution value for parameter 'increment'.
Instead, specify the parameter value:
>>> cube = connection.datacube_from_json(
... raw_json,
... parameters={"increment": 4},
... )
>>> cube.execute()
9
Parameters can also be defined with default values, which will be used when they are not specified
in the Connection.datacube_from_json
call:
raw_json = '''{
"parameters": [
{"name": "increment", "schema": {"type": "number"}, "default": 100}
],
"process_graph": {
"add": {"process_id": "add", "arguments": {"x": 5, "y": {"from_parameter": "increment"}}, "result": true}
}
}'''
cube = connection.datacube_from_json(raw_json)
result = cube.execute())
# result will be 105
Re-parameterization¶
TODO