Getting Started

Connect to an openEO back-end

First, establish a connection to an openEO back-end, using its connection URL. For example the VITO/Terrascope backend:

import openeo

connection = openeo.connect("openeo.vito.be")

The resulting Connection object is your central gateway to

  • list data collections, available processes, file formats and other capabilities of the back-end

  • start building your openEO algorithm from the desired data on the back-end

  • execute and monitor (batch) jobs on the back-end

  • etc.

See also

Use the openEO Hub to explore different back-end options and their capabilities in a web-based way.

Collection discovery

The Earth observation data (the input of your openEO jobs) is organised in so-called collections, e.g. fundamental satellite collections like “Sentinel 1” or “Sentinel 2”, or preprocessed collections like “NDVI”.

You can programmatically list the collections that are available on a back-end and their metadata using methods on the connection object we just created (like list_collection_ids() or describe_collection()

>>> # Get all collection ids
>>> connection.list_collection_ids()
['SENTINEL1_GRD', 'SENTINEL2_L2A', ...

>>> # Get metadata of a single collection
>>> connection.describe_collection("SENTINEL2_L2A")
{'id': 'SENTINEL2_L2A', 'title': 'Sentinel-2 top of canopy ...', 'stac_version': '0.9.0', ...

Congrats, you now just did your first real openEO queries to the openEO back-end using the openEO Python client library.

Tip

The openEO Python client library comes with Jupyter (notebook) integration in a couple of places. For example, put connection.describe_collection("SENTINEL2_L2A") (without print()) as last statement in a notebook cell and you’ll get a nice graphical rendering of the collection metadata.

See also

Find out more about data discovery, loading and filtering at Finding and loading data.

Authentication

In the code snippets above we did not need to log in as a user since we just queried publicly available back-end information. However, to run non-trivial processing queries one has to authenticate so that permissions, resource usage, etc. can be managed properly.

To handle authentication, openEO leverages OpenID Connect (OIDC). It offers some interesting features (e.g. a user can securely reuse an existing account), but is a fairly complex topic, discussed in more depth at Authentication and Account Management.

The openEO Python client library tries to make authentication as streamlined as possible. In most cases for example, the following snippet is enough to obtain an authenticated connection:

import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

This statement will automatically reuse a previously authenticated session, when available. Otherwise, e.g. the first time you do this, some user interaction is required and it will print a web link and a short user code, for example:

To authenticate: visit https://aai.egi.eu/auth/realms/egi/device and enter the user code 'SLUO-BMUD'.

Visit this web page in a browser, log in there with an existing account and enter the user code. If everything goes well, the connection object in the script will be authenticated and the back-end will be able to identify you in subsequent requests.

Example use case: EVI map and timeseries

A common task in earth observation is to apply a formula to a number of spectral bands in order to compute an ‘index’, such as NDVI, NDWI, EVI, … In this tutorial we’ll go through a couple of steps to extract EVI (enhanced vegetation index) values and timeseries, and discuss some openEO concepts along the way.

Loading an initial data cube

For calculating the EVI, we need the reflectance of the red, blue and (near) infrared spectral components. These spectral bands are part of the well-known Sentinel-2 data set and is available on the current back-end under collection id SENTINEL2_L2A. We load an initial small spatio-temporal slice (a data cube) as follows:

sentinel2_cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19},
    temporal_extent = ["2021-02-01", "2021-04-30"],
    bands=["B02", "B04", "B08"]
)

Note how we specify a the region of interest, a time range and a set of bands to load.

Important

By filtering as early as possible (directly in load_collection() in this case), we make sure the back-end only loads the data we are interested in for better performance and keeping the processing costs low.

See also

See the chapter Finding and loading data for more details on data discovery, general data loading (Loading a data cube from a collection) and filtering (e.g. Filter on temporal extent).

The load_collection() method on the connection object created a DataCube object (variable sentinel2_cube). This DataCube class of the openEO Python Client Library provides loads of methods corresponding to various openEO processes, e.g. for masking, filtering, aggregation, spectral index calculation, data fusion, etc. In the next steps we will illustrate a couple of these.

Important

It is important to highlight that we did not load any real EO data yet. Instead we just created an abstract client-side reference, encapsulating the collection id, the spatial extent, the temporal extent, etc. The actual data loading will only happen at the back-end once we explicitly trigger the execution of the data processing pipeline we are building.

Band math

From this data cube, we can now select the individual bands with the DataCube.band() method and rescale the digital number values to physical reflectances:

blue = sentinel2_cube.band("B02") * 0.0001
red = sentinel2_cube.band("B04") * 0.0001
nir = sentinel2_cube.band("B08") * 0.0001

We now want to compute the enhanced vegetation index and can do that directly with these band variables:

evi_cube = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

Important

As noted before: while this looks like an actual calculation, there is no real data processing going on here. The evi_cube object at this point is just an abstract representation of our algorithm under construction. The mathematical operators we used here are syntactic sugar for expressing this part of the algorithm in a very compact way.

As an illustration of this, let’s have peek at the JSON representation of our algorithm so far, the so-called openEO process graph:

>>> print(evi_cube.to_json(indent=None))
{"process_graph": {"loadcollection1": {"process_id": "load_collection", ...
... "id": "SENTINEL2_L2A", "spatial_extent": {"west": 5.15, "south": ...
... "multiply1": { ... "y": 0.0001}}, ...
... "multiply3": { ... {"x": 2.5, "y": {"from_node": "subtract1"}}} ...
...

Note how the load_collection arguments, rescaling and EVI calculation aspects can be deciphered from this. Rest assured, as user you normally you don’t have to worry too much about these process graph details, the openEO Python Client library handles this behind the scenes for you.

Download (synchronously)

Let’s download this as a GeoTIFF file. Because GeoTIFF does not support a temporal dimension, we first eliminate it by taking the temporal maximum value for each pixel:

evi_composite = evi_cube.max_time()

Note

This max_time() is not an official openEO process but one of the many convenience methods in the openEO Python Client Library to simplify common processing patterns. It implements a reduce operation along the temporal dimension with a max reducer/aggregator.

Now we can download this to a local file:

evi_composite.download("evi-composite.tiff")

This download command triggers the actual processing on the back-end: it sends the process graph to the back-end and waits for the result. It is a synchronous operation (the download() call blocks until the result is fully downloaded) and because we work on a small spatio-temporal extent, this should only take a couple of seconds.

If we inspect the downloaded image, we see that the maximum EVI value is heavily impacted by cloud related artefacts, which makes the result barely usable. In the next steps we will address cloud masking.

_images/evi-composite.png

Batch Jobs (asynchronous execution)

Synchronous downloads are handy for quick experimentation on small data cubes, but if you start processing larger data cubes, you can easily hit computation time limits or other constraints. For these larger tasks, it is recommended to work with batch jobs, which allow you to work asynchronously: after you start your job, you can disconnect (stop your script or even close your computer) and then minutes/hours later you can reconnect to check the batch job status and download results. The openEO Python Client Library also provides helpers to keep track of a running batch job and show a progress report.

See also

See Batch Jobs for more details.

Applying a cloud mask

As mentioned above, we need to filter out cloud pixels to make the result more usable. It is very common for earth observation data to have separate masking layers that for instance indicate whether a pixel is covered by a (type of) cloud or not. For Sentinel-2, one such layer is the “scene classification” layer generated by the Sen2Cor algorithm. In this example, we will use this layer to mask out unwanted data.

First, we load a new SENTINEL2_L2A based data cube with this specific SCL band as single band:

s2_scl = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19},
    temporal_extent = ["2021-02-01", "2021-04-30"],
    bands=["SCL"]
)

Now we can use the compact “band math” feature again to build a binary mask with a simple comparison operation:

# Select the "SCL" band from the data cube
scl_band = s2_scl.band("SCL")
# Build mask to mask out everything but class 4 (vegetation)
mask = (scl_band != 4)

Before we can apply this mask to the EVI cube we have to resample it, as the “SCL” layer has a “ground sample distance” of 20 meter, while it is 10 meter for the “B02”, “B04” and “B08” bands. We can easily do the resampling by referring directly to the EVI cube.

mask_resampled = mask.resample_cube_spatial(evi_cube)

# Apply the mask to the `evi_cube`
evi_cube_masked = evi_cube.mask(mask_resampled)

We can now download this as a GeoTIFF, again after taking the temporal maximum:

evi_cube_masked.max_time().download("evi-masked-composite.tiff")

Now, the EVI map is a lot more valuable, as the non-vegetation locations and observations are filtered out:

_images/evi-masked-composite.png

Aggregated EVI timeseries

A common type of analysis is aggregating pixel values over one or more regions of interest (also known as “zonal statistics) and tracking this aggregation over a period of time as a timeseries. Let’s extract the EVI timeseries for these two regions:

features = {"type": "FeatureCollection", "features": [
    {
        "type": "Feature", "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [[
            [5.1417, 51.1785], [5.1414, 51.1772], [5.1444, 51.1768], [5.1443, 51.179], [5.1417, 51.1785]
        ]]}
    },
    {
        "type": "Feature", "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [[
            [5.156, 51.1892], [5.155, 51.1855], [5.163, 51.1855], [5.163, 51.1891], [5.156, 51.1892]
        ]]}
    }
]}

Note

To have a self-containing example we define the geometries here as an inline GeoJSON-style dictionary. In a real use case, your geometry will probably come from a local file or remote URL. The openEO Python Client Library supports alternative ways of specifying the geometry in methods like aggregate_spatial(), e.g. as Shapely geometry objects.

Building on the experience from previous sections, we first build a masked EVI cube (covering a longer time window than before):

# Load raw collection data
sentinel2_cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19},
    temporal_extent = ["2020-01-01", "2021-12-31"],
    bands=["B02", "B04", "B08", "SCL"],
)

# Extract spectral bands and calculate EVI with the "band math" feature
blue = sentinel2_cube.band("B02") * 0.0001
red = sentinel2_cube.band("B04") * 0.0001
nir = sentinel2_cube.band("B08") * 0.0001
evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Use the scene classification layer to mask out non-vegetation pixels
scl = sentinel2_cube.band("SCL")
evi_masked = evi.mask(scl != 4)

Now we use the aggregate_spatial() method to do spatial aggregation over the geometries we defined earlier. Note how we can specify the aggregation function "mean" as a simple string for the reducer argument.

evi_aggregation = evi_masked.aggregate_spatial(
    geometries=features,
    reducer="mean",
)

If we download this, we get the timeseries encoded as a JSON structure, other useful formats are CSV and netCDF.

evi_aggregation.download("evi-aggregation.json")

Warning

Technically, the output of the openEO process aggregate_spatial is a so-called “vector cube”. At the time of this writing, the specification of this openEO concept is not fully fleshed out yet in the openEO API. openEO back-ends and clients to provide best-effort support for it, but bear in mind that some details are subject to change.

The openEO Python Client Library provides helper functions to convert the downloaded JSON data to a pandas dataframe, which we massage a bit more:

import json
import pandas as pd
from openeo.rest.conversions import timeseries_json_to_pandas

import json
with open("evi-aggregation.json") as f:
    data = json.load(f)

df = timeseries_json_to_pandas(data)
df.index = pd.to_datetime(df.index)
df = df.dropna()
df.columns = ("Field A", "Field B")

This gives us finally our EVI timeseries dataframe:

>>> df
                           Field A   Field B
date
2020-01-06 00:00:00+00:00  0.522499  0.300250
2020-01-16 00:00:00+00:00  0.529591  0.288079
2020-01-18 00:00:00+00:00  0.633011  0.327598
...                             ...       ...
_images/evi-timeseries.png

Computing multiple statistics

Warning

This is an experimental feature of the GeoPySpark openEO back-end, it may not be supported by other back-ends, and is subject to change. See Open-EO/openeo-geopyspark-driver#726 for further discussion,

The same method also allows the computation of multiple statistics at once. This does rely on ‘callbacks’ to construct a result with multiple statistics. The use of such more complex processes is further explained in Processes with child “callbacks”.

from openeo.processes import array_create, mean, sd, median, count

evi_aggregation = evi_masked.aggregate_spatial(
    geometries=features,
    reducer=lambda x: array_create([mean(x), sd(x), median(x), count(x)]),
)