================
Getting Started
================

Connect to an openEO back-end
==============================

First, establish a connection to an openEO back-end, using its connection URL.
For example, the VITO/Terrascope back-end:

.. code-block:: python

    import openeo

    connection = openeo.connect("openeo.vito.be")

The resulting :py:class:`~openeo.rest.connection.Connection` object is your central gateway to:

- list data collections, available processes, file formats and other capabilities of the back-end
- start building your openEO algorithm from the desired data on the back-end
- execute and monitor (batch) jobs on the back-end
- etc.

.. seealso::

    Use the `openEO Hub `_ to explore different back-end options
    and their capabilities in a web-based way.


Collection discovery
=====================

The Earth observation data (the input of your openEO jobs) is organised in
`so-called collections `_,
e.g. fundamental satellite collections like "Sentinel 1" or "Sentinel 2",
or preprocessed collections like "NDVI".

You can programmatically list the collections that are available on a back-end
and their metadata using methods on the ``connection`` object we just created
(like :py:meth:`~openeo.rest.connection.Connection.list_collection_ids`
or :py:meth:`~openeo.rest.connection.Connection.describe_collection`):

.. code-block:: pycon

    >>> # Get all collection ids
    >>> connection.list_collection_ids()
    ['SENTINEL1_GRD', 'SENTINEL2_L2A', ...

    >>> # Get metadata of a single collection
    >>> connection.describe_collection("SENTINEL2_L2A")
    {'id': 'SENTINEL2_L2A', 'title': 'Sentinel-2 top of canopy ...', 'stac_version': '0.9.0', ...

Congrats, you have now done your first real openEO queries to the openEO back-end
using the openEO Python client library.

.. tip::

    The openEO Python client library comes with **Jupyter (notebook) integration**
    in a couple of places. For example, put ``connection.describe_collection("SENTINEL2_L2A")``
    (without ``print()``) as the last statement in a notebook cell
    and you'll get a nice graphical rendering of the collection metadata.

.. seealso::

    Find out more about data discovery, loading and filtering at :ref:`data_access_chapter`.


Authentication
==============

In the code snippets above we did not need to log in as a user
since we just queried publicly available back-end information.
However, to run non-trivial processing queries one has to authenticate
so that permissions, resource usage, etc. can be managed properly.

To handle authentication, openEO leverages `OpenID Connect (OIDC) `_.
It offers some interesting features (e.g. a user can securely reuse an existing account),
but it is a fairly complex topic, discussed in more depth at :ref:`authentication_chapter`.

The openEO Python client library tries to make authentication as streamlined as possible.
In most cases, for example, the following snippet is enough to obtain an authenticated connection:

.. code-block:: python

    import openeo

    connection = openeo.connect("openeo.vito.be").authenticate_oidc()

This statement will automatically reuse a previously authenticated session, when available.
Otherwise, e.g. the first time you do this, some user interaction is required
and it will print a web link and a short *user code*, for example:

.. code-block::

    To authenticate: visit https://aai.egi.eu/auth/realms/egi/device
    and enter the user code 'SLUO-BMUD'.

Visit this web page in a browser, log in there with an existing account and enter the user code.
If everything goes well, the ``connection`` object in the script will be authenticated
and the back-end will be able to identify you in subsequent requests.
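As a quick sanity check of the authenticated session, you can ask the back-end
for the user information it associates with you,
e.g. with :py:meth:`~openeo.rest.connection.Connection.describe_account`
(a minimal sketch; the exact fields in the response depend on the back-end):

.. code-block:: python

    # Request the user information the back-end associates with this session
    # (the structure of the response is back-end specific).
    print(connection.describe_account())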
.. _basic_example_evi_map_and_timeseries:

Example use case: EVI map and timeseries
=========================================

A common task in earth observation is to apply a formula to a number of spectral bands
in order to compute an 'index', such as NDVI, NDWI, EVI, ...
In this tutorial we'll go through a couple of steps to extract EVI (enhanced vegetation index)
values and timeseries, and discuss some openEO concepts along the way.


Loading an initial data cube
=============================

For calculating the EVI, we need the reflectance of the red, blue and (near) infrared
spectral components. These spectral bands are part of the well-known Sentinel-2 data set
and are available on the current back-end under collection id ``SENTINEL2_L2A``.
We load an initial small spatio-temporal slice (a data cube) as follows:

.. code-block:: python

    sentinel2_cube = connection.load_collection(
        "SENTINEL2_L2A",
        spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19},
        temporal_extent=["2021-02-01", "2021-04-30"],
        bands=["B02", "B04", "B08"],
    )

Note how we specify the region of interest, a time range and a set of bands to load.

.. important::

    By filtering as early as possible (directly in :py:meth:`~openeo.rest.connection.Connection.load_collection` in this case),
    we make sure the back-end only loads the data we are interested in,
    for better performance and to keep the processing costs low.

.. seealso::

    See the chapter :ref:`data_access_chapter` for more details on data discovery,
    general data loading (:ref:`data-loading-and-filtering`)
    and filtering (e.g. :ref:`temporal-extent-handling`).

The :py:meth:`~openeo.rest.connection.Connection.load_collection` method on the connection object
created a :py:class:`~openeo.rest.datacube.DataCube` object (variable ``sentinel2_cube``).
This :py:class:`~openeo.rest.datacube.DataCube` class of the openEO Python Client Library
provides loads of methods corresponding to various openEO processes,
e.g. for masking, filtering, aggregation, spectral index calculation, data fusion, etc.
In the next steps we will illustrate a couple of these.

.. important::

    It is important to highlight that we *did not load any real EO data* yet.
    Instead we just created an abstract *client-side reference*,
    encapsulating the collection id, the spatial extent, the temporal extent, etc.
    The actual data loading will only happen at the back-end
    once we explicitly trigger the execution of the data processing pipeline we are building.


Band math
=========

From this data cube, we can now select the individual bands
with the :py:meth:`DataCube.band() <openeo.rest.datacube.DataCube.band>` method
and rescale the digital number values to physical reflectances:

.. code-block:: python

    blue = sentinel2_cube.band("B02") * 0.0001
    red = sentinel2_cube.band("B04") * 0.0001
    nir = sentinel2_cube.band("B08") * 0.0001

We now want to compute the enhanced vegetation index
and can do that directly with these band variables:

.. code-block:: python

    evi_cube = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
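Note that ``evi_cube`` is again a :py:class:`~openeo.rest.datacube.DataCube` object,
just like ``sentinel2_cube``. A quick check (a minimal sketch) confirms this:

.. code-block:: python

    from openeo.rest.datacube import DataCube

    # The band math result is still a client-side DataCube object,
    # not an array of actual pixel values.
    print(isinstance(evi_cube, DataCube))  # should print True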
.. important::

    As noted before: while this looks like an actual calculation,
    there is *no real data processing going on here*.
    The ``evi_cube`` object at this point is just an abstract representation
    of our algorithm under construction.
    The mathematical operators we used here are *syntactic sugar*
    for expressing this part of the algorithm in a very compact way.

As an illustration of this, let's have a peek at the *JSON representation*
of our algorithm so far, the so-called *openEO process graph*:

.. code-block:: text

    >>> print(evi_cube.to_json(indent=None))
    {"process_graph": {"loadcollection1": {"process_id": "load_collection", ...
    ...
    "id": "SENTINEL2_L2A", "spatial_extent": {"west": 5.15, "south": ...
    ...
    "multiply1": { ... "y": 0.0001}},
    ...
    "multiply3": { ... {"x": 2.5, "y": {"from_node": "subtract1"}}}
    ...
    ...

Note how the ``load_collection`` arguments, rescaling and EVI calculation aspects
can be deciphered from this.
Rest assured: as a user, you normally don't have to worry too much about these process graph details.
The openEO Python Client Library handles this behind the scenes for you.


Download (synchronously)
========================

Let's download this as a GeoTIFF file.
Because GeoTIFF does not support a temporal dimension,
we first eliminate it by taking the temporal maximum value for each pixel:

.. code-block:: python

    evi_composite = evi_cube.max_time()

.. note::

    This :py:meth:`~openeo.rest.datacube.DataCube.max_time()` is not an official openEO process
    but one of the many *convenience methods* in the openEO Python Client Library
    to simplify common processing patterns.
    It implements a ``reduce`` operation along the temporal dimension
    with a ``max`` reducer/aggregator.

Now we can download this to a local file:

.. code-block:: python

    evi_composite.download("evi-composite.tiff")

This download command **triggers the actual processing** on the back-end:
it sends the process graph to the back-end and waits for the result.
It is a *synchronous operation*
(the :py:meth:`~openeo.rest.datacube.DataCube.download()` call blocks until the result is fully downloaded)
and because we work on a small spatio-temporal extent, this should only take a couple of seconds.

If we inspect the downloaded image, we see that the maximum EVI value
is heavily impacted by cloud related artefacts, which makes the result barely usable.
In the next steps we will address cloud masking.

.. image:: _static/images/basics/evi-composite.png


Batch Jobs (asynchronous execution)
===================================

Synchronous downloads are handy for quick experimentation on small data cubes,
but if you start processing larger data cubes, you can easily hit
*computation time limits* or other constraints.
For these larger tasks, it is recommended to work with **batch jobs**,
which allow you to work asynchronously:
after you start your job, you can disconnect (stop your script or even close your computer)
and then minutes/hours later you can reconnect to check the batch job status and download results.
The openEO Python Client Library also provides helpers
to keep track of a running batch job and show a progress report.
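As a minimal sketch of that workflow (using the ``evi_composite`` cube from above;
the job title and target folder are just illustrative choices),
creating, running and downloading a batch job could look like this:

.. code-block:: python

    # Create a batch job on the back-end (this does not start any processing yet)
    job = evi_composite.create_job(title="EVI composite")

    # Start the job and wait for it to finish, with a progress report along the way
    job.start_and_wait()

    # Download the result assets of the finished job to a local folder
    job.get_results().download_files("evi-composite-results")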
.. seealso:: See :ref:`batch-jobs-chapter` for more details.


Applying a cloud mask
=========================

As mentioned above, we need to filter out cloud pixels to make the result more usable.

It is very common for earth observation data to have separate masking layers
that for instance indicate whether a pixel is covered by a (type of) cloud or not.
For Sentinel-2, one such layer is the "scene classification" layer
generated by the Sen2Cor algorithm.
In this example, we will use this layer to mask out unwanted data.

First, we load a new ``SENTINEL2_L2A`` based data cube
with this specific ``SCL`` band as single band:

.. code-block:: python

    s2_scl = connection.load_collection(
        "SENTINEL2_L2A",
        spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19},
        temporal_extent=["2021-02-01", "2021-04-30"],
        bands=["SCL"],
    )

Now we can use the compact "band math" feature again
to build a binary mask with a simple comparison operation:

.. code-block:: python

    # Select the "SCL" band from the data cube
    scl_band = s2_scl.band("SCL")
    # Build mask to mask out everything but class 4 (vegetation)
    mask = (scl_band != 4)

Before we can apply this mask to the EVI cube we have to resample it,
as the "SCL" layer has a "ground sample distance" of 20 meter,
while it is 10 meter for the "B02", "B04" and "B08" bands.
We can easily do the resampling by referring directly to the EVI cube:

.. code-block:: python

    mask_resampled = mask.resample_cube_spatial(evi_cube)

    # Apply the mask to the `evi_cube`
    evi_cube_masked = evi_cube.mask(mask_resampled)

We can now download this as a GeoTIFF, again after taking the temporal maximum:

.. code-block:: python

    evi_cube_masked.max_time().download("evi-masked-composite.tiff")

Now, the EVI map is a lot more valuable,
as the non-vegetation locations and observations are filtered out:

.. image:: _static/images/basics/evi-masked-composite.png


Aggregated EVI timeseries
===========================

A common type of analysis is aggregating pixel values over one or more regions of interest
(also known as "zonal statistics") and tracking this aggregation
over a period of time as a timeseries.
Let's extract the EVI timeseries for these two regions:

.. code-block:: python

    features = {"type": "FeatureCollection", "features": [
        {
            "type": "Feature", "properties": {},
            "geometry": {"type": "Polygon", "coordinates": [[
                [5.1417, 51.1785], [5.1414, 51.1772], [5.1444, 51.1768],
                [5.1443, 51.179], [5.1417, 51.1785]
            ]]}
        },
        {
            "type": "Feature", "properties": {},
            "geometry": {"type": "Polygon", "coordinates": [[
                [5.156, 51.1892], [5.155, 51.1855], [5.163, 51.1855],
                [5.163, 51.1891], [5.156, 51.1892]
            ]]}
        }
    ]}

.. note::

    To have a self-contained example, we define the geometries here
    as an inline GeoJSON-style dictionary.
    In a real use case, your geometry will probably come from a local file or remote URL.
    The openEO Python Client Library supports alternative ways of specifying the geometry
    in methods like :py:meth:`~openeo.rest.datacube.DataCube.aggregate_spatial()`,
    e.g. as Shapely geometry objects.

Building on the experience from previous sections,
we first build a masked EVI cube (covering a longer time window than before):

.. code-block:: python

    # Load raw collection data
    sentinel2_cube = connection.load_collection(
        "SENTINEL2_L2A",
        spatial_extent={"west": 5.14, "south": 51.17, "east": 5.17, "north": 51.19},
        temporal_extent=["2020-01-01", "2021-12-31"],
        bands=["B02", "B04", "B08", "SCL"],
    )

    # Extract spectral bands and calculate EVI with the "band math" feature
    blue = sentinel2_cube.band("B02") * 0.0001
    red = sentinel2_cube.band("B04") * 0.0001
    nir = sentinel2_cube.band("B08") * 0.0001
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

    # Use the scene classification layer to mask out non-vegetation pixels
    scl = sentinel2_cube.band("SCL")
    evi_masked = evi.mask(scl != 4)
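As mentioned in the note above, the regions of interest do not have to be inline dictionaries.
A minimal sketch of the file-based alternative, assuming a hypothetical local file
``fields.geojson`` containing the same FeatureCollection, could look like this
(a Shapely geometry obtained this way can be passed as the ``geometries`` argument
of :py:meth:`~openeo.rest.datacube.DataCube.aggregate_spatial()` as well):

.. code-block:: python

    import json

    from shapely.geometry import shape

    # Hypothetical local file with the FeatureCollection defined above
    with open("fields.geojson") as f:
        feature_collection = json.load(f)

    # Convert the first feature's GeoJSON geometry to a Shapely geometry object
    field_a_geometry = shape(feature_collection["features"][0]["geometry"])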
Now we use the :py:meth:`~openeo.rest.datacube.DataCube.aggregate_spatial()` method
to do spatial aggregation over the geometries we defined earlier.
Note how we can specify the aggregation function ``"mean"``
as a simple string for the ``reducer`` argument.

.. code-block:: python

    evi_aggregation = evi_masked.aggregate_spatial(
        geometries=features,
        reducer="mean",
    )

If we download this, we get the timeseries encoded as a JSON structure;
other useful formats are CSV and netCDF.

.. code-block:: python

    evi_aggregation.download("evi-aggregation.json")

.. warning::

    Technically, the output of the openEO process ``aggregate_spatial``
    is a so-called "vector cube".
    At the time of this writing, the specification of this openEO concept
    is not fully fleshed out yet in the openEO API.
    openEO back-ends and clients provide best-effort support for it,
    but bear in mind that some details are subject to change.

The openEO Python Client Library provides helper functions
to convert the downloaded JSON data to a pandas dataframe,
which we massage a bit more:

.. code-block:: python

    import json

    import pandas as pd

    from openeo.rest.conversions import timeseries_json_to_pandas

    # Load the downloaded aggregation result
    with open("evi-aggregation.json") as f:
        data = json.load(f)

    # Convert to a pandas dataframe and clean it up a bit
    df = timeseries_json_to_pandas(data)
    df.index = pd.to_datetime(df.index)
    df = df.dropna()
    df.columns = ("Field A", "Field B")

This finally gives us our EVI timeseries dataframe:

.. code-block:: pycon

    >>> df
                                Field A   Field B
    date
    2020-01-06 00:00:00+00:00  0.522499  0.300250
    2020-01-16 00:00:00+00:00  0.529591  0.288079
    2020-01-18 00:00:00+00:00  0.633011  0.327598
    ...                             ...       ...

.. image:: _static/images/basics/evi-timeseries.png


Computing multiple statistics
=============================

.. warning::

    This is an experimental feature of the GeoPySpark openEO back-end;
    it may not be supported by other back-ends and is subject to change.
    See `Open-EO/openeo-geopyspark-driver#726 `_ for further discussion.

The same method also allows computing multiple statistics at once.
This does rely on "callbacks" to construct a result with multiple statistics.
The use of such more complex processes is explained further in :ref:`callbackfunctions`.

.. code-block:: python

    from openeo.processes import array_create, mean, sd, median, count

    evi_aggregation = evi_masked.aggregate_spatial(
        geometries=features,
        reducer=lambda x: array_create([mean(x), sd(x), median(x), count(x)]),
    )
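As with the single-statistic case, you could again download the result,
for example as CSV (a sketch; how the multiple statistics are laid out
in the CSV columns depends on the back-end):

.. code-block:: python

    # Download the multi-statistic aggregation result as CSV
    evi_aggregation.download("evi-aggregation-stats.csv")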