This glossary introduces, and tries to define, the major technical terms used in the openEO project.
The acronym openEO contracts two concepts; these, and further terms used throughout the project, are defined below:
- open: used here in the context of open source software; open source software is available in source code form, and can be freely modified and redistributed; the openEO project will create open source software, reusable under a liberal open source license (Apache 2.0)
- EO: Earth observation; openEO targets the processing and analysis of Earth observation data
- API: application programming interface (see Wikipedia); a communication protocol between client and back-end
- (cloud) back-end: server-side computer infrastructure (one or more physical computers or virtual machines) used for storing EO data and processing it
- big Earth observation cloud back-ends: server infrastructure where industry and researchers analyse large amounts of EO data
- unified: current EO cloud back-ends all have a different API, making EO data analysis hard to validate and difficult to reproduce, and making back-ends difficult to compare in terms of capability and costs, or to combine in a joint analysis across back-ends. A unified API can resolve many of these problems.
CEOS (CEOS OpenSearch Best Practice Document v1.2) defines Granules and Collections as follows:
"A granule is the finest granularity of data that can be independently managed. A granule usually matches the individual file of EO satellite data."
"A collection is an aggregation of granules sharing the same product specification. A collection typically corresponds to the series of products derived from data acquired by a sensor on board a satellite and having the same mode of operation."
The same document lists the synonyms used (by organisations) for:
- granule: dataset (ISO 19115), dataset (ESA), granule (NASA), product (ESA, CNES), scene (JAXA)
- collection: dataset series (ISO 19115), collection (CNES, NASA), dataset (JAXA), dataset series (ESA), product (JAXA)
Here, we will use granule and collection.
A granule will typically refer to a limited area and a single overpass, leading to a very short observation period (seconds), or to a temporal aggregation of such data, e.g. 16-day MODIS composites.
The Open Geospatial Consortium published a document on OGC OpenSearch Geo and Time Extensions.
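To make the collection concept concrete: through the openEO API, collections are what a client discovers on a back-end and then loads for processing. The sketch below uses the openeo Python client; the back-end URL and the collection id are placeholders, not part of this glossary.

```python
# A hedged sketch of collection discovery with the openeo Python client;
# the back-end URL and the collection id are placeholders.
import openeo

connection = openeo.connect("https://openeo.example.org")
print(connection.list_collections())                     # collections offered by this back-end
print(connection.describe_collection("SENTINEL2_L2A"))   # metadata describing one collection
```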
Processes and Jobs
The terms process, process graph and job have different meanings in the openEO API specification.
A process is simply the description of an operation as provided by the back-end, similar to a function definition in programming languages.
In this context openEO will:
- consider, or allow to consider, band as a dimension
- consider imagery (image collections) to consist of one or more collections, as arguments to functions; allow filtering on a particular collection, or joining them into a single collection
- allow filtering on attributes, e.g. on cloud-free pixels, or pixels inside a `MULTIPOLYGON` describing the floodplains of the Danube; this filters on attributes rather than dimensions
- provide generic aggregate operations that aggregate over one or more dimensions; clients may provide dimension-specific aggregation functions for particular cases (see the sketch after this list)
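As a rough illustration of these points (not part of the specification itself), the sketch below uses the openeo Python client to load a collection restricted to two bands and a spatial and temporal extent, and to aggregate it to monthly means. The URL, collection id, band names and extents are placeholders, and method names may differ between client versions.

```python
# A hedged sketch with the openeo Python client; URL, collection id, band names
# and extents are placeholders.
import openeo

connection = openeo.connect("https://openeo.example.org")
cube = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent={"west": 16.1, "south": 48.0, "east": 16.6, "north": 48.4},
    temporal_extent=["2021-06-01", "2021-09-01"],
    bands=["B04", "B08"],            # bands treated as one dimension of the image collection
)
# a dimension-specific aggregation: reduce the time dimension to monthly means
monthly = cube.aggregate_temporal_period("month", reducer="mean")
```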
A process graph includes specific process calls, i.e. references to one or more processes with concrete values for their input arguments, similar to a function call in programming. However, a process graph can chain multiple processes; in particular, the arguments of a process can themselves be (recursive) process graphs, input datasets, or simple scalar or array values.
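To illustrate the structure, a small process graph could look like the sketch below, written here as a Python dict and loosely following the JSON encoding of later openEO API versions; node names, process ids and arguments are purely illustrative.

```python
# A hedged, illustrative process graph: load a collection, reduce the time dimension
# with a mean, and mark the final node as the result. Ids and arguments are placeholders.
process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {"id": "SENTINEL2_L2A",
                      "temporal_extent": ["2021-01-01", "2021-12-31"]},
    },
    "mean": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "load"},        # output of another node used as input
            "dimension": "t",
            "reducer": {"process_graph": {        # an argument that is itself a process graph
                "m": {"process_id": "mean",
                      "arguments": {"data": {"from_parameter": "data"}},
                      "result": True}
            }},
        },
        "result": True,                           # marks the node whose output is returned
    },
}
```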
A job brings one process graph to the back-end and organizes its execution, which may or may not incur costs. Jobs furthermore allow running process graphs from different data views (see the section on data views). Views define at which resolution and extent the data is looked at during processing, and hence allow trying out process graphs on small subsets or working interactively within web map applications. For more information about jobs and their evaluation types, see the section on jobs.
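A hedged sketch of this job life cycle with the openeo Python client (the back-end URL and collection id are placeholders): the process graph is registered as a batch job, executed on the back-end, and its results downloaded.

```python
# A hedged sketch of running a process graph as a batch job; the back-end URL
# and collection id are placeholders.
import openeo

connection = openeo.connect("https://openeo.example.org").authenticate_oidc()
cube = connection.load_collection("SENTINEL2_L2A",
                                  temporal_extent=["2021-06-01", "2021-07-01"])

job = cube.create_job(title="glossary example")  # send the process graph to the back-end
job.start_and_wait()                             # back-end executes it (may incur costs)
job.get_results().download_files("results/")     # retrieve the outputs once finished
```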
Aggregation vs. resampling
Aggregation computes new values from sets of values that are uniquely assigned to groups. It involves a grouping predicate (e.g. monthly, or 100 m x 100 m grid cells; think of SQL's `group_by`) and an aggregation function (e.g. `mean`) that computes one or more new values from the original ones.
- a time series aggregation may return a regression slope and intercept for every pixel time series, for a single band (group by: full time extent)
- a time series may be aggregated to monthly values by computing the mean for all values in a month (group by: months)
- spatial aggregation involves computing e.g. mean pixel values on a 100 m x 100 m grid from 10 m x 10 m pixels, where each original pixel is assigned uniquely to a larger pixel (group by: 100 m x 100 m grid cells)
Note that for the first example, the aggregation function not only requires time series values, but also their time stamps.
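The monthly-mean example can be written down in a few lines of plain Python (pandas), independent of openEO, to make the group-by-and-aggregate idea explicit; the values here are random toy data.

```python
# A toy, openEO-independent illustration of "group by month, aggregate with mean".
import numpy as np
import pandas as pd

times = pd.date_range("2021-01-01", "2021-03-31", freq="D")   # daily time stamps
values = pd.Series(np.random.rand(len(times)), index=times)   # toy pixel time series

monthly_mean = values.resample("MS").mean()   # grouping predicate: months; function: mean
print(monthly_mean)                           # one aggregated value per month
```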
Resampling is a broader term for the case where we have data at one resolution and need values at another (also called scaling). In case we have values on a 100 m x 100 m grid and need values on a 10 m x 10 m grid, the original values will be reused many times; they may simply be assigned to the nearest high-resolution grid cells ("nearest neighbor"), or may be interpolated somehow (e.g. by bilinear interpolation). Resampling from a finer to a coarser grid by nearest neighbor may again be a special case of aggregation.
When the target grid or time series has a lower resolution (larger grid cells) or lower frequency (longer time intervals) than the source, aggregation might be used for resampling. However, if the resolutions are fairly similar, say the source collection has values for consecutive 10-day intervals and the target needs values for consecutive 16-day intervals, then some form of interpolation may be more appropriate than aggregation as defined here.
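A minimal numpy sketch of the coarse-to-fine case: each 100 m value is simply reused (nearest neighbor) for the 10 x 10 finer cells it covers; the grid sizes are toy values.

```python
# Nearest-neighbor resampling from a coarse to a fine grid (toy example).
import numpy as np

coarse = np.arange(9.0).reshape(3, 3)                 # 3 x 3 grid of 100 m cells
fine = coarse.repeat(10, axis=0).repeat(10, axis=1)   # 30 x 30 grid of 10 m cells
print(fine.shape)                                     # (30, 30): each value reused 100 times
```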
The API developed by the openEO project uses REST over HTTP for the communication between client and back-end server.
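In practice this means that every interaction is an ordinary HTTP request to a well-defined endpoint. The sketch below (the back-end URL is a placeholder) shows two of the discovery endpoints defined by the openEO API.

```python
# A hedged sketch of plain HTTP communication with an openEO back-end;
# the URL is a placeholder.
import requests

base = "https://openeo.example.org"
collections = requests.get(f"{base}/collections").json()  # which EO collections are available
processes = requests.get(f"{base}/processes").json()      # which processes the back-end offers
```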
User-defined functions (UDFs)
The abbreviation UDF stands for user-defined function. With this concept, users can upload custom code and have it executed, e.g. for every pixel of a scene, allowing custom calculations on server-side data.
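As a hedged sketch of what such custom code can look like: the openEO Python client defines conventions under which the back-end calls a well-known entry point on a chunk of the data cube. The exact entry-point name and types vary by client and back-end version, and the scaling logic below is purely illustrative.

```python
# A hedged UDF sketch following one convention of the openEO Python client;
# the entry-point name, types and the computation itself are illustrative.
from openeo.udf import XarrayDataCube

def apply_datacube(cube: XarrayDataCube, context: dict) -> XarrayDataCube:
    """Scale every pixel value by a factor taken from the job context."""
    array = cube.get_array()                     # the underlying xarray.DataArray
    scaled = array * context.get("factor", 1.0)  # per-pixel computation on server-side data
    return XarrayDataCube(scaled)
```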