This guide describes selected core mechanisms of the openEO package. It is aimed at interested developers, who are strongly encouraged to read the source code alongside it. The explanations are abstracted from the code and are meant to introduce new developers to the concepts and routines of this package.
The ProcessCollection class represents the toolbox for creating a process graph in openEO. In contrast to the S3 class ProcessList, which list_processes() creates from the metadata returned by the back-end, the ProcessCollection interprets the metadata of the processes (e.g. the process name and the available parameters with their names and types) and derives builder functions such as p$load_collection() from this information. The builder functions in turn create ProcessNode objects from the chosen process and the values passed as arguments.
Note: the ProcessCollection might be reused in several places, so it had to be an R6 class. Otherwise the potentially large, list-based object would be copied multiple times, which might result in memory issues at some point.
The classes related to the process graph, like Process, are contained in process_graph_building.R. The argument- and parameter-related classes are in argument_types.R. Finally, the ProcessCollection is located in predefined_processes.R.
The first important detail is that the R6 object is unlocked, which means that it can be changed at runtime. This is required because the builder functions are added dynamically during the initialization of the R6 object.
ProcessCollection = R6Class( "ProcessCollection", lock_objects = FALSE, ...)
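To illustrate why the object has to be unlocked, here is a minimal sketch with a hypothetical ToyCollection class (it is not the ProcessCollection of the package). It assumes the R6 package is installed; make_builder() and the structure of the returned list are made up for illustration.

```r
library(R6)

# Hypothetical helper: returns a builder function bound to one process id.
make_builder <- function(process_id) {
  force(process_id)  # fix the value now instead of at call time
  function(...) list(process_id = process_id, arguments = list(...))
}

# Hypothetical toy class, not the ProcessCollection of the package.
ToyCollection <- R6Class(
  "ToyCollection",
  lock_objects = FALSE,  # allows adding members to an instance at runtime
  public = list(
    initialize = function(process_ids) {
      for (id in process_ids) {
        self[[id]] <- make_builder(id)  # would fail on a locked object
      }
      invisible(self)
    }
  )
)

p <- ToyCollection$new(c("load_collection", "reduce_dimension"))
node <- p$load_collection(id = "SENTINEL2")
```

With lock_objects = TRUE, which is the R6 default, the assignment inside initialize() would raise an error, because new bindings cannot be added to a locked environment.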
Now, during the initialization (ProcessCollection$initialize()) of the ProcessCollection, the ProcessList is translated into a list of Process objects (1), and based on that the builder functions are derived (2).
Step (2) is carried out in private$createListOfProcesses(), where the main work is done by the utility function Process$getFormals(). This retrieves the parameter names and the default values from the metadata. For the function body we create a ProcessNode from the respective Process via a deep copy. A deep copy means that a new object is created and all fields are copied with it; in particular, nested Argument objects need to be copied as well, otherwise two instances of the same process would share their arguments. Finally, once the builder function is invoked, this process node receives the values passed to the builder function as its arguments. While those builder functions are created, the variable index is used in the for-loop. For the functions to work properly we need to replace this variable with its actual value in the function body. Otherwise we cannot access the correct process, because index is either unknown or refers to the wrong value when the function is called.
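The index problem can be reproduced with a few lines of base R. The sketch below uses hypothetical, simplified process metadata and bquote() to substitute the loop variable; whether the package uses bquote() or another substitution mechanism is not stated here, so treat this as one possible way to do it.

```r
# Hypothetical, simplified metadata; the package works with Process objects.
processes <- list(
  list(id = "add", parameters = alist(x = , y = )),
  list(id = "absolute", parameters = alist(x = ))
)

naive <- list()
fixed <- list()
for (index in seq_along(processes)) {
  process <- processes[[index]]
  arg_names <- names(process$parameters)

  # Shared skeleton: the metadata supplies the formals of the builder.
  builder <- function() NULL
  formals(builder) <- process$parameters

  # Naive body: `index` is looked up when the builder is called, i.e. after
  # the loop has finished, so every builder sees the last value of `index`.
  naive_builder <- builder
  body(naive_builder) <- bquote(
    list(process_id = processes[[index]]$id, arguments = mget(.(arg_names)))
  )

  # Fixed body: .(index) is replaced by the current value while building.
  fixed_builder <- builder
  body(fixed_builder) <- bquote(
    list(process_id = processes[[.(index)]]$id, arguments = mget(.(arg_names)))
  )

  naive[[process$id]] <- naive_builder
  fixed[[process$id]] <- fixed_builder
}

wrong <- naive$add(1, 2)  # refers to the last process in the list
right <- fixed$add(1, 2)  # refers to the "add" process
```

Printing fixed$add shows the literal index inside the function body, which is why the function keeps working after the loop variable has changed or disappeared.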
For step (1), processFromJson is used to create a Process object from the JSON metadata. Strictly speaking, the JSON metadata has already been transformed into an R list object at this point, but it is referred to as JSON metadata throughout, because it always originates from the response of the back-end. The function itself does little more than feed the correct parts of the JSON metadata to the Process constructor. As one of the constructor parameters, a list of Argument objects needs to be passed on. In the conceptual view of the package, a parameter is the descriptive part and an argument is essentially a parameter that can hold a value.
parameterFromJson performs the translation from the JSON parameter metadata into an Argument object. The translation is done by comparing the type and schema of the metadata with the implemented Argument representations. For this purpose each implemented Argument is assigned its own schema and type upon creation.
The matching of the parameter metadata is handled in findParameterGenerator(). After a suitable Argument has been found, additional restrictive information is transferred from the metadata to the Argument, e.g. not-null constraints, patterns, enumerations or default values.
To complete this section: findParameterGenerator() creates a single instance of each registered Argument and invokes Parameter$matchesSchema() on each object with the given schema. If none matches, a generic Argument object is created, which has few constraints of its own. If more than one match is found, the first one in the list is chosen; otherwise the single match is selected as suitable.
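The matching logic described above can be sketched as follows. All class names, the registry and the comparison by type and subtype are hypothetical simplifications; the sketch assumes the R6 package is installed and does not reproduce the actual schema comparison of the package.

```r
library(R6)

# Hypothetical generic argument: also serves as the fallback.
ToyArgument <- R6Class(
  "ToyArgument",
  public = list(
    type = NULL,
    subtype = NULL,
    initialize = function(type = NULL, subtype = NULL) {
      self$type <- type
      self$subtype <- subtype
    },
    # Simplified comparison: type and subtype must both agree.
    matchesSchema = function(schema) {
      identical(self$type, schema$type) &&
        identical(self$subtype, schema$subtype)
    }
  )
)

ToyString <- R6Class("ToyString", inherit = ToyArgument,
  public = list(initialize = function() super$initialize("string")))
ToyDate <- R6Class("ToyDate", inherit = ToyArgument,
  public = list(initialize = function() super$initialize("string", "date")))
ToyNumber <- R6Class("ToyNumber", inherit = ToyArgument,
  public = list(initialize = function() super$initialize("number")))

find_argument_sketch <- function(schema,
                                 registry = list(ToyString, ToyDate, ToyNumber)) {
  # One instance per registered class, each tested against the schema.
  candidates <- lapply(registry, function(generator) generator$new())
  matches <- Filter(function(candidate) candidate$matchesSchema(schema), candidates)
  if (length(matches) == 0) {
    return(ToyArgument$new())  # nothing matched: generic fallback
  }
  matches[[1]]  # one or several matches: the first in the list wins
}

date_argument <- find_argument_sketch(list(type = "string", subtype = "date"))
fallback_argument <- find_argument_sketch(list(type = "object"))
```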
During the development of this package several functions were needed again and again, especially serialize() on the Argument objects. In general those functions work very similarly, so R6 inheritance was used to unify this behavior. For each type, private$typeSerialization() is implemented according to the specific needs of the argument and is called by its public counterpart.
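A minimal sketch of this pattern, assuming the R6 package is installed: the public serialize() lives in the parent class and delegates to a private typeSerialization() that subclasses override. The classes ToyArgument and ToyDate are hypothetical and much simpler than the argument classes of the package.

```r
library(R6)

# Hypothetical parent class: shared public entry point.
ToyArgument <- R6Class(
  "ToyArgument",
  public = list(
    value = NULL,
    initialize = function(value = NULL) {
      self$value <- value
    },
    serialize = function() {
      if (is.null(self$value)) return(NULL)  # common handling for all types
      private$typeSerialization()            # type-specific part
    }
  ),
  private = list(
    typeSerialization = function() self$value  # default: pass the value through
  )
)

# Hypothetical subclass: only the private, type-specific part is overridden.
ToyDate <- R6Class(
  "ToyDate",
  inherit = ToyArgument,
  private = list(
    typeSerialization = function() format(self$value, "%Y-%m-%d")
  )
)

plain <- ToyArgument$new(5)$serialize()
date <- ToyDate$new(as.Date("2020-01-31"))$serialize()
empty <- ToyDate$new()$serialize()
```

The callers only ever use serialize(), so the shared checks stay in one place while each type decides how its value is written out.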
Similar considerations were made for Process and ProcessNode. Essentially the node is a process, but it carries a unique id that is used in a process graph.
At some point it became tedious to pass the active OpenEOConnection to every function that interacts with the back-end. Therefore the currently active components of an openEO session are stored in an internal package environment (openeo:::pkgEnvironment). This environment should not be accessed directly by the user; instead, functions such as active_process_collection() were implemented to access or set those environment variables.
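The idea of an internal environment with a combined getter and setter can be sketched in base R. The environment .session_env and the function active_collection_sketch() are hypothetical names chosen for this example; they only mimic the access pattern described above.

```r
# Hypothetical environment and accessor; names are illustrative only.
.session_env <- new.env(parent = emptyenv())

active_collection_sketch <- function(value = NULL) {
  if (is.null(value)) {
    # no argument: read the stored object (NULL if nothing was set yet)
    return(get0("collection", envir = .session_env, inherits = FALSE))
  }
  # argument given: store it as the active object of the session
  assign("collection", value, envir = .session_env)
  invisible(value)
}

before <- active_collection_sketch()
active_collection_sketch(list(processes = c("add", "absolute")))
after <- active_collection_sketch()
```

Because environments have reference semantics, every function in the package sees the same stored object without it being passed around as an argument.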
Another interesting and somewhat complex aspect is the coercion of an R function into an openEO process graph. This job is done by .function_to_graph() (in process_graph_building.R), which is called in the respective coerce function as.Graph.function(). The routine looks like this:
1. call create_variable() for each parameter of the function
2. call do.call() with the function and the created variables as parameters
3. the result is a ProcessNode, which will be the final node
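The routine above can be sketched with plain S3 objects. Everything in this example is a hypothetical stand-in: toy_variable() plays the role of create_variable(), toy_node() the role of a ProcessNode, and the operator overloading via an Ops method is only one way to make a function call record nodes instead of computing values.

```r
# Hypothetical stand-ins for the variables and nodes of the package.
toy_variable <- function(name) {
  structure(list(name = name), class = "toy_node")
}
toy_node <- function(process, arguments) {
  structure(list(process = process, arguments = arguments), class = "toy_node")
}

# Arithmetic on toy objects records a node instead of computing a number.
Ops.toy_node <- function(e1, e2) {
  process <- switch(.Generic,
    "+" = "add", "-" = "subtract", "*" = "multiply", "/" = "divide",
    stop("unsupported operator: ", .Generic)
  )
  toy_node(process, list(x = e1, y = e2))
}

function_to_graph_sketch <- function(fun) {
  # 1. one variable per parameter of the function
  variables <- lapply(names(formals(fun)), toy_variable)
  # 2. call the function with the variables; 3. the result is the final node
  do.call(fun, variables)
}

final_node <- function_to_graph_sketch(function(a, b) (a + b) * 2)
```

Here final_node is the "multiply" node, and its x argument holds the nested "add" node that refers to the variables a and b.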
When a function is passed as a reducer or aggregation function, the procedure is basically the same. In this case, however, the ProcessGraphArgument already offers a set of process graph parameters, which are used instead of calling create_variable(). If the formals of the function and the number of parameters of the ProcessGraphArgument do not match, the coercion fails.
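A minimal sketch of such a check, assuming that only the number of parameters is compared: coerce_callback_sketch() and the offered list are hypothetical, and the placeholder strings stand in for the process graph parameters of a ProcessGraphArgument.

```r
# Hypothetical helper; `graph_parameters` stands in for the parameters
# offered by a ProcessGraphArgument.
coerce_callback_sketch <- function(fun, graph_parameters) {
  if (length(formals(fun)) != length(graph_parameters)) {
    stop("function expects ", length(formals(fun)), " parameter(s), but ",
         length(graph_parameters), " are offered")
  }
  # Pass the offered parameters by position instead of creating new variables.
  do.call(fun, unname(graph_parameters))
}

offered <- list(data = "<data parameter>", context = "<context parameter>")

ok <- coerce_callback_sketch(function(data, context) data, offered)
failed <- inherits(
  try(coerce_callback_sketch(function(x) x, offered), silent = TRUE),
  "try-error"
)
```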
In some contexts objects are rendered as HTML documents. For example, in a Jupyter notebook, an R Markdown document or an R Notebook, the metadata objects of collections, processes and their graphs are rendered in HTML. This rendering needs an internet connection, because JavaScript files and styles are loaded from a content delivery network. The openEO ecosystem already provides those components, because the openEO Web Editor uses them as well. They are distributed via npm as vue-components.
The visualization is controlled via print_html(), which is invoked in these contexts instead of printing to the console.
The authentication has changed a lot over the years. Basic Authentication was the initial mechanism, followed by various OpenID Connect mechanisms, which are all based on the OAuth 2.0 authorization framework. For legacy reasons all the different approaches are kept and are available in authentication.R. For the authentication classes, inheritance is used again to provide the same function calls from OpenEOConnection. The main points are that an access_token needs to be provided for authentication and that login() and logout() are provided. Depending on the access token grants offered by the back-end's identity provider, different procedures have to be performed, which might require user interaction. For example, the OIDCAuthCodeFlow spawns a local web service and waits for a call from the local web browser, based on a redirect that has to be registered at the authentication provider. Other flows like OIDCAuthDeviceCodeFlow poll a certain endpoint at the authentication provider with a device code until the user has entered the code and consented to sharing the personal data. The different flows have been implemented with the httr2 package, which is used to retrieve the access_token required for authorized services at the back-end.
For RStudio an additional feature was implemented that allows inspecting the available data sources of a connected back-end: RStudio's Connection Contract is used to populate the Connections Pane. The connection contract is implemented in .fill_rstudio_connection_observer() in client.R. After connecting, the contract's listObjects function is called, which lists all the available data sets. On expanding the view of a specific collection, the contract's listColumns function is invoked. It interacts with the back-end to describe the collection (describe_collection()), and the result is parsed into the following table structure:
+ <Collection>
  - <dimension>: <description>
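How the two callbacks could produce this structure is sketched below with hypothetical collection metadata. The sketch assumes that the callbacks return data frames with the columns name and type; the collection SENTINEL2, its dimensions and both function names are invented for illustration and do not come from the package.

```r
# Hypothetical metadata standing in for the result of describe_collection().
collections <- list(
  SENTINEL2 = list(
    dimensions = list(
      x = list(type = "spatial"),
      t = list(type = "temporal"),
      bands = list(type = "bands")
    )
  )
)

# Sketch of a listObjects callback: one row per available collection.
list_objects_sketch <- function(...) {
  data.frame(
    name = names(collections),
    type = "collection",
    stringsAsFactors = FALSE
  )
}

# Sketch of a listColumns callback: one row per dimension of a collection.
list_columns_sketch <- function(collection, ...) {
  dimensions <- collections[[collection]]$dimensions
  data.frame(
    name = names(dimensions),
    type = vapply(dimensions, function(d) d$type, character(1),
                  USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

objects <- list_objects_sketch()
columns <- list_columns_sketch("SENTINEL2")
```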