vignettes/openeo-06-developer_implementation_details.Rmd
This guide describes selected core mechanisms of the openEO package. It is targeted at interested developers, and it is highly recommended to dive into the source code while reading it. The explanations are abstracted from the code and are meant to introduce new developers to the concepts and routines of this package.
The `ProcessCollection` class represents the toolbox for creating a process graph in openEO. In contrast to the S3 class `ProcessList`, which `list_processes()` creates from the metadata returned by the back-end, the `ProcessCollection` interprets the metadata of the processes, e.g. the name and the available parameters with their types and names, and creates builder functions like `p$load_collection()` based on this information. The builder functions themselves create the `ProcessNode` objects based on the used processes and the values passed as arguments.
Note: we might reuse the `ProcessCollection` at several points, therefore it needs to be an R6 class; otherwise the potentially list-based object would be copied multiple times, which might eventually lead to memory issues.
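A minimal base R sketch of the difference the note refers to: R6 objects are built on environments, which have reference semantics, whereas lists behave like values and are duplicated once a modified copy is needed. The variable names are purely illustrative.

```r
# Lists behave like values: modifying `b` triggers a copy,
# so `a` keeps its original content.
a <- list(processes = c("load_collection", "save_result"))
b <- a
b$processes <- "ndvi"

# Environments (the basis of R6 objects) are references:
# `e2` is the same object as `e1`, nothing is copied.
e1 <- new.env()
e1$processes <- c("load_collection", "save_result")
e2 <- e1
e2$processes <- "ndvi"

a$processes   # unchanged
e1$processes  # changed through e2
```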
The classes related to the process graph, like `ProcessNode` and `Process`, are contained in process_graph_building.R. The argument and parameter related classes are in argument_types.R. And lastly, the `ProcessCollection` is located in predefined_processes.R.
ProcessCollection
The first important detail is that the R6 object is unlocked, which means that new members can be added to it at runtime. This is required because the builder functions are added dynamically during the initialization of the R6 object.
```r
ProcessCollection = R6Class(
  "ProcessCollection",
  lock_objects = FALSE,
  ...)
```
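A minimal sketch of why the unlocked object matters, assuming a heavily simplified, hypothetical metadata format and a hypothetical `MiniProcessCollection` class whose builders return plain lists instead of `ProcessNode` objects. New members are added inside `initialize()`, which only works with `lock_objects = FALSE`. The loop also shows one way, using base R's `bquote()`, to replace the loop variable `index` with its actual value, the pitfall explained in step (2) below; the package may solve it differently. It needs the R6 package.

```r
library(R6)

# Hypothetical, simplified process metadata; in the package this
# information comes from the back-end (see list_processes()).
metadata <- list(
  list(id = "load_collection", params = c("id", "spatial_extent")),
  list(id = "save_result", params = c("data", "format"))
)

MiniProcessCollection <- R6Class(
  "MiniProcessCollection",
  lock_objects = FALSE,  # required so that members can be added in initialize()
  public = list(
    initialize = function(processes) {
      for (index in seq_along(processes)) {
        params <- processes[[index]]$params
        # Formals: one argument per parameter, all without defaults.
        args <- rep(list(quote(expr = )), length(params))
        names(args) <- params
        # Body: bquote() inlines the current value of `index`. Writing
        # processes[[index]] instead would look `index` up when the builder
        # is called, i.e. after the loop, and always hit the last process.
        body <- bquote({
          list(process_id = processes[[.(index)]]$id,
               arguments = mget(.(params), envir = environment()))
        })
        builder <- as.function(c(args, list(body)))
        assign(processes[[index]]$id, builder, envir = self)
      }
    }
  )
)

p <- MiniProcessCollection$new(metadata)
node <- p$load_collection(id = "SENTINEL_2_L2A", spatial_extent = NULL)
node$process_id
```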
Now, during the initialization (`ProcessCollection$initialize()`) of the `ProcessCollection`, the `ProcessList` is translated into a list of `Process` objects (1), and based on that the builder functions are derived (2):
1. `private$createListOfProcesses()`, where the main work is done by the utility function `processFromJson()`.
2. The formals of each builder function are created with `Process$getFormals()`. This retrieves the parameter names and the default values from the metadata. For the function body, a `ProcessNode` is created from the respective `Process` via a deep copy. Deep copy means that a new object is created and all fields are copied; in particular, the nested `Argument` objects also need to be copied, otherwise two instances of the same process would share their arguments. Finally, this process node receives the values passed to the builder function as its arguments once the function is invoked. During the creation of those builder functions, `index` is used as the variable of the for-loop. For the builders to work properly, this variable has to be replaced with its actual value; otherwise the correct process cannot be accessed, because `index` is either unknown or holds the wrong value.

processFromJson and parameterFromJson
`processFromJson` is used to create a `Process` object from the JSON metadata. Strictly speaking, the JSON metadata has already been transformed into an R list object, but it is still referred to as the JSON metadata, since it is always the response of the back-end. The function itself does not do much besides feeding the correct bits of the JSON metadata to the `Process` constructor. As part of the constructor parameters, a list of `Argument` objects needs to be passed. In the conceptual vision of the package, a parameter is the descriptive part, and an argument is essentially a parameter that can hold a value. `parameterFromJson` performs the translation from the JSON parameter metadata into an `Argument` object. The translation is done by comparing the type and schema of the metadata with the implemented `Argument` representations. Therefore, each implemented `Argument` gets its unique schema and type assigned upon creation:
```r
URI = R6Class(
  "uri",
  inherit = Argument,
  public = list(
    initialize = function(name = character(), description = character(), required = FALSE) {
      private$name = name
      private$description = description
      private$required = required
      private$schema$type = "string"
      private$schema$subtype = "uri"
    }
  ), ...)
```
The parameter metadata matching is handled in `findParameterGenerator()`. After a suitable `Argument` is found, additional restrictive information is transferred from the metadata to the `Argument`, e.g. not-null constraints, patterns, enumerations, default values etc.

To complete this section: `findParameterGenerator()` creates a single instance of each registered `Argument` class and invokes `Parameter$matchesSchema()` on each object with the given schema. If none matches, a generic `Argument` object is created, which carries few constraints by itself. If more than one matches, the first one in the list is chosen; if exactly one matches, it is selected as the suitable `Argument`.
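The matching rule can be sketched in base R as follows. This is an illustration under assumptions: the registry `generators`, `matches_schema()` and `find_generator()` are hypothetical stand-ins, the comparison is a simple exact match on the keys each generator defines, and the result is a name instead of an `Argument` instance. Because the first match wins, the more specific `uri` schema is listed before `string`.

```r
# Hypothetical registry of argument types; each entry only carries the
# schema it represents (the package uses Argument R6 classes instead).
generators <- list(
  uri     = list(schema = list(type = "string", subtype = "uri")),
  string  = list(schema = list(type = "string")),
  integer = list(schema = list(type = "integer"))
)

# A generator matches if every key it defines agrees with the given schema.
matches_schema <- function(generator_schema, schema) {
  all(vapply(names(generator_schema), function(key) {
    identical(generator_schema[[key]], schema[[key]])
  }, logical(1)))
}

find_generator <- function(schema) {
  hits <- names(Filter(function(g) matches_schema(g$schema, schema), generators))
  if (length(hits) == 0) {
    return("generic")  # fallback: an unconstrained Argument
  }
  hits[[1]]  # several matches: the first one in the list wins
}

find_generator(list(type = "string", subtype = "uri"))  # "uri"
find_generator(list(type = "string"))                   # "string"
find_generator(list(type = "number"))                   # "generic"
```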
During the development of this package, some functions were called again and again, especially `validate()` and `serialize()` on `Argument` objects. In general those functions work very similarly, so R6 inheritance is used to unify this behavior, while for each type `private$typeCheck()` and `private$typeSerialization()` are implemented according to the specific needs of the argument and called by their respective public counterparts.
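A minimal sketch of this pattern with R6, using hypothetical classes `MiniArgument` and `MiniInteger` rather than the package's own `Argument` hierarchy: the public `validate()` and `serialize()` live in the base class and delegate to private hooks, which a subclass overrides. Returning `TRUE`/`FALSE` from `validate()` is a simplification chosen for the sketch.

```r
library(R6)

MiniArgument <- R6Class(
  "MiniArgument",
  public = list(
    initialize = function(name, value = NULL) {
      private$name <- name
      private$value <- value
    },
    # Shared public logic; the type-specific part is delegated.
    validate = function() {
      if (is.null(private$value)) return(TRUE)  # nothing set yet
      private$typeCheck()
    },
    serialize = function() {
      if (is.null(private$value)) return(NULL)
      private$typeSerialization()
    }
  ),
  private = list(
    name = NULL,
    value = NULL,
    typeCheck = function() TRUE,
    typeSerialization = function() private$value
  )
)

# Only the private hooks are overridden; validate()/serialize() are inherited.
MiniInteger <- R6Class(
  "MiniInteger",
  inherit = MiniArgument,
  private = list(
    typeCheck = function() {
      is.numeric(private$value) && length(private$value) == 1 &&
        private$value == round(private$value)
    },
    typeSerialization = function() as.integer(private$value)
  )
)

MiniInteger$new("kernel_size", 3)$serialize()
```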
Similar considerations apply to `Process` and `ProcessNode`. Essentially, a node is a process that additionally carries a unique id, which is used in a process graph.
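Since a node is derived from a process via a deep copy, it is worth knowing how R6 cloning behaves. In R6, `clone(deep = TRUE)` only copies fields that are themselves R6 objects, not R6 objects stored inside a list, so a list of arguments needs a custom private `deep_clone()` method. The sketch below uses hypothetical classes `MiniArg` and `MiniProcess`; whether the package implements its deep copy exactly this way is an assumption.

```r
library(R6)

MiniArg <- R6Class("MiniArg", public = list(
  value = NULL,
  initialize = function(value = NULL) self$value <- value
))

MiniProcess <- R6Class(
  "MiniProcess",
  public = list(
    id = NULL,
    arguments = NULL,  # a list of MiniArg objects
    initialize = function(id, arguments) {
      self$id <- id
      self$arguments <- arguments
    }
  ),
  private = list(
    # Called by clone(deep = TRUE) for each field: clone every argument
    # in the list, keep all other fields as they are.
    deep_clone = function(name, value) {
      if (name == "arguments") {
        lapply(value, function(arg) arg$clone(deep = TRUE))
      } else {
        value
      }
    }
  )
)

template <- MiniProcess$new("load_collection", list(id = MiniArg$new()))
node_a <- template$clone(deep = TRUE)
node_b <- template$clone(deep = TRUE)
node_a$arguments$id$value <- "SENTINEL_2_L2A"
node_b$arguments$id$value  # still NULL: the nodes do not share arguments
```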
At some point it became tedious to always pass the active `OpenEOConnection` to each function that interacts with the back-end. Therefore, the currently active components of an openEO session are stored in an internal package environment (`openeo:::pkgEnvironment`). This environment is not meant to be accessed by users; instead, `active_connection()`, `active_data_collection()` and `active_process_collection()` were implemented to get or set those environment variables.
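The pattern can be sketched in base R with a private environment and a combined getter/setter. `sketch_env` and `active_value()` are hypothetical names; the actual signatures of `active_connection()` and its siblings may differ.

```r
# Hypothetical stand-in for the internal openeo:::pkgEnvironment.
sketch_env <- new.env(parent = emptyenv())

# Hypothetical getter/setter in the style of active_connection():
# called without an argument it returns the stored value,
# called with an argument it stores the value first.
active_value <- function(value) {
  if (!missing(value)) {
    assign("value", value, envir = sketch_env)
    return(invisible(value))
  }
  get0("value", envir = sketch_env, inherits = FALSE)
}

active_value("connection-1")
active_value()
```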
Another interesting and somewhat complex aspect is the coercion of an R function into an openEO process graph. This job is done by `.function_to_graph()` (in process_graph_building.R), which is called in the respective coercion function `as.Graph.function()`. The routine looks like this:

1. `create_variable()` is called for each parameter of the function.
2. `do.call()` is called with the function and the parameters (which are all of type `ProcessGraphParameter`).
3. The returned `ProcessNode` will be the final node.

When a function is passed as a reducer or aggregation function, the procedure is basically the same. However, in this case the `ProcessGraphArgument` already offers a set of process graph parameters, which are used instead of `create_variable()`. If the number of formals of the function and the number of parameters of the `ProcessGraphArgument` do not match, the coercion fails.
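The routine can be sketched in base R as follows. `make_param()` is a hypothetical stand-in for `create_variable()` that produces openEO's `{"from_parameter": ...}` reference as a list, and `function_to_graph_sketch()` is a hypothetical simplification of `.function_to_graph()`: the value returned by the user function plays the role of the final node, and supplied parameters (the reducer case) must match the number of formals.

```r
# Hypothetical stand-in for create_variable(): an openEO parameter reference.
make_param <- function(name) {
  structure(list(from_parameter = name), class = "mini_param")
}

function_to_graph_sketch <- function(fun, params = NULL) {
  arg_names <- names(formals(fun))
  if (is.null(params)) {
    # 1. one placeholder per formal argument
    params <- lapply(arg_names, make_param)
  } else if (length(params) != length(arg_names)) {
    # reducer case: the offered parameters must fit the formals
    stop("number of function formals and offered parameters differ")
  }
  names(params) <- arg_names
  # 2. call the function with the placeholders;
  # 3. its return value is the final (result) node
  do.call(fun, params)
}

reducer <- function(data, context) {
  list(process_id = "mean", arguments = list(data = data))
}
graph <- function_to_graph_sketch(reducer)
graph$arguments$data$from_parameter  # "data"
```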
In some contexts objects are rendered as HTML. For example, in a Jupyter notebook environment, an R Markdown document or an R Notebook, the metadata objects of collections, processes and their graphs are rendered in HTML. The HTML rendering needs an internet connection, because JavaScript files and styles are loaded from a content delivery network (CDN). The openEO ecosystem already provides those components, because the openEO Web Editor uses them as well. They are distributed on npm as the openEO vue-components.
The visualization is controlled via the `print` functions (print-functions.R), which check whether the current session is an HTML environment and, if so, invoke the internal `print_html()` instead of printing to the console.
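A sketch of this dispatch with an S3 `print` method. How the package detects an HTML session is not described here; the check below is an assumption based on two options that knitr (`knitr.in.progress`) and IRkernel for Jupyter (`jupyter.in_kernel`) set. `is_html_context()`, `print_html_sketch()` and the `mini_collection` class are hypothetical.

```r
# Hypothetical check for an HTML-capable session. knitr sets the option
# "knitr.in.progress" while knitting; IRkernel (Jupyter) sets "jupyter.in_kernel".
is_html_context <- function() {
  isTRUE(getOption("knitr.in.progress")) || isTRUE(getOption("jupyter.in_kernel"))
}

# Hypothetical stand-in for the internal print_html(); the real one embeds
# openEO vue-components that are loaded from a CDN.
print_html_sketch <- function(x) {
  cat(sprintf("<div class=\"collection\">%s</div>\n", x$id))
}

print.mini_collection <- function(x, ...) {
  if (is_html_context()) {
    print_html_sketch(x)
  } else {
    cat(sprintf("Collection: %s\n", x$id))
  }
  invisible(x)
}

print(structure(list(id = "SENTINEL_2_L2A"), class = "mini_collection"))
```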
Authentication has changed a lot over the years. Basic Authentication was the initial mechanism; later, various OpenID Connect mechanisms followed, which are all based on the OAuth 2.0 authorization framework. For legacy reasons, all the different approaches are kept and are available in authentication.R. For the authentication classes, inheritance is used again, so that `OpenEOConnection` can use the same function calls regardless of the mechanism. The main points are that an access_token has to be provided for authentication and that `login()` and `logout()` functions are provided. Depending on the access token grants offered by the back-end's identity provider, different procedures have to be performed, which might require user interaction. For example, the `OIDCAuthCodeFlow` spawns a local web service and waits for a call from the local web browser, based on a redirect URL that has to be registered at the authentication provider. Other flows like `OIDCAuthDeviceCodeFlow` poll a certain endpoint of the authentication provider with a device code until the user has entered the code and given consent to share their personal data. The different flows are implemented with the httr2 package, which is used to retrieve the access_token required for authorized services at the back-end.
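A minimal sketch of the shared interface, assuming hypothetical classes `MiniAuth` and `MiniTokenAuth`: the base class fixes `login()`, `logout()` and token access, while a subclass decides how the token is obtained (here from an injected function instead of a real httr2 flow). The helper `bearer_header()` is also hypothetical, but the header layout it builds follows the openEO API: `Bearer basic//<token>` for Basic authentication and `Bearer oidc/<provider id>/<token>` for OpenID Connect; `"egi"` is only an example provider id.

```r
library(R6)

# Hypothetical base class: every flow offers login(), logout() and the token.
MiniAuth <- R6Class(
  "MiniAuth",
  public = list(
    login = function() stop("login() has to be implemented by a subclass"),
    logout = function() {
      private$token <- NULL
      invisible(self)
    },
    access_token = function() private$token
  ),
  private = list(token = NULL)
)

# Hypothetical flow that obtains its token from an injected function; a real
# flow would run e.g. a device code or authorization code exchange.
MiniTokenAuth <- R6Class(
  "MiniTokenAuth",
  inherit = MiniAuth,
  public = list(
    initialize = function(fetch_token) {
      private$fetch <- fetch_token
    },
    login = function() {
      private$token <- private$fetch()
      invisible(self)
    }
  ),
  private = list(fetch = NULL)
)

# Authorization header layout as defined by the openEO API.
bearer_header <- function(method, provider_id, token) {
  sprintf("Bearer %s/%s/%s", method, provider_id, token)
}

auth <- MiniTokenAuth$new(fetch_token = function() "abc123")
auth$login()
bearer_header("oidc", "egi", auth$access_token())  # "Bearer oidc/egi/abc123"
```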
When using RStudio, an additional feature allows inspecting the available data sources of a connected back-end by using RStudio's Connection Contract to populate the Connections Pane. The connection contract is implemented in `.fill_rstudio_connection_observer()` in client.R. After connecting, the contract's `listObjects` function is called, which lists all available datasets. When the view of a specific collection is expanded, the contract's `listColumns` is invoked. This interacts with the back-end to describe the collection (`describe_collection()`) and parses the result into the following table structure:

```
+ <Collection>
  - <dimension>: <description>
```
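A sketch of the last step, assuming a hypothetical, simplified excerpt of the `cube:dimensions` metadata (the STAC datacube extension used for openEO collections). The RStudio Connections Contract expects `listColumns` to return a data frame with the columns `name` and `type`; using the dimension type as the description shown next to each dimension is an assumption of this sketch, and `dims_to_columns()` is a hypothetical helper.

```r
# Hypothetical, simplified "cube:dimensions" excerpt of a collection description.
dims <- list(
  x     = list(type = "spatial", axis = "x"),
  y     = list(type = "spatial", axis = "y"),
  t     = list(type = "temporal"),
  bands = list(type = "bands", values = c("B02", "B03", "B04"))
)

# One row per dimension: "name" is the dimension, "type" its description.
dims_to_columns <- function(dims) {
  data.frame(
    name = names(dims),
    type = unname(vapply(dims, function(d) d$type, character(1))),
    stringsAsFactors = FALSE
  )
}

dims_to_columns(dims)
```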