.. index::
    single: batch job
    see: job; batch job

.. _batch-jobs-chapter:

============
Batch Jobs
============

Most of the simple, basic openEO usage examples show **synchronous** downloading of results:
you submit a process graph with a (HTTP POST) request
and receive the result as direct response of that same request.
This only works properly if the processing doesn't take too long
(order of seconds, or a couple of minutes at most).

For the heavier work (larger regions of interest, larger time series, more intensive processing, ...)
you have to use **batch jobs**,
which are supported in the openEO API through separate HTTP requests, corresponding to these steps:

- you create a job (providing a process graph and some other metadata like title, description, ...)
- you start the job
- you wait for the job to finish, periodically polling its status
- when the job finished successfully: get the listing of result assets
- you download the result assets (or use them in another way)

.. tip::

    This documentation mainly discusses how to **programmatically**
    create and interact with batch jobs using the openEO Python client library.
    The openEO API however does not enforce usage of the same tool
    for each step in the batch job life cycle.

    For example: if you prefer a graphical, web-based **interactive environment**
    to manage and monitor your batch jobs,
    feel free to *switch to an openEO web editor*
    like `editor.openeo.org <https://editor.openeo.org/>`_
    or `editor.openeo.cloud <https://editor.openeo.cloud/>`_
    at any time.
    After logging in with the same account you use in your Python scripts,
    you should see your batch jobs listed under the "Data Processing" tab:

    .. image:: _static/images/batchjobs-webeditor-listing.png

    With the "action" buttons on the right, you can for example
    inspect batch job details, start/stop/delete jobs,
    download their results, get batch job logs, etc.


.. index:: batch job; create

Create a batch job
==================

In the openEO Python Client Library, if you have a (raster) data cube,
you can easily create a batch job with the
:py:meth:`DataCube.create_job() <openeo.rest.datacube.DataCube.create_job>` method.
It's important to specify in what *format* the result should be stored,
which can be done with an explicit
:py:meth:`DataCube.save_result() <openeo.rest.datacube.DataCube.save_result>` call
before creating the job:

.. code-block:: python

    cube = connection.load_collection(...)
    ...
    # Store raster data as GeoTIFF files
    cube = cube.save_result(format="GTiff")
    job = cube.create_job()

or directly in :py:meth:`cube.create_job() <openeo.rest.datacube.DataCube.create_job>`:

.. code-block:: python

    cube = connection.load_collection(...)
    ...
    job = cube.create_job(out_format="GTiff")

While not necessary, it is also recommended to give your batch job a descriptive title
so it's easier to identify in your job listing, e.g.:

.. code-block:: python

    job = cube.create_job(title="NDVI timeseries 2022")


.. index:: batch job; object

Batch job object
================

The ``job`` object returned by :py:meth:`~openeo.rest.datacube.DataCube.create_job()`
is a :py:class:`~openeo.rest.job.BatchJob` object.
It is basically a *client-side reference* to a batch job that *exists on the back-end*
and allows you to interact with that batch job
(see the :py:class:`~openeo.rest.job.BatchJob` API docs for the available methods).

.. note::

    The :py:class:`~openeo.rest.job.BatchJob` class originally had
    the more cryptic name :py:class:`~openeo.rest.job.RESTJob`,
    which is still available as a legacy alias,
    but :py:class:`~openeo.rest.job.BatchJob` is (available and) recommended since version 0.11.0.

A batch job on a back-end is fully identified by its
:py:data:`~openeo.rest.job.BatchJob.job_id`:

.. code-block:: pycon

    >>> job.job_id
    'd5b8b8f2-74ce-4c2e-b06d-bff6f9b14b8d'


Reconnecting to a batch job
---------------------------

Depending on your situation or use case, make sure to properly take note of the batch job id.
It allows you to "reconnect" to your job on the back-end,
even if it was created at another time,
by another script/notebook or even with another openEO client.

Given a back-end connection and the batch job id,
use :py:meth:`Connection.job() <openeo.rest.connection.Connection.job>`
to create a :py:class:`~openeo.rest.job.BatchJob` object for an existing batch job:

.. code-block:: python

    job_id = "5d806224-fe79-4a54-be04-90757893795b"
    job = connection.job(job_id)
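If you want to hold on to the job id between scripts or sessions,
something as simple as writing it to a file works.
A minimal sketch (the file name ``job_id.txt`` is just an illustrative choice,
not an openEO convention):

.. code-block:: python

    from pathlib import Path

    # In the script/session that created the job: store the job id.
    Path("job_id.txt").write_text(job.job_id)

    # In a later script/session: read the job id and reconnect.
    job = connection.job(Path("job_id.txt").read_text())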
Jupyter integration
-------------------

:py:class:`~openeo.rest.job.BatchJob` objects have basic Jupyter notebook integration.
Put your :py:class:`~openeo.rest.job.BatchJob` object as last statement
in a notebook cell and you get an overview of your batch job,
including job id, status, title and even process graph visualization:

.. image:: _static/images/batchjobs-jupyter-created.png


.. index:: batch job; listing

List your batch jobs
====================

You can list your batch jobs on the back-end with
:py:meth:`Connection.list_jobs() <openeo.rest.connection.Connection.list_jobs>`,
which returns a list of job metadata:

.. code-block:: pycon

    >>> connection.list_jobs()
    [{'title': 'NDVI timeseries 2022', 'status': 'created',
      'id': 'd5b8b8f2-74ce-4c2e-b06d-bff6f9b14b8d', 'created': '2022-06-08T08:58:11Z'},
     {'title': 'NDVI timeseries 2021', 'status': 'finished',
      'id': '4e720e70-88bd-40bc-92db-a366985ebd67', 'created': '2022-06-04T14:46:06Z'},
     ...

The listing returned by
:py:meth:`Connection.list_jobs() <openeo.rest.connection.Connection.list_jobs>`
has Jupyter notebook integration:

.. image:: _static/images/batchjobs-jupyter-listing.png


.. index:: batch job; start

Run a batch job
===============

Starting a batch job is pretty straightforward with the
:py:meth:`~openeo.rest.job.BatchJob.start()` method:

.. code-block:: python

    job.start()

If this didn't raise any errors or exceptions,
your job should now have started (status "running")
or be queued for processing (status "queued").

.. index:: batch job; status

Wait for a batch job to finish
------------------------------

A batch job typically takes some time to finish,
and you can check its status with the :py:meth:`~openeo.rest.job.BatchJob.status()` method:

.. code-block:: pycon

    >>> job.status()
    "running"

The possible batch job status values, defined by the openEO API, are
"created", "queued", "running", "canceled", "finished" and "error".
Usually, you can only reliably get results from your job,
as discussed in :ref:`batch_job_results`, when it reaches status "finished".
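If you want to keep an eye on the job yourself,
a manual polling loop can be as simple as the following sketch
(the 30 second sleep interval is an arbitrary choice here):

.. code-block:: python

    import time

    # Poll the job status until it reaches a final state.
    while True:
        status = job.status()
        print("Job status:", status)
        if status in ["finished", "error", "canceled"]:
            break
        time.sleep(30)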
.. index:: batch job; polling loop

Create, start and wait in one go
--------------------------------

You could, depending on your situation, manually check your job's status periodically
or set up a **polling loop** system as sketched above to keep an eye on your job.
The openEO Python client library also provides helpers to do that for you.

Working from an existing :py:class:`~openeo.rest.job.BatchJob` instance
    If you have a batch job that is already created as shown above,
    you can use the :py:meth:`job.start_and_wait() <openeo.rest.job.BatchJob.start_and_wait>` method
    to start it and periodically poll its status until it reaches status "finished"
    (or fails with status "error").
    Along the way it will print some progress messages.

    .. code-block:: pycon

        >>> job.start_and_wait()
        0:00:00 Job 'b0e8adcf-087f-41de-afe6-b3c0ea88ff38': send 'start'
        0:00:36 Job 'b0e8adcf-087f-41de-afe6-b3c0ea88ff38': queued (progress N/A)
        0:01:35 Job 'b0e8adcf-087f-41de-afe6-b3c0ea88ff38': queued (progress N/A)
        0:02:19 Job 'b0e8adcf-087f-41de-afe6-b3c0ea88ff38': running (progress N/A)
        0:02:50 Job 'b0e8adcf-087f-41de-afe6-b3c0ea88ff38': running (progress N/A)
        0:03:28 Job 'b0e8adcf-087f-41de-afe6-b3c0ea88ff38': finished (progress N/A)

Working from a :py:class:`~openeo.rest.datacube.DataCube` instance
    If you didn't create the batch job yet from a given :py:class:`~openeo.rest.datacube.DataCube`,
    you can do the job creation, starting and waiting in one go
    with :py:meth:`cube.execute_batch() <openeo.rest.datacube.DataCube.execute_batch>`:

    .. code-block:: pycon

        >>> job = cube.execute_batch()
        0:00:00 Job 'f9f4e3d3-bc13-441b-b76a-b7bfd3b59669': send 'start'
        0:00:23 Job 'f9f4e3d3-bc13-441b-b76a-b7bfd3b59669': queued (progress N/A)
        ...

    Note that :py:meth:`cube.execute_batch() <openeo.rest.datacube.DataCube.execute_batch>`
    returns a :py:class:`~openeo.rest.job.BatchJob` instance
    pointing to the newly created batch job.

.. tip::

    You can fine-tune the details of the polling loop
    (the poll frequency, how the progress is printed, ...).
    See :py:meth:`job.start_and_wait() <openeo.rest.job.BatchJob.start_and_wait>`
    or :py:meth:`cube.execute_batch() <openeo.rest.datacube.DataCube.execute_batch>`
    for more information.


.. index:: batch job; logs

.. _batch-job-logs:

Batch job logs
==============

Batch jobs in openEO have **logs** to help with *monitoring and debugging*.
The back-end typically uses this to dump information during data processing
that may be relevant for the user (e.g. warnings, resource stats, ...).
Moreover, openEO processes like ``inspect`` allow users to log their own information.

Batch job logs can be fetched with :py:meth:`job.logs() <openeo.rest.job.BatchJob.logs>`:

.. code-block:: pycon

    >>> job.logs()
    [{'id': 'log001', 'level': 'info', 'message': 'Job started with 4 workers'},
     {'id': 'log002', 'level': 'debug', 'message': 'Loading 5x3x6 tiles'},
     {'id': 'log003', 'level': 'error', 'message': "Failed to load data cube: corrupt data for tile 'J9A7K2'."},
     ...

In a Jupyter notebook environment, this also comes with Jupyter integration:

.. image:: _static/images/batchjobs-jupyter-logs.png
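When a job produces many log entries, it can help to filter them client-side,
e.g. to only see warnings and errors.
A minimal sketch, assuming the standard openEO log levels
("debug", "info", "warning", "error"):

.. code-block:: python

    # Only keep the warnings and errors from the job's logs.
    logs = [log for log in job.logs() if log.get("level") in ("warning", "error")]
    for log in logs:
        print(log["level"], log["message"], sep=": ")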
Automatic batch job log printing
--------------------------------

When using :py:meth:`job.start_and_wait() <openeo.rest.job.BatchJob.start_and_wait>`
or :py:meth:`cube.execute_batch() <openeo.rest.datacube.DataCube.execute_batch>`
to run a batch job and it fails,
the openEO Python client library will automatically print
the batch job logs and instructions to help with further investigation:

.. code-block:: pycon

    >>> job.start_and_wait()
    0:00:00 Job '68caccff-54ee-470f-abaa-559ed2d4e53c': send 'start'
    0:00:01 Job '68caccff-54ee-470f-abaa-559ed2d4e53c': running (progress N/A)
    0:00:07 Job '68caccff-54ee-470f-abaa-559ed2d4e53c': error (progress N/A)
    Your batch job '68caccff-54ee-470f-abaa-559ed2d4e53c' failed.
    Logs can be inspected in an openEO (web) editor
    or with `connection.job('68caccff-54ee-470f-abaa-559ed2d4e53c').logs()`.
    Printing logs:
    [{'id': 'log001', 'level': 'info', 'message': 'Job started with 4 workers'},
     {'id': 'log002', 'level': 'debug', 'message': 'Loading 5x3x6 tiles'},
     {'id': 'log003', 'level': 'error', 'message': "Failed to load data cube: corrupt data for tile 'J9A7K2'."}]


.. index:: batch job; results

.. _batch_job_results:

Download batch job results
==========================

Once a batch job is finished you can get a handle to the results
(which can be a single file or multiple files) and metadata
with :py:meth:`~openeo.rest.job.BatchJob.get_results`:

.. code-block:: pycon

    >>> results = job.get_results()
    >>> results

The result metadata describes the spatio-temporal properties of the result
and is in fact a valid STAC item:

.. code-block:: pycon

    >>> results.get_metadata()
    {
        'bbox': [3.5, 51.0, 3.6, 51.1],
        'geometry': {'coordinates': [[[3.5, 51.0], [3.5, 51.1], [3.6, 51.1], [3.6, 51.0], [3.5, 51.0]]], 'type': 'Polygon'},
        'assets': {
            'res001.tiff': {
                'href': 'https://openeo.example/download/432f3b3ef3a.tiff',
                'type': 'image/tiff; application=geotiff',
                ...
            'res002.tiff': {
                ...


Download all assets
-------------------

In the general case, when you have one or more result files (also called "assets"),
the easiest option to download them is to use
:py:meth:`~openeo.rest.job.JobResults.download_files` (plural),
where you just specify a download folder
(otherwise the current working directory will be used by default):

.. code-block:: python

    results.download_files("data/out")

The resulting files will be named as they are advertised in the results metadata
(e.g. ``res001.tiff`` and ``res002.tiff`` in case of the metadata example above).


Download single asset
---------------------

If you know that there is just a single result file, you can also download it directly with
:py:meth:`~openeo.rest.job.JobResults.download_file` (singular) with the desired file name:

.. code-block:: python

    results.download_file("data/out/result.tiff")

This will fail however if there are multiple assets in the job result
(like in the metadata example above).
In that case you can still download a single asset
by specifying which one you want to download with the ``name`` argument:

.. code-block:: python

    results.download_file("data/out/result.tiff", name="res002.tiff")


Fine-grained asset downloads
----------------------------

If you need a bit more control over which asset to download and how,
you can iterate over the result assets explicitly
and download these :py:class:`~openeo.rest.job.ResultAsset` instances
with :py:meth:`~openeo.rest.job.ResultAsset.download`, like this:

.. code-block:: python

    for asset in results.get_assets():
        if asset.metadata["type"].startswith("image/tiff"):
            asset.download("data/out/result-v2-" + asset.name)


Directly load batch job results
===============================

If you want to skip downloading an asset to disk, you can also load it directly.
For example, load a JSON asset with :py:meth:`~openeo.rest.job.ResultAsset.load_json`:

.. code-block:: pycon

    >>> asset.metadata
    {"type": "application/json", "href": "https://openeo.example/download/432f3b3ef3a.json"}
    >>> data = asset.load_json()
    >>> data
    {"2021-02-24T10:59:23Z": [[3, 2, 5], [3, 4, 5]], ....}
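Similarly, :py:meth:`~openeo.rest.job.ResultAsset.load_bytes` loads the raw bytes of an asset.
As an illustrative sketch of what you could do with that
(assuming the third-party ``rasterio`` package is installed),
you could read a GeoTIFF asset entirely in memory:

.. code-block:: python

    from rasterio.io import MemoryFile

    # Fetch the raw bytes of the asset, without writing a file to disk.
    raw = asset.load_bytes()

    # Read the GeoTIFF data directly from memory as a numpy array.
    with MemoryFile(raw) as memfile:
        with memfile.open() as dataset:
            data = dataset.read()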