import requests
import numpy as np
import matplotlib.pyplot as plt
from typing import List, Optional, cast
from pystac import Collection, MediaType
from pystac_client import Client, CollectionClient
from datetime import datetime
import xarray as xr
From STAC to Data: Accessing EOPF Zarr with xarray
Introduction
In this tutorial we will demonstrate how to access EOPF Zarr
products directly from the EOPF Sentinel Zarr Sample Service STAC Catalog.
What we will learn
- βοΈ How to open cloud-optimised datasets from the EOPF Zarr STAC Catalog with xarray
- π Examing loaded datasets
- π Perform simple data analyses with the loaded data
Prerequisites
This tutorial requires the xarray-eopf
extension for data manipulation. To find out more about the library, access the documentation.
It is advised that you go through the previous section, as it gives you an introduction on how to programmatically access a STAC catalog.
Import libraries
Helper functions
list_found_elements
As we are expecting to visualise several elements that will be stored in lists, we define a function that will allow us retrieve item id
βs and collections id
βs for further retrieval.
def list_found_elements(search_result):
id = []
= []
coll for item in search_result.items(): #retrieves the result inside the catalogue.
id.append(item.id)
coll.append(item.collection_id)return id , coll
Establish a connection to the EOPF Zarr STAC Catalog
Our first step is to a connection to the EOPF Zarr STAC Catalog. This involves defining the url of the STAC endpoint. See the previous section for a more detailed explanation how to retrieve the enpoint URL.
Through the Client.open()
function, we can establish the connection to the EOPF Zarr STAC catalog by providing the specific url.
= 100
max_description_length
= "https://stac.core.eopf.eodc.eu/" #root starting point
eopf_stac_api_root_endpoint = Client.open(url=eopf_stac_api_root_endpoint)
eopf_catalog # eopf_catalog #print to have an interative visualisation
Filtering for items of interest
For this tutorial, we will focus on the Sentinel-2 L2A Collection. The EOPF STAC Catalog corresponding id is: sentinel-2-l2a
.
As we are interested in retrieving and exploring an Item from the collection, we will focus again over the Innsbruck area we have defined in the previous tutorial.
= eopf_catalog.search( # searching in the Catalog
innsbruck_s2 = 'sentinel-2-l2a', # interest Collection,
collections=(11.124756, 47.311058, # AOI extent
bbox11.459839,47.463624),
='2020-05-01T00:00:00Z/2025-05-31T23:59:59.999999Z' # interest period
datetime
)
=list_found_elements(innsbruck_s2)
combined_ins
print("Search Results:")
print('Total Items Found for Sentinel-2 L-2A over Innsbruck: ',len(combined_ins[0]))
Search Results:
Total Items Found for Sentinel-2 L-2A over Innsbruck: 27
Let us now select the first Item in the list of 27 Items.
=combined_ins[0][0]
first_item_idprint(first_item_id)
S2B_MSIL2A_20250530T101559_N0511_R065_T32TPT_20250530T130924
In a next step, we retrieve the URL of the cloud location for the specific item. We will need the url to access and load the selected Item with the help of xarray
.
= eopf_catalog.get_collection('sentinel-2-l2a')
c_sentinel2 #Choosing the first item available to be opened:
= c_sentinel2.get_item(id=first_item_id)
item= item.get_assets(media_type=MediaType.ZARR)
item_assets
= item_assets['product'].href
cloud_storage
print('Item cloud storage URL for retrieval:',cloud_storage)
Item cloud storage URL for retrieval: https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202505-s02msil2a/30/products/cpm_v256/S2B_MSIL2A_20250530T101559_N0511_R065_T32TPT_20250530T130924.zarr
Examining Dataset Structure
In the following step, we open the cloud-optimised Zarr dataset using xarray.open_datatree
supported by the xarray-eopf extension
.
The subsequent loop then prints out all the available groups within the opened DataTree
, providing a comprehensive overview of the hierarchical structure of the EOPF Zarr products.
= xr.open_datatree(
dt # the cloud storage url from the Item we are interested in
cloud_storage, ="eopf-zarr", # xarray-eopf defined engine
engine="native", # visualisation mode
op_mode={}) # default eopf chunking size
chunks
for dt_group in sorted(dt.groups):
print("DataTree group {group_name}".format(group_name=dt_group)) # getting the available groups
DataTree group /
DataTree group /conditions
DataTree group /conditions/geometry
DataTree group /conditions/mask
DataTree group /conditions/mask/detector_footprint
DataTree group /conditions/mask/detector_footprint/r10m
DataTree group /conditions/mask/detector_footprint/r20m
DataTree group /conditions/mask/detector_footprint/r60m
DataTree group /conditions/mask/l1c_classification
DataTree group /conditions/mask/l1c_classification/r60m
DataTree group /conditions/mask/l2a_classification
DataTree group /conditions/mask/l2a_classification/r20m
DataTree group /conditions/mask/l2a_classification/r60m
DataTree group /conditions/meteorology
DataTree group /conditions/meteorology/cams
DataTree group /conditions/meteorology/ecmwf
DataTree group /measurements
DataTree group /measurements/reflectance
DataTree group /measurements/reflectance/r10m
DataTree group /measurements/reflectance/r20m
DataTree group /measurements/reflectance/r60m
DataTree group /quality
DataTree group /quality/atmosphere
DataTree group /quality/atmosphere/r10m
DataTree group /quality/atmosphere/r20m
DataTree group /quality/atmosphere/r60m
DataTree group /quality/l2a_quicklook
DataTree group /quality/l2a_quicklook/r10m
DataTree group /quality/l2a_quicklook/r20m
DataTree group /quality/l2a_quicklook/r60m
DataTree group /quality/mask
DataTree group /quality/mask/r10m
DataTree group /quality/mask/r20m
DataTree group /quality/mask/r60m
DataTree group /quality/probability
DataTree group /quality/probability/r20m
Root Dataset Metadata
We specifically look for groups containing data variables under /measurements/reflectance/r20m
(which corresponds to Sentinel-2 bands at 20m resolution). The output provides key information about the selected group, including its dimensions, available data variables (the different spectral bands), and coordinates.
# Get /measurements/reflectance/r20m group
= list(dt.groups)
groups = [
interesting_groups for group in groups if group.startswith('/measurements/reflectance/r20m')
group and dt[group].ds.data_vars
]print(f"\nπ Searching for groups with data variables in '/measurements/reflectance/r20m'...")
π Searching for groups with data variables in '/measurements/reflectance/r20m'...
if interesting_groups:
= interesting_groups[0]
sample_group = dt[sample_group].ds
group_ds
print(f"Group '{sample_group}' Information")
print("=" * 50)
print(f"Dimensions: {dict(group_ds.dims)}")
print(f"Data Variables: {list(group_ds.data_vars.keys())}")
print(f"Coordinates: {list(group_ds.coords.keys())}")
else:
print("No groups with data variables found in the first 5 groups.")
Group '/measurements/reflectance/r20m' Information
==================================================
Dimensions: {'y': 5490, 'x': 5490}
Data Variables: ['b01', 'b02', 'b03', 'b04', 'b05', 'b06', 'b07', 'b11', 'b12', 'b8a']
Coordinates: ['x', 'y']
/tmp/ipykernel_1128780/1451209294.py:7: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
print(f"Dimensions: {dict(group_ds.dims)}")
In a next step, we inspect the attributes of the root dataset within the DataTree. Attributes often contain important high-level metadata about the entire product, such as processing details, STAC discovery information, and more. We print the first few attributes to get an idea of the available metadata.
# Examine the root dataset
= dt.ds
root_dataset
print("Root Dataset Metadata")
if root_dataset.attrs:
print(f"\nAttributes (first 3):")
for key, value in list(root_dataset.attrs.items())[:3]:
print(f" {key}: {str(value)[:80]}{'...' if len(str(value)) > 80 else ''}")
Root Dataset Metadata
Attributes (first 3):
other_metadata: {'AOT_retrieval_model': 'SEN2COR_DDV', 'L0_ancillary_data_quality': '4', 'L0_eph...
stac_discovery: {'assets': {'analytic': {'eo:bands': [{'center_wavelength': 0.4423, 'common_name...
Visualising the RGB quicklook composite
EOPF Zarr Assets include a quicklook RGB composite, which we now want to open and visuliase. We open the Zarr dataset again, but this time, we specifically target the quality/l2a_quicklook/r20m group
and its variables.
This group typically contains a true-colour (RGB) quicklook composite, which is a readily viewable representation of the satellite image.
We use xr.open_dataset()
and specify the following set of arguments in order to load the quicklook.
## Visualising the RGB quicklook composite:
= xr.open_dataset(
ds # the cloud storage url from the Item we are interested in
cloud_storage, ="eopf-zarr", # xarray-eopf defined engine
engine="native", # visualisation mode
op_mode={}, # default eopf chunking size
chunks="/",
group_sep="quality/l2a_quicklook/r20m/*",
variables )
As soon as we loaded it, we can create a simple plot with imshow()
to see the quicklook.
"quality/l2a_quicklook/r20m/tci"].plot.imshow()
ds['RGB Quicklook')
plt.title('X-coordinate')
plt.xlabel('Y-coordinate')
plt.ylabel(False) # Turn off grid for image plots
plt.grid('tight') # Ensure axes fit the data tightly plt.axis(
Simple Data Analysis: Calculating NDVI
Let us now do a simple analysis with the data from the EOPF Zarr STAC Catalog. Let us calculate the Normalized Difference Vegetation Index (NDVI).
First, we open the Zarr dataset specifically for the red (B04) and Near-Infrared (B08) bands, which are crucial for the calculation of the NDVI. We also specify resolution=20
to ensure we are working with the 20-meter resolution bands.
= xr.open_dataset(
red_nir
cloud_storage,="eopf-zarr",
engine={},
chunks=0,
spline_orders=['b04', 'b08'],
variables= 60,
resolution )
In a next step, we cast the red (B04) and Near-Infrared (B08) bands to floating-point numbers. This is important for accurate mathematical operations, which we will conduct in the next cell.
= red_nir.b04.astype(float)
red_f = red_nir.b08.astype(float) nir_f
Now, we perform the initial steps for NDVI calculation: - sum_bands
: Calculates the sum of the Near-Infrared and Red bands. - diff_bands
: Calculates the difference between the Near-Infrared and Red bands.
= nir_f + red_f
sum_bands = nir_f - red_f
diff_bands = diff_bands / sum_bands ndvi
To prevent division by zero errors in areas where both red and NIR bands might be zero (e.g., water bodies or clouds), this line replaces any NaN values resulting from division by zero with 0. This ensures a clean and robust NDVI product.
= ndvi.where(sum_bands != 0, 0) ndvi
In a final step, we can visualise the calculated NDVI.
='RdYlGn', vmin=-1, vmax=1)
ndvi.plot(cmap'Normalized Difference Vegetation Index (NDVI)')
plt.title('X-coordinate')
plt.xlabel('Y-coordinate')
plt.ylabel(False) # Turn off grid for image plots
plt.grid('tight') # Ensure axes fit the data tightly
plt.axis(
# Display the plot
plt.show()
πͺ Now it is your turn
With the foundations learned so far, you are now equipped to access products from the EOPF Zarr STAC catalog. These are your tasks: ### Task 1: Explore five additional Sentinel-2 Items for Innsbruck Replicate the RGB quicklook and have an overview of the spatial changes.
Task 2: Calculate NDVI
Replicate the NDVI calculation for the additional Innsbruck items.
Task 3: Applying more advanced analysis techniques
The EOPF STAC Catalog offers a wealth of data beyond Sentinel-2. Replicate the search and data access for data from other collections.
Conclusion
In this section we established a connection to the EOPF Sentinel Zarr Sample Service STAC Catalog and directly accessed an EOPF Zarr item with xarray
. In the tutorial you are guided through the process of opening hierarchical EOPF Zarr products using xarray
βs DataTree
, a library designed for accessing complex hierarchical data structures.
Whatβs next?
This online resource is under active development. So stay tuned for regular updates.