Discover EOPF Zarr - Sentinel-2 L2A

Introduction

This tutorial introduces you to the structure of an EOPF Zarr product sample for Sentinel-2 L2A data. We will demonstrate how to access and open a Zarr product sample with xarray, how to visualise the Zarr encoding structure, explore embedded information, and retrieve relevant metadata for further processing.

What we will learn

  • ⚙️ How to open a .zarr file using xarray
  • 🛰️ The general structure of a Sentinel-2 L2A item
  • 🔎 How to access metadata that describes the .zarr encoding

Prerequisites

This tutorial uses a re-processed sample dataset from the EOPF Sentinel Zarr Samples Service STAC API that is available for direct access here.

The selected Zarr product is a Sentinel-2 L2A tile from 10 June 2025 (file name: S2C_MSIL2A_20250610T103641_N0511_R008_T32UMD_20250610T132001.zarr).


Import libraries

import os
import xarray as xr

Helper functions
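
The helper print_gen_structure is called later in this tutorial, but its definition does not appear in this section. Below is a minimal sketch, assuming the helper simply walks the tree recursively and prints each node name with two spaces of indentation per level, which matches the listing shown further down. The actual helper may differ. It relies only on the name and children attributes that xarray.DataTree nodes provide.

```python
def print_gen_structure(node, indent=""):
    """Recursively print the name of a tree node and of all its descendants.

    Assumed implementation: `node` is expected to behave like an
    xarray.DataTree, i.e. expose `.name` and a `.children` mapping.
    """
    # The root of an opened store has no name, so it prints as "None"
    print(f"{indent}{node.name}")
    # `children` maps each child name to the corresponding child node
    for child in node.children.values():
        print_gen_structure(child, indent + "  ")
```

Each recursion level adds two spaces, so the depth of a group in the hierarchy can be read directly from its indentation.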

Open a Zarr Store

In a first step, we use the function open_datatree() from the xarray library to open a Zarr store as a DataTree.
We need to define the following keyword arguments:

  • filename_or_obj: path or URL leading to a Zarr store
  • engine: 'eopf-zarr', the backend designed for the EOPF Zarr format by ESA
  • op_mode: an option added by xarray-eopf that selects either the analysis or the native mode. For more information visit the xarray-eopf documentation.
  • chunks: an empty dict loads the data with dask using the engine’s preferred chunk size, generally identical to the format’s chunk size

The final print of the DataTree object is commented out, as the display can be quite extensive, showing the entire content within the Zarr. An alternative is to apply a helper function that only displays the higher level structure as shown in the next code cell.

url = 'https://objects.eodc.eu/e05ab01a9d56408d82ac32d69a5aae2a:202506-s02msil2a/10/products/cpm_v256/S2C_MSIL2A_20250610T103641_N0511_R008_T32UMD_20250610T132001.zarr'
s2l2a_zarr_sample = xr.open_datatree(
    url,
    engine="eopf-zarr",  # backend for the EOPF Zarr format
    op_mode="native",  # native mode instead of analysis mode
    chunks={},  # use the engine's default chunking (dask-backed arrays)
)

If we apply the helper function print_gen_structure on the root of the DataTree object, we will get a listing of the tree-like structure of the object. We can see all Zarr groups, such as measurements, quality and conditions, their sub-groups and content.

print("Zarr Sentinel 2 L2A Structure")
print_gen_structure(s2l2a_zarr_sample.root) 
print("-" * 30)
Zarr Sentinel 2 L2A Structure
None
  conditions
    geometry
    mask
      detector_footprint
        r10m
        r20m
        r60m
      l1c_classification
        r60m
      l2a_classification
        r20m
        r60m
    meteorology
      cams
      ecmwf
  measurements
    reflectance
      r10m
      r20m
      r60m
  quality
    atmosphere
      r10m
      r20m
      r60m
    l2a_quicklook
      r10m
      r20m
      r60m
    mask
      r10m
      r20m
      r60m
    probability
      r20m
------------------------------

Extract information from Zarr groups

In a next step, we can explore the content of individual Zarr groups. By specifying the name of the group and subgroup and adding it into square brackets, we can extract the content of the relevant group. Let us for example extract the content of the subgroup reflectance under measurements.

As a result, we can see that the parent node measurements/reflectance has three subgroups: r10m, r20m and r60m. Each of them holds the DataArrays for one of the three spatial resolutions of the Sentinel-2 L2A data.

The xarray.DataTree structure allows the exploration of additional group-related metadata and information. For example, we can find the chunksize of each array and the coordinates.

# Retrieving the reflectance groups:
# s2l2a_zarr_sample["measurements/reflectance"] # Run it yourself for an interactive overview

Extract Zarr metadata on different levels

Through s2l2a_zarr_sample.attrs we can inspect both the stac_discovery and the other_metadata entries included in the Zarr store.

For example, we can list the top-level keys of stac_discovery:

# STAC metadata style:
print(list(s2l2a_zarr_sample.attrs["stac_discovery"].keys()))
['assets', 'bbox', 'geometry', 'id', 'links', 'properties', 'stac_extensions', 'stac_version', 'type']

We can also retrieve specific information by going deeper into the stac_discovery metadata, for example:

print('Date of Item Creation: ', s2l2a_zarr_sample.attrs['stac_discovery']['properties']['created'])
print('Item Bounding Box    : ', s2l2a_zarr_sample.attrs['stac_discovery']['bbox'])
print('Item EPSG            : ', s2l2a_zarr_sample.attrs['stac_discovery']['properties']['proj:epsg'])
print('Sentinel Platform    : ', s2l2a_zarr_sample.attrs['stac_discovery']['properties']['platform'])
print('Item Processing Level: ', s2l2a_zarr_sample.attrs['stac_discovery']['properties']['processing:level'])
Date of Item Creation:  2025-06-10T13:20:01+00:00
Item Bounding Box    :  [9.146276872400831, 52.25344953517325, 7.500940412097549, 53.24953673463324]
Item EPSG            :  32632
Sentinel Platform    :  sentinel-2c
Item Processing Level:  L2A
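
Chained square-bracket lookups raise a KeyError as soon as one key is missing, which can happen when the same workflow is applied to other items. A small sketch of a more defensive lookup follows. The helper get_nested is hypothetical (it is not part of xarray or xarray-eopf), and the dictionary is a stand-in that copies a few of the values printed above.

```python
def get_nested(mapping, path, default=None):
    """Follow a sequence of keys through nested dicts, returning `default` if any key is missing."""
    current = mapping
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Stand-in for s2l2a_zarr_sample.attrs, using values from the output above
attrs = {
    "stac_discovery": {
        "properties": {
            "platform": "sentinel-2c",
            "proj:epsg": 32632,
            "processing:level": "L2A",
        },
    },
}

print(get_nested(attrs, ["stac_discovery", "properties", "proj:epsg"]))
print(get_nested(attrs, ["stac_discovery", "properties", "eo:cloud_cover"], default="n/a"))
```

The second call asks for a key that the stand-in dictionary does not contain, so it falls back to the default instead of raising an error.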

And from other_metadata, we are able to retrieve the information specific to the instrument variables.

# Complementing metadata:
print(list(s2l2a_zarr_sample.attrs["other_metadata"].keys()))
['AOT_retrieval_model', 'L0_ancillary_data_quality', 'L0_ephemeris_data_quality', 'NUC_table_ID', 'SWIR_rearrangement_flag', 'UTM_zone_identification', 'absolute_location_assessment_from_AOCS', 'band_description', 'declared_accuracy_of_AOT_model', 'declared_accuracy_of_radiative_transfer_model', 'declared_accuracy_of_water_vapour_model', 'electronic_crosstalk_correction_flag', 'eopf_category', 'geometric_refinement', 'history', 'horizontal_CRS_code', 'horizontal_CRS_name', 'mean_sensing_time', 'mean_sun_azimuth_angle_in_deg_for_all_bands_all_detectors', 'mean_sun_zenith_angle_in_deg_for_all_bands_all_detectors', 'mean_value_of_aerosol_optical_thickness', 'mean_value_of_total_water_vapour_content', 'meteo', 'multispectral_registration_assessment', 'onboard_compression_flag', 'onboard_equalization_flag', 'optical_crosstalk_correction_flag', 'ozone_source', 'ozone_value', 'percentage_of_degraded_MSI_data', 'planimetric_stability_assessment_from_AOCS', 'product_quality_status', 'reflectance_correction_factor_from_the_Sun-Earth_distance_variation_computed_using_the_acquisition_date', 'spectral_band_of_reference']

💪 Now it is your turn

As we are able to retrieve several items from the EOPF Sentinel Zarr Samples Service STAC API, let us try the following:

Task

Go to the Sentinel-2 Level-2A collection and:

  • Choose an item of interest.
  • Replicate the workflow and explore the item’s metadata. When was it retrieved?
  • What are the dimensions?
  • What is the detailed location of the item?

Conclusion

This tutorial provides an initial understanding of the zarr structure for a Sentinel-2 L2A product sample. By using the xarray library, we can effectively navigate and inspect the different components within the zarr format, including its metadata and array organisation.

What’s next?

Now that you’ve been introduced to the Zarr format, learned its core concepts, and understood the basics of how to explore it, you are prepared for the next step. In the following chapter we will introduce you to STAC and the EOPF Zarr STAC Catalog. As we go along, we will increasingly move from theory to practice, providing you with hands-on tutorials working with EOPF Zarr products.