Access Zarr Data with Julia

Introduction

This notebook demonstrates how to access and analyze Sentinel-2 Level-2A data stored in Zarr format using Julia. We’ll use the Zarr.jl package to read cloud-optimized data directly from S3 storage.

What we will learn

⛓️‍💥 Query STAC API to find Sentinel-2 data
📬 Open and explore Zarr stores using Zarr.jl
🔍 Navigate hierarchical group structures
📕 Read and scale array data
🍃 Compute NDVI from reflectance bands
👀 Visualize results with CairoMakie

Prerequisites

Julia 1.12 or higher
Basic understanding of Julia syntax
Familiarity with remote sensing concepts

# Install required packages (run once)
# using Pkg
# Pkg.add(["HTTP", "JSON3", "Zarr", "CairoMakie", "Statistics"])

Load the required packages:

using HTTP
using JSON3
using Zarr
using CairoMakie
using Statistics

Access Zarr data from STAC

We’ll query the EOPF STAC API to find the most recent Sentinel-2 Level-2A data.

# Query STAC API for Sentinel-2 L2A data
# Get most recent data by sorting by datetime in descending order
stac_url = "https://stac.core.eopf.eodc.eu/search"
query = Dict(
    "collections" => ["sentinel-2-l2a"],
    "bbox" => [12.0, 51.5, 13.0, 52.5],
    "sortby" => [Dict("field" => "datetime", "direction" => "desc")],
    "limit" => 1,
    "query" => Dict("eo:cloud_cover" => Dict("lt" => 50))
)

response = HTTP.post(stac_url, ["Content-Type" => "application/json"], JSON3.write(query))
items = JSON3.read(response.body)

# Check if any items were found
isempty(items.features) && error("No items found — try relaxing the cloud cover or expanding the search area")

item = items.features[1]

println("Found item: $(item.id)")
println("Date: $(item.properties.datetime)")
println("Cloud cover: $(item.properties[Symbol("eo:cloud_cover")])%")

Found item: S2B_MSIL2A_20260415T101559_N0512_R065_T33UUU_20260415T153336
Date: 2026-04-15T10:15:59.024000Z
Cloud cover: 36.081517%

# Get the Zarr product URL
# The EOPF catalog provides Zarr data in the 'product' asset
product_url = item.assets.product.href
println("Product URL: $(product_url)")

Product URL: https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202604-s02msil2a-eu/15/products/cpm_v270/S2B_MSIL2A_20260415T101559_N0512_R065_T33UUU_20260415T153336.zarr

# Get the CRS information
# EOPF uses proj:code instead of proj:epsg
crs = item.properties[Symbol("proj:code")]
println("CRS: $(crs)")

CRS: EPSG:32633

Explore the Zarr Store

Now let’s open the Zarr store and explore its structure. Sentinel-2 data is organized hierarchically with groups for different resolutions.

Note: This notebook uses Zarr.jl’s internal .groups and .arrays API (tested with Zarr.jl 0.9). Future versions may require path-style access like g["measurements"].

# Open the Zarr store with consolidated metadata
g = zopen(product_url, consolidated=true)
println("Opened Zarr store successfully")

Opened Zarr store successfully

# List top-level groups
println("Top-level groups:")
for group_name in keys(g.groups)
    println("  - $(group_name)")
end

Top-level groups:
  - measurements
  - conditions
  - quality

# Navigate to 20m resolution reflectance data
measurements = g.groups["measurements"]
reflectance = measurements.groups["reflectance"]
r20m = reflectance.groups["r20m"]

println("Arrays in r20m group:")
for array_name in keys(r20m.arrays)
    println("  - $(array_name)")
end

Arrays in r20m group:
  - b12
  - x
  - b11
  - b07
  - y
  - b04
  - b01
  - b02
  - b03
  - b8a
  - b06
  - b05

# Inspect the b02 (Blue) array
b02 = r20m.arrays["b02"]

println("Band B02 (Blue) properties:")
println("  Shape: $(size(b02))")
println("  Data type: $(eltype(b02))")
println("  Chunks: $(b02.metadata.chunks)")
println("  Scale factor: $(b02.attrs["scale_factor"])")
println("  Add offset: $(b02.attrs["add_offset"])")

Band B02 (Blue) properties:
  Shape: (5490, 5490)
  Data type: UInt16
  Chunks: (2048, 2048)
  Scale factor: 0.0001
  Add offset: -0.1

Resolution comparison

Let’s compare the array shapes across different resolutions:

# Compare resolutions
resolutions = ["r10m", "r20m", "r60m"]

for res in resolutions
    res_group = reflectance.groups[res]
    # Get first array to check shape
    first_array_name = first(keys(res_group.arrays))
    arr = res_group.arrays[first_array_name]
    println("$(res): $(size(arr))")
end

r10m: (10980, 10980)
r20m: (5490, 5490)
r60m: (1830, 1830)

Read Zarr Data

Now let’s read actual data from the arrays. We can read subsets or entire arrays.

# Read a small subset of b02
subset = b02[1:10, 1:5]
println("Subset shape: $(size(subset))")
println("Sample values:")
println(subset)

Subset shape: (10, 5)
Sample values:
UInt16[0x0661 0x0632 0x0673 0x05a5 0x05b4; 0x066b 0x06b6 0x06f3 0x0683 0x05db; 0x0658 0x06ac 0x06df 0x0704 0x0650; 0x0665 0x0675 0x069c 0x0721 0x06d1; 0x0641 0x0677 0x0687 0x06e2 0x0735; 0x0626 0x0640 0x066e 0x068b 0x072c; 0x062c 0x061b 0x064c 0x06c2 0x06e5; 0x0631 0x0619 0x0637 0x069f 0x06a2; 0x0631 0x062a 0x0638 0x063f 0x0650; 0x062d 0x0627 0x0621 0x063c 0x0653]

Coordinates

Let’s access the coordinate arrays to understand the spatial extent:

# Get coordinate array shapes
x_arr = r20m.arrays["x"]
y_arr = r20m.arrays["y"]

println("X coordinate shape: $(size(x_arr))")
println("Y coordinate shape: $(size(y_arr))")

X coordinate shape: (5490,)
Y coordinate shape: (5490,)

# Read coordinate values
x_coords = x_arr[:]
y_coords = y_arr[:]

println("X range: $(minimum(x_coords)) to $(maximum(x_coords))")
println("Y range: $(minimum(y_coords)) to $(maximum(y_coords))")

X range: 300010.0 to 409790.0
Y range: 5.79025e6 to 5.90003e6

Scaling Data

The reflectance data is stored as integers with scale and offset attributes. We need to apply these to get physical values.

# Function to get scale and offset
function get_scale_and_offset(arr)
    scale = haskey(arr.attrs, "scale_factor") ? arr.attrs["scale_factor"] : 1.0
    offset = haskey(arr.attrs, "add_offset") ? arr.attrs["add_offset"] : 0.0
    return scale, offset
end

get_scale_and_offset (generic function with 1 method)

# Check scale and offset for spectral bands
b04 = r20m.arrays["b04"]
b8a = r20m.arrays["b8a"]

scale_b04, offset_b04 = get_scale_and_offset(b04)
scale_b8a, offset_b8a = get_scale_and_offset(b8a)

println("B04 (Red):")
println("  Scale: $(scale_b04), Offset: $(offset_b04)")
println("B8A (NIR):")
println("  Scale: $(scale_b8a), Offset: $(offset_b8a)")

B04 (Red):
  Scale: 0.0001, Offset: -0.1
B8A (NIR):
  Scale: 0.0001, Offset: -0.1

# Check scale and offset for coordinates
scale_x, offset_x = get_scale_and_offset(x_arr)
scale_y, offset_y = get_scale_and_offset(y_arr)

println("X coordinate: Scale=$(scale_x), Offset=$(offset_x)")
println("Y coordinate: Scale=$(scale_y), Offset=$(offset_y)")

X coordinate: Scale=1.0, Offset=0.0
Y coordinate: Scale=1.0, Offset=0.0

Simple Data Analysis: Calculating NDVI

The Normalized Difference Vegetation Index (NDVI) is a measure of vegetation health:

\[NDVI = \frac{NIR - Red}{NIR + Red}\]

For Sentinel-2, we use Band 8A (NIR) and Band 4 (Red) at 20m resolution.

Read and scale bands

Let’s read a subset of the data to calculate NDVI:

# Read a 1000x1000 pixel subset from the center of the scene
n = 1000
# Get array dimensions
dims = size(b04)
# Calculate center position
center_y = div(dims[1], 2)
center_x = div(dims[2], 2)
# Calculate start and end indices
start_y = max(1, center_y - div(n, 2))
end_y = min(dims[1], start_y + n - 1)
start_x = max(1, center_x - div(n, 2))
end_x = min(dims[2], start_x + n - 1)

println("Reading subset from ($start_y:$end_y, $start_x:$end_x)")
red_raw = b04[start_y:end_y, start_x:end_x]
nir_raw = b8a[start_y:end_y, start_x:end_x]

# Mask nodata values (0 and 65535 are fill values in Sentinel-2 uint16 data)
valid_mask = (red_raw .!= 0) .& (red_raw .!= 65535) .& (nir_raw .!= 0) .& (nir_raw .!= 65535)

# Convert to Float64 and apply mask before scaling to avoid contaminating statistics
red_masked = Float64.(red_raw)
nir_masked = Float64.(nir_raw)
red_masked[.!valid_mask] .= NaN
nir_masked[.!valid_mask] .= NaN

# Apply scale and offset
red = red_masked .* scale_b04 .+ offset_b04
nir = nir_masked .* scale_b8a .+ offset_b8a

# Calculate statistics on valid pixels only
red_valid = filter(!isnan, red)
nir_valid = filter(!isnan, nir)

if !isempty(red_valid) && !isempty(nir_valid)
    println("Red band range: $(minimum(red_valid)) to $(maximum(red_valid))")
    println("NIR band range: $(minimum(nir_valid)) to $(maximum(nir_valid))")
    println("Number of valid pixels: $(length(red_valid))")
else
    println("Warning: No valid data found in the selected region")
end

Reading subset from (2245:3244, 2245:3244)
Red band range: 0.0030000000000000027 to 1.3812
NIR band range: 0.004400000000000001 to 1.3717
Number of valid pixels: 1000000

Compute NDVI

Calculate NDVI with proper handling of division by zero:

# Calculate NDVI (NaN values from masking automatically propagate)
denominator = nir .+ red
ndvi = (nir .- red) ./ denominator

# Calculate statistics on valid pixels only
ndvi_valid = filter(!isnan, ndvi)

if !isempty(ndvi_valid)
    println("NDVI statistics:")
    println("  Min: $(minimum(ndvi_valid))")
    println("  Max: $(maximum(ndvi_valid))")
    println("  Mean: $(mean(ndvi_valid))")
    println("  Std: $(std(ndvi_valid))")
else
    println("Warning: No valid NDVI values computed")
end

NDVI statistics:
  Min: -0.6890459363957597
  Max: 0.930300096805421
  Mean: 0.5340182008549976
  Std: 0.21206509784310607

Visualise NDVI

Create a heatmap to visualize the NDVI:

# Create NDVI visualization
fig = Figure(size=(800, 700))
ax = Axis(fig[1, 1], 
    title="NDVI - Sentinel-2 L2A",
    xlabel="X (pixels)",
    ylabel="Y (pixels)")

hm = heatmap!(ax, ndvi', colormap=:RdYlGn, colorrange=(-0.5, 1.0))
Colorbar(fig[1, 2], hm, label="NDVI")

fig

💪 Now it is your turn

Try these exercises to practice working with Zarr data:

Explore different resolutions: Access the 10m resolution data and compare band B02 (Blue) at 10m vs 20m resolution.
Calculate NDWI: The Normalized Difference Water Index uses Green (B03) and NIR (B8A) bands at 20m resolution: \[NDWI = \frac{Green - NIR}{Green + NIR}\]
Multi-temporal analysis: Query multiple dates and compare NDVI changes over time.

Conclusion

In this notebook, we’ve demonstrated how to:

Query STAC API to discover Sentinel-2 data
Open and navigate Zarr stores using Zarr.jl
Access hierarchical group structures with .groups and .arrays
Read and scale array data correctly
Perform simple analysis (NDVI calculation)
Visualize results with CairoMakie

The Zarr format combined with Julia provides efficient access to cloud-optimized Earth observation data, enabling scalable analysis workflows.

What’s next?

In the following section, we will provide a glimpse of the actual status on how to integrate the Sentinel Products of the EOPF Sentinel Zarr Sample Service STAC Catalog in Qgis.