Access Zarr Data with Julia

Introduction

This notebook demonstrates how to access and analyze Sentinel-2 Level-2A data stored in Zarr format using Julia. We’ll use the Zarr.jl package to read cloud-optimized data directly from S3 storage.

What we will learn

  • ⛓️‍💥 Query STAC API to find Sentinel-2 data
  • 📬 Open and explore Zarr stores using Zarr.jl
  • 🔍 Navigate hierarchical group structures
  • 📕 Read and scale array data
  • 🍃 Compute NDVI from reflectance bands
  • 👀 Visualize results with CairoMakie

Prerequisites

  • Julia 1.12 or higher
  • Basic understanding of Julia syntax
  • Familiarity with remote sensing concepts
# Install required packages (run once)
# using Pkg
# Pkg.add(["HTTP", "JSON3", "Zarr", "CairoMakie", "Statistics"])

Load the required packages:

using HTTP
using JSON3
using Zarr
using CairoMakie
using Statistics

Access Zarr data from STAC

We’ll query the EOPF STAC API to find the most recent Sentinel-2 Level-2A data.

# Query STAC API for Sentinel-2 L2A data
# Get most recent data by sorting by datetime in descending order
stac_url = "https://stac.core.eopf.eodc.eu/search"
query = Dict(
    "collections" => ["sentinel-2-l2a"],
    "bbox" => [12.0, 51.5, 13.0, 52.5],
    "sortby" => [Dict("field" => "datetime", "direction" => "desc")],
    "limit" => 1,
    "query" => Dict("eo:cloud_cover" => Dict("lt" => 50))
)

response = HTTP.post(stac_url, ["Content-Type" => "application/json"], JSON3.write(query))
items = JSON3.read(response.body)

# Check if any items were found
isempty(items.features) && error("No items found — try relaxing the cloud cover or expanding the search area")

item = items.features[1]

println("Found item: $(item.id)")
println("Date: $(item.properties.datetime)")
println("Cloud cover: $(item.properties[Symbol("eo:cloud_cover")])%")
Found item: S2B_MSIL2A_20260415T101559_N0512_R065_T33UUU_20260415T153336
Date: 2026-04-15T10:15:59.024000Z
Cloud cover: 36.081517%
# Get the Zarr product URL
# The EOPF catalog provides Zarr data in the 'product' asset
product_url = item.assets.product.href
println("Product URL: $(product_url)")
Product URL: https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202604-s02msil2a-eu/15/products/cpm_v270/S2B_MSIL2A_20260415T101559_N0512_R065_T33UUU_20260415T153336.zarr
# Get the CRS information
# EOPF uses proj:code instead of proj:epsg
crs = item.properties[Symbol("proj:code")]
println("CRS: $(crs)")
CRS: EPSG:32633

Explore the Zarr Store

Now let’s open the Zarr store and explore its structure. Sentinel-2 data is organized hierarchically with groups for different resolutions.

Note: This notebook uses Zarr.jl’s internal .groups and .arrays API (tested with Zarr.jl 0.9). Future versions may require path-style access like g["measurements"].

# Open the Zarr store with consolidated metadata
g = zopen(product_url, consolidated=true)
println("Opened Zarr store successfully")
Opened Zarr store successfully
# List top-level groups
println("Top-level groups:")
for group_name in keys(g.groups)
    println("  - $(group_name)")
end
Top-level groups:
  - measurements
  - conditions
  - quality
# Navigate to 20m resolution reflectance data
measurements = g.groups["measurements"]
reflectance = measurements.groups["reflectance"]
r20m = reflectance.groups["r20m"]

println("Arrays in r20m group:")
for array_name in keys(r20m.arrays)
    println("  - $(array_name)")
end
Arrays in r20m group:
  - b12
  - x
  - b11
  - b07
  - y
  - b04
  - b01
  - b02
  - b03
  - b8a
  - b06
  - b05
# Inspect the b02 (Blue) array
b02 = r20m.arrays["b02"]

println("Band B02 (Blue) properties:")
println("  Shape: $(size(b02))")
println("  Data type: $(eltype(b02))")
println("  Chunks: $(b02.metadata.chunks)")
println("  Scale factor: $(b02.attrs["scale_factor"])")
println("  Add offset: $(b02.attrs["add_offset"])")
Band B02 (Blue) properties:
  Shape: (5490, 5490)
  Data type: UInt16
  Chunks: (2048, 2048)
  Scale factor: 0.0001
  Add offset: -0.1

Resolution comparison

Let’s compare the array shapes across different resolutions:

# Compare resolutions
resolutions = ["r10m", "r20m", "r60m"]

for res in resolutions
    res_group = reflectance.groups[res]
    # Get first array to check shape
    first_array_name = first(keys(res_group.arrays))
    arr = res_group.arrays[first_array_name]
    println("$(res): $(size(arr))")
end
r10m: (10980, 10980)
r20m: (5490, 5490)
r60m: (1830, 1830)

Read Zarr Data

Now let’s read actual data from the arrays. We can read subsets or entire arrays.

# Read a small subset of b02
subset = b02[1:10, 1:5]
println("Subset shape: $(size(subset))")
println("Sample values:")
println(subset)
Subset shape: (10, 5)
Sample values:
UInt16[0x0661 0x0632 0x0673 0x05a5 0x05b4; 0x066b 0x06b6 0x06f3 0x0683 0x05db; 0x0658 0x06ac 0x06df 0x0704 0x0650; 0x0665 0x0675 0x069c 0x0721 0x06d1; 0x0641 0x0677 0x0687 0x06e2 0x0735; 0x0626 0x0640 0x066e 0x068b 0x072c; 0x062c 0x061b 0x064c 0x06c2 0x06e5; 0x0631 0x0619 0x0637 0x069f 0x06a2; 0x0631 0x062a 0x0638 0x063f 0x0650; 0x062d 0x0627 0x0621 0x063c 0x0653]

Coordinates

Let’s access the coordinate arrays to understand the spatial extent:

# Get coordinate array shapes
x_arr = r20m.arrays["x"]
y_arr = r20m.arrays["y"]

println("X coordinate shape: $(size(x_arr))")
println("Y coordinate shape: $(size(y_arr))")
X coordinate shape: (5490,)
Y coordinate shape: (5490,)
# Read coordinate values
x_coords = x_arr[:]
y_coords = y_arr[:]

println("X range: $(minimum(x_coords)) to $(maximum(x_coords))")
println("Y range: $(minimum(y_coords)) to $(maximum(y_coords))")
X range: 300010.0 to 409790.0
Y range: 5.79025e6 to 5.90003e6

Scaling Data

The reflectance data is stored as integers with scale and offset attributes. We need to apply these to get physical values.

# Function to get scale and offset
function get_scale_and_offset(arr)
    scale = haskey(arr.attrs, "scale_factor") ? arr.attrs["scale_factor"] : 1.0
    offset = haskey(arr.attrs, "add_offset") ? arr.attrs["add_offset"] : 0.0
    return scale, offset
end
get_scale_and_offset (generic function with 1 method)
# Check scale and offset for spectral bands
b04 = r20m.arrays["b04"]
b8a = r20m.arrays["b8a"]

scale_b04, offset_b04 = get_scale_and_offset(b04)
scale_b8a, offset_b8a = get_scale_and_offset(b8a)

println("B04 (Red):")
println("  Scale: $(scale_b04), Offset: $(offset_b04)")
println("B8A (NIR):")
println("  Scale: $(scale_b8a), Offset: $(offset_b8a)")
B04 (Red):
  Scale: 0.0001, Offset: -0.1
B8A (NIR):
  Scale: 0.0001, Offset: -0.1
# Check scale and offset for coordinates
scale_x, offset_x = get_scale_and_offset(x_arr)
scale_y, offset_y = get_scale_and_offset(y_arr)

println("X coordinate: Scale=$(scale_x), Offset=$(offset_x)")
println("Y coordinate: Scale=$(scale_y), Offset=$(offset_y)")
X coordinate: Scale=1.0, Offset=0.0
Y coordinate: Scale=1.0, Offset=0.0

Simple Data Analysis: Calculating NDVI

The Normalized Difference Vegetation Index (NDVI) is a measure of vegetation health:

\[NDVI = \frac{NIR - Red}{NIR + Red}\]

For Sentinel-2, we use Band 8A (NIR) and Band 4 (Red) at 20m resolution.

Read and scale bands

Let’s read a subset of the data to calculate NDVI:

# Read a 1000x1000 pixel subset from the center of the scene
n = 1000
# Get array dimensions
dims = size(b04)
# Calculate center position
center_y = div(dims[1], 2)
center_x = div(dims[2], 2)
# Calculate start and end indices
start_y = max(1, center_y - div(n, 2))
end_y = min(dims[1], start_y + n - 1)
start_x = max(1, center_x - div(n, 2))
end_x = min(dims[2], start_x + n - 1)

println("Reading subset from ($start_y:$end_y, $start_x:$end_x)")
red_raw = b04[start_y:end_y, start_x:end_x]
nir_raw = b8a[start_y:end_y, start_x:end_x]

# Mask nodata values (0 and 65535 are fill values in Sentinel-2 uint16 data)
valid_mask = (red_raw .!= 0) .& (red_raw .!= 65535) .& (nir_raw .!= 0) .& (nir_raw .!= 65535)

# Convert to Float64 and apply mask before scaling to avoid contaminating statistics
red_masked = Float64.(red_raw)
nir_masked = Float64.(nir_raw)
red_masked[.!valid_mask] .= NaN
nir_masked[.!valid_mask] .= NaN

# Apply scale and offset
red = red_masked .* scale_b04 .+ offset_b04
nir = nir_masked .* scale_b8a .+ offset_b8a

# Calculate statistics on valid pixels only
red_valid = filter(!isnan, red)
nir_valid = filter(!isnan, nir)

if !isempty(red_valid) && !isempty(nir_valid)
    println("Red band range: $(minimum(red_valid)) to $(maximum(red_valid))")
    println("NIR band range: $(minimum(nir_valid)) to $(maximum(nir_valid))")
    println("Number of valid pixels: $(length(red_valid))")
else
    println("Warning: No valid data found in the selected region")
end
Reading subset from (2245:3244, 2245:3244)
Red band range: 0.0030000000000000027 to 1.3812
NIR band range: 0.004400000000000001 to 1.3717
Number of valid pixels: 1000000

Compute NDVI

Calculate NDVI with proper handling of division by zero:

# Calculate NDVI (NaN values from masking automatically propagate)
denominator = nir .+ red
ndvi = (nir .- red) ./ denominator

# Calculate statistics on valid pixels only
ndvi_valid = filter(!isnan, ndvi)

if !isempty(ndvi_valid)
    println("NDVI statistics:")
    println("  Min: $(minimum(ndvi_valid))")
    println("  Max: $(maximum(ndvi_valid))")
    println("  Mean: $(mean(ndvi_valid))")
    println("  Std: $(std(ndvi_valid))")
else
    println("Warning: No valid NDVI values computed")
end
NDVI statistics:
  Min: -0.6890459363957597
  Max: 0.930300096805421
  Mean: 0.5340182008549976
  Std: 0.21206509784310607

Visualise NDVI

Create a heatmap to visualize the NDVI:

# Create NDVI visualization
fig = Figure(size=(800, 700))
ax = Axis(fig[1, 1], 
    title="NDVI - Sentinel-2 L2A",
    xlabel="X (pixels)",
    ylabel="Y (pixels)")

hm = heatmap!(ax, ndvi', colormap=:RdYlGn, colorrange=(-0.5, 1.0))
Colorbar(fig[1, 2], hm, label="NDVI")

fig

💪 Now it is your turn

Try these exercises to practice working with Zarr data:

  1. Explore different resolutions: Access the 10m resolution data and compare band B02 (Blue) at 10m vs 20m resolution.

  2. Calculate NDWI: The Normalized Difference Water Index uses Green (B03) and NIR (B8A) bands at 20m resolution: \[NDWI = \frac{Green - NIR}{Green + NIR}\]

  3. Multi-temporal analysis: Query multiple dates and compare NDVI changes over time.

Conclusion

In this notebook, we’ve demonstrated how to:

  • Query STAC API to discover Sentinel-2 data
  • Open and navigate Zarr stores using Zarr.jl
  • Access hierarchical group structures with .groups and .arrays
  • Read and scale array data correctly
  • Perform simple analysis (NDVI calculation)
  • Visualize results with CairoMakie

The Zarr format combined with Julia provides efficient access to cloud-optimized Earth observation data, enabling scalable analysis workflows.

What’s next?

In the following section, we will provide a glimpse of the actual status on how to integrate the Sentinel Products of the EOPF Sentinel Zarr Sample Service STAC Catalog in Qgis.