# Install required packages (run once)
# using Pkg
# Pkg.add(["HTTP", "JSON3", "Zarr", "CairoMakie", "Statistics"])Access Zarr Data with Julia
Introduction
This notebook demonstrates how to access and analyze Sentinel-2 Level-2A data stored in Zarr format using Julia. We’ll use the Zarr.jl package to read cloud-optimized data directly from S3 storage.
What we will learn
- ⛓️💥 Query STAC API to find Sentinel-2 data
- 📬 Open and explore Zarr stores using Zarr.jl
- 🔍 Navigate hierarchical group structures
- 📕 Read and scale array data
- 🍃 Compute NDVI from reflectance bands
- 👀 Visualize results with CairoMakie
Prerequisites
- Julia 1.12 or higher
- Basic understanding of Julia syntax
- Familiarity with remote sensing concepts
Load the required packages:
using HTTP
using JSON3
using Zarr
using CairoMakie
using StatisticsAccess Zarr data from STAC
We’ll query the EOPF STAC API to find the most recent Sentinel-2 Level-2A data.
# Query STAC API for Sentinel-2 L2A data
# Get most recent data by sorting by datetime in descending order
stac_url = "https://stac.core.eopf.eodc.eu/search"
query = Dict(
"collections" => ["sentinel-2-l2a"],
"bbox" => [12.0, 51.5, 13.0, 52.5],
"sortby" => [Dict("field" => "datetime", "direction" => "desc")],
"limit" => 1,
"query" => Dict("eo:cloud_cover" => Dict("lt" => 50))
)
response = HTTP.post(stac_url, ["Content-Type" => "application/json"], JSON3.write(query))
items = JSON3.read(response.body)
# Check if any items were found
isempty(items.features) && error("No items found — try relaxing the cloud cover or expanding the search area")
item = items.features[1]
println("Found item: $(item.id)")
println("Date: $(item.properties.datetime)")
println("Cloud cover: $(item.properties[Symbol("eo:cloud_cover")])%")Found item: S2B_MSIL2A_20260415T101559_N0512_R065_T33UUU_20260415T153336
Date: 2026-04-15T10:15:59.024000Z
Cloud cover: 36.081517%
# Get the Zarr product URL
# The EOPF catalog provides Zarr data in the 'product' asset
product_url = item.assets.product.href
println("Product URL: $(product_url)")Product URL: https://objects.eodc.eu:443/e05ab01a9d56408d82ac32d69a5aae2a:202604-s02msil2a-eu/15/products/cpm_v270/S2B_MSIL2A_20260415T101559_N0512_R065_T33UUU_20260415T153336.zarr
# Get the CRS information
# EOPF uses proj:code instead of proj:epsg
crs = item.properties[Symbol("proj:code")]
println("CRS: $(crs)")CRS: EPSG:32633
Explore the Zarr Store
Now let’s open the Zarr store and explore its structure. Sentinel-2 data is organized hierarchically with groups for different resolutions.
Note: This notebook uses Zarr.jl’s internal .groups and .arrays API (tested with Zarr.jl 0.9). Future versions may require path-style access like g["measurements"].
# Open the Zarr store with consolidated metadata
g = zopen(product_url, consolidated=true)
println("Opened Zarr store successfully")Opened Zarr store successfully
# List top-level groups
println("Top-level groups:")
for group_name in keys(g.groups)
println(" - $(group_name)")
endTop-level groups:
- measurements
- conditions
- quality
# Navigate to 20m resolution reflectance data
measurements = g.groups["measurements"]
reflectance = measurements.groups["reflectance"]
r20m = reflectance.groups["r20m"]
println("Arrays in r20m group:")
for array_name in keys(r20m.arrays)
println(" - $(array_name)")
endArrays in r20m group:
- b12
- x
- b11
- b07
- y
- b04
- b01
- b02
- b03
- b8a
- b06
- b05
# Inspect the b02 (Blue) array
b02 = r20m.arrays["b02"]
println("Band B02 (Blue) properties:")
println(" Shape: $(size(b02))")
println(" Data type: $(eltype(b02))")
println(" Chunks: $(b02.metadata.chunks)")
println(" Scale factor: $(b02.attrs["scale_factor"])")
println(" Add offset: $(b02.attrs["add_offset"])")Band B02 (Blue) properties:
Shape: (5490, 5490)
Data type: UInt16
Chunks: (2048, 2048)
Scale factor: 0.0001
Add offset: -0.1
Resolution comparison
Let’s compare the array shapes across different resolutions:
# Compare resolutions
resolutions = ["r10m", "r20m", "r60m"]
for res in resolutions
res_group = reflectance.groups[res]
# Get first array to check shape
first_array_name = first(keys(res_group.arrays))
arr = res_group.arrays[first_array_name]
println("$(res): $(size(arr))")
endr10m: (10980, 10980)
r20m: (5490, 5490)
r60m: (1830, 1830)
Read Zarr Data
Now let’s read actual data from the arrays. We can read subsets or entire arrays.
# Read a small subset of b02
subset = b02[1:10, 1:5]
println("Subset shape: $(size(subset))")
println("Sample values:")
println(subset)Subset shape: (10, 5)
Sample values:
UInt16[0x0661 0x0632 0x0673 0x05a5 0x05b4; 0x066b 0x06b6 0x06f3 0x0683 0x05db; 0x0658 0x06ac 0x06df 0x0704 0x0650; 0x0665 0x0675 0x069c 0x0721 0x06d1; 0x0641 0x0677 0x0687 0x06e2 0x0735; 0x0626 0x0640 0x066e 0x068b 0x072c; 0x062c 0x061b 0x064c 0x06c2 0x06e5; 0x0631 0x0619 0x0637 0x069f 0x06a2; 0x0631 0x062a 0x0638 0x063f 0x0650; 0x062d 0x0627 0x0621 0x063c 0x0653]
Coordinates
Let’s access the coordinate arrays to understand the spatial extent:
# Get coordinate array shapes
x_arr = r20m.arrays["x"]
y_arr = r20m.arrays["y"]
println("X coordinate shape: $(size(x_arr))")
println("Y coordinate shape: $(size(y_arr))")X coordinate shape: (5490,)
Y coordinate shape: (5490,)
# Read coordinate values
x_coords = x_arr[:]
y_coords = y_arr[:]
println("X range: $(minimum(x_coords)) to $(maximum(x_coords))")
println("Y range: $(minimum(y_coords)) to $(maximum(y_coords))")X range: 300010.0 to 409790.0
Y range: 5.79025e6 to 5.90003e6
Scaling Data
The reflectance data is stored as integers with scale and offset attributes. We need to apply these to get physical values.
# Function to get scale and offset
function get_scale_and_offset(arr)
scale = haskey(arr.attrs, "scale_factor") ? arr.attrs["scale_factor"] : 1.0
offset = haskey(arr.attrs, "add_offset") ? arr.attrs["add_offset"] : 0.0
return scale, offset
endget_scale_and_offset (generic function with 1 method)
# Check scale and offset for spectral bands
b04 = r20m.arrays["b04"]
b8a = r20m.arrays["b8a"]
scale_b04, offset_b04 = get_scale_and_offset(b04)
scale_b8a, offset_b8a = get_scale_and_offset(b8a)
println("B04 (Red):")
println(" Scale: $(scale_b04), Offset: $(offset_b04)")
println("B8A (NIR):")
println(" Scale: $(scale_b8a), Offset: $(offset_b8a)")B04 (Red):
Scale: 0.0001, Offset: -0.1
B8A (NIR):
Scale: 0.0001, Offset: -0.1
# Check scale and offset for coordinates
scale_x, offset_x = get_scale_and_offset(x_arr)
scale_y, offset_y = get_scale_and_offset(y_arr)
println("X coordinate: Scale=$(scale_x), Offset=$(offset_x)")
println("Y coordinate: Scale=$(scale_y), Offset=$(offset_y)")X coordinate: Scale=1.0, Offset=0.0
Y coordinate: Scale=1.0, Offset=0.0
Simple Data Analysis: Calculating NDVI
The Normalized Difference Vegetation Index (NDVI) is a measure of vegetation health:
\[NDVI = \frac{NIR - Red}{NIR + Red}\]
For Sentinel-2, we use Band 8A (NIR) and Band 4 (Red) at 20m resolution.
Read and scale bands
Let’s read a subset of the data to calculate NDVI:
# Read a 1000x1000 pixel subset from the center of the scene
n = 1000
# Get array dimensions
dims = size(b04)
# Calculate center position
center_y = div(dims[1], 2)
center_x = div(dims[2], 2)
# Calculate start and end indices
start_y = max(1, center_y - div(n, 2))
end_y = min(dims[1], start_y + n - 1)
start_x = max(1, center_x - div(n, 2))
end_x = min(dims[2], start_x + n - 1)
println("Reading subset from ($start_y:$end_y, $start_x:$end_x)")
red_raw = b04[start_y:end_y, start_x:end_x]
nir_raw = b8a[start_y:end_y, start_x:end_x]
# Mask nodata values (0 and 65535 are fill values in Sentinel-2 uint16 data)
valid_mask = (red_raw .!= 0) .& (red_raw .!= 65535) .& (nir_raw .!= 0) .& (nir_raw .!= 65535)
# Convert to Float64 and apply mask before scaling to avoid contaminating statistics
red_masked = Float64.(red_raw)
nir_masked = Float64.(nir_raw)
red_masked[.!valid_mask] .= NaN
nir_masked[.!valid_mask] .= NaN
# Apply scale and offset
red = red_masked .* scale_b04 .+ offset_b04
nir = nir_masked .* scale_b8a .+ offset_b8a
# Calculate statistics on valid pixels only
red_valid = filter(!isnan, red)
nir_valid = filter(!isnan, nir)
if !isempty(red_valid) && !isempty(nir_valid)
println("Red band range: $(minimum(red_valid)) to $(maximum(red_valid))")
println("NIR band range: $(minimum(nir_valid)) to $(maximum(nir_valid))")
println("Number of valid pixels: $(length(red_valid))")
else
println("Warning: No valid data found in the selected region")
endReading subset from (2245:3244, 2245:3244)
Red band range: 0.0030000000000000027 to 1.3812
NIR band range: 0.004400000000000001 to 1.3717
Number of valid pixels: 1000000
Compute NDVI
Calculate NDVI with proper handling of division by zero:
# Calculate NDVI (NaN values from masking automatically propagate)
denominator = nir .+ red
ndvi = (nir .- red) ./ denominator
# Calculate statistics on valid pixels only
ndvi_valid = filter(!isnan, ndvi)
if !isempty(ndvi_valid)
println("NDVI statistics:")
println(" Min: $(minimum(ndvi_valid))")
println(" Max: $(maximum(ndvi_valid))")
println(" Mean: $(mean(ndvi_valid))")
println(" Std: $(std(ndvi_valid))")
else
println("Warning: No valid NDVI values computed")
endNDVI statistics:
Min: -0.6890459363957597
Max: 0.930300096805421
Mean: 0.5340182008549976
Std: 0.21206509784310607
Visualise NDVI
Create a heatmap to visualize the NDVI:
# Create NDVI visualization
fig = Figure(size=(800, 700))
ax = Axis(fig[1, 1],
title="NDVI - Sentinel-2 L2A",
xlabel="X (pixels)",
ylabel="Y (pixels)")
hm = heatmap!(ax, ndvi', colormap=:RdYlGn, colorrange=(-0.5, 1.0))
Colorbar(fig[1, 2], hm, label="NDVI")
fig💪 Now it is your turn
Try these exercises to practice working with Zarr data:
Explore different resolutions: Access the 10m resolution data and compare band B02 (Blue) at 10m vs 20m resolution.
Calculate NDWI: The Normalized Difference Water Index uses Green (B03) and NIR (B8A) bands at 20m resolution: \[NDWI = \frac{Green - NIR}{Green + NIR}\]
Multi-temporal analysis: Query multiple dates and compare NDVI changes over time.
Conclusion
In this notebook, we’ve demonstrated how to:
- Query STAC API to discover Sentinel-2 data
- Open and navigate Zarr stores using Zarr.jl
- Access hierarchical group structures with
.groupsand.arrays - Read and scale array data correctly
- Perform simple analysis (NDVI calculation)
- Visualize results with CairoMakie
The Zarr format combined with Julia provides efficient access to cloud-optimized Earth observation data, enabling scalable analysis workflows.
What’s next?
In the following section, we will provide a glimpse of the actual status on how to integrate the Sentinel Products of the EOPF Sentinel Zarr Sample Service STAC Catalog in Qgis.