Visualising Multiscale Pyramids

Introduction

Overviews enable efficient visualisation by providing progressively coarser representations of the data, allowing us to quickly navigate and explore large satellite images without loading the full-resolution dataset every time.

Our approach spans two notebooks: in the previous tutorial we created the multiscale overview pyramid, and in this notebook we will explore and visualise it. Building upon what we learned about creating overviews, we will now focus on how to use them effectively for interactive visualisation and exploration of large Earth Observation datasets.

What we will learn

  • 📊 How to parse and inspect the multiscales metadata structure
  • 🔍 How to load different overview levels dynamically from the Zarr hierarchy
  • 🎨 How to create RGB composites at different resolutions for visualisation
  • 🖱️ How to build interactive widgets for exploring zoom levels
  • 🗺️ How to create an interactive web map with automatic level selection

Prerequisites

To successfully run this notebook, make sure to have:

  • Created the overview hierarchy in the previous tutorial (65_create_overviews.ipynb)
  • Successfully generated multiscales metadata for Sentinel-2 L2A reflectance data
  • A Zarr dataset with overview levels stored in the measurements/reflectance/r10m/overviews/ subfolder on S3

Required packages (one way to install them is sketched after this list):

  • Core: xarray, zarr, numpy, pandas, s3fs
  • Visualisation: matplotlib, ipywidgets
  • Web mapping: ipyleaflet, pyproj, pillow
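
If any of these are missing, one way to install them from a notebook cell (assuming the standard PyPI package names) is:

%pip install xarray zarr numpy pandas s3fs matplotlib ipywidgets ipyleaflet pyproj pillow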

Import libraries

Note that the environment variables below are necessary to retrieve the product with generated overviews from our private S3 bucket storage. You may edit the values to run the notebook against your own bucket.

import json
import os
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import s3fs
import xarray as xr
from ipywidgets import interact, IntSlider
warnings.filterwarnings("ignore")
bucket = os.environ["BUCKET_NAME"]
access = os.environ["ACCESS_KEY"]
secret = os.environ["SECRET_KEY"]
bucket_endpoint = os.environ["BUCKET_ENDPOINT"]
# S3 filesystem
fs = s3fs.S3FileSystem(
    key=access,
    secret=secret,
    client_kwargs={"endpoint_url": bucket_endpoint},
)
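
As an optional sanity check, you can list the bucket contents to confirm that the credentials and endpoint work (fs.ls is a standard s3fs call):

# Optional sanity check: list a few keys to confirm S3 access works
print(fs.ls(bucket)[:5])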

Understanding the Overview Hierarchy

The first step in working with overviews is to understand their structure. In the previous tutorial, we created a multiscale pyramid with several levels (L0 through L7), where each level represents the data at a different resolution.

Let’s load the dataset and examine the multiscales metadata that describes this hierarchy. This metadata follows the GeoZarr Overviews specification and contains information about:

  • Which levels exist and where they are stored in the Zarr hierarchy
  • How each level relates to others (parent-child relationships through derived_from)
  • The spatial resolution (cell_size) of each level in metres
  • What resampling method was used to create each level
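
For orientation, the sketch below writes out a single layout entry as a plain Python dict; the values mirror the L1 row of the table shown further down (the variable name example_entry is purely illustrative):

# One "layout" entry, written out for illustration (values match level L1)
example_entry = {
    "id": "L1",                       # Level identifier
    "path": "overviews/L1",           # Location within the Zarr hierarchy
    "cell_size": [20.0, 20.0],        # Spatial resolution in metres [x, y]
    "derived_from": "L0",             # Parent level
    "factors": [2, 2],                # Downsampling factors
    "resampling_method": "average",   # Aggregation method
}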

Opening the dataset

We start by opening the dataset at the group level where our overviews are stored on S3. This is the same path we used in the previous tutorial when creating the overviews (measurements/reflectance/r10m).

# S3 path configuration
s3_zarr_name = "S2A_MSIL2A_20250831T135741_N0511_R010_T26WPD_20250831T185012.zarr"
variable_group_path = "measurements/reflectance/r10m"
# Open dataset from S3 using fs.get_mapper()
r10m_store = fs.get_mapper(f"{bucket}/{s3_zarr_name}/{variable_group_path}")
dataset = xr.open_dataset(r10m_store, engine="zarr")  # Load dataset
print(f"Dataset: {dict(dataset.dims)} | Variables: {list(dataset.data_vars)} | Multiscales: {'multiscales' in dataset.attrs}\n")
Dataset: {'y': 10980, 'x': 10980} | Variables: ['b02', 'b03', 'b04', 'b08'] | Multiscales: True

The output confirms that our dataset has dimensions of 10,980 × 10,980 pixels and contains four spectral bands (b02, b03, b04, b08), along with the multiscales metadata we created in the previous tutorial.

Parsing the multiscales metadata

The multiscales attribute contains all the information about our overview pyramid. Let’s parse it into a table format using pandas to make it easier to understand.

Each row in the table represents one resolution level:

  • id: Level identifier (L0, L1, L2, etc.)
  • path: Where this level is stored in the Zarr hierarchy (. means current group, overviews/L* means subfolder)
  • cell_size: Spatial resolution in metres as [x_res, y_res]
  • derived_from: Which parent level this was created from
  • factors: Cumulative downsampling factors relative to L0, as [y_factor, x_factor] (so L2 carries [4, 4] even though it is derived from L1)
  • resampling_method: How pixels were aggregated (average, nearest, etc.)
# Parse multiscales metadata
multiscales = dataset.attrs["multiscales"]
layout_df = pd.DataFrame(multiscales["layout"])
print("Overview Level Hierarchy:")
display(layout_df)
Overview Level Hierarchy:
   id          path         cell_size derived_from     factors resampling_method
0  L0             .      [10.0, 10.0]          NaN         NaN               NaN
1  L1  overviews/L1      [20.0, 20.0]           L0      [2, 2]           average
2  L2  overviews/L2      [40.0, 40.0]           L1      [4, 4]           average
3  L3  overviews/L3      [80.0, 80.0]           L2      [8, 8]           average
4  L4  overviews/L4    [160.0, 160.0]           L3    [16, 16]           average
5  L5  overviews/L5    [320.0, 320.0]           L4    [32, 32]           average
6  L6  overviews/L6    [640.0, 640.0]           L5    [64, 64]           average
7  L7  overviews/L7  [1280.0, 1280.0]           L6  [128, 128]           average

Notice the hierarchical structure:

  • L0 has path=".", meaning it references the native data at the current group level (no data duplication).
  • L1-L7 have path="overviews/L*", stored in the overviews/ subfolder we created.
  • Each level is progressively coarser: L1 is 2× downsampled, L2 is 4×, L3 is 8×, and so on.
  • The cell_size doubles at each level: 10m → 20m → 40m → 80m → 160m → 320m → 640m → 1280m (verified in the sketch after this list).
  • All levels use average resampling, which is appropriate for continuous reflectance data.
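
The sanity-check sketch below confirms these relationships directly from the metadata (it assumes the layout entries are ordered from L0 to L7, as above):

# Sanity check (sketch): each level should double its parent's cell_size
layout = multiscales["layout"]
for parent, child in zip(layout, layout[1:]):
    assert child["cell_size"][0] == 2 * parent["cell_size"][0]  # Resolution doubles
    assert child["derived_from"] == parent["id"]                # Parent-child chain
print("Hierarchy is consistent: cell_size doubles at every level")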

Loading Overview Levels

Now that we understand the structure from the metadata, let’s load each overview level into memory. We’ll iterate through the layout array from the multiscales metadata and open each level as a separate xarray Dataset.

The code below:

  1. Creates an empty dictionary to store all overview datasets
  2. Loops through each entry in the layout metadata
  3. Constructs the full path by joining the base path with the level’s relative path
  4. Opens each level using xr.open_dataset() with the Zarr engine
  5. Displays the dimensions and downsampling factor for each level
overview_datasets = {}  # Dictionary to store all levels
for entry in multiscales["layout"]:
    level_id, level_path = entry["id"], entry["path"]
    # Construct S3 path for each level
    if level_path == ".":
        # L0 references the base dataset
        level_store = fs.get_mapper(f"{bucket}/{s3_zarr_name}/{variable_group_path}")
    else:
        # L1-L7 are in the overviews subfolder
        level_store = fs.get_mapper(f"{bucket}/{s3_zarr_name}/{variable_group_path}/{level_path}")
    
    overview_datasets[level_id] = xr.open_dataset(level_store, engine="zarr")
    print(f"Loaded  {level_id}: {overview_datasets[level_id]['b02'].shape[0]:5d} × {overview_datasets[level_id]['b02'].shape[1]:5d} pixels")
Loaded  L0: 10980 × 10980 pixels
Loaded  L1:  5490 ×  5490 pixels
Loaded  L2:  2745 ×  2745 pixels
Loaded  L3:  1372 ×  1372 pixels
Loaded  L4:   686 ×   686 pixels
Loaded  L5:   343 ×   343 pixels
Loaded  L6:   171 ×   171 pixels
Loaded  L7:    85 ×    85 pixels

Observe how the dimensions decrease at each level:

  • L0: 10,980 × 10,980 pixels (full resolution, ~121 megapixels)
  • L1: 5,490 × 5,490 pixels (half resolution, ~30 megapixels)
  • L2: 2,745 × 2,745 pixels (quarter resolution, ~7.5 megapixels)
  • L7: 85 × 85 pixels (128× downsampled, ~7 kilopixels)

Lower resolution levels are much faster to load and visualise, making them ideal for quick previews, thumbnails, or zoomed-out views. This is the key benefit of the overview pyramid structure.
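
To put rough numbers on that benefit, the sketch below estimates the uncompressed in-memory footprint of each level, assuming all four bands are loaded as float32 (the dtype is an assumption; adjust to match your data):

# Sketch: rough uncompressed size per level (4 bands × float32 = 16 bytes/pixel)
for level_id, ds in overview_datasets.items():
    h, w = ds["b02"].shape
    print(f"{level_id}: {4 * h * w * 4 / 1e6:10.1f} MB")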

Visualisation of Overviews

Creating RGB Composites

To visualise our Sentinel-2 data, we’ll create RGB true-colour composites by combining three spectral bands:

  • Red channel: Band 4 (b04) at 664 nm - sensitive to red wavelengths
  • Green channel: Band 3 (b03) at 560 nm - sensitive to green wavelengths
  • Blue channel: Band 2 (b02) at 490 nm - sensitive to blue wavelengths

However, raw reflectance values (typically ranging from 0 to ~0.4) don’t display well directly. We need to apply contrast enhancement through percentile stretching to make features visible and create visually appealing images.

Percentile stretching for contrast enhancement

The normalize() helper function below performs percentile-based contrast stretching, a standard technique in remote sensing visualisation. It works by:

  1. Filtering out NaN (fill) values, which are common in Earth Observation data due to clouds, partial coverage, or sensor issues
  2. Computing the 2nd and 98th percentile values of the valid data
  3. Linearly mapping these percentile values to 0 and 1 respectively
  4. Clipping any values outside this range

This approach is robust to outliers (very bright or dark pixels) and produces visually balanced images. Critical note: Proper NaN handling is essential - if NaN values aren’t filtered out, np.percentile() returns NaN, causing the entire image to display as white or blank.

# Normalize each channel using percentile stretch (2% - 98%)
def normalize(band):
    valid_pixels = band[~np.isnan(band)]  # Filter out NaN fill values first
    if len(valid_pixels) == 0:
        return np.zeros_like(band)  # All-NaN band: return a blank image
    p2, p98 = np.percentile(valid_pixels, [2, 98])  # Percentiles of valid data only
    if p98 == p2:
        return np.zeros_like(band)  # Constant band: avoid division by zero
    normalized = (band - p2) / (p98 - p2)
    return np.clip(normalized, 0, 1)
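
As a quick usage sketch, stretching one band from a coarse level should give values within [0, 1], with NaN fill pixels left as NaN:

# Usage sketch: stretch a single band from the coarse L5 overview
stretched = normalize(overview_datasets["L5"]["b04"].values)
print(np.nanmin(stretched), np.nanmax(stretched))  # Both within [0, 1]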

Interactive visualisation with zoom levels

Now let’s create an interactive visualisation that demonstrates the power of overviews. The widget below allows you to:

  • Move the slider to switch between zoom levels (0 = highest resolution L0, 7 = lowest resolution L7)
  • Observe the performance difference: Lower resolution levels (L5-L7) display almost instantly, while higher resolution levels take longer
  • Notice the quality tradeoff: Higher levels show more detail but require more time to load and render

The function crops each image to its top-left quarter (one quarter of each dimension) to make the display more responsive. Even with this cropping, L0 would still be 2,745 × 2,745 pixels, which is quite large. This demonstrates why overviews are so valuable for interactive applications.


def visualize_level(i):
    """Show RGB composite for overview level (cropped top-left)."""
    ds = overview_datasets[f"L{i}"]                                # Select overview level
    r, g, b = [ds[band].values for band in ("b04", "b03", "b02")]  # Extract RGB bands
    h, w = r.shape
    rgb = np.dstack([r, g, b])[:h // 4, :w // 4]                   # Crop top-left quarter
    img = np.dstack([normalize(rgb[..., j]) for j in range(3)])    # Normalise each band
    fig, ax = plt.subplots(figsize=(4.5, 4.5))                     # Set figure size
    ax.imshow(img)                                                 # Display RGB image
    plt.show()

interact(visualize_level, i=IntSlider(min=0, max=7, value=4, description="Zoom Level"));  # Interactive slider

Comparing Multiple Resolution Levels

To better understand the effect of downsampling, let’s compare multiple overview levels side by side. This visual comparison helps us see:

  • How much spatial detail is lost at each downsampling level
  • The tradeoff between image quality and file size/performance - L2 is 16× smaller than L0 but still retains good detail
  • Which level is appropriate for different use cases: L2 for regional analysis, L5 for quick previews, L7 for thumbnails

The function below displays three different levels (L2, L5, and L7) to show this progression from medium to very coarse resolution.

def show_levels(levels=(2, 5, 7)):
    """Display RGB composites for selected overview levels."""
    fig, axes = plt.subplots(1, len(levels), figsize=(12, 4))
    for ax, i in zip(axes, levels):
        ds = overview_datasets[f"L{i}"]                                       # Select overview level
        rgb = np.dstack([ds[band].values for band in ("b04", "b03", "b02")])  # Stack RGB bands
        img = np.dstack([normalize(rgb[..., j]) for j in range(3)])           # Normalise each band
        ax.imshow(img)
        ax.axis("off")
        ax.set_title(f"L{i} | {rgb.shape[0]}×{rgb.shape[1]} | ×{2**i}", fontsize=10)
    plt.tight_layout()
    plt.show()

show_levels()

Notice the progressive loss of detail:

  • L2 (4× downsampled, 2745×2745 px): Retains good spatial detail, suitable for regional-scale visualisation and analysis
  • L5 (32× downsampled, 343×343 px): Shows general patterns and major features but loses fine details like small fields or roads
  • L7 (128× downsampled, 85×85 px): Provides only a coarse overview showing large-scale patterns, useful for thumbnails or global context

This demonstrates why overviews are essential for progressive rendering in web mapping applications: the application can display L7 instantly for context, then progressively load L5, L3, L1 as the user waits, and finally load L0 when zoomed in for detailed analysis.
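
As a rough illustration of that progression (a sketch only; a real web client would refresh a single canvas rather than draw separate figures):

# Progressive-rendering sketch: coarse level first, then progressively finer ones
for i in (7, 5, 3, 1, 0):
    visualize_level(i)  # A web app would replace the displayed image in place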

Interactive Web Map with Automatic Level Selection

Now let’s put everything together by creating a polished interactive web map using ipyleaflet. This demonstrates a real-world application of overviews for geospatial visualisation.

Key features:

  • Automatic level selection: As you zoom in or out, the map automatically switches to the most appropriate overview level based on the current zoom level and ground resolution
  • Manual override: Use the slider to manually select a specific level if you want to compare quality at different resolutions
  • Standard map controls: Pan by clicking and dragging, zoom with mouse wheel or +/- buttons, switch basemap layers
  • Real-time information: The label shows which level is displayed, its dimensions, cell size, and current ground resolution

The automatic selection algorithm works as follows (see the sketch after this list):

  1. Calculating the ground resolution (metres per pixel) at the current Web Mercator zoom level
  2. Comparing it to the cell_size metadata of each overview level
  3. Selecting the level whose cell_size is closest to the ground resolution

This ensures you always see the optimal balance between image quality and loading performance for your current view.
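
A minimal sketch of that rule, assuming the standard Web Mercator scale of ~156543.03 m/px at the equator at zoom 0 (halved at each zoom step and scaled by the cosine of latitude); the pick_level helper is hypothetical, not part of ipyleaflet_multiscales:

import math

# Hypothetical helper: choose the overview level that best matches the view
def pick_level(zoom, latitude, layout):
    ground_res = 156543.03 * math.cos(math.radians(latitude)) / (2 ** zoom)  # m/px
    # Pick the level whose x cell_size is closest to the ground resolution
    return min(layout, key=lambda e: abs(e["cell_size"][0] - ground_res))["id"]

print(pick_level(7, 70.66, multiscales["layout"]))  # e.g. 'L5' at this scene's latitude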

# Create Interactive Map with Smart Overview Level Selection
from ipyleaflet_multiscales import create_interactive_map

# Get metadata from the base level (L0)
metadata = overview_datasets["L0"].b02.attrs

# Define RGB band names (Sentinel-2: R=b04, G=b03, B=b02)
band_names = {"r": "b04", "g": "b03", "b": "b02"}

# Create the interactive map
# - initial_level=4: Start with L4 (good balance between quality and performance)
# - initial_zoom=10: Start zoomed to show the full area
# - band_names: Specify which bands to use for RGB composite
map_widget = create_interactive_map(
    overview_datasets=overview_datasets,
    multiscales=multiscales,
    metadata=metadata,
    initial_level=5,
    initial_zoom=7,
    band_names=band_names
)

# Display the map
# The label now shows: Level | Pixel dimensions | Cell size | Zoom | Ground resolution
display(map_widget)
🗺️  Creating interactive map with overview levels...
   Center: [70.6556, -22.7698] (lat/lon)
   Bounds: [[70.1988, -24.3543], [71.1124, -21.1852]]
Creating initial overlay for L5...
   Image shape: (343, 343, 3)
✓ Map created successfully!
   - Zoom in/out to automatically switch overview levels (smart selection based on cell_size)
   - Or use the slider to manually select a level

How to use the interactive map:

  1. Zoom in/out using the +/- buttons, mouse wheel, or double-click - watch as the overview level automatically adjusts to match your zoom level

  2. Pan the map by clicking and dragging to explore different areas

  3. Use the slider to manually select a specific overview level if you want to compare quality

  4. Monitor the label which shows:

    • Level ID and dimensions: e.g. “L5 (343×343 px)”
    • Downsampling factor: e.g. “32× downsampled”
    • Cell size: The spatial resolution in metres (e.g. “320.0m” means each pixel represents 320m on the ground)
    • Current zoom: The Web Mercator zoom level (typically 1-18)
    • Ground resolution: The actual pixel size at the current zoom level (e.g. “84.5m/px”)

Try zooming in and out to see how the system automatically switches between levels to maintain both visual quality and responsiveness!


💪 Now it is your turn

Now that you’ve learned how to visualise and interact with multiscale overviews, try these exercises to deepen your understanding:

Task 1: Experiment with different band combinations

Instead of true colour RGB (b04, b03, b02), try creating false colour composites that highlight different features:

  • Colour infrared (CIR): {"r": "b08", "g": "b04", "b": "b03"} - highlights vegetation in red tones
  • Agriculture composite: {"r": "b08", "g": "b03", "b": "b02"} - emphasises crop health and vigour
  • Urban analysis: {"r": "b08", "g": "b04", "b": "b02"} - distinguishes urban areas from vegetation

Modify the band_names dictionary in the map creation code above and re-run the cell to see how different band combinations reveal different information about the landscape.
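
For instance, a colour-infrared version of the map could be created like this (a sketch that reuses the exact call from above with a different band_names dictionary):

# Sketch: re-create the map as a colour-infrared (CIR) composite
cir_bands = {"r": "b08", "g": "b04", "b": "b03"}  # NIR/red/green mapped to R/G/B
cir_map = create_interactive_map(
    overview_datasets=overview_datasets,
    multiscales=multiscales,
    metadata=metadata,
    initial_level=5,
    initial_zoom=7,
    band_names=cir_bands,
)
display(cir_map)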

Task 2: Apply this workflow to a different area

Go back to the previous tutorial (65_create_overviews.ipynb) and create overviews for a different Sentinel-2 scene from your region of interest. Then return to this notebook and update the s3_zarr_name variable to visualise your new dataset. Compare how the overview structure works for different landscapes (urban vs rural, mountainous vs flat, etc.).

Task 3: Analyse the performance tradeoff

Measure how long it takes to load and display different overview levels. Add timing code like:

import time
start = time.time()
visualize_level(0)  # L0 - full resolution
print(f"L0 took {time.time()-start:.2f} seconds")

start = time.time()
visualize_level(5)  # L5 - much coarser
print(f"L5 took {time.time()-start:.2f} seconds")

How much faster is L5 compared to L0? What about L7? At what level do you think the quality becomes too low for useful analysis?
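
To collect timings for every level in one pass, a small loop such as this sketch may help:

# Sketch: measure display time for all overview levels
import time
for i in range(8):
    start = time.time()
    visualize_level(i)
    print(f"L{i}: {time.time() - start:.2f} seconds")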


Conclusion

In this tutorial, we’ve learned how to visualise and interact with multiscale overview pyramids for GeoZarr datasets. We covered:

  1. Understanding the hierarchy: How to parse and inspect multiscales metadata following the GeoZarr specification
  2. Loading levels dynamically: How to iterate through the layout metadata and load different resolution levels
  3. Creating RGB composites: How to combine spectral bands with proper contrast enhancement and NaN handling
  4. Interactive exploration: How to build responsive widgets for exploring different zoom levels
  5. Professional web mapping: How to create maps with automatic level selection based on zoom and ground resolution

Key takeaways:

  • Overviews enable efficient multi-scale visualisation by providing progressively coarser representations of large datasets
  • Automatic level selection ensures optimal performance while maintaining visual quality appropriate for the current view
  • Proper NaN handling is critical - failing to filter NaN values before percentile calculations causes white or blank displays
  • The cell_size metadata enables intelligent zoom-aware rendering by matching overview resolution to ground resolution

These techniques are essential for building interactive Earth Observation applications that remain responsive even when working with very large satellite imagery datasets.


What’s next?

Now that you’ve mastered creating and visualising GeoZarr overviews through tutorials 65 and 66, you can:

  • Apply these techniques to your own datasets: Use the complete workflow on your Earth Observation data from any source (Sentinel-2, Landsat, commercial satellites, etc.)
  • Build custom web applications: The ipyleaflet_multiscales module provides a foundation for developing interactive mapping tools tailored to specific needs
  • Optimise for your use case: Test different chunk sizes, compression algorithms, and scale factors to find the best balance between file size and access performance
  • Scale to cloud platforms: The workflow demonstrated here already uses S3 cloud object storage, which enables web-scale access and can be adapted to other cloud providers (Azure Blob, Google Cloud Storage)

Further resources: