NASA’s Harmonized Landsat and Sentinel-2 (HLS) dataset is now hosted on Microsoft’s Planetary Computer, bringing a multi-petabyte, harmonized archive of 30‑meter, analysis‑ready surface reflectance imagery into a scalable Azure environment and making high‑frequency Earth observation workflows easier to run at cloud scale. This new availability lets researchers, developers, and enterprise teams combine HLS’s frequent revisit, harmonized spectral response, and standardized Cloud‑Optimized GeoTIFF (COG) layout with Microsoft’s Planetary Computer APIs, Azure storage, and Azure AI/ML services — unlocking faster time‑series analytics, AI model training, and interactive geospatial applications.
Background: what HLS is and why it matters
HLS is a NASA‑led project that blends data from the Landsat family (OLI/OLI‑2 on Landsat‑8/9) and the European Copernicus Sentinel‑2 MSI sensors into a single, harmonized surface reflectance product. The goal is to remove sensor‑to‑sensor differences (radiometric, spectral, geometric, and BRDF) so Landsat and Sentinel‑2 observations can be treated like a single continuous record at a common 30‑meter grid. That harmonization enables higher temporal density and more consistent multi‑sensor time series for land‑surface monitoring.

Prior to harmonization, analysts had to reconcile differences in bandpasses, view angles, and atmospheric corrections when fusing Landsat and Sentinel‑2. HLS handles those steps — atmospheric correction, cloud/cloud‑shadow masking, spectral bandpass adjustment, view‑angle normalization, and reprojection to a common Sentinel‑2 MGRS tiling — yielding two principal product families: HLSS30 (Sentinel‑derived, resampled to 30 m) and HLSL30 (Landsat‑derived, adjusted to the same grid). Those products are delivered as per‑band COGs and packaged with QA layers for robust, cloud‑native analysis.
The practical payoff is simple: instead of waiting 5–16 days for a single platform to revisit, HLS can supply near‑global 30‑meter observations every two to three days by fusing multiple satellites — a step change for agriculture, ecosystem monitoring, disaster response, and near‑real‑time environmental intelligence. HLS V2.0, completed in 2023 and reaching global coverage (outside Antarctica), improved atmospheric correction and cloud masking relative to earlier releases and extended the archive back to 2013.
What Microsoft hosting changes: technical details and access
How the data are stored and organized on Azure
On Microsoft’s Planetary Computer and its AI for Earth dataset collection, HLS assets are stored as Cloud‑Optimized GeoTIFFs (COGs) inside Azure Blob Storage (East US 2 in the published configuration). File naming follows the HLS convention (e.g., HLS.S30.T16TDL.2019206.v1.4_03.tif) and is organized by product family (L30 or S30), MGRS tile, acquisition day, collection version, and band/subdataset. Microsoft’s dataset page exposes a container listing, recommended access patterns, and sample notebooks for reading COGs directly in Python.

Microsoft provides read‑only access paths and programmatic SAS (Shared Access Signature) token endpoints for scripted use, plus example mounting/IO recipes for consuming HLS assets inside Azure compute. Because the HLS archive runs from hundreds of terabytes into the petabyte range, Microsoft explicitly recommends performing large‑scale processing in the same Azure region where the data are hosted, both to avoid egress charges and to gain the performance benefits of local blob access.
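The naming convention carries enough metadata to drive discovery without opening any file. A minimal Python sketch of parsing it, with the field layout inferred from the example name above (verify against the HLS user guide for your collection version):

```python
import re
from datetime import datetime, timedelta

# Field layout assumed from the example name in the text:
# HLS.<product>.T<tile>.<YYYYDDD>.<version_band>.tif
HLS_NAME = re.compile(
    r"HLS\.(?P<product>[LS]30)\.T(?P<tile>\w{5})\."
    r"(?P<year>\d{4})(?P<doy>\d{3})\.(?P<rest>.+)\.tif$"
)

def parse_hls_name(name: str) -> dict:
    m = HLS_NAME.match(name)
    if not m:
        raise ValueError(f"not an HLS COG name: {name}")
    # Acquisition day is encoded as year + day-of-year
    date = datetime(int(m["year"]), 1, 1) + timedelta(days=int(m["doy"]) - 1)
    return {
        "product": m["product"],          # L30 (Landsat) or S30 (Sentinel-2)
        "mgrs_tile": m["tile"],           # Sentinel-2 MGRS tile id
        "date": date.date().isoformat(),  # ISO acquisition date
        "version_band": m["rest"],        # collection version and band suffix
    }

info = parse_hls_name("HLS.S30.T16TDL.2019206.v1.4_03.tif")
```

A parser like this makes it easy to group a container listing by tile, product family, and date before any bytes are read.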
Dataset products and index layers
The HLS distribution on Planetary Computer covers the main reflectance bands, thermal bands (for L30), and QA layers. It also includes the newer HLS vegetation‑index suite (NDVI, EVI, SAVI, NDMI, NBR, and others), rolled out as part of the V2.0 productization. Each per‑band COG includes embedded metadata and external .met ECS metadata to support discoverability and reproducible processing. The HLS tiling and gridding use Sentinel‑2’s MGRS (109.8 km tiles with overlap), so existing Sentinel‑2 workflows translate directly to HLS tiles.

Programmatic access and sample flows
Common access methods provided in the documentation and examples include:
- Direct HTTP(S) reads of COGs, with SAS tokens for ephemeral authenticated access.
- Mounting Azure blobs into compute VMs or containers (recommended within East US 2).
- Leveraging Planetary Computer’s APIs and STAC metadata (where available) to discover items by tile, date, or footprint.
- Using sample Python notebooks that load HLS COGs into xarray/rasterio and feed them into Azure Machine Learning or compute clusters for model training.
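Under the layout described above, discovery can be as simple as filtering a container listing by MGRS tile and acquisition date. A sketch with hypothetical blob names (real listings come from the dataset’s blob container or STAC endpoint):

```python
from datetime import date, datetime, timedelta

def hls_date(name: str) -> date:
    """Extract the acquisition date from the YYYYDDD field of an HLS name."""
    stamp = name.split(".")[3]  # e.g. '2019206' in HLS.S30.T16TDL.2019206...
    return (datetime(int(stamp[:4]), 1, 1)
            + timedelta(days=int(stamp[4:]) - 1)).date()

def select_assets(listing, tile, start, end):
    """Keep blob names for one MGRS tile within the [start, end] window."""
    return [n for n in listing
            if f".T{tile}." in n and start <= hls_date(n) <= end]

# Hypothetical listing for illustration only
names = [
    "HLS.S30.T16TDL.2019206.v1.4_03.tif",
    "HLS.L30.T16TDL.2019210.v1.4_03.tif",
    "HLS.S30.T16TDM.2019206.v1.4_03.tif",
]
picked = select_assets(names, "16TDL", date(2019, 7, 1), date(2019, 7, 31))
```

Where STAC metadata is available, the same tile/date/footprint filtering can be delegated to the Planetary Computer APIs instead of a raw listing.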
Why this matters: new capabilities unlocked by cloud hosting
Hosting HLS on Microsoft’s Planetary Computer brings three practical advantages that matter to both researchers and production teams:
- Scale and locality — process petabyte‑scale time series in‑cloud without downloading terabytes of imagery, significantly reducing turnaround time for model training and inference. Microsoft’s platform guidance explicitly recommends co‑located compute for cost and speed.
- Native cloud‑friendly formats — HLS is provided as COGs and compatible with STAC‑style discovery patterns; this unblocks modern data pipelines built on Dask, xarray, and cloud‑native GIS tools.
- AI and analytics integration — colocating HLS with Azure AI and Azure OpenAI services lets teams prototype models that combine deep learning, time‑series analysis, and prompt‑driven exploration (for example, NASA’s Earth Copilot prototype). This creates a pathway to build production geospatial AI applications — automated land‑cover classification, crop stress early warning, flood damage assessment, or near‑real‑time change detection.
Practical use cases and examples
Agriculture and crop monitoring
HLS’s 2–3 day effective revisit enables field‑scale phenology monitoring and more responsive crop health tracking. Frequent, harmonized reflectance means you can compute vegetation indices like NDVI with reduced bias when observations switch between Landsat and Sentinel‑2 — critical for automated irrigation advisories, pest detection algorithms, and yield forecasting. Hosted on Azure, these analyses can be executed across thousands of fields in parallel.

Disturbance detection and disaster response
For wildfire burn mapping, flood inundation mapping, and storm impact assessments, HLS provides denser temporal coverage and the harmonized consistency needed for rapid pre/post‑event change detection. Combining HLS with Azure’s compute and serverless functions enables near‑real‑time ingestion and automated analytic pipelines that feed dashboards for emergency managers.

Long‑term land‑cover change and carbon accounting
The HLS archive extends back to 2013 for Landsat inputs and 2015 for Sentinel‑2 inputs, offering a decade‑plus record at a consistent 30‑meter grid — a sweet spot for regional carbon monitoring, forest‑loss mapping, and land‑use transition studies. The harmonized nature reduces the artificial discontinuities that can plague historical analyses when sensors change.

Validation, known limitations, and scientific caveats
HLS’s harmonization pipeline is sophisticated, but it is not magic. A published assessment of HLS V2.0 shows strong reductions in between‑sensor differences for the red, NIR, and SWIR bands — with same‑day reflectance differences falling below roughly 4.2% for those bands — but the blue and green bands retain slightly larger residual differences, due in part to the challenges of atmospheric correction at shorter visible wavelengths. Users doing radiometrically precise work in coastal, aquatic, or atmosphere‑sensitive bands should validate HLS outputs against in‑situ measurements or sensor‑specific products for their application.

The HLS product pipeline uses USGS/NASA atmospheric correction code and produces QA masks, but cloud and cloud‑shadow masking remains an area where errors can affect time‑series analytics, particularly in persistently cloudy regions. Analysts must include QA filtering and consider temporal compositing strategies to reduce false positives in index‑based alarms. The HLS documentation and algorithm theoretical basis documents describe these behaviors and recommended filtering approaches.
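As a sketch of that QA filtering, the following computes NDVI only over pixels the Fmask layer flags as clear. The bit positions (cloud in bit 1, cloud shadow in bit 3), the 0.0001 reflectance scale factor, and the −9999 fill value follow the HLS V2.0 user guide, but verify them against the documentation for the collection you actually use:

```python
import numpy as np

# Fmask bit positions assumed from the HLS V2.0 user guide -- verify
# against the guide for your collection version.
CLOUD_BIT, SHADOW_BIT = 1, 3

def clear_mask(fmask):
    """True where a pixel is flagged neither cloud nor cloud shadow."""
    cloudy = (fmask >> CLOUD_BIT) & 1
    shadow = (fmask >> SHADOW_BIT) & 1
    return (cloudy == 0) & (shadow == 0)

def masked_ndvi(red, nir, fmask, fill=-9999, scale=1e-4):
    """NDVI from scaled HLS reflectance; NaN where filled or not clear."""
    bad = (red == fill) | (nir == fill) | ~clear_mask(fmask)
    r = np.where(bad, np.nan, red * scale)
    n = np.where(bad, np.nan, nir * scale)
    return (n - r) / (n + r)

red   = np.array([ 500,   600, -9999], dtype=np.int16)
nir   = np.array([3000,  3600,  4000], dtype=np.int16)
fmask = np.array([0b000, 0b010, 0b000], dtype=np.uint8)  # middle pixel cloudy
vi = masked_ndvi(red, nir, fmask)
```

In production the same mask feeds temporal compositing, so isolated masking errors on single dates are less likely to trigger false index‑based alarms.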
Finally, there’s a product latency angle: HLS V2.0 latency targets are around 1.7 days from overpass to LP DAAC availability if all Level‑1 and auxiliary data arrive on time. Low‑latency HLS products and the vegetation index suite are being rolled out progressively; users should confirm the version and latency characteristics of the specific HLS tiles they rely on, because operational decisioning requires awareness of product timing guarantees.
Risks and governance considerations for cloud hosting
Moving HLS to a major cloud provider unlocks power, but it also introduces governance and operational risks that teams must manage carefully.
- Platform dependency and service changes. While Planetary Computer makes data accessible, Microsoft’s platform choices, region availability, API updates, or product reorientation can affect long‑term workflows. Community discussion shows that Planetary Computer services and tooling evolve; teams should keep an exit strategy and local caching for mission‑critical workflows.
- Cost and compute planning. Cloud compute, storage egress, and parallel processing costs must be budgeted. Co‑locating compute in the same Azure region reduces egress, but large multi‑tile processing will still generate significant compute charges. Effective pipeline design (tiling strategies, lazy evaluation using Dask, and spot/low‑priority VMs) helps control spend.
- Data licensing and usage terms. HLS source data come with the original Landsat/Sentinel licenses and government distribution policies; Microsoft surfaces the HLS dataset “as‑is” and disclaims warranties. Teams building commercial products should confirm licensing and attribution requirements, and consider how external dependencies (e.g., USGS, ESA data policies) influence commercial uses.
- Reproducibility and versioning. HLS uses collection versioning (e.g., v2.0, v1.4). Scientific reproducibility requires recording exact collection version, date ranges, tile IDs, and processing flags; automated pipelines should bake those identifiers into metadata so results remain traceable despite dataset updates.
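One way to bake those identifiers into pipeline metadata is a small provenance record hashed over its own contents. The field names below are illustrative rather than a fixed schema:

```python
import hashlib
import json

def provenance_record(collection, tiles, date_range, qa_flags):
    """Bundle the identifiers a run depends on into one traceable record."""
    record = {
        "collection": collection,        # exact HLS collection/version id
        "mgrs_tiles": sorted(tiles),     # sorted for a stable digest
        "date_range": list(date_range),  # [start, end] ISO dates
        "qa_flags": qa_flags,            # masking thresholds applied
    }
    payload = json.dumps(record, sort_keys=True).encode()
    # Digest pins the exact inputs, so results stay traceable even after
    # dataset updates
    record["digest"] = hashlib.sha256(payload).hexdigest()
    return record

rec = provenance_record("HLS.L30.v2.0", ["16TDM", "16TDL"],
                        ("2024-06-01", "2024-06-30"),
                        {"mask": "cloud,shadow"})
```

Writing such a record next to every derived output makes later audits a matter of comparing digests rather than reconstructing run configurations.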
How to get started — practical checklist
- Identify the tiles and date range you need using the Sentinel‑2 MGRS tiling grid and HLS tile lookup scripts.
- Check which HLS collection/version (L30 or S30, and V2.x) contains your target period; record collection identifiers for reproducibility.
- Prototype locally with sample COGs or the provided notebooks to validate your QA filtering and index calculations. Use rasterio, rio‑cogeo, and xarray to stream COGs in Python.
- Move heavy processing to Azure compute in the same region (East US 2 if using the published HLS blob), and use SAS tokens or managed identities for secure blob access. Consider Azure Batch, Azure ML, or Databricks for distributed runs.
- Implement cost controls: limit parallel workers, use spot instances for non‑critical jobs, and validate egress minimization by keeping intermediate outputs inside Azure storage.
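For the validation step in the checklist, a QA‑aware temporal composite is a common way to suppress residual cloud contamination before inspecting index time series. A minimal numpy sketch (array shapes and values are illustrative):

```python
import numpy as np

def median_composite(stack, clear):
    """Per-pixel temporal median using only QA-clear observations.

    `stack` is a (time, y, x) index cube (e.g. NDVI) and `clear` is a
    boolean mask of the same shape derived from the QA layer.
    """
    masked = np.where(clear, stack, np.nan)   # drop flagged observations
    return np.nanmedian(masked, axis=0)       # robust to isolated outliers

stack = np.array([[[0.2, 0.4]],
                  [[0.8, 0.9]],               # second date: cloudy spike
                  [[0.5, 0.6]]])
clear = np.array([[[True,  True]],
                  [[False, False]],
                  [[True,  True]]])
comp = median_composite(stack, clear)
```

The median is a simple, robust choice; alternatives such as maximum‑value or weighted composites trade robustness for sensitivity, depending on the application.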
The human + AI angle: Earth Copilot and natural‑language discovery
An immediate synergy from the NASA–Microsoft collaboration is Earth Copilot, an AI prototype that demonstrates how natural‑language queries and Azure OpenAI can simplify dataset discovery and early analysis workflows. Earth Copilot acts as a conversational layer that translates questions (e.g., “show crop stress in county X in June 2024”) into dataset discovery, filtering, and visualization actions — which is particularly valuable for non‑specialist stakeholders. The prototype is currently being evaluated inside NASA, and it illustrates the potential for prompt‑driven access on top of Planetary Computer datasets like HLS.

While Earth Copilot showcases convenience, teams should be cautious about AI hallucinations and the need for domain validation. Prompt outputs must be traced back to deterministic pipeline steps (tile IDs, collection versions, QA thresholds). AI can accelerate discovery, but reproducible scientific workflows still require explicit code, provenance, and validation steps.
Final assessment: strengths, trade‑offs, and recommended next steps
The public arrival of HLS on Microsoft’s Planetary Computer is an important milestone for cloud‑native remote sensing. Key strengths include:
- Massive accessibility to a harmonized, high‑frequency archive in cloud‑native form, removing friction for scaling analyses.
- Interoperability with standard tiling (MGRS), COG formats, and STAC‑style discovery patterns that fit today’s geospatial data stacks.
- AI and application integration that enables prototypes like Earth Copilot and production AI/ML workflows on Azure.
Key trade‑offs:
- Residual radiometric differences in blue/green bands and cloud‑masking edge cases require domain validation for sensitive applications.
- Operational dependency on a commercial cloud provider — teams should prepare fallback/archival strategies and account for service evolution or regional availability changes.
- Cost and governance — cloud compute and data egress management, plus license and compliance considerations for downstream use, must be baked into program budgets and policies.
Recommended next steps:
- Start with a small pilot: pick a few tiles and a well‑understood application (e.g., an NDVI time series for a set of fields) and validate end‑to‑end performance and cost in East US 2.
- Codify provenance: always store tile IDs, collection version, processing flags, and SAS/credential snapshots for reproducibility.
- Combine domain expertise with AI: use Earth Copilot–style interfaces to accelerate discovery, but require deterministic pipelines for final results and decisions.
The combination of NASA’s Harmonized Landsat and Sentinel‑2 (HLS) dataset and Microsoft’s Planetary Computer creates a durable platform for scaling geospatial science and operational monitoring. For developers and analysts ready to run at scale, HLS on Azure lowers the barrier to building high‑cadence, high‑resolution environmental analytics — provided teams pair the new convenience with rigorous validation, version control, and sensible cloud governance practices.
Source: NASA Earthdata (.gov) Harmonized Landsat and Sentinel-2 (HLS) Data Now Available on Microsoft's Planetary Computer | NASA Earthdata