This tutorial demonstrates how to apply NASA’s Harmonized Landsat Sentinel-2 (HLS) data for habitat monitoring in R. It shows how to query the HLS time series and how to extract and analyse useful information at the habitat level.
The tutorial is adapted from NASA’s tutorial(1) for a Finnish use case by Pekka Hurskainen and Mikko Kervinen (Finnish Environment Institute Syke). This work was supported by the Finnish Ecosystem Observatory (FEO) project, funded by the Ministry of the Environment. Many thanks to Janne Mäyrä (Syke) for helping with Github essentials.
The Harmonized Landsat Sentinel-2 (HLS) project is a NASA initiative to produce a consistent, harmonized surface reflectance product from data acquired by the Operational Land Imager (OLI) aboard Landsat 8 and the Multi-Spectral Instrument (MSI) aboard Sentinel-2. All the necessary radiometric, spectral, geometric, and spatial corrections have been applied so that the HLS products form a seamless time series that is stackable and comparable across sensors.
Dense HLS time series create unique and exciting opportunities to monitor and map dynamics in land surface properties with unprecedented spatial detail. Applications such as land cover change, agricultural management, disaster response, water resources, and vegetation phenology will benefit greatly from the higher temporal resolution of this dataset.
NASA’s Land Processes Distributed Active Archive Center (LP DAAC) archives and distributes HLS products in the LP DAAC Cumulus cloud archive as Cloud Optimized GeoTIFFs (COG). The primary objective of this tutorial is to show how to query and subset HLS data using the NASA CMR-STAC application programming interface (API). Because the archive is publicly available in the LP DAAC Cumulus cloud, you no longer need to download the HLS source data files for your research. STAC allows you to subset your desired data collection by region, time, and band, so you only download the data you really want.
Here we demonstrate, through one simple case study, how to monitor habitat sites with satellite image time series. As an example, we use a polygon of a seashore meadow (merenrantaniitty) habitat from Lumijoki, Finland. It is one of the largest continuous meadow habitats in Finland.
We use satellite images and the NDVI index from the past 10 years to characterize vegetation dynamics. The hypothesis is that by examining the time series and trends of NDVI, we can capture not only the phenological patterns of the meadow vegetation but potentially also spot the times when mowing took place.
In this tutorial, a case study is used to show how to process (calculate NDVI), visualize, and calculate statistics for an NDVI time series derived from HLS data over a region of interest. Bad quality pixels (flagged as clouds or cloud shadows in the quality assessment layer) as well as images outside growing season are filtered out before analysis. The tutorial also shows how to export the statistics so you will have all of the information you need for your area of interest without having to download the source data files.
1. Daily 30 meter (m) global HLS Sentinel-2 Multi-Spectral Instrument Surface Reflectance - HLSS30.002
Science Dataset (SDS) layers:
- B8A (NIR Narrow)
- B04 (Red)
- Fmask (Quality)
2. Daily 30 meter (m) global HLS Landsat-8 OLI Surface Reflectance - HLSL30.002
Science Dataset (SDS) layers:
- B05 (NIR)
- B04 (Red)
- Fmask (Quality)
R and RStudio are required to execute this tutorial. Installation details can be found here.
A NASA Earthdata Login account is required to access the data used in this Tutorial. You can create an account here.
Setting up a netrc file:
Sys.setenv("HOME" = "YOUR DIRECTORY")
and
Sys.setenv("userprofile" = "YOUR DIRECTORY")
# source("earthdata_netrc_setup.R")
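If you prefer to set the file up yourself instead of running the script, the snippet below is a minimal sketch of what it produces: a one-line .netrc file pointing at urs.earthdata.nasa.gov. The credentials shown are placeholders; replace them with your own Earthdata login.
# Minimal sketch of creating the netrc file manually; earthdata_netrc_setup.R
# does essentially the same thing interactively. YOUR_USERNAME and YOUR_PASSWORD
# are placeholders for your Earthdata credentials.
netrc_path <- file.path(Sys.getenv("HOME"), ".netrc")
if (!file.exists(netrc_path)) {
  writeLines("machine urs.earthdata.nasa.gov login YOUR_USERNAME password YOUR_PASSWORD",
             netrc_path)
}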
Check the version of R by typing version into the console, and the version of RStudio by typing RStudio.Version() into the console, and update them if needed. This tutorial has been tested on Windows using R version 4.2.2 and RStudio version 2022.7.2.576.
The required packages are:
- `terra`
- `imager`
- `leaflet`
- `jsonlite`
- `sp`
- `rasterVis`
- `httr`
- `ggplot2`
- `RColorBrewer`
- `dygraphs`
- `xts`
- `xml2`
- `lubridate`
- `DT`
- `rstac`
- `jpeg`
Run the cell below to identify any missing packages to install, and then load all of the required packages. Note the .libPaths() call: change it to your custom library folder, or disable it if you don't have one.
# If you have a custom folder for R libraries set it here
.libPaths("D:/Software/Rlibs")
packages <- c('terra','jsonlite','sp','httr', 'leaflet',
'rasterVis','ggplot2','magrittr','RColorBrewer','xml2','dygraphs',
'xts','lubridate','DT','rmarkdown', 'rprojroot','imager', 'knitr', 'rstac')
# Identify missing (not installed) packages
new.packages = packages[!(packages %in% installed.packages()[,"Package"])]
# Install new (not installed) packages
if(length(new.packages)) install.packages(new.packages, repos='http://cran.rstudio.com/') else print('All required packages are installed.')
[1] "All required packages are installed."
Next, load all packages using the library() function.
invisible(lapply(packages, library, character.only = TRUE))
#clear unnecessary variables
rm(new.packages, packages)
Create an output directory for the results.
# Create an output directory if it doesn't exist
outDir <- file.path("R_Output", fsep="/")
suppressWarnings(dir.create(outDir))
We will use the LPCLOUD STAC API endpoint to query the HLS data by region of interest and time period of interest. In order to retrieve STAC Items that match your criteria, you need to define your query parameters.
To learn how to navigate through the structure of a CMR-STAC Catalog and define Search parameters, see Getting Started with NASA’s CMR-STAC API. Information on the specifications for adding parameters to a search query can be found here.
First, assign NASA’s STAC URL to an rstac query object. It contains all request parameters to be provided to the API service.
In this example we use NASA’s CMR-STAC (Common Metadata Repository SpatioTemporal Asset Catalog) with the LPCLOUD provider. More info: https://github.com/nasa/cmr-stac
#rstac::stac - create a /stac API endpoint for the NASA CMR and LPCLOUD provider.
#Version number of the stac is required.
s_obj <- stac("https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD", force_version = "1.0.0")
We need to assign a list of one or more Collection IDs (product short names) to include in the search query. Only Items in one of the provided Collections will be searched. Here, we are interested in both the HLS Landsat-8 and Sentinel-2 collections. To learn how to access Collection IDs in CMR-STAC, visit Getting Started with NASA’s CMR-STAC API.
See what collections are available in LPCLOUD with a GET request:
collections_query = rstac::collections(s_obj)
available_collections <- rstac::get_request(collections_query)
available_collections
###Collections
- collections (10 item(s)):
- MOD09GA.v061
- MOD09GQ.v061
- MYD09GA.v061
- MYD09GQ.v061
- MCD18A2.v061
- HLSL30.v2.0
- HLSS30.v2.0
- MCD43A1.v061
- MCD43A4.v061
- MOD11A1.v061
- field(s): id, stac_version, description, license, type, links, collections
From the available collections, we choose the HLS collections and add them to a new list() variable:
HLS_col <- list("HLSS30.v2.0", "HLSL30.v2.0")
Import the meadow shapefile. It has one habitat boundary with polygon geometry. Reproject to WGS84.
meadow.shp = terra::vect("D:\\Data\\FEO\\Habitat_monitoring_HLS\\Pitkänokka_3067.shp")
meadow.shp = project(meadow.shp, "EPSG:4326")
Then we can use the leaflet package to plot the polygon boundary on top of a basemap.
leaflet() %>%
addPolygons(data = meadow.shp, fill = FALSE) %>%
addProviderTiles(providers$Esri.WorldImagery) %>%
addMiniMap(zoomLevelFixed = 5)
This map lets us visualize our study area (region of interest, ROI), which is important to confirm. However, when we query CMR-STAC for a spatial subset, we need a parameter called a “bounding box”, defined by the lower-left and upper-right coordinates of the study area. As the name suggests, the bounding box is a rectangle described by four coordinates, and it covers a slightly larger area than the ROI. Below, we extract the extent of the input shapefile and use it to create the spatial search parameters.
roi <- terra::ext(meadow.shp)
# terra::ext() stores the extent as (xmin, xmax, ymin, ymax); the STAC bbox
# expects (xmin, ymin, xmax, ymax), hence the reordering below.
bbox = as.numeric(c(roi[1], roi[3], roi[2], roi[4]))
Next, we will need to define a temporal search query parameter. Note that temporal ranges should be specified as a pair of date-time values separated by a comma (,) or a forward slash (/). Each date-time value must have the format YYYY-MM-DDTHH:MM:SSZ. Additional information on setting temporal searches can be found in the NASA CMR Documentation.
In this example, we set the time period of interest from the beginning of May 2013 until the end of September 2023, i.e. we search essentially the entire catalog of HLS data.
datetime <- "2013-05-01T00:00:00Z/2023-09-30T23:59:59Z" # YYYY-MM-DDTHH:MM:SSZ/YYYY-MM-DDTHH:MM:SSZ
Now we are all prepared to submit a request to the CMR-STAC endpoint! We will do this using the rstac package. The output will show all the available data matching our search criteria based on our datetime and bounding box.
The rstac::stac_search() function implements the /search API endpoint. It prepares the query parameters for the search request as an rstac query object with all filter parameters, which we then pass to the post_request() function. The GeoJSON content returned by these requests is a doc_items object, a regular R list representing a STAC ItemCollection document.
#First we construct the search body for POST request.
search.body.rstac = stac_search(q = s_obj,
collections = HLS_col,
datetime = datetime,
bbox = bbox,
limit = 100)
#send POST request, and save it to executed_stac_query variable.
executed_stac_query = post_request(search.body.rstac, encode="json")
executed_stac_query
###Items
- matched feature(s): 4854
- features (100 item(s) / 4754 not fetched):
- HLS.L30.T34WFT.2013121T094213.v2.0
- HLS.L30.T35WMM.2013121T094213.v2.0
- HLS.L30.T35WMN.2013121T094213.v2.0
- HLS.L30.T34WFS.2013121T094237.v2.0
- HLS.L30.T35WMN.2013137T094225.v2.0
- HLS.L30.T35WMM.2013137T094225.v2.0
- HLS.L30.T34WFT.2013137T094225.v2.0
- HLS.L30.T34WFS.2013137T094249.v2.0
- HLS.L30.T35WMM.2013144T094839.v2.0
- HLS.L30.T34WFS.2013144T094839.v2.0
- ... with 90 more feature(s).
- assets:
B01, B02, B03, B04, B05, B06, B07, B09, B10, B11, browse, Fmask, metadata, SAA, SZA, VAA, VZA
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type
Note that only a limited number of search results are returned in the first “page” (here, 100 of 4854 features). To get all results, we pass the response to items_fetch(), which automatically retrieves the remaining pages.
(Note 2: it took us a lot of time to figure out how to fetch all results. The original NASA tutorial did not implement this and used the httr library instead. The good thing about the rstac library is its user-friendly helper tools, like items_fetch(), which does exactly what we need:)
executed_stac_query_full = items_fetch(executed_stac_query, encode="json")
|
|======================================================================| 100%
executed_stac_query_full
###Items
- matched feature(s): 4854
- features (4854 item(s) / 0 not fetched):
- HLS.L30.T34WFT.2013121T094213.v2.0
- HLS.L30.T35WMM.2013121T094213.v2.0
- HLS.L30.T35WMN.2013121T094213.v2.0
- HLS.L30.T34WFS.2013121T094237.v2.0
- HLS.L30.T35WMN.2013137T094225.v2.0
- HLS.L30.T35WMM.2013137T094225.v2.0
- HLS.L30.T34WFT.2013137T094225.v2.0
- HLS.L30.T34WFS.2013137T094249.v2.0
- HLS.L30.T35WMM.2013144T094839.v2.0
- HLS.L30.T34WFS.2013144T094839.v2.0
- ... with 4844 more feature(s).
- assets:
B01, B02, B03, B04, B05, B06, B07, B08, B09, B10, B11, B12, B8A, browse, Fmask, metadata, SAA, SZA, VAA, VZA
- item's fields:
assets, bbox, collection, geometry, id, links, properties, stac_extensions, stac_version, type
cat('The full search found', length(executed_stac_query_full$features), 'images.')
The full search found 4854 images.
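As an optional quick look (not part of the original tutorial), the rstac helper items_datetime() lists the acquisition dates of the fetched items:
# Peek at the acquisition dates of the first few fetched items.
head(rstac::items_datetime(executed_stac_query_full))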
Next we go through the full list of results and filter out images that fall outside the growing season, i.e. January-May and October-December. Optionally, May could be included as well, but snow and ice could still be present in those images and would require extra filtering.
#First, create a new property for executed_stac_query_full called growing_season and give it value 1 if the image is inside the growing season, otherwise 0. The start_datetime property is matched against months 6-9 (June-September). This new property will be used for filtering.
for (i in 1:length(executed_stac_query_full$features)) {
#if(grepl("...-05-", executed_stac_query_full$features[[i]]$properties$start_datetime))
# {
# executed_stac_query_full$features[[i]]$properties$growing_season = 1
# }
if(grepl("...-06-", executed_stac_query_full$features[[i]]$properties$start_datetime))
{
executed_stac_query_full$features[[i]]$properties$growing_season = 1
}
else if(grepl("...-07-", executed_stac_query_full$features[[i]]$properties$start_datetime))
{
executed_stac_query_full$features[[i]]$properties$growing_season = 1
}
else if(grepl("...-08-", executed_stac_query_full$features[[i]]$properties$start_datetime))
{
executed_stac_query_full$features[[i]]$properties$growing_season = 1
}
else if(grepl("...-09-", executed_stac_query_full$features[[i]]$properties$start_datetime))
{
executed_stac_query_full$features[[i]]$properties$growing_season = 1
}
else
{
executed_stac_query_full$features[[i]]$properties$growing_season = 0
}
}
#Then use items_filter() helper function to filter out images outside growing period:
executed_stac_query_full = items_filter(executed_stac_query_full, properties$growing_season == 1)
cat('After filtering out images outside growing period, we remain with', length(executed_stac_query_full$features), 'images.')
After filtering out images outside growing period, we remain with 2723 images.
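As a side note, the month test above can be written more compactly. The sketch below is an alternative (not part of the original workflow) that parses start_datetime with lubridate, which is already loaded, and sets the same growing_season flag in one pass:
# Compact alternative to the if/else chain: extract the month and flag
# months 6-9 (June-September) as growing season.
for (i in seq_along(executed_stac_query_full$features)) {
  m <- lubridate::month(lubridate::ymd_hms(
    executed_stac_query_full$features[[i]]$properties$start_datetime))
  executed_stac_query_full$features[[i]]$properties$growing_season <- as.integer(m %in% 6:9)
}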
We can also easily filter out images with too much cloud cover, e.g. more than 25%, using the “eo:cloud_cover” property. You can allow more clouds to increase the number of images, or even fetch all available images regardless of cloud cover. In that case you need to filter cloudy pixels afterwards using Fmask, at the cost of longer downloading and processing times, and with the result that your time series would contain many “empty” images.
executed_stac_query_full = items_filter(executed_stac_query_full, properties$`eo:cloud_cover` < 25)
cat('After filtering out images with too much cloud cover, we remain with', length(executed_stac_query_full$features), 'images.')
After filtering out images with too much cloud cover, we remain with 628 images.
Now we can look at the name fields in the output, and list the assets of the retrieved items.
names(executed_stac_query_full)
[1] "type" "stac_version" "numberMatched" "numberReturned"
[5] "features" "links" "context"
items_assets(executed_stac_query_full)
[1] "B01" "B02" "B03" "B04" "B05" "B06"
[7] "B07" "B08" "B09" "B10" "B11" "B12"
[13] "B8A" "browse" "Fmask" "metadata" "SAA" "SZA"
[19] "VAA" "VZA"
Detailed information about the features in our response can be found in the features property. Let’s select the first feature and view its metadata content.
search_features <- executed_stac_query_full$features
Feature1 <- search_features[[1]]
Feature1 %>%
jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)
{
"type": "Feature",
"id": "HLS.L30.T34WFT.2013178T093612.v2.0",
"stac_version": "1.0.0",
"stac_extensions": "https://stac-extensions.github.io/eo/v1.0.0/schema.json",
"collection": "HLSL30.v2.0",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[25.414, 64.7707],
[25.4969, 65.2667],
[25.0931, 64.78],
[25.414, 64.7707]
]
]
},
"bbox": [25.0931, 64.7707, 25.4969, 65.2667],
"links": [
{
"rel": "self",
"href": {}
},
{
"rel": "self",
"href": "https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD/collections/HLSL30.v2.0/items/HLS.L30.T34WFT.2013178T093612.v2.0"
},
{
"rel": "parent",
"href": "https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD/collections/HLSL30.v2.0"
},
{
"rel": "collection",
"href": "https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD/collections/HLSL30.v2.0"
},
{
"rel": "root",
"href": "https://cmr.earthdata.nasa.gov/cloudstac/"
},
{
"rel": "provider",
"href": "https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD"
},
{
"rel": "via",
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G2243105404-LPCLOUD.json"
},
{
"rel": "via",
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G2243105404-LPCLOUD.umm_json"
}
],
"properties": {
"eo:cloud_cover": 18,
"datetime": "2013-06-27T09:36:12.277Z",
"start_datetime": "2013-06-27T09:36:12.277Z",
"end_datetime": "2013-06-27T09:36:36.155Z",
"growing_season": 1
},
"assets": {
"B05": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B05.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B05.tif"
},
"B11": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B11.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B11.tif"
},
"B06": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B06.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B06.tif"
},
"B01": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B01.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B01.tif"
},
"B10": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B10.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B10.tif"
},
"SAA": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.SAA.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.SAA.tif"
},
"B07": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B07.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B07.tif"
},
"VZA": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.VZA.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.VZA.tif"
},
"B03": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B03.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B03.tif"
},
"SZA": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.SZA.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.SZA.tif"
},
"B09": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B09.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B09.tif"
},
"B02": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B02.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B02.tif"
},
"VAA": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.VAA.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.VAA.tif"
},
"B04": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.B04.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.B04.tif"
},
"Fmask": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.Fmask.tif",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.Fmask.tif"
},
"browse": {
"href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/HLSL30.020/HLS.L30.T34WFT.2013178T093612.v2.0/HLS.L30.T34WFT.2013178T093612.v2.0.jpg",
"type": "image/jpeg",
"title": "Download HLS.L30.T34WFT.2013178T093612.v2.0.jpg"
},
"metadata": {
"href": "https://cmr.earthdata.nasa.gov/search/concepts/G2243105404-LPCLOUD.xml",
"type": "application/xml"
}
}
}
Next, let’s load the browse image to get a quick view of the first feature in the time series.
browse_image_url <- Feature1$assets$browse$href
browse <-load.image(browse_image_url)
plot(browse)
This section will demonstrate how to leverage the Region of Interest boundaries to perform a spatial subset as well as identify the bands we are interested in and access those data.
First, set up GDAL configuration options (via terra's setGDALconfig()) to access the cloud assets that we are interested in. You can learn more about these configuration options here.
setGDALconfig("GDAL_HTTP_UNSAFESSL", value = "YES")
setGDALconfig("GDAL_HTTP_COOKIEFILE", value = ".rcookies")
setGDALconfig("GDAL_HTTP_COOKIEJAR", value = ".rcookies")
setGDALconfig("GDAL_DISABLE_READDIR_ON_OPEN", value = "EMPTY_DIR")
setGDALconfig("CPL_VSIL_CURL_ALLOWED_EXTENSIONS", value = "TIF")
To subset assets by band, we will filter bands to include the NIR, Red, and Quality (Fmask) layers in the list of links to access. Below you can find the different band numbers for each of the two products. Additional information on HLS band allocations can be found here.
Sentinel-2: B8A (NIR Narrow), B04 (Red), Fmask (Quality)
Landsat 8: B05 (NIR), B04 (Red), Fmask (Quality)
Below, we build a searchable data table including links to the assets. The granule_list object is defined to store these links, and the final step combines them into the searchable table.
granule_list <- list()
n <- 1
for (item in 1:length(search_features)){ # Get the NIR, Red, and Quality band layer names
if (search_features[[item]]$collection == 'HLSS30.v2.0'){
ndvi_bands <- c('B8A','B04','Fmask')
}
else{
ndvi_bands <- c('B05','B04','Fmask')
}
for(b in ndvi_bands){
f <- search_features[[item]]
b_assets <- f$assets[[b]]$href
df <- data.frame(Collection = f$collection, # Make a data frame including links and other info
Granule_ID = f$id,
Cloud_Cover = f$properties$`eo:cloud_cover`,
band = b,
Asset_Link = b_assets, stringsAsFactors=FALSE)
granule_list[[n]] <- df
n <- n + 1
}
}
# Create a searchable datatable
search_df <- do.call(rbind, granule_list)
DT::datatable(search_df)
Accessing files in the cloud requires you to authenticate using your NASA Earthdata Login account. In the prerequisite section of this tutorial, a proper netrc file was set up by calling the earthdata_netrc_setup.R script.
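If the download steps later fail with authentication errors, a quick optional check is to confirm that the netrc file actually exists in the home directory used by this session:
# Optional sanity check: the netrc file from the prerequisite section should exist.
# (On some Windows setups the file may be named "_netrc" instead of ".netrc".)
file.exists(file.path(Sys.getenv("HOME"), ".netrc"))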
We need to find out in which UTM zone the ROI is, and set that zone as the target CRS for reprojecting data from HLS collections that might be in other UTM zones.
if(roi[1] >= 18 && roi[1] <= 24) {
print("The ROI is in UTM zone 34N")
target_crs = c("EPSG:32635")
} else if(roi[1] >= 24 && roi[1] <= 30) {
print("The ROI is in UTM zone 35N")
target_crs = c("EPSG:32635")
} else if(roi[1] >= 30 && roi[1] <= 36) {
print("The ROI is in UTM zone 36N")
target_crs = c("EPSG:32636")
} else {
print("The ROI is not in Finland.")
}
[1] "The ROI is in UTM zone 35N"
Reproject the ROI shapefile to the target CRS. Then create a SpatVector object from the ROI extent, to be used for the first cropping of the images. Also add a 100 m buffer, just in case.
meadow.shp.utm <- terra::project(meadow.shp, target_crs)
roi = vect(roi)
crs(roi) = "EPSG:4326"
roi = buffer(roi, 100)
crs(roi) = "EPSG:4326"
Below, we create a list of raster layers for each of our bands of interest (i.e., Red, NIR, and Fmask). The following steps are performed:
1. Create a reference geometry from the first image in search_df. All images will be resampled to this, as there might be small differences in extents.
2. In the for loop, the NIR, Fmask and Red bands are read into memory.
3. As a whole image takes a lot of memory, a first crop is done right away, using the buffered bounding box of the ROI.
4. The cropped image is reprojected to the target CRS (as defined above).
5. A second crop is done, now to the exact boundaries of the ROI, also masking out pixels outside the ROI.
6. The image is resampled to the reference geometry (defined before the for loop begins).
7. The processed image is saved to a list.
8. The date of the image, and an indication of whether it is Sentinel or Landsat, is saved in a separate list.
Note that the rast function makes a separate request for each data layer located in the Cumulus cloud archive. This takes time, and the more data layers we have in the time series, the longer this cell takes to run.
Also note the if statement at the beginning, which checks whether the NDVI stack has already been calculated, to prevent unnecessarily downloading all images again.
if(!exists("ndvi_stack")) {
red_stack <- nir_stack <- fmask_stack <- date_list <- list()
#Progress bar Disabled for Rmarkdown
#pb = txtProgressBar(min = 0, max = length(search_df$band), initial = 0, style = 3)
l <- m <- n <- 0
#Create a reference geometry
ref.geom = terra::rast(paste0('/vsicurl/', search_df$Asset_Link[1]))
ref.geom = terra::project(ref.geom, target_crs) #reproject to target CRS defined above
ref.geom = terra::crop(ref.geom, meadow.shp.utm, mask=T)
for (row in seq(length(search_df$band))){
# setTxtProgressBar(pb,row)
if (search_df$band[row] == 'B04'){
l = l+1
red = terra::rast(paste0('/vsicurl/', search_df$Asset_Link[row]))
red = crop(red, project(roi, crs(red, proj=T))) #first crop
red = terra::project(red, method="bilinear", target_crs) #reproject to target CRS defined above
red_crop = terra::crop(red, meadow.shp.utm, mask=T) #second crop
red_crop = terra::resample(red_crop, method="bilinear", ref.geom) #resample to the reference geometry
red_stack[[l]] = red_crop
doy_time = strsplit(sources(terra::rast(paste0('/vsicurl/', search_df$Asset_Link[row]))), "[.]")[[1]][14]
doy = substr(doy_time, 1, 7)
date = as.Date(as.integer(substr(doy,5,7)), origin = paste0(substr(doy,1,4), "-01-01"))
if (strsplit(sources(terra::rast(paste0('/vsicurl/', search_df$Asset_Link[row]))), "[.]")[[1]][12] == 'S30'){
date_list[[l]] = paste0('S',as.character(date))
}else{
date_list[[l]] = paste0('L',as.character(date))
}
rm (red, red_crop)
}else if (search_df$band[row] == 'Fmask'){
m = m+1
fmask = terra::rast(paste0('/vsicurl/', search_df$Asset_Link[row]))
fmask = crop(fmask, project(roi, crs(fmask, proj=T)))
fmask = terra::project(fmask, method="near", target_crs) #reproject to target CRS, use nearest neighbor for categorical data
fmask_crop = terra::crop(fmask, meadow.shp.utm, mask=T)
fmask_crop = terra::resample(fmask_crop, method="near", ref.geom)
fmask_stack[[m]] = fmask_crop
rm(fmask, fmask_crop)
}else{
n = n+1
nir = terra::rast(paste0('/vsicurl/', search_df$Asset_Link[row]))
nir = crop(nir, project(roi, crs(nir, proj=T)))
nir = terra::project(nir, method="bilinear", target_crs)
nir_crop = terra::crop(nir, meadow.shp.utm, mask=T)
nir_crop = terra::resample(nir_crop, method="bilinear", ref.geom)
nir_stack[[n]] = nir_crop
rm(nir, nir_crop)
}
}
}
#close(pb)
Remove variables not needed anymore to save memory.
rm(search_df, search_features, granule_list, Feature1, executed_stac_query, executed_stac_query_full)
Now the cropped and stacked rasters are loaded into memory without being stored on disk.
Before going further, it is imperative to mask out “bad quality” pixels from further analysis. This information can be found in the Fmask (Quality Assessment) layers, which are produced identically for both Landsat and Sentinel scenes by NASA. See the HLS User Guide, Table 9, for a description of the bits in the Fmask layer. There are many ways to do this; one example is given by Ben Mack in his tutorial: https://benmack.github.io/nasa_hls/build/html/tutorials/Working_with_HLS_datasets_and_nasa_hls.html
Following the idea in his tutorial, we will mask out pixels flagged as cirrus, cloud or cloud shadow in the Fmask layer. All other pixels we consider valid. Long story short, the following Fmask pixel values can be considered clear-sky, i.e. valid:
valid_pixels = c(0, 4, 16, 20, 32, 36, 48, 52, 64, 68, 80, 84, 96, 100, 112, 116, 128, 132, 144, 148, 160, 164, 176, 180, 192, 196, 208, 212, 224, 228, 240, 244)
In some cases you might also want to mask out other things, for example pixels flagged as “adjacent to cloud” or “snow/ice”, depending on your needs.
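Rather than typing the 32 values by hand, the list can also be derived from the Fmask bit layout. The sketch below is an alternative (not part of the original workflow) that assumes the bit positions given in the HLS User Guide (bit 0 cirrus, bit 1 cloud, bit 3 cloud shadow) and keeps every 8-bit value in which all three bits are zero; it reproduces the valid_pixels vector above and is easy to extend if you also want to drop, for example, snow/ice (bit 4).
# Keep Fmask values whose cirrus (bit 0), cloud (bit 1) and cloud shadow (bit 3)
# bits are all zero; everything else is treated as invalid.
cloud_bits <- sum(bitwShiftL(1, c(0, 1, 3)))   # = 11
valid_pixels_derived <- (0:255)[bitwAnd(0:255, cloud_bits) == 0]
setequal(valid_pixels, valid_pixels_derived)   # should be TRUE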
Now we go through all images in red_stack and nir_stack, and mask out “invalid” pixels with the help of fmask_stack and the valid_pixels list, using the inverse option of the mask() function.
for (i in 1:length(nir_stack)){
red_stack[[i]] = mask(red_stack[[i]], fmask_stack[[i]], maskvalues=valid_pixels, inverse=T)
nir_stack[[i]] = mask(nir_stack[[i]], fmask_stack[[i]], maskvalues=valid_pixels, inverse=T)
}
Now we can start asking our science questions. First we define the NDVI function and then execute it on the data loaded into memory.
Create a function to calculate NDVI.
calculate_NDVI <- function(nir, red){
ndvi <- (nir-red)/(nir+red)
return(ndvi)
}
Now we can calculate NDVI from the stacked Red and NIR rasters. We create layer names from the dates the data were captured, with a leading 'L' or 'S' indicating which sensor the data come from.
if(!file.exists(file.path(outDir, 'NDVI_stack.tif'))) {
ndvi_stack <- list()
for (i in 1:length(nir_stack)){
# Calculate NDVI
ndvi_stack[[i]] <- calculate_NDVI(nir_stack[[i]], red_stack[[i]])
# Exclude the Inf and -Inf from NDVI
ndvi_stack[[i]][ndvi_stack[[i]] == Inf] <- NA
ndvi_stack[[i]][ndvi_stack[[i]] == -Inf] <- NA
# Exclude values below 0 and above 1
ndvi_stack[[i]][ndvi_stack[[i]] < 0] <- NA
ndvi_stack[[i]][ndvi_stack[[i]] > 1] <- NA
names(ndvi_stack[[i]]) <- date_list[[i]]
}
}
Finally, create a terra raster object from the ndvi stack.
ndvi_stacks <- terra::rast(ndvi_stack)
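As an optional check (not in the original tutorial), you can confirm how many layers ended up in the stack and that the date-based layer names were assigned as expected:
# Number of layers and the first few layer names of the NDVI stack.
terra::nlyr(ndvi_stacks)
head(names(ndvi_stacks))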
Let’s plot the first item in NDVI time series:
# Create a color palette
pal <- colorNumeric(terrain.colors(n = 100), c(0,1) ,na.color = "transparent", reverse = TRUE)
leaflet() %>%
addProviderTiles(providers$Esri.WorldImagery) %>%
addRasterImage(ndvi_stacks[[1]], colors = pal, opacity = 1) %>%
addPolygons(data = meadow.shp, fill = FALSE) %>%
addMiniMap(zoomLevelFixed = 5) %>%
leaflet::addLegend(pal = pal, values = c(0,1), title = "NDVI")
Now we can plot multiple layers to create an interactive NDVI time series map with leaflet. Click on the dates on the left side to view a layer. Here we plot just the first 100 NDVI images. If you see images with no data, these are probably images that did not have any valid observations; they were masked out with the Fmask information.
base<-c('map<-leaflet()%>%
addProviderTiles(providers$Esri.WorldImagery) %>%
addMiniMap(zoomLevelFixed = 5) %>%')
# make a string including the addRasterImage function for every layer in the raster stack
X <- lapply(1:100, function(j){ #to get all, use lapply(1:nlyr(ndvi_stacks), function(j)
paste(paste("addRasterImage(ndvi_stacks[[",j,"]],
colors = pal,
opacity=1,
group=names(ndvi_stacks[[",j,"]]))", sep=""),"%>% \n")})
X <- do.call(paste, X)
controls<-"addLayersControl(baseGroups=names(ndvi_stacks[[1:100]]),
options = layersControlOptions(collapsed=F), position = 'topleft')%>%"
legend <- "leaflet::addLegend(pal = pal, values = c(0,1), title = 'NDVI')"
final <- paste(base, X, controls, legend ,sep="\n")
eval(parse(text=final))
map
Plot the time series as boxplots showing the distribution of NDVI values for the entire time series from our habitat patch:
raster::boxplot(ndvi_stacks, col=c('olivedrab3'), main='NDVI Time Series', ylab='NDVI', names = names(ndvi_stacks), las=2)
This plot gets too crowded. We can also boxplot year by year, like so:
raster::boxplot(ndvi_stacks[[grepl(".2023", names(ndvi_stacks))]], col=c('olivedrab3'), main='NDVI Time Series, year 2023', ylab='NDVI', las=2)
Multiple years can also work in one plot, if we have fewer observations per year:
years = c(".2013", ".2014", ".2015")
raster::boxplot(ndvi_stacks[[grepl(paste(years, collapse='|'), names(ndvi_stacks))]], col=c('olivedrab3'), main='NDVI Time Series, years 2013-2015', ylab='NDVI', las=2)
rm(years)
Next, calculate the statistics for each observation and export them to CSV. The raster stack is used to calculate the statistics.
ndvi_mean <- terra::global(ndvi_stacks, 'mean', na.rm=TRUE)
ndvi_max <- terra::global(ndvi_stacks, 'max', na.rm=TRUE)
ndvi_min <- terra::global(ndvi_stacks, 'min', na.rm=TRUE)
ndvi_sd <- terra::global(ndvi_stacks, 'sd', na.rm=TRUE)
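Because the layer names carry the acquisition year, the per-image means can also be aggregated to yearly averages, which is handy for looking at between-year trends. This is a small sketch, not part of the original tutorial; yearly_mean is a name introduced here for illustration.
# Aggregate per-image mean NDVI to yearly means, using the year embedded
# in the layer names (e.g. "L2013-06-27" -> "2013").
yearly_mean <- tapply(ndvi_mean$mean, substr(names(ndvi_stacks), 2, 5), mean, na.rm = TRUE)
yearly_mean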
Make interactive plots of the raster statistics using the dygraphs library. The date is formatted using the lubridate package, and the xts package is used to transform the data frame to the xts format.
stats <- data.frame(
Date=substr(names(ndvi_stacks), 2,11),
NDVI_Max = ndvi_max,
NDVI_Min = ndvi_min,
NDVI_mean = ndvi_mean,
NDVI_SD = ndvi_sd
)
stats$Date <- ymd(stats$Date) # reformat the date
variables = xts(x=stats[,-1], order.by=stats$Date) # Choose the cols with the variables
dygraph(variables)
If you want to export these statistics, you can write them to a CSV file.
stats_name <- file.path("NDVI_Statistics.csv")
write.csv(stats,stats_name)
Finally, if you want to capture the final output files locally on your machine, you can export the output files as GeoTIFFs.
for (i in 1:length(ndvi_stacks)){
output_name <- file.path(outDir, paste0(names(ndvi_stacks[[i]]), "_NDVI.tif"))
terra::writeRaster(ndvi_stacks[[i]], output_name, overwrite = TRUE)
}
The raster stack object can also be written to disk. Again, check whether it has already been saved, to avoid unnecessary writing.
output_name <- file.path(outDir, "NDVI_stack.tif")
if (!file.exists(output_name)) {
  terra::writeRaster(ndvi_stacks, filename = output_name, overwrite = TRUE)
}
And we’re done! You have successfully analyzed data in the cloud, exporting just the information you needed for your area of interest rather than having to download everything.