Metadata
title: "E-TRAINEE Tutorial - Sentinel-2 snow cover time series classification in Python (tutorial with excercise)"description: "This is a tutorial with excercise within the third theme of Module 1 of the E-TRAINEE course."
lastUpdate: 2023-07-21
authors: Andreas Mayr
Sentinel-2 snow cover time series classification in Python (Tutorial with exercise)¶
This Notebook shows how to implement a simple snow classification workflow on a Sentinel-2 (S-2) satellite image time series.
For a small area-of-interest (AOI) in the Alps (Obergurgl, Austria), an S-2 Normalized Difference Snow Index (NDSI) time series is downloaded from Google Earth Engine (GEE) and classified per scene into snow-covered and snow-free areas, using the widely used NDSI threshold of 0.4. The results of this simple rule-based classification are analysed and visualized in terms of snow cover duration (months) per pixel and the percentage of snow-covered area within the AOI per month.
For Google Earth Engine (GEE) and xarray based raster time series processing we use the geemap, wxee and eemont packages, which are contained in the requirements file provided for the course. Please see the instructions on the software page for setting up a Conda environment based on this file.
This is an extended version of the Notebook, including a suggested solution to the related E-TRAINEE exercise.
Import packages, then authenticate and initialize Google Earth Engine.
import ee, eemont, geemap, wxee

try:
    wxee.Initialize()
except Exception:  # If Initialize() fails, you probably have to authenticate first
    ee.Authenticate()
    wxee.Initialize()
Construct an area-of-interest (AOI)¶
We want an area around Obergurgl (Tyrol, Austria) as our area-of-interest (AOI). Using eemont, we pass a query string for this place via the geopy package to a geocoding service. A bounding box (ee.Geometry.BBox) for the geocoded place can be constructed from such a query using the ee.Geometry.BBoxFromQuery constructor (added through eemont). Depending on the geocoding service used (e.g., ArcGIS, Baidu, Google Maps, IGN France, Nominatim, etc.), the resulting bounding boxes will differ. In this example, the method is convenient because we don't have to enter any coordinates for our AOI or import a file.
AOI = ee.Geometry.BBoxFromQuery("Obergurgl, Austria", user_agent = "RS_course")
The user_agent argument must be specified: it is a string giving the name of the app that is using the geocoding service.
The geemap package can directly display the ee.Geometry.BBox object with our AOI in an interactive map.
Map = geemap.Map()
Map.addLayer(AOI,{'color':'blue'},'Nominatim') # Open Street Map geocoding service
Map.centerObject(AOI,10)
Map
Map(center=[46.87037509744985, 11.02749840000902], controls=(WidgetControl(options=['position', 'transparent_b…
Retrieve an image collection from GEE, preprocess it and calculate the NDSI.
S2 = ee.ImageCollection("COPERNICUS/S2_SR") \
    .filterDate("2022-01-01", "2022-06-30") \
    .filterBounds(AOI) \
    .filterMetadata("CLOUDY_PIXEL_PERCENTAGE", "less_than", 30) \
    .preprocess() \
    .spectralIndices("NDSI") \
    .select("NDSI")
We have filtered by a date range and by bounds, i.e. queried imagery for the first half of 2022 and for our AOI. The eemont preprocess() method automatically masks clouds and shadows and applies the scale and offset to the image or image collection. The eemont spectralIndices() method can calculate any index from the Awesome Spectral Indices library. For snow classification we will use only the NDSI, no original bands.
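For reference, the NDSI contrasts the green band with the first shortwave-infrared band; for Sentinel-2 these are B3 and B11. A minimal NumPy sketch with made-up reflectance values:

```python
import numpy as np

# Made-up surface reflectance values for two pixels:
# snow reflects strongly in the green but weakly in the SWIR
green = np.array([0.6, 0.2])   # Sentinel-2 band B3
swir1 = np.array([0.1, 0.3])   # Sentinel-2 band B11

# NDSI = (Green - SWIR1) / (Green + SWIR1)
ndsi = (green - swir1) / (green + swir1)
print(ndsi)  # the snow-like first pixel yields a high NDSI (~0.71), the second a negative one
```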
Aggregate and download time series¶
Create a wxee TimeSeries object from the image collection to add more functionality, such as temporal aggregations.
S2_ts = S2.wx.to_time_series()
First, get some info about our time series.
S2_ts.describe()
COPERNICUS/S2_SR
Images: 68
Start date: 2022-01-01 10:27:43 UTC
End date: 2022-06-27 10:18:12 UTC
Mean interval: 2.64 days
And plot the timeline to see how the (low-cloudiness) observations are distributed over the observation period.
S2_ts.timeline()
Perform a temporal aggregation (calculate monthly medians) of our time series in the GEE.
S2_monthly_ts = S2_ts.aggregate_time(frequency="month", reducer=ee.Reducer.median())
Now use wxee to get the result of the time series aggregation into an xarray Dataset (i.e. download the aggregated rasters). You could also download multiple images as GeoTIFFs. Wxee, however, is not designed to download large areas at high resolution; in such cases, consider downloading to a Google Drive using ee or to a local drive using geemap. In our case, 12 months at 20 m resolution throw a GEE memory error ("User memory exceeded"). Therefore, we simply split the request and download into two parts of 6 months each.
Either define a path where a NetCDF file will be written, or load the Dataset into memory. You can set the scale parameter to control the resolution. See the wxee documentation for more options.
The requests and downloads may take some minutes.
S2_ds_1 = S2_monthly_ts.wx.to_xarray(region = AOI, crs='EPSG:25832', scale=20)
We repeat the data query and download for the second part of the year.
S2 = ee.ImageCollection("COPERNICUS/S2_SR") \
    .filterDate("2022-07-01", "2022-12-31") \
    .filterBounds(AOI) \
    .filterMetadata("CLOUDY_PIXEL_PERCENTAGE", "less_than", 30) \
    .preprocess() \
    .spectralIndices("NDSI") \
    .select("NDSI")                        # Query S2 NDSI from GEE

S2_ts = S2.wx.to_time_series()             # Build wxee time series object
S2_monthly_ts = S2_ts.aggregate_time(frequency="month", reducer=ee.Reducer.median())  # Reduce NDSI to monthly median
S2_ds_2 = S2_monthly_ts.wx.to_xarray(region=AOI, crs='EPSG:25832', scale=20)          # Download monthly NDSI
Concatenate the two xarray Datasets (six months of monthly NDSI each) along the time dimension and check the final Dataset we obtained.
import xarray as xr
S2_ds = xr.concat([S2_ds_1, S2_ds_2], dim='time')
S2_ds
<xarray.Dataset>
Dimensions:      (time: 12, x: 159, y: 227)
Coordinates:
  * time         (time) datetime64[ns] 2022-01-01T10:27:57 ... 2022-12-07T10:...
  * x            (x) float64 6.529e+05 6.53e+05 6.53e+05 ... 6.561e+05 6.561e+05
  * y            (y) float64 5.195e+06 5.195e+06 ... 5.191e+06 5.19e+06
    spatial_ref  int32 0
Data variables:
    NDSI         (time, y, x) float64 0.8335 0.8201 0.7924 ... 0.7509 0.6929
Attributes:
    AREA_OR_POINT:           Area
    TIFFTAG_RESOLUTIONUNIT:  1 (unitless)
    TIFFTAG_XRESOLUTION:     1
    TIFFTAG_YRESOLUTION:     1
    _FillValue:              -32768.0
    scale_factor:            1.0
    add_offset:              0.0
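The same concatenation pattern can be sketched with synthetic data, independent of any GEE download (the values and coordinates below are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two synthetic half-year NDSI stacks on a 2 x 2 grid
first_half = xr.DataArray(
    np.full((6, 2, 2), 0.5), dims=("time", "y", "x"),
    coords={"time": pd.date_range("2022-01-01", periods=6, freq="MS")},
    name="NDSI",
)
second_half = xr.DataArray(
    np.full((6, 2, 2), -0.1), dims=("time", "y", "x"),
    coords={"time": pd.date_range("2022-07-01", periods=6, freq="MS")},
    name="NDSI",
)

# Concatenate along the time dimension -> one 12-month stack
full_year = xr.concat([first_half, second_half], dim="time")
```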
Note that we have only downloaded the monthly medians of the NDSI, no other bands.
Show NDSI maps¶
Display the monthly NDSI maps using xarray's built-in plotting functions:
S2_ds.NDSI.plot(col="time", col_wrap=6, cmap='coolwarm_r', center=0.4, aspect=1, robust=True)
<xarray.plot.facetgrid.FacetGrid at 0x217e8593be0>
Snow classification for each month¶
Now we map binary snow cover simply by thresholding the NDSI with the widely used threshold of 0.4 (which we also used to center the colormap in the plot above). This results in a time series of monthly snow/no-snow maps (encoded as 1/0).
import xarray as xr
snow = xr.where(S2_ds.NDSI>0.4, 1, 0)
snow.attrs = S2_ds.attrs # Copy the attributes of the S2 Dataset (CRS, spatial res., ...)
snow.plot(col='time', col_wrap=6, cbar_kwargs={'label': 'Snow cover'}, cmap='coolwarm_r')
<xarray.plot.facetgrid.FacetGrid at 0x217ed38e1a0>
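One subtlety of xr.where worth keeping in mind: comparisons with NaN evaluate to False, so pixels that were masked (e.g. as clouds) end up classified as snow-free (0) rather than staying NaN. A tiny synthetic example:

```python
import numpy as np
import xarray as xr

# 2 x 2 toy NDSI grid; NaN marks a masked (e.g. cloudy) pixel
ndsi = xr.DataArray(np.array([[0.8, 0.2],
                              [np.nan, 0.41]]), dims=("y", "x"))

snow = xr.where(ndsi > 0.4, 1, 0)
print(snow.values)  # NaN > 0.4 is False, so the masked pixel becomes 0
```

To keep masked pixels out of later statistics instead, one could re-mask afterwards, e.g. `snow.where(ndsi.notnull())`.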
Let's take this time series of snow/no-snow maps to quantify the percentage of our area-of-interest that is snow covered for each month.
import numpy as np
snow_percentage = np.nanmean(snow.values, axis=(1,2))*100
... and plot this.
import matplotlib.pyplot as plt
plt.figure(figsize=(5,3))
plt.plot(snow.time, snow_percentage)
plt.title("Percentage of area covered by snow\nbased on monthly Sentinel-2 NDSI")
plt.xticks(rotation=45)
plt.ylabel("Snow covered area [%]")
Text(0, 0.5, 'Snow covered area [%]')
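The np.nanmean call above averages the binary maps over both spatial axes while ignoring NaN pixels; a self-contained sketch with toy data:

```python
import numpy as np

# Toy binary snow maps for three dates on a 2 x 2 grid; NaN marks pixels without valid data
snow = np.array([
    [[1.0, 1.0], [1.0, np.nan]],   # 3 of 3 valid pixels snow-covered
    [[1.0, 0.0], [0.0, 0.0]],      # 1 of 4
    [[0.0, 0.0], [np.nan, 0.0]],   # 0 of 3
])

# Mean over the two spatial axes per date, ignoring NaN -> 100 %, 25 %, 0 %
snow_percentage = np.nanmean(snow, axis=(1, 2)) * 100
print(snow_percentage)
```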
Snow cover duration per pixel¶
Finally calculate how many months every pixel is snow covered.
snow.sum(dim='time').plot(cbar_kwargs={'label': 'Snow cover duration [months]'})
<matplotlib.collections.QuadMesh at 0x217f04a3c70>
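Summing the binary maps along time yields, per pixel, the number of months classified as snow-covered; a minimal synthetic example:

```python
import numpy as np
import xarray as xr

# Three monthly snow/no-snow maps on a 2 x 2 grid
snow = xr.DataArray(np.array([[[1, 0], [1, 1]],
                              [[1, 0], [0, 1]],
                              [[0, 0], [0, 1]]]), dims=("time", "y", "x"))

duration = snow.sum(dim="time")  # months of snow cover per pixel
print(duration.values)
```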
Exercise: Snow cover mapping - Interpretation and sensitivity analysis¶
Tasks and questions:
- The snow cover duration appears to be quite short in some parts of our AOI (given that the village of Obergurgl is located at > 1900 m a.s.l.). Let's have a look at these areas. Try to get a simple color composite for one date using Google Earth Engine and the tools shown in the tutorial; maybe this helps with interpretation. Alternatively, use (monotemporal) higher-resolution data from other sources and display them in Python or in a GIS. What are the surface characteristics of the areas with short snow cover duration? Does this point to potential limitations of the snow classification approach?
- For the binary classification of NDSI into "snow" and "no snow", we had chosen an NDSI threshold of 0.4. What if we change this threshold? How does this affect the classification results? Test a range of thresholds (0.2 to 0.6 with step 0.02), and visualize the effects of varying the threshold on the spatio-temporal classification results.
RGB composite for interpretation of snow cover duration patterns¶
Why is the snow cover duration apparently so short in some parts of our AOI? Let's have a look at these areas. Maybe a simple color composite from one date helps with interpretation.
As our downloaded time series contains only the NDSI, we need a new query: We search for the closest image to a certain date matching also some other criteria (bounds, cloud cover percentage).
S2_RGB = ee.ImageCollection("COPERNICUS/S2_SR") \
    .filterBounds(AOI) \
    .filterMetadata("CLOUDY_PIXEL_PERCENTAGE", "less_than", 10) \
    .closest('2022-08-01') \
    .preprocess() \
    .select(['B4', 'B3', 'B2'])            # Query only certain bands from GEE

S2_RGB_ts = S2_RGB.wx.to_time_series()     # Build wxee time series object (although we expect only one observation)
S2_RGB_ds = S2_RGB_ts.wx.to_xarray(region=AOI, crs='EPSG:25832', scale=10)  # Download to an xarray Dataset, now with 10 m resolution
Note that this request has unexpectedly found two images of the same day. The reason is that our AOI is located on a tile overlap and, thus, gets imagery from both tiles.
S2_RGB_ds['time'].values
array(['2022-08-09T10:27:53.000000000', '2022-08-09T10:28:07.000000000'], dtype='datetime64[ns]')
We simply take the per-pixel mean of the two images along the time dimension and plot the RGB bands as a true-color composite. Try to assess the land cover in the areas with relatively short snow cover duration. How could this affect the NDSI values?
S2_RGB_ds.to_array().mean(dim='time').plot.imshow(robust=True)
plt.title('Sentinel-2 true color image summer 2022')
Text(0.5, 1.0, 'Sentinel-2 true color image summer 2022')
Classification threshold sensitivity analysis¶
For the binary classification of NDSI into "snow" and "no snow", we had chosen an NDSI threshold of 0.4. What if we change this threshold? How does this affect the classification results?
Let's construct a new dataset snow_sensitivity with a new dimension called 'threshold', where we test a range of different values for the threshold parameter (0.2 to 0.6 with step 0.02).
thresholds = np.arange(0.2, 0.62, 0.02)
t_var = xr.Variable('threshold', thresholds)  # Create a coordinate variable for the thresholds
snow_sensitivity = xr.concat([xr.where(S2_ds.NDSI > t, 1, 0) for t in thresholds], dim=t_var)
snow_sensitivity.attrs = S2_ds.attrs # Copy attributes (CRS, spatial resolution, ...)
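The stacking pattern used here (one binary classification per threshold, concatenated along a new 'threshold' dimension) can be illustrated with a small synthetic NDSI grid:

```python
import numpy as np
import xarray as xr

# 2 x 2 toy NDSI grid (made-up values)
ndsi = xr.DataArray(np.array([[0.1, 0.45],
                              [0.55, 0.8]]), dims=("y", "x"))

thresholds = np.array([0.4, 0.5, 0.6])
t_var = xr.Variable("threshold", thresholds)
stack = xr.concat([xr.where(ndsi > t, 1, 0) for t in thresholds], dim=t_var)

# Fraction of thresholds classifying each pixel as snow ("snow probability")
print(stack.mean(dim="threshold").values)
```

Pixels whose NDSI lies inside the tested threshold range (here 0.45 and 0.55) get intermediate values, i.e. they are the threshold-sensitive ones.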
Spatio-temporal sensitivity of snow classification¶
For each pixel, we compute the average of all the binary snow classification outputs obtained with the different thresholds. Assuming that each of these thresholds is equally likely to produce a correct classification, we call this average value "snow probability" and plot it in a map for each month. Moreover, we compute the variance of all classification outputs (which we refer to as "snow classification sensitivity to threshold"). Thereby, we can see where and when the choice of the classification threshold is most critical.
snow_sensitivity.mean(dim='threshold').plot(col='time', col_wrap=6, cbar_kwargs={'label': 'Snow cover probability'})
<xarray.plot.facetgrid.FacetGrid at 0x217eece5bd0>
snow_sensitivity.var(dim='threshold').plot(col='time', col_wrap=6, cmap='Reds', cbar_kwargs={'label': 'Snow classification sensitivity to threshold'})
<xarray.plot.facetgrid.FacetGrid at 0x2178726ff40>
Sensitivity of snow covered area to classification threshold¶
Finally, we want to see how the choice of classification thresholds affects the snow covered area determined over the course of the year.
snow_percentage_sens = np.nanmean(snow_sensitivity.values, axis=(2,3))*100
snow_percentage_sens.shape
(21, 12)
import pandas as pd
df = pd.DataFrame(snow_percentage_sens, index=snow_sensitivity.threshold) # each month in a separate column, rows indexed by the thresholds
df = df.melt(var_name='Month', value_name='Snow covered area [%]', ignore_index=False) # melt converts the table to long-form (months are indicated by a column now)
df['Month'] = df['Month']+1 # otherwise the months start with zero
df
| threshold | Month | Snow covered area [%] |
|---|---|---|
| 0.20 | 1 | 99.174355 |
| 0.22 | 1 | 99.063530 |
| 0.24 | 1 | 98.961017 |
| 0.26 | 1 | 98.833569 |
| 0.28 | 1 | 98.706120 |
| ... | ... | ... |
| 0.52 | 12 | 89.682210 |
| 0.54 | 12 | 87.878536 |
| 0.56 | 12 | 85.384978 |
| 0.58 | 12 | 82.292965 |
| 0.60 | 12 | 77.405591 |

252 rows × 2 columns
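The wide-to-long reshaping with melt can be illustrated with a toy table (two thresholds, two months; the numbers are made up):

```python
import numpy as np
import pandas as pd

# Wide form: rows indexed by threshold, one column per (0-based) month
wide = pd.DataFrame(np.array([[99.0, 50.0],
                              [95.0, 40.0]]), index=[0.4, 0.5])

# melt stacks the month columns into a single 'Month' column;
# ignore_index=False keeps the threshold index (repeated per month)
long = wide.melt(var_name="Month", value_name="Snow covered area [%]", ignore_index=False)
long["Month"] = long["Month"] + 1  # switch to 1-based months
print(long)
```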
Plot the mean and the standard deviation (light blue band) for the classifications of each month.
import seaborn as sns
sns.lineplot(data=df, x='Month', y='Snow covered area [%]', errorbar='sd')
<Axes: xlabel='Month', ylabel='Snow covered area [%]'>
sns.lineplot(data=df, x='Month', y='Snow covered area [%]', hue=df.index)
plt.legend(title='NDSI threshold')
<matplotlib.legend.Legend at 0x21780f22bf0>
Conclusions¶
A simple thresholding approach applied on Sentinel-2 Normalized Difference Snow Index (NDSI) rasters was, in general, shown to be capable of mapping snow cover. Classification of a time series of monthly composites is straightforward and can plausibly track the seasonal evolution of snow covered area.
For some parts of the AOI, the workflow shown in this notebook indicates a short snow cover duration. These areas are mainly covered by forest. This points to difficulties of the approach to map snow cover reliably in forested areas.
The impact of NDSI threshold choice varies with space and time, depending on the snow cover conditions. Completely snow-free or completely snow-covered pixels show sufficient contrast in an NDSI raster and can be classified unambiguously (with a range of thresholds). A patchy snow cover (i) in the melting period and after the first snowfalls in autumn or (ii) in steep and rocky terrain is more ambiguous (mixed pixel problem) and sensitive to the classification threshold.
The approach works without any reference data, as long as it is not being validated (no training of a statistical classifier is needed). Note that this example is a binary classification based on only one classification feature, which provides good contrast (i.e. separability) between the target classes. A large body of experience from scientific studies is available on the suitability of this feature (NDSI) and the threshold for the problem at hand. For more complex classification tasks (more classes, poorer separability, more features, ...) we need more sophisticated approaches to find appropriate decision boundaries (classification rules), for instance using machine learning.