Metadata
title: "E-TRAINEE Tutorial - Forest disturbance assessment with Python and the GEE"description: "This is a tutorial within the fourth theme of Module 1 of the E-TRAINEE course."
lastUpdate: 2023-07-21
authors: Andreas Mayr
Forest disturbance assessment with Python and the GEE¶
We want to compare the NDVI time series (spectral-temporal profile) for a disturbed forest location and for an undisturbed location to assess the timing of forest disturbance.
This notebook introduces possibilities for working with Python, the Google Earth Engine (GEE) and additional (helper) packages. You will learn how to extract spectral-temporal profiles for defined locations using the following packages:
- ee - The GEE Python API.
- geemap - A Python package described in Wu (2020). The geemap tools facilitate interactive mapping but also some other tasks with GEE.
- eemont - This package adds utility methods for different Earth Engine Objects which are friendly with Python method chaining (Montero 2021). Thereby, the complexity of the code needed for processing tasks like cloud masking and spectral index calculation is reduced.
- pandas and seaborn are used to process and plot the point-based time series
These packages are contained in the requirements file provided for the course. Please see the instructions on the software page for setting up a Conda environment based on this file.
Note: You will need a Google account with GEE activated (Sign up here, if you do not already have one.).
Define points-of-interest¶
Start with importing the required packages.
import ee, eemont, geemap # ee is the GEE Python API
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
Authenticate and initialize Earth Engine.
try:
ee.Initialize()
except Exception as e: # If initialize does not work, you probably have to authenticate first
ee.Authenticate()
ee.Initialize()
We want to compare the NDVI time series (spectral-temporal profile) for two points-of-interest: a disturbed forest location and an undisturbed location. Create a GEE FeatureCollection by defining locations as geographic coordinates (copied from QGIS) and adding a buffer around each point.
lat = [47.2989635, 47.300451] # two latitude values, one for each point
lon = [11.4038744, 11.417014] # two longitude values, one for each point
mypoints = ee.FeatureCollection([
ee.Feature(ee.Geometry.Point([lon[0],lat[0]]).buffer(50),{'point_ID':1}), # a location with disturbed forest
ee.Feature(ee.Geometry.Point([lon[1],lat[1]]).buffer(50),{'point_ID':2}) # an undisturbed location
])
Let's use the geemap package to display our locations on a map.
Map = geemap.Map()
Map.addLayer(mypoints,{'color':'blue'})
Map.centerObject(mypoints,14) # we provide a zoom level here (14)
Map
Map(center=[47.299707488940804, 11.410444109531477], controls=(WidgetControl(options=['position', 'transparent…
Extract time series for regions¶
Query GEE image collection¶
Query the Landsat 8 image collection from the Google Earth Engine centered around our points-of-interest (a FeatureCollection), and preprocess the image collection. We also calculate two different vegetation indices. The eemont package provides functions to make this easy.
L8 = (ee.ImageCollection('LANDSAT/LC08/C01/T1_SR') # or 'LANDSAT/LC08/C02/T1_L2' for Landsat Collection 2
.filterBounds(mypoints)
.maskClouds()
.scaleAndOffset()
.spectralIndices(['NDVI', 'EVI']))
The eemont package has a function for extracting spectral time series for regions. As regions we use the FeatureCollection containing our points with a buffer. We aggregate the pixels inside the buffer by taking the mean value at each time stamp.
L8_point_ts = L8.getTimeSeriesByRegions(collection = mypoints,
bands = ['NDVI', 'EVI'],
reducer = ee.Reducer.mean(),
scale = 30)
Convert time series to Pandas¶
To utilize the plotting and analysis functionality of Pandas, we convert the time series retrieved as a feature collection to a Pandas DataFrame.
tsPandas = geemap.ee_to_pandas(L8_point_ts)
tsPandas
date | point_ID | EVI | reducer | NDVI | |
---|---|---|---|---|---|
0 | 2013-03-24T10:00:18 | 1 | -9999.000000 | mean | -9999.000000 |
1 | 2013-03-24T10:00:18 | 2 | -9999.000000 | mean | -9999.000000 |
2 | 2013-04-13T09:59:46 | 1 | -9999.000000 | mean | -9999.000000 |
3 | 2013-04-13T09:59:46 | 2 | 0.385702 | mean | 0.697527 |
4 | 2013-05-15T09:59:56 | 1 | -9999.000000 | mean | -9999.000000 |
... | ... | ... | ... | ... | ... |
673 | 2021-11-20T10:04:29 | 2 | 0.375316 | mean | 0.770483 |
674 | 2021-12-06T10:04:30 | 1 | -9999.000000 | mean | -9999.000000 |
675 | 2021-12-06T10:04:30 | 2 | -9999.000000 | mean | -9999.000000 |
676 | 2021-12-22T10:04:27 | 1 | -0.021050 | mean | -0.004391 |
677 | 2021-12-22T10:04:27 | 2 | 0.456888 | mean | 0.785765 |
678 rows × 5 columns
Replace all -9999 values by NaN (NoData, 'not-a-number'), and convert the date from string to a date data type.
tsPandas[tsPandas == -9999] = np.nan
tsPandas['date'] = pd.to_datetime(tsPandas['date'],infer_datetime_format = True)
tsPandas
date | point_ID | EVI | reducer | NDVI | |
---|---|---|---|---|---|
0 | 2013-03-24 10:00:18 | 1 | NaN | mean | NaN |
1 | 2013-03-24 10:00:18 | 2 | NaN | mean | NaN |
2 | 2013-04-13 09:59:46 | 1 | NaN | mean | NaN |
3 | 2013-04-13 09:59:46 | 2 | 0.385702 | mean | 0.697527 |
4 | 2013-05-15 09:59:56 | 1 | NaN | mean | NaN |
... | ... | ... | ... | ... | ... |
673 | 2021-11-20 10:04:29 | 2 | 0.375316 | mean | 0.770483 |
674 | 2021-12-06 10:04:30 | 1 | NaN | mean | NaN |
675 | 2021-12-06 10:04:30 | 2 | NaN | mean | NaN |
676 | 2021-12-22 10:04:27 | 1 | -0.021050 | mean | -0.004391 |
677 | 2021-12-22 10:04:27 | 2 | 0.456888 | mean | 0.785765 |
678 rows × 5 columns
Visualization¶
We use seaborn
(imported as sns in the beginning) to visualize our time series.
plt.figure(figsize = (10,3))
sns.lineplot(data = tsPandas,
x = 'date',
y = 'NDVI',
palette = ['red', 'grey'],
hue = 'point_ID',
alpha = 0.5)
<Axes: xlabel='date', ylabel='NDVI'>
Or each spectral-temporal profile in a separate panel:
myplot = sns.FacetGrid(tsPandas,row = 'point_ID',height = 3,aspect = 4)
myplot.map_dataframe(sns.lineplot,x = 'date',y = 'NDVI')
myplot.add_legend()
<seaborn.axisgrid.FacetGrid at 0x211cac72290>
Temporal aggregations¶
To perform temporal aggregations (such as computing monthly means) we need a DataFrame with DatetimeIndex. So we set the date column as index.
tsPandas_indexed = tsPandas.set_index('date')
tsPandas_indexed
point_ID | EVI | reducer | NDVI | |
---|---|---|---|---|
date | ||||
2013-03-24 10:00:18 | 1 | NaN | mean | NaN |
2013-03-24 10:00:18 | 2 | NaN | mean | NaN |
2013-04-13 09:59:46 | 1 | NaN | mean | NaN |
2013-04-13 09:59:46 | 2 | 0.385702 | mean | 0.697527 |
2013-05-15 09:59:56 | 1 | NaN | mean | NaN |
... | ... | ... | ... | ... |
2021-11-20 10:04:29 | 2 | 0.375316 | mean | 0.770483 |
2021-12-06 10:04:30 | 1 | NaN | mean | NaN |
2021-12-06 10:04:30 | 2 | NaN | mean | NaN |
2021-12-22 10:04:27 | 1 | -0.021050 | mean | -0.004391 |
2021-12-22 10:04:27 | 2 | 0.456888 | mean | 0.785765 |
678 rows × 4 columns
Next, we want the NDVI data of each point in a separate column.
tsPandas_pv = pd.pivot_table(tsPandas, index='date', values='NDVI', columns='point_ID')
tsPandas_pv
point_ID | 1 | 2 |
---|---|---|
date | ||
2013-04-13 09:59:46 | NaN | 0.697527 |
2013-06-07 10:06:10 | 0.876574 | 0.865738 |
2013-06-16 09:59:55 | 0.757573 | 0.766580 |
2013-06-23 10:06:04 | 0.891951 | 0.806887 |
2013-07-02 09:59:56 | 0.881142 | 0.875385 |
... | ... | ... |
2021-09-10 09:58:14 | 0.732096 | 0.853265 |
2021-10-28 09:58:24 | 0.560447 | 0.759319 |
2021-11-20 10:04:29 | 0.465900 | 0.770483 |
2021-12-22 10:04:27 | -0.004391 | 0.785765 |
2021-12-31 09:58:12 | 0.192541 | 0.899323 |
159 rows × 2 columns
Now we can compute aggregate statistics (such as the mean, maximum, minimum, median) over defined time periods. This will smooth the time series and make it easier to grasp certain patterns from the plots.
tsPandas_monthly = tsPandas_pv.resample('M').max()
tsPandas_3monthly = tsPandas_pv.resample('3M').max()
tsPandas_yearly = tsPandas_pv.resample('Y').max()
fig, axes = plt.subplots(3,1, figsize=(10, 5), sharex=True)
sns_plot = sns.lineplot(data = tsPandas_monthly, ax=axes[0])
sns_plot = sns.lineplot(data = tsPandas_3monthly, ax=axes[1])
sns_plot = sns.lineplot(data = tsPandas_yearly, ax=axes[2])
Of course we can also plot the time series with matplotlib.
plt.figure(figsize=(10,5))
plt.plot_date(tsPandas_pv.index, tsPandas_pv)
plt.title("Landsat 8 NDVI time series forest sites")
plt.ylabel("NDVI")
plt.legend(['Point 1', 'Point 2'])
plt.grid(True)
plt.show()
Interpretation¶
Looking at the NDVI time series for the two points, we see a seasonal cycle and some variation between different years. The variation is stronger for the yearly minima than for the yearly maxima, maybe because the minima are affected by snow and Landsat 8 did not always capture the (relatively short-lived) situations with a lot of snow on the canopy.
However, what is apparent in all these spectral trajectories is that point 1 (the disturbed forest site) has an unusually low NDVI in summer 2019, compared to the previous years, whereas the NDVI of point 2 (the undisturbed site) maintains constantly high annual maxima. In the years 2020 and 2021, the annual NDVI maxima of point 1 have increased again. This pattern reflects the destruction of forest in winter 2019 (by an avalanche as we know in this case) with subsequent establishment or expansion of some vegetation in the following years (probably forbs and shrubs).