Metadata
title: "Introduction to the Google Earth Engine Python API: Sentinel-2 cloud masking, interactive mapping and download"description: "This is a tutorial within the second theme of Module 1 of the E-TRAINEE course."
lastUpdate: 2023-07-14
author: Andreas Mayr
Introduction to the Google Earth Engine Python API: Sentinel-2 cloud masking, interactive mapping and download¶
This Notebook performs a procedure for cloud masking in Sentinel-2 imagery, which is an important basis for further time series analysis. It is shown how to (i) access Google Earth Engine (GEE) image collections via the GEE Python API, (ii) process the imagery in the cloud, (iii) visualize the results in interactive maps, and (iv) download data from GEE. More specifically, you will learn how to achieve the following steps:
- Query and filter GEE image collections of Sentinel-2 imagery and associated cloud probabilities
- Join the two different, filtered image collections to build a new collection
- Derive clouds and cloud shadows as components of a cloud mask
- Dilate the edge of cloud mask objects
- Visualize the imagery, the final cloud mask and its components in an interactive Leaflet map using the Folium package
- Download cloud masked data produced in the Earth Engine to a local hard drive (using the geemap package)
- Apply a function (e.g. for calculating a vegetation index) to an image collection
Much of the Notebook is based on Sentinel-2 Cloud Masking with s2cloudless by Justin Braaten, licensed under the CC BY 4.0 License and the Apache 2.0 License (code samples).
Requirements:
- You will need a Google account with GEE activated (sign up here, if you do not already have one).
- The Python packages used in this tutorial are listed in the requirements file provided for the course. Please see the instructions on the software page for setting up a Conda environment based on this file.
Authenticate and initialize API¶
First, we load the required packages: Google Earth Engine Python API (which is called ee
), geemap
and folium
.
import ee
import geemap
import folium
Before we can start using the GEE API, we must authenticate with our credentials and initialize the API. Run the cell and follow the instructions on how to grant the notebook access with your account.
try: # If you have authenticated recently, you might be able to initialize directly
ee.Initialize()
except Exception as e: # If initialize does not work, you probably have to authenticate first
ee.Authenticate()
ee.Initialize()
To authorize access needed by Earth Engine, open the following URL in a web browser and follow the instructions:
The authorization workflow will generate a code, which you should paste in the box below.
Successfully saved authorization token.
Query and filter image collections¶
We want to query the Sentinel-2 Multi-Spectral instrument (MSI) image collection with Level-2A (surface reflectance) products and apply filters to the collection. First, we define a geometry and some other parameters that will be used for filtering:
my_point = ee.Geometry.Point(11.0, 46.8) # Point of interest with latitude and longitude
AOI = my_point # Area of interest (map display will be centered on this, here we simply use a point)
START_DATE = '2021-06-09' # Start date of our time period of interest
END_DATE = '2021-07-11' # End date of our time period of interest
CLOUD_FILTER = 60 # Maximum image cloud cover percent allowed in image collection
CLD_PRB_THRESH = 40 # Cloud probability (%); values greater than are considered cloud
NIR_DRK_THRESH = 0.15 # NIR reflectance; values less than this are considered potential cloud shadow
CLD_PRJ_DIST = 1.5 # Maximum distance (km) to search for cloud shadows from cloud edges
BUFFER = 60 # Distance [m] to dilate the edge of cloud-identified objects
You may fine tune the cloud-mask related thresholds to get the best results for your needs. The strictness of cloud masking is essentially a trade-off between data completeness (little spatial and temporal gaps) and reliability of land surface observations. It will depend on your further processing steps (such as compositing) and how any remaining cloudy pixels can compromise the reliability of intermediate products and final results.
We will use two different ImageCollections:
- The Sentinel-2 Level-2A collection provides the surface reflectances produced by the sen2cor processor and downloaded from the Copernicus Open Access Hub.
- The Sentinel 2 Cloud Probability collection is created using a machine learning approach to cloud detection, which is detailed here and here. Alternatively, the QA60 cloud mask or the SCL scene classification map provided by ESA with the Sentinel-2 L2A product could be used. See this blog post for a comparative discussion of the cloud masks.
These two collections are filtered by bounds and date and then joined into a single collection.
def get_s2_sr_cld_col(aoi, start_date, end_date):
# Import and filter S2 surface reflectance
s2_sr_col = (ee.ImageCollection('COPERNICUS/S2_SR')
.filterBounds(aoi)
.filterDate(start_date, end_date)
.filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', CLOUD_FILTER)))
# Import and filter cloud probabilities produced by s2cloudless
s2_cloudless_col = (ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
.filterBounds(aoi)
.filterDate(start_date, end_date))
# Join the filtered cloud probabilities to the surface reflectances by the 'system:index' property
return ee.ImageCollection(ee.Join.saveFirst('s2cloudless').apply(**{
'primary': s2_sr_col,
'secondary': s2_cloudless_col,
'condition': ee.Filter.equals(**{
'leftField': 'system:index',
'rightField': 'system:index'
})
}))
# Apply the function defined above to build a collection containing both S2 SR and cloud probabilities
my_S2_sr_cld_col = get_s2_sr_cld_col(AOI, START_DATE, END_DATE)
def add_cloud_bands(img):
# Get s2cloudless image, subset the probability band.
cld_prb = ee.Image(img.get('s2cloudless')).select('probability')
# Condition s2cloudless by the probability threshold value (defined at the beginning).
is_cloud = cld_prb.gt(CLD_PRB_THRESH).rename('clouds')
# Add the cloud probability layer and cloud mask as image bands.
return img.addBands(ee.Image([cld_prb, is_cloud]))
Cloud shadow components¶
Define a function to add dark pixels, cloud projection, and identified shadows as bands to an S2 SR image input. Note that the image input needs to be the result of the above add_cloud_bands function because it relies on knowing which pixels are considered cloudy ('clouds' band).
def add_shadow_bands(img):
# Exclude water pixels using the scene classification (SCL) band.
not_water = img.select('SCL').neq(6)
# Identify dark NIR pixels that are not water (potential cloud shadow pixels).
SR_BAND_SCALE = 1e4
dark_pixels = img.select('B8').lt(NIR_DRK_THRESH*SR_BAND_SCALE).multiply(not_water).rename('dark_pixels')
# Determine the direction to project cloud shadow from clouds (assumes UTM projection).
shadow_azimuth = ee.Number(90).subtract(ee.Number(img.get('MEAN_SOLAR_AZIMUTH_ANGLE')));
# Project shadows from clouds for the distance specified by the CLD_PRJ_DIST input.
cld_proj = (img.select('clouds').directionalDistanceTransform(shadow_azimuth, CLD_PRJ_DIST*10)
.reproject(**{'crs': img.select(0).projection(), 'scale': 100})
.select('distance')
.mask()
.rename('cloud_transform'))
# Identify the intersection of dark pixels with cloud shadow projection.
shadows = cld_proj.multiply(dark_pixels).rename('shadows')
# Add dark pixels, cloud projection, and identified shadows as image bands.
return img.addBands(ee.Image([dark_pixels, cld_proj, shadows]))
Final cloud-shadow mask¶
Define a function to assemble all of the cloud and cloud shadow components and produce the final mask.
def add_cld_shdw_mask(img):
# Add cloud component bands.
img_cloud = add_cloud_bands(img)
# Add cloud shadow component bands.
img_cloud_shadow = add_shadow_bands(img_cloud)
# Combine cloud and shadow mask, set cloud and shadow as value 1, else 0.
is_cld_shdw = img_cloud_shadow.select('clouds').add(img_cloud_shadow.select('shadows')).gt(0)
# Remove small cloud-shadow patches and dilate remaining pixels by BUFFER input.
# 20 m scale is for speed, and assumes clouds don't require 10 m precision.
is_cld_shdw = (is_cld_shdw.focalMin(2).focalMax(BUFFER*2/20)
.reproject(**{'crs': img.select([0]).projection(), 'scale': 20})
.rename('cloudmask'))
# Add the final cloud-shadow mask to the image.
return img_cloud_shadow.addBands(is_cld_shdw)
# or (to include only the final cloud/cloud shadow mask along with the original image bands)
#return img.addBands(is_cld_shdw)
# Define a method for displaying Earth Engine image tiles to a folium map.
def add_ee_layer(self, ee_image_object, vis_params, name, show=True, opacity=1, min_zoom=0):
map_id_dict = ee.Image(ee_image_object).getMapId(vis_params)
folium.raster_layers.TileLayer(
tiles=map_id_dict['tile_fetcher'].url_format,
attr='Map Data © <a href="https://earthengine.google.com/">Google Earth Engine</a>',
name=name,
show=show,
opacity=opacity,
min_zoom=min_zoom,
overlay=True,
control=True
).add_to(self)
# Add the Earth Engine layer method to folium.
folium.Map.add_ee_layer = add_ee_layer
Define a function to display all of the cloud and cloud shadow components to an interactive Folium map. The input is an image collection where each image is the result of the add_cld_shdw_mask function defined previously.
To facilitate display in a single map, we create a mosaic of the collection. ee.ImageCollection.mosaic composites images according to their position in the collection (priority is last to first) and pixel mask status, where invalid (mask value 0) pixels are filled by preceding valid (mask value >0) [pixels.
def display_cloud_layers(col):
# Mosaic the image collection.
img = col.mosaic()
# Subset layers and prepare them for display.
clouds = img.select('clouds').selfMask()
shadows = img.select('shadows').selfMask()
dark_pixels = img.select('dark_pixels').selfMask()
probability = img.select('probability')
cloudmask = img.select('cloudmask').selfMask()
cloud_transform = img.select('cloud_transform')
# Create a folium map object.
center = AOI.centroid(10).coordinates().reverse().getInfo()
m = folium.Map(location=center, zoom_start=12)
# Add layers to the folium map.
m.add_ee_layer(img,
{'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 2500, 'gamma': 1.1},
'S2 image', True, 1, 9)
m.add_ee_layer(probability,
{'min': 0, 'max': 100},
'probability (cloud)', False, 1, 9)
m.add_ee_layer(clouds,
{'palette': 'e056fd'},
'clouds', False, 1, 9)
m.add_ee_layer(cloud_transform,
{'min': 0, 'max': 1, 'palette': ['white', 'black']},
'cloud_transform', False, 1, 9)
m.add_ee_layer(dark_pixels,
{'palette': 'orange'},
'dark_pixels', False, 1, 9)
m.add_ee_layer(shadows, {'palette': 'yellow'},
'shadows', False, 1, 9)
m.add_ee_layer(cloudmask, {'palette': 'orange'},
'cloudmask', True, 0.5, 9)
# Add a layer control panel to the map.
m.add_child(folium.LayerControl())
# Display the map.
display(m)
# Export the map as HTML file (to be viewed in a web browser), this is optional.
outfp = "output/cloud_mask_map.html"
m.save(outfp)
Display mask component layers¶
Map the add_cld_shdw_mask function over the collection to add mask component bands to each image, then display the results.
Give the system some time to render everything, it should take less than a minute.
my_S2_sr_cld_col_disp = my_S2_sr_cld_col.map(add_cld_shdw_mask)
display_cloud_layers(my_S2_sr_cld_col_disp)
Apply cloud mask¶
We use these cloud mask components to set the cloud-affected pixels to NoData.
# Define a function to apply the cloud mask to each image in the collection
def apply_cld_shdw_mask(img):
# Subset the cloudmask band and invert it so clouds/shadow are 0, else 1.
not_cld_shdw = img.select('cloudmask').Not()
# Subset reflectance bands and update their masks, return the result.
return img.select('B.*').updateMask(not_cld_shdw)
# Add the mask components to each image and apply the function to process the collection
S2_sr_cloudless = my_S2_sr_cld_col.map(add_cld_shdw_mask).map(apply_cld_shdw_mask)
Build a cloud-free composite¶
Reduce the collection by median (in your application, you might consider using medoid reduction to build a composite from actual data values, instead of per-band statistics).
For comparison, we also create a mosaic (filled with most recent pixel values, see above). Moreover, we want to know how many cloud-free observations are available for each pixel and, thus, use the count()
funtion.
s2_sr_median = S2_sr_cloudless.median()
s2_sr_mosaic = S2_sr_cloudless.mosaic()
s2_sr_count = S2_sr_cloudless.count()
Display the cloud-free composites¶
Display the results. Be patient while the map renders, it may take a minute; ee.Image.reproject()
is forcing computations to happen at 100 and 20 m scales (i.e. it is not relying on appropriate pyramid level scales for analysis). The issues with ee.Image.reproject()
being resource-intensive in this case are mostly confined to interactive map viewing. Batch image exports and table reduction exports where the scale parameter is set to typical Sentinel-2 scales (10-60 m) are less affected. For the count layer's colour scale a simple legend would be good but we skip this for simplicity here (this tutorial describes adding legends with the geemap package).
# Create a folium map object.
center = AOI.centroid(10).coordinates().reverse().getInfo()
m = folium.Map(location=center, zoom_start=12)
# Add layers to the folium map.
m.add_ee_layer(s2_sr_count,
{'bands': ['B4'], 'min': 0, 'max': 20},
'S2 cloud-free count', True, 1, 9)
m.add_ee_layer(s2_sr_median,
{'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 2500, 'gamma': 1.1},
'S2 cloud-free median', True, 1, 9)
m.add_ee_layer(s2_sr_mosaic,
{'bands': ['B4', 'B3', 'B2'], 'min': 0, 'max': 2500, 'gamma': 1.1},
'S2 cloud-free mosaic', True, 1, 9)
# Add a layer control panel to the map.
m.add_child(folium.LayerControl())
# Display the map.
display(m)
Export images to local machine¶
Export the composite image¶
First, we will export the composite (i.e. a single image) for our study area to a folder on our local hard drive. Here, we make use of geemap, a Python package described in Wu (2020) The geemap tools facilitate interactive mapping but also some other tasks with Google Earth Engine greatly.
We define a region for the export as a polygon. Clipping seems not to be necessary then. Nevertheless, see here for clipping image collections, or here for how to use filterBounds.
# Define a region-of-interest (ROI) from a list of GeoJSON 'Polygon' formatted coordinates (Lon-Lat pairs).
geom = ee.Geometry.Polygon(
[
[
[10.990786901656424, 46.845543799137999],
[10.990786901656424, 46.878736564096471],
[11.055145371135509, 46.878736564096471],
[11.055145371135509, 46.845543799137999],
[10.990786901656424, 46.845543799137999] # matching the first vertex is optional
]
]
)
feature = ee.Feature(geom, {})
my_roi = feature.geometry()
Now we define an output path and actually export ... Using either geemap or this solution. Note that we export all 12 bands into a single GeoTIFF file.
ee_object = s2_sr_median
file_name = "C:\work\etrainee\gee\s2_sr_median.tif"
geemap.ee_export_image(ee_object, file_name, scale=10, region=my_roi, file_per_band=False) # Scale sets the resolution in meters per pixel. Defaults to 1000.
Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/07d9e7b5b1e229f0676eb0d1e318379c-d62a70e682fd862abdb9429e5d318c86:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\s2_sr_median.tif
Export the cloud-masked image collection¶
We use geemap because this contains a nice function to export data directly to a local hard drive.
ee_object = S2_sr_cloudless
out_dir = "C:\work\etrainee\gee"
geemap.ee_export_image_collection(ee_object, out_dir, scale=10, region=my_roi, file_per_band=False)
print("Finished!")
Total number of images: 9 Exporting 1/9: 20210610T101559_20210610T102220_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/da77d9b40ecfc8d490a5bf035fdaef95-68c21680a834d24ced92e9e4f2437579:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210610T101559_20210610T102220_T32TPS.tif Exporting 2/9: 20210612T101031_20210612T101329_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/f5870ed4cb317edc28dfc05d86b2ff60-92df9db8a30f72880986d1b3a66a9201:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210612T101031_20210612T101329_T32TPS.tif Exporting 3/9: 20210615T102021_20210615T102602_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/a8cfa850d5348b5c032a0b6639a6baf2-e5550c87efc288ec5b909d16ac07042b:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210615T102021_20210615T102602_T32TPS.tif Exporting 4/9: 20210617T100559_20210617T101301_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/6332cb72f442ccc10346b5b25597cac0-c2b8f17ffe1e982083c4ee91d57f2786:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210617T100559_20210617T101301_T32TPS.tif Exporting 5/9: 20210622T101031_20210622T101416_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/d533affbd9a2a64295e827d4f73f2a81-cb4ce0734f4bed78b1ad8bc00b2a4d1a:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210622T101031_20210622T101416_T32TPS.tif Exporting 6/9: 20210625T102021_20210625T102545_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/92c0cb84a4fd5930c9e620d04816f5e0-71b8a6e296513974d7036af38c123baa:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210625T102021_20210625T102545_T32TPS.tif Exporting 7/9: 20210627T100559_20210627T101702_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/5001a95b748c21799a5e2d13f2cd6463-b844f2ea6f00a21b02f297b888b96fec:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210627T100559_20210627T101702_T32TPS.tif Exporting 8/9: 20210702T101031_20210702T101157_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/c9c0c9560dc073cd9a3545329298b2b1-f3d9aa4f17a1de9028f931aa8e731193:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210702T101031_20210702T101157_T32TPS.tif Exporting 9/9: 20210710T101559_20210710T102312_T32TPS.tif Generating URL ... Downloading data from https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/4ad22c03193120c71b4f7f461dde986d-9539fa20c7cdf2d8d39c10e89bfda327:getPixels Please wait ... Data downloaded to C:\work\etrainee\gee\20210710T101559_20210710T102312_T32TPS.tif Finished!
Calculate the NDVI from the cloud-masked image collection and export¶
From the collection of cloudmasked S2 images we calculate the NDVI. In GEE there is a helper function normalizedDifference()
for such purposes and a more flexible expression()
function. We use lambda
to define a function and then map it over the collection. Using map()
over an image collection is similar to a for-loop but it can make the computations running in parallel.
In addition, we print the number of scenes in this collection.
Finally, we will export the cloud-free NDVI image collection for the ROI to a directory on our local hard drive. Again, we make use of geemap.
S2_ndvi = S2_sr_cloudless.map(lambda image: image.normalizedDifference(['B8', 'B4']).rename(['ndvi']))
print(S2_ndvi.__sizeof__())
ee_object = S2_ndvi
out_dir = "C:\work\etrainee\gee"
geemap.ee_export_image_collection(ee_object, out_dir, scale=10, region=my_roi, file_per_band=False)
print("Finished!")
32 Finished!