Metadata
title: "E-TRAINEE Tutorial - Point cloud exploration with Python"
description: "This is a tutorial within the second theme of Module 1 of the E-TRAINEE course."
lastUpdate: 2023-07-17
authors: Andreas Mayr
Point cloud exploration with Python¶
A notebook to explore a 3D point cloud with various state-of-the-art Python packages specialized in large datasets. Different options for quickly loading point clouds and visualizing them in a 2D map view are shown.
The packages needed for this tutorial are contained in the requirements file provided for the course. Please see the instructions on the software page for setting up a Conda environment based on this file.
Data: Use the pc_subset_hut.las point cloud or try it with your own data. The pc_subset_hut.las file is a small subset of a point cloud acquired by unmanned aerial vehicle laser scanning (details in Mayr et al. 2020) and covers a small area in the Dolomites (IT), including the buildings of a mountain hut.
import laspy
import vaex
import pandas as pd
import pathlib
data_dir = pathlib.Path('C:/work/etrainee/point_clouds') # Define path on your local system
data_dir
WindowsPath('C:/work/etrainee/point_clouds')
PC_las = data_dir / 'pc_subset_hut.las'
#PC_las = data_dir / 'ahk_2020_uls.laz' # use AHK rock glacier point cloud (lacks intensity values)
# read LAS file to Vaex dataframe (with selected attributes, e.g. intensity)
inFile = laspy.read(PC_las)
vdf_las = vaex.from_dict({'X': inFile.x, 'Y': inFile.y, 'Z': inFile.z, 'intensity': inFile.intensity})
print(f'preview the first rows:\n{vdf_las.head(5)}')
print(f'column names: {vdf_las.column_names}')
print(f'number of rows: {vdf_las.shape[0]:,}')
print(f'number of columns: {vdf_las.shape[1]}')
preview the first rows:
  #    X       Y            Z        intensity
  0    714607  5.16844e+06  2333.22  39255
  1    714610  5.16844e+06  2334.51  38425
  2    714609  5.16844e+06  2334.6   38971
  3    714609  5.16844e+06  2334.76  38053
  4    714609  5.16844e+06  2334.93  39823
column names: ['X', 'Y', 'Z', 'intensity']
number of rows: 4,839,997
number of columns: 4
Export the point cloud to CSV (using pandas_df.to_csv under the hood):
%%time
PC_csv = data_dir / 'pc_subset_hut.csv'
vdf_las.export_csv(PC_csv)
CPU times: total: 23.9 s Wall time: 27.1 s
Compare this to the speed of using the Apache Arrow backend:
%%time
PC_csv = data_dir / 'pc_subset_hut.csv'
vdf_las.export_csv_arrow(PC_csv)
CPU times: total: 3.77 s Wall time: 4.32 s
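If you plan to reload the data repeatedly, a binary format is faster still than CSV. Here is a minimal sketch using Vaex's HDF5 export (the output file name is just an example):
# Export to vaex's memory-mappable HDF5 format for fast re-opening later
vdf_las.export_hdf5(str(data_dir / 'pc_subset_hut.hdf5'))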
Read CSV file to Vaex dataframe¶
The vaex.open method reads a CSV file lazily, i.e. the data is streamed when needed for computations. This is useful when your data is larger than the available memory. In this case, however, we use vaex.from_csv to read the file into memory.
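For illustration, a minimal sketch of the lazy variant (using the CSV exported above):
# vaex.open streams the CSV on demand instead of loading it fully into memory
vdf_lazy = vaex.open(str(PC_csv))
print(vdf_lazy.Z.mean())  # aggregations are computed in chunks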
vdf = vaex.from_csv(PC_csv) # read textfile (csv) to Vaex dataframe (use sep='\t' if your data is tab-separated)
vdf.head(n=5)
| # | X | Y | Z | intensity |
|---|---|---|---|---|
| 0 | 714607 | 5.16844e+06 | 2333.22 | 39255 |
| 1 | 714610 | 5.16844e+06 | 2334.51 | 38425 |
| 2 | 714609 | 5.16844e+06 | 2334.6 | 38971 |
| 3 | 714609 | 5.16844e+06 | 2334.76 | 38053 |
| 4 | 714609 | 5.16844e+06 | 2334.93 | 39823 |
Memory efficient out-of-core processing¶
By specifying the convert=True argument, large CSVs (larger than memory) can be read in chunks, converted to the memory-mappable HDF5 format and written to disk (first as one temporary HDF5 file per chunk, then concatenated into one HDF5 file, hence requiring twice the disk space of the final HDF5 file). This frees memory and allows you to work efficiently out-of-core.
The syntax to do this is quite simple even for multiple files (e.g. a tiled point cloud; see the documentation here), and very straightforward for a single CSV file:
vdf = vaex.from_csv(PC_csv, convert=True, chunk_size=5_000_000, progress=True, copy_index=True) # Does not make much sense with only 4.8M points, try with a larger point cloud.
Here, we also copied the index column of the Pandas DataFrame as a regular column (just to demonstrate the option).
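On later runs, you can skip the conversion step and open the converted file directly; a sketch (Vaex's default naming appends '.hdf5' to the source file name, so adjust if yours differs):
# Memory-map the previously converted HDF5 file; this is near-instant
vdf = vaex.open(str(data_dir / 'pc_subset_hut.csv.hdf5'))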
Visualization with Vaex¶
Let's plot histograms directly from the Vaex dataframe, first the elevation ...
vdf.viz.histogram(vdf.Z)
[<matplotlib.lines.Line2D at 0x18a26ecb220>]
... then the intensity values:
vdf.viz.histogram(vdf.intensity, limits='99.7%')
[<matplotlib.lines.Line2D at 0x18a26fe1e10>]
Now we plot 2D maps, aggregating the mean and standard deviation of the Z values (i.e. elevation in meters) per visualization cell, as well as the number of points.
Note that we set the aspect parameter to fix the axis aspect ratio (and prevent spatial distortion).
See the Vaex documentation for more options.
#vdf.viz.heatmap(vdf.X, vdf.Y, what=vaex.stat.mean(vdf.Z), colorbar=False) # With a single plot there seems to be a problem with colorbar placement.
vdf.viz.heatmap(vdf.X, vdf.Y, what=[vaex.stat.mean(vdf.Z), vaex.stat.std(vdf.Z), vaex.stat.count(vdf.Z)],
title="Values aggregated per cell",
aspect="equal",
figsize=(10, 5))
<matplotlib.image.AxesImage at 0x18a279c3fa0>
More Vaex tools¶
Vaex offers other tools you may find useful for handling 3D point clouds, such as:
- Geo operations - For "clipping-like" geometric operations, e.g. inside_polygons()
- Fast group-by operations - A small sketch follows after this list.
- Arrow support - Make use of the Arrow memory format, capable of zero-copy reads, for fast data access, visualizing quick overviews, shallow copies, and creating new virtual columns.
- Machine learning - vaex.ml offers ML functionality, including a fast and scalable K-means clustering implementation and many other tools.
- Propagation of uncertainties
- For more point cloud plotting examples see: https://examples.pyviz.org/ and https://examples.pyviz.org/seattle_lidar/Seattle_Lidar.html#seattle-lidar-gallery-seattle-lidar
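As a small illustration of the group-by functionality, here is a minimal sketch (it assumes the vdf dataframe from above; the 1 m elevation bins are an arbitrary choice):
# Bin points into 1 m elevation slices (a virtual column, evaluated lazily)
vdf['z_bin'] = vdf.Z.astype('int')
# Count points per elevation slice with vaex's fast group-by
counts = vdf.groupby('z_bin', agg=vaex.agg.count())
print(counts)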
Datashader¶
Let's try plotting from Pandas DataFrames with Datashader, a toolset for quickly and conveniently rendering graphics of large datasets.
First we load a point cloud in CSV format as a Pandas DataFrame.
import pandas as pd
df = pd.read_csv(PC_csv) # load point cloud as pandas dataframe
Now import the required packages and use Datashader to quickly plot a 2D planimetric view of our point cloud, where the colors represent the 2D point density.
import colorcet
import datashader as ds, datashader.transfer_functions as tf
cvs = ds.Canvas(plot_width=850)
agg = cvs.points(df, 'X', 'Y') # we plot the point density
tf.shade(agg, cmap=colorcet.fire, how='log') # plot with fire colormap and log colorscale
tf.shade(agg, cmap=colorcet.fire, how='eq_hist') # plot with fire colormap and histogram equalization
... and we plot the maximum elevation per bin:
agg = cvs.points(df, 'X', 'Y', ds.max('Z')) # we plot the maximum elevation per bin
tf.shade(agg, cmap=colorcet.gray)
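If you want to keep such a rendered view, Datashader can write it to a PNG file; a minimal sketch (the file name is just an example):
from datashader.utils import export_image
img = tf.shade(agg, cmap=colorcet.gray)
export_image(img, 'pc_max_elevation', background='white')  # writes export/pc_max_elevation.png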
Now we create an interactive version of the map, using HoloViews with the Bokeh plotting backend and again with Datashader for fast rendering. We switch to aggregating the standard deviation of the Z values per bin.
import holoviews.operation.datashader as hd
import holoviews as hv
hv.extension('bokeh')
shaded = hd.datashade(hv.Points(df, ['X', 'Y']), cmap=colorcet.bmy, aggregator=ds.std('Z'))
hd.dynspread(shaded, threshold=0.5, max_px=4).opts(bgcolor='black', xaxis=None, yaxis=None, width=900, height=500)
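To share the interactive map outside the notebook, it can be saved as a standalone HTML file; a sketch (the file name is an example):
# Save the Bokeh-based interactive map as a self-contained HTML file
hv.save(hd.dynspread(shaded, threshold=0.5, max_px=4), 'pc_std_map.html')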
hvPlot¶
The next plotting option we try is hvPlot, which provides a high-level plotting API built on HoloViews. By specifying rasterize=True, we get a colorbar and the possibility to fix the axes' aspect ratio (preventing spatial distortion). This uses Datashader to aggregate the data into a viewable image, but to get a logarithmic colorscale we must use hvPlot's logz option (using Datashader directly, it would be how='log'). See the user guide for more customization options.
import hvplot.pandas
# Make colors represent point density
df.hvplot.scatter(x='X', y='Y', rasterize=True, aspect="equal", logz=True, title='Point count')
df.hvplot.scatter(x='X', y='Y', rasterize=True, aggregator=ds.std('Z'), aspect="equal", title='Elevation standard deviation [m]')
Now we want to use the Matplotlib plotting backend (instead of the default Bokeh backend) and show a map along with a histogram.
hvplot.extension('matplotlib')
(df.hvplot.scatter(x='X', y='Y', rasterize=True, aggregator=ds.mean('Z'), aspect="equal", title='Elevation mean [m]') +
df.hvplot.hist('Z'))
You may also want to try GeoPandas and spatialpandas!
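For instance, here is a minimal sketch that wraps the XY coordinates in a GeoPandas GeoDataFrame (the CRS EPSG:32632, i.e. UTM zone 32N, is only an assumption about this dataset; check your data's actual CRS):
import geopandas as gpd
# Build point geometries from the coordinate columns; the CRS is an assumption (UTM 32N)
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['X'], df['Y']), crs='EPSG:32632')
print(gdf.head())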