Jupyter Notebooks and a few Python basics¶
Here, you can learn how to use a Jupyter Notebook for interactive computing and for documenting a workflow.
Along the way, we will also look at a few selected Python basics.
Jupyter Notebooks¶
Program code can be run by executing "normal" scripts (e.g., *.py files), as we did before. Especially for experimenting or learning, a great alternative is to run pieces of code interactively within Jupyter Notebooks. The Jupyter Notebook format (*.ipynb, short for “IPython notebook”) can contain code cells along with text explanations, as well as graphics and other output in between. This also makes Notebooks a great format for documenting your work and for explaining it to others. The E-TRAINEE course and many other tutorials make extensive use of such Notebooks.
Although Jupyter Notebooks are plain text files (in JSON format), it makes little sense to view or edit them in a normal plain text editor (in contrast to *.py files, for instance). They are made to be edited and run in dedicated web-based or desktop applications that can process and display them properly, such as JupyterLab, JupyterHub, Jupyter Desktop, or Visual Studio Code. We suggest the latter, among other reasons because VSCode is also a good editor for "normal" Python scripts.
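To illustrate why a plain text editor is not a convenient way to read a Notebook, the following minimal sketch builds a strongly simplified Notebook structure by hand and serializes it as JSON. It assumes the general layout of the version 4 Notebook format; real *.ipynb files contain more metadata (e.g., kernel information), and the cell contents here are made up for illustration.

```python
import json

# A hand-written, strongly simplified Notebook structure (assumption: layout
# of the version 4 Notebook format; real files carry more metadata).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A heading"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello world!')"],
        },
    ],
}

# This JSON text is roughly what a plain text editor would show
text = json.dumps(notebook, indent=1)
print(text)

# Reading it back gives us the list of cells and their types
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

A dedicated application reads this structure and renders each cell accordingly, which is much easier to work with than the raw JSON.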
Create a new Notebook¶
In VSCode, use the shortcut CTRL + Shift + P to show and run commands for VSCode (the command palette) at the top of your screen, and then type/select "Create: New Jupyter Notebook". Don't forget to save your Notebook if you want to continue with it.
There are two different "modes" when you are in a Jupyter Notebook: navigation mode and cell editing mode. Press Enter to switch to cell editing mode and Esc to return to navigation mode. A key feature of Notebooks is that they are usually divided into different cells.
Code cells¶
As the name suggests, code cells are used to edit and write new code in a programming language such as Python, Julia or R. When a code cell is executed, its code is sent to the kernel associated with the Notebook. If the kernel can process it (right kernel for the language, required packages installed, no bugs in the code, ...), the results that are returned from that computation are displayed in the Notebook as the cell's output. Such output can be text, tables or figures.
Your new Notebook will contain one first code cell and look more or less like this:
There are (amongst others) options to
- Select the Kernel in the upper right corner
- Add a code cell or Add a Markdown cell in the upper left corner (appears also below/between existing cells in the document if you are pointing there with your mouse)
- Run a single code cell (play button left of the cell) or run all code cells (play button above the document)
- Select the language mode of the cell in the bottom right corner (default is Markdown or Python)
If you click into a cell you get additional options, such as for rendering a Markdown cell (stop editing; check mark in the upper right corner of the cell), for running code, or for deleting the cell.
Text cells¶
For writing descriptive text, we can insert Markdown cells. Markdown is a simple and easy-to-use markup language that allows us to format these text cells (e.g., with headings, emphasis, lists of bullet points, etc.). Actually, the E-TRAINEE course documents are more or less based on Jupyter Notebooks and pure Markdown documents (with some HTML syntax for more advanced formatting in between).
Try to create a simple example with a few code and text cells! Have a short look at this guide to the Markdown syntax and try to create text with two heading levels and parts of the text highlighted in bold or italic.
Open an existing Notebook¶
What you are viewing here (if you are on the course website) is actually a Jupyter Notebook converted to HTML and rendered in your browser. To be able to use it interactively (modify and execute cells), you have to download it as an *.ipynb file from GitHub (download button at the upper right of this Notebook) and open it in VSCode (or JupyterLab).
Try some interactive Python¶
Once you have downloaded this Notebook, you can use it interactively, that is, run code cells, see their output, modify the code (if you want) and run it again to see what changes. Let's try this!
Along the way, we will also learn or review some Python basics. However, this is by no means a comprehensive crash course. If you are new to Python or need a more thorough refresher of your coding skills, please have a look at the official Python tutorial.
Print messages¶
print("Hello world!")
Try skipping the following cell and running the one after it first. What happens?
my_string = "Hello world!" # don't run this cell, run the next one first
print(my_string) # this only works if we have run the cell above already (try)
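What happens when a name is used before the cell defining it has been run can be sketched as follows. The variable name greeting is hypothetical and only used for illustration; the try/except block is there so that the example runs through instead of stopping with an error.

```python
# Simulate running the "second" cell before the "first" one:
# the name has not been defined yet, so Python raises a NameError.
try:
    print(greeting)  # 'greeting' is a hypothetical, not yet defined variable
except NameError as error:
    message = str(error)
    print(f"NameError: {message}")

greeting = "Hello world!"  # corresponds to running the defining cell
print(greeting)            # now the name is known and can be printed
```

In a Notebook, what counts is the order in which cells were executed, not the order in which they appear in the document.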
Loops and flow control¶
For repeated execution of a task, for loops are important.
for n in range(3):
    print("Hello world!")
    print(f"This is iteration number {n}")
print("Finished!")
This example shows several things:
- The colon (:) denotes the beginning of a definition (here of the repeated code under the for loop).
- Python defaults to counting from 0 rather than from 1.
- Function calls in Python always use parentheses: print()
- Code blocks are identified through indentations.
- We can supply a variable to a string; in this case we supply the current value of n to an f-string (a string preceded by the letter "f").
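The points about counting from 0 and about f-strings can be checked with a small sketch like the following. The variable names (numbers, shifted, labels) are chosen for illustration only.

```python
numbers = list(range(3))     # counting starts at 0, the stop value 3 is excluded
print(numbers)               # [0, 1, 2]

shifted = list(range(1, 4))  # an explicit start value makes counting begin at 1
print(shifted)               # [1, 2, 3]

labels = []
for n in numbers:
    # Expressions inside the curly braces of an f-string are evaluated
    labels.append(f"iteration {n}, squared: {n ** 2}")

print(labels[2])             # iteration 2, squared: 4
```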
For logical decisions, Python has if statements.
if n > 2:
    print("n is greater than 2!")
else:
    print("n is not greater than 2!")
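Note that after the for loop above, n still holds its last value (2), so the else branch is executed. To see the different branches in action, a decision can be placed inside a loop. The following is a small sketch with an additional elif branch; the list name results is hypothetical.

```python
results = []
for n in range(5):
    if n > 2:
        results.append(f"{n} is greater than 2")
    elif n == 2:  # only checked if the first condition was not met
        results.append(f"{n} is equal to 2")
    else:         # runs if none of the conditions above was met
        results.append(f"{n} is less than 2")

for line in results:
    print(line)
```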
The while statement can be used to repeat code as long as a condition is met (conditional loop).
m = 0
while m < 3:
    print(f"This is iteration number {m}.")
    m += 1
print("Now the condition is no longer met.")
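A while loop is particularly handy when the number of repetitions is not known in advance. As a small sketch (with made-up numbers), we can halve a value until it drops below 1 and count how many steps this takes:

```python
value = 100.0
steps = 0
while value >= 1.0:  # the condition is checked before every iteration
    value /= 2       # shorthand for: value = value / 2
    steps += 1

print(f"After {steps} steps the value is {value}.")
```

If the condition never becomes false (e.g., because the value is not updated inside the loop), the loop runs forever, so make sure something in the loop body eventually changes the condition.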
Loading packages¶
To make the functions defined in an installed Python package available to us, we must load the package using the import keyword. Let's load the pandas package, which is popular for working with tabular data.
import pandas
Now, we can access any function from the package, by typing pandas.*. For example, create a Series object (a labelled 1-dimensional array):
pandas.Series([1, 3, 5])
For many packages it is common practice to import a package under a different name which is shorter to type, for example:
import pandas as pd
We can now access all of the functions in pandas using pd.* instead of pandas.*. Let's try working on the series.
a = pd.Series([1, 3, 5])
b = 3 # What is different if b = 3.0 (instead of 3)?
c = a + b
c
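One way to investigate the question in the comment above is to compare the data types (dtype) of the results. This sketch assumes that pandas is installed; the exact integer and float dtypes may depend on your platform and pandas version, so they are described only in general terms in the comments.

```python
import pandas as pd

a = pd.Series([1, 3, 5])

c_int = a + 3      # integer Series plus an integer
c_float = a + 3.0  # integer Series plus a float

print(c_int.dtype)       # an integer dtype
print(c_float.dtype)     # a floating-point dtype
print(c_float.tolist())  # the values are the same, but stored as floats
```

The addition is applied element by element, and adding a float converts the whole result to a floating-point type.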
While importing packages or their submodules under a short name is to some extent a matter of taste, you should not overdo it. Sometimes it is preferable to have a longer but more informative line of code (you won't remember all packages and submodules imported at the beginning of a long script). For widely used packages, however, it is more or less a convention to abbreviate them, and you will get used to these short names. Examples used in the course are:
import pandas as pd
import geopandas as gpd
import numpy as np
import xarray as xr
import seaborn as sns
import matplotlib.pyplot as plt # Matplotlib's pyplot interface (a submodule of the matplotlib package)
from matplotlib import pyplot as plt # Alternative way to import only the pyplot submodule
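The different forms of the import statement can also be tried with modules from the Python standard library, which need no installation. In this sketch, the aliases m and osp are arbitrary short names chosen for illustration, not established conventions.

```python
import math                 # access functions via the full name, e.g. math.sqrt
import math as m            # the same module under a shorter alias
from os import path as osp  # import only a submodule, under an alias
from math import sqrt       # import a single function directly

print(math.sqrt(16.0))      # 4.0
print(m is math)            # True: both names refer to the same module
print(osp.basename("data/notebook.ipynb"))  # notebook.ipynb
print(sqrt(9.0))            # 3.0
```

An alias does not create a copy of the module; it is just another name for the same object.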
Some basic plotting with Matplotlib¶
A nice feature of Jupyter Notebooks is that they can also contain graphical output, such as figures generated by the matplotlib library.
import numpy as np
import matplotlib.pyplot as plt
theta = np.linspace(0, 20, 100) # Create some data
sintheta = np.sin(theta)
plt.plot(theta, sintheta, label='y = sin(x)', color='purple')
plt.grid()
plt.legend()
plt.xlabel('Theta')
plt.ylabel('Sintheta')
plt.title('This is a great title', fontsize=14)
Text(0.5, 1.0, 'This is a great title')
For this plot we used the functional (pyplot) interface of matplotlib. Various function calls add elements (like a title or a legend) to the same figure and/or modify this figure. While we can create rather simple plots quickly and easily with this approach, the possibilities for customization and complexity in a figure are somewhat limited.
The object-oriented interface of matplotlib provides more advanced options and is suited to generating even very complex and specific figures. If you want to learn more about the two different plotting approaches of matplotlib, read the documentation or the blog posts here and here. In the course we will also use other, more specific plotting libraries (e.g., to visualize geographic data or for statistical visualizations) which are built on top of matplotlib.
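For comparison, here is a minimal sketch of how the sine plot from above might look with the object-oriented interface. Instead of calling pyplot functions that act on the "current" figure, we create a figure object and an axes object explicitly and call methods on them. The figure size is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 20, 100)  # Create some data

# Create the figure and axes objects explicitly ...
fig, ax = plt.subplots(figsize=(6, 4))

# ... and call methods on the axes object to build up the plot
ax.plot(theta, np.sin(theta), label="y = sin(x)", color="purple")
ax.grid(True)
ax.legend()
ax.set_xlabel("Theta")
ax.set_ylabel("Sintheta")
ax.set_title("This is a great title", fontsize=14)
fig.tight_layout()  # adjust the spacing so that labels are not cut off
```

Because every element belongs to an explicit object, this approach scales well to figures with several subplots, where each axes object can be addressed individually.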
Now you should be able to use Jupyter Notebooks for interactive Python coding. For more detailed information on Jupyter Notebooks please see the user documentation. For a quickstart to geographic data in Python continue here.