Pythonic Science in the Browser
In the past, if you wanted a friendly environment for doing Python programming, you would use Ipython. The Ipython project actually consists of three parts: the standard console interface, a Qt-based GUI interface and a web server interface that you can connect to with a web browser. The web browser interface, especially, has become the de facto way of doing scientific programming with Python. It has become so popular in fact, it has spun off as its own project, named Jupyter. In this article, I take a look at how to get the latest version up and running, and I discuss the kinds of things you can do with it once it is set up.
The first step is to install the latest version. Because it is under
very active development, you probably will want to keep it updated on
pip is definitely the easiest way to do this. The following
command will install Jupyter, if it isn't already installed,
or it will update Jupyter to the latest version:
sudo pip install --upgrade jupyter
Be sure that you have a C compiler installed, along with the development package for Python. For example, on Debian-based systems, you can be sure you are ready by executing the following command:
sudo apt-get install python-dev build-essentials
This should make sure you have everything you need installed.
To start Jupyter, open a terminal window and enter the command:
jupyter notebook --no-browser
This will start a web server, listening on port 8888, that will accept connections from the local machine. For security reasons, by default, it will ignore incoming connections from outside machines. If you want to set it up to accept connections from outside machines, you can do so by adding an extra option:
jupyter notebook --no-browser --ip=*
This makes your Jupyter server wide open, so it is strongly discouraged unless you are on a secure private network. Otherwise, you should have some sort of user authentication set up to manage who can use your system.
Once Jupyter is up and running, open a browser and point it to http://localhost:8888. Across the top, you will see a series of tabs for each section of the workspace. Most people will see only three: Files, Running and Clusters. If you are using the Anaconda Python distribution, you will get a fourth tab named Conda.
Figure 1. When you first enter Jupyter, you are presented with a file listing from the current working directory.
On startup, you will be located at the first tab, Files. This is simply a directory listing of the current working directory. You probably won't have any notebooks currently available, so you will need to create a new one. You can do that by clicking the drop-down list on the right-hand side of the screen labeled New and selecting the Python notebook entry on the menu. This will open a new browser tab and load a new empty notebook.
Figure 2. Clicking on the New tab will load a new, empty notebook within a new browser tab.
The second tab shows you all of the active notebooks running on this particular server. You can click the related link to open the selected notebook in a new browser tab or click the Shutdown button to halt the selected notebook. There is also a section for any active terminal sessions. The Clusters section gives you status information on the parallel Python engines that are configured on your system. By default, this isn't configured at all.
Figure 3. You can get a listing of all of the active notebooks and terminal sessions.
If you are using Anaconda, you will get a fourth section labeled Conda. This section gives you details about what packages are available and what packages are installed. You even can manage your Conda environments, creating new ones, exporting existing ones or deleting environments that you are done with.
Figure 4. Jupyter lets you check the details of your Anaconda packages and environments.
Let's take a look at what you can do with the notebook itself. The interface should feel familiar to people who have used applications like Maple or Mathematica. Your input is entered in sections called cells. Cells can be of different types, namely code, markdown text or headings. This way, you can have text cells describing the code sections, explaining what the code is doing and why. It's extremely useful when you're doing scientific calculations, because it allows you to include your documentation with your code, so everything stays synchronized and up to date.
When you start entering lines of code, pressing Enter takes you to a new line within the same cell. None of the code gets executed yet. When you are ready to run the cell, press the Shift and Enter keys together. All of the code runs within the same Python engine, so results from one cell will be available to other cells later on.
Figure 5. You get the output from your Python code displayed within the notebook.
You also can import extra Python modules, just like you do in any other Python environment. The most useful for visualizing data and results is matplotlib. If you import the matplotlib module and execute plotting commands, Jupyter can render the resulting graphs directly in the notebook.
Figure 6. Jupyter can render matplotlib graphs directly in the notebook.
As you can see, you need to use an extra statement that starts with a % character to tell Jupyter to render the plot as an image within the notebook. Otherwise, the plots will be rendered within a new window. This new statement is called a magic. There is an entire library of magics available. For example, you can use the timeit magic to profile how long it takes to run the code within a cell.
Figure 7. There are several magic statements available, such as the timeit magic to find runtimes of code cells.
If it is a section of code that has a short runtime, Jupyter automatically will run it several times to get an average runtime. This allows you to work on optimizing your code as well as developing it.
When you are ready to share your results, several different options are available. You always can simply share the Jupyter notebook. It is all stored in a single file with the filename ending .ipynb. The other options depend a bit on which Python modules you have installed on your system. The formats most often used are either as a single static HTML page or PDF document. If you are going to present your results, you even can export it as an HTML-based presentation where the cells are formatted as individual slides. All of these options are available under the File→Download As menu item.
Hopefully this article has shown some functionality you can make use of in your own code. Jupyter is especially useful for scientific exploration. You can try lots of different calculations and do different types of data analysis and see the results right away within the notebook. And when you reach a useful conclusion, you can share the notebook with others within the growing community of Jupyter users.
Limited Time Offer
Take Linux Journal for a test drive. Download our September issue for FREE.
Topic of the Week
The cloud has become synonymous with all things data storage. It additionally equates to the many web-centric services accessing that same back-end data storage, but the term also has evolved to mean so much more.