Using Python for Science

Introducing Anaconda, a Python distribution for scientific research.

In the past, I've looked at several ways you could use Python to do scientific calculations, but I've never actually covered how to set up and use Python itself in a way that makes scientific work easier. Anaconda does just that.

The default installation includes a large number of Python modules that are useful when doing data science—or really any other type of scientific computing. Installation is relatively easy. You can find download links on the main Anaconda site that will allow you to choose between Mac OS X, Windows and Linux.

For Linux, you can choose between Python 2.X and 3.X, as well as between 32-bit and 64-bit executables. Now that Python 3.X has matured more, my default suggestion has changed. Unless you have a specific reason to do otherwise, I suggest that you download and use Python 3.X. Once the installer is downloaded, you can make the file executable, or you can run it directly by using bash, like this:

bash ./

You'll need to accept the license agreement to finish the installation.

The installer will ask for an installation location, defaulting to the anaconda3 directory within your home directory. It will unpack everything there and then ask if you want its bin directory added to your PATH environment variable. If you accept, keep this in mind if you use Python scripts to do system administration tasks: when you just run the command python, you will get the interpreter installed by Anaconda rather than the system one.
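If you are unsure which interpreter a script ends up using, you can ask Python itself. The following is only a minimal sketch; describe_interpreter is an illustrative helper name, not something that ships with Anaconda.

```python
import shutil
import sys


def describe_interpreter():
    """Return where the running Python lives and what `python` resolves to on PATH."""
    return {
        # Full path of the interpreter that is running this code.
        "executable": sys.executable,
        # Root directory of that installation (or environment).
        "prefix": sys.prefix,
        # First `python` found on PATH, or None if there is none.
        "python_on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Assuming you kept the default installation location and let the installer modify your PATH, the paths printed should point somewhere inside the anaconda3 directory.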

One of the core technologies that makes Anaconda unique is the conda package management system. Conda can be used to manage all of the modules and other software that came with your Anaconda installation. To update everything, simply run the following command:

conda update --all

You also can update individual packages selectively by using their package names in the above command rather than the --all option. To install a new Python module, such as opencv, use the command:

conda install opencv

That command will check on all the requirements and make sure all the dependencies are correct.
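After an install finishes, you may want to confirm from Python that the module really can be found. Here is a small sketch of one way to do that; is_importable is a hypothetical helper name, and it is meant only for top-level module names. Note that the conda package opencv is imported in Python under the name cv2.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module called module_name."""
    # find_spec returns None when no such top-level module can be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The conda package "opencv" provides the Python module "cv2".
    print("cv2 available:", is_importable("cv2"))
```

This looks up the module without actually importing it, so it is a cheap check to run at the top of a script.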

If you can't remember (or don't know) what a particular module name might be in the conda packaging scheme, you can do a search with a command like the following:

conda search --names-only open

This will return a list of all of the conda package names that have the text "open" in them.

You always can check to see what already has been installed by using the list option to conda.

If you have finished with some experimental code and want to remove a particular package that you no longer need, you can uninstall it with the following command:

conda remove opencv

All of those commands have several more options that I haven't covered here, but you can find many more details by looking at their help pages.

Another really powerful tool, especially when working on multiple projects, is the enhanced management of virtual environments that is possible with Anaconda. When you are doing research computations, you often have to start with explorations into your problem area. You definitely don't want any of those exploratory tasks to interfere with any currently ongoing work. So the best option is to set up a separate, isolated environment where it is safe to destroy things with no fear of losing earlier work. This is handled by virtual environments. Python has had virtual environments for some time, but managing them can be unintuitive for some people. Anaconda includes a set of tools to help simplify the process.

When you install Anaconda, you actually are operating within a default environment already. In order to create a new one, you would use the command:

conda create --name project1

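In order to activate this new environment, run the command below (conda 4.4 and later also accept conda activate and conda deactivate in place of the source forms shown in this article):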

source activate project1

Now, everything you do, with regard to Python and conda, will take place within this environment. For example, if you run the command conda list within this environment, you'll see that there are no packages installed. If you now install a package, it will exist only within this environment. This way, you can have an isolated environment that will contain only the Python modules you need for that particular project.
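When you juggle several environments, it helps to let a script report which one it is running in. The sketch below assumes that conda's activation sets the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables; active_conda_env is an illustrative helper name.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or (None, None)."""
    env = os.environ if environ is None else environ
    # Assumption: activating an environment sets these two variables.
    return env.get("CONDA_DEFAULT_ENV"), env.get("CONDA_PREFIX")


if __name__ == "__main__":
    name, prefix = active_conda_env()
    print("active environment:", name)
    print("environment prefix:", prefix)
    # The interpreter prefix matches the environment prefix only when
    # Python itself has been installed into that environment.
    print("interpreter prefix:", sys.prefix)
```

Passing a dictionary instead of relying on os.environ makes the helper easy to try out without activating anything.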

If you already have an environment that you have been working with, but you want to extend it in some manner, you can clone this starting environment with the command:

conda create --name project2 --clone project1

As you work with this environment, conda keeps track of the history of changes that you have applied to it. You can get the list of those changes with:

conda list --revisions

This way, you always can revert to some previous revision with the following command (where X is the revision number you want to revert to):

conda install --revision X

Once you are done with your work for the day, you can deactivate a given environment with:

source deactivate

When you are completely finished with a particular environment, you can permanently delete it with:

conda remove --name project2 --all

Just be sure that you are deleting the correct environment. You don't want to destroy all of your hard work accidentally.

You can get a list of all of the environments managed by conda with the command:

conda info --envs
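If you want to use that list from a script, conda can also print it as JSON (for example with conda env list --json). The sketch below assumes the output is an object with an "envs" key holding a list of environment paths; env_names is a hypothetical helper, and the sample paths are made up for illustration.

```python
import json
import os


def env_names(json_text):
    """Map environment directory names to their full paths."""
    data = json.loads(json_text)
    # Assumed layout: {"envs": ["/path/to/anaconda3", "/path/to/anaconda3/envs/project1"]}
    paths = data.get("envs", [])
    # The last path component is used as the name, so the default
    # environment shows up under its directory name (e.g. "anaconda3").
    return {os.path.basename(path.rstrip("/\\")): path for path in paths}


if __name__ == "__main__":
    # Made-up sample standing in for real conda output.
    sample = '{"envs": ["/home/user/anaconda3", "/home/user/anaconda3/envs/project1"]}'
    for name, path in env_names(sample).items():
        print(f"{name}: {path}")
```

Keeping the parsing separate from the call to conda means you can test it against saved output first.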

If you are working on a project collaboratively, you probably don't want to have to send an entire environment to someone else, as that simply would take too much bandwidth. You also don't want to send a list of handwritten instructions on how to re-create it, as humans are famous for forgetting steps. Instead, conda includes the following command that you can use to create a descriptive file:

conda list --explicit > project1.txt

You can send this file to your collaborators and have them run this:

conda create --name my_project1 --file ./project1.txt

That will allow them to re-create your project environment.
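Before you send the file, it can be worth a quick look at what it contains. An explicit specification file is plain text: comment lines start with "#", a marker line reads "@EXPLICIT", and the remaining lines are package URLs. The following sketch pulls out those URLs; package_urls is an illustrative helper, and the URLs in the sample are invented placeholders, not real download locations.

```python
def package_urls(spec_text):
    """Return the package URLs listed in an explicit conda specification file."""
    urls = []
    for line in spec_text.splitlines():
        line = line.strip()
        # Skip blank lines, "#" comments and the "@EXPLICIT" marker line.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        urls.append(line)
    return urls


if __name__ == "__main__":
    # Invented sample content, for illustration only.
    sample = """# This file may be used to create an environment
# platform: linux-64
@EXPLICIT
https://repo.example.invalid/linux-64/packagea-1.0-0.tar.bz2
https://repo.example.invalid/linux-64/packageb-2.1-0.tar.bz2
"""
    for url in package_urls(sample):
        print(url)
```

To check a real file, you could read project1.txt with open() and pass its contents to the function.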

All of these tasks have been handled on the command line so far, but not everyone is comfortable working that way. For those people, Anaconda includes the Anaconda Navigator. You can start it with the command anaconda-navigator.

Figure 1. The Anaconda Navigator provides a graphical interface for interacting with your installation of Anaconda.

On the first page of the application, you'll see launchers for the major Python software that is available through Anaconda. This includes packages like spyder, orange3 and the jupyter notebook. If they haven't been installed yet, you'll see a button labelled "Install" rather than "Launch".

The second page allows you to manage environments within your Anaconda installation. From here, you can manage the installed Python modules, create new environments or clone existing ones. You even can import projects from specification files to create a new copy of an environment. The right-hand side of the window displays Python modules, and you can filter the list to show those that are installed, updatable or not yet installed.

Figure 2. Anaconda allows for managing environments within your installation.

There is a third page, currently in beta, which manages projects. Projects are a way of organizing larger pieces of code and sharing them with others. Sharing is made easier with the Anaconda Cloud. Once you have an account on Anaconda Cloud, you can upload projects, environments, packages and jupyter notebooks. Once they have been uploaded, you can share them with other people around the globe much more easily. Although you can log in and work with the Anaconda Cloud in a web browser, the Anaconda Navigator also lets you log in directly and interact with your materials stored online.

Figure 3. Anaconda also helps you manage larger projects, along with environments.

This was a short introduction, but hopefully I covered enough to help you better organize your scientific code. In future articles, I plan to dig a bit more into actually doing some scientific work with Python and taking advantage of these organizational tools.
