Transitioning to Python 3

on December 6, 2016

The Python language, which is not new but continues to gain momentum and users as if it were, has changed remarkably little since it first was released. I don't mean to say that Python hasn't changed; it has grown, gaining functionality and speed, and it's now a hot language in a variety of domains, from data science to test automation to education. But, those who last used Python 15 or 20 years ago would feel that the latest versions of the language are a natural extension and evolution of what they already know.

At the same time, changes to the language—and particularly changes made in Python 3.x—mean that Python 2 programs won't run unmodified in Python 3. This is a known issue, and it was part of the process that Python's BDFL (Benevolent Dictator for Life) Guido van Rossum announced back when the "Python 3000" project was launched years ago. Guido expected it would take time for organizations to move from Python 2 to Python 3, but he also felt that the improvements to the language were necessary.

The good news is that Python 3, which at the time of this writing exists in version 3.5, is indeed better than Python 2. The bad news is that there still are a lot of companies (including many of my training and consulting clients) that still use Python 2.

Why don't they just upgrade? For the most part, it's because the time and effort needed to do so aren't seen as a worthwhile investment of developer resources. Most differences between Python 2 and 3 are easily expressed and understood by people, but the upgrades aren't completely automatic. Moving a large code base from Python 2 to 3 might take days, but it also might take weeks or months.

That said, companies will soon be forced to upgrade, because as of the year 2020, there will be no more support for Python 2. That's a risk many companies aren't going to want to take.

If you have to upgrade, but can't upgrade, that puts you in a terrible spot. However, there is another option: upgrade incrementally, modifying just 1–2 files each week so that they work with both Python 2 and 3. After a number of months of such incremental changes, you'll be able to switch completely to Python 3 with relatively little investment.

How can you make your code compatible with both? In this article, I provide a number of suggestions on how to do this, using both an understanding of Python 3's changes and the tools that have been developed to make this transition easier. Don't wait until 2019 to start making these changes; if you're a Python developer, you already (in mid-2016) should be thinking about how to change your code to be Python 3-compatible.

What Has Changed?

The first thing to ask is this: what exactly changed in Python 3? And, how easily can you move from Python 2 to Python 3? Or, how can you modify your Python 2 programs so they'll continue to work in Python 2, but then also work unmodified in Python 3? This last question is probably the most important one for my clients, and possibly for your business as well, during this transition period.

On the face of things, not very much actually changed in Python 3. It's a cleaner, more efficient and modern language that works like more modern Python developers want and expect. Things that Python developers were doing for years, but that weren't defaults in the language, are now indeed defaults. Sure, there are things I'm still getting used to after years of bad habits, such as failing to use parentheses around the arguments passed to print, but on the whole, the language has stayed the same.

However, this doesn't mean that nothing has changed or that you can get away with not changing your code.

For example, you almost certainly never wanted to use Python 2's input built-in function to get user input. Rather, you wanted to use the raw_input built-in function. So in Python 3, there is no equivalent to Python 2's input; the Python 3 input function is the same as Python 2's raw_input.

A more profound change is the switch in the behavior of strings. No longer do strings contain bytes; now they contain Unicode characters, encoded using UTF-8. If 100% of your work uses ASCII, you're in luck; nothing in your programs will really need to change. But if you use non-ASCII characters, and if you do so in the same program as you work with the contents of binary files, you'll have to make some adjustments. Python 2's str class is now a bytes class, and Python 2's unicode class is now the str class.

A number of other changes have been made that make Python more efficient. For example, Python 2 has the range function (which returns a list of integers) and the xrange function (which returns an iterator). Python 3's range function is the same as Python 2's xrange, because it's so much more efficient, and there really are few reasons to prefer the old range. But if your program expects to get a list back from range, you might be in trouble when you move to Python 3.

Another problem, which has become far less acute in the last year or two, is that of third-party libraries. If you're using packages from PyPI, you need to make sure not only that your own code works with Python 3, but also that all of those packages do. For a long time, I would argue that these packages were the bottleneck stopping many people from upgrading. But nowadays, most popular packages support Python 3, as you can see at the Python 3 Readiness site, which tracks such information.

Identifying Problems

So, how can you take a Python 2 program and modify it so that it'll work under both Python 2 and 3? You could go through the code line by line and try to find changes, but there are tools that can make the process much easier.

The first is an old friend of Python developers, the pylint program, which normally checks your code for Python style and usage. Modern versions of pylint have a py3k option you can apply that checks your code to see how compatible it is with Python 3. For example, let's assume you have written the (terrible) program shown in Listing 1. How can you find out which parts of it aren't going to work? You can run this:


pylint --py3k oldstuff.py

And, you'll get the following output:


************* Module oldstuff
W:  3, 7: raw_input built-in referenced (raw_input-builtin)
E:  4, 0: print statement used (print-statement)
E:  5, 0: print statement used (print-statement)
E:  6, 0: print statement used (print-statement)
W:  8, 9: raw_input built-in referenced (raw_input-builtin)
E: 10, 4: print statement used (print-statement)
W: 10,48: division w/o __future__ statement (old-division)
E: 14, 4: print statement used (print-statement)
W: 16, 4: range built-in referenced when not iterating
 ↪(range-builtin-not-iterating)
E: 17, 0: print statement used (print-statement)

The output contains both errors ("E") and warnings ("W"). The example program is using print as a statement, rather than a function. It's using range when not iterating. And, it's using raw_input. What can you do about it, and how can you improve things? pylint won't tell you; that's not its job. But if nothing else, you now have a list of things to fix and improve, so that it'll at least run under Python 3.

Listing 1. oldstuff.py


#!/usr/bin/env python

name = raw_input("Enter your name: ")
print "Hello, ",
print name,
print "!"

number = raw_input("Enter a number: ")
for i in [2,3,5]:
    print "{} / {} = {}".format(int(number), i, int(number) / i)


for i in range(10):
    print i

x = range(10)
print x[3]

If you have written a Python package with a requirements file, you can download and install caniusepython3 from PyPI. Running caniusepython3 against your requirements file will indicate what will work and what won't. If you don't want to download and install caniusepython3, you actually can go to the Can I Use Python 3 site and upload your requirements file there.

Fixing Problems

Python has come with a program called 2to3 for some time that looks over your Python 2 code and tries to find ways to make it work with Python 3. So, you can run:


2to3 oldstuff.py





and get unified diff-style output, indicating what changes you'll need
to make in order for your program to work under Python 3. The problem
is that this is a one-way conversion. It tells you how to change your
program so it'll work with Python 3, but it doesn't help you make your program
compatible with both 2 and 3 simultaneously.



Fortunately, there's a package on PyPI called futurize that not
only runs 2to3, but also provides the import statements necessary for your
code to run under both versions. You can just run:


futurize oldstuff.py






and the output is (as with 2to3) in diff format, so you
can use it
either to create a file that's compatible with both or to read through
things.



What if you have Python 3 code and want to make it backward-compatible with Python 2?
The same people who make futurize also
make the amusingly named pasteurize, which inserts the appropriate
import statements into code.



How do you know if your code really works well under both Python 2 and
3 after you have applied futurize's changes? You can't, and there is
no doubt that these automatic tools will get some things wrong. For
this reason (among others), it's crucial that you have a good test
suite, with good coverage of your Python 2 code. Then you can run
your tests against the Python 3 version and ensure that it works
correctly there as well. Without these tests, you shouldn't think
that your upgrade has worked; even 100% test coverage is never a
guarantee, but it at least can tell you that the risk of failure has
been minimized.



What if you're doing all sorts of serious and deep things with Python
2 that 2to3 can't notice, or that you can't paper
over? A great
package on PyPI is six, which papers over the differences between
Python 2 and 3. For example, let's say you want to create a new
object of the type used for text, such that things will be compatible
across versions. In Python 2, that's going to be unicode, but in
Python 3, that's going to be str. You don't want to have an
"if"
statement in your code each time you do this. Thus, using six, you can
say:


import six
s = six.text_type()





Now you can be sure that "s" is an object of the appropriate type.



six defines an amazing array of things that have changed, which you
might need to keep track of in your code. Want to check something in
the builtins namespace (aka
__builtins__ in Python 2)? Want to
re-raise exceptions? Want to use StringIO (or BytesIO)? Want to deal
with metaclasses? Using six, you can write a single line of code,
which behind the scenes will issue the appropriate "if" statements for
the version of Python you're using.



Even if you don't use six in your code, I recommend that you read
through its documentation just to see where things have changed in
Python 3. It'll open your eyes (as it did to mine) regarding the
behind-the-scenes changes that often aren't discussed in the Python
2/3 world, and it might give you more insights into how to write your
code so that it can work in both.



Conclusion


If you're starting to write some new Python code today, you should use
Python 3. And if you have Python 2 code that you can upgrade to
Python 3, you should do that as well. But if you're like most
companies with an existing Python 2 code base, your best option
might well be to upgrade incrementally, which means having code that
works under 2 and 3 simultaneously. Once you've converted all of your
code, and it passes tests under both 2 and 3, you can flip the switch,
joining the world of Python 3 and all of its goodness.



Resources


Much has been written about the changes in Python 2 and 3. A great
collection of such information is at the Python-Future website.
That site offers the futurize and
pasteurize packages as well as a
great deal of documentation describing the changes between versions,
techniques for upgrading and things to watch out for.



The six package is documented
here. Even
if you don't use six for 2/3 compatibility, I strongly suggest
that you look through its capabilities.



Finally, if you're a web developer using Django, you
definitely should read the Django-specific documentation regarding moving to
Python 3
here.



This is especially important because of Django's handling of strings,
bytes and Unicode strings, the names of which changed a bit over the
years. Django actually includes a copy of the six library, modified
slightly to suit its needs for internal use.

Reuven Lerner teaches Python, data science and Git to companies around the world. You can subscribe to his free, weekly "better developers" e-mail list, and learn from his books and courses at http://lerner.co.il. Reuven lives with his wife and children in Modi'in, Israel.

Load Disqus comments

python

HOWTOs

Transitioning to Python 3

Recent Articles