Introducing Mypy, an Experimental Optional Static Type Checker for Python


Tighten up your code and identify errors before they occur with mypy.

I've been using dynamic languages—Perl, Ruby and Python—for many years. I love the flexibility and expressiveness that such languages provide. For example, I can define a function that sums numbers:


def mysum(numbers):
    total = 0
    for one_number in numbers:
        total += one_number
    return total

The above function will work on any iterable that returns numbers. So I can run the above on a list, tuple or set of numbers. I can even run it on a dictionary whose keys are all numbers. Pretty great, right?
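To make that concrete, here is the same function called on each of those kinds of iterable. The calls and values are just my own illustration:

```python
def mysum(numbers):
    total = 0
    for one_number in numbers:
        total += one_number
    return total

print(mysum([10, 20, 30]))        # list: 60
print(mysum((10, 20, 30)))        # tuple: 60
print(mysum({10, 20, 30}))        # set: 60
print(mysum({10: 'a', 20: 'b'}))  # dict: iterating yields the keys, so 30
```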

Yes, but for my students who are used to static, compiled languages, this is a very hard thing to get used to. After all, how can you make sure that no one passes you a string, or a list of strings? What if you get a list in which some, but not all, of the elements are numeric?

For a number of years, I dismissed such worries. After all, dynamic languages have been around for a long time, and they have done a good job. And really, if people are having these sorts of type mismatch errors, then maybe they should be paying closer attention. Plus, if you have enough testing, you'll probably be fine.

But as Python (like other dynamic languages) has been making inroads into large companies, I've become increasingly convinced that there's something to be said for type checking. In particular, the fact that many newcomers to Python are working on large projects, in which many parts need to interoperate, has made it clear to me that some sort of type checking can be useful.

How can you balance these needs? That is, how can you enjoy Python as a dynamically typed language, while simultaneously getting some added sense of static-typing stability?

One of the most popular answers is a system known as mypy, which takes advantage of Python 3's type annotations for its own purposes. Using mypy means that you can write and run Python in the normal way, gradually adding type annotations over time and checking them with a separate tool, outside your program's execution.

In this article, I start exploring mypy and how you can use it to check for problems in your programs. I've been impressed by mypy, and I believe you're likely to see it deployed in a growing number of places, in no small part because it's optional, and thus allows developers to use it to whatever degree they deem necessary, tightening things up over time, as well.

Dynamic and Strong Typing

In Python, users enjoy not only dynamic typing, but also strong typing. "Dynamic" means that variables don't have types, but that values do. So you can say:


>>> x = 100
>>> print(type(x))
<class 'int'>

>>> x = 'abcd'
>>> print(type(x))
<class 'str'>

>>> x = [10, 20, 30]
>>> print(type(x))
<class 'list'>

As you can see, I can run the above code, and it'll work just fine. It's not particularly useful, per se, but it would never get past the compiler in a statically typed language. That's because in such languages, variables have types, meaning that if you try to assign an integer to a string variable, you'll get an error.

In a dynamic language, by contrast, variables don't have types at all. Running the type function, as I did above, doesn't actually return the variable's type, but rather the type of data to which the variable currently points.
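One way to see that the type belongs to the value rather than to the name: bind two names to the same object, and then rebind one of them. The names a and b are just for illustration:

```python
a = [10, 20, 30]
b = a              # b now refers to the very same list object
print(b is a)      # True: one object, two names

a = 'abcd'         # rebinding a doesn't touch the list
print(type(a))     # <class 'str'>
print(type(b))     # <class 'list'>: b still refers to the list
```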

Just because a language is dynamically typed doesn't mean that it's totally loosey-goosey, letting you do whatever you want. (And yes, that is the technical term.) For example, I can try this:


>>> x = 1
>>> y = '1'
>>> print(x+y)

That code will result in an error, because Python doesn't know how to add integers and strings together. It can add two integers (and get an integer result) or two strings (and get a string result), but not a combination of the two.
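Here's a sketch of that failure caught explicitly, along with the conversions Python expects you to make instead. (The exact wording of the error message can vary between Python versions, so I print whatever it says rather than assume it.)

```python
x = 1
y = '1'

try:
    print(x + y)
except TypeError as e:
    # Python refuses to guess whether you meant addition or concatenation
    print(f'TypeError: {e}')

print(x + int(y))  # 2: convert the string to an integer first
print(str(x) + y)  # 11: or convert the integer to a string and concatenate
```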

The mysum function that you saw earlier assigns 0 to the local "total" variable, and then adds each of the elements of numbers to it. This means that if numbers contains any non-numbers, you're going to be in trouble, and you'll find out only when that code actually runs. Fortunately, mypy can help you catch this kind of problem before then.
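Without any type checking, these mistakes surface only at runtime, and only on the code paths you actually exercise. A quick sketch, reusing the unannotated mysum from above with the two problematic inputs discussed in this article:

```python
def mysum(numbers):
    total = 0
    for one_number in numbers:
        total += one_number
    return total

for bad_input in ([10, 20, 'abc', 'def', 50], 'abcd'):
    try:
        mysum(bad_input)
    except TypeError as e:
        # The failure appears only when the loop reaches the first string element
        print(f'{bad_input!r}: TypeError: {e}')
```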

Type Annotations

Python 3 introduced the idea of "type annotations," and as of Python 3.6, you can annotate variables, not just function parameters and return values. To annotate a parameter, you put a colon (:) after its name, followed by a type. For example:


def hello(name:str):
    return f'Hello, {name}'

Here, I've given the name parameter a type annotation of str. If you've used a statically typed language, you might believe that this will add an element of type safety. That is, you might think that if I try to execute:


hello(5)

I will get an error. But in actuality, Python will ignore these type annotations completely. Moreover, although it's typical to use a type in an annotation, you actually can use any object you want.
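You can confirm this yourself: the calls below run without complaint, even though the arguments don't match the annotation, and even though some annotations aren't types at all. The shout function and the count variable are hypothetical examples of my own:

```python
def hello(name: str):
    return f'Hello, {name}'

print(hello(5))             # Hello, 5: the str annotation is not enforced
print(hello([10, 20, 30]))  # Hello, [10, 20, 30]

# An annotation can be any expression, not just a type
def shout(message: 'something loud') -> 42:
    return message.upper()

print(shout('hi'))          # HI

# Variable annotations (Python 3.6+) aren't enforced either
count: int = 'not a number'
print(count)                # not a number
```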

This might strike you as completely ridiculous. Why introduce such annotations, if you're never going to use them? The basic idea is that coding tools and extensions will be able to use the annotations for their own purposes, including (as you'll see in just a bit) for the purposes of type checking.

This is important, so I'll repeat and stress it: type annotations are ignored by the Python language, although it does store them in an attribute called __annotations__. For example, after defining the above hello function, you can look at its annotations, which are stored as a dictionary:


>>> hello.__annotations__
{'name': <class 'str'>}
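To see how a tool might put __annotations__ to work, here is a small decorator that checks arguments every time a function is called. This is a hypothetical illustration (enforce_annotations is my own name, not a library function), and it is not how mypy works: mypy analyzes your source code without running it, whereas this decorator adds a runtime check to each call. It handles only annotations that are plain classes, such as str or int, and skips everything else.

```python
import functools
import inspect

def enforce_annotations(func):
    """Hypothetical decorator: raise TypeError when an argument doesn't match
    a plain-class annotation. Other annotations, and *args/**kwargs
    annotations, aren't handled."""
    sig = inspect.signature(func)
    hints = func.__annotations__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only check annotations that are actual classes, like str or int
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f'{name} must be {expected.__name__}, '
                    f'not {type(value).__name__}')
        return func(*args, **kwargs)

    return wrapper

@enforce_annotations
def hello(name: str):
    return f'Hello, {name}'

print(hello('world'))  # Hello, world
try:
    hello(5)
except TypeError as e:
    print(e)           # name must be str, not int
```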

Using Mypy

The mypy type checker can be downloaded and installed with the standard Python pip package installer. On my system, in a terminal window, I ran:


$ pip3 install -U mypy

The pip3 reflects that I'm using Python 3, rather than Python 2. And the -U option indicates that I'd like to upgrade my installation of mypy, if the package has been updated since I last installed it on my computer. If you're installing this package globally and for all users, you might well need to run this as root, using sudo.

Once mypy is installed, you can run it by naming the file you want to check. For example, let's assume that hello.py looks like this:


def hello(name:str):
   return f"Hello, {name}"

print(hello('world'))
print(hello(5))
print(hello([10, 20, 30]))

If I run this program, it'll actually work fine. But I'd like to use that type annotation to ensure that I'm only invoking the function with a string argument. I can thus run, on the command line:


$ mypy ./hello.py

And I get the following output:


hello.py:7: error: Argument 1 to "hello" has incompatible type "int"; expected "str"
hello.py:8: error: Argument 1 to "hello" has incompatible type "List[int]"; expected "str"

Sure enough, mypy has identified two places in which the expectation that I've expressed with the type annotation (namely, that only strings will be passed as arguments to "hello") has been violated. This doesn't bother Python, but it should bother you: either the type annotation needs to be loosened up, or (as in this case) the code is calling the function with the wrong type of argument.

In other words, mypy won't tell you what to do, and it won't stop you from running your program. But it will report problems, and if you run it from a Git hook and/or a continuous integration and testing system, you'll have a better sense of where your program might be having problems.

Of course, mypy will check only where there are annotations. If you fail to annotate something, mypy won't be able to check it.

For example, I didn't annotate the function's return value. I can fix that, indicating that it returns a string, with:


def hello(name:str) -> str:
   return f"Hello, {name}"

Notice the syntax for annotating a return value: an arrow (->) followed by the type, placed after the closing parenthesis and before the colon that ends the def line. The annotation dictionary has now expanded too:


>>> hello.__annotations__
{'name': <class 'str'>, 'return': <class 'str'>}

And in case you're wondering what Python will do if a parameter named return conflicts with the return value's annotation...well, "return" is a reserved word and cannot be used as a parameter name, so that conflict can never arise.
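You can check that claim without crashing your session by handing the offending definition to the built-in compile() function, which raises SyntaxError just as the interpreter would when reading a source file:

```python
bad_source = 'def f(return): pass'
good_source = 'def f(returned): pass'  # a similar, legal parameter name

try:
    compile(bad_source, '<example>', 'exec')
except SyntaxError as e:
    print(f'SyntaxError: {e.msg}')

compile(good_source, '<example>', 'exec')  # compiles without complaint
```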

More Sophisticated Checking

Let's go back to the mysum function. What will (and won't) mypy be able to check? For example, assume the following file:


def mysum(numbers:list) -> int:
   output = 0
   for one_number in numbers:
       output += one_number
   return output

print(mysum([10, 20, 30, 40, 50]))
print(mysum((10, 20, 30, 40, 50)))
print(mysum([10, 20, 'abc', 'def', 50]))
print(mysum('abcd'))

As you can see, I've annotated the numbers parameter to take only lists and to indicate that the function will always return integers. And sure enough, mypy catches the problems:


mysum.py:10: error: Argument 1 to "mysum" has incompatible type "Tuple[int, int, int, int, int]"; expected "List[Any]"
mysum.py:12: error: Argument 1 to "mysum" has incompatible type "str"; expected "List[Any]"

The good news is that I've identified some problems. But in one case, I'm calling mysum with a tuple of numbers, which should be fine, but is flagged as a problem. And in another case, I'm calling it with a list of both integers and strings, but that's seen as just fine.

I'm going to need to tell mypy that I'm willing to accept not just a list, but any sequence, such as a tuple. Fortunately, Python now has a typing module that provides you with objects designed for use in such circumstances. For example, I can say:


from typing import Sequence

def mysum(numbers:Sequence) -> int:
   output = 0
   for one_number in numbers:
       output += one_number
   return output

I've imported Sequence from the typing module; it covers all of Python's sequence types, including strings, lists and tuples. Once I do that, all of the mypy problems disappear, because all of the arguments are sequences.
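The typing module's Sequence corresponds to the abstract base class of the same name in collections.abc, which you can query at runtime to see which built-in types count as sequences. This runtime check and mypy's static analysis are separate mechanisms, but for these built-in types they agree. A quick sketch:

```python
from collections.abc import Iterable, Sequence

for value in ['abcd', [10, 20], (10, 20), range(3), {10, 20}, {10: 'a'}]:
    # Sets and dicts are iterable, but they are not sequences
    print(f'{type(value).__name__:>6}: '
          f'Sequence={isinstance(value, Sequence)}, '
          f'Iterable={isinstance(value, Iterable)}')
```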

That went a bit overboard, admittedly. What I really want to say is that I'll accept any sequence whose elements are integers. I can state that by changing my function's annotations to be:


from typing import Sequence

def mysum(numbers:Sequence[int]) -> int:
   output = 0
   for one_number in numbers:
       output += one_number
   return output

Notice that I've modified the annotation to be Sequence[int]. In the wake of that change, mypy has now found lots of problems:


mysum.py:13: error: List item 2 has incompatible type "str"; expected "int"
mysum.py:13: error: List item 3 has incompatible type "str"; expected "int"
mysum.py:14: error: Argument 1 to "mysum" has incompatible type "str"; expected "Sequence[int]"

I'd call this a big success. If someone now tries to use my function with the wrong type of value, mypy will call them out on it.

But wait: do I really only want to allow for lists and tuples? What about sets, which also are iterable and can contain integers? And besides, what's this obsession with integers—shouldn't I also allow for floats?

I can solve the first problem by saying that I'll take not a Sequence[int], but an Iterable[int], meaning anything that is iterable and yields integers. In other words, I can say:


from typing import Iterable

def mysum(numbers:Iterable[int]) -> int:
   output = 0
   for one_number in numbers:
       output += one_number
   return output
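At runtime, the Iterable[int] version behaves exactly like the original; what changes is what mypy will accept. Here it is applied to a set, a dictionary with integer keys and a generator expression, none of which is a Sequence. The specific inputs are my own examples:

```python
from typing import Iterable

def mysum(numbers: Iterable[int]) -> int:
    output = 0
    for one_number in numbers:
        output += one_number
    return output

print(mysum({10, 20, 30}))             # 60
print(mysum({10: 'a', 20: 'b'}))       # 30: iterating a dict yields its keys
print(mysum(n * n for n in range(4)))  # 14: 0 + 1 + 4 + 9
```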

Finally, how can I allow for either integers or floats? I use the special Union type, which lets you combine types together in square brackets:


from typing import Iterable, Union

def mysum(numbers:Iterable[Union[int, float]]) -> Union[int,float]:
   output = 0
   for one_number in numbers:
       output += one_number
   return output

But when I run mypy against this code, it reports an error inside the function itself, regardless of how I call mysum:


mysum.py:9: error: Incompatible types in assignment (expression has type "float", variable has type "int")

What's the problem? Simply put, when I create output, I give it an integer value, so mypy infers that output is an int. Then, when the function adds a floating-point value to it, mypy complains that the result no longer fits that type. I can fix that by annotating the variable explicitly:


from typing import Iterable, Union

def mysum(numbers:Iterable[Union[int, float]]) -> Union[int,float]:
   output : Union[int,float] = 0
   for one_number in numbers:
       output += one_number
   return output
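Putting it together, here's a sketch of the final version as a complete, self-contained file, called on integers alone and on a mix of integers and floats (the sample inputs are mine):

```python
from typing import Iterable, Union

def mysum(numbers: Iterable[Union[int, float]]) -> Union[int, float]:
    # Annotate output explicitly; otherwise mypy infers int from the 0
    output: Union[int, float] = 0
    for one_number in numbers:
        output += one_number
    return output

print(mysum([10, 20, 30]))    # 60
print(mysum([10, 2.5, 0.5]))  # 13.0
```

An alternative design, relying on PEP 484's rule that an int is acceptable wherever a float is expected, would be to annotate numbers as Iterable[float], output as float and the return value as float. mypy would still accept integer inputs, although the annotation would then claim a float even when the function actually returns an int.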

Sure enough, the function is now pretty well annotated. I'm experienced enough to know that this won't catch and solve all problems, but if others on my team who want to use my function run mypy to check the types, they'll get warnings. And that's the whole point here: to catch problems before they're even close to production.

Resources

You can read more about mypy at http://mypy-lang.org. That site has documentation, tutorials and even information for people using Python 2 who want to introduce mypy via comments (rather than annotations).

Reuven Lerner teaches Python, data science and Git to companies around the world. You can subscribe to his free, weekly "better developers" e-mail list, and learn from his books and courses at http://lerner.co.il. Reuven lives with his wife and children in Modi'in, Israel.
