Using Python Comprehensions

Posted by Mike Beale

One of my favorite features of the Python language is comprehensions. They provide a terse format for creating new containers (set, list, dict, generator, etc.) of data. If you are new to Python you might have missed this powerful feature. Even if you have used list comprehensions in the past, hopefully this post will help you discover new ways to take advantage of comprehensions.

A comprehension creates a container from another container (or any iterable object). Comprehensions can take the place of for loops where the end goal is to transform the data into another format. The syntax can look confusing, but let’s break it down in a basic form and go over the parts.  

First, the syntax is as follows:

value(s) = [expression for variable in iterable]

Where:

value(s):       The new container being created
expression:     The function or value being collected
variable:       The name bound to each item in turn
iterable:       The iterable object that contains the data you want to transform

Let’s start with a list comprehension example. Suppose you have a list of daily temperatures in Celsius and want to convert them to Fahrenheit. Using a for loop the code would look similar to this:
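A minimal sketch of that loop, assuming made-up sample temperatures and the standard conversion F = 9/5 * C + 32:

```python
# Sample data: daily temperatures in Celsius (made-up values).
celsius = [-40, 0, 10, 25, 100]

# Build the Fahrenheit list one item at a time.
f = []
for c in celsius:
    f.append(float(9) / 5 * c + 32)

print(f)
```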

To translate the for loop code to a comprehension, we use f as the value, `float(9)/5 * c + 32` as the expression, c as the variable, and ‘celsius’ as the iterable.
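Put together, the comprehension could look like the sketch below. The sample temperatures are made up, and the expression is the standard conversion F = 9/5 * C + 32:

```python
# Sample data: daily temperatures in Celsius (made-up values).
celsius = [-40, 0, 10, 25, 100]

# value = [expression for variable in iterable]
f = [float(9) / 5 * c + 32 for c in celsius]

print(f)
```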

The comprehension decreases the lines of code to one while maintaining readability and increasing performance (we’ll get to the performance later).

There’s another part to the comprehension syntax which I left off to make the introduction less confusing: filters.

The filter is entirely optional. It is an if condition placed after the iterable, and it controls which items are evaluated by the expression. For example, if there was a need to only convert the Celsius values above 0, the comprehension could be modified like this:
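A minimal sketch of the filtered form, again with made-up sample temperatures:

```python
# Sample data: daily temperatures in Celsius (made-up values).
celsius = [-40, 0, 10, 25, 100]

# value = [expression for variable in iterable if condition]
f = [float(9) / 5 * c + 32 for c in celsius if c > 0]

print(len(f))  # 3 values are above 0
```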

The for loop code would look like:
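One way to write the equivalent loop, assuming the same made-up data and the "above 0" condition:

```python
# Sample data: daily temperatures in Celsius (made-up values).
celsius = [-40, 0, 10, 25, 100]

f = []
for c in celsius:
    # Only values above 0 are converted and kept.
    if c > 0:
        f.append(float(9) / 5 * c + 32)

print(len(f))  # 3 values are above 0
```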

A more advanced comprehension would be to use nested comprehensions. The example below builds a 3 X 10 table.
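As an illustration, the sketch below assumes the table has 3 rows of 10 columns and fills it with a multiplication table. The cell contents are an assumption made for the example:

```python
# Outer comprehension builds the rows, inner comprehension builds each row's cells.
table = [[row * col for col in range(1, 11)] for row in range(1, 4)]

for line in table:
    print(line)
```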

The corresponding for loop would be:
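A sketch of the loop version, under the same assumption of a 3-row, 10-column multiplication table:

```python
table = []
for row in range(1, 4):
    cells = []
    for col in range(1, 11):
        cells.append(row * col)
    table.append(cells)

for line in table:
    print(line)
```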

You can start to see how list comprehensions can make code more compact while keeping it readable (IMHO). I’ll leave some more examples at the end of the post for inspiration.

Different Types

We went over list comprehensions, but there’s also support for set and dict comprehensions.

Set comprehension:
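A small sketch with a made-up word list. Curly braces produce a set, so duplicate results collapse into one entry:

```python
words = ["apple", "kiwi", "plum", "fig", "pear"]

# Collect the distinct word lengths.
lengths = {len(word) for word in words}

print(sorted(lengths))  # [3, 4, 5]
```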

Dict comprehension:
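A small sketch that maps each made-up Celsius value to its Fahrenheit equivalent. The `key: value` pair before `for` is what makes it a dict comprehension:

```python
celsius = [-40, 0, 10, 25, 100]

# Key is the Celsius reading, value is the converted temperature.
conversions = {c: float(9) / 5 * c + 32 for c in celsius}

print(conversions[0])
```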

There are generator comprehensions as well (the Python documentation calls them generator expressions). A generator differs from the other comprehensions in that it produces items one at a time, on demand, instead of building the entire container up front. This can save on memory requirements.
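A minimal sketch with made-up data. Parentheses instead of square brackets give a generator, and nothing is computed until a value is requested:

```python
celsius = [-40, 0, 10, 25, 100]

# No conversions happen on this line; the generator is only set up.
fahrenheit_gen = (float(9) / 5 * c + 32 for c in celsius)

first = next(fahrenheit_gen)      # computes only the first value
remaining = list(fahrenheit_gen)  # consumes the rest, one item at a time

print(first, len(remaining))
```

Once consumed, a generator is exhausted, so it suits single-pass work such as feeding `sum()` or a loop.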

Formatting

I’m not going to spend much time on the formatting of comprehensions because it’s usually a matter of personal (or team) preference. Typically, comprehensions live on a single line but there are occasions where breaking it up into multiple lines makes sense based on the Python linter being used or the team’s internal code style guidelines.
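For illustration, here is the same comprehension on one line and split across several. The layout is one common convention, not a rule, and the data is made up:

```python
celsius = [-40, 0, 10, 25, 100]

# Single line.
one_line = [float(9) / 5 * c + 32 for c in celsius if c > 0]

# One clause per line.
multi_line = [
    float(9) / 5 * c + 32  # expression
    for c in celsius       # variable and iterable
    if c > 0               # optional filter
]

print(one_line == multi_line)  # True
```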

Performance

Why use comprehensions instead of for loops or the built-in map function? Since the end results are the same, it really depends on which method performs best and under which circumstances. To show you what I mean, let’s revisit the Celsius to Fahrenheit example with code samples using for loops, list comprehensions, and map.

Most of the time when I use comprehensions, I use an inline transformation and don’t use a function. For this comparison, though, we’ll define a transformation function so that all three methods are constrained to calling the same function.
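Such a function might look like the sketch below. The name `celsius_to_fahrenheit` is hypothetical:

```python
def celsius_to_fahrenheit(c):
    """Convert one Celsius value to Fahrenheit."""
    return float(9) / 5 * c + 32

print(celsius_to_fahrenheit(100))
```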

Now we need the three different methods to transform the data.
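A sketch of the three approaches, each calling the same transformation function. All function names here are hypothetical:

```python
def celsius_to_fahrenheit(c):
    return float(9) / 5 * c + 32

def use_for_loop(celsius):
    result = []
    for c in celsius:
        result.append(celsius_to_fahrenheit(c))
    return result

def use_comprehension(celsius):
    return [celsius_to_fahrenheit(c) for c in celsius]

def use_map(celsius):
    # map is lazy in Python 3, so wrap it in list() to materialize the results.
    return list(map(celsius_to_fahrenheit, celsius))

sample = [-40, 0, 10, 25, 100]
print(use_for_loop(sample) == use_comprehension(sample) == use_map(sample))  # True
```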

The code will be run in a loop with increasing numbers of Celsius values to see what effect the input size has on each method. The code was run on Colaboratory using version 3.6 of Python.
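A benchmark of this kind could be sketched with the standard `timeit` module, as below. The input sizes, repeat count, random data, and function names are all assumptions, and the timings will vary by machine and Python version:

```python
import random
import timeit

def celsius_to_fahrenheit(c):
    return float(9) / 5 * c + 32

def use_for_loop(celsius):
    result = []
    for c in celsius:
        result.append(celsius_to_fahrenheit(c))
    return result

def use_comprehension(celsius):
    return [celsius_to_fahrenheit(c) for c in celsius]

def use_map(celsius):
    return list(map(celsius_to_fahrenheit, celsius))

timings = {}
for size in (100, 1_000, 10_000):
    # Random dummy data; real data may behave differently.
    celsius = [random.uniform(-50, 50) for _ in range(size)]
    for method in (use_for_loop, use_comprehension, use_map):
        # Total seconds for 20 calls of the method on this input.
        seconds = timeit.timeit(lambda: method(celsius), number=20)
        timings[(method.__name__, size)] = seconds
        print(f"{method.__name__:>18} n={size:>6}: {seconds:.5f}s")
```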

The purple line shows the performance of for loops, the green line shows the performance of list comprehensions, and the blue line shows the performance of map. It is pretty clear that for loops perform worse than list comprehensions and map.

We can use the `dis` module to inspect the code and see there is a key difference in the generated bytecode that can be used to explain the difference between for loops and comprehensions.

On every iteration the list comprehension runs an optimized loop built around the `LIST_APPEND` opcode, whereas the for loop has to load the append method (`LOAD_METHOD`) and then call that method (`CALL_METHOD`). The exact opcode names vary between Python versions.
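One way to check this yourself is sketched below. The function names and the `opcode_names` helper are hypothetical, and the full listings printed at the end will differ between Python versions:

```python
import dis

def with_for_loop(celsius):
    result = []
    for c in celsius:
        result.append(float(9) / 5 * c + 32)
    return result

def with_comprehension(celsius):
    return [float(9) / 5 * c + 32 for c in celsius]

def opcode_names(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        # Some Python versions compile the comprehension body as a nested code object.
        if hasattr(const, "co_code"):
            names |= opcode_names(const)
    return names

loop_ops = opcode_names(with_for_loop.__code__)
comp_ops = opcode_names(with_comprehension.__code__)

print("LIST_APPEND in for loop:     ", "LIST_APPEND" in loop_ops)
print("LIST_APPEND in comprehension:", "LIST_APPEND" in comp_ops)

# Full bytecode listings, for reading side by side.
dis.dis(with_for_loop)
dis.dis(with_comprehension)
```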

Surprisingly, map outperforms comprehensions. So why am I touting list comprehensions if the performance of map is better? The built-in map is optimized for calling a function on each item, so let’s see how the inline transformation for list comprehensions performs.

Let’s check the performance by adding two more methods:

  • Inline transformation in the list comprehension
    • [float(9)/5 * c + 32 for c in celsius]
  • Map with a lambda function instead of a defined function
    • list(map(lambda x: float(9)/5 * x + 32, celsius))

With the inline transformation, list comprehensions (red line) pull ahead of map (blue line) in terms of performance.

List comprehensions have another advantage: inline filtering. Map has no filter clause of its own, so we can’t compare the two fairly on that point.
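To make the difference concrete, the sketch below filters and transforms in one comprehension, then gets the same result by pairing map with the built-in filter function. The data is made up:

```python
celsius = [-40, 0, 10, 25, 100]

# Filter and transform in a single comprehension.
with_comprehension = [float(9) / 5 * c + 32 for c in celsius if c > 0]

# map alone cannot skip items; it needs a separate filter() step.
with_map = list(
    map(lambda c: float(9) / 5 * c + 32, filter(lambda c: c > 0, celsius))
)

print(with_comprehension == with_map)  # True
```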

Based on those results, which method should we choose?  Here’s a set of guidelines to start with.

Use Maps when:

  • You have an existing transform function to map against
  • The transform function is large and/or complicated
  • You don’t need to filter
  • You need lazy evaluation

Use List Comprehensions when:

  • You need to filter and transform the data
  • You are saving the results
  • The transformation code is small

Use For Loops when:

  • You don’t need to keep the resulting data
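The guidelines above could be illustrated roughly as follows. This is a sketch with made-up readings, not a benchmark:

```python
readings = ["21.5", "-3", "0", "18"]

# map: an existing function (float), no filtering, evaluated lazily.
as_floats = map(float, readings)

# List comprehension: filter and transform, and keep the results.
above_zero = [float(r) for r in readings if float(r) > 0]

# for loop: run only for its side effect; nothing is collected.
for r in readings:
    print("reading:", r)

print(above_zero)  # [21.5, 18.0]
```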

These are sweeping statements that are by no means hard rules, but hopefully this offers some general guidelines to get started. I’m sure there are many cases where these guidelines won’t be applicable, but you can use tools like timeit and the dis module to run some tests on your own code to make the best choice for your use case.

A word of warning: code often performs differently on production data than on the randomized/dummy data created for benchmarking.

Additional Examples

As mentioned earlier, I have a collection of comprehension samples. This isn’t meant to be an exhaustive list, but rather a source of inspiration.

Nested comprehension:
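One possible illustration with a made-up matrix: flattening it with two `for` clauses, and transposing it with a comprehension inside a comprehension.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Two for clauses read left to right, like nested loops.
flat = [value for row in matrix for value in row]

# A comprehension used as the expression of another comprehension.
transposed = [[row[i] for row in matrix] for i in range(3)]

print(flat)        # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(transposed)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```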

Create two lists:
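One reading of this heading is splitting a single sequence into two lists. The sketch below assumes that interpretation and uses made-up numbers:

```python
numbers = range(10)

# Two comprehensions, each with its own filter, unpacked in one statement.
evens, odds = (
    [n for n in numbers if n % 2 == 0],
    [n for n in numbers if n % 2 != 0],
)

print(evens)  # [0, 2, 4, 6, 8]
print(odds)   # [1, 3, 5, 7, 9]
```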

Create key/value:
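A possible sketch that pairs two made-up lists into a dict:

```python
keys = ["host", "port", "debug"]
values = ["localhost", 8080, False]

# zip pairs the items up; the comprehension turns each pair into key: value.
config = {key: value for key, value in zip(keys, values)}

print(config)  # {'host': 'localhost', 'port': 8080, 'debug': False}
```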

String comprehension:
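Python has no dedicated string comprehension, but a string is iterable, so a comprehension can walk its characters and `str.join` can rebuild a string. A sketch with made-up text:

```python
text = "Hello, World!"

# Keep only the letters, then join them back into a string.
letters = "".join(ch for ch in text if ch.isalpha())

# Transform each character into a list item.
shouted = [ch.upper() for ch in "abc"]

print(letters)  # HelloWorld
print(shouted)  # ['A', 'B', 'C']
```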

Create a data table:
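One possible shape for a data table is a list of dicts, one per row. The headers and rows below are made up:

```python
headers = ["city", "celsius"]
rows = [("Austin", 30), ("Oslo", 5), ("Cairo", 35)]

# A dict comprehension per row, nested inside a list comprehension.
table = [{header: value for header, value in zip(headers, row)} for row in rows]

for record in table:
    print(record)
```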

Multiple filters:
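A sketch with two `if` clauses in a row. An item is kept only when every condition is true, the same as joining them with `and`:

```python
# Numbers below 30 that are divisible by both 2 and 3.
chained = [n for n in range(30) if n % 2 == 0 if n % 3 == 0]

# Equivalent form with a single combined condition.
combined = [n for n in range(30) if n % 2 == 0 and n % 3 == 0]

print(chained)  # [0, 6, 12, 18, 24]
```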

Unique set of words in a sentence:
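A sketch using a made-up sentence. Lowercasing each word first means "The" and "the" count as the same word:

```python
sentence = "The quick brown fox jumps over the lazy dog"

# A set comprehension drops the duplicate "the".
unique_words = {word.lower() for word in sentence.split()}

print(len(unique_words))  # 8
```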

Hopefully this quick post gives you some insight into how comprehensions work and when they might be useful. Along with potential performance gains, they’re more readable and easier to grok than nested for loops, at least in my opinion.

If you would like to go further down the rabbit hole, check out Python generators, async generators, and async comprehensions.

If you have any questions or concerns, feel free to reach out to me directly at mbeale@solarwinds.com.

© 2019 SolarWinds Worldwide, LLC. All rights reserved.