# Python Profiling – cProfile and line_profiler Tools (Part 2)

## cProfile

The Python standard library includes the profile tool by default; however, it additionally includes cProfile, which is an optimization of profile written in C.

A good engineering practice, prior to profiling, is to generate a hypothesis about the speed and performance of the components of the application under evaluation. This is useful, as it helps us identify potential bottlenecks, and allows us to support our findings with evidence.

We are going to use the same sorting algorithm that we used in the previous post (Time Profiling), and cProfile to observe the performance:

import numpy as np

def bubble_sorting(array):
n = len(array)

for i in range(n):
sorted_item = True

for j in range(n - i - 1):
if array[j] > array[j + 1]:
array[j], array[j + 1] = array[j + 1], array[j]
sorted_item = False

if sorted_item:
break

return array

work_array = np.random.randint(1, 100000, size=10000)
bubble_sorting(work_array)

alexis@MacBook-Pro % python -m cProfile -s cumulative main.py
74777 function calls (72805 primitive calls) in 10.061 seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
110/1    0.000    0.000   10.062   10.062 {built-in method builtins.exec}
1    0.000    0.000   10.061   10.061 main.py:1(<module>)
1    9.959    9.959    9.959    9.959 main.py:4(bubble_sorting)
13    0.000    0.000    0.213    0.016 __init__.py:1(<module>)
149/1    0.000    0.000    0.102    0.102 <frozen importlib._bootstrap>:1002(_find_and_load)
149/1    0.000    0.000    0.102    0.102 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
138/1    0.000    0.000    0.102    0.102 <frozen importlib._bootstrap>:659(_load_unlocked)
109/1    0.000    0.000    0.102    0.102 <frozen importlib._bootstrap_external>:844(exec_module)
216/1    0.000    0.000    0.102    0.102 <frozen importlib._bootstrap>:220(_call_with_frames_removed)
179/15    0.000    0.000    0.095    0.006 <frozen importlib._bootstrap>:1033(_handle_fromlist)
362/8    0.000    0.000    0.095    0.012 {built-in method builtins.__import__}
138/135    0.000    0.000    0.042    0.000 <frozen importlib._bootstrap>:558(module_from_spec)
28/26    0.000    0.000    0.040    0.002 <frozen importlib._bootstrap_external>:1171(create_module)
28/26    0.039    0.001    0.040    0.002 {built-in method _imp.create_dynamic}
109    0.000    0.000    0.027    0.000 <frozen importlib._bootstrap_external>:916(get_code)
1    0.000    0.000    0.023    0.023 _pickle.py:1(<module>)
1    0.000    0.000    0.022    0.022 multiarray.py:1(<module>)
1    0.000    0.000    0.022    0.022 overrides.py:1(<module>)
...


The -s cumulative parameter sorts function calls by elapsed execution time, and gives us an idea of where the longest execution time occurred. We can see that the total execution time is similar to the times obtained when we measured just the elapsed execution time, which gives us a good indication of the optimization level of cProfile.

Line [9] tells us that the bubble_sorting method is called only once, and between lines [11] and [20] we see that most calls occur where the sort is performed, moving and freezing data in memory. These metrics can give us a read on where most of the processing of the function is done, since figuring out what’s happening line by line is very tricky with cProfile.

These results can be exported to a binary file for further analysis, using the Python pstats library; these statistics basically contain the information obtained by the terminal:

alexis@MacBook-Pro % python -m cProfile -o statistics.stats main.py

import pstats

# Displays the results, sorted by cumulative elapsed time
p = pstats.Stats('statistics.stats')
p.sort_stats('cumulative')
p.print_stats()


We can display the information of the calling functions, using the p.print_callers() method:

Ordered by: cumulative time

Function                                                  was called by...
ncalls  tottime  cumtime
{built-in method builtins.exec}                    <-   109/1    0.000    0.116  <frozen importlib._bootstrap>:220(_call_with_frames_removed)
main.py:1(<module>)                                <-       1    0.000   10.023  {built-in method builtins.exec}
main.py:4(bubble_sorting)                          <-       1    9.905    9.905  main.py:1(<module>)
<frozen importlib._bootstrap>:1002(_find_and_load) <-       1    0.000    0.002  /<base_dir>/base64.py:3(<module>)
1    0.000    0.000  /<base_dir>/bisect.py:1(<module>)
2    0.000    0.008  /<base_dir>/ctypes/__init__.py:1(<module>)
2    0.000    0.001  /<base_dir>/datetime.py:1(<module>)
1    0.000    0.001  /<base_dir>/hashlib.py:82(__get_builtin_constructor)
2    0.000    0.007  /<base_dir>/hmac.py:1(<module>)
2    0.000    0.003  /<base_dir>/json/__init__.py:1(<module>)
1    0.000    0.001  /<base_dir>/json/scanner.py:1(<module>)
4    0.000    0.000  /<base_dir>/ntpath.py:2(<module>)
3    0.000    0.002  /<base_dir>/pathlib.py:1(<module>)
4    0.000    0.002  /<base_dir>/pickle.py:1(<module>)
1    0.000    0.005  /<base_dir>/platform.py:3(<module>)
3    0.000    0.002  /<base_dir>/random.py:1(<module>)
3    0.000    0.011  /<base_dir>/secrets.py:1(<module>)
2    0.000    0.001  /<base_dir>/site-packages/numpy/__init__.py:1(<module>)
...


Or the p.print_callees() option to show which functions call for other functions:

Ordered by: cumulative time

Function                                  called...
ncalls  tottime  cumtime
{built-in method builtins.exec}  ->       1    0.000    0.000  /<base_dir>/__future__.py:1(<module>)
1    0.000    0.000  /<base_dir>/_compat_pickle.py:9(<module>)
...
1    0.000    0.000  /<base_dir>/urllib/parse.py:1(<module>)
1    0.000   10.023  main.py:1(<module>)
main.py:1(<module>)              ->       1    0.000    0.117  <frozen importlib._bootstrap>:1002(_find_and_load)
1    9.905    9.905  main.py:4(bubble_sorting)
1    0.000    0.001  {method 'randint' of 'numpy.random.mtrand.RandomState' objects}
main.py:4(bubble_sorting)        ->       1    0.000    0.000  {built-in method builtins.len}
<frozen importlib._bootstrap>:1002(_find_and_load)  ->     149    0.000    0.000  <frozen importlib._bootstrap>:152(__init__)
149    0.000    0.001  <frozen importlib._bootstrap>:156(__enter__)
149    0.000    0.000  <frozen importlib._bootstrap>:160(__exit__)
148    0.000    0.000  <frozen importlib._bootstrap>:185(cb)
149    0.000    0.000  {method 'get' of 'dict' objects}
<frozen importlib._bootstrap>:967(_find_and_load_unlocked)  ->     4/1    0.000    0.007  <frozen importlib._bootstrap>:220(_call_with_frames_removed)
...


The information provided by cProfile is extensive and a complete visualization of the information will be necessary in order to obtain the greatest benefit from the tool. Something that can help us in the visualization is to use the SnakeViz module, which diagrams the information obtained by cProfile.

## SnakeViz, for cProfile insights

According to the description found on the website:

SnakeViz is a browser based graphical viewer for the output of Python’s cProfile module and an alternative to using the standard library pstats module…

SnakeViz Website

Using the statistics file obtained with cProfile, we can visualize the results of the profiling:

snakeviz statistics.stats


If we choose the Sunburst chart style, I find it clearer that most of the processing is done from the main call to bubble_sorting (which is obvious, considering the simplicity of the test case):

These graphs are useful to communicate to others about the results of the execution of the module under test, and it helps to visually identify which part of the process consumes the most processing.

## Line-by-line Profiling

line_profiler is a great tool for identifying causes of poor code performance as it evaluates Python functions line by line, and this is where using cProfile should lead us: to evaluate sections of problematic code and that will be the ideal point to use line_profiler.

TIP📝 -> It is important to keep track of code changes and profiling results as some changes may cause drawbacks in other sections.

line_profiler includes two utilities:

• The @profile decorator, which indicates the functions to be profiled.
• kernprof is responsible for executing and recording the evaluation metrics and statistics.

Optionally, kernprof has the -v (verbose) parameter which, when included, displays the results on termination; and when omitted it generates a file with extension .lprof that can be analyzed later using line_profiler.

For example:

import numpy as np

# WARNING DEVS: this decorator doesn't need to import anything
# this is intended for kernprof usage only
# be prepared with some coffee to wait until the profiling finishes
@profile
def bubble_sorting(array):
n = len(array)

for i in range(n):
sorted_item = True

for j in range(n - i - 1):
if array[j] > array[j + 1]:
array[j], array[j + 1] = array[j + 1], array[j]
sorted_item = False

if sorted_item:
break

return array

work_array = np.random.randint(1, 100000, size=10000)
bubble_sorting(work_array)

alexis@MacBook-Pro % kernprof -l -v main.py
Wrote profile results to main.py.lprof
Timer unit: 1e-06 s

Total time: 68.9938 s
File: main.py
Function: bubble_sorting at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
4                                           @profile
5                                           def bubble_sorting(array):
6         1          4.7      4.7      0.0      n = len(array)
7
8      9863       3127.6      0.3      0.0      for i in range(n):
9      9863       3076.8      0.3      0.0          sorted_item = True
10
11  49995547   15022385.0      0.3     21.8          for j in range(n - i - 1):
12  49985684   27385581.9      0.5     39.7              if array[j] > array[j + 1]:
13  24989733   18838493.1      0.8     27.3                  array[j], array[j + 1] = array[j + 1], array[j]
14  24989733    7737957.2      0.3     11.2                  sorted_item = False
15
16      9863       3134.5      0.3      0.0          if sorted_item:
17         1          0.3      0.3      0.0              break
18
19         1          6.1      6.1      0.0      return array


This is much more descriptive than the results obtained from cProfile, but the idea is to use cProfile to identify which functions you should evaluate with line_profiler. Additionally, we can learn about the cost of operations and improve our programming skills.

Perhaps you were able to see the comment close the bubble_sorting method definition:

# WARNING DEVS: this decorator doesn't need to import anything
# this is intended for kernprof usage only
# be prepared with some coffee to wait until the profiling finishes


Adding the @profile decorator will work when we are using line_profiler (or memory_profiler), but if you try to run, for example, unit tests, it will fail with the following message:

Traceback (most recent call last):
File "/Users/alexis/PythonProfiling/main.py", line 3, in <module>
@profile
NameError: name 'profile' is not defined

Process finished with exit code 1


You don’t want broken code into your Git repository, so you can add the following code to avoid the error message on runtime:

import numpy as np

if 'line_profiler' not in dir() and 'profile' not in dir():
def profile(func):
return func

@profile
def bubble_sorting(array):
n = len(array)

for i in range(n):
sorted_item = True

for j in range(n - i - 1):
if array[j] > array[j + 1]:
array[j], array[j + 1] = array[j + 1], array[j]
sorted_item = False

if sorted_item:
break

return array

work_array = np.random.randint(1, 100000, size=1000)
bubble_sorting(work_array)



## Other useful tools

### gprof2dot

Another useful tool is gprof2dot, which allows you to view the statistics obtained with cProfile. Requires prior installation of the Graphviz app. gprof2dot contains the dot command, and it is used as follows:

alexis@MacBook-Pro % gprof2dot -f pstats statistics.stats | dot -Tpng -o output.png


This pipeline transfers the analysis performed by gprof2dot to the Graphviz dot command, and generates the following graph:

### Pyinstrument

This tool complements cProfile by adding some missing functionalities like more visualization options and integration with web frameworks like Django, Flask or FastAPI, which I consider is a great help for profiling Web applications and REST APIs.

The most basic way to use Pyinstrument, is just calling the command, followed by the script to be evaluated:

alexis@MacBook-Pro % pyinstrument main.py

_     ._   __/__   _ _  _  _ _/_   Recorded: 07:50:33  Samples:  73
/_//_/// /_\ / //_// / //_'/ //     Duration: 9.885     CPU time: 10.188
/   _/                      v4.2.0

Program: main.py

9.885 <module>  <string>:1
[4 frames hidden]  <string>, runpy
9.885 _run_code  runpy.py:64
└─ 9.885 <module>  main.py:1
├─ 9.772 bubble_sorting  main.py:4
└─ 0.112 <module>  numpy/__init__.py:1
[193 frames hidden]  numpy, pickle, <built-in>, pathlib, a...



For more details on how to integrate Pyinstrument into the related frameworks, we can click the following links:

## Conclusion

cProfile and line_profiler offer us a great detail of information about the execution of the sections of code that we want to evaluate. On our part, it is important to understand the methodology to obtain the best results of using these tools.

In each execution cycle and code modification we can formulate hypotheses about what we expect from the change, and evaluate its performance. Logging the results will be important to understand which code changes positively impacted performance.

Additionaly, cProfile is the default option in PyCharm, which makes a kind of “default” option for many developers.

## Appendix

### Install with pip or conda

# PIP
pip install snakeviz
pip install line_profiler[ipython]
pip install gprof2dot
pip install pyinstrument

# Conda
conda install -c anaconda snakeviz
conda install -c conda-forge line_profiler
conda install -c conda-forge gprof2dot
conda install -c conda-forge pyinstrument


If you think this content was helpful, consider buy me a cup of coffee. 😉 ☕👉

Share