Python Profiling – Memory Profiling (Part 3, Final)

memory_profiler
PySpy
DISassembling
Final Recommendations

memory_profiler

Similar to line_profiler, memory_profiler provides detailed memory usage measurements, with the aim of efficiently reducing memory consumption and optimizing memory usage to improve application performance..

⚠️ Before starting using this tool, it is important to mention the impact on the execution speed of the code while it is in use, since the execution time increases dozens of times, it is convenient to profiling time and CPU beforehand, and use line_profiler only on sections identified to be optimized.

As with line_profiler, the @profile decorator indicates the method to be evaluated.

However, analyzing the memory usage in Python is complicated, even if we consider that the garbage collector is freeing memory, we do not know for sure when a section of memory is freed and when an object is going to be released, or how long it will remain in the pool until it is finally disposed. Many times, it is convenient to handle methods that consume more memory, and not repeatedly allocate memory because it is an expensive process, and it is more efficient than analyzing the memory usage of the code line by line.

Using the sorting algorithm as an experiment, we perform the following exercise:

import numpy as np

@profile
def bubble_sorting(array):
    n = len(array)

    for i in range(n):
        sorted_item = True

        for j in range(n - i - 1):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
                sorted_item = False

        if sorted_item:
            break

    return array


work_array = np.random.randint(1, 100000, size=10000)
bubble_sorting(work_array)

In the following instruction, I use gtime -p along with memory_profiler to measure the execution time, but it is not required.

alexis@MacBook-Pro % gtime -p python -m memory_profiler main.py

Filename: main.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     3   56.859 MiB   56.859 MiB           1   @profile
     4                                         def bubble_sorting(array):
     5   56.859 MiB    0.000 MiB           1       n = len(array)
     6                                         
     7   56.906 MiB -254060.531 MiB        9909       for i in range(n):
     8   56.906 MiB -254060.531 MiB        9909           sorted_item = True
     9                                         
    10   56.906 MiB -874942713.984 MiB    50000814           for j in range(n - i - 1):
    11   56.906 MiB -874688626.062 MiB    49990905               if array[j] > array[j + 1]:
    12   56.906 MiB -287907579.547 MiB    25155548                   array[j], array[j + 1] = array[j + 1], array[j]
    13   56.906 MiB -287907586.656 MiB    25155548                   sorted_item = False
    14                                         
    15   56.906 MiB -254101.422 MiB        9909           if sorted_item:
    16   16.016 MiB  -40.891 MiB           1               break
    17                                         
    18   16.016 MiB    0.000 MiB           1       return array

real 1855.21
user 1046.50
sys 808.82

There are two important things to note here: first, the output of memory_profiler is very detailed line by line, which is what we are looking for when using this tool; and, second, the elapsed time is approximately 30 minutes, which is a considerable time compared to the original time (~10 seconds). These metrics should give us an indication of how and when to use this tool.

memory_profiler includes the mprof command (requires matplotlib to be installed), which allows you to visualize changes in memory usage, for example:

alexis@MacBook-Pro PythonProfiling % mprof run main.py

mprof: Sampling memory every 0.1s
running new process
running as a Python program...
alexis@MacBook-Pro PythonProfiling % mprof plot                             
Using last profile data.

PySpy

PySpy is a great tool that helps us to evaluate processes that are already running, without the need to stop and configure them beforehand. For this reason, it requires additional privileges to evaluate other processes. It is especially useful for running experiments in environments such as Production or UAT.

The py-spy package must be installed with PIP or Conda, depending on the local Python installation, and the command it includes is py-spy. To evaluate a running process it is important to obtain the Process Identifier (PID), for example:

Linux/MacOS
Windows

Linux/MacOS

alexis@MacBook-Pro % ps -A

  PID TTY           TIME CMD
...
36588 ??         0:51.08 /Users/alexis/miniforge3/bin/python /Users/alexis/PythonProfiling/main.py
...

Windows

C:\Users\alexis>tasklist

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
...
python3.10.exe               18212 Console                    1     47,460 K
...

PySpy has 4 subcommands:

USAGE:
    py-spy <SUBCOMMAND>

OPTIONS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    record    Records stack trace information to a flamegraph, speedscope or raw file
    top       Displays a top like view of functions consuming CPU
    dump      Dumps stack traces for a target program to stdout
    help      Prints this message or the help of the given subcommand(s)

Example of subcommand dump:

Linux/MacOS
Windows

Linux/MacOS

alexis@MacBook-Pro % sudo env "PATH=$PATH" py-spy dump --pid 36588

Process 36588: /Users/alexis/miniforge3/bin/python /Users/alexis/PythonProfiling/main.py
Python v3.9.13 (/Users/alexis/miniforge3/bin/python3.9)

Thread 0x100FDC580 (active+gil): "MainThread"
    bubble_sorting (main.py:11)
    <module> (main.py:22)

Windows

C:\Users\alexi>py-spy dump --pid 18212
Process 18212: python  main.py
Python v3.9.12 (C:\Users\alexi\anaconda3\python.exe)

Thread 2168 (active+gil): "MainThread"
    bubble_sorting (main.py:12)
    <module> (main.py:21)

Example of subcommand top, it refreshes the information once per second:

Linux/MacOS
Windows

Linux/MacOS

alexis@MacBook-Pro % sudo env "PATH=$PATH" py-spy top --pid 36588

Collecting samples from '/Users/alexis/miniforge3/bin/python /Users/alexis/PythonProfiling/main.py' (python v3.9.13)
Total Samples 1200
GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                                                                                   
 38.00%  38.00%    5.86s     5.86s   bubble_sorting (main.py:12)
 52.00%  52.00%    5.65s     5.65s   bubble_sorting (main.py:11)
  9.00%   9.00%   0.420s    0.420s   bubble_sorting (main.py:10)
  1.00%   1.00%   0.070s    0.070s   bubble_sorting (main.py:13)
  0.00% 100.00%   0.000s    12.00s   <module> (main.py:22)

Windows

C:\Users\alexi>py-spy top --pid 18212

Collecting samples from 'python  main.py' (python v3.9.12)
Total Samples 400
GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
 58.00%  58.00%    2.27s     2.27s   bubble_sorting (main.py:12)
 36.00%  36.00%    1.51s     1.51s   bubble_sorting (main.py:11)
  5.00%   5.00%   0.160s    0.160s   bubble_sorting (main.py:10)
  1.00%   1.00%   0.060s    0.060s   bubble_sorting (main.py:13)
  0.00% 100.00%   0.000s     4.00s   <module> (main.py:21)

Example of subcommand record, it generates an image file with extension .svg summarizing the experiment:

Linux/MacOS
Windows

Linux/MacOS

alexis@MacBook-Pro % sudo env "PATH=$PATH" py-spy record --pid 36588
py-spy> Sampling process 100 times a second. Press Control-C to exit.

^C
py-spy> Stopped sampling because Control-C pressed
py-spy> Wrote flamegraph data to '36588-2022-08-06T08:00:04-05:00.svg'. Samples: 24397 Errors: 0

Windows

C:\Users\alexi>py-spy record --pid 18212 --output profile.svg
py-spy> Sampling process 100 times a second. Press Control-C to exit.


py-spy> Stopped sampling because Control-C pressed
py-spy> Wrote flamegraph data to 'profile.svg'. Samples: 603 Errors: 0

DISassembling

Understanding how the executing bytecode behaves, beyond the Python code that we have analyzed, can gives us an idea of how things work internally and why some pieces of code work better than others, and the dis module is the tool that includes the default distribution of Python that we have for this purpose.

In [1]: import dis

In [2]: import main

In [3]: dis.dis(main.bubble_sorting)
  5           0 LOAD_GLOBAL              0 (len)
              2 LOAD_FAST                0 (array)
              4 CALL_FUNCTION            1
              6 STORE_FAST               1 (n)

  7           8 LOAD_GLOBAL              1 (range)
             10 LOAD_FAST                1 (n)
             12 CALL_FUNCTION            1
             14 GET_ITER
        >>   16 FOR_ITER                96 (to 114)
             18 STORE_FAST               2 (i)

  8          20 LOAD_CONST               1 (True)
             22 STORE_FAST               3 (sorted_item)

 10          24 LOAD_GLOBAL              1 (range)
             26 LOAD_FAST                1 (n)
             28 LOAD_FAST                2 (i)
             30 BINARY_SUBTRACT
             32 LOAD_CONST               2 (1)
             34 BINARY_SUBTRACT
             36 CALL_FUNCTION            1
             38 GET_ITER
        >>   40 FOR_ITER                62 (to 104)
             42 STORE_FAST               4 (j)

 11          44 LOAD_FAST                0 (array)
             46 LOAD_FAST                4 (j)
             48 BINARY_SUBSCR
             50 LOAD_FAST                0 (array)
             52 LOAD_FAST                4 (j)
             54 LOAD_CONST               2 (1)
             56 BINARY_ADD
             58 BINARY_SUBSCR
             60 COMPARE_OP               4 (>)
             62 POP_JUMP_IF_FALSE       40

 12          64 LOAD_FAST                0 (array)
             66 LOAD_FAST                4 (j)
             68 LOAD_CONST               2 (1)
             70 BINARY_ADD
             72 BINARY_SUBSCR
             74 LOAD_FAST                0 (array)
             76 LOAD_FAST                4 (j)
             78 BINARY_SUBSCR
             80 ROT_TWO
             82 LOAD_FAST                0 (array)
             84 LOAD_FAST                4 (j)
             86 STORE_SUBSCR
             88 LOAD_FAST                0 (array)
             90 LOAD_FAST                4 (j)
             92 LOAD_CONST               2 (1)
             94 BINARY_ADD
             96 STORE_SUBSCR

 13          98 LOAD_CONST               3 (False)
            100 STORE_FAST               3 (sorted_item)
            102 JUMP_ABSOLUTE           40

 15     >>  104 LOAD_FAST                3 (sorted_item)
            106 POP_JUMP_IF_FALSE       16

 16         108 POP_TOP
            110 JUMP_ABSOLUTE          114
            112 JUMP_ABSOLUTE           16

 18     >>  114 LOAD_FAST                0 (array)
            116 RETURN_VALUE

Clearly, this output is the disassembly of the code being executed. When interpreting it we can observe the following details:

The first column corresponds to the equivalent line number in the Python code that you are referring to.
In the second column, only some lines contain >>, which corresponds to the redirects and moves in the Python code.
The remaining columns correspond to the equivalent lines of the disassembled code.

Final Recommendations

The tools for profiling an application are now available, but there are several thoughts that we must take into account. Here are some recommendations:

CPU Overclocking and BIOS-based Accelerators, as Eco Modes and Battery Optimizing, can affect the real performance of the code under testing. It is recommended to disable them.
Telemetry Tools, Anti-Viruses, or Cloud-based File Sharing Systems should be disabled during the experiments.
Perform several measurements to obtain valid statistics could help us to confirm the expected behavior of the changes, and confirm the hypothesis.
Run experiments in a separated branches from the main repository, and keep the branches that reported code performance improvements
Consider using PySpy and Pyinstrument for Web-based applications in environments where it is not possible to make changes to code.

Part 1. TimeProfiling

Part 2. cProfile and line_profiler

If you think this content was helpful, consider buy me a cup of coffee. 😉☕️👉🏼

Tags: profiling python

Python Profiling – Memory Profiling (Part 3, Final)

Table of Contents

memory_profiler

PySpy

DISassembling

Final Recommendations

You may also like...

Leave a Reply Cancel reply

My latest certification

Table of Contents

memory_profiler

PySpy

DISassembling

Final Recommendations

You may also like...

Transformers: how do they work internally?

Working with Matrices in TensorFlow

Activation Functions (updated)

Leave a Reply Cancel reply

My latest certification