Thursday, April 24, 2014

how to profile a python program to pinpoint performance bottlenecks

Recently, I wrote some python programs to help integrating Cisco's UCS (Unified Computing System) with our homegrown infrastructure backends.  The programs run ok, though slower than we expected.

The python programs run from a CentOS-6.5 virtual machine on my MacBook (Mavrerick) against the famous Cisco's UCS Platform Emulator (UCSPE v2.2 1b) also running as a virtual machine on the same MacBook.

To pinpoint performance bottlenecks in these python programs, I started with the standard Python profiler.
# to get a text output of cpu time stats
python -m cProfile myutil.py  > /tmp/myutil.py-cProfile.out
# to ignore calls spent less than 1s in tottime.
grep -v -P "^\s*\d+/*\d*\s+0\."  /tmp/myutil.py-cProfile.out

With the above, we immediately noted that a sigificant percentage of cpu time (tottime) was spent on FindClassIdInMoMetaIgnoreCase and String.lower(). 
         1849158 function calls (1848366 primitive calls) in 109.533 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   863     45.159    0.052   88.026    0.102 UcsBase.py:553(FindClassIdInMoMetaIgnoreCase)        1     2.361    2.361    2.371    2.371 urllib.py:23(<module>)
      764    8.904    0.012    8.904    0.012 {built-in method read}   11825    6.999    0.001    6.999    0.001 {method 'append' of 'list' objects}
1735232  42.783    0.000   42.783    0.000 {method 'lower' of 'str' objects}

Looking at UcsBase.py closely, I was able to locate a silly performance bug and fixed it. As you can see in the new results below, the quick fix reduced the number of calls to String.lower() to half!
    978415 function calls (977623 primitive calls) in 48.234 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      863   12.825    0.015   33.887    0.039 UcsBase.py:553(FindClassIdInMoMetaIgnoreCase)
      764   11.436    0.015   11.436    0.015 {built-in method read}
 873738   21.037    0.000   21.037    0.000 {method 'lower' of 'str' objects}

It still seems too many though. Trying to reduce calls to lower() everywhere in UcsUtils.py and UcsBase.py didn't reduce much. Now we need to identify which/what calls contributed to this many invocation of String.lower().

pycallgraph seems to a magnificent tool to examine a running python program to see which calls which and for how many times. Unfortunately, I couldn't get it to work on centos-6.5 (Python-2.6.6). I ended up resorting to gprof2dot
# to capture profiling results in pstat format
python -m cProfile -o /tmp/python-cProfile-SDK_orig.pstats myutil.py
# use gprof2dot & dot to graph into a svg
python ~/gprof2dot/gprof2dot.py -f pstats /tmp/python-cProfile-SDK_orig.pstats | dot -Tsvg -o  python-cProfile-SDK_orig.svg

Safari & Chrome open up the .svg pictures just fine. Isn't it a beauty! It is very nice to scale up to zoom in on these pictures. After all, .svg stands for Scalable Vector Graphics image :-)
From the pretty pictures, I can see most String.lower() calls were indeed by FindClassIdInMoMetaIgnoreCase().  Oh, well, guess I'll have to tackle calls to FindClassIdInMoMetaIgnoreCase() another day.

If there's interest, I'll follow up with a post on how to set up these python performance profiling utilities. Please let me know what you think, in the comments below.

No comments: