Recently, I wrote some python programs to help integrating Cisco's UCS (Unified Computing System) with our homegrown infrastructure backends. The programs run ok, though slower than we expected.
The python programs run from a CentOS-6.5 virtual machine on my MacBook (Mavrerick) against the famous Cisco's UCS Platform Emulator (UCSPE v2.2 1b) also running as a virtual machine on the same MacBook.
The python programs run from a CentOS-6.5 virtual machine on my MacBook (Mavrerick) against the famous Cisco's UCS Platform Emulator (UCSPE v2.2 1b) also running as a virtual machine on the same MacBook.
To pinpoint performance bottlenecks in these python programs, I started with the standard Python profiler.
# to get a text output of cpu time stats
python -m cProfile myutil.py > /tmp/myutil.py-cProfile.out
# to ignore calls spent less than 1s in tottime.
grep -v -P "^\s*\d+/*\d*\s+0\." /tmp/myutil.py-cProfile.out
With the above, we immediately noted that a sigificant percentage of cpu time (tottime) was spent on FindClassIdInMoMetaIgnoreCase and String.lower().
1849158 function calls (1848366 primitive calls) in 109.533 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
863 45.159 0.052 88.026 0.102 UcsBase.py:553(FindClassIdInMoMetaIgnoreCase) 1 2.361 2.361 2.371 2.371 urllib.py:23(<module>)
764 8.904 0.012 8.904 0.012 {built-in method read} 11825 6.999 0.001 6.999 0.001 {method 'append' of 'list' objects}
1735232 42.783 0.000 42.783 0.000 {method 'lower' of 'str' objects}
Looking at UcsBase.py closely, I was able to locate a silly performance bug and fixed it. As you can see in the new results below, the quick fix reduced the number of calls to String.lower() to half!
978415 function calls (977623 primitive calls) in 48.234 CPU secondsOrdered by: standard namencalls tottime percall cumtime percall filename:lineno(function)863 12.825 0.015 33.887 0.039 UcsBase.py:553(FindClassIdInMoMetaIgnoreCase)764 11.436 0.015 11.436 0.015 {built-in method read}873738 21.037 0.000 21.037 0.000 {method 'lower' of 'str' objects}
It still seems too many though. Trying to reduce calls to lower() everywhere in UcsUtils.py and UcsBase.py didn't reduce much. Now we need to identify which/what calls contributed to this many invocation of String.lower().
pycallgraph seems to a magnificent tool to examine a running python program to see which calls which and for how many times. Unfortunately, I couldn't get it to work on centos-6.5 (Python-2.6.6). I ended up resorting to gprof2dot.
# to capture profiling results in pstat format
python -m cProfile -o /tmp/python-cProfile-SDK_orig.pstats myutil.py
# use gprof2dot & dot to graph into a svg
python ~/gprof2dot/gprof2dot.py -f pstats /tmp/python-cProfile-SDK_orig.pstats | dot -Tsvg -o python-cProfile-SDK_orig.svg
Safari & Chrome open up the .svg pictures just fine. Isn't it a beauty! It is very nice to scale up to zoom in on these pictures. After all, .svg stands for Scalable Vector Graphics image :-)
From the pretty pictures, I can see most String.lower() calls were indeed by FindClassIdInMoMetaIgnoreCase(). Oh, well, guess I'll have to tackle calls to FindClassIdInMoMetaIgnoreCase() another day.
If there's interest, I'll follow up with a post on how to set up these python performance profiling utilities. Please let me know what you think, in the comments below.
No comments:
Post a Comment