Thursday, April 24, 2014

how to profile a python program to pinpoint performance bottlenecks

Recently, I wrote some python programs to help integrating Cisco's UCS (Unified Computing System) with our homegrown infrastructure backends.  The programs run ok, though slower than we expected.

The python programs run from a CentOS-6.5 virtual machine on my MacBook (Mavrerick) against the famous Cisco's UCS Platform Emulator (UCSPE v2.2 1b) also running as a virtual machine on the same MacBook.

To pinpoint performance bottlenecks in these python programs, I started with the standard Python profiler.
# to get a text output of cpu time stats
python -m cProfile myutil.py  > /tmp/myutil.py-cProfile.out
# to ignore calls spent less than 1s in tottime.
grep -v -P "^\s*\d+/*\d*\s+0\."  /tmp/myutil.py-cProfile.out

With the above, we immediately noted that a sigificant percentage of cpu time (tottime) was spent on FindClassIdInMoMetaIgnoreCase and String.lower(). 
         1849158 function calls (1848366 primitive calls) in 109.533 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   863     45.159    0.052   88.026    0.102 UcsBase.py:553(FindClassIdInMoMetaIgnoreCase)        1     2.361    2.361    2.371    2.371 urllib.py:23(<module>)
      764    8.904    0.012    8.904    0.012 {built-in method read}   11825    6.999    0.001    6.999    0.001 {method 'append' of 'list' objects}
1735232  42.783    0.000   42.783    0.000 {method 'lower' of 'str' objects}

Looking at UcsBase.py closely, I was able to locate a silly performance bug and fixed it. As you can see in the new results below, the quick fix reduced the number of calls to String.lower() to half!
    978415 function calls (977623 primitive calls) in 48.234 CPU seconds
   Ordered by: standard name
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      863   12.825    0.015   33.887    0.039 UcsBase.py:553(FindClassIdInMoMetaIgnoreCase)
      764   11.436    0.015   11.436    0.015 {built-in method read}
 873738   21.037    0.000   21.037    0.000 {method 'lower' of 'str' objects}

It still seems too many tho. Trying to reduce calls to lower() everywhere in UcsUtils.py and UcsBase.py didn't reduce much. Now we need to identify which/what calls contributed to this many invokation of String.lower().

pycallgraph seems to a magnificent tool to examine a running python program to see which calls which and for how many times. Unfortunately, I couldn't get it to work on centos-6.5 (Python-2.6.6). I ended up resorting to gprof2dot
# to capture profiling results in pstat format
python -m cProfile -o /tmp/python-cProfile-SDK_orig.pstats myutil.py
# use gprof2dot & dot to graph into a svg
python ~/gprof2dot/gprof2dot.py -f pstats /tmp/python-cProfile-SDK_orig.pstats | dot -Tsvg -o  python-cProfile-SDK_orig.svg

Safari & Chrome open up the .svg pictures just fine. Isn't it a beauty! It is very nice to scale up to zoom in on these pictures. After all, .svg stands for Scalable Vector Graphics image :-)
From the pretty pictures, I can see most String.lower() calls were indeed by FindClassIdInMoMetaIgnoreCase().  Oh, well, guess I'll have to tackle calls to FindClassIdInMoMetaIgnoreCase() another day.

If there's interest, I'll follow up with a post on how to set up these python performance profiling utilities. Please let me know what you think, in the comments below.

Tuesday, April 01, 2014

how to bring VirtualBox out of 'Guru meditation'

I have been using Oracle Virtualbox for work for quite a few years now. This time around I have a few relatively big virtual machines running. A Netapp simulator and a UCS platform Emulator. I keep switching between the two, by pausing one to free up memory to work on the other.

Yesterday, the UCS one suddenly can not be started again from the Virtualbox Manager GUI. All possible actions for the vm are grayed out, except that I could review logs. Under the vm icon, it says 'Guru meditation'. The mouse-over hint said 'session locked'.  

The actual directory looks pretty clean. The Logs/VBox.log indeed has an entry confirming this unfortunate 'guru meditation' state.
41:15:26.089214 UIMediumEnumerator: Machine (or snapshot) event processed, ID = d2b17907-9bef-4e1d-b755-d6be33e5570a
41:15:26.089683 UIMediumEnumerator: Machine (or snapshot) event received, ID = d2b17907-9bef-4e1d-b755-d6be33e5570a
41:15:26.089856 UIMediumEnumerator:  Old usage: 0210f54c-ea12-4a15-bc4d-f8fa690759a8, 658b6fcb-ee59-4065-abd4-c3aeb1eab3a8, 6d3ed09f-42cd-4b36-8652-b4a75e8181bd, 7b83e1e3-ffc2-445d-a9fc-9d6586833b4e, 8589d5fd-4ff3-4c80-b915-7cee87a2e974, e6964c46-790b-4aff-82fb-b22a83792882, f623a9aa-4b0c-44a0-b34c-34d9a3f09dd6, fea35c17-bc46-4d31-84b0-08f4620e7cc4
41:15:26.101502 UIMediumEnumerator:  New usage: 658b6fcb-ee59-4065-abd4-c3aeb1eab3a8, e6964c46-790b-4aff-82fb-b22a83792882, f623a9aa-4b0c-44a0-b34c-34d9a3f09dd6, 8589d5fd-4ff3-4c80-b915-7cee87a2e974
41:15:26.101557 UIMediumEnumerator:  Items excluded from usage: 7b83e1e3-ffc2-445d-a9fc-9d6586833b4e, fea35c17-bc46-4d31-84b0-08f4620e7cc4, 0210f54c-ea12-4a15-bc4d-f8fa690759a8, 6d3ed09f-42cd-4b36-8652-b4a75e8181bd
41:15:26.113783 UIMediumEnumerator: Machine (or snapshot) event processed, ID = d2b17907-9bef-4e1d-b755-d6be33e5570a
41:15:26.135516 UIMediumEnumerator: Medium-enumeration finished!
41:21:23.219316 ERROR [COM]: aRC=VBOX_E_INVALID_VM_STATE (0x80bb0002) aIID={8ab7c520-2442-4b66-8d74-4ff1e195d2b6} aComponent={Console} aText={Invalid machine state: GuruMeditation}, preserve=false

I made a copy of the virtual machine directory, embracing for the worst.
Then it dawned on me that I may be able to bring it out of 'session locked' state via CLI, namely, VBoxManage. 

Viola!
Mac:/Users/jack/VirtualBoxVMs$ VBoxManage controlvm ucspe resume
VBoxManage: error: Cannot resume the machine as it is not paused (machine state: GuruMeditation)
VBoxManage: error: Details: code VBOX_E_INVALID_VM_STATE (0x80bb0002), component Console, interface IConsole, callee nsISupports
VBoxManage: error: Context: "Resume()" at line 118 of file VBoxManageControlVM.cpp
Mac:/Users/jack/VirtualBoxVMs$ VBoxManage controlvm ucspe poweroff
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
As soon as VBoxManage finished executing, all the grayed-out actions became available again on the GUI for this vm. I started it. 

The virtual machine came up just fine. What a relief :-)

In case this helps somebody, I am running Virtualbox v3.4.8 on Mac OS X 10.9.2 (Maverick). The two VMs in question are Linux 2.6 and FreeBSD.