Showing posts with label programming. Show all posts
Showing posts with label programming. Show all posts

Friday, December 22, 2006

how to coerce OMSA 5.1 to install & run properly on CentOS 4 (part III)

Here is the summary of what I did for OMSA 5.1 on CentOS 4.1 ( the white-box twin of RHAS 4.1)

For installation, on a CentOS 4.1 without OpenIPMI or net-snmp, you just need to
  1. append /etc/redhat-release with "Nahant", RHEL 4's code name.
1c1
< CentOS release 4.1 (Final) Nahant --- > CentOS release 4.1 (Final)
2. start installation by running ./linux/suppportscripts/srvadmin-install.sh
After installation, answer NO to start start all services. Instead, conduct the following steps first:
1. insert a line 'test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0' at the beginning of /etc/init.d/dsm_sa_ipmi. w/o it, /etc/init.d/dsm_sa_ipmi will fail miserablly. It seems that udev doesn't put /dev/ipmi0 back as expected.
22,24d21 < # my hack < test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0 < #
2. insert '/etc/init.d/dsm_sa_ipmi start' at the beginning of 'start' section inside /usr/bin/srvadmin- services.sh
w/o it, srvadmin-services.sh start will fail with IPMI drivers fail to load. However, if you start IPMI manually by '/etc/init.d/dsm_sa_ipmi start', you'd be just fine.
241,243d240 < # my hack::ipmi failed to start when called from later this script < /etc/init.d/dsm_sa_ipmi start <
3. now start it all with /usr/bin/srvadmin-services.sh start
4. verify it with ' omreport chassis temps' and 'omreport storage controller '. These two rely on different things to work. I'll elaborate on this later.

The rc scripts are placed in the S50 instead of their own start sequence number mandated in the scripts. I corrected them manually with my understanding of chkconfig directives in the scripts. However, I didn't get to change run level to test whether that would guarantee OMSA startup succesfully, since all my boxens are in production. Besides, Hobbit Monitor will let me know soon enough upon such failure.

It is such a great relief to have constant & comprehensive monitoring against critical systems. As a system engineer, you don't have to make all these mental or paper notes to check this or that. A decent NMS will do its job to nag you (whether you like or not) when necessary.

Monday, December 18, 2006

Hobbit Monitor :: how to report multiple temperture probe results

For Dell PowerEdge servers, OMSA can report temperature for multiple probes, notably, "BMC Ambient", "BMC Planaar", "BMC Riser", and one for each physical processor. I initially wrote my own extension script named 'temps' , which reports all probes under one 'status server1.temps green' call. It works for a test server after I did the following on the Hobbit server:
  • defined a RRD graph section in hobbitgraph.cfg for all probes available on that server
  • NCV_temps='*:GAUGE" in hobbitserver.cfg
  • appended temps=ncv to test2rrd line in hobbitserver.cfg
  • restart hobbitd server
When I deployed the same extension code to a different server, the limitation of such an extension became painfully obvious: the graph definition failed when a different server reports more or less probes. Since hobbitgraph.cfg's graph section is keyed to the test name in hobbitgraph.cfg, I can't add custom graph section per server configuration, not to mention it is not scalable!. As for alerting, I tried setting up thresholds on the server. However, it didn't seem to generate any alert even if the threshold is obviously surpassed. wonder if the 'status blah GREEN' reported by the extension script kinda blocked such centralized checking.

After posting the above questions to Hobbit's mailing list and got no answers, I took upon myself to find the answers. Good thing that Hobbit Monitor is licensed using GPLv1. I found the answers in the source code, rrd/do_temperature.c. [[ All hail goes to Open Source & Henrik! ]]
  • the 'temperature' test is built-in. duh...
  • if  'temperature' is used as 'test name' on the client, nothing needs to be changed on the server end.
  • a lump-all status command will do. Format of the data portion is somewhat restrictive. Each probe needs to be in the format of '&green BMCambient 17 62'. 
  • the test can be done locally on the client and the overall $status is reported by 'status server1.temperature $status'.
Wishes:
  • It'd be nice to have such data report format documented elsewhere other than the source code.
  • The current code (4.2RC1-20060712) didn't take input other than integer too well. It confused the hell out of it, in fact, to the extent it lump the whole long after $status as the probe name. Most of problems I experienced was actually the '.' in my data report confused Hobbit.
  • It'd be nice to be able to specify threshold on the server centrally.

Wednesday, September 20, 2006

to flush a jsp

in order to test how a load balancer (F5 BIG-IP or Citrix Netscaler or others) time-outs an idle server or client, I cooked a simple JSP. what I desire is to flush any byte as soon it is written to the response stream. Using PERL, one'd set $| = 1; or autoflush(1) and be over with it. In an Apache/Tomcat environment, I ended up using <@ page buffer="none" %> and response.flushBuffer() combined. The API for 'buffer' page parameter led me believe it should accomplish what I desire by itself. It doesn't seem to be the case. Maybe it is buffered by Apache?