Friday, December 22, 2006

how to coerce OMSA 5.1 to install & run properly on CentOS 4 (part III)

Here is the summary of what I did for OMSA 5.1 on CentOS 4.1 ( the white-box twin of RHAS 4.1)

For installation, on a CentOS 4.1 without OpenIPMI or net-snmp, you just need to
  1. append /etc/redhat-release with "Nahant", RHEL 4's code name.
1c1
< CentOS release 4.1 (Final) Nahant --- > CentOS release 4.1 (Final)
2. start installation by running ./linux/suppportscripts/srvadmin-install.sh
After installation, answer NO to start start all services. Instead, conduct the following steps first:
1. insert a line 'test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0' at the beginning of /etc/init.d/dsm_sa_ipmi. w/o it, /etc/init.d/dsm_sa_ipmi will fail miserablly. It seems that udev doesn't put /dev/ipmi0 back as expected.
22,24d21 < # my hack < test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0 < #
2. insert '/etc/init.d/dsm_sa_ipmi start' at the beginning of 'start' section inside /usr/bin/srvadmin- services.sh
w/o it, srvadmin-services.sh start will fail with IPMI drivers fail to load. However, if you start IPMI manually by '/etc/init.d/dsm_sa_ipmi start', you'd be just fine.
241,243d240 < # my hack::ipmi failed to start when called from later this script < /etc/init.d/dsm_sa_ipmi start <
3. now start it all with /usr/bin/srvadmin-services.sh start
4. verify it with ' omreport chassis temps' and 'omreport storage controller '. These two rely on different things to work. I'll elaborate on this later.

The rc scripts are placed in the S50 instead of their own start sequence number mandated in the scripts. I corrected them manually with my understanding of chkconfig directives in the scripts. However, I didn't get to change run level to test whether that would guarantee OMSA startup succesfully, since all my boxens are in production. Besides, Hobbit Monitor will let me know soon enough upon such failure.

It is such a great relief to have constant & comprehensive monitoring against critical systems. As a system engineer, you don't have to make all these mental or paper notes to check this or that. A decent NMS will do its job to nag you (whether you like or not) when necessary.

Wednesday, December 20, 2006

how to change IP address on a Linux server :: a project?

I was asked this question once, "how do you change IP address on a Linux server?"  At its face value, it's a simple technical question on systems administration or network administration using a Linux server. So, my instant response is
  • edit /etc/sysconfig/ifcfg-eth0 (or eth?).   or system-config-network TUI or GUI.
  • adjust routes and personal fw on the host as needed
  • ifconfig eth1 down && ifconfig eth1 up (or /etc/init.d/network restart)
  • verify: ifconfig eth1 (to see if the new IP has taken effect)
  • check & verify from remote host it is actually accessible (TCP/IP level, plus other services the server may provide)
  • update DNS if applicable
To dwell on the question a little more,  and depends on the context of the conversation, the lines of questions, and who asks, I would expand a bit from the perspective of process management, configuration management, and knowledge management. In other words, I'd prefer to drive such a "simple" change as a small project. Thus, here comes the addition to my initial answer:

So, the actual change is simple. Depends on the function of the server and its inter-relationship with other systems, you probably need to manage such a change as a project. As they say, a little bit more of thinking and planning goes a long way.
  • gather requirement from functional groups and management
  • set a date to cut-over and obtain sign-off from stake-holders
  • plan for it (procedures/fall-back plan/check-list/notification/etc.)
  • communicate the planned change and its potential impact to stake-holders and end users
  • dry run if necessary
  • prior to cut-over: for days leading up to the cut-over, shorten TTL for A record for that IP, if it has DNS record and is accessed by DNS name instead of IP alone.
  • prior to cut-over: check to be sure ACL or conduits or fw rules get updated on router/firewalls and systems the server gain access to or from
  • during cut-over: keep team and stake-holders abreast of up-to-date status
  • post cut-over: verify by going through your check-list to determine success
  • post cut-over: enter change log for this event & notify users the cut-over is done
  • post cut-over: alias or real NIC to keep servicing the IP for limited period. (could also just alias the new IP onto the NIC serving the current IP, the cut-over becomes switching who's alias and who's real)
  • post cut-over: added DNAT or reverse proxy rules to catch traffic to the old IP and redirect/log/alert as necessary
  • post cut-over: document this in the knowledge-management system: a scrapbook, a WIKI, a technical writer.