Showing posts with label operating system. Show all posts

Friday, December 22, 2006

how to coerce OMSA 5.1 to install & run properly on CentOS 4 (part III)

Here is a summary of what I did to get OMSA 5.1 working on CentOS 4.1 (the white-box twin of RHAS 4.1).

For installation on a CentOS 4.1 box without OpenIPMI or net-snmp, you just need to
1. append "Nahant", RHEL 4's code name, to /etc/redhat-release.
1c1
< CentOS release 4.1 (Final) Nahant
---
> CentOS release 4.1 (Final)
2. start the installation by running ./linux/supportscripts/srvadmin-install.sh
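Step 1 above can be sketched as a small shell helper. This is only an illustration: the function name append_codename is made up here, and on a real box the file would be /etc/redhat-release and the call would need root.

```shell
#!/bin/sh
# Hypothetical helper (not part of OMSA): append a code name to the first
# line of a release file, unless the code name is already in the file.
append_codename() {
    file=$1
    codename=$2
    # Nothing to do if the code name is already present.
    if grep -q "$codename" "$file"; then
        return 0
    fi
    # Keep a backup so the original release string can be restored later.
    cp -p "$file" "$file.orig"
    # Append " <codename>" to the end of line 1, reading from the backup.
    sed "1s/\$/ $codename/" "$file.orig" > "$file"
}

# Intended use on the real system (as root):
#   append_codename /etc/redhat-release Nahant
```

Running it a second time leaves the file untouched, so it is safe to call from a provisioning script.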
After installation, answer NO when asked whether to start all services. Instead, carry out the following steps first:
1. insert the line 'test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0' at the beginning of /etc/init.d/dsm_sa_ipmi. Without it, /etc/init.d/dsm_sa_ipmi will fail miserably. It seems that udev doesn't put /dev/ipmi0 back as expected.
22,24d21
< # my hack
< test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0
< #
2. insert '/etc/init.d/dsm_sa_ipmi start' at the beginning of the 'start' section inside /usr/bin/srvadmin-services.sh
Without it, 'srvadmin-services.sh start' fails because the IPMI drivers fail to load. However, if you start IPMI manually with '/etc/init.d/dsm_sa_ipmi start', you'll be just fine.
241,243d240
< # my hack::ipmi failed to start when called from later this script
< /etc/init.d/dsm_sa_ipmi start
<
3. now start it all with /usr/bin/srvadmin-services.sh start
4. verify it with 'omreport chassis temps' and 'omreport storage controller'. These two rely on different things to work. I'll elaborate on this later.
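The two script patches in steps 1 and 2 can be sketched as one reusable helper. The function name insert_after_line is made up for this sketch, it writes a uniform "# my hack" marker rather than reproducing each diff exactly, and the line numbers 21 and 240 are simply read off the diffs above, so they will differ on other OMSA versions.

```shell
#!/bin/sh
# Hypothetical helper: insert a marked block after line N of a script,
# keeping a backup of the original next to it.
insert_after_line() {
    file=$1
    lineno=$2
    text=$3
    cp -p "$file" "$file.orig"
    # Pass the text through the environment so awk prints it verbatim.
    HACK_TEXT=$text awk -v n="$lineno" '
        { print }
        NR == n { print "# my hack"; print ENVIRON["HACK_TEXT"]; print "#" }
    ' "$file.orig" > "$file"
}

# Intended use on the real system (as root), assuming the same line numbers:
#   insert_after_line /etc/init.d/dsm_sa_ipmi 21 \
#       'test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0'
#   insert_after_line /usr/bin/srvadmin-services.sh 240 \
#       '/etc/init.d/dsm_sa_ipmi start'
```

Check the result with diff against the .orig backup before starting any services.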

The rc scripts were linked in as S50 instead of at the start sequence numbers specified in the scripts themselves. I corrected the links manually, based on my understanding of the chkconfig directives in the scripts. However, I haven't changed run levels to test whether that guarantees OMSA starts up successfully, since all my boxen are in production. Besides, Hobbit Monitor will let me know soon enough about any such failure.
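A quick way to spot such mismatches is to compare the priority in each script's "# chkconfig:" header with the number in its rc symlink name. The sketch below assumes the usual header layout of "# chkconfig: <runlevels> <start> <stop>"; both function names are made up for illustration.

```shell
#!/bin/sh
# Print the start priority declared in an init script's chkconfig header.
# Assumes a header line of the form: # chkconfig: <runlevels> <start> <stop>
chkconfig_start_prio() {
    awk '$1 == "#" && $2 == "chkconfig:" { print $4; exit }' "$1"
}

# Print the two-digit priority encoded in an rc symlink name (S50foo -> 50).
link_prio() {
    basename "$1" | sed 's/^[SK]\([0-9][0-9]\).*/\1/'
}

# Intended use: report whether a link matches its script's header.
#   want=$(chkconfig_start_prio /etc/init.d/dsm_sa_ipmi)
#   have=$(link_prio /etc/rc3.d/S50dsm_sa_ipmi)
#   [ "$want" = "$have" ] || echo "mismatch: header=$want link=$have"
```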

It is such a great relief to have constant & comprehensive monitoring of critical systems. As a system engineer, you don't have to make all these mental or paper notes to check this or that. A decent NMS will do its job and nag you (whether you like it or not) when necessary.

Wednesday, September 06, 2006

custom transport for postfix on RHEL 4/AS (CentOS 4.3)

I was told to use a special server to relay outbound email for a new NMS server I built. The NMS server has POSTFIX installed and is configured to send production alerts to a new duty pager, a BlackBerry 6230 with a BIS account served by T-Mobile. I had set up POSTFIX relaying once elsewhere, so I immediately did the following:
  • vi /etc/postfix/transport & added the following two lines at the bottom of the file
tmo.blackberry.net smtp:smartrelay2.intranet.com
intranet.com smtp:10.9.9.99
  • run 'postmap' to generate the hashed transport:
postmap /etc/postfix/transport
  • vi /etc/postfix/main.cf and commented out the global relay
#relayhost = 10.9.9.99
  • reloaded the configuration of the running postfix. The reload was met with some nonsensical errors/warnings, so I ended up doing a full restart.
/etc/init.d/postfix reload OR service postfix reload
  • tested it and it didn't work. It looked as if the transport map didn't exist, since postfix still looked up MX records to find the SMTP servers to send mail to. Time to read:
man postmap, man postfix, man 5 transport, grep -i transport main.cf.
  • It became obvious that main.cf had no reference at all to which transport hash to use. So I added the following, then restarted POSTFIX. Bingo!
transport_maps = hash:/etc/postfix/transport
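The missing piece could have been caught up front with a check along these lines. It is a minimal sketch: the function name transport_maps_enabled is made up, and it only looks for an uncommented transport_maps line starting in column one of the file it is given.

```shell
#!/bin/sh
# Hypothetical check: succeed only if main.cf has an active (uncommented)
# transport_maps setting. Parameter names in main.cf start in column one;
# lines beginning with whitespace are continuations, so they are ignored.
transport_maps_enabled() {
    grep -Eq '^transport_maps[[:space:]]*=' "$1"
}

# Intended use before relying on /etc/postfix/transport:
#   transport_maps_enabled /etc/postfix/main.cf \
#       || echo "transport_maps is not set; the transport table is ignored"
```

Once the map is built and referenced, 'postmap -q tmo.blackberry.net hash:/etc/postfix/transport' should print the configured next hop, which is a quick test that does not require sending mail.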

It still makes me curious why this is not enabled by default, since one has to consciously insert rules into the transport configuration file and consciously run postmap to generate the hash anyway. Enabling it by default would have saved me the hassle of figuring out that it is not there, and would save a service outage in a production environment should a custom transport become necessary. The only argument I can find so far is security, and that holds only if it takes different credentials to modify transport.db and to start using transport.db in the first place.

Of course, "crazy" default behavior can be found on other modern operating systems as well. Don't get me started on Solaris halting when it receives the POWER Fail signal sent by a resetting serial console.