
Friday, December 22, 2006

how to coerce OMSA 5.1 to install & run properly on CentOS 4 (part III)

Here is a summary of what I did for OMSA 5.1 on CentOS 4.1 (the white-box twin of RHEL AS 4.1).

For installation, on a CentOS 4.1 box without OpenIPMI or net-snmp, you just need to
  1. append "Nahant", RHEL 4's code name, to /etc/redhat-release:
1c1
< CentOS release 4.1 (Final) Nahant
---
> CentOS release 4.1 (Final)
  2. start the installation by running ./linux/supportscripts/srvadmin-install.sh
After the installation, answer NO when asked to start all services. Instead, conduct the following steps first:
1. insert the line 'test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0' at the beginning of /etc/init.d/dsm_sa_ipmi. W/o it, /etc/init.d/dsm_sa_ipmi will fail miserably. It seems that udev doesn't put /dev/ipmi0 back as expected.
22,24d21
< # my hack
< test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0
< #
2. insert '/etc/init.d/dsm_sa_ipmi start' at the beginning of the 'start' section inside /usr/bin/srvadmin-services.sh.
W/o it, 'srvadmin-services.sh start' will fail because the IPMI drivers fail to load. However, if you start IPMI manually with '/etc/init.d/dsm_sa_ipmi start' first, you'd be just fine.
241,243d240
< # my hack::ipmi failed to start when called from later in this script
< /etc/init.d/dsm_sa_ipmi start
<
3. now start it all with /usr/bin/srvadmin-services.sh start
4. verify with 'omreport chassis temps' and 'omreport storage controller'. These two rely on different things to work. I'll elaborate on this later.

The rc scripts are all placed at S50 instead of the start sequence numbers mandated by the chkconfig directives inside the scripts. I corrected them manually per my understanding of those directives; a sketch follows below. However, I didn't get to change run levels to test whether that would guarantee OMSA starts up successfully, since all my boxen are in production. Besides, Hobbit Monitor will let me know soon enough upon such a failure.
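A minimal sketch of that correction: removing and re-adding a service makes chkconfig recreate its S/K links from the '# chkconfig:' directive. The service names below are assumptions based on the scripts mentioned in these posts; check what OMSA actually dropped into /etc/init.d on your box.
for svc in mptctl instsvcdrv dsm_sa_ipmi; do
    chkconfig --del $svc      # drop the bogus S50 links
    chkconfig --add $svc      # recreate links per the script's chkconfig: line
done
chkconfig --list | grep -E 'mptctl|instsvcdrv|dsm_sa'    # verify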

It is such a great relief to have constant & comprehensive monitoring of critical systems. As a system engineer, you don't have to make all these mental or paper notes to check this or that. A decent NMS will do its job and nag you (whether you like it or not) when necessary.

Wednesday, December 20, 2006

how to change IP address on a Linux server :: a project?

I was asked this question once: "how do you change the IP address on a Linux server?" At face value, it's a simple technical question on systems or network administration of a Linux server. So, my instant response is
  • edit /etc/sysconfig/network-scripts/ifcfg-eth0 (or eth?), or use the system-config-network TUI or GUI (see the ifcfg sketch after this list)
  • adjust routes and the personal fw on the host as needed
  • ifconfig eth0 down && ifconfig eth0 up (or /etc/init.d/network restart)
  • verify: ifconfig eth0 (to see if the new IP has taken effect)
  • check & verify from a remote host that it is actually accessible (TCP/IP level, plus whatever other services the server provides)
  • update DNS if applicable
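For illustration, a minimal static ifcfg file; all address values below are placeholders:
# /etc/sysconfig/network-scripts/ifcfg-eth0 (placeholder values)
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.10.25
NETMASK=255.255.255.0
GATEWAY=192.168.10.1
ONBOOT=yes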
To dwell on the question a little more, and depending on the context of the conversation, the line of questioning, and who asks, I would expand a bit from the perspective of process management, configuration management, and knowledge management. In other words, I'd prefer to drive such a "simple" change as a small project. Thus, here comes the addition to my initial answer:

So, the actual change is simple. But depending on the function of the server and its inter-relationships with other systems, you probably need to manage such a change as a project. As they say, a little more thinking and planning goes a long way.
  • gather requirements from functional groups and management
  • set a date for the cut-over and obtain sign-off from stake-holders
  • plan for it (procedures/fall-back plan/check-list/notification/etc.)
  • communicate the planned change and its potential impact to stake-holders and end users
  • dry run if necessary
  • prior to cut-over: for the days leading up to the cut-over, shorten the TTL on the A record for that IP, if it has a DNS record and is accessed by DNS name instead of by IP alone
  • prior to cut-over: check to be sure ACLs, conduits, and fw rules get updated on routers/firewalls and on the systems the server gains access to or from
  • during cut-over: keep the team and stake-holders abreast of up-to-date status
  • post cut-over: verify by going through your check-list to determine success
  • post cut-over: enter a change log for this event & notify users the cut-over is done
  • post cut-over: keep an alias (or a spare NIC) servicing the old IP for a limited period. (You could also alias the new IP onto the NIC serving the current IP beforehand; the cut-over then becomes swapping which address is the alias and which is real.)
  • post cut-over: add DNAT or reverse-proxy rules to catch traffic to the old IP and redirect/log/alert as necessary
  • post cut-over: document this in the knowledge-management system: a scrapbook, a wiki, a technical writer.

Monday, December 18, 2006

Hobbit Monitor :: how to report multiple temperature probe results

For Dell PowerEdge servers, OMSA can report temperatures from multiple probes, notably "BMC Ambient", "BMC Planar", "BMC Riser", and one for each physical processor. I initially wrote my own extension script named 'temps', which reports all probes under one 'status server1.temps green' call. It worked for a test server after I did the following on the Hobbit server:
  • defined an RRD graph section in hobbitgraph.cfg for all probes available on that server
  • NCV_temps="*:GAUGE" in hobbitserver.cfg
  • appended temps=ncv to the test2rrd line in hobbitserver.cfg
  • restarted the hobbitd server
When I deployed the same extension code to a different server, the limitation of such an extension became painfully obvious: the graph definition failed when a different server reported more or fewer probes. Since the graph section in hobbitgraph.cfg is keyed to the test name, I can't add a custom graph section per server, not to mention that it doesn't scale! As for alerting, I tried setting up thresholds on the server. However, it didn't seem to generate any alert even when the threshold was obviously surpassed. I wonder if the 'status blah green' reported by the extension script kind of blocked such centralized checking.

After posting the above questions to Hobbit's mailing list and getting no answers, I took it upon myself to find them. Good thing that Hobbit Monitor is licensed under GPLv1. I found the answers in the source code, rrd/do_temperature.c. [[ All hail goes to Open Source & Henrik! ]]
  • the 'temperature' test is built in. Duh...
  • if 'temperature' is used as the test name on the client, nothing needs to be changed on the server end.
  • a lump-all status command will do. The format of the data portion is somewhat restrictive: each probe needs to be on its own line in the format '&green BMCambient 17 62'.
  • the threshold test can be done locally on the client, and the overall $status is reported by 'status server1.temperature $status'. (A client-side sketch follows this list.)
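To make this concrete, here is a minimal client-extension sketch under the usual Hobbit/BB client environment ($BB, $BBDISP, $MACHINE). The probe values are hard-coded placeholders where a real script would parse 'omreport chassis temps', and the two integers appear to be the Celsius and Fahrenheit readings.
#!/bin/sh
# sketch of a client extension feeding the built-in 'temperature' test
STATUS=green    # a real script would derive red/yellow from local thresholds

# one line per probe: '&color ProbeName C F' -- integers only; a '.' in
# the data confuses the parser (see the wishes below)
DATA="&green BMCambient 17 62
&green BMCplanar 20 68"

$BB $BBDISP "status $MACHINE.temperature $STATUS `date`

$DATA
"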
Wishes:
  • It'd be nice to have this data-report format documented somewhere other than the source code.
  • The current code (4.2RC1-20060712) doesn't take input other than integers too well. It confused the hell out of it, in fact, to the extent that it lumped the whole line after $status in as the probe name. Most of the problems I experienced were actually the '.' in my data reports confusing Hobbit.
  • It'd be nice to be able to specify thresholds centrally on the server.

Wednesday, December 13, 2006

How to coerce OMSA 5.1 to install & run properly on CentOS 4 (part II)

Not so fast! The same fix doesn't work on CentOS 4.1 running on a Dell PE6850. srvadmin-install.sh somehow determined it needed a newer version of the OpenIPMI driver modules from its own packages, regardless of whether the OS had the latest OpenIPMI installed or not. Recall that the same hack worked w/o OpenIPMI installed at all on CentOS 4.4.

# Here is the excerpt from srvadmin-install.sh
Server Administrator requires a newer version of the OpenIPMI driver modules
for the running kernel than are currently installed on the system. The OpenIPMI
driver modules for the running kernel will be upgraded by installing an
OpenIPMI driver RPM.
warning: RPMS/supportRPMS/openipmi-33.13.RHEL4-1dkms.noarch.rpm: V3 DSA signature: NOKEY, key ID 23b66a9d
Preparing... ########################################### [100%]
1:openipmi ########################################### [100%]

Loading kernel module source and prebuilt module binaries (if any)
Installing prebuilt kernel module binaries (if any)

Module build for the currently running kernel was skipped since the
kernel source for this kernel does not seem to be installed.

Your DKMS tree now includes:
openipmi, 33.13.RHEL4, 2.6.9-5.EL, i686: built
openipmi, 33.13.RHEL4, 2.6.9-5.EL, x86_64: built
openipmi, 33.13.RHEL4, 2.6.9-5.ELsmp, i686: built
openipmi, 33.13.RHEL4, 2.6.9-5.ELsmp, x86_64: built
openipmi, 33.13.RHEL4, 2.6.9-5.ELhugemem, i686: built

The OpenIPMI packages are from the CentOS 4.4 release DVD. So, they couldn't be newer!
# rpm -aq|grep -i OpenIPMI
OpenIPMI-libs-1.4.14-1.4E.13
OpenIPMI-1.4.14-1.4E.13

None of our production servers have kernel source or development kits available. So, the DKMS approach isn't one we'd take easily if we had other viable alternatives. So, I modified srvadmin-openipmi.sh to force it not to upgrade IPMI.
# diff srvadmin-openipmi.sh.orig srvadmin-openipmi.sh
1055c1055,1056
< LOC_RECOMMENDED_ACTION=$?
---
> #LOC_RECOMMENDED_ACTION=$?
> LOC_RECOMMENDED_ACTION=4

This worked for the installation. However, upon starting up the services, the IPMI services failed to start. I ran into a problem before with OMSA's own ipmi init script (dsm_sa_ipmi) deleting /dev/ipmi0 while udev never puts it back. So, as a simple hack, I just added a line at the beginning of the script to recreate the device:
test -e /dev/ipmi0 || mknod -m 0600 /dev/ipmi0 c 253 0

After this, /etc/init.d/dsm_sa_ipmi would start OK, w/o the OpenIPMI package installed. However, when it is called from srvadmin-services.sh (I suspect instsvcdrv is the actual caller), it failed each and every time, leaving omreport unable to get any data. For now, I can manually start dsm_sa_ipmi and then start/stop srvadmin-services.sh just fine.

What makes it worse (and maybe it is even the cause of this IPMI start-up problem) is that the init sequence numbers assigned to all these init scripts are S50, instead of S00 (mptctl), S96 (instsvcdrv), and S97 for all the rest. I don't think this will work upon reboot or run-level changes. I fixed it manually per the chkconfig directives inside the scripts. However, I am curious why chkconfig got confused.

P.S. A more complete tweak would be at the beginning of the function in srvadmin-openipmi.sh; the previous tweak on srvadmin-openipmi.sh works only when you have OpenIPMI installed.

# diff srvadmin-openipmi.sh.orig srvadmin-openipmi.sh
973a974,976
> LOC_RECOMMENDED_ACTION=4
> return ${LOC_RECOMMENDED_ACTION}

How to coerce OMSA 5.1 to install & run properly on CentOS 4

To monitor Dell PowerEdge servers running Linux, I downloaded the OMSA (OpenManage Server Administrator) 5.1 package for Red Hat Enterprise Linux 4 (RHEL4) for a bunch of Dell PowerEdge 6850 servers. The package was listed under 'system management' in the 'drivers & downloads' section. The actual file name is 'OM_5.1_ManNode_LIN_A00.tar.gz'.

On the PE6850 server, I unpacked the tarball and attempted to install using srvadmin-install.sh. It failed silently, with an error code of 2.
# supportscripts/srvadmin-install.sh
# echo $?
2
Examining srvadmin-install.sh closely, I realized that it doesn't support CentOS 4. Instead, it supports & detects RHEL4 by its code name 'Nahant'. Instead of mucking with the code to support CentOS, I decided to fake RHEL4 by modifying /etc/redhat-release.
# cat /etc/redhat-release
fake RHEL 4 Nahant
It worked. The installation went a-ok and all services started properly. 'omreport storage controllers' now reports various information on the storage controllers.

Thinking the faking was needed only by the installation scripts, I reverted /etc/redhat-release to its original version. To be sure, I restarted OMSA with 'srvadmin-services.sh stop' and 'start'. Well, it didn't pan out that well.

# /usr/bin/srvadmin-services.sh start

Starting mptctl:
Waiting for mptctl driver registration to complete: [ OK ]
Starting Systems Management Device Drivers:
Starting dell_rbu: [ OK ]
Starting ipmi driver: [FAILED]
Starting Systems Management Device Drivers:
Starting dell_rbu: Already started [ OK ]
Starting ipmi driver: [FAILED]
Starting DSM SA Shared Services: [ OK ]

Starting DSM SA Connection Service: [ OK ]

Now it seems the srvadmin-ipmi package's init script still relies on /etc/redhat-release to tell it which OS it is running on. The server doesn't have OpenIPMI installed, and without any IPMI driver loaded, no report can be run against storage or temps using OMSA.

To have it work for both worlds (OMSA and CentOS), I cooked up this /etc/redhat-release by appending the RHEL4 code name to the original:
# cat /etc/redhat-release.fake
CentOS release 4.4 (Final) Nahant
# /usr/bin/srvadmin-services.sh start
Starting mptctl:
Waiting for mptctl driver registration to complete: [ OK ]
Starting Systems Management Device Drivers:
Starting dell_rbu: [ OK ]
Starting ipmi driver: [ OK ]
Starting Systems Management Data Engine:
Starting dsm_sa_datamgr32d: [ OK ]
Starting dsm_sa_eventmgr32d: [ OK ]
Starting DSM SA Shared Services: [ OK ]
Starting DSM SA Connection Service: [ OK ]

This cooked /etc/redhat-release has been verified to work well with both 'yum -y update' and OMSA's 'srvadmin-services.sh start'.
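For reference, a sketch of producing the cooked file, assuming a one-line /etc/redhat-release as stock CentOS ships:
# cp -p /etc/redhat-release /etc/redhat-release.orig
# sed -i 's/$/ Nahant/' /etc/redhat-release
# cat /etc/redhat-release
CentOS release 4.4 (Final) Nahant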

Wednesday, December 06, 2006

trick to enable non-bridged wireless access on a Netgear Wireless router (WGR614v6)

I bought a Netgear wireless router (WGR614v6) from CompUSA yesterday. It is going to a customer's site for Wi-Fi BlackBerries (BB7270). I need a wireless router that can simply
  • obtain its own TCP/IP settings via DHCP from a network port
  • self-contain its WLAN: NAT/SPI-protect all Wi-Fi devices
I would have been happier with a Netgear MR814v3, which we have in the lab and which is known to work well with the BB7270. All the bugs and glitches we ran into in the past year turned out to be on RIM's end, rather than due to the "poor" quality of consumer-grade wireless access points perceived by RIM's engineers.

Following the auto-run wizard from the installation CD to the letter, I connected my own laptop to one of the LAN ports on the unit, with the unit's own Internet port plugged into the wall jack in my office. Once I did an 'ipconfig /renew' on my laptop, the laptop got its TCP/IP settings via DHCP as promised. It is a bit odd to see the DNS server is 192.168.1.1. This means the Netgear unit proxies DNS queries for devices utilizing its DHCP/NAT services. However, two things did not work right:
  • nslookup of anything resolves to 192.168.1.1?
  • the Wi-Fi LED on the front of the unit is not on. 'Router status' says 'wireless :: off', while the checkbox to turn on the 'wireless router radio' was grayed out!
Searching Netgear's site, I was relieved/disappointed to find the firmware was up to date. There's a link from the WGR614's support page on how to enable the unit as a wireless access point. However, that enables the unit by bridging the WLAN to the local LAN. I attempted it to satisfy my curiosity. It requires using a regular LAN port as the autosense uplink instead of the separate Internet port on the unit. Once I did that, the laptop and a Wi-Fi BlackBerry were bridged to the real Ethernet LAN and got their TCP/IP settings from the real corporate DHCP server instead of from the unit's own DHCP server.

Lo & behold, after this diversion everything went peachy all of a sudden. The checkbox to turn on the wireless router radio is now available, even after I moved the cable from the regular LAN port back to the Internet port on the unit. The wireless icon on the unit is blinking green. Power-cycling the unit didn't eliminate the capability.

I am a bit at a loss as to why the wireless capability was disabled before and how it could be enabled by the steps I took. One thing I should have done at the beginning is reset it to factory defaults, just in case someone had messed it up somehow. The unit had all its seals and should be a new one. Anyhow, I am glad it worked out. However, I can't imagine any non-techie getting this far w/o picking up the phone or returning it to the retailer.

One thing to note is that WPA-PSK worked as advertised, both on the access point and on the BB7270. A lengthy 'mydomainDS-24random' was used for the SSID and the maximum allowed '63RANDOM' as the PSK (pre-shared key). I couldn't imagine typing those in w/o a typo, so I emailed the lengthy secrets to the email account currently assigned to my test BB7270. On the BB7270, I copied+pasted the lengthy random bits from email into a new WLAN profile creation form under Options/WLAN. It worked great. With older versions of the BB7270 firmware, copy+pasting a WEP key failed each and every time. I'd assume the bug was fixed somewhere between v4.0.1.84 and v4.0.1.104. On the BES (BlackBerry Enterprise Server) side, a site-specific "IT Policy" created using these WLAN secrets worked great too. However, the SSID somehow took several kicks to get really sucked into the "IT policy". The BBM (BlackBerry Manager) didn't complain when the SSID didn't take, with the result that a BB7270 provisioned from scratch after nuking didn't carry any WLAN profile at all. Only upon reviewing the IT policy was I shocked to find the SSID field blank while everything else under the WLAN policy section looked good.

Friday, November 10, 2006

perc 4e/Di on Dell PE6850 saga continues...part C

With the load from the full database dump plus the application load burst we set up last Friday, the problematic server 'syb04' generated a few alerts over the weekend. The alerts complained that the timestamp didn't show up right after the 'logger' call. We were very excited, thinking we had managed to reproduce the problem this quickly. The next thing would just be to pick what to upgrade from a decent list of potential upgrades.

Examining the local log as well as the log on the remote syslogd server closely, however, showed that the 'missing' stamps appeared right after the complaints of their absence. Most were within the same second, and only one was a second late. Therefore the alerts were identified as false positives.

Needless to say, we were very disappointed. Even worse, this remained the case for the full week. We started to toss around the idea that maybe the original hiccup was merely a delay and we overreacted a little by flipping the switch.

To "add insult to the injury", our constant attention was demanded by a lot of database problems related to application peak load which was coerced to repeat. The problems were:
  • The Sybase ASE log device filled up, bringing the application peak load to a sudden halt until Sybase was restarted with the log cleared.
  • Hourly transaction dumps have grown from 20M each to over 1G each. It seemed like some transactions failed to be committed.
  • In turn, the transaction dumps filled up the disk.
So far, the server has endured seven 24-hour days of load runs, which totals 24x7x3 = 504 load peaks. A regular day has only 2 peaks; therefore, this equals 252 days' worth of load.

I am betting a small sum of money on PR (patrol read), whose background scheduling may be surprised by the sudden spike in disk I/O caused by the nightly full database backup as well as the daily application peaks. To force PR to collide with the load, I wrote a script that checks the PR status and starts one if none is 'In Progress' already, as reported by 'megapr -dispPR -a0'. (A sketch of the idea follows.)
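Here is a minimal sketch of that check-and-start logic. 'megapr -dispPR -a0' is quoted above, but the '-startPR' flag and the exact 'In Progress' string are assumptions to verify against your megapr version:
#!/bin/sh
# kick off a patrol read on adapter 0 unless one is already running
if ! megapr -dispPR -a0 | grep -q 'In Progress'; then
    megapr -startPR -a0     # assumed start flag; confirm with megapr's usage text
fi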

BTW, the 'megapr -dispPR -a0' command alone causes the following errors in the PERC controller's exported log. My inquiry about this error got no response from Dell's linux-poweredge list, which is monitored by a few Dell engineers.
11/07 10:25:51: MPT_Rec: INQ Error - Negotiating LD[6] pRfm a07517c0
11/07 10:25:51: MPT_Rec: INQ Error - Negotiating LD[16] pRfm a0743360
11/07 10:25:51: GET: SCSI_chn=ff, rtn status=0

Tuesday, November 07, 2006

debugfs :: a handy utility to debug an ext2 or ext3 file system

Today I am learning a useful utility named 'debugfs'. It is part of e2fsprogs, an essential package containing auxiliary programs for the ext2 and ext3 file systems under Linux.

For a regular file, 'debugfs' can help you find the inode claiming any data block the file or directory entry is using. You can then turn around and ask for the name of that inode. This comes in handy when some mysterious file causes df and du to disagree on whether the file system is full, or when the file system is corrupted or can't be mounted and accessed as usual. More advanced file system features are available too.

# to find what inode is claiming a given data block
# debugfs -R "icheck 12345" /dev/hda1
debugfs 1.35 (28-Feb-2004)
Block   Inode number
12345   340

# to find the file name given the inode number
# debugfs -R "ncheck 49153" /dev/hda1 debugfs 1.35 (28-Feb-2004) Inode Pathname
49153 /usr/share/locale/ar/LC_MESSAGES/libbonobo-2.0.mo


# Print the location of the inode data structure
# debugfs -R "imap /boot/vmlinuz-2.6.9-42.0.2.EL" /dev/hda1 debugfs 1.35 (28-Feb-2004)
Inode 557516 is part of block group 34
located at block 1114128, offset 0x0580


# to dump a file by filespec (per the man page)
# debugfs -R "dump -p /boot/vmlinuz-2.6.9-42.0.2.EL /tmp/vmlinuz_dumped" /dev/hda1
# md5sum /boot/vmlinuz-2.6.9-42.0.2.EL /tmp/vmlinuz_dumped
e5c536b539b5ffcaa03b22bd7fcc164a /boot/vmlinuz-2.6.9-42.0.2.EL
e5c536b539b5ffcaa03b22bd7fcc164a /tmp/vmlinuz_dumped

# to get the contents of a file, assuming the fs can't be mounted and accessed the usual way
# debugfs -R "cat /etc/redhat-release" /dev/hda1
debugfs 1.35 (28-Feb-2004)
CentOS release 4.4 (Final)

Noteworthy: for files under /selinux (a pseudo fs), debugfs can find the inode number associated with a data block; however, it couldn't find the file name for that very inode number.
# debugfs -R "ncheck 8" /dev/hda1 debugfs 1.35 (28-Feb-2004)
Inode Pathname
8
# find / -inum 8
/selinux/relabel
# ls -id /selinux/relabel

8 /selinux/relabel
# debugfs -R "icheck 4567" /dev/hda1
debugfs 1.35 (28-Feb-2004)
Block   Inode number
4567    8

# / is on /dev/hda1
/dev/hda1 8127400 6738524 1306308 84% /

There are a lot of powerful (and dangerous) features such as
  • feature to set or clear various file system features in the superblock
  • freeb to mark data blocks as unallocated (vs. setb to mark them in use)
  • freei to free the inode specified
  • clri to clear the contents of the inode
  • chroot to chroot into the directory
  • find_free_block
  • find_free_inode
  • init_filesys to create an ext2 file system
  • kill_file to deallocate the file and its blocks. It doesn't remove any direntry pointing to this inode, so it's not 'rm' or 'unlink'.
  • logdump to dump the ext3 journal
  • modify_inode to modify the contents of the inode structure
  • ls/mkdir/mknod/rm/rmdir
'debugfs' starts interactively by default, unless you pass '-R' to request a one-shot command. A session looks like this:
# debugfs
debugfs 1.35 (28-Feb-2004)

debugfs: open /dev/hda1
debugfs: icheck 12345
Block Inode number
12345 340

debugfs: ncheck 340
Inode Pathname
340 /usr/X11R6/lib/xscreensaver/mountain

debugfs: close

debugfs: quit

Monday, October 23, 2006

step-by-step: how to set up a large ramdisk on CentOS 4.4/i386

On our new database server running CentOS 4.4 on a Dell PE6850, a 2G ramdisk is needed. The option of using 'tmpfs' was considered and rejected due to its potential use of disk swap.
  • To make each ramdisk 2G, a kernel parameter (ramdisk_size, in 1K units) needs to be passed in grub.conf or lilo.conf, and a quick reboot is required to make it effective. The ramdisk is 16M in size by default.
Here is my grub.conf entry:
kernel /vmlinuz-2.6.9-42.ELsmp ro root=LABEL=/ console=ttyS0,57600n8 console=tty1 ramdisk_size=2097152
  • To make the ramdisk accessible to the database after a reboot, I cooked up a chkconfig-compliant init script:
---cut----x------/etc/init.d/ramdisk4syb-----x-----cut---------
#!/bin/bash
#
# $Id: rc.ramdisk4syb,v 1.00 2006/10/23 13:57:14 jer Exp $
#
# rc file to set up a ramdisk fs for sybase to use for tempdb
#
#
# For Redhat-ish systems
#
# chkconfig: 2345 30 70
# processname: ramdisk4syb
# config: none
# description: set up ramdisk for sybase to use for tempdb
case "$1" in
start)
    # (re)create the mount point, then make a fresh ext2 fs on the ramdisk;
    # -b 1024 because only 1K blocks would mount (see below), -m 1 to keep
    # the superuser reserve at 1%
    test -d /mnt/ram1 || mkdir /mnt/ram1
    mke2fs -m 1 -b 1024 /dev/ram1 >/tmp/ramdisk.mke2fs.log &&
    mount -t ext2 /dev/ram1 /mnt/ram1 &&
    chmod 770 /mnt/ram1 && chown sybase:dba /mnt/ram1
    ;;
stop)
    umount /mnt/ram1
    ;;
*)
    echo "Usage: $0 start|stop"
    ;;
esac
  • to pin the init script to run levels:
chkconfig --add ramdisk4syb
  • to test the init script before rebooting:
/etc/rc3.d/S30ramdisk4syb start && df -k /mnt/ram1 && ls -ld /mnt/ram1
  • to make all this effective, reboot:
init 6
  • to verify the resulting ramdisk file system:
df -k /mnt/ram1 && ls -ld /mnt/ram1

Much to my surprise, the ramdisk was not mounted after the reboot! The mke2fs log showed that mke2fs was successful. When I attempted to mount it manually, I got:
# mount /dev/ram1 /mnt/ram1
mount: wrong fs type, bad option, bad superblock on /dev/ram1,
       or too many mounted file systems

In the log, I did notice the block size was 4K instead of the 1K block size I had observed with a 512M ramdisk.
Block size=4096 (log=2)
Fragment size=4096 (log=2)
262144 inodes, 524288 blocks

Since Sybase ASE uses a 2K page size, I tried mke2fs with a 2K block size. No joy! The mount attempts met with the same errors.

A 1K block size is known to have worked for a 512M ramdisk, so I gave it a try. Bingo!
# mke2fs -b 1024 /dev/ram1
Block size=1024 (log=0)
Fragment size=1024 (log=0)
262144 inodes, 2097152 blocks
Now we have a working ramdisk as below:
/dev/ram1 2063763 3086 1955820 1% /mnt/ram1
I don't see much point in wasting space on the 5% reserved for the superuser, so I tuned it down to 1%:
# tune2fs -m 1 /dev/ram1
/dev/ram1 2063763 3086 2039706 1% /mnt/ram1

From the 'free' output, I am a bit surprised to see the 2G is not counted as 'used':
             total       used       free     shared    buffers     cached
Mem:      16627288     231236   16396052          0      48300      89240
-/+ buffers/cache:      93696   16533592
Swap:     16779884          0   16779884

The bootparam(7) man page actually says "These days ram disks use the buffer cache, and grow dynamically." If so, a ramdisk truly behaves much like tmpfs at this point, except:
  • the ramdisk's maximum (reserved?) size needs a reboot to adjust, while tmpfs can be adjusted on the fly (see the tmpfs sketch below)
  • the ramdisk won't have swap in the picture at all [good thing]
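For comparison, a tmpfs mount can be resized live; the remount grows it on the fly, no reboot needed (mount point and sizes here are illustrative):
# mkdir -p /mnt/tmp1
# mount -t tmpfs -o size=2g tmpfs /mnt/tmp1
# mount -o remount,size=3g /mnt/tmp1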
However, if it grows dynamically, FWIW, it may not be able to grow to its full reserved size when it needs to [bad, bad thing], if physical RAM has already been used up by real processes. Questions yet to be answered:
  • Does the 2.6 kernel (or any other series) know to give the ramdisk higher priority on its claim to the reserved memory?
  • Why did an ext2 file system made with a block size other than 1K fail to mount? Is the ramdisk the only such case?

Sunday, October 22, 2006

which is eth0 on a PowerEdge 6850 running CentOS 4.4/i386

When setting up the refurbished Dell PowerEdge 6850 I purchased recently, I noticed it is newer and has a few oddities:
  • no video connector on the front panel or the back; SVGA video only on the DRAC 5 card.
  • a dual-port Intel PRO copper Ethernet card
  • the PERC adapter is a 4e/DC instead of the 4e/Di in the other two PE6850s bought new from Dell last summer.
A minimal installation of Linux went smoothly using a CentOS 4.4/i386 server CD. Well, I did run into one hitch that caused problems getting network connectivity going. Initially, I didn't have confidence in the ports on either end of the cat5e cable:
  • The "new" port is in an aged Cisco Catalyst 2900XL switch bought off eBay in 2000. My boss told me several times one port is known to be bad on this switch, since I joined the company a year ago. I used twelve free ports so far without problem. This port is one of the last two "free" ports on this 24-port switch.
  • The port was in VLAN #1, mirroring two neighboring ports, and had not been in active use. I reprogrammed the switch so that the port now participates in VLAN #3 instead of VLAN #1 and no longer mirrors any port.
  • I was hoping to see a steady green LED and a blinking yellow LED on the NIC while switching the cable from port to port on the server. The results were not telling. The first embedded Broadcom port had both, but no connectivity. The two Intel PRO ports had only the green LED, with neither a blinking yellow LED nor connectivity. I later came to the realization that fast switching from port to port won't work well, probably due to the ARP cache and such on the switch.
After a bit of frustration, I decided to leave the cable in the first embedded port and do some serious work on the server side to make it work. Needing a quieter place to think and type, I borrowed a serial cable to access its serial console. All the boxen I set up have the serial console enabled in /etc/inittab and GRUB or LILO; on Dell PowerEdge servers, BIOS redirect-to-serial is enabled too. I am too busy to bargain-hunt for a serial concentrator. I have used Digi boards before and they are nice. However, a Digi board is too expensive a proposition for my current employer, who is satisfied with the ability to call the ISP technician to hook up a monitor and keyboard to their servers.

Once sitting down in the much quieter guest working area, I immediately put the pieces together and realized that the embedded Broadcom Ethernet ports were enumerated as eth2 and eth3, while the two ports on the Intel PRO PCI-e card were eth0 and eth1. Here is my "detective work":

# grep eth /etc/modprobe.conf
alias eth0 e1000
alias eth1 e1000
alias eth2 tg3
alias eth3 tg3
# dmesg
e1000: 0000:17:06.0: e1000_probe: (PCI:66MHz:64-bit) 00:04:23:cd:6e:b8
divert: allocating divert_blk for eth0
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
ACPI: PCI interrupt 0000:17:06.1[B] -> GSI 133 (level, low) -> IRQ 50
e1000: 0000:17:06.1: e1000_probe: (PCI:66MHz:64-bit) 00:04:23:cd:6e:b9
divert: allocating divert_blk for eth1
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection
tg3.c:v3.52-rh (Mar 06, 2006)
ACPI: PCI interrupt 0000:14:02.0[A] -> GSI 64 (level, low) -> IRQ 201
divert: allocating divert_blk for eth2
eth2: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:13:72:40:e1:bd
eth2: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[1] Split[0] WireSpeed[1] TSOcap[0]
eth2: dma_rwctrl[769f4000]
ACPI: PCI interrupt 0000:14:02.1[B] -> GSI 65 (level, low) -> IRQ 209
divert: allocating divert_blk for eth3
eth3: Tigon3 [partno(BCM95704A6) rev 2100 PHY(5704)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:13:72:40:e1:be
eth3: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
eth3: dma_rwctrl[769f4000]
# grep HWA /etc/sysconfig/network-scripts/ifcfg-eth?
ifcfg-eth0:HWADDR=00:04:23:CD:6E:B8
ifcfg-eth1:HWADDR=00:04:23:CD:6E:B9
ifcfg-eth2:HWADDR=00:13:72:40:E1:BD
ifcfg-eth3:HWADDR=00:13:72:40:E1:BE

With the card-to-driver mapping in dmesg and in modprobe.conf, it is easy to see that the embedded Broadcom Ethernet ports are mapped to eth2 and eth3, while the add-on Intel PRO ports are mapped to eth0 and eth1. The MAC addresses in 'dmesg' and the HWADDR values recorded in the ifcfg-eth0 through ifcfg-eth3 files hammered in the last nail. Without that last nail, it'd be rather hard to tell which was which if all the ports were of the same make and model.

mii-tool sorta told the tale as well. lspci can help too.
eth0: no link
eth1: no link
eth2: 10baseTx, HD, link ok
eth3: no link
Once I duplicated the IP/netmask from eth0 onto eth2, then ifdown eth0 and ifup eth2, all was well per 'mii-tool':
eth2: negotiated 100baseTx-FD, link ok

It'd be an interesting exercise to force eth0 to be the first embedded Broadcom port; a sketch follows. It makes more sense for Linux to enumerate the embedded ports first, Ethernet or not. What do you think?
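A minimal sketch of that exercise, based on the mapping shown above (untested here): swap the driver aliases in /etc/modprobe.conf, move the HWADDR= lines among the ifcfg-eth? files to match, and reload the drivers or reboot.
# /etc/modprobe.conf -- the embedded Broadcom ports become eth0/eth1
alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias eth3 e1000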

Saturday, October 21, 2006

step-by-step: how to host WordPress under Linux (FC5/RHEL4)

Why:
  • on Blogger: you miss certain nice features (static page support, import/export). Or,
  • on WordPress.com: you don't like your own HTML getting stripped out. Your little piggy-bank gets left empty since Google AdSense code gets stripped or disabled. Or,
  • you want your own professional domain name and such. Or,
  • you simply love the experience of DIY and the freedom of customization.
Assumptions & Prerequisites:
  • Fedora Core 5 stock installation (All should be applicable to CentOS 4 or RHEL 4 as well, unless the FC5 SELinux changes get in your way.)
    • apache (httpd package) installed.
    • php package installed.
    • mysql-server package installed.
  • The stable release (latest.tar.gz) has been downloaded from wordpress.org and extracted under /var/www/html
  • comfortable with MySQL and SQL commands.
  • starter's knowledge of how a blog works.
Steps to follow:
# set up MySQL access
This sets up a MySQL database dedicated to WordPress and a user to access that database from the Apache/PHP combo. The schema and data for the database itself will be populated by install.php later. The names used below assume you want user 'wp_master' to have full access to a local database named 'wordpressDB'.
  • /etc/init.d/mysqld start
  • /usr/bin/mysqladmin -u root password 'mysqlAdminpassword'
  • /usr/bin/mysqladmin -u root -p create wordpressDB
  • mysql -u root -p mysql
    • insert into user (host, user, password) values ('localhost', 'wp_master', password('mastersecret'));
    • insert into db (host, db, user) values ('localhost', 'wordpressDB', 'wp_master');
      • the 'db' table entry should also grant the 'wp_master' user most of the privileges on 'wordpressDB'. So I simply repurposed the 'test' database entry therein, since I am lazy, plus I consider a wide-open test db a security risk:
      • update db set host='localhost', db='wordpressDB', user='wp_master' where host='%' and db='test';
    • commit;
    • restart mysqld with '/etc/init.d/mysqld restart' so the grant-table changes take effect (an equivalent GRANT-based shortcut is sketched after this list)
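If you'd rather not edit the grant tables by hand, a single GRANT statement should be equivalent (a sketch using the example names above; it also spares the mysqld restart):
mysql> GRANT ALL PRIVILEGES ON wordpressDB.* TO 'wp_master'@'localhost' IDENTIFIED BY 'mastersecret';
mysql> FLUSH PRIVILEGES;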
# set up MySQL PHP extension support (otherwise you may get the error, "Your PHP installation appears to be missing the MySQL extension which is required by WordPress.")
  • yum -y install php-mysql
  • /etc/init.d/httpd restart
# to set up wp-config.php with your own set of credentials for WP to use to access MySQL
cd /var/www/html/wordpress && cp wp-config-sample.php wp-config.php && vi wp-config.php
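The relevant defines in wp-config.php, using the example names from above, would look like this:
define('DB_NAME', 'wordpressDB');      // the database created above
define('DB_USER', 'wp_master');        // the MySQL user WordPress connects as
define('DB_PASSWORD', 'mastersecret'); // that user's password
define('DB_HOST', 'localhost');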
# RTFM using your favorite browser

point your browser at http://localhost/wordpress/readme.html

# initialize/populate your new WP database. Do record the 'admin' account information generated.
point your browser at http://localhost/wordpress/wp-admin/install.php and follow it

# Now, time to verify. If it worked, you'd see the default 'hello world' post using the 'default' theme.
point your browser at http://localhost/wordpress/index.php

# to customize the blog right away, click on 'edit' link inside the blog and log on with the 'admin' account

# about them themes:
Only two themes are bundled (classic/default). To add a theme, find ones you like and download them from a trustworthy source.
  • wget http://downloads.wordpress.org/theme/pool.zip
  • cd /var/www/html/wordpress/wp-content/themes && unzip pool.zip
  • Now the new theme(s) will just show up when you go back to edit/presentation/themes
# Get them roamers home
  • import existing posts and comments from Blogger and other top blog sites.
  • if new, register the new blog with technorati.com and weblogs.com
  • new or updated, brief your readers on the changes [advance notice would be appreciated too]

Wednesday, October 18, 2006

PERC 4e/Di went offline on a Dell PowerEdge 6850, with nothing apparently bad

I got paged at 2:15 in the morning, only to find the main RAID controller on a Dell PowerEdge 6850 had yanked itself out from under the running OS (CentOS 4.1/i386), again. The OS seemed to be fine, except that any task involving reads or writes to disk failed.

The PE6850 server was bought new from Dell last June and has been in production since. On the same day at the same time last month, it gave us its first outage showing the same symptoms. The questions we asked ourselves: why start now? And why exactly one month apart?
  • The serial console showed the following messages repeating on the screen:
scsi0 (0:0): rejecting I/O to offline device
EXT3-fs error (device sda2): ext3_find_entry: reading directory #178817 offset 0
  • The nightly full database backup kicked off at 2:00 and should finish by 2:45am.
  • The log exported from the PERC 4e/Di controller showed that PR (patrol read) is scheduled to run every four hours; the one that started at 1:00 didn't get to finish either.
  • The same RAID controller log also has two interesting entries that coincided with the two outages. These are the only two "ProcessHostDmaInterrupt: No requests active" entries in a log stretching back to last June. The DMA angle makes the A01 BIOS update interesting, in that it corrects memory address assignments or claims by the RAID controller (the server has 16G of DDR2 ECC RAM, in 2G chips):
09/18 2:05:47: ProcessHostDmaInterrupt: No requests active! (ch=a0102be8)
09/18 2:05:47: ProcessHostDmaInterrupt: No requests active! (ch=a0102be8)
10/18 2:13:34: ProcessHostDmaInterrupt: No requests active! (ch=a0102be8)
10/18 2:13:34: ProcessHostDmaInterrupt: No requests active! (ch=a0102be8)
  • Some postings suggest turning off PR may help in case it conflicts with heavy disk I/O from application tasks. However, at the time of both outages, disk I/O and CPU load were not at their highest; we've seen much higher during the day every work day, per the baselines neatly graphed with RRDtool by our new Hobbit Monitor.
  • Dell Diags were run and nothing was reported wrong with the memory, disks, RAID controller and such.
Checking Dell's driver download site with the server's service tag turned up a few interesting potential fixes:
  • LSI Logic PERC 4e/Di, v.522A, A13 [[We are at 521S, A00]] Release date: 9/11/2006
    • "Fixed an issue that could cause a blue screen, file system error or system hang when using EVPD inquiry commands."
    • Anybody have any idea what exactly "an issue" is?!
  • BIOS upgrade A01 [[ we are at A00 ]]
    • Added workaround for lockup resulting from the systems with 8GB RAM or more and RAID storage controller potentially claiming inappropriate addresses.
  • SCSI hard drive firmware update JT00 for various Maxtor drives
    • "Under certain circumstances hard disk drives (HDD) may report offline due to a timeout condition. If the HDD is unable to complete commands, this may result in the controller reporting the HDD offline due to a timeout condition."
    • "Higher than expected failures rates have been reported on the Maxtor Atlas 10K V ULD (unleaded or lead free) SCSI hard disk drives. If the hard disk drive (HDD) is unable to complete commands, this may result in the controller reporting the HDD offline due to the timeout condition. The primary failure modes have been the HDD failing to successfully rebuild and also failing after a rebuild has completed."
    • Sounds promising, but it turned out that the drives in our PE6850 are all Seagate.