
Tuesday, October 31, 2006

kernel panic when using a 2G ramdisk on a CentOS 4.4 ServerCD installation

I was burnt recently by a minimal installation using the CentOS 4.4 ServerCD. The CentOS project offers this one-CD flavor for server use in addition to the regular 4-CD full distro, mirroring Red Hat Enterprise Advanced Server (RHAS). I found it awfully convenient to download a single 530M CD instead of the full 2.4G set. The installations I do at work are all server installations, if my own Linux desktop is not counted. It has been a godsend ever since I first noticed its existence in the CentOS 4.2 /iso folder.

The kernel panicked on a new production Sybase ASE database server when the Sybase ASE 12.5 database engines started to zero out ~2G tmpdb devices (data and log devices). The hardware specs: a Dell PowerEdge 6850 with 16G of DDR2 RAM and quad CPUs. The box had survived my own stress tests on CPU and disk; in hindsight, though, no attempt was made to exhaust the memory.

Sybase seemed to have finished its task when the DBA called me saying he had lost his connection to the machine. I attached to its serial console and opened a connection using Kermit. Input was not rejected, but there was no response (not even an echo) from the system. After a quick reboot, I found nothing interesting in the log tail around the panic (syslog.conf is set to write kern.* to /var/log/messages). So I decided to log the console session and asked the DBA to start Sybase again. Sure enough, it happened again, with tons of 'out of memory' messages, followed by desperate yet futile attempts by the oom-killer to free up memory by killing processes:
oom-killer: gfp_mask=0xd0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
cpu 4 hot: low 2, high 6, batch 1
cpu 4 cold: low 0, high 2, batch 1
cpu 5 hot: low 2, high 6, batch 1
cpu 5 cold: low 0, high 2, batch 1
cpu 6 hot: low 2, high 6, batch 1
cpu 6 cold: low 0, high 2, batch 1
cpu 7 hot: low 2, high 6, batch 1
cpu 7 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
cpu 4 hot: low 32, high 96, batch 16
cpu 4 cold: low 0, high 32, batch 16
cpu 5 hot: low 32, high 96, batch 16
cpu 5 cold: low 0, high 32, batch 16
cpu 6 hot: low 32, high 96, batch 16
cpu 6 cold: low 0, high 32, batch 16
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16
HighMem per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
cpu 1 hot: low 32, high 96, batch 16
cpu 1 cold: low 0, high 32, batch 16
cpu 2 hot: low 32, high 96, batch 16
cpu 2 cold: low 0, high 32, batch 16
cpu 3 hot: low 32, high 96, batch 16
cpu 3 cold: low 0, high 32, batch 16
cpu 4 hot: low 32, high 96, batch 16
cpu 4 cold: low 0, high 32, batch 16
cpu 5 hot: low 32, high 96, batch 16
cpu 5 cold: low 0, high 32, batch 16
cpu 6 hot: low 32, high 96, batch 16
cpu 6 cold: low 0, high 32, batch 16
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16
Free pages: 12506892kB (12493888kB HighMem)
Active:208204 inactive:791964 dirty:101636 writeback:26 unstable:0 free:3126723 slab:20412 mapped:563179 pagetables:
DMA free:12564kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:4 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:440kB min:928kB low:1856kB high:2784kB active:644696kB inactive:16036kB present:901120kB pages_scanned:1069066 all_unreclaimable? yes
protections[]: 0 0 0
HighMem free:12493888kB min:512kB low:1024kB high:1536kB active:180824kB inactive:3159244kB present:16908288kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 3*4kB 5*8kB 4*16kB 3*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12564kB
Normal: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 440kB
HighMem: 1418*4kB 535*8kB 128*16kB 85*32kB 119*64kB 38*128kB 10*256kB 4*512kB 2*1024kB 2*2048kB 3041*4096kB = 12493888kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
0 bounce buffer pages
Free swap: 16779884kB
4456448 pages of RAM
3963834 pages of HIGHMEM
299626 reserved pages
849369 pages shared
0 pages swap cached
Out of Memory: Killed process 6586 (dataserver).
oom-killer: gfp_mask=0xd0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1
cpu 2 cold: low 0, high 2, batch 1
cpu 3 hot: low 2, high 6, batch 1
cpu 3 cold: low 0, high 2, batch 1
cpu 4 hot: low 2, high 6, batch 1
cpu 4 cold: low 0, high 2, batch 1
cpu 5 hot: low 2, high 6, batch 1
cpu 5 cold: low 0, high 2, batch 1
cpu 6 hot: low 2, high 6, batch 1
cpu 6 cold: low 0, high 2, batch 1
cpu 7 hot: low 32, high 96, batch 16
cpu 7 cold: low 0, high 32, batch 16
Free pages: 12507020kB (12493888kB HighMem)
Active:46134 inactive:953784 dirty:101636 writeback:26 unstable:0 free:3126755 slab:20379 mapped:563179 pagetables:1589
DMA free:12564kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:4 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:568kB min:928kB low:1856kB high:2784kB active:0kB inactive:659820kB present:901120kB pages_scanned:1845199 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:12493888kB min:512kB low:1024kB high:1536kB active:180824kB inactive:3159244kB present:16908288kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 3*4kB 5*8kB 4*16kB 3*32kB 3*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12564kB
Normal: 20*4kB 3*8kB 3*16kB 1*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 568kB
HighMem: 1418*4kB 535*8kB 128*16kB 85*32kB 119*64kB 38*128kB 10*256kB 4*512kB 2*1024kB 2*2048kB 3041*4096kB = 12493888kB
Swap cache: add 0, delete 0, find 0/0, race 0+0
0 bounce buffer pages
Free swap: 16779884kB
4456448 pages of RAM
3963834 pages of HIGHMEM
299626 reserved pages
849602 pages shared
0 pages swap cached
Out of Memory: Killed process 6587 (dataserver).
oom-killer: gfp_mask=0xd0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
cpu 1 hot: low 2, high 6, batch 1
cpu 1 cold: low 0, high 2, batch 1
cpu 2 hot: low 2, high 6, batch 1

The server had rebooted fine, with Sybase up, the day before. Today the DBA got a message claiming the tmpdb devices were not initialized. Since these tmpdb devices are created from scratch when Sybase starts, that really didn't make sense! The DBA decided to drop the current set of tmpdb devices and create a new set instead. Just as the new set was finishing initialization per the Sybase logs, the system stopped responding. Before I lost my console, I was able to issue 'ps -e -o rss,pid,cmd,args | sort -n', 'free', and 'top'. Only about 6G was used out of the total 16G, so memory was not really exhausted; this had to do with how the kernel was carving up and using the memory. Googling 'oom-killer: gfp_mask=0xd0' yielded a single post describing a bug of sorts, an overly zealous OOM killer tied to a problem with the addressing scheme switch between the high-memory and low-memory areas, and how to disable it.
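I didn't go down that road, but for reference, the workaround that post described was disabling the OOM killer outright via a sysctl that reportedly exists on RHEL4-era kernels (an assumption on my part; verify the knob is there before relying on it):

# reportedly present on RHEL4/CentOS 4 kernels; 0 disables the OOM killer
ls /proc/sys/vm/oom-kill && sysctl -w vm.oom-kill=0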

That made me wonder whether the ramdisk was claiming HighMem. If HighMem were not addressable, or were addressed improperly, that could have the appearance of OOM (out of memory). Once I started to examine /var/log/messages line by line, it was painfully obvious: the CentOS 4.4 ServerCD had not selected the hugemem kernel. The regular kernel-smp had been installed instead.
Oct 23 12:35:30 syb06 kernel: Linux version 2.6.9-42.ELsmp (buildcentos@build-i386) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Sat Aug 12 09:39:11 CDT 2006
Oct 23 12:35:30 syb06 kernel: BIOS-provided physical RAM map:
Oct 23 12:35:30 syb06 kernel: ********************************************************
Oct 23 12:35:30 syb06 kernel: * This system has more than 16 Gigabyte of memory.     *
Oct 23 12:35:30 syb06 kernel: * It is recommended that you read the release notes    *
Oct 23 12:35:30 syb06 kernel: * that accompany your copy of Red Hat Enterprise Linux *
Oct 23 12:35:30 syb06 kernel: * about the recommended kernel for such configurations *
Oct 23 12:35:30 syb06 kernel: ********************************************************
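In hindsight, a couple of standard commands would have exposed the mismatch right away (the hugemem version string below is illustrative):

uname -r                            # 2.6.9-42.ELsmp here; want 2.6.9-42.ELhugemem
free -l                             # -l breaks out LowMem/HighMem usage
grep -i highmem /var/log/messages   # how the kernel mapped memory at boot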

Once I downloaded the corresponding kernel-hugemem package and rebooted the server with it (grub.conf needed to be edited manually to boot the new kernel by default), all was peachy again.
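For anyone repeating this, the grub.conf edit amounts to pointing default= at the hugemem stanza; a sketch only, as the entry index, device names, and version strings are assumptions for illustration:

# make the first (0th) title, the hugemem entry, the default
default=0
timeout=5

title CentOS (2.6.9-42.ELhugemem)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-42.ELhugemem ro root=LABEL=/ console=ttyS0,57600n8 console=tty1 ramdisk_size=2097152
        initrd /initrd-2.6.9-42.ELhugemem.img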

P.S. To answer Barry's question below, here is the 12-day memory RRD graph. The panic happened on the last Thursday or Friday, where the lines are broken due to missing data. It would be nicer if Blogger actually allowed image insertion inside comments. -20061030
P.P.S. Obviously the correct selection of the right kernel package, and the maintenance thereof, has been a problem/pain for Red Hat's maintainers and customers. Per its release notes, the newly minted Fedora Core 6 (FC6) has a single kernel package per architecture: kernel parameters optimized for different hardware specs (SMP or UP, 4G/8G/16G/32G of RAM, etc.) are set dynamically at boot, instead of being hard-coded into different kernel packages (kernel, kernel-smp, and kernel-hugemem). Of course, with the RHEL5 beta on the horizon, I guess we won't see such a change (in RHEL6) for a few years :(

Monday, October 23, 2006

step-by-step: how to set up a large ramdisk on CentOS 4.4/i386

On our new database server running CentOS 4.4 on a Dell PE 6850, a 2G ramdisk is needed. The option to use tmpfs was considered and rejected because tmpfs pages can spill into disk swap.
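For the record, the tmpfs route we passed on would have been a one-liner along these lines (mount point hypothetical; note its pages can land in swap, which is exactly what we didn't want under a database):

mkdir -p /mnt/tmpdb
mount -t tmpfs -o size=2048m,mode=770 tmpfs /mnt/tmpdb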
  • To make the ramdisks 2G apiece, a kernel parameter (ramdisk_size, given in 1K blocks) needs to be passed via grub.conf or lilo.conf, and a quick reboot is required to make it effective. The ramdisk is 16M in size by default.
Here is my grub.conf entry:
kernel /vmlinuz-2.6.9-42.ELsmp ro root=LABEL=/ console=ttyS0,57600n8 console=tty1 ramdisk_size=2097152
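After the reboot, the rd driver's boot message is the quickest confirmation that the new size took; the 2097152 is in 1K blocks, i.e. 2G (exact wording varies by kernel version):

dmesg | grep -i ramdisk
# e.g.: RAMDISK driver initialized: 16 RAM disks of 2097152K size 1024 blocksize
blockdev --getsize /dev/ram1
# reports 512-byte sectors; expect 4194304 for 2G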
  • To make the ramdisk accessible to the database after reboot, I cooked up a chkconfig-compliant init script:
---cut----x------/etc/init.d/ramdisk4syb-----x-----cut---------
#!/bin/bash
#
# $Id: rc.ramdisk4syb,v 1.00 2006/10/23 13:57:14 jer Exp $
#
# rc file for set up ramdisk fs for sybase to use for tempdb
#
#
# For Redhat-ish systems
#
# chkconfig: 2345 30 70
# processname: ramdisk4syb
# config: none
# description: set up ramdisk for sybase to use for tempdb
case "$1" in
start)
    # create the mount point, build a fresh ext2 fs on the ramdisk,
    # and hand it over to the sybase user
    test -d /mnt/ram1 || mkdir -p /mnt/ram1
    mke2fs -m 1 -b 1024 /dev/ram1 >/tmp/ramdisk.mke2fs.log 2>&1 &&
    mount -t ext2 /dev/ram1 /mnt/ram1 &&
    chmod 770 /mnt/ram1 && chown sybase:dba /mnt/ram1
    ;;
stop)
    umount /mnt/ram1
    ;;
*)
    echo "Usage: $0 {start|stop}"
    ;;
esac
---cut----x---------------------------------x-----cut---------
  • To pin the init script to its run levels:
chkconfig --add ramdisk4syb
  • To test the init script before rebooting:
/etc/rc3.d/S30ramdisk4syb start && df -k /mnt/ram1 && ls -ld /mnt/ram1
  • To make all of this effective, reboot:
init 6
  • To verify the resulting ramdisk file system:
df -k /mnt/ram1 && ls -ld /mnt/ram1
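A couple of extra sanity checks worth doing after the reboot (hypothetical but standard commands):

chkconfig --list ramdisk4syb   # should show it on for runlevels 2, 3, 4, 5
mount | grep ram1              # the ext2 mount should have survived the boot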

Much to my surprise, the ramdisk was not mounted after the reboot! The mke2fs log showed that mke2fs had succeeded. When I attempted to mount it manually, I got:
# mount /dev/ram1 /mnt/ram1
mount: wrong fs type, bad option, bad superblock on /dev/ram1,
       or too many mounted file systems

In the log, I did notice the block size was 4K instead of the 1K block size I had observed with a 512M ramdisk:
Block size=4096 (log=2)
Fragment size=4096 (log=2)
262144 inodes, 524288 blocks

Since Sybase ASE uses a 2K page size, I tried mke2fs with a 2K block size. No joy! Mount attempts were met with the same error.
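Had I wanted to narrow it down systematically, a throwaway loop over block sizes would have done it (hypothetical sketch, using a spare /dev/ram2 and mount point; ramdisk_size applies to all the ram devices):

mkdir -p /mnt/ramtest
for bs in 1024 2048 4096; do
    # build an ext2 fs with this block size, then see whether it mounts
    if mke2fs -q -b $bs /dev/ram2 && mount -t ext2 /dev/ram2 /mnt/ramtest; then
        echo "block size $bs: mounts OK"
        umount /mnt/ramtest
    else
        echo "block size $bs: mount failed"
    fi
done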

A 1K block size was known to work for a 512M ramdisk, so I gave it a try. Bingo!
# mke2fs -b 1024 /dev/ram1
Block size=1024 (log=0)
Fragment size=1024 (log=0)
262144 inodes, 2097152 blocks
Now we have a working ramdisk as below:
/dev/ram1 2063763 3086 1955820 1% /mnt/ram1
I don't see much point in wasting space on the 5% reserved for the superuser, so I tuned it down to 1%:
# tune2fs -m 1 /dev/ram1
/dev/ram1 2063763 3086 2039706 1% /mnt/ram1

From the 'free' output, I was a bit surprised to see that the 2G was not counted as 'used':
             total       used       free     shared    buffers     cached
Mem:      16627288     231236   16396052          0      48300      89240
-/+ buffers/cache:      93696   16533592
Swap:     16779884          0   16779884

The bootparam(7) man page actually says: "These days ram disks use the buffer cache, and grow dynamically." If so, the ramdisk truly behaves much like tmpfs at this point, except that
  • the ramdisk's maximum size (the reservation?) needs a reboot to adjust, while tmpfs can be adjusted on the fly.
  • ramdisk won't have swap in the picture at all. [good thing]
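One way to watch that buffer-cache behaviour for yourself (hypothetical, with the ramdisk mounted):

free -m
dd if=/dev/zero of=/mnt/ram1/fill bs=1M count=500
free -m
# expect 'buffers' to grow by roughly 500M; ramdisk pages, once allocated,
# reportedly stay pinned until reboot even after the file is removed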
However, by growing dynamically, FWIW, it may not be able to grow to its full reserved size when it needs to [bad, bad thing], if physical RAM has already been used up by real processes. Questions yet to be answered:
  • Does the 2.6 kernel (or any other series) know to give the ramdisk higher priority in its claim to the reserved memory?
  • Why did ext2 file systems made with block sizes other than 1K fail to mount? Is the ramdisk the only case where this happens?
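One lead on that last question, untested speculation on my part: the 2.6 rd driver takes its own block-size setting (the ramdisk_blocksize= boot parameter, 1K by default), and a mismatch between it and the filesystem block size would line up with the mount failures above. If so, something like this kernel line in grub.conf might allow larger blocks:

kernel /vmlinuz-2.6.9-42.ELsmp ro root=LABEL=/ ramdisk_size=2097152 ramdisk_blocksize=4096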