The server (the Sybase database engines) has been up for 14 days today. At 09:50am, just when the server started to ramp up to its daily load peak (CPU load ~=4) , some processes failed to write to the disk and 'date > junk' from cmdline just hang there. I canceled that 'date>junk'. All is good after less than 4 minutes. Nothing interesting (warn/error/abort) in the system log, exportlog from PERC controller, or database log. PR was running at the time.
The symptoms definitely differ, so the BIOS and firmware upgrade did make some difference towards the better. For the previous two lockups and the only two for 15 months, we lost access to the disks totally, getting "reject i/o to offlined disk" without kernel panic or corruption. This time, this is merely a hiccup or pause or suspension of sorts.
Older postings on similar topic on dell-linux-poweredge forum suggested PR could be the culprit if BIOS/firmware is up-to-date. On the system, I get the following output from '"megapr -dispPR -a0" today. Is #Iterations current count of the total PR has run or a threshold or some sort? If the former, how to clear it? If the latter, how to increase? Basically I am looking into why it locked up exactly 30 days (could be coincidence too. and we are now using newer BIOS and firmware). Dell diag from OMSA 4.4 on 10/17/2006 suggests nothing wrong the controller, memory, or underlying disks. (omreport on the controller is appended below too).
********PR INFO********
Mode :AUTO
#Iterations:2200
Status :PR In Progress
# omreport storage controller
Controller PERC 4e/Di (Embedded)
Controllers
ID
Status
Name
Slot ID : Embedded
State
Firmware Version : 522A
Driver Version : Not Applicable
Minimum Required Firmware Version : Not Applicable
Minimum Required Driver Version : Not Applicable
Number of Channels : 2
Rebuild Rate : 30%
Alarm State : Not Applicable
Cluster Mode : Not Applicable
SCSI Initiator ID : 7
- Added support for Virtualization Technology in the processor.
Should I assume this is not referring to HT, but of special server virtualization assistance from Intel's VT (?) technology or alike ?
- Added support for 800MHz system configurations.
Although the megaraid* driver is dated early 2005. The CHANGLOG.megraid in /kernel/Documentation doesn't have much interesting changes either.
No comments:
Post a Comment