A close look at the local log and at the log on the remote syslogd server, however, showed that such 'missing' stamps turned up right after the complaints about their absence. Most arrived within the same second, and only one was a second late. These alerts were therefore identified as false positives.
Needless to say, we were very disappointed. Even worse, this remained the case for the full week. We started to toss around the idea that maybe the hiccup was merely a delay and that we had overreacted a little by flipping the switch.
To add insult to injury, our constant attention was demanded by a lot of database problems related to the application peak load, which was coerced to repeat. The problems were:
- The Sybase ASE log device filled up, bringing the application peak load to a sudden halt until Sybase was restarted with the log cleared.
- The hourly transaction dumps grew from 20M each to over 1G each. It seemed that some transactions had failed to commit.
- In turn, the transaction dumps filled up the disk.
I am betting a small sum of money on the PR (patrol read), whose background scheduling may be surprised by the sudden spike in disk IO caused by the nightly full database backup as well as the daily application peak. To force PR to collide with the load, I wrote a script to check PR status and start one if none is 'In Progress' already, as reported by 'megapr -dispPR -a0'.
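The check-and-start logic can be sketched roughly as below. This is not the original script, only a minimal Python illustration. The status command 'megapr -dispPR -a0' and the 'In Progress' marker come from the text above; the command that starts a patrol read is not given here, so it is left as a caller-supplied argument (start_cmd is a hypothetical placeholder), and the helper names are made up for this sketch.

```python
import subprocess

# Status command as quoted in the post.
STATUS_CMD = ["megapr", "-dispPR", "-a0"]


def pr_in_progress(status_output):
    """Return True if the status text reports a patrol read 'In Progress'."""
    return "In Progress" in status_output


def _run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def ensure_patrol_read(start_cmd, run=_run):
    """Start a patrol read unless one is already running.

    start_cmd is an assumption: the post does not say which command
    starts a patrol read, so the caller must supply it.
    Returns True if a start was issued, False if one was already running.
    """
    if pr_in_progress(run(STATUS_CMD)):
        return False
    run(start_cmd)
    return True
```

Run from cron every few minutes, something like this keeps a patrol read going through both the nightly backup and the daily peak. The run parameter exists only so the logic can be exercised without the controller utility installed.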
BTW, the 'megapr -dispPR -a0' command alone causes the following errors in the PERC controller's exportlog. My inquiry about these errors got no response on Dell's linux-PowerEdge forum, which is monitored by a few Dell engineers.
11/07 10:25:51: MPT_Rec: INQ Error - Negotiating LD[6] pRfm a07517c0
11/07 10:25:51: MPT_Rec: INQ Error - Negotiating LD[16] pRfm a0743360
11/07 10:25:51: GET: SCSI_chn=ff, rtn status=0
3 comments:
If it's under warranty, replace the controller.
The problem is that no diagnostic utility shows anything wrong with any hardware component. Dell Tech support took the position that CentOS is unsupported (even though CentOS 4 is a certifiable twin of RHEL 4 AS) and that Dell Diag reports nothing wrong with the hardware, so nothing, zip!
Whatever became of this for you? We are seeing the same log messages and are wondering what they mean. We also see things in our log like: "Rejecting MISC opcode: unknown sub-opcode (0x26)" and "Rejecting Unknown FC DCMD 1f"