Hobbit Monitor is running on all the servers, so it is rather easy to catch the old lockup problem, in which all checks went to 'purple', meaning 'stale' or 'no report received'. Detecting a hiccup is trickier. If it falls squarely inside the 5-minute interval Hobbit Monitor uses, we'd miss the signal! It seems it is not all that easy to bring the monitoring frequency down to 1 minute for a single client, as nobody has answered my question on the Hobbit mailing list for three days now. After much discussion of alternatives, I came up with a way and verified that it works.
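To put a rough number on why the polling interval matters, here is a minimal sketch. It assumes a simplified model that is not taken from Hobbit itself: checks are evenly spaced instantaneous samples, and the hiccup starts at a random point within the interval. The 45-second hiccup length and the function name are purely hypothetical.

```python
def detection_probability(hiccup_seconds: float, poll_interval_seconds: float) -> float:
    """Chance that at least one poll lands inside a hiccup.

    Assumes evenly spaced, instantaneous polls and a hiccup that starts
    at a uniformly random offset within the polling interval.
    """
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    if hiccup_seconds <= 0:
        return 0.0
    # A hiccup at least as long as the interval is always sampled once.
    return min(1.0, hiccup_seconds / poll_interval_seconds)


HYPOTHETICAL_HICCUP_SECONDS = 45  # illustrative value, not a measurement

for interval in (300, 60):  # the 5-minute default versus a 1-minute interval
    p = detection_probability(HYPOTHETICAL_HICCUP_SECONDS, interval)
    print(f"poll every {interval:>3}s: {p:.0%} chance of seeing the hiccup")
```

Under those assumptions, going from a 5-minute to a 1-minute interval raises the chance of catching a 45-second hiccup from 15% to 75%, which is why the shorter interval is worth the trouble.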
With the monitor fine-tuned and focused on syb04, we added load to that server first. Counting the full nightly database backup and the daily peak as two load situations, we need at least 28 peaks to match the 14 days leading up to the lockup. The nightly database backup takes only 25 minutes, so it is very easy to run it back to back by simply changing the cron schedule to every 30 minutes instead of once a day. So, we did that. After 20 hours (about 40 load peaks), nothing happened. Since we don't plan to work over the weekend, we decided to simulate the daily load peak as well and let it run continuously. It took some Java code changes, and it is done. So, we'll have both the application load and the backup load against the server over the weekend.
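The peak-count arithmetic above can be checked with a few lines. This is only a sketch of the reasoning; the constant and function names are made up for illustration, while the numbers come from the text.

```python
DAYS_BEFORE_LOCKUP = 14       # days of normal operation before the lockup
LOAD_SITUATIONS_PER_DAY = 2   # nightly full backup + daily application peak
BACKUP_PERIOD_MINUTES = 30    # accelerated cron schedule for the backup


def peaks_needed(days: int, situations_per_day: int) -> int:
    """Number of load peaks that equals the given stretch of normal days."""
    return days * situations_per_day


def peaks_completed(hours_run: float, period_minutes: int) -> int:
    """Load peaks produced when one peak is started every period_minutes."""
    return int(hours_run * 60 // period_minutes)


def hours_to_reach(peaks: int, period_minutes: int) -> float:
    """Wall-clock hours needed to produce the given number of peaks."""
    return peaks * period_minutes / 60


target = peaks_needed(DAYS_BEFORE_LOCKUP, LOAD_SITUATIONS_PER_DAY)
done = peaks_completed(20, BACKUP_PERIOD_MINUTES)
print(f"target peaks: {target}, peaks after 20 hours: {done}")
print(f"hours to reach target: {hours_to_reach(target, BACKUP_PERIOD_MINUTES)}")
```

At one backup every 30 minutes, the 28-peak target is passed after 14 hours, so the 20-hour run already exceeded it, although with backup load only.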
* fingers-crossed *