Woo-hoo! I love it when a plan comes together. All UPS-managed servers shut down cleanly as their UPS batteries gave out. The primary servers (file, db, web) stayed up for approximately 47 minutes after primary power was lost. Compute servers stayed up for anywhere from 8 to 28 minutes, depending on the UPS. Slave apcupsd daemons correctly sensed when the primary apcupsd daemons were transitioning, and performed appropriately. The lone unprotected compute server came back up fine. So, at least in the minimal-load scenario, everything performed as it was supposed to. Time will tell if all is well under heavy load, when shutdown times are likely to be a little longer.
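The post doesn't say how the slave daemons were networked to the primaries. In apcupsd 3.14 and later, a client configured with `UPSTYPE net` gets its status from the master's Network Information Server (NIS), TCP port 3551 by default, which is the same server `apcaccess` queries. Below is a minimal Python sketch, assuming that setup, of how a client could read a master's status: send the `status` command as a 2-byte big-endian length plus text, then read length-prefixed records until a zero-length record. `parse_nis_records` and `query_ups_status` are hypothetical helper names, not part of apcupsd.

```python
import socket
import struct


def parse_nis_records(data: bytes) -> dict:
    """Decode apcupsd NIS length-prefixed records into a {KEY: value} dict.

    Each record is a 2-byte big-endian length followed by that many bytes
    of text such as b"STATUS   : ONBATT \n"; a zero-length record ends
    the response.
    """
    status = {}
    offset = 0
    while offset + 2 <= len(data):
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        if length == 0:  # end-of-response marker
            break
        line = data[offset:offset + length].decode("ascii", errors="replace")
        offset += length
        # Split on the first colon only, so values like DATE keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def query_ups_status(host: str, port: int = 3551, timeout: float = 5.0) -> dict:
    """Send the NIS 'status' command to an apcupsd master and parse the reply."""
    command = b"status"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(struct.pack(">H", len(command)) + command)
        buf = bytearray()
        # Read until the zero-length terminator arrives or the server closes.
        while not buf.endswith(b"\x00\x00"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf.extend(chunk)
    return parse_nis_records(bytes(buf))
```

For example, `query_ups_status("fileserver")["STATUS"]` (the host name is a placeholder) would report a value such as `ONLINE` normally and `ONBATT` once the master's UPS is running on battery, which is the transition a slave needs to notice.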
Our building is undergoing some renovations, and they’ll be turning off the power this weekend. I’ve already set up my apcupsd daemons to monitor the uninterruptible power supplies and automatically shut down the servers when the batteries are about to give out. It is unlikely that any of my UPSs will last the entire outage, so I guess I’ll get a test of the automated shutdown procedure. While I was in the server room taking care of exon’s new memory module, I rebooted each of the other machines and set its power-restored state to “on” in the BIOS. This way, when power is restored, they should just restart, and I won’t have to wait until Monday to access my servers again. I also set each of the machines to use demand-based power management. Getting sufficient UPS-backed power capacity is a problem in the server room right now, so I’m maxing out my UPSs; perhaps demand-based power management will reduce their load. One of my servers, intron, is currently plugged directly into the wall, so I’ll be interested to see how it fares. Here’s hoping all is well on Monday!
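apcupsd decides when "the batteries are about to give out" using three apcupsd.conf directives: BATTERYLEVEL (percent charge remaining), MINUTES (estimated runtime remaining) and TIMEOUT (seconds on battery, where 0 disables the timer). During a power failure, whichever threshold is reached first triggers the shutdown. The Python below is an illustrative restatement of that rule, not apcupsd's own code; `ShutdownPolicy` and `should_shut_down` are hypothetical names, and the defaults mirror the values in apcupsd's sample configuration (5 percent, 3 minutes, timeout disabled).

```python
from dataclasses import dataclass


@dataclass
class ShutdownPolicy:
    """Thresholds analogous to apcupsd.conf's BATTERYLEVEL, MINUTES and TIMEOUT."""
    battery_level: float = 5.0  # shut down at or below this percent charge
    minutes: float = 3.0        # shut down at or below this estimated runtime
    timeout: float = 0.0        # seconds on battery before shutdown; 0 disables


def should_shut_down(policy: ShutdownPolicy, on_battery: bool,
                     charge_pct: float, runtime_min: float,
                     seconds_on_battery: float) -> bool:
    """Return True once any configured threshold is crossed while on battery."""
    if not on_battery:
        return False  # thresholds only apply during a power failure
    if charge_pct <= policy.battery_level:
        return True
    if runtime_min <= policy.minutes:
        return True
    if policy.timeout > 0 and seconds_on_battery >= policy.timeout:
        return True
    return False
```

Raising `minutes` (MINUTES in the real config) gives machines that are slow to stop more margin, which matters if shutdowns take longer under heavy load.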
One of my original PowerEdge 2950 servers, exon, was showing a memory error code on its LCD panel.
The first time I noticed it, I rebooted the server, and the error seemed to go away.
Then, a couple of weeks later, Les, the sysadmin for the other machines in the server room, mentioned that one of my machines was flashing orange again, so clearly something was up.
Computers codon, tandem, and trypsin have been installed in the Edwards Lab research cluster. These Dell PowerEdge 1950 III servers each have two quad-core Xeon processors and 16 GB of RAM. Together they add 24 cores, bringing the total number of available CPU cores in the Edwards Lab cluster to 64.