Reboot Adventures

Every so often, there are security updates for the Linux kernel. Most security updates you get for Linux don’t require a reboot; installing a new kernel is one of the few that do. We have 16 Red Hat servers in production, and 11 of them had kernel updates in the queue (the other 5 were put into production within the last couple of days and already had the necessary updates installed), so I got to spend a few hours tonight rebooting servers. Myk and Chase got to spend that time at the colo facility watching them reboot in case anything went wrong. They also had other things to do while they were there: some tinderboxes were being moved, and three of the above-mentioned new machines were actually being installed tonight.

And something did go wrong. Megalon (the CVS server) didn’t come back up after rebooting. After an hour or so, Myk power-cycled it, and this time it came back up. It seemed to have stalled trying to initialize the RAID controller the first time. Ah, what fun.