At Mozilla, since our server farm has gotten so big, we’ve gotten reloading and upgrading machines kind of down to a science. We have a PXE server in each colo, and installations and upgrades run over the network. It’s a great system until you get to one of the following two situations:
- The machine you need to upgrade is in someone else’s colo, with no local PXE server you have control over and no one on-site to do CD flipping for you. -or-
- The machine you need to upgrade *is* the PXE server.
We have situation #2 coming up soon, but I had a machine with situation #1 that I recently experimented on to get this all working without PXE or a CD.
We had a major push to get the entire RHEL4 portion of our infrastructure either reloaded or upgraded to RHEL5 about a year ago. There were several machines that didn’t get upgraded for one of a few reasons at the time. Machines that were due to be replaced soon anyway, machines that ran the RHN Proxies (because RHN Proxy didn’t run on RHEL5 yet at the time), and machines that had software running on them that for some reason didn’t work on RHEL5 yet at the time (Zimbra with clustering support in our case). And also machines that were in remote locations with no PXE and nobody to flip CDs.
RHN has a remote kickstart capability. I’ve experimented with it before, but never had too much success getting it working. I probably just didn’t play enough. But going on the general concept of how it worked, I discovered it was quite easy to get booted into the Anaconda installer from grub… Just copy the isolinux directory off the CD/DVD into /boot, and set up an entry in grub.conf for it that looked very similar to the one in pxelinux.cfg on the PXE server.
The big problem comes with where to have it locate the install media. Conventional wisdom says I’ll get the most bang for the time spent if I put it right on that machine’s hard drive. Only problem: Anaconda doesn’t know how to talk LVM until after it gets stage2.img loaded, and it needs to already have access to the install media to load that. Almost all of these machines have LVM for the main partitions. In the end, I ended up putting an “allow from” line in the apache config on our main kickstart server to allow it to be accessed from this machine’s IP address via the Internet, and just loaded everything over the Internet. Slowed it down quite a bit, but it worked… almost.
Once it got to the point of locating an existing installation to upgrade, it died with “Upgrading between major versions is not supported.” Say what? Last year we upgraded a good couple dozen RHEL4 boxes to RHEL5 in place and it worked just fine (with a few caveats – a few things break, but it’s easy to clean up in %post in your kickstart). Well, RHEL is currently 5.3, it was 5.1 at the time when we did it before. I’d long since discarded the 5.1 images off the kickstart server. So I downloaded the 5.1 DVD again, and staged it on the kickstart server again, adjusted the kickstart file to install 5.1 instead of 5.3, and set it off to do its thing again. This time, success. The machine is now upgraded to RHEL 5.1. Now I just have to use yum to update it the rest of the way to RHEL 5.3. yum and glibc need to be upgraded first, then everything else.
Since I know someone will try to point it out (someone always does), yes, it is generally better to cleanly install RHEL5 from scratch than to try to ugprade from RHEL4 in place (this is probably why RedHat disabled it on the newer installers). Some situations make that not easy (like this one, where I have no PXE available and nobody local to the machine to flip CDs for me, not to mention the lengthy amount of time the machine would be down if I had to restore a few dozen GB of backups remotely over the Internet after a clean reload). This is another one of those times that makes me wonder why Red Hat can’t be as easy to upgrade as Ubuntu is, which not only has a way to upgrade from one release to the next in place without even needing to boot directly into an installer, but fully supports and encourages that upgrade method. Red Hat has some learning to do here.
We only have 6 machines left on RHEL4 now, and one of those is a build machine that can just go away once we’re no longer having to support RHEL4. Almost there. 🙂