stage.mozilla.org take 3?

As you may or may not have noticed, the previously announced stage.mozilla.org update was partially reverted about a day after it was deployed. Once the new machine was carrying the full production load, it crashed several more times. Our load-testing scripts just can't do much to emulate real users uploading files with WinSCP. 🙁

We've received some new patches to the unionfs filesystem driver that attempt to fix some of the crashes we've been seeing. Unfortunately, our only real way to test them is to put the new machine back into production and see what happens. As a result, over the next week or two the stage.mozilla.org domain name will periodically switch back and forth between the old machine and the new one as we test. If you followed the directions in the previous announcement, this shouldn't affect you at all, but I wanted to give people a heads-up. It also means the old machine will not be retired today as previously scheduled; it will probably stay around for at least another week or two.

If you absolutely need to reach the old machine, it's at stage-old.mozilla.org. The new machine is at stage-new.mozilla.org. For the next week or two, while we continue testing, stage.mozilla.org could point at either of them at any given time. If possible, please keep using stage.mozilla.org and follow wherever the domain points, so you can help with the testing. But if you run into any problems, feel free to use stage-old.mozilla.org to guarantee the old way of access.

#build on irc.mozilla.org is the place to ask if you have questions or run into any issues.

stage.mozilla.org moving *finally*

Migrating stage.mozilla.org to a new host has been in the works for a long time. It's been stalled for a couple of months while we worked with the developers of the unionfs filesystem (mostly by providing them with crash reports) to stabilize it to the point where we felt comfortable using it on a production service. That time has finally come. The alternative to unionfs (keeping 9 terabytes of disk space online at once, completely dedicated to the FTP staging process) was not cost-effective, so unionfs was well worth the wait.

This Thursday, March 13th, we’ll be switching the DNS for the stage.mozilla.org domain name to point at the new box. The SSH host keys have been copied over, so most people will never notice. The existing box, effective immediately, is accessible using the name stage-old.mozilla.org. If for any reason you’re nervous about switching your upload process to the new box this coming Thursday, you can use the stage-old name to keep using the old one, for now. Stage-old will go away on Tuesday, March 25th, so you’ll have until then to resolve any issues you have (or get IT to resolve them if you think it’s our problem). Feel free to file a bug against Server Ops if you need help.

Be sure to read my previous post for details of how the new system will work (it is a little different, though most well-behaved upload scripts shouldn't be affected). The most noticeable change is a slight delay between when you upload something and when it shows up on the FTP server, because we now virus-scan your uploads before they are made available. On the current system that we're moving away from, the virus scan runs after the files are placed on the FTP server, and any files that get flagged are pulled afterwards. That, of course, could let something malicious get out there briefly, which we obviously don't want.