Localizing Asterisk for China

This last week, we set up a bunch of the IT infrastructure at Mozilla’s new Chinese office. My primary part of the process was setting up their phone system. We used Asterisk, of course, seeing as how we already use that for our phone systems in Mountain View and Toronto. Asterisk has a really cool feature that lets you put localized sound files in for the voice prompts, and each device and incoming phone line can be set up to default to a particular language. It will use the localized files for that language, if available, and fall back on the English ones if they aren’t. You can also allow users to change which language they get with a little careful scripting (“Press 1 for English”, etc). We set it up so that dialing from any phones in the Beijing office will get Chinese prompts, dialing into the Beijing office from outside will get Chinese prompts, and dialing into the Beijing office via the links to our other offices will get English prompts.

One of the difficulties we’ve run into is that there’s no official Chinese language pack for Asterisk, and the only unofficial Chinese language pack we could find is fairly incomplete. You’ll be listening to something in Chinese (like the instructions for voicemail) and suddenly get a word or two of English in the middle of it. 🙂 I ended up spending a fair portion of this week trying to set up a nice friendly web app the folks in the office can use to easily see which files have been localized and which haven’t, and allow them to record their own localized files and have them automatically go where they need to go. Other folks might find it useful, so I’ll try to get it posted somewhere once I get it fixed up a little (it’s a bit of a quick and dirty hack still right now, but it’s getting there).
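
The core check such a tool needs (which prompts exist in English but not yet in Chinese) can be sketched in a few lines of Python. This is only an illustration, not the web app described above. It assumes the English and Chinese prompts live in two separate directory trees with matching file names, and the paths in the usage note are placeholders for wherever your Asterisk install keeps its sounds. The function names are made up for this example.

```python
from pathlib import Path


def prompt_names(directory):
    """Return prompt names under `directory`, relative and without extension.

    Dropping the extension lets a .wav recording count as the localized
    version of a .gsm original.
    """
    root = Path(directory)
    if not root.is_dir():
        return set()
    return {
        p.relative_to(root).with_suffix("").as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }


def localization_status(base_dir, localized_dir):
    """Split the base prompts into (localized, missing) sorted lists."""
    base = prompt_names(base_dir)
    localized = prompt_names(localized_dir)
    return sorted(base & localized), sorted(base - localized)
```

Calling `localization_status("/path/to/sounds/en", "/path/to/sounds/zh")` would return the prompts that already have a Chinese recording and the ones that would still fall back to English.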

Digging people

So yesterday afternoon, Alex Faaborg blogged about some new features in Firefox 3. No big deal until it got posted on digg.com. The blog server could take it. It’s in the load balancing cluster behind a caching proxy server, which doesn’t even notice this kind of traffic. But Alex had posted his images in his personal space on people.mozilla.com, which is a single server which isn’t really considered production critical on IT’s priority list. Now, even though this is a single server, it’s not exactly sucky hardware. The machine should have been more than capable of handling a slashdotting and getting dugg at the same time. So we were all pretty surprised when it fell over.

Apache kept dying, and spitting out errors about failing to setuid to the apache user. After much banging of heads, Justin Dolske found a relevant forum post in one of the Gentoo forums of all places, which pointed the finger at per-user process limits, and using ulimit in the initscript to override them. Using ulimit turned out not to be necessary, but it did get me looking in the right places.

Mozilla employees get shell accounts on people.mozilla.com (makes it easier for them to manage the webspace there, and several folks use it to run irssi in screen to keep a session to irc.mozilla.org open). In order to keep users from bogging down the machine, we had used pam_limits to limit user logins to 100 processes per user in /etc/security/limits.conf. Well, it turns out that this limit applies to both root and apache as well. So when apache spawned that 100th process to handle that many concurrent connections, it hit that limit and died. Now, root itself is immune to process limits. However, a limit set for root still carries over to any process that root spawns and then setuids to another user, if root’s limit is lower than that user’s. So setting a specific (higher) limit for apache in limits.conf wasn’t enough. Had to bump it up for root as well.
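
When chasing this kind of problem, it helps to see the process limit a process is actually running under rather than the one you think you configured. Here is a minimal sketch using Python's standard `resource` module. It is an illustration only, not something that was run on people.mozilla.com; the helper names are made up, and `RLIMIT_NPROC` is only available on platforms that support it (Linux and the BSDs, for example).

```python
import resource


def nproc_limits():
    """Return the (soft, hard) per-user process limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe(value):
    """Render a limit value, spelling out the 'no limit' sentinel."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"max user processes: soft={describe(soft)} hard={describe(hard)}")
```

Running it from the same context as the daemon (for example from the init script's environment) shows the limits that child processes will inherit.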

But that did the job. The site was back up in no time, happily serving all the images any Digg user could want to go with Alex’s blog, and still keeping a 0.03 load average. Next time someone’s images get posted to Digg or Slashdot, people.mozilla.com will be ready.

Cron job output overload

We have a mailing list at Mozilla which receives mail sent to root at any of our servers. The majority of this mail is cron job output. I have filters set up in my Zimbra account to filter the cron job mail specifically into a folder separate from the rest of the mail to that mailing list. I was on vacation last week, and the last day before I left, I completely deleted the contents of that folder. On my return, that folder contained 26,373 messages, all dated within the last week. Separating the nuisance mail from the real problems by hand is pretty much impossible at that volume.

Obviously one task is to eliminate the nuisance mail. This has to be done carefully, because typically you still want to get errors from cron jobs, but you don’t want the general output. And not all jobs are good about their use of standard error and standard output, so often you can’t just devnull the standard out and expect to only get mail when there’s a problem. So fixing the nuisance mail sometimes means writing a wrapper script for a cron job that does some grep or awk work to filter the output. But even with the nuisance mail gone, it’s a lot of mail to sift through to find any possible real problems.
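
A wrapper like that does not have to be grep or awk. Here is a minimal Python sketch of the same idea: run the real job, throw away lines known to be harmless, and print whatever is left. Since cron only sends mail when a job produces output, printing nothing means no mail. The patterns in `NOISE_PATTERNS` are invented placeholders; a real wrapper would list the specific chatter of the job it wraps.

```python
import re
import subprocess
import sys

# Hypothetical examples of harmless output; replace with the job's real noise.
NOISE_PATTERNS = [
    re.compile(r"^\s*$"),                     # blank lines
    re.compile(r"^Processed \d+ records$"),   # routine progress message
]


def filter_noise(lines, patterns=NOISE_PATTERNS):
    """Keep only the lines that match none of the noise patterns."""
    return [line for line in lines if not any(p.search(line) for p in patterns)]


def run_filtered(command):
    """Run `command` and return the output lines worth mailing.

    stdout and stderr are merged, because not every job uses them properly.
    A non-zero exit status is always reported, even if the job was silent.
    """
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    kept = filter_noise(result.stdout.splitlines())
    if result.returncode != 0:
        kept.append(f"exit status {result.returncode}")
    return kept


if __name__ == "__main__" and len(sys.argv) > 1:
    remaining = run_filtered(sys.argv[1:])
    if remaining:
        print("\n".join(remaining))
```

In a crontab, the job's command line would be passed as arguments to this script in place of calling the job directly.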

So, I filed bug 377043 with an idea for a tool to do some automated analysis of all this cron job output. Keep track of patterns and point out things that need looking at, etc. Unfortunately both cron jobs and data analysis are pretty popular topics (and usually not related to each other) so Google isn’t helping me much trying to search for existing tools. Does anyone know of any existing tools that do something similar to this that we might either be able to use, or build upon?

Vacation and work travel

So this last week I’ve been on vacation, but just hanging out at home hoping to catch up on some things. One of the projects I’ve been working on this week is trying to write a driver for lirc to use a USB-attached IR receiver on Mac OS X. One of my MythTV boxes is running on Mac OS X, and it’s annoying to have the little white Apple remote be the only one that works on it (it’s a nice simple remote, but there just aren’t enough buttons on it to be useful for a full entertainment center). I’ve been hoping to get that driver working before I leave so my wife can use a real remote while I’m gone. Not quite there yet, not much time left. I made major progress on it this afternoon though while the kids were watching the new movies they got in their Easter baskets.

Tomorrow afternoon (Monday) I leave to head out to Mountain View for our quarterly all-hands meeting at Mozilla. The following week I’ll be attending an Asterisk training program put on by Digium in San Jose (to teach me everything I need to know to run the PBX system at Mozilla), so I’ll be away from home for two weeks. It’s always fun visiting Mozilla, but it’s not going to be fun being away from the family for that long.

CVS Checkin mail

I just checked in a change to the script the Mozilla cvs server uses to send email. It’s one that’s been a long time coming, to get it to use a more reliable way to send the email so that high load conditions won’t prevent the message from delivering. I don’t expect any trouble with it, but if you’re on one of the mailing lists or newsgroups that get new check-in messages, do let me know if you notice anything strange.

Thanks!