The AWS Import Saga, or “Why does this computer shit have to be so hard?”

I’ve got a very large data set that we’re trying to move into AWS, and we did the math to realize that at the speeds we’re getting from our network connection it’ll take at least 4-5 months of transfer time to get the whole data set over to S3.

Amazon has a nice feature called Import/Export that lets you take advantage of the old adage “never underestimate the bandwidth of a station wagon full of backup tapes”, updated for modern times. Basically you ship them hard drives, and they suck the data off of them directly into S3.

So we’ve got a couple 3TB disks lying around, and figure every 3TB we transfer is about 4 weeks of data transfer time we save. Sounds great, right?
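
For the curious, the back-of-envelope math on that “3TB ≈ 4 weeks” figure works out to a little over a megabyte a second of sustained throughput (assuming decimal terabytes):

echo "scale=2; 3*10^12 / (4*7*24*3600) / 2^20" | bc    # ≈ 1.18 MiB/sec sustained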

Well, not so fast. First thing to know is that those 3TB USB drives, unlike most other large drives, present a 4K logical sector size instead of the usual 512 bytes. That’s to keep them compatible with older systems that can’t count past about 4 billion sectors (which, at 512 bytes apiece, caps out around 2TB).
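(If you want to check what a given drive is actually reporting, something like this will show it; /dev/sdX is a placeholder for wherever the disk lands:)

blockdev --getss --getpbsz /dev/sdX    # logical and physical sector sizes
parted /dev/sdX print | grep "Sector size"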

Of course, there’s an implicit “Windows” in “older systems”… using 4K sectors apparently works with older versions of Windows, but not with older versions of Linux. Specifically RHEL5, which is where a bunch of the data lives.

Well, not a problem: just build a newer parted and use the partprobe utility that ships with it… it’s a userspace-tools problem on RHEL5, not a kernel problem. At least for up-to-date RHEL5… all bets are off for 5.5 and below.
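For reference, the dance on the Linux side is roughly the following, with a hypothetical device name and assuming a parted new enough to cope with 4K sectors:

parted -s /dev/sdX mklabel gpt
parted -s /dev/sdX mkpart primary 0% 100%
partprobe /dev/sdX    # re-read the partition table without rebooting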

But wait, the data’s considered sensitive, and we’d really like to encrypt it in transit. Fortunately import/export has the solution, and that solution is Truecrypt. Of course, Truecrypt has been dead since the end of May, but it’s still the only solution that Import/Export supports.

Sigh. Ok. Download Truecrypt 7.1 for Windows, start formatting the disk NTFS (because the NTFS driver should work with Linux too). Realize that the process is going to take 2 solid days in un-interruptible mode. Try again by making an NTFS volume first and then telling Truecrypt to encrypt it in place. 3 days, but it can be paused.

I don’t even know if the ntfs-3g driver will play nicely with Truecrypt on Linux, so … open a support case with AWS asking for guidance. They say that even though it’s not listed, Truecrypt+ext4 is supported. Ok. Native Linux it is. Plug the USB drive into my RHEL6 box and let’s go.

Install Truecrypt 7.1 on my RHEL6 workstation? Check. Point it at /dev/sdb1? Check. Tell it to go? DENIED. Apparently on kernels less than 2.6.33 and devices over 2TB, you have to disable the “kernel cryptographic services” integration. RHEL6, of course, runs 2.6.32…

OK. Go through the truecrypt setup options, disable kernel cryptographic services integration. Yes, I know it might slow me down, but I’m on a USB2 port anyway, how much slower are we talking?

I never got to find out, because apparently with 4K sector sizes, you must use kernel cryptographic services. But with drives over 2TB on RHEL6’s 2.6.32 kernel, you can’t use them.

Ok. So I’m going to try something else: install Fedora 20 on a KVM guest, pass the whole USB device through to that guest, and run Truecrypt there. Install qemu-kvm, libvirtd and virt-manager, fetch the ISO to the local machine, set up the guest, install Fedora …. le sigh… it’s the GNOME 3 spin. Intolerable, and unusable in my virt-manager-over-VNC-over-VPN environment. OK. Install updates, MATE and LXDE.
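For the record, passing the whole USB disk through to the guest ends up looking something like this with virsh; the guest name (fedora20) and the vendor/product IDs are placeholders you’d pull from lsusb:

lsusb    # note the disk's vendor:product pair, e.g. 1058:070a
cat > usb-disk.xml <<'EOF'
<hostdev mode='subsystem' type='usb'>
  <source>
    <vendor id='0x1058'/>
    <product id='0x070a'/>
  </source>
</hostdev>
EOF
virsh attach-device fedora20 usb-disk.xml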

584 updates. And now we play the waiting game …

I feel like a hornet trapped inside a window. I’m strong, I’m formidable, I’m a bit angry, and I just can’t quite seem to figure out that the thing I’m bouncing off is a pane of glass and that I’ll just need to find a different way out. Maybe the fact that I keep bouncing, keep trying new approaches is that whole tenacity thing. Or maybe I’m just a dumb bug for not seeing that a foot to the left the window is open and I’m just completely missing it.

Ok, so 584 updates and 200ish other package installs later (or, an hour and change), reboot into a desktop that has buttons I can actually click in a guest. Run truecrypt installer. Attach USB host device of the disk I’m trying to work with. Truecrypt, format the partition, quick format (because I did a zero-pass earlier and don’t care about disclosing how much data is on the disk). Format ext4. Password. Check. Formatting….. hanging…. hanging ….. success!

Mount the volume. Now let’s see how bad it is. dd a 4-gig file from /dev/zero onto the mounted volume, wrap it in ‘time’, pump it through ‘pv’ for a status readout. And …. pv isn’t showing any progress. Hanging. Hanging. Hanging. Waiting. Oh, output! 480 megs? Ok. Hanging….. another 400 megs? Ok…. hanging…. another 300?
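The test itself was nothing fancy; roughly this, assuming Truecrypt dropped the volume on its default /media/truecrypt1 mount point (file name is just an example):

time dd if=/dev/zero bs=1M count=4096 | pv -s 4g > /media/truecrypt1/zeros.bin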

Yeah, so after those hoops, the “run it in a vm with usb passthrough” approach works. But gods is it awful….

OK. So when I just dump data onto a LUKS-encrypted device natively on my Linux workstation, I get about 30MB/sec. That’s close to real-world USB2 throughput, if less than half the expected write speed of the disk (it’s USB3, and if I had a port that fast this disk would be glad to suck down data at 60+ MB/sec). Add KVM overhead for USB passthrough, Truecrypt, and whatever else Fedora might be bringing to the mix, and I’m seeing … between 1 and 2 MB/sec. At this rate, it’d be as fast to just pump the data straight to S3 over the network.
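For comparison’s sake, the native LUKS test was just the standard cryptsetup routine, something like this (device name hypothetical):

cryptsetup luksFormat /dev/sdb1
cryptsetup luksOpen /dev/sdb1 bigdisk
mkfs.ext4 /dev/mapper/bigdisk
mount /dev/mapper/bigdisk /mnt
time dd if=/dev/zero of=/mnt/zeros.bin bs=1M count=4096    # ~30MB/sec here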

Why’s this shit gotta be so hard?

radio streams for an alarm clock function

So a long time ago, I realized that I wake up better with music helping me along. But for the last year or so I’ve been using my cell phone as an alarm clock (’cause it’s always on a charged battery and has good clock sync).

Yesterday I tried the music thing again, using an at-job (at(1p) in the man pages) and mpg123, and it worked great. But pre-selecting a track is sorta lame, so I thought “maybe I can do a shuffle thing, using my old radio scripts” — until I realized that that’d require both work and more metadata-parsing, which I don’t feel like doing ’cause I need to wake up in 7 hours. So I thought “radio stream!”
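The working version was about as simple as it gets (track path is just an example):

echo "mpg123 ~/music/wakeup.mp3" | at 07:00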

So I grabbed a PLS file from one of the most salient radio stations I could think of, and tossed mpg123 at it. Of course, it didn’t work. So I tried tossing the stream URL itself at it, and that didn’t work either, because the stream’s AAC+. Bleh.

Then I tried mplayer … ’cause mplayer’s natively command-line. And it worked from the console, so I thought things would be good. Then I put it in a test at-job, and it didn’t fire off, because while mplayer’s natively a console app it’s also natively interactive. So that was out, unless I launched it in screen… but that would just get complicated.

Then I remembered: mpg123 is a very unixy tool (as opposed to linuxy), and it has a GPL clone, mpg321. So I built mpg321, and tried with that. SUCCESS!

So there you have it: if you want to listen to radio streams from at-jobs, remember mpg321!
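So the final alarm looks something like this, with a made-up stream URL standing in for the real mp3 stream you’d dig out of the station’s PLS file:

echo "mpg321 http://streams.example.com/station.mp3" | at 07:00 tomorrow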

And in 7 hours and 5 minutes, my at job will go off after an initial alarm clock ring, and I’ll drag myself up and go fight traffic down to my new job!

mpm-worker versus mpm-prefork, and mod_php versus fastcgi

*Caution: nothing but geek-content here*

So Apache 2.2 has a couple of “stable” MPMs, namely prefork and worker.

Prefork is the old tried-and-true method, where the server spawns $StartServers httpd processes and starts additional ones on demand, up to $MaxClients. Each subprocess handles $MaxRequestsPerChild requests, then dies and is replaced as needed.

Worker, on the other hand, starts $StartServers httpd processes, and in each process runs up to $ThreadsPerChild threads. Each thread serves requests (the same way whole processes do under prefork), and when any given process’s threads have collectively handled $MaxRequestsPerChild requests, the process kills its idle threads, unpools its working threads and waits for them to finish, then dies.
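For illustration, the knobs in question live in httpd.conf and look something like this (the numbers are examples, not recommendations):

<IfModule prefork.c>
    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    MaxClients          150
    MaxRequestsPerChild 4000
</IfModule>

<IfModule worker.c>
    StartServers          4
    MaxClients          300
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxRequestsPerChild 4000
</IfModule>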

Sounds cool, right? Threads are lighter than processes, so having 5 processes running 20 threads each sounds better than having 100 processes. I got caught up in that and decided to try it on a large-ish site (namely gotwoot, and the 20-odd other sites hosted on that box).

Well, it turns out worker isn’t as stable as I was hoping. At least for me… one of the sites we host pushes a lot of fairly large files and file streams. When apache processes under mpm-worker try to die, they wait until all their threads are done sending… but if those threads are sending for hours, it’s going to take hours for these otherwise-defunct processes to die. If it were just one process taking that long to die, it wouldn’t be a big deal, because it wouldn’t interact with anything else. But with a pile of those half-dead processes hanging around at once, various modules start behaving badly.

So I was running mod_php (only using threadsafe modules) on mpm-worker, and every couple of days I’d see random problems. Sometimes it was zero-byte replies from php, other times it was php segfaulting, still other times it was apache itself dropping empty page replies.

I got sick of that, so I switched over to mpm-prefork and PHP on FastCGI. Things seem to be better now… it gives me PHP running as each site’s own user, and because of that APC maintains a per-site cache. It runs the fcgi processes on demand, which is also cool ’cause if a site doesn’t get traffic, it doesn’t keep a PHP process running.
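One common way to wire up per-user PHP over FastCGI is mod_fcgid plus suexec; a rough sketch looks like the below, where the paths, user, and the php.fcgi wrapper script are placeholders (directive spellings vary between mod_fcgid versions, so this may not match this exact setup):

<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example.com/htdocs
    # needs mod_suexec; php.fcgi is a tiny wrapper script that execs php-cgi
    SuexecUserGroup siteuser sitegroup
    AddHandler fcgid-script .php
    FcgidWrapper /var/www/example.com/cgi-bin/php.fcgi .php
    <Directory /var/www/example.com/htdocs>
        Options +ExecCGI
        AllowOverride All
    </Directory>
</VirtualHost>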

Overall, my system load is a bit lower and things just “feel” more stable with prefork+fcgi. In the next couple days I should actually _see_ whether it’s more stable or not… but either way, I guess there’s value in feelings too :p.

Fighting syn floods with iptables

The tracker I admin seems to be undergoing a bit of a SYN flood, and the tcp_syncookies knob wasn’t helping. My ip_conntrack table was filling up at 65536 connections, and the tracker just wasn’t talking to anyone as a result. Lots of packets getting dropped, lots of full tables. Not sure if it’s intentional or if it’s just a bunch of clients behaving badly… big fan of Hanlon’s Razor there though…

Detection: looked in /proc/net/ip_conntrack. Noticed that the entries were mostly SYN_SENT from some source, with “UNREPLIED” status. Many source IPs. Verified that the tracker itself was still functional (by temporarily firewalling off all inbound SYN traffic to it except from my own IP). Saw lots of packets getting dropped.
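Something like these one-liners is enough to see the pattern (assuming the old /proc/net/ip_conntrack interface, where the source address is the fifth field on tcp lines):

grep -c UNREPLIED /proc/net/ip_conntrack    # how much half-open junk is in the table
awk '/SYN_SENT/ && /UNREPLIED/ {print $5}' /proc/net/ip_conntrack | sort | uniq -c | sort -rn | head    # top sources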

Mitigation attempt 1: tuned /proc/sys/net/ip_conntrack_max and /proc/sys/net/ipv4/netfilter/{many settings} via /etc/sysctl.conf, turning up maximums and turning down timeouts so that dead TCP connections expire more aggressively. Also added a rule, iptables -t raw -A OUTPUT -p tcp --sport $TRACKER_PORT -j NOTRACK, in an attempt to further lower the conntrack burden. Result: far fewer conntrack “table full” errors, but the tracker was still not talking to anyone.
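The sysctl side was along these lines; key names vary by kernel version, and the values below are illustrative rather than a recommendation:

# /etc/sysctl.conf additions, applied with sysctl -p
net.ipv4.netfilter.ip_conntrack_max = 262144
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 3600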

Mitigation attempt 2: added two more firewall rules:
iptables -I INPUT 1 -p tcp --dport $TRACKER_PORT --syn -m hashlimit --hashlimit 10/min --hashlimit-burst 15 --hashlimit-name torrenthash --hashlimit-htable-size 2048 --hashlimit-htable-max 65536 --hashlimit-mode srcip -j ACCEPT
followed by
iptables -I INPUT 2 -p tcp --dport $TRACKER_PORT --syn -j DROP
Result: the tracker appears to be talking again … I can see the web interface on it, and get peers on torrents hosted there without the help of DHT. It’s good times.

I imagine this technique could be used for quite a bit more than just protecting a tracker, so I suppose it’d be great to have it written down somewhere 🙂

Firefox X-forwarding weirdness…

Ok, so check it out. This is like … the single strangest thing I’ve run into in the wide world of linux.

So, you start a local firefox session, then ssh to a machine with -X (enabling X11 forwarding). Then on the remote machine, you run firefox. You get … another locally-running firefox.

So you close both locally-running firefox sessions (and any others you might have) and invoke firefox on the remote machine. Now you get an X11-forwarded firefox running on the remote box (the expected behavior). And then you run firefox on the local machine, and you get …. another remotely-running firefox window.

Apparently, X doesn’t differentiate between remotely-running windows and locally-running ones, and firefox checks the display for an already-running instance: instead of letting another copy start, it just opens a new window on whichever firefox instance already owns that display.

A little googling shows that the environment variable MOZ_NO_REMOTE controls this behavior: set it to 1 and a newly-launched firefox won’t go hunting for an existing instance to hand off to; it just starts its own.
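In practice that means launching the remote one with either of these (the flag form exists on newer firefox builds):

MOZ_NO_REMOTE=1 firefox &
firefox -no-remote &    # same idea, as a command-line flag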

distro blues

Ahh, the geekery. I wonder how many people this has happened to…

So basically, for the last couple years I lived almost exclusively in gentoo for all my linux needs. Sounds good, right? A single distro, albeit a source-based one. I got used to a lot of gentoo-isms. I had my hands in RHEL a bit, I poked a bsd or two a bit, but gentoo was definitely my area of expertise, and my home.

Now I’m not so sure.

See, at work, I run an ubuntu box for my desktop. I’m the only one there on ubuntu, so I’m kinda the odd-man out in that regard (the other sysadmins are on fedora). We’ve also got servers that are fedora, so it’d make sense to be there… but no~oo, I had to install ubuntu instead.

Regardless, the majority of the server functions at work are solaris boxes. So I’m simultaneously getting more comfortable with Ubuntu, getting more comfortable with Fedora, and learning a shitlot about Solaris.

Which is confusing.

See, Gentoo’s got /etc/conf.d. Everything that’s distro-specific is controlled out of there. Things like … default behaviors, network configs, what xdm tool should be called (eg: gdm, kdm, xdm), what options to pass iptables and where to save it, what options to pass in init scripts, etc. It gives a lot of flexibility in a single place, and it’s very clean.

But nobody else does that. At all.

Ubuntu’s got its configs strewn all over /etc. Fedora and RHEL shove a lot, but not all, of their stuff in /etc/sysconfig. Solaris … hell, I still don’t have any idea for half of that stuff … if it’s not in SMF, it’s probably somewhere in /etc, or maybe /var/sadm, or possibly in some random db2 file or something.

But it gets worse. Mainly because of package managers. I am finding myself typing “aptitude search” when I want to find a package on fedora or gentoo, and typing “eix” when I want to find things in ubuntu. I have to remind myself “oh wait, this is ${DISTRO}, not ${OTHERDISTRO}” all the time. This is only exacerbated by the fact that I’ve been building a new fileserver at home, and out of my distros of choice (ubuntu and gentoo), only gentoo’s install cd worked cleanly on the new hardware. On the bright side, at least solaris doesn’t have a sane auto-updating package manager to work with at all, so there’s one less thing to think about.
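Maybe I just need to tape a cheat sheet to the monitor; the rough equivalences (foo being whatever package name, obviously):

aptitude search foo    # ubuntu / debian
yum search foo         # fedora / rhel
eix foo                # gentoo (after an eix-update)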

So yeah, that’s my life these days :p

DEEAAAATH!!! (of yet another video card)

My last video card died in September 2005, in what I affectionately referred to as the september in computer hell. As you may recall (or find in the middle of that megapost), I picked up a Radeon X1600 Pro back when my 6200 failed out and died. And it was ok; it had some compatibility issues, but it was a fairly stable and decent card that served me well…

Except that the fan died two months ago. Yeah, pretty lame. Another video card dead due to fan failure and my not being able to find a suitable replacement fan.

Well, I’ve mostly been ignoring that lately, and using the card anyway. That finally caught up with me last night.

Basically, the chip was getting hot enough that the failed-out fan literally melted and distended from the heatsink. Pretty insane.

Why am I describing this though? I’ve got a digicam!
(photos: melty fan 1, melty fan 2)

So yeah. Slightly less than a year out of that video card. I’m honestly just a little glad to be rid of it. Borrowed a roommate’s GeForce 3 for a couple days, and I’ve got a 7600GS on the way. Nice thing is, since I live in Maryland now, I’m right near New Jersey, where Newegg ships from. 1-day UPS ground FTW.

Can’t wait until I’ve got the money to buy new computers n’ stuff… you know … get out of the AGP platform generation and all that …