The AWS Import Saga, or “Why does this computer shit have to be so hard?”

I’ve got a very large data set that we’re trying to move into AWS, and we did the math to realize that at the speeds we’re getting from our network connection it’ll take at least 4-5 months of transfer time to get the whole data set over to S3.

Amazon has a nice feature called Import/Export that lets you take advantage of the old adage “never underestimate the bandwidth of a station wagon full of backup tapes”, but updated to modern times. Basically you ship them hard drives, and they suck the data off of them directly into S3.

So we’ve got a couple 3TB disks lying around, and figure every 3TB we transfer is about 4 weeks of data transfer time we save. Sounds great, right?
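For the curious, here is what that estimate implies about our uplink. This is only a back-of-the-envelope sketch: it assumes decimal terabytes and takes the "3TB in about 4 weeks" figure at face value, so the result is an implied average, not a measurement.

```python
# Implied network throughput if 3 TB takes about 4 weeks to upload.
# Assumes decimal units (1 TB = 10**12 bytes); both inputs are the rough
# estimates from the text, not measured values.
DISK_BYTES = 3 * 10**12
UPLOAD_SECONDS = 4 * 7 * 24 * 3600  # 4 weeks


def implied_rate_mb_per_s(n_bytes, seconds):
    """Average rate in MB/s needed to move n_bytes in the given time."""
    return n_bytes / seconds / 10**6


rate = implied_rate_mb_per_s(DISK_BYTES, UPLOAD_SECONDS)
print(f"{rate:.2f} MB/s")  # prints "1.24 MB/s"
```

So every disk we ship stands in for a month of a link that averages a bit over 1 MB/sec.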

Well, not so fast. First thing to know is that those 3TB USB drives, unlike most other large drives, have a 4K sector size instead of the usual 512 bytes. That’s to make them compatible with older systems that don’t understand how to count past 4 billion sectors.
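The arithmetic behind that "4 billion sectors" limit is quick to check. The sketch below assumes the limit in question is a 32-bit sector count (which is what "4 billion" suggests); the function name is just for illustration.

```python
# Largest disk addressable with a 32-bit sector count, for two sector sizes.
# Assumption: the "4 billion sectors" limit is 2**32 addressable sectors.
MAX_SECTORS = 2**32


def max_capacity_bytes(sector_size):
    """Bytes addressable when every sector is sector_size bytes."""
    return MAX_SECTORS * sector_size


for size in (512, 4096):
    tib = max_capacity_bytes(size) / 2**40
    print(f"{size}-byte sectors: {tib:.0f} TiB")
# prints:
# 512-byte sectors: 2 TiB
# 4096-byte sectors: 16 TiB
```

With 512-byte sectors a 3TB disk is past the limit; with 4K sectors it fits with room to spare.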

Of course, there’s an implicit “Windows” there… using 4K sectors works with older versions of Windows apparently, but not with older versions of Linux. Specifically RHEL5, which is where a bunch of the data lives.

Well, not a problem, just build a new parted and use the partprobe utility that ships with it… it’s a userspace tools problem on RHEL5, not a kernel problem. At least for up-to-date RHEL5… all bets are off for 5.5 and below.

But wait, the data’s considered sensitive, and we’d really like to encrypt it in transit. Fortunately import/export has the solution, and that solution is Truecrypt. Of course, Truecrypt has been dead since the end of May, but it’s still the only solution that Import/Export supports.

Sigh. Ok. Download truecrypt 7.1 for Windows, start formatting the disk as NTFS (because my NTFS driver should work with Linux too). Realize that the process is going to take 2 solid days in un-interruptible mode. Try again by making an NTFS volume, then telling truecrypt to convert it in place. 3 days, but can be paused.

I don’t even know if the ntfs3g driver will play nicely on Linux with Truecrypt, so … open a support case with AWS asking for guidance. They say that even though it’s not listed, truecrypt+ext4 is supported. Ok. Native Linux it is. Plug the USB drive into my RHEL6 box and let’s go.

Install Truecrypt 7.1 on my RHEL6 workstation? Check. Point it at /dev/sdb1? Check. Tell it to go? DENIED. Apparently on kernels older than 2.6.33 and devices over 2TB, you have to disable the “kernel cryptographic services” integration. RHEL6, of course, runs 2.6.32…

OK. Go through the truecrypt setup options, disable kernel cryptographic services integration. Yes, I know it might slow me down, but I’m on a USB2 port anyway, how much slower are we talking?

I never got to find out, because apparently for 4K sector sizes, you must use kernel cryptographic services. But for drives over 2TB, you can’t use the RHEL6 kernel cryptographic services.

Ok. So I’m going to try something else. Install Fedora 20 on a kvm guest, pass through the whole USB device to that guest, and run truecrypt there. Fetch the iso, install qemu-kvm, libvirtd, virt-manager, get the iso to local machine, set up, install fedora …. le sigh… it’s the gnome3 spin. Intolerable, and unusable in my virt-manager-over-vnc-over-vpn environment. OK. Install updates, mate and lxde.

584 updates. And now we play the waiting game …

I feel like a hornet trapped inside a window. I’m strong, I’m formidable, I’m a bit angry, and I just can’t quite seem to figure out that the thing I’m bouncing off is a pane of glass and that I’ll just need to find a different way out. Maybe the fact that I keep bouncing, keep trying new approaches is that whole tenacity thing. Or maybe I’m just a dumb bug for not seeing that a foot to the left the window is open and I’m just completely missing it.

Ok, so 584 updates and 200ish other package installs later (or, an hour and change), reboot into a desktop that has buttons I can actually click in a guest. Run truecrypt installer. Attach USB host device of the disk I’m trying to work with. Truecrypt, format the partition, quick format (because I did a zero-pass earlier and don’t care about disclosing how much data is on the disk). Format ext4. Password. Check. Formatting….. hanging…. hanging ….. success!

Mount the volume. Now let’s see how bad it is. dd out a 4 gig file from /dev/zero onto the mounted volume, wrap in ‘time’, pump through ‘pv’ for a status. And …. pv isn’t showing up. Hanging. Hanging. Hanging. Waiting. Oh, output! 480 megs? Ok. Hanging….. another 400 megs? Ok…. hanging…. another 300?
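If you want to repeat that kind of test without dd and pv, something like the following gets the same number. It is a rough stand-in, not what I actually ran: the function name and the example path are made up, and it only reports one average at the end instead of pv's running status.

```python
import os
import time


def write_throughput(path, total_bytes, chunk_bytes=4 * 1024 * 1024):
    """Write total_bytes of zeros to path and return the average MB/s."""
    chunk = bytes(chunk_bytes)  # a buffer of zeros, like reading /dev/zero
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        # Wait for the data to reach the device, so the page cache
        # does not flatter the result.
        os.fsync(f.fileno())
    elapsed = max(time.monotonic() - start, 1e-9)
    return written / elapsed / 10**6


# Example call (hypothetical mount point, 4 GiB of zeros):
# print(write_throughput("/mnt/encrypted/zeros.bin", 4 * 1024**3))
```

The fsync matters here: the bursty "480 megs, then nothing" pattern is what buffered writes look like when the device underneath can't keep up.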

Yeah, so after those hoops, the “run it in a vm with usb passthrough” approach works. But gods is it awful….

OK. So when I just dump data onto a LUKS-encrypted device natively on my Linux workstation, I get about 30MB/sec. That’s close to the line speed of USB2, if less than half the expected write speed of the disk (it’s USB3, and if I had a port that fast this disk would be glad to suck down data at 60+ MB/sec). Adding KVM overhead for USB passthrough, truecrypt, and whatever else Fedora might be bringing to the mix, I’m seeing … between 1 and 2 MB/sec. At this rate, it’d be as fast to just pump the data straight to S3 over the network.
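To put numbers on that, here is how long it takes to fill one 3TB disk at each of those rates. Same caveats as any napkin math: decimal terabytes, sustained rates assumed constant, and the rates are just the rough figures quoted above.

```python
# Time to fill a 3 TB disk at the write rates mentioned above.
# Assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
DISK_BYTES = 3 * 10**12
SECONDS_PER_DAY = 86400


def fill_time_days(rate_mb_per_s, n_bytes=DISK_BYTES):
    """Days needed to write n_bytes at a constant rate in MB/s."""
    return n_bytes / (rate_mb_per_s * 10**6) / SECONDS_PER_DAY


for label, rate in [
    ("native LUKS over USB2", 30),
    ("VM passthrough, best case", 2),
    ("VM passthrough, worst case", 1),
]:
    print(f"{label}: {fill_time_days(rate):.1f} days")
# prints:
# native LUKS over USB2: 1.2 days
# VM passthrough, best case: 17.4 days
# VM passthrough, worst case: 34.7 days
```

Two and a half to five weeks just to load the disk, against the roughly four weeks the same 3TB would take over the wire, and that is before shipping.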

Why’s this shit gotta be so hard?