Fixing a Bad SSH authorized_keys under Amazon EC2

I was doing some maintenance on the Amazon EC2 instance that underpins DLTJ and in the process managed to mess up the .ssh/authorized_keys file. (Specifically, I loosened its permissions so that group and world had access to it. With its default StrictModes setting, `sshd` refuses to use an authorized_keys file that other users can modify, so logins with the matching private keys were rejected.) Unfortunately, there is only one user on this server, so I had effectively locked myself out of the box.

$ ssh -i .ssh/EC2-dltj.pem me@dltj.org
Identity added: .ssh/EC2-dltj.pem (.ssh/EC2-dltj.pem)
Permission denied (publickey).

After browsing the Amazon support forums I managed to puzzle this one out. Since I didn't see this exact solution written up anywhere, I'm posting it here in the hope that someone else will find it useful. And since you are reading this, you know that it worked.

Solution Overview

Basically we've got to get the root filesystem mounted on another EC2 instance so we can get access to it. I'm using placeholder identifiers like i-target, i-scratch, and vol-rootfs in place of real values.

1. Stop the target EC2 instance (i-target).
2. Note the device name of its root EBS volume (vol-rootfs), then detach that volume from the target instance (i-target).
3. Attach the volume (vol-rootfs) to another EC2 instance (i-scratch) and mount the filesystem.
4. Change the file permissions (or whatever else needs to be fixed).
5. Unmount the filesystem and detach the volume (vol-rootfs) from the other EC2 instance (i-scratch).
6. Reattach the volume (vol-rootfs) to the target EC2 instance (i-target) under its original device name and start the instance.

Assuming you've got all of the environment variables set up with the appropriate AWS credentials, these are the commands:

Stop the Target Instance

$ ec2-stop-instances i-target
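Stopping takes a little while, and the root volume can't be detached until the instance has actually reached the stopped state. Rather than re-running `ec2-describe-instances` by hand, you can poll it. The helper below is a sketch, not part of my original procedure: `wait_for_word` is a hypothetical name, and it assumes the EC2 API tools' plain-text output, in which states such as `stopped` (instances) or `available` (volumes) appear as separate whitespace-delimited words.

```shell
# wait_for_word WORD COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND until its output contains WORD as a
# whitespace-separated field (e.g. "stopped" from ec2-describe-instances).
# Gives up after MAX_TRIES failed checks (default 60), POLL_SECONDS apart
# (default 10).
wait_for_word() {
  local word=$1 tries=0
  shift
  # awk exits 0 only if some field on some output line equals WORD exactly,
  # so "stopping" does not count as "stopped".
  until "$@" | awk -v w="$word" '{ for (i = 1; i <= NF; i++) if ($i == w) found = 1 } END { exit !found }'; do
    tries=$((tries + 1))
    if [ "$tries" -ge "${MAX_TRIES:-60}" ]; then
      echo "gave up waiting for '$word' from: $*" >&2
      return 1
    fi
    sleep "${POLL_SECONDS:-10}"
  done
}
```

For example, `wait_for_word stopped ec2-describe-instances i-target` after stopping the instance. The same helper can wait for a detach to finish with `wait_for_word available ec2-describe-volumes vol-rootfs`.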

Detach Root EBS Volume

A couple of steps here. We need to remember which device name the root volume is attached as, so we can put it back in the same place at the end. So first get a description of the instance. It will look something like this:

$ ec2-describe-instances i-target
INSTANCE    i-target    ami-xxxxxxxx    ec2-[your-IP].compute-1.amazonaws.com   [...lots of other stuff...]
BLOCKDEVICE /dev/sdh    vol-datafs      2011-07-12T01:37:21.000Z
BLOCKDEVICE /dev/sda1   vol-rootfs      2011-07-12T01:37:21.000Z
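If you'd rather capture the device name than copy it by eye, a one-line awk filter over this output can look it up from the volume ID. This is a sketch that assumes the column layout shown above (`BLOCKDEVICE`, then the device name, then the volume ID); `device_of` is a hypothetical helper name.

```shell
# device_of VOLUME_ID
# Hypothetical helper: read ec2-describe-instances output on stdin and print
# the device name from each BLOCKDEVICE line whose volume ID matches.
# Assumes the layout: BLOCKDEVICE <device> <volume-id> <attach-time>.
device_of() {
  awk -v vol="$1" '$1 == "BLOCKDEVICE" && $3 == vol { print $2 }'
}
```

For example, `ROOT_DEVICE=$(ec2-describe-instances i-target | device_of vol-rootfs)` run before detaching, then `-d "$ROOT_DEVICE"` when reattaching at the end. Once the volume is detached the mapping no longer appears in the description, so capture it first.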

In this case we need to remember /dev/sda1. (Note that we can ignore vol-datafs; on my instance it is where the database and other data are stored. If you don't know which volume is your root volume, you might face some trial and error in the steps below until you find it.) Once the instance has finished stopping, detach the root volume:

$ ec2-detach-volume vol-rootfs

Attach Volume Elsewhere

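This set of instructions assumes that you have another EC2 instance running in the same Availability Zone as the volume (an EBS volume can only be attached to an instance in its own Availability Zone). If you don't have one, start a micro instance there for this purpose, then terminate it when you are done. We're going to attach the volume to it as /dev/sdf: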

$ ec2-attach-volume vol-rootfs --instance i-scratch -d /dev/sdf

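Now log into i-scratch and mount the volume. (Depending on the kernel, a volume attached as /dev/sdf may show up as /dev/xvdf instead; use whichever name exists.)

$ sudo mount /dev/sdf /mnt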

Make Changes

In my case:

$ sudo chmod 600 /mnt/home/me/.ssh/authorized_keys
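To confirm the change took, you can check the file's mode before and after the `chmod`. This is a minimal sketch assuming GNU coreutils `stat` (as on typical Linux AMIs); `has_loose_perms` is a hypothetical helper that flags any group or other permission bits, which is exactly what `chmod 600` clears.

```shell
# has_loose_perms FILE
# Hypothetical helper: succeed (exit 0) if FILE has any group or other
# permission bits set, i.e. anything "chmod 600" would strip.
# Uses GNU coreutils stat; BSD/macOS stat takes different flags.
has_loose_perms() {
  local mode
  mode=$(stat -c %a "$1") || return 2
  # %a prints the mode in octal without a leading 0 (e.g. 644); prefix one so
  # shell arithmetic reads it as octal, then mask the group/other bits (077).
  [ $(( 0$mode & 077 )) -ne 0 ]
}
```

For example, `has_loose_perms /mnt/home/me/.ssh/authorized_keys && echo "still too open"`. Run it from a root shell (for example after `sudo -s`), since the .ssh directory is normally not accessible to other users.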

Unmount/Detach from i-scratch

While still on the i-scratch server:

$ sudo umount /mnt

Then, from wherever you run the EC2 API tools, detach the volume from the scratch server:

$ ec2-detach-volume vol-rootfs

Reattach the Volume and Start the Server

We're on the home stretch now. Note that in the first command we're using the device name we noted in the second step (/dev/sda1 in this example).

$ ec2-attach-volume vol-rootfs --instance i-target -d /dev/sda1
$ ec2-start-instances i-target

After the instance starts, you should be able to log in. If not, go through the steps again and, while the volume is mounted on i-scratch, read the target's log files under /mnt/var/log to figure out what is going on.
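With the volume mounted on i-scratch again, the target's own logs are under /mnt/var/log rather than /var/log. The sketch below searches them for sshd's refusal messages. Assumptions: the volume is mounted at /mnt as above, the log is auth.log (Debian/Ubuntu) or secure (Red Hat-style systems such as Amazon Linux), only the current (not rotated) files matter, and the message is OpenSSH's "Authentication refused: ..." text. `sshd_refusals` is a hypothetical helper name; run it from a root shell, since these logs are usually readable only by root.

```shell
# sshd_refusals [ROOT]
# Hypothetical helper: print sshd "Authentication refused" lines from the
# current logs of a root filesystem mounted at ROOT (default /mnt).
sshd_refusals() {
  local root=${1:-/mnt} log
  for log in "$root/var/log/auth.log" "$root/var/log/secure"; do
    # Skip whichever file this distribution doesn't use; -H prefixes each
    # match with the file it came from.
    if [ -f "$log" ]; then
      grep -H 'sshd.*Authentication refused' "$log"
    fi
  done
  return 0
}
```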