Thursday Threads: Developer Genders, Facebook Release Engineering, Alcohol Among Technologists

You’ll get the sense that this week’s Thursday Threads is stacked towards cultural awareness. First is the view of a female developer in a room of peers at a meeting of the Digital Public Library of America. The second thread is a pointer to a story about Facebook’s software release process, and it leads into a story about the role of alcohol in technology conferences and reflections from the library technology community.

DLTJ Thursday Threads is a weekly summary of technology, library, and publishing topics (and anything else that crosses my path that is worth calling out). Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. New this year is that Pinboard has replaced FriendFeed as my primary aggregation service. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to Pinboard are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

An Inclusive Table

But here I am, with a constant background obsession, now, of how to get more librarians involved (and involved more deeply) in tech, how to foster collaboration on library technology projects, which is inseparable from the problem of how to get more women involved more deeply and collaboratively in technology. So I can’t not look at that room and see how the status lines fracture, along code mastery but coincidentally also gender, written in the physical geography of the room, where I’m the only one sitting at the table. I can’t not wonder, how can I create spaces which redraw those lines.

Andromeda attended the DPLA hackathon last Thursday and posted this very pointed view of the perceptions of women in library technology.

A Behind-the-Scenes Look at Facebook Release Engineering

I recently had a unique opportunity to visit Facebook headquarters and see that story in action. Facebook gave me an exclusive behind-the-scenes look at the process it uses to deploy new functionality. I watched first-hand as the company’s release engineers rolled out the new “timeline” feature for brand pages.

That was where I met Chuck Rossi, the release engineering team’s leader. Rossi, whose workstation is conveniently located within arm’s reach of the hotfix bar’s plentiful supply of booze, is a software industry veteran who previously worked at Google and IBM. I spent a fascinating afternoon with Rossi and his team learning how they roll out Facebook updates—and why it’s important that they do so on a daily basis.

I’m pointing to this story for two reasons. First, it is a fascinating look at how one of the top internet operations manages its processes for rolling out new software. Second, how the wheels of the release process are greased feeds into the third story below.

Our Culture of Exclusion

Lately there have been a lot of great articles being written and discussion happening around sexism in the tech industry. And the flames are being fanned by high profile incidents of people saying and doing just plain stupid things.

It reminded me of this draft post just sitting here, uncommitted. For quite a while I’ve been collecting links, tweets and other stuff to illustrate another problem that’s been affecting me (and other people, surely). I thought it was finally time to write the post and bring this up because, honestly, I feel excluded too.

Our Culture of Exclusion, Ryan Funduk’s blog

The role of alcohol in technology events was a topic of discussion on Twitter and elsewhere at the end of last week. There is a term for this that I heard for the first time last week — brogrammer — and I don’t think it is a flattering persona for the technology profession. The way in which Facebook releases its code, described in the thread above, is one data point. Ryan’s message, quoted above, points to some high-profile conferences where alcohol seems to play a central role in the event. His article was the source of some introspection among the Code4Lib community as well.

Code4Lib Discussion of "Culture of Exclusion"

Prompted by Ryan Funduk’s "Culture of Exclusion" post about the prevalence of alcohol and alcohol extremes at technology conferences, members of the Code4Lib community pondered what this means for our own events.

Storified by Peter Murray · Wed, Apr 11 2012 23:09:38

"No piles of meat, bongs or lube either-none of this belongs in a place of business." On brogrammers. HT @cazzerson #fbEmily M.
2 takes of ppl who don’t drink at conferences: and I’m personally more inclined to @scalzi’s.John Mark Ockerbloom
@JMarkOckerbloom Interesting to think about in terms of #code4lib, at least for me.Mark Matienzo
…but I can understand @rfunduk’s take too. Confs vary,, but at ones I go to ppl don’t give me grief for skipping the alcohol at socials.John Mark Ockerbloom
@anarchivist Haven’t made it to C4L, so can’t comment. Most confs I go to have events w alcohol, not everyone has it, & that seems fine.John Mark Ockerbloom
This post (thanks @JMarkOckerbloom!) resonated w me: I like a good cocktail, but events shouldn’t be all about drinks.Leslie Johnston
@anarchivist @JMarkOckerbloom The bringing and drinking of specialty beers is one of the most visible #code4lib activities to those outside.Leslie Johnston
@anarchivist @JMarkOckerbloom And if you’re not already in the know about cask ales or regional producers, it can feel a bit exclusionary.Leslie Johnston
@lljohnston @anarchivist @JMarkOckerbloom I’ll admit when I read that, c4l was the first lib conference that came to mindSarah Shreeves
@sshreeves @lljohnston @jmarkockerbloom the craft beer drink up (as it was in 2011 and 2012) is a recent addition. Some ppl tried it [+]Mark Matienzo
@sshreeves @lljohnston @jmarkockerbloom because it was done at other confs. Not to say alcohol centric socializing didnt at c4l before. [-]Mark Matienzo
@anarchivist @sshreeves @jmarkockerbloom I def know that. Just saying it’s become of the most visible events to non-attendees. (1/2)Leslie Johnston
@anarchivist @sshreeves @jmarkockerbloom With the planning via twitter and tweeted images of loaded suitcases and rows of empty bottles.Leslie Johnston
@lljohnston @anarchivist @sshreeves @jmarkockerbloom Also – totally not saying c4l is the only place this happens, or knocking c4l at all.Leslie Johnston
@lljohnston @sshreeves @jmarkockerbloom understood/agreed. I’m implicated as I have organized& promoted those parts. Still have concerns.Mark Matienzo
Skimming tweets about code4lib craft beer meetu. Ever concern about wine tastings at ALA being exclusionary to folks who don’t know wine?Jon Gorman
@codexmonkey I think as @lljohnston said it’s the visibility – totally agree this happens at other confsSarah Shreeves
@anarchivist @sshreeves @lljohnston @jmarkockerbloom the topic is fascinating to me. I always saw it as an inclusive, learning experience.Declan Fleming
@anarchivist @sshreeves @lljohnston @jmarkockerbloom interesting to see it cast as exclusive. Don’t like ppl feeling excluded.Declan Fleming
@lljohnston @anarchivist @JMarkOckerbloom: Fortunately folks behave well at these events. Should reinforce these are tastings not binges.Michael J. Giarlo
@lljohnston @anarchivist @JMarkOckerbloom: And I don’t react well to hearing our tastings are exclusive, so I’ll shut up at this point.Michael J. Giarlo
@anarchivist @sshreeves @lljohnston @jmarkockerbloom: Vegetarian-centric socializing happens as well though admittedly not at same scale.Michael J. Giarlo
@anarchivist @lljohnston @sshreeves @jmarkockerbloom: I agree w/ this, but some folks are extremely sensitive to alcohol & won’t be cmfrtblMichael J. Giarlo
@mjgiarlo @anarchivist @lljohnston @sshreeves @jmarkockerbloom next year: craft cheese.Dan
@danwho @anarchivist @lljohnston @sshreeves @jmarkockerbloom: But that excludes the lactose intolerant!Michael J. Giarlo
@danwho @anarchivist @lljohnston @sshreeves @jmarkockerbloom: Maybe we should have a "we breathe" or "let’s do taxes" gathering.Michael J. Giarlo
@mjgiarlo @anarchivist @lljohnston @sshreeves @jmarkockerbloom c4l does not condone intolerance.Dan
@JMarkOckerbloom @anarchivist I’ve been to academic conferences where alcohol is much more prevalent than in library conferences. 1/2Becky Yoose
@JMarkOckerbloom @anarchivist 2/2 There’s an academic conf where free alcohol flows for entire conf. Ex – business meetings have open bars.Becky Yoose
@mjgiarlo @JMarkOckerbloom @lljohnston @declan @danwho @yo_bj For the sake of arg; let’s say tasting = separate. Code4lib = super social [+]Mark Matienzo
@mjgiarlo @JMarkOckerbloom @lljohnston @declan @danwho @yo_bj conference. Some equate social w/ availability of alcohol; It’s obviously [+]Mark Matienzo
@mjgiarlo @JMarkOckerbloom @lljohnston @declan @danwho @yo_bj not necessarily "expected, but C4L = social & social @ c4l often invloves EtOHMark Matienzo
@danwho @mjgiarlo @anarchivist @lljohnston @sshreeves @jmarkockerbloom Well, we *will* be near Wisconsin next year. I have connections.Becky Yoose
@yo_bj @mjgiarlo @anarchivist @lljohnston @sshreeves @jmarkockerbloom barrel aged munster? ;)Dan
@anarchivist @lljohnston @sshreeves @JMarkOckerbloom why? It already sells out instantly. Obv there is a big market for current style.Jenny Reiswig
Talk of #code4lib and social reminds me I’m hoping to play some board games for #code4lib13. Lot easier to bring when driving ;-)Jon Gorman
@anarchivist @lljohnston @sshreeves @JMarkOckerbloom that’s halfway just a devils advocate reply btw.Jenny Reiswig
RE: discussions of C4L + Beer. I love the beer swaps, but think they are a bit exclusionary. No alternative gathering on same night/time [+]Tim Donohue
Maybe that handful of blog posts and tweet streams will alter human social behavior that spans cultures and generations, we’ll see.Michael J. Giarlo
Plus advertised as "come drink beer with us", rather than "come hang out & meet folks & if interested try some new beer" [-]Tim Donohue
@anarchivist @mjgiarlo @JMarkOckerbloom @declan @danwho @yo_bj Ad the super-social aspect is def one of its best qualities as a conference.Leslie Johnston
@mjgiarlo @danwho @anarchivist @lljohnston @sshreeves @jmarkockerbloom Yoose
And now I’m craving fresh string cheese. Damn you, #code4lib.Becky Yoose
@mjgiarlo @danwho @jmarkockerbloom @lljohnston @sshreeves @yo_bj Honestly, I think that’s not a fair comparison, but whatevs.Mark Matienzo
Last comment on C4L + Beer. I think it’d do wonders to call it something like Code4Lib "Happy Hour" or "Social" rather than "DrinkUp"Tim Donohue
@anarchivist @danwho @jmarkockerbloom @lljohnston @sshreeves @yo_bj: It’s not. Maybe I’ve lost too many brain cells. I wonder how.Michael J. Giarlo
@mjgiarlo @danwho @jmarkockerbloom @lljohnston @sshreeves @yo_bj I blame the pork.Mark Matienzo
@timdonohue: That’s the great thing about code4lib: if anyone’s willing to step up and make that change, it’ll happen.Michael J. Giarlo
@anarchivist @mjgiarlo @jmarkockerbloom @lljohnston @sshreeves @yo_bj it hard to deconstruct an event (ritual?) that grew organically.Dan
@mjgiarlo just feedback to "owners" (usual organizers) of "DrinkUp". A bit part is just in how it is advertised. Emphasize social over beerTim Donohue
@timdonohue: No, I appreciate the feedback, Tim. Wasn’t trying to hit you with a "patches welcome." That is how #code4lib works, it seems.Michael J. Giarlo
@mjgiarlo that being said, I’m a huge fan of the craft beer parts. :)Tim Donohue
@mjgiarlo thanks for clarifying. Final thought: there is such a thing as "craft soda" too. Perhaps it need not be limited to beerTim Donohue
@timdonohue: It needn’t, I agree, and we’ve had plenty of folks bring soda, baked goods, snacks, eau de vie, etc.Michael J. Giarlo
@rfunduk Great blog post. You may be interested to know that librarians are a bit like that too. Restrained example: M.
@mjgiarlo coolio :) I didn’t realize that.Tim Donohue
Further thought: maybe ppl organize drinking events at confs to include newbies rather than have a secret clique event. @rfunduk @cazzersonEmily M.
@bradamant @rfunduk Drinking culture is prevalent beyond tech fields. I’ve been to academic confs where drinking went nonstop for days.Becky Yoose
This. RT @bradamant: Further thought: maybe ppl organize drinking events at confs to include newbies rather than have a secret clique eventMichael J. Giarlo
@bradamant @rfunduk I feel that US culture surrounding alcohol is a big perpetrator in conf drinking, but I would need to do more research.Becky Yoose
@yo_bj @bradamant @rfunduk: And it’s not just libraries, or academics. It spans industries, cultures, and generations.Michael J. Giarlo
@bradamant @rfunduk: Does that page strike you as brogrammer-y? Sure, beer is mentioned, but so is food, and nightlife, and the venue, etc.Michael J. Giarlo
@bradamant @rfunduk I also forgot to mention anime/fandom conventions. Those get dangerous fast, since there are more underage attendees.Becky Yoose
@mjgiarlo @bradamant @rfunduk Yep. For non-drinking folks like myself, I’m sometimes left scratching my head wondering how it got to this.Becky Yoose
@yo_bj @mjgiarlo @bradamant @rfunduk I think a lot of people just don’t have enough socializing in their day to day lives…Alexander O’Neill
@yo_bj @mjgiarlo @bradamant @rfunduk … So conferences full of people who ‘get’ them and no family, etc., are a temping chance to cut looseAlexander O’Neill
Following discussion about alcohol at conferences and in particular @code4lib. Could ppl add to with their thoughts?Margaret Heller
@alxp @yo_bj @bradamant: I’m also not convinced what @rfunduk wrote about happens at e.g. #code4lib. Different phenomenon.Michael J. Giarlo
@alxp @yo_bj @bradamant @rfunduk: Can we please hashtag this #brewhaha?Michael J. Giarlo
Uncomfortable at a bar? Fashion your own teetotaler conf culture instead of advocating the destruction of another.
@mjgiarlo @yo_bj @danwho @anarchivist @sshreeves @jmarkockerbloom We do tend to grouse, it’s true.Leslie Johnston
@mjgiarlo @yo_bj @rfunduk Whoa, back from lunch! Good convo. I don’t think c4l is totally like that, but of all confs I attend: the most.Emily M.
@FeedJoelPie My feed is also talking about it, but for library code conferences.Margaret Heller
@mjgiarlo @yo_bj @rfunduk I’m no teetotaller, but find the seeming necessity of mentioning alcohol arrangements odd. Alcohol != socializing.Emily M.
@bradamant @yo_bj @rfunduk: Not sure which context you’re referring to here, "ours" (e.g. code4lib) or the IT brogrammer one.Michael J. Giarlo
@mjgiarlo @yo_bj @rfunduk Finally, re: expections and alcohol, I loved this article: M.
@mjgiarlo @yo_bj @rfunduk What I’m mulling is that a cross-profession culture/expectation of drinking is being reflected at prof events.Emily M.
@bradamant @yo_bj @rfunduk: I remember feeling quite alienated as a teetotaler (’til I was 26), till I realized I excluded *myself*.Michael J. Giarlo
@bradamant Now that I’ve read @rfunduk ‘s post I feel that those elements of C4L may come from code conference world a bit.Margaret Heller
@bradamant I wrote some of the copy on that page, but I want to make sure ppl have other low key social events. Hope to do cookie baking!Margaret Heller
@Margaret_Heller @bradamant At the Medical Library Association there’s a ton of drinking as well, but generally at vendor parties.Jenny Reiswig
@Margaret_Heller @bradamant a lot of folks do like a drink when they socialize. Not gonna lie, I’m one of them.Jenny Reiswig
@Margaret_Heller @bradamant But I do agree it needs to be optional and not expected, or the only social option.Jenny Reiswig
@Margaret_Heller @bradamant Most of the folks I know who drink at confs drink just as much at home. Not gonna lie, that’s me too.Jenny Reiswig
@jenfoolery @bradamant I agree & certainly I do drink socially and at home. But do worry about unhealthy culture this encourages.Margaret Heller
@jenfoolery @bradamant which is to say, I’ve ended up getting more drunk around professional colleagues than my friends, which is weird.Margaret Heller
@jenfoolery @bradamant And probably due to a) shyness b) enjoying parties and c) wanting to fit in d) all of the above.Margaret Heller
@jenfoolery @Margaret_Heller MLA parties feel different to me. Maybe I don’t go to the good ones? Alcohol perfunctory, not selling point?Emily M.
@bradamant @Margaret_Heller I haven’t been to MLA since about 2003… maybe it’s calmed down. I remember some crazy Ovid parties.Jenny Reiswig

The text was modified to update a link on September 26th, 2013.

Fixing a Bad SSH authorized_keys under Amazon EC2

I was doing some maintenance on the Amazon EC2 instance that underpins DLTJ and in the process managed to mess up the .ssh/authorized_keys file. (Specifically, I changed the permissions so it was group- and world-readable, which causes `sshd` to not allow users to log in using those private keys.) Unfortunately, there is only one user on this server, so effectively I just locked myself out of the box.

$ ssh -i .ssh/EC2-dltj.pem
Identity added: .ssh/EC2-dltj.pem (.ssh/EC2-dltj.pem)
Permission denied (publickey).

After browsing the Amazon support forums I managed to puzzle this one out. Since I didn’t see this exact solution written up anywhere, I’m posting it here hoping that someone else will find it useful. And since you are reading this, you know that the steps worked.

Solution Overview

Basically we’ve got to get the root filesystem mounted on another EC2 instance so we can get access to it. I’m using placeholder identifiers like i-target, i-scratch, and vol-rootfs in place of real values.

  1. Stop the target EC2 instance (i-target).
  2. Note the device location of its root filesystem and detach its EBS volume (vol-rootfs) from the target instance (i-target).
  3. Attach the volume (vol-rootfs) on another EC2 instance (i-scratch) and mount the filesystem.
  4. Change the file permissions (or whatever needs to be done).
  5. Unmount the filesystem and detach the volume (vol-rootfs) from the other EC2 instance (i-scratch).
  6. Attach the volume (vol-rootfs) to the target EC2 instance (i-target) and start it.

Assuming you’ve got all of the environment variables set up with the appropriate AWS credentials, these are the commands:

Stop the Target Instance

$ ec2-stop-instances i-target

Detach Root EBS Volume

A couple of steps here. We need to remember where the root filesystem volume is attached so we can put it back at the end. So first get a description of the instance. It will look something like this.

$ ec2-describe-instances i-target
INSTANCE	i-target	ami-xxxxxxxx	ec2-[your-IP]	[...lots of other stuff....]
BLOCKDEVICE	/dev/sdh    vol-datafs      2011-07-12T01:37:21.000Z
BLOCKDEVICE	/dev/sda1   vol-rootfs      2011-07-12T01:37:21.000Z

In this case we need to remember /dev/sda1. (Note that we can ignore the vol-datafs — on my instance it is where the database and other data is stored. If you don’t know which volume is your root volume, you might be facing some trial and error in the steps below until you find it.) Now we detach it:

$ ec2-detach-volume vol-rootfs

Attach Volume Elsewhere

This set of instructions assumes that you have another EC2 instance running somewhere else. If you don’t have one, start a micro instance for this purpose then terminate it when you are done. We’re going to attach it as /dev/sdf.
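If you do need to launch a scratch instance, something along these lines with the same EC2 API tools should do it (the AMI and keypair names here are placeholders, not values from my setup):

# Launch a throwaway micro instance; note the instance ID it prints
$ ec2-run-instances ami-xxxxxxxx -t t1.micro -k my-keypair

When you are done, ec2-terminate-instances cleans it up.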

$ ec2-attach-volume vol-rootfs --instance i-scratch -d /dev/sdf

Now log into i-scratch and mount the volume.

$ mount /dev/sdf /mnt

Make Changes

In my case:

$ chmod 600 /mnt/home/me/.ssh/authorized_keys

Unmount/Detach from i-Scratch

While still on the i-scratch server:

$ umount /mnt

Detach the volume from the scratch server.

$ ec2-detach-volume vol-rootfs

Reattach the Volume and Start the Server

We’re on the home stretch now. Note that in the first command we’re using the device location (/dev/sda1) that we noted in the second step.

$ ec2-attach-volume vol-rootfs --instance i-target -d /dev/sda1
$ ec2-start-instances i-target

After the instance starts, you should be able to log in. If not, go through the steps again and read the syslog files in /var/log to figure out what is going on.

Split Routing with OpenVPN

My place of work has installed a VPN that moderates our access to the server network using the OpenVPN protocol. This is a good thing, but in its default configuration it would send all traffic — even that not destined for the machine room network — through the VPN. Since most of what I do doesn’t involve servers in the machine room, I wanted to change the configuration of the OpenVPN client to only send the machine room traffic through the VPN and everything else through the (original) default gateway. As it turns out, this involves tweaking the routing tables.
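A minimal sketch of the client-side idea, assuming the server pushes a default-gateway override and that the machine-room network is 192.168.100.0/24 (a placeholder subnet, not my employer’s actual network):

# OpenVPN client configuration excerpt; the subnet below is illustrative.
# Accept everything the server pushes EXCEPT routes, which drops the
# pushed default-gateway override...
route-nopull
# ...then add back a route for just the machine-room network, so only
# that traffic rides the tunnel; everything else keeps using the
# original default gateway.
route 192.168.100.0 255.255.255.0

With route-nopull in place, only the explicitly listed network goes through the VPN.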

Note to Future Self: Use `ssh -D` to bypass annoying interception proxies

Dear future self,

If you are reading this, you are remembering a time when you ran into a really nasty interception proxy [1] and you are looking for a way around it. Do you remember when you were sitting in the Denver International Airport using their free wireless service? And remember how it inserted advertising banners in HTML frames at the top of random web pages as you surfed?

After about a half an hour of this, you started looking for solutions and found that the secure shell client can act as a SOCKS proxy [2]. Using ‘ssh’, you set up a tunnel between your laptop and a server in the office that encrypted and effectively hid all of your network communications from the interception proxy. And if you are reading this again, you want to remember how you did it.

Set up the SOCKS proxy

SOCKS is a client protocol that can be used to tunnel all of your traffic to a remote host before it fans out across the internet. The OpenSSH client can set up a local SOCKS proxy that uses an ‘ssh’ session as the network tunnel. To set up the tunnel, use the -D option followed by a local port number:

ssh -D 9050 [username]@[hostname]

To refresh your memory, here is an extract from the ‘ssh’ manual page for the -D option:

-D [bind_address:]port
Specifies a local “dynamic” application-level port forwarding. This works by allocating a socket to listen to port on the local side, optionally bound to the specified bind_address. Whenever a connection is made to this port, the connection is forwarded over the secure channel, and the application protocol is then used to determine where to connect to from the remote machine. Currently the SOCKS4 and SOCKS5 protocols are supported, and ssh will act as a SOCKS server. Only root can forward privileged ports. Dynamic port forwardings can also be specified in the configuration file.

Using the SOCKS proxy

MacOSX 10.5 Proxy screen

Next you need to tell the applications to use the SOCKS proxy. If you are still using a Mac when you are reading this, you’ll probably have it pretty easy. Mac OSX lets you set a proxy system-wide that all well-written Mac applications will use to get their parameters. It is in the “Proxies” tab of the Advanced… network settings. On Mac OSX version 10.5 (Leopard), it looks like the graphic to the right.

If you’re using some sort of UNIX variant, the application may have a setting to use a SOCKS client, or you may need to use the ‘tsocks’ shim that intercepts the network calls of the application. And, future self, if you are using a Microsoft Windows box right now, please remember how much simpler life was when you used a Mac or Linux desktop. If you find yourself in such a spot, some reader of this blog posting may have left a comment for you below that will help you use a SOCKS proxy with a Windows platform.
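In case you also forgot the tsocks part, a rough sketch (the server values just mirror the ssh -D command above; lynx is a stand-in for any SOCKS-unaware program):

# In /etc/tsocks.conf, point the shim at the local end of the ssh tunnel:
#   server = 127.0.0.1
#   server_port = 9050
#   server_type = 5
# Then wrap the program so its network calls go through the proxy:
$ tsocks lynx http://example.com/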

Hope this helps. Sincerely,

Self, circa February 2008


  1. Version of the “Proxy Server” Wikipedia page when this posting was written []
  2. Version of the SOCKS Wikipedia page when this posting was written []

DLTJ Updated, Readers Yawn

At least I hope that is the correct headline. I’ve been having some problems with this installation of WordPress lately — in particular, I could no longer activate or deactivate plugins — and the only solution offered in the WordPress codex was to start with a fresh installation of WordPress. Now you know how I spent my free time this weekend. While doing so, I updated the Barthelme theme and along the way added some really neat Semantic Web coolness to the underlying XHTML of the blog pages. The version of Barthelme is still a heavily, heavily hacked one, but hopefully the cleanup this weekend will make it possible to keep up with new versions of the underlying theme files without major headaches. I also updated all of the plugins and cleaned out lots of old cruft in the plugins directory and in the theme files. As a result, the pages seem to load faster. Maybe that is just my wishful thinking.

Like the headline says, the average reader shouldn’t notice a difference. One thing I did change was the Permalink structure; I removed the year and month from the URLs and just put the postings behind the word “article”. There are HTTP 301 (“Permanently Moved”) redirects in place, so your browser will be quietly redirected to the new location. If you do notice anything wrong, let me know.

Also, a word of warning for those that try to fix their own WordPress blog, the process of getting your posts, content, and taxonomy terms from your old instance to your new instance is really, really broken. The suggested way to move content is by Exporting to a WordPress-specific XML format on your old blog, then importing it into your new blog. This caused many problems. First, if you are using a 2.3.x version of WordPress, you’ll run into a bug where tags are given numbers rather than the names. The patch — a one line change to the import PHP script — works as advertised. But then you’ll find that, if you have ever deleted a post or term, the act of importing the content will reassign new, sequential IDs — skipping over the deleted IDs in the old database as if they were never there. That’s okay, except many other plugins (most notably in my case, In-Series) rely on the IDs remaining constant because the ID number is stored in other tables. So save yourself a lot of trouble and just copy over the required tables in your underlying database. I used PhpMyAdmin to copy the wp_comments, wp_posts, wp_terms, wp_term_relationships, and wp_term_taxonomy tables from the old database to the new. And everything just worked right after that.
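For the record, each table copy amounts to something like this (the database names here are made up; PhpMyAdmin’s copy function does the equivalent):

# Copy one table, structure and rows, from the old database to the new one
$ mysql -u root -p -e "CREATE TABLE newblog.wp_posts LIKE oldblog.wp_posts; \
    INSERT INTO newblog.wp_posts SELECT * FROM oldblog.wp_posts;"
# ...and repeat for wp_comments, wp_terms, wp_term_relationships,
# and wp_term_taxonomy.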

“Everyone’s Guide to By-Passing Internet Censorship for Citizens Worldwide”

Cover of “Everyone’s Guide to By-Passing Internet Censorship for Citizens Worldwide”

The title of this post is the same as the report it describes, Everyone’s Guide to By-Passing Internet Censorship for Citizens Worldwide [PDF]. It was announced by Ronald Deibert last week on his blog at Citizen Lab. The one-sentence synopsis goes like this: “This guide is meant to introduce non-technical users to Internet censorship circumvention technologies, and help them choose which of them best suits their circumstances and needs.”

Although the stated audience is non-technical users, I found the description of the techniques, and the circumstances under which one might deploy them, very interesting. The document provides guidance for those seeking circumvention and those who want to provide it. After a brief introduction to censorship activities worldwide (including in the United States), it walks the reader through an analysis of needs and describes solutions that meet those needs based on the user’s technical skills. I knew ‘tor’ — a long-time favorite of mine — would be in there, but I was surprised by the range of other options.

To put a library spin on the report, some of the solutions offered are usable on “public computers” — such as, say, what one might find in a library. One could take the report and read about the techniques with the intent to block them on one’s public workstations, but I think another reading of it would say that such attempts are ultimately futile because of the likelihood of similar services popping up to take their place. Unless you are running a white-list-only setup (that is to say, your public workstations are explicitly set to only allow access to a prescribed set of sites), any user can walk up to any public workstation and access the circumvention sites described in the report or any others that spring into existence.

The circumvention techniques, of course, do not provide an assurance of privacy. Even though the network traffic is encrypted, the activities of the user can still be monitored by keystroke loggers and other techniques on the workstation itself. To get around that, one would need to restart the public workstation with a bootable Linux distribution, but that is perhaps a report for another time…

The text was modified to update links on January 28th, 2011.

Killing Off Runaway Apache Processes

Well, something is still going wrong on this server — despite previous performance tuning efforts, I’m still running into cases where machine performance grinds to a halt. In debugging it a bit further, I’ve found that the root cause is an apache httpd process that wants to consume nearly all of real memory, which then causes the rest of the machine to thrash horribly. The problem is that I haven’t figured out what is causing that one thread to want to consume so much RAM — nothing unusual appears in either the access or the error logs and I haven’t figured out a way to debug a running apache thread. (Suggestions anyone?)

Found it! It was a WordPress plug-in plus a change to the PHP configuration that was causing the problem. The fix for the fundamental cause of the problem came from a comment timestamped February 8th, 2007 at 3:55 pm on the Footnotes 0.9 Plugin for WordPress 2.0.x page. An infinite loop was consuming both CPU cycles and RAM, and this was exacerbated by a change I made to the maximum CPU execution time for PHP scripts that was required in order to play with the IP City Cluster plug-in. With the patch to the Footnotes plug-in, the server has gone 12 hours without a run-away apache process.

In any case, I whipped up this little ditty that is running every five minutes in cron as a way to gloss over the problem for the moment. Running as root, it looks into all of the processes in the virtual /proc file system, specifically in the ‘stat’ file, and using awk checks whether the second space-delimited value is the name of the httpd process (this is the Gentoo Linux distribution, so the name of the process is apache2) and the 23rd space-delimited value (the virtual size of the process) is bigger than 800MB. If so, it prints out the PID of the process (the first value in the stat file), at which point the bash script unceremoniously sends it a kill (‘-9’) signal. The script looks like this:

#!/bin/bash
for i in `/bin/ls -d /proc/[0-9]*`; do
        if [ -f $i/stat ]; then
                # Field 2 is the process name; field 23 is the virtual size in bytes
                pid=`/bin/awk '{ if ($2 == "(apache2)" && $23 > 800000000) print $1}' $i/stat`
                if [ "$pid" != "" ]; then
                        echo "Killing $pid because of load average: `awk '{print $1}' /proc/loadavg`"
                        kill -9 $pid
                fi
        fi
done
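For reference, the “every five minutes in cron” part amounts to an entry along these lines in root’s crontab (the script path is a placeholder):

# Run the runaway-apache check every five minutes
*/5 * * * * /usr/local/sbin/kill-bloated-apache.sh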

If anyone has any suggestions as to how to narrow down what the problem might be, I’d appreciate hearing from you. I’ve tried eliminating WordPress plugins, recompiling WordPress and Apache, and attempted to catch the behavior with a network traffic sniffer, but have come up empty so far.

The text was modified to update a link on August 22nd, 2013.

WordPress/MySQL Tuning

DLTJ runs on a relatively tiny box — a Pentium III with 512MB of RAM. I’m running a Gentoo Linux distribution, so I actually have a prayer of getting useful work out of the machine (the server is actually a recycled Windows desktop), but the performance just wasn’t great. As it turns out, there are several easy things one can do to dramatically improve life.

The Configuration

The box is both a mail server (IMAP) and a WordPress server. A rough eyeball at the process accounting on the server shows that it spends about 40% of the time doing mail (mostly taken up by Clamscan virus scanning and spam checking) and another 40% doing MySQL and web stuff. Since there isn’t much dynamic content on the box and nothing else using the database but WordPress, I’m fairly confident that blog traffic is almost all of that 40%. I’m using MySQL 5.0.x, Apache 2.0.x and WordPress 2.0.x with about two dozen plugins.

Taking PHP Up A Notch

PHP is an interpreted programming language, meaning that each time a script runs it needs to be translated into something closer to machine code (called the ‘opcode’). (As opposed to compiled languages like C and Java, where you compile the source code into an executable in one step and then run that executable in a second step.) For an application like WordPress, where the source code is not changing, this translation causes a lot of overhead. Fortunately, there is a PHP plug-in called the Alternative PHP Cache that will save the translated opcode the first time the script runs and reuse it for subsequent invocations. Getting this set up is pretty easy (these are Gentoo-specific commands; your Linux distribution will vary, and I am glossing over a number of distribution-specific details like how to install packages and where the configuration files will reside):

  1. emerge -aDNtuv pecl-apc will download and install PHP APC and its dependencies (yep — that easy…I love Gentoo)
  2. Change the configuration defaults in /etc/php/apache2-php5/ext/apc.ini. I’ve found that one shared segment of 20MB is enough, so I set apc.shm_size="20". The rest of the settings are as they came in the distribution.
  3. Restart your web server: /etc/init.d/apache2 restart

APC comes with a nifty PHP page that will give you cache statistics and details. If you copy /usr/share/php5/apc/apc.php into your ‘htdocs’ somewhere and execute that page from a browser, you’ll see what I mean. (This is how I learned that 20MB of opcode cache space was fine for my application.)

Kicking MySQL Into Gear

Database tuning focuses a great deal on memory management. Your RAM will always be an order of magnitude faster than reading blocks off a disk. RAM, of course, costs more per MB than disk, though, so you have to select memory management strategies carefully. WordPress is, of course, a read-intensive operation. In other words, the majority of SQL statements are SELECTs rather than INSERTs, UPDATEs, or DELETEs. With that in mind, we tune MySQL with a read-intensive strategy. I found some of the best guidance in Peter Zaitsev’s “What to tune in MySQL Server after installation” and the documentation on Optimizing for read performance.

The changes I made to my MySQL configuration file, in the [mysqld] section are:

key_buffer = 6M         ; (Actually, a decrease from the default since I didn't seem to need as much)
table_cache = 512
max_connections = 25
thread_cache = 16
query_cache_type = 1
query_cache_limit = 1M
query_cache_size = 20M

The 20MB query cache limit seems to be just about the right size for me. I seem to get very close to the edge of that buffer, but never seem to go over.
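If you would rather measure that edge than eyeball it, MySQL’s stock status counters will tell you (nothing WordPress-specific here):

# Show the query cache counters
$ mysql -u root -p -e "SHOW STATUS LIKE 'Qcache%'"
# If Qcache_free_memory hovers near zero and Qcache_lowmem_prunes keeps
# climbing, the cache is too small; if most of it sits free, you can
# give the memory back.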

Finishing Up with a WordPress Plug-in

One more thing is needed to make this all come together: Mark Jaquith’s Post Query Accelerator. As Mark points out on his blog, WordPress “always ask[s] for posts with post_date_gmt <= ‘$now’ where $now is set to the current time, to prevent posts in the future from showing up.” If one turns on query caching as described above, the “problem with $now is that it changes [with each query], so the query is never exactly the same again and the cache doesn’t help.” Mark’s plug-in “freezes” the value of $now to 15-minute increments or to whenever a post is added/updated, whichever comes first. That makes the query cache useful again and all is well.

Simply download the plug-in from Mark’s page and enable it in WordPress. Note that this plug-in is not needed for WordPress 2.1 and higher as the core developers have solved the “$now” problem with the “future” post status.

Getting Around Drupal’s Prohibition of @ Characters in User Ids

A while back we created an LDAP directory to consolidate account information for various back-room services, and when we created it we decided to use the individual’s e-mail address as the account identifier (uid in LDAP-speak). It seemed like the logical thing to do — it is something that the user knows and it is a cheap and easy way to ensure that the account identifiers will be unique. This is not uncommon for many internet services, of course.

Now we’re bringing up a Drupal content management system and of course want to tie the authentication into the existing LDAP directory. The initial configuration appeared to work, but there were odd, unexplained failures — most notably, Drupal would not consider it a ‘real’ account because it didn’t have an e-mail field. Even weirder was the fact that we had configured Drupal to know exactly which LDAP attribute to use as the e-mail address (mail, in LDAP-speak). It wasn’t until one of our system engineers wondered out loud if the at-sign (‘@’) in the user id wasn’t causing problems that we started making progress towards a solution.

As it turns out, he was right. Without having spent enough time in the guts of the Drupal code to know exactly if this is true, it seems like Drupal wants to reserve the ‘@something’ construct for inter-Drupal authentication. In other words, if you have an account on one Drupal server (let’s call it DrupalA) and want to access a second (let’s call it DrupalB) — and if the two servers agree to share user accounts — the account from DrupalA would be recorded in the database of DrupalB as “UserId@DrupalA”.

The ‘at’ symbol for us, though, is just a normal part of an e-mail address. We really didn’t want to reconstruct our LDAP account scheme, so the best choice seemed to be to find a way to trick Drupal into accepting these account identifiers. This, unfortunately, was no easy task. I couldn’t find the root cause of the problem, but did diagnose enough of the symptoms to force a patch into the system. The patch, in the form of a new module (code included below), forces the account to have two necessary attributes that seem to go missing whenever a ‘@’ character appears in the user id. If you have similar problems, I can’t claim that this will work for you, nor can I guarantee this approach will be supportable in the future. All I know is that it seems to work for us in our situation right now.

<?php
function olinkldap_help($section) {
  $output = '';
  switch ($section) {
    case 'admin/modules#olinkldap':
      $output = 'olinkldap';
      break;
    case 'admin/modules#description':
    case 'admin/help#olinkldap':
      $output = t('Sets up OhioLINK-specific LDAP parameters.');
      break;
  }
  return $output;
}

function olinkldap_settings() { }

function olinkldap_user($op, &$edit, &$user, $category = NULL) {
  switch ($op) {
    case 'load':
      olinkldap_user_load($user);
      break;
  }
}

function olinkldap_user_load(&$user) {
  // Calculate the DN for the user -- you'll need to adjust this to match your LDAP base DN
  $ldap_dn = sprintf("uid=%s,ou=People,dc=somewhere,dc=outthere", $user->name);
  // Create a new array with the two LDAP-specific values that seem to be missing.
  $forced_data = array('ldap_authentified' => 1, 'ldap_dn' => $ldap_dn);
  // It seems like this should work, but it doesn't (it throws a segmentation fault):
  //   user_save($user, $forced_data);
  // so we're going to interact directly with the database.
  if ($user->uid) {
    // Get the 'data' field for the user and put it in the $data array
    $data = unserialize(db_result(db_query('SELECT data FROM {users} WHERE uid = %d', $user->uid)));
    // Put all of the attributes from $forced_data into $data
    foreach ($forced_data as $key => $value) {
      $data[$key] = $value;
    }
    // Reserialize the $data array and update it in the database
    db_query("UPDATE {users} SET data = '%s' WHERE uid = %d", serialize($data), $user->uid);
  }
}

Save this as ‘olinkldap.module’, update the DN to reflect your LDAP server’s base DN (see comment in code), copy it into your Drupal modules directory, and activate it. Your ‘@’-impaired userids should start working again. If you are using the inter-Drupal account sharing (we’re not) this might break something for you. That’s not interesting for us, so I’m not testing it against that condition. If you use this and find that it works or doesn’t work, or you have a better way of solving the problem, please leave a comment or trackback…

Managing a Gentoo Linux Server Configuration with Subversion, GLCU, and Trac

Keeping track of configuration changes to servers is a tough job made tougher when some of the sysadmins work from home. Questions of who did what when and why can be exacerbated by the lack of physical proximity — in other words, I can’t simply yell over the cubicle wall to the colleague down the hall to ask him about the new package installed on the server. Besides, that oral history tradition is difficult to maintain and harder to sustain as the number of machines grows. This essay describes a practice for maintaining a Gentoo Linux distribution using GLCU, Subversion, and Trac that is lightweight (doesn’t impose a large burden on the sysadmin staff), effective (although it is lightweight, it documents the state of our systems better than the oral history tradition and makes that record accessible), and cheap (no operating budget dollars were harmed in the creation of this process — only staff time overhead).

Create an All-Encompassing Configurations Directory

The first step is to put the system configuration files into a revision control system (RCS). An RCS allows us to track the history of files by storing information about changes such as the date/time a change was made, what the change was, who made it, and a free-text field explaining why the change was made. RCS systems are common for software development shops as a way to track changes to source code. In this circumstance we are tracking changes to the text configuration files that make up the operating system and its components. We are using the Subversion RCS, but the same concepts apply whether you are using other systems (such as CVS or Arch).

The RCS will want to act on a single directory tree, but in most cases our configuration files are spread out over the file system. Most are in /etc, but others exist elsewhere. (The portage “world” file, a record of everything installed on your system, for instance, is in /var/lib/portage.) What we do is create a directory called /server-rcs that will be managed by the RCS, and in that directory are copies of, or links to, all of the configuration files on the system.

Putting /etc (or any other directory) Under Version Control

One of the things we’re going to want to do, obviously, is put the entire /etc directory into the RCS. Ideally, we would simply put a link to /etc in /server-rcs. Unfortunately, we can’t use the simple filesystem-based linking methods (soft links and hard links) because a) our RCS is smart enough to see the soft link and records it as a soft link in the revision control database rather than following the link to the contents of that directory; and b) one cannot make a hard link to a directory:

/server-rcs # ln ../etc .
ln: `../etc': hard link not allowed for directory

What we need to do instead is a trick using the ‘mount‘ command to bind one portion of the file system to another part. From the mount MAN page:

Since Linux 2.4.0 it is possible to remount part of the file hierarchy somewhere else. The call is

mount --bind olddir newdir

After this call the same contents is accessible in two places. One can also remount a single file (on a single file).

So we can bind the entire /etc directory into our RCS space with this command:

mount --bind /etc /server-rcs/etc

Better yet, we put this in our /etc/fstab file (adding the /var/spool/cron directory as well):

/etc                            /server-rcs/etc                         none    bind
/var/spool/cron/crontabs        /server-rcs/var-spool-cron-crontabs     none    bind

Since the /etc directory (and other directories) already exist, we’re going to have to play some games to get them into the repository. For the trick to do this with Subversion, see the FAQ entry on in-place imports.
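The dance from that FAQ entry looks roughly like this (using the same placeholder repository URL that appears later in this post):

# Prime the repository with an empty directory for /etc
svn mkdir https://svn.repository.url/svn/configurations/server/etc -m "Prime the repository for /etc"
# Turn the existing directory into a working copy of that (empty) location
cd /server-rcs/etc
svn checkout https://svn.repository.url/svn/configurations/server/etc .

The checkout into the existing directory leaves the current files in place as unversioned items, which the ‘svn add --force’ below will pick up.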

Handling Individual Files Under Version Control

Not everything we want to track is in /etc or neatly packaged into directories. Some application-specific configuration files, most notably web applications, exist somewhere else in the directory structure. We want to track things like the ‘phpmyadmin’ configuration file, for instance.

We could use the mount ‘bind’ trick to put individual files into the /server-rcs space, but that seems overly complicated. Our servers are generally configured with few filesystems, so in many cases the files we need to track in the RCS are within the same filesystem and we can use hard links to put them into the /server-rcs directory. Another alternative is to write a cron job to copy configuration files into the /server-rcs directory, but then realize that this kind of revision control is one way — if we restore a previous version of a file from the RCS, we need to manually copy it back to the original location.
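As a concrete (if hypothetical) example of the hard-link approach, tracking the phpmyadmin configuration might look like:

# Hard-link the config file into the tracked tree (same filesystem required)
ln /var/www/localhost/htdocs/phpmyadmin/config.inc.php /server-rcs/misc/phpmyadmin-config.inc.php

Both directory entries point at the same file, so an edit (or an RCS-driven revert) shows up at either path.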

(On the other hand, using the mount ‘bind’ method is a form of self-documenting the otherwise invisible hard links to files in the same filesystem. For that reason, it might be worth considering that option.)

Special Case: /var/lib/portage/world

One special case is the portage ‘world’ file. This file records all of the user-specified (i.e., non-profile) packages that have been installed on your Gentoo system. Unfortunately, each time ‘emerge’ runs, the world file is rewritten and the order of package names is seemingly random. This wreaks havoc with the ‘diff’ function of the RCS — it seems like a lot more has changed than just the addition or removal of a package or two.

What we do instead is patch into a hook of the ’emerge’ command that will save a sorted copy of the world file into /server-rcs. This patch goes into /etc/portage/profile/profile.bashrc:

if [ "${EBUILD_PHASE}" == "setup" ]
        sort /var/lib/portage/world &gt; /server-rcs/misc/var-lib-portage-world

Every time ’emerge’ goes through the ‘setup’ mode when installing a package, it will run this sort command. Note that there is no file locking going on here, so there is a remote chance that the /server-rcs version (but not the /var/lib/portage version) could get corrupted. Such a problem is minor, though, and easily fixed.

Importing into Subversion

With the /server-rcs directory prepared, we now just need to get it into the RCS. These are Subversion commands:

svn add --force /server-rcs
svn checkin /server-rcs --message "Importing the configuration files for the server"

Because of the in-place import problem for pre-existing directories (described earlier), we likely had to create some of the repository directory structure already. (In this example, we would have executed a svn mkdir https://svn.repository.url/svn/configurations/server/etc command already to “prime the pump” for adding /etc to the repository.) In line #1, the --force option makes the ‘svn add’ command continue the recursive directory parse to add files and directories to the RCS structure even if some component of those paths were already in the RCS structure. Line #2 checks in our completed /server-rcs directory.

Daily Usage

With all of this setup done, it is finally time to make use of this configuration management infrastructure. Doing so is pretty easy — work as you normally do when installing packages and making changes to configuration files. (As you do so, you also have the added safety net of svn revert filename should you make a mistake and want to go back to the previous version of a file.) When you’ve done a defined chunk of work, simply run these commands:

svn status /server-rcs
svn checkin /server-rcs -m "Free-text description of why you made the changes."

The first line will show you the files modified since the last check-in — hopefully only the files you intended to modify, although this is a good point to check to make sure an inadvertent change didn’t happen. The second line will copy changes to the /server-rcs directory into the RCS along with the free-text note describing why you made the change.

Isn’t this great? It is sort of self-documenting. Not only do you have your brief description of what you did, but you also have the exact changes made to the configuration files. If a change doesn’t work out, you have easy access to past configurations that allow you to revert back to a previous state. (Note, though, that we’re not saving actual applications in the RCS — you may have to recompile and install older versions of applications to get back to the previous state.)

Portage Updates with GLCU

We can make our system management lives even easier by using the semi-automated tool Gentoo Linux Cron Update (GLCU). This script breaks up the process of updating packages into two pieces. The first piece runs in the off-hours via cron: it syncs the local portage tree, downloads and compiles updated packages, and stages ready-to-install binary distributions of those updates. The second piece has the human interface: seeing the list of updated packages in the staging area, selecting which to install, and prompting the sysadmin to install any updates that result from Gentoo Linux Security Announcements (GLSAs).

See the project on SourceForge for all of the details on installing, configuring and running GLCU. We make one tweak to the GLCU configuration to prompt the sysadmin to complete all of the housekeeping chores: running dispatch-conf to merge changes to configuration files and revdep-rebuild to make sure all of the applications using updated linked libraries are properly recompiled. To do this, add a line to /etc/conf.d/glcu:

updatetc: dispatch-conf && revdep-rebuild -X -pv

A typical update for us looks like:

# glcu
/tmp/glcuUpdate-23112
****************************************
>> Welcome to glcu's easy update feature
Prebuilt packages:
------------------
(  1 ) [binary     U ] app-editors/nano-2.0.1 [1.3.12-r1] USE="ncurses nls spell unicode -debug -justify -minimal -slang"
(  2 ) [binary     U ] media-libs/libsdl-1.2.11 [1.2.8-r1] USE="X esd* -aalib -alsa -arts -dga -directfb -fbcon -ggi -libcaca -nas -noaudio -noflagstrip -nojoystick -novideo -opengl -oss -svga -xinerama -xv (-pic%)"

 Do you want to install the prebuilt package(s) [Y/n]
    (or you can either install only specified package number(s) #,
     or NOT install package with -# and use i# for injecting)
> y
[...pre-compiled packages are installed...]
>>> Auto-cleaning packages...
>>> No outdated packages were found on your system.
 * GNU info directory index is up-to-date.
 * IMPORTANT: 1 config files in /etc need updating.
 * Type emerge --help config to learn how to update config files.
glsa's:  ['200612-03']
(  1 ) 200612-03 [N] GnuPG: Multiple vulnerabilities ( app-crypt/gnupg )

 Do you want to fix all glsa's now? [Y/n]
    (or you can either install only specified glsa number(s) #,
     or NOT install glsa with -# and use i# for injecting)
> y
[...packages related to the GLSA are downloaded, compiled and installed...]

 Do you want to run dispatch-conf && revdep-rebuild -X -pv now? [Y/n]
> y
[...dispatch-conf and revdep-rebuild are run...]

With the system nicely updated, we can check in all of the changes to the RCS with a note about what we did:

svn ci -m "After running 'glcu' to update app-editors/nano, media-libs/libsdl, and GLSA for app-crypt/gnupg"

Tracking Configuration Changesets and Trouble Tickets with Trac

So far we’ve done quite a lot to document changes to the configuration of our server. What we’re missing is a nice way to view and track those changes over time. Since everything is in the Subversion RCS, one way to accomplish this is to put a web interface on top of the Subversion repository. For just a little bit more effort and complexity, though, we can have a very nice documentation and issue tracking system bundled with the display of our configuration changes repository by using Trac.

Trac is an open source wiki and issue tracking system for software development projects. Its stated mission is to “help developers write great software while staying out of the way.” In this case we’ll be using it to help sysadmins manage complex systems while staying out of the way. Trac is a web-based tool that “allow wiki markup in issue descriptions and commit messages, creating links and seamless references between bugs, tasks, changesets, files and wiki pages. A timeline shows all … events in order, making the acquisition of an overview of the [state of the system] and tracking progress very easy.”

Trac is synchronized with our Subversion source code repository, so the timeline of changes (demo) shows each check in to the Subversion RCS (demo), which can be tied to an issue ticket (demo) for a problem or task that is requested, worked on, then closed via simple wiki-like markup. One can also browse through the stored changes (demo) and look at a graphical difference between any two revisions of a file (demo) but also review the log of check in messages (demo) associated with that file over time.


With a few tools and some modest changes to current system maintenance practices, the history of the configuration of machines can be documented and the changes viewed over time. The changes in practices are designed to be very minimal and simple yet return a large payoff over time if consistently followed. The practices also enhance communication between geographically dispersed staff tasked with managing the same platforms by regularly creating snapshots of the configuration state and documenting who did what changes and why.