The Scariest Bug

Dec 19th 2011 by Tom Coleman

The scariest bug is the bug you can’t reproduce

Disclaimer: I’ve never worked at NASA, or designed guidance systems for ICBMs, or even worked on a product that is mission-critical to people’s workflows; so don’t expect the potential consequences of this bug to scare you quite as much unless you, like me, develop user-facing web applications.

Just about the worst possible side effect

After wrapping up the main development phase of our latest release, Zol stumbled upon a bug that transformed one person into another. Somehow we had managed to create a super bug that would allow you to shape-shift. Wonderful and frightening all at once.

What he was supposed to see

What he did see

Simply by hitting refresh on the bindle pictured above, in the blink of an eye, he was me: able to comment, post bindles, and generally live it up. (As you can see, I have admin rights as well, so his potential for mischief was high!)

Needless to say the discovery of this bug a few days before our release was more than a little scary; having a normal user suddenly find themselves logged in as an admin user simply by viewing a random bindle is a dangerous scenario.

A ghost in the machine?

Zol discovered this bug at the same time that I was tinkering around on the server, so our initial hypothesis was some kind of weird interaction between our sessions. This would have been a bit of a worst-case scenario: writing the bug off as some kind of unreproducible gremlin in the system and then (trying) to forget about it.

This is something you should never, ever do if you can help it. Sure, you are using a large complicated stack of technologies, and I’m sure that you don’t fully understand every piece involved. But especially for a bug with such serious consequences:

Fight as hard as possible to reproduce the bug, and then find out why it happens.

And fight you must! You don’t necessarily need to fix the bug, but fix it if you can. I believe it’s impossible to achieve quality software if you are satisfied with unexplained hiccups. Understanding the quirks of a product is the first step in mastering how it works.

Cookies the culprit, always the cookies

Luckily, the bug was easy to reproduce. Refreshing that bindle would cause any user to switch over to my account, or more often, to Zol’s account. The behavior was random, but at least semi-predictable — around 90% of the time on that bindle I would become Zol.

After trying various things, I soon realized the problem only happened on bindle pages, and even then, not on the page of a brand new bindle. Firefox’s privacy dialog confirmed my suspicion that my session was mysteriously being changed right under me.

Aside: Cookie and Internet Sessions

Know how the internet works? Skip this bit.
How does a server remember who you are when you request a new page from a website? Your browser sends a cookie to the server along with the request (give me ‘http://bindle.me/bindles/2’, I am ‘Tom Coleman’!). Except that, to maintain a modicum of security, the identifier is not your name but a frequently changing key that the server maps to ‘Tom Coleman’. The browser learns which key to use when the server sends a response with a ‘Set-Cookie: xxxxxxxxx’ header. Over an unencrypted connection this simple model is easy to attack with a man-in-the-middle or a packet sniffer, hence the success of things like Firesheep.
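To make the key-to-user mapping concrete, here is a minimal sketch in Python. It is not how bindle.me or Rails implements sessions; the `SESSIONS` store, the `session_id` cookie name, and both function names are made up for illustration.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical in-memory session store: opaque key -> user name.
SESSIONS = {}

def log_in(user):
    """Create a session for `user` and return a Set-Cookie header value."""
    key = secrets.token_hex(16)          # unguessable key, not the user's name
    SESSIONS[key] = user
    cookie = SimpleCookie()
    cookie["session_id"] = key           # cookie name is an assumption
    cookie["session_id"]["httponly"] = True
    return cookie["session_id"].OutputString()

def who_is_asking(cookie_header):
    """Map the Cookie header of an incoming request back to a user."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return SESSIONS.get(morsel.value) if morsel else None
```

Whoever presents the key is treated as the user it maps to, so if a response hands your browser somebody else's key, the server will believe you are them on the next request.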

How do we tell when the session is being changed by the server? We use my favorite diagnostic tool, HttpScoop. With HttpScoop we were able to confirm that it was in fact the responses to the requests for the image thumbnails inside the bindle that were setting the cookie to random values (or, more precisely, to the values of the wrong sessions).
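The same check can be scripted against any capture of request/response headers. The sketch below assumes the capture has already been turned into a list of (path, headers) pairs; that data shape and the function name are my own invention, not anything HttpScoop exports.

```python
def images_setting_cookies(exchanges):
    """Return the paths of image responses that carry a Set-Cookie header.

    `exchanges` is a list of (path, response_headers) pairs, where
    response_headers is a list of (name, value) tuples.
    """
    offenders = []
    for path, headers in exchanges:
        # Header names are case-insensitive, so normalise before comparing.
        lowered = [(name.lower(), value) for name, value in headers]
        is_image = any(
            name == "content-type" and value.startswith("image/")
            for name, value in lowered
        )
        sets_cookie = any(name == "set-cookie" for name, _ in lowered)
        if is_image and sets_cookie:
            offenders.append(path)
    return offenders
```

A page response setting a cookie is normal; a thumbnail doing so is the smell this is meant to surface.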

The aha! moment

Images in a bindle are thumbnailed on-the-fly by the excellent Dragonfly, and thus these image requests enter the Rails stack. We had managed to isolate the bug down to the following simple behaviour. Open this image:

And you transform into me. Open this one:

And you will be Zol.

At last, reproducible behavior. (This also explained the 90% Zol / 10% me randomness I was seeing before: of the 6 images on the bindle, 5 would make you Zol and one would make you me, so you became Zol around 83% of the time. Whichever image finished loading last would take control of the cookie.)
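The arithmetic is small enough to spell out. This sketch assumes each of the six thumbnails is equally likely to finish loading last, which is a simplification of mine; real load order depends on image size and caching.

```python
from fractions import Fraction

def final_identity(owners_in_arrival_order):
    """The last Set-Cookie to arrive overwrites all earlier ones."""
    return owners_in_arrival_order[-1]

def chance_of_becoming(images, user):
    """Probability of ending up as `user`, assuming every image is
    equally likely to be the last one to finish loading."""
    matching = sum(1 for owner in images if owner == user)
    return Fraction(matching, len(images))

# Whose session each of the six thumbnails handed back: 5 Zol, 1 Tom.
images = ["Zol"] * 5 + ["Tom"]
print(chance_of_becoming(images, "Zol"))  # 5/6, roughly 83%
```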

At this point the battle was won. Although there was more work to do (I’m not even completely sure yet if this is a bug in Dragonfly, or a misconfiguration on my part), I knew enough about the bug to be able to say:

  1. I know why it is happening.
  2. I’ll know for sure when it is no longer happening.

So the bug can be fixed, or at least worked around, and I can take confidence in the fact that I know exactly what happened to cause the bug and that it won’t happen again (at least not in that form!).

Quality is about setting the bar for what is good enough. When it comes to unexplained behavior, I doubt it’s possible to set the bar too high.

Tom Coleman

Co-creator of bindle.me, searching for simplicity, quality and elegance in technology, products and code.
