April 2009
April 10, 2009
Experiences deploying a large-scale infrastructure in Amazon EC2. “At OpenX we recently completed a large-scale deployment of one of our server farms to Amazon EC2. Here are some lessons learned from that experience.”
Digg Search: Now With 99.987% Less Suck. Really nice implementation of faceted search, still using Lucene and Solr under the hood.
April 11, 2009
Using Scala with Google App Engine. Scala works, but I haven’t seen confirmation on actors yet (which are likely to break due to their dependency on threads).
rev=canonical bookmarklet and designing shorter URLs
I’ve watched the proliferation of URL shortening services over the past year with a certain amount of dismay. I care about the health of the web and try to ensure that URLs I am responsible will last for as long as possible, and I think it’s very unlikely that all of these new services will still be around in twenty years time. Last month I suggested that the Internet Archive start mirroring redirect databases, and last week I was pleased to hear that Archiveteam, a different organisation, had already started crawling.
[... 920 words]April 12, 2009
Revving up. Jeremy Keith advocates adding the revcanonical attribute to regular A elements as well as / instead of hiding it in the head of the document, following the microformats design principle that invisible metadata is less valuable than augmenting visible links. I’ve updated my shorten bookmarklet to handle this case.
A rev=“canonical” HTTP Header. Chris Shiflett proposes optionally exposing rev=canonical information in an HTTP header, thus allowing sites to discover shorter URLs using just a HEAD request and removing the need to parse HTML. The pingback specification also uses this shortcut.
Running Rhino and Helma NG on Google App Engine. Helma NG is a JavaScript web app framework, which now works on App Engine out of the box.
Tweenbots: Cute Beats Smart. How do you build a robot that can get from one end of Washington Square Park to the other without your help? Give it a cute smile and a sign explaining where it’s going and rely on strangers to point it in the right direction along the way.
The App Store has an inscrutable, time-consuming, whim-dependent approval process. The App Store newsgroup postings are full of angry claims that this is a bug, but I bet it's a feature. If you can't get an app approved until it's working perfectly, and you have to wait a week or two -- or more -- between approval rounds, you're much more likely to put a lot more effort in up front to get it right.
We’re using the same trick on flic.kr to avoid having to maintain a look up database, though we’re using base 58.
17-year-old claims responsibility for Twitter worm. It was a text book XSS attack—the URL on the user profile wasn’t properly escaped, allowing an attacker to insert a script element linking out to externally hosted JavaScript which then used Ajax to steal any logged-in user’s anti-CSRF token and use it to self-replicate in to their profile.
April 13, 2009
django-shorturls. Jacob took my self-admittedly shonky shorter URL code and turned it in to a proper reusable Django application.
I like rev=“canonical”. Les Orchard summarises the current debate over what colour to paint the rev=“canonical” bikeshed.
favikon.com. Small, easy to use online favicon generator.
How to cause moral outrage from the entire Internet in ten lines of code. Looks legit—the author claims to have sparked this weekend’s #amazonfail moral outrage (where Amazon where accused of removing Gay and Lesbian books from their best seller rankings) by exploiting a CSRF hole in Amazon’s “report as inappropriate” feature to trigger automatic takedowns. EDIT: His claim is disputed elsewhere (see comments)
tinyarchive.org. Blaine Cook’s archive of 301 and 302 redirects—needs to be automatically updated by a crawler for it to be really useful though.
April 14, 2009
Amazon Says Listing Problem Was an Error, Not a Hack (via) “A friend within the company told him that someone working on Amazon’s French site mistagged a number of keyword categories, including the ’Gay and Lesbian’ category, as pornographic, using what’s known internally as the Browse Nodes tool. Soon the mistake affected Amazon sites worldwide.”
Visualising Sorting Algorithms. Aldo Cortesi dislikes animations of sorting algorithms, so he designed a beautiful technique for statically visualising them instead (using Python and Cairo to generate the images).
You guys are moving on this stuff too fast! Welcome to 2002, when lots of us had more spare time than employment and we deployed new crap like this on our blogs and sites daily.
Reducing XSS by way of Automatic Context-Aware Escaping in Template Systems (via) The Google Online Security Blog reminds us that simply HTML-escaping everything isn’t enough—the type of escaping needed depends on the current markup context, for example variables inside JavaScript blocks should be escaped differently. Google’s open source Ctemplate library uses an HTML parser to keep track of the current context and apply the correct escaping function automatically.
Counting the ways that rev=“canonical” hurts the Web. Mark Nottingham complains about misapplied trust (a page can falsely claim to be the canonical URL for another page), the easy confusion between rev and rel and the lack of discussion with relevant communities.
London’s abandoned Underground Stations on Google Street View. “The network is littered with buildings that belonged to stations that closed their doors to the public because routes were changed and diverted, or because there was just too little traffic to make them viable. Here are some of the remnants of disused Underground stations that you can see on Google’s Street View of London.”
We did some studies and found that the attribute was almost never used, and most of the time, when it was used, it was a typo where someone meant to write rel="" but wrote rev="". To be precise, the most commonly used value was rev="made", which is equivalent to rel="author" and thus was not a convincing use case. The second most common value was rev="stylesheet", which is meaningless and obviously meant to be rel="stylesheet".
April 15, 2009
10 Cool Things We’ll Be Able To Do Once IE6 Is Dead. Highlights include child and attribute selectors, 24bit PNGs and max-width and min-width. Simple pleasures, but I can hardly wait.
April 16, 2009
(Yet) Another DiggBar Update. Digg are responding in exactly the right way in my opinion—the DiggBar will start returning 301 redirects for anonymous users, while users who are logged in to Digg can opt-out of the feature if they want to (usage statistics show that most Digg users are fine with the feature).
Developing Django apps with zc.buildout. Jacob went ahead and actually documented one of Python’s myriad of packaging options.
April 17, 2009
Cross Browser Base64 Encoded Images Embedded in HTML (via) Scarily clever. View the PHP source to see what’s going on—most browsers get image tags that use data URIs starting with data:image/png;base64, but IE gets served a Content-type:message/rfc822 header and a MIME formatted multipart/related document, as used by e-mail clients to embed inline image attachments.
Installing CouchDB from source on OS X. So far I’ve just been playing with it in an Ubuntu virtual machine.
Drop ACID and think about data. I’ve been very impressed with the quality and speed with which the PyCon 2009 videos have been published. Here’s Bob Ippolito on distributed databases and key/value stores.
Paul Buchheit: Make your site faster and cheaper to operate in one easy step. Paul promotes gzip encoding using nginx as a proxy, and mentions that FriendFeed use a “custom, epoll-based python server” as their application server. Does that mean that they’re serving their real-time comet feeds directly from Python?