<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/atom10full.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.simonwillison.net/~d/styles/itemcontent.css"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-gb"><title>Simon Willison's Weblog Entries</title><link href="http://simonwillison.net/" rel="alternate" /><id>http://simonwillison.net/</id><updated>2010-02-16T09:11:21Z</updated><author><name>Simon Willison</name></author><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/atom+xml" href="http://feeds.simonwillison.net/swn-entries" /><feedburner:info xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" uri="swn-entries" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><entry><title>Some questions about the "blocking" of HTML5</title><link href="http://simonwillison.net/2010/Feb/16/html5/" rel="alternate" /><updated>2010-02-16T09:11:21Z</updated><id>http://simonwillison.net/2010/Feb/16/html5/</id><summary type="html">&lt;ol&gt;
&lt;li&gt;When people say that the publication of HTML5 has been "blocked" by Larry Masinter's "formal objection", what exactly do they mean?
&lt;/li&gt;
&lt;li&gt;Why does the private w3c-archive mailing list exist? Why can't anyone reveal what happens on there? What are the consequences for doing so? Who gets to be on that list in the first place?
&lt;/li&gt;
&lt;li&gt;Can anyone raise a "formal objection"?&lt;/li&gt;
&lt;li&gt;Is anyone calling for the HTML Working Group to be "rechartered"? If so, what does that involve?&lt;/li&gt;
&lt;li&gt;If there are concerns about the inclusion of Canvas 2D in the specification, why were these not resolved earlier?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Some &lt;a href="http://simonwillison.net/tags/html5%2Badobe/"&gt;background reading&lt;/a&gt;. I was planning to fill in answers as they arrive, but I screwed up the moderation of the comments and got flooded with detailed responses - I strongly recommend &lt;a href="http://simonwillison.net/2010/Feb/16/html5/#comments"&gt;reading the comments&lt;/a&gt;.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2010/Feb/16/html5/#comments"&gt;&lt;img src="http://simonwillison.net/2010/Feb/16/html5/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="adobe" /><category term="html5" /><category term="larrymasinter" /><category term="w3c" /></entry><entry><title>WildlifeNearYou: It began on a fort...</title><link href="http://simonwillison.net/2010/Jan/12/wildlifenearyou/" rel="alternate" /><updated>2010-01-12T22:53:20Z</updated><id>http://simonwillison.net/2010/Jan/12/wildlifenearyou/</id><summary type="html">&lt;p&gt;Back in October 2008, 11 others and I set out on the first &lt;a href="http://devfort.com/"&gt;/dev/fort&lt;/a&gt; expedition. The idea was simple: gather a dozen geeks, rent a fort, take food and laptops and see what we could build in a week.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.flickr.com/photos/nataliedowne/4269421697/in/set-72157623197922000/"&gt;&lt;img src="http://simonwillison.net/static/2010/fort-clonque.jpg" width="450" height="186" alt="Fort Clonque" title="Fort Clonque, by Natalie Downe" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fort was &lt;a href="http://www.anotherurl.com/travel/fort_clonque/handbook.htm"&gt;Fort Clonque&lt;/a&gt; on &lt;a href="http://en.wikipedia.org/wiki/Alderney"&gt;Alderney&lt;/a&gt; in the Channel Islands, managed by the &lt;a href="http://www.landmarktrust.org.uk/"&gt;Landmark Trust&lt;/a&gt;. We spent an incredibly entertaining week there exploring Nazi bunkers, cooking, eating and coding up a storm. It ended up taking &lt;em&gt;slightly&lt;/em&gt; longer than a week to finish, but 14 months later the result of our combined efforts can finally be revealed: &lt;a href="http://www.wildlifenearyou.com/"&gt;WildlifeNearYou.com&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;WildlifeNearYou is a site for people who like to see animals. Have you ever wanted to know where your nearest Llama is? Search for "&lt;a href="http://www.wildlifenearyou.com/search/?q=llamas+near+brighton"&gt;llamas near brighton&lt;/a&gt;" and you'll see that there's one 18 miles away at &lt;a href="http://www.wildlifenearyou.com/gb/ashdown-forest-llama-farm/"&gt;Ashdown Forest Llama Farm&lt;/a&gt;. Or you can see &lt;a href="http://www.wildlifenearyou.com/fr/"&gt;all the places we know about in France&lt;/a&gt;, or &lt;a href="http://www.wildlifenearyou.com/simon/tripbook/"&gt;all the trips I've been on&lt;/a&gt;, or &lt;a href="http://www.wildlifenearyou.com/animals/red-panda/"&gt;everywhere you can see a Red Panda&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The data comes from user contributions: you can use WildlifeNearYou to track your trips to wildlife places and list the animals that you see there. We can only tell you about animals that someone else has already spotted.&lt;/p&gt;

&lt;p&gt;Once you've added some trips, you can import your Flickr photos and match them up with trips and species. We'll be adding a feature in the future that will push machine tags and other metadata back to Flickr for you, if you so choose.&lt;/p&gt;

&lt;p&gt;You can read more about WildlifeNearYou on the site's &lt;a href="http://www.wildlifenearyou.com/about/"&gt;about page&lt;/a&gt; and &lt;a href="http://www.wildlifenearyou.com/about/faq/"&gt;FAQ&lt;/a&gt;. Please don't hesitate to send us &lt;a href="http://www.wildlifenearyou.com/feedback/"&gt;feedback&lt;/a&gt;!&lt;/p&gt;

&lt;h4&gt;What took so long?&lt;/h4&gt;

&lt;p&gt;So why did it take so long to finally launch it? A whole bunch of reasons. Week-long marathon hacking sessions are an amazing way to generate a ton of interesting ideas and build a whole bunch of functionality, but it's very hard to get a single cohesive whole at the end of it. Tying up the loose ends is a pretty big job, and it is severely hampered by the fort residents returning to their real lives, where hacking for 5 hours straight on a cool Easter egg suddenly doesn't seem quite so appealing. We also got stuck in a cycle of "just one more thing". On the fort we didn't have internet access, so internet-dependent features like Freebase integration, Google Maps, Flickr imports and OpenID had to be left until later ("they'll only take a few hours" no longer works once you're off /dev/fort time).&lt;/p&gt;

&lt;p&gt;The biggest problem, though, was perfectionism. The longer a side project drags on, the more important it feels to make it "just perfect" before releasing it to the world. Finally, on New Year's Day, &lt;a href="http://natbat.net/"&gt;Nat&lt;/a&gt; and I decided we had had enough. Our resolution was to "ship the thing within a week, no matter what state it's in". We're a few days late, but it's finally live.&lt;/p&gt;

&lt;p&gt;WildlifeNearYou is by far the most fun website I've ever worked on. To all twelve of my &lt;a href="http://www.wildlifenearyou.com/about/#team_avatars"&gt;intrepid fort companions&lt;/a&gt;: congratulations - we made a thing!&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.flickr.com/photos/cindyli/3072532829/in/set-72157610369683426/"&gt;&lt;img src="http://simonwillison.net/static/2010/devfort-group.jpg" width="450" height="300" alt="Group photo at the Fort" title="Group photo at the fort, by Cindy Li" /&gt;&lt;/a&gt;&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2010/Jan/12/wildlifenearyou/#comments"&gt;&lt;img src="http://simonwillison.net/2010/Jan/12/wildlifenearyou/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="devfort" /><category term="django" /><category term="projects" /><category term="python" /><category term="wildlifenearyou" /></entry><entry><title>Crowdsourced document analysis and MP expenses</title><link href="http://simonwillison.net/2009/Dec/20/crowdsourcing/" rel="alternate" /><updated>2009-12-20T12:07:53Z</updated><id>http://simonwillison.net/2009/Dec/20/crowdsourcing/</id><summary type="html">&lt;p&gt;As &lt;a href="http://www.guardian.co.uk/politics/mps-expenses"&gt;you may have heard&lt;/a&gt;, the UK government released a fresh batch of MP expenses documents a week ago on Thursday. I spent that week working with a small team at Guardian HQ to prepare for the release. Here's what we built:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://mps-expenses2.guardian.co.uk/"&gt;http://mps-expenses2.guardian.co.uk/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's a crowdsourcing application that asks the public to help us dig through and categorise the enormous stack of documents - around 30,000 pages of claim forms, scanned receipts and hand-written letters, all scanned and published as PDFs.&lt;/p&gt;

&lt;p&gt;This is the second time we've tried this - the first was back in June, and can be seen at &lt;a href="http://mps-expenses.guardian.co.uk/"&gt;mps-expenses.guardian.co.uk&lt;/a&gt;. Last week's attempt was an opportunity to apply the lessons we learnt the first time round.&lt;/p&gt;

&lt;p&gt;Writing crowdsourcing applications in a newspaper environment is a fascinating challenge. Projects come with very little notice - I heard about the new document release the Thursday before, giving us less than a week to put everything together. In addition to the fast turnaround for the application itself, the 48 hours following the release are crucial. The news cycle moves fast, so if the application launches but we don't manage to get useful data out of it quickly, the story will move on before we can have any impact on it.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.scalecamp.org.uk/"&gt;ScaleCamp&lt;/a&gt; on the Friday meant that development work didn't properly kick off until Monday morning. The bulk of the work was performed by two server-side developers, one client-side developer, one designer and one QA on Monday, Tuesday and Wednesday. The Guardian operations team deftly handled our EC2 configuration and deployment, and we had some extra help on the day from other members of the technology department. After launch we also had a number of journalists helping highlight discoveries and dig through submissions.&lt;/p&gt;

&lt;p&gt;The system was written using Django, MySQL (InnoDB), Redis and memcached.&lt;/p&gt;

&lt;h4&gt;Asking the right question&lt;/h4&gt;

&lt;p&gt;The biggest mistake we made the first time round was that we asked the wrong question. We tried to get our audience to categorise documents as either "claims" or "receipts" and to rank them as "not interesting", "a bit interesting", "interesting but already known" and "someone should investigate this". We also asked users to optionally enter any numbers they saw on the page as categorised "line items", with the intention of adding these up later.&lt;/p&gt;

&lt;p&gt;The line items, with hindsight, were a mistake. 400,000 documents makes for a huge amount of data entry and for the figures to be useful we would need to confirm their accuracy. This would mean yet more rounds of crowdsourcing, and the job was so large that the chance of getting even one person to enter line items for each page rapidly diminished as the news story grew less prominent.&lt;/p&gt;

&lt;p&gt;The categorisations worked reasonably well but weren't particularly interesting - knowing if a document is a claim or receipt is useful only if you're going to collect line items. The "investigate this" button worked very well though.&lt;/p&gt;

&lt;p&gt;We completely changed our approach for the new system. We dropped the line item task and instead asked our users to categorise each page by applying one or more tags, from a small set that our editors could control. This gave us a lot more flexibility - we changed the tags shortly before launch based on the characteristics of the documents - and had the potential to be a lot more fun as well. I'm particularly fond of the "hand-written" tag, which has highlighted some &lt;a href="http://mps-expenses2.guardian.co.uk/page/1062/"&gt;lovely examples&lt;/a&gt; of correspondence between MPs and the expenses office.&lt;/p&gt;

&lt;p&gt;Sticking to an editorially assigned set of tags provided a powerful tool for directing people's investigations, and also ensured our users didn't start creating potentially libellous tags of their own.&lt;/p&gt;

&lt;h4&gt;Breaking it up into assignments&lt;/h4&gt;

&lt;p&gt;For the first project, everyone worked together on the same task to review all of the documents. This worked fine while the document set was small, but once we had loaded in 400,000+ pages the progress bar became quite depressing.&lt;/p&gt;

&lt;p&gt;This time round, we added a new concept of "&lt;a href="http://mps-expenses2.guardian.co.uk/assignment/"&gt;assignments&lt;/a&gt;". Each assignment consisted of the set of pages belonging to a specified list of MPs, documents or political parties. Assignments had a threshold, so we could specify that a page must be reviewed by at least X people before it was considered reviewed. An editorial tool let us feature one "main" assignment and several alternative assignments right on the homepage.&lt;/p&gt;
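&lt;p&gt;The threshold rule is simple to express in code. Below is a minimal sketch, assuming the review count for each page is already available as a plain object keyed by page ID; &lt;code&gt;assignmentProgress&lt;/code&gt; and its arguments are hypothetical names for illustration, not the Guardian's actual code.&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;// Hypothetical sketch: a page counts as reviewed once it has at least
// `threshold` reviews; progress is the fraction of such pages.
function assignmentProgress(pageIds, reviewCounts, threshold) {
  var reviewed = [];
  var unreviewed = [];
  pageIds.forEach(function (id) {
    var count = reviewCounts[id] || 0;  // pages nobody has seen yet count as 0
    if (count >= threshold) {
      reviewed.push(id);
    } else {
      unreviewed.push(id);
    }
  });
  return {
    reviewed: reviewed,
    unreviewed: unreviewed,
    // Treat an empty assignment as complete rather than dividing by zero.
    progress: pageIds.length === 0 ? 1 : reviewed.length / pageIds.length
  };
}

var status = assignmentProgress([101, 102, 103, 104], { 101: 3, 102: 1, 104: 2 }, 2);
console.log(status.reviewed);  // [ 101, 104 ]
console.log(status.progress);  // 0.5
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this sketch, &lt;code&gt;unreviewed&lt;/code&gt; holds the pages a "start reviewing" button would draw from, and &lt;code&gt;progress&lt;/code&gt; is the figure a progress bar would display.&lt;/p&gt;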

&lt;p&gt;Clicking "start reviewing" on an assignment sets a cookie for that assignment, and adds the assignment's progress bar to the top of the review interface. New pages are selected at random from the set of unreviewed pages in that assignment.&lt;/p&gt;

&lt;p&gt;The assignments system proved extremely effective. We could use it to direct people to the highest value documents (our top hit list of interesting MPs, or members of the shadow cabinet) while still allowing people with specific interests to pick an alternative task.&lt;/p&gt;

&lt;h4&gt;Get the button right!&lt;/h4&gt;

&lt;p&gt;Having run two crowdsourcing projects I can tell you this: the single most important piece of code you will write is the code that gives someone something new to review. Both of our projects had big "start reviewing" buttons. Both were broken in different ways.&lt;/p&gt;

&lt;p&gt;The first time round, the mistakes were around scalability. I used a SQL "ORDER BY RAND()" statement to return the next page to review. I knew this was an inefficient operation, but I assumed that it wouldn't matter since the button would only be clicked occasionally.&lt;/p&gt;

&lt;p&gt;Something like 90% of our database load turned out to be caused by that one SQL statement, and it only got worse as we loaded more pages into the system. This caused multiple site slowdowns and crashes until we threw together a cron job that pushed 1,000 unreviewed page IDs into memcached and made the button pick one of those at random.&lt;/p&gt;
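&lt;p&gt;A stopgap of that shape can be sketched as follows. It is a hypothetical illustration, not the code that actually ran: a plain in-process variable stands in for memcached, and &lt;code&gt;refreshCache&lt;/code&gt; and &lt;code&gt;nextPageId&lt;/code&gt; are made-up names. The periodic job samples up to 1,000 distinct unreviewed IDs with a partial Fisher-Yates shuffle, and each click picks one of them at random.&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;// Hypothetical sketch; `cachedIds` stands in for the memcached entry.
var cachedIds = [];

// Partial Fisher-Yates shuffle: returns up to `size` distinct random items.
function sample(items, size) {
  var copy = items.slice();
  var n = Math.min(size, copy.length);
  for (var i = 0; i !== n; i++) {
    // Swap a random element from the not-yet-chosen tail into position i.
    var j = i + Math.floor(Math.random() * (copy.length - i));
    var tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
  }
  return copy.slice(0, n);
}

// What the cron job does every few minutes.
function refreshCache(unreviewedIds) {
  cachedIds = sample(unreviewedIds, 1000);
}

// What the "start reviewing" button does on each click.
function nextPageId() {
  if (cachedIds.length === 0) {
    return null;
  }
  return cachedIds[Math.floor(Math.random() * cachedIds.length)];
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The weakness shows in the structure: between refreshes everyone draws from the same fixed sample, so in this sketch pages that have just been reviewed keep being handed out until the next cron run.&lt;/p&gt;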

&lt;p&gt;This solved the performance problem, but meant that our user activity wasn't nearly as well targeted. For optimum efficiency you really want everyone to be looking at a different page - and a random distribution is almost certainly the easiest way to achieve that.&lt;/p&gt;

&lt;p&gt;The second time round I turned to my new favourite in-memory data structure server, &lt;a href="http://code.google.com/p/redis/"&gt;redis&lt;/a&gt;, and its &lt;a href="http://code.google.com/p/redis/wiki/SrandmemberCommand"&gt;SRANDMEMBER&lt;/a&gt; command (a feature I &lt;a href="http://twitter.com/simonw/status/5027987857"&gt;requested&lt;/a&gt; a while ago with this exact kind of project in mind). The system maintains a redis set of all IDs that need to be reviewed for an assignment to be complete, and a separate set of the IDs of all pages that have been reviewed. It then uses redis set difference (the &lt;a href="http://code.google.com/p/redis/wiki/SdiffstoreCommand"&gt;SDIFFSTORE&lt;/a&gt; command) to create a set of unreviewed pages for the current assignment, and SRANDMEMBER to pick one of those pages at random.&lt;/p&gt;
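&lt;p&gt;In redis terms that is &lt;code&gt;SDIFFSTORE unreviewed needed reviewed&lt;/code&gt; followed by &lt;code&gt;SRANDMEMBER unreviewed&lt;/code&gt; (the key names here are made up). To show the logic without a redis server, this sketch uses JavaScript &lt;code&gt;Set&lt;/code&gt; objects as stand-ins for the two redis sets; &lt;code&gt;setDifference&lt;/code&gt; and &lt;code&gt;randomMember&lt;/code&gt; are hypothetical helpers that mirror the two commands.&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;// Like SDIFF a b: the members of a that are not in b.
function setDifference(a, b) {
  var result = new Set();
  a.forEach(function (member) {
    if (!b.has(member)) {
      result.add(member);
    }
  });
  return result;
}

// Like SRANDMEMBER: one random member, or null for an empty set.
function randomMember(set) {
  if (set.size === 0) {
    return null;
  }
  var members = Array.from(set);
  return members[Math.floor(Math.random() * members.length)];
}

var needed = new Set([1, 2, 3, 4, 5]);  // pages the assignment needs reviewed
var reviewed = new Set([2, 4]);         // pages that have been reviewed
var unreviewed = setDifference(needed, reviewed);  // contains 1, 3 and 5
var next = randomMember(unreviewed);               // one of 1, 3 or 5
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With redis, the difference is computed and stored on the server, so the application only ever fetches the single ID it is about to show.&lt;/p&gt;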

&lt;p&gt;This is where the bug crept in. Redis was just being used as an optimisation - the single point of truth for whether a page had been reviewed or not stayed as MySQL. I wrote a couple of Django management commands to repopulate the denormalised Redis sets should we need to manually modify the database. Unfortunately I missed some - the sets that tracked what pages were available in each document. The assignment generation code used an intersection of these sets to create the overall set of documents for that assignment. When we deleted some pages that had accidentally been imported twice I failed to update those sets.&lt;/p&gt;

&lt;p&gt;This meant the "next page" button would occasionally turn up a page that didn't exist. I had some very poorly considered fallback logic for that - if the random page didn't exist, the system would return the first page in that assignment instead. Unfortunately, this meant that when the assignment was down to the last four non-existent pages every single user was directed to the same page - which subsequently attracted well over a thousand individual reviews.&lt;/p&gt;

&lt;p&gt;Next time, I'm going to try and make the "next" button completely bulletproof! I'm also going to maintain a "denormalisation dictionary" documenting every denormalisation in the system in detail - such a thing would have saved me several hours of confused debugging.&lt;/p&gt;

&lt;h4&gt;Exposing the results&lt;/h4&gt;

&lt;p&gt;The biggest mistake I made last time was not getting the data back out again fast enough for our reporters to effectively use it. It took 24 hours from the launch of the application to the moment the first reporting feature was added - mainly because we spent much of the intervening time figuring out the scaling issues.&lt;/p&gt;

&lt;p&gt;This time we handled this a lot better. We provided private pages exposing all recent activity on the site. We also provided public pages for each of the tags, as well as combination pages for party + tag, MP + tag, document + tag, assignment + tag and user + tag. Most of these pages were ordered by most-tagged, with the hope that the most interesting pages would quickly bubble to the top.&lt;/p&gt;

&lt;p&gt;This worked pretty well, but we made one key mistake. The way we were ordering pages meant that it was almost impossible to paginate through them and be sure that you had seen everything under a specific tag. If you're trying to keep track of everything going on in the site, reliable pagination is essential. The only way to get reliable pagination on a fast-moving site is to order by the date each item was first added to a set, in ascending order. That way you can work through all of the pages, wait a bit, hit "refresh" and continue paginating where you left off. Any other order results in the content of each page changing as new content comes in.&lt;/p&gt;
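&lt;p&gt;Here is a minimal sketch of that ordering. It assumes each tagging is stored with an increasing sequence number recorded when the page was first added to the tag; a sequence number avoids the ties that two identical timestamps would cause. &lt;code&gt;pageAfter&lt;/code&gt; is a hypothetical helper: the reader remembers the last sequence number they saw and asks for whatever comes after it.&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;// Return up to pageSize entries added after afterSeq, oldest first.
function pageAfter(entries, afterSeq, pageSize) {
  return entries
    .filter(function (e) { return e.seq > afterSeq; })
    .sort(function (a, b) { return a.seq - b.seq; })
    .slice(0, pageSize);
}

var tagged = [
  { seq: 1, pageId: 'p7' },
  { seq: 2, pageId: 'p3' },
  { seq: 3, pageId: 'p9' }
];

var first = pageAfter(tagged, 0, 2);       // p7, p3
var cursor = first[first.length - 1].seq;  // 2

// A new tagging arrives while the reader is looking at page one...
tagged.push({ seq: 4, pageId: 'p1' });

var second = pageAfter(tagged, cursor, 2); // p9, p1
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The late arrival turns up at the end of the sequence rather than shifting earlier items along, so nothing is skipped or shown twice.&lt;/p&gt;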

&lt;p&gt;We eventually added an undocumented /in-order/ URL prefix to address this issue. Next time I'll pay a lot more attention to getting the pagination options right from the start.&lt;/p&gt;

&lt;h4&gt;Rewarding our contributors&lt;/h4&gt;

&lt;p&gt;The reviewing experience the first time round was actually quite lonely. We deliberately avoided showing people how others had marked each page because we didn't want to bias the results. Unfortunately this meant the site felt like a bit of a ghost town, even when hundreds of other people were actively reviewing things at the same time.&lt;/p&gt;

&lt;p&gt;For the new version, we tried to provide a much better feeling of activity around the site. We added "top reviewer" tables to every assignment, MP and political party as well as a "most active reviewers in the past 48 hours" table on the homepage (this feature was added to the first project several days too late). User profile pages got a lot more attention, with more of a feel that users were collecting their favourite pages into tag buckets within their profile.&lt;/p&gt;

&lt;p&gt;Most importantly, we added a concept of &lt;a href="http://mps-expenses2.guardian.co.uk/discoveries/"&gt;discoveries&lt;/a&gt; - editorially highlighted pages that were shown on the homepage and credited to the user that had first highlighted them. These discoveries also added valuable editorial interest to the site, showing up on the homepage and also the index pages for &lt;a href="http://mps-expenses2.guardian.co.uk/labour/"&gt;political parties&lt;/a&gt; and &lt;a href="http://mps-expenses2.guardian.co.uk/conservative/gerald-howarth/"&gt;individual MPs&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Light-weight registration&lt;/h4&gt;

&lt;p&gt;For both projects, we implemented an extremely light-weight form of registration. Users can start reviewing pages without going through any signup mechanism, and instead are assigned a cookie and an anon-454 style username the first time they review a document. They are then encouraged to assign themselves a proper username and password so they can log in later and take credit for their discoveries.&lt;/p&gt;
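&lt;p&gt;A minimal sketch of that lazy registration flow, assuming each browser is identified by a cookie value. &lt;code&gt;UserStore&lt;/code&gt;, &lt;code&gt;userForReview&lt;/code&gt; and &lt;code&gt;claim&lt;/code&gt; are hypothetical names, and password handling and username uniqueness checks are deliberately left out.&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;function UserStore() {
  this.nextAnonNumber = 1;
  // No prototype, so odd cookie values can't collide with built-in keys.
  this.byCookie = Object.create(null);
}

// Called when a review is submitted: create an anonymous user on first use.
UserStore.prototype.userForReview = function (cookieId) {
  var user = this.byCookie[cookieId];
  if (!user) {
    user = { username: 'anon-' + this.nextAnonNumber, anonymous: true };
    this.nextAnonNumber += 1;
    this.byCookie[cookieId] = user;
  }
  return user;
};

// Called when the reviewer picks a real username (password omitted here).
UserStore.prototype.claim = function (cookieId, username) {
  var user = this.byCookie[cookieId];
  if (!user) {
    throw new Error('No anonymous user for this cookie');
  }
  user.username = username;
  user.anonymous = false;
  return user;
};

var store = new UserStore();
var user = store.userForReview('cookie-abc');  // anon-1, created on first review
store.claim('cookie-abc', 'eagle_eye');        // same account, now named
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because claiming updates the same account record, anything reviewed while anonymous stays credited to the user after they choose a real name.&lt;/p&gt;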

&lt;p&gt;It's difficult to tell how effective this approach really is. I have a strong hunch that it dramatically increases the number of people who review at least one document, but without a formal A/B test it's hard to know for sure. The UI for this process in the first project was quite confusing - we gave it a solid makeover the second time round, which seems to have resulted in a higher number of conversions.&lt;/p&gt;

&lt;h4&gt;Overall lessons&lt;/h4&gt;

&lt;p&gt;News-based crowdsourcing projects of this nature are both challenging and an enormous amount of fun. For the best chances of success, be sure to ask the right question, ensure user contributions are rewarded, expose as much data as possible and make the "next thing to review" behaviour rock solid. I'm looking forward to the next opportunity to apply these lessons, although at this point I &lt;em&gt;really&lt;/em&gt; hope it involves something other than MPs' expenses.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Dec/20/crowdsourcing/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Dec/20/crowdsourcing/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="crowdsourcing" /><category term="django" /><category term="guardian" /><category term="innodb" /><category term="memcached" /><category term="mpsexpenses" /><category term="mysql" /><category term="nosql" /><category term="politics" /><category term="projects" /><category term="python" /><category term="redis" /></entry><entry><title>Node.js is genuinely exciting</title><link href="http://simonwillison.net/2009/Nov/23/node/" rel="alternate" /><updated>2009-11-23T12:50:22Z</updated><id>http://simonwillison.net/2009/Nov/23/node/</id><summary type="html">&lt;p&gt;I gave a talk on Friday at &lt;a href="http://2009.full-frontal.org/"&gt;Full Frontal&lt;/a&gt;, a new one-day JavaScript conference in my home town of Brighton. I ended up throwing away my intended topic (JSONP, APIs and cross-domain security) three days before the event in favour of a technology which first crossed my radar &lt;a href="http://simonwillison.net/2009/Nov/9/node/"&gt;less than two weeks ago&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;That technology is Ryan Dahl's &lt;a href="http://nodejs.org/"&gt;Node&lt;/a&gt;. It's the most exciting new project I've come across in quite a while.&lt;/p&gt;

&lt;p&gt;At first glance, Node looks like yet another take on the idea of server-side JavaScript, but it's a lot more interesting than that. It builds on JavaScript's excellent support for event-based programming and uses it to create something that truly plays to the strengths of the language.&lt;/p&gt;

&lt;p&gt;Node describes itself as "evented I/O for V8 javascript". It's a toolkit for writing extremely high-performance, non-blocking, event-driven network servers in JavaScript. Think of it as similar to &lt;a href="http://twistedmatrix.com/"&gt;Twisted&lt;/a&gt; or &lt;a href="http://rubyeventmachine.com/"&gt;EventMachine&lt;/a&gt;, but for JavaScript instead of Python or Ruby.&lt;/p&gt;

&lt;h4&gt;Evented I/O?&lt;/h4&gt;

&lt;p&gt;As I discussed in my talk, event-driven servers are a powerful alternative to the threading / blocking mechanism used by most popular server-side programming frameworks. Typical frameworks can only handle a small number of requests simultaneously, dictated by the number of server threads or processes available. Long-running operations can tie up one of those threads - enough long-running operations at once and the server runs out of available threads and becomes unresponsive. For large amounts of traffic, each request must be handled as quickly as possible to free the thread up to deal with the next in line.&lt;/p&gt;

&lt;p&gt;This makes certain functionality extremely difficult to support. Examples include handling large file uploads, combining resources from multiple backend web APIs (which themselves can take an unpredictable amount of time to respond) or providing comet functionality by holding open the connection until a new event becomes available.&lt;/p&gt;

&lt;p&gt;Event-driven programming takes advantage of the fact that network servers spend most of their time waiting for I/O operations to complete. Operations against in-memory data are incredibly fast, but anything that involves talking to the filesystem or over a network inevitably involves waiting around for a response.&lt;/p&gt;

&lt;p&gt;With Twisted, EventMachine and Node, the solution lies in specifying I/O operations in conjunction with callbacks. A single event loop rapidly switches between a list of tasks, firing off I/O operations and then moving on to service the next request. When the I/O returns, execution of that particular request is picked up again.&lt;/p&gt;
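&lt;p&gt;To make that concrete, here is a toy, single-threaded event loop with a virtual clock. It only illustrates the scheduling idea under simplified assumptions (each I/O operation is simulated as a fixed delay); it is not a model of how Node or libev are actually implemented, and &lt;code&gt;ToyLoop&lt;/code&gt; is a made-up name.&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;function ToyLoop() {
  this.now = 0;       // virtual time, in seconds
  this.pending = [];  // outstanding I/O: { doneAt, callback }
}

// Start an "I/O operation" that completes `duration` seconds from now,
// and return immediately so other requests can be serviced.
ToyLoop.prototype.startIO = function (duration, callback) {
  this.pending.push({ doneAt: this.now + duration, callback: callback });
};

// Jump to the next operation to complete and run its callback, until idle.
ToyLoop.prototype.run = function () {
  while (this.pending.length > 0) {
    this.pending.sort(function (a, b) { return a.doneAt - b.doneAt; });
    var next = this.pending.shift();
    this.now = next.doneAt;
    next.callback();
  }
};

var loop = new ToyLoop();
var finishedAt = [];
['req1', 'req2', 'req3'].forEach(function (name) {
  // Each request waits two seconds on (simulated) I/O.
  loop.startIO(2, function () {
    finishedAt.push(name + ' at ' + loop.now + 's');
  });
});
loop.run();
console.log(finishedAt);  // [ 'req1 at 2s', 'req2 at 2s', 'req3 at 2s' ]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;All three simulated requests finish at the two second mark. A server that blocked a single thread on each operation in turn would have finished them at 2, 4 and 6 seconds.&lt;/p&gt;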

&lt;p&gt;(In the talk, I attempted to illustrate this with a questionable metaphor involving &lt;a href="http://www.slideshare.net/simon/evented-io-based-web-servers-explained-using-bunnies"&gt;hamsters, bunnies and a hyperactive squid&lt;/a&gt;).&lt;/p&gt;

&lt;h4&gt;What makes Node exciting?&lt;/h4&gt;

&lt;p&gt;If systems like this already exist, what's so exciting about Node? Quite a few things:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;JavaScript is extremely well suited to programming with callbacks&lt;/strong&gt;. Its anonymous function syntax and closure support are perfect for defining inline callbacks, and client-side development in general uses event-based programming as a matter of course: run this function when the user clicks here / when the Ajax response returns / when the page loads. JavaScript programmers already understand how to build software in this way.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Node represents a clean slate&lt;/strong&gt;. Twisted and EventMachine are hampered by the existence of a large number of blocking libraries for their respective languages. Part of the difficulty in learning those technologies is understanding which Python or Ruby libraries you can use and which ones you have to avoid. Node creator Ryan Dahl has a stated aim for Node to never provide a blocking API - even filesystem access and DNS lookups are catered for with non-blocking callback based APIs. This makes it much, much harder to screw things up.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Node is small&lt;/strong&gt;. I read through the &lt;a href="http://nodejs.org/api.html"&gt;API documentation&lt;/a&gt; in around half an hour and felt like I had a pretty comprehensive idea of what Node does and how I would achieve things with it.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Node is fast&lt;/strong&gt;. V8 is fast and keeps getting faster. Node uses Marc Lehmann's highly regarded &lt;a href="http://software.schmorp.de/pkg/libev.html"&gt;libev&lt;/a&gt; (for its event loop) and &lt;a href="http://software.schmorp.de/pkg/libeio.html"&gt;libeio&lt;/a&gt; (for asynchronous I/O) libraries. Ryan Dahl is himself something of a speed demon - he just replaced Node's HTTP parser implementation (already pretty speedy due to its Ragel / Mongrel heritage) with a &lt;a href="http://four.livejournal.com/1033160.html"&gt;hand-tuned C implementation&lt;/a&gt; with some impressive characteristics.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Easy to get started&lt;/strong&gt;. Node ships with all of its dependencies, and compiles cleanly on Snow Leopard out of the box.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With both my JavaScript and server-side hats on, Node just feels right. The APIs make sense, it fits a clear niche and despite its youth (the project started in February) everything feels solid and well constructed. The rapidly growing community is further indication that Ryan is on to something great here.&lt;/p&gt;

&lt;h4&gt;What does Node look like?&lt;/h4&gt;

&lt;p&gt;Here's how to get Hello World running in Node in 7 easy steps:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;samp&gt;git clone git://github.com/ry/node.git&lt;/samp&gt; (or download and extract &lt;a href="http://github.com/ry/node/archives/master" title="Download ry/node from GitHub"&gt;a tarball&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;samp&gt;./configure&lt;/samp&gt;&lt;/li&gt;
  &lt;li&gt;&lt;samp&gt;make&lt;/samp&gt; (takes a while, it needs to compile V8 as well)&lt;/li&gt;
  &lt;li&gt;&lt;samp&gt;sudo make install&lt;/samp&gt;&lt;/li&gt;
  &lt;li&gt;Save the below code as &lt;samp&gt;helloworld.js&lt;/samp&gt;&lt;/li&gt;
  &lt;li&gt;&lt;samp&gt;node helloworld.js&lt;/samp&gt;&lt;/li&gt;
  &lt;li&gt;Visit &lt;samp&gt;http://localhost:8080/&lt;/samp&gt; in your browser&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's helloworld.js:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;var sys = require('sys'), 
  http = require('http');

http.createServer(function(req, res) {
  res.sendHeader(200, {'Content-Type': 'text/html'});
  res.sendBody('&amp;lt;h1&amp;gt;Hello World&amp;lt;/h1&amp;gt;');
  res.finish();
}).listen(8080);

sys.puts('Server running at http://127.0.0.1:8080/');
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If you have Apache Bench installed, try running &lt;samp&gt;ab -n 1000 -c 100 'http://127.0.0.1:8080/'&lt;/samp&gt; to test it with 1000 requests using 100 concurrent connections. On my MacBook Pro I get 3374 requests a second.&lt;/p&gt;

&lt;p&gt;So Node is fast - but where it really shines is concurrency with long running requests. Alter the helloworld.js server definition to look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;http.createServer(function(req, res) {
  setTimeout(function() {
    res.sendHeader(200, {'Content-Type': 'text/html'});
    res.sendBody('&amp;lt;h1&amp;gt;Hello World&amp;lt;/h1&amp;gt;');
    res.finish();
  }, 2000);
}).listen(8080);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We're using &lt;samp&gt;setTimeout&lt;/samp&gt; to introduce an artificial two-second delay to each request. Run the benchmark again - I get 49.68 requests a second, with every single request taking between 2012 and 2022 ms. With a two-second delay, 1000 requests sent 100 at a time need ten rounds of two seconds each, so the best possible performance is &lt;em&gt;1000 requests / ((1000 / 100) rounds * 2 seconds) = 50 requests a second&lt;/em&gt;. Node hits it pretty much bang on the nose.&lt;/p&gt;
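&lt;p&gt;The same arithmetic as a tiny helper, handy for sanity-checking other benchmark runs. It assumes every request takes exactly the same time regardless of load, which is what the &lt;samp&gt;setTimeout&lt;/samp&gt; version simulates; &lt;code&gt;bestCaseThroughput&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;// Best-case requests per second when `total` requests are sent
// `concurrency` at a time and each one takes `seconds` to complete.
function bestCaseThroughput(total, concurrency, seconds) {
  var rounds = Math.ceil(total / concurrency);  // batches of concurrent requests
  return total / (rounds * seconds);
}

console.log(bestCaseThroughput(1000, 100, 2));  // 50
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For comparison, a blocking server limited to a hypothetical pool of 10 worker threads can only have 10 requests in flight however many connections ab opens, so its best case is &lt;code&gt;bestCaseThroughput(1000, 10, 2)&lt;/code&gt;, or 5 requests a second.&lt;/p&gt;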

&lt;p&gt;The most important line in the above examples is &lt;code&gt;res.finish()&lt;/code&gt;. This is the mechanism Node provides for explicitly signalling that a request has been fully processed and should be returned to the browser. By making it explicit, Node makes it easy to implement comet patterns like long polling and streaming responses - stuff that is decidedly non-trivial in most server-side frameworks.&lt;/p&gt;
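&lt;p&gt;The long-polling pattern can be sketched without any HTTP at all. In the hypothetical &lt;code&gt;Broadcaster&lt;/code&gt; below, a waiting client is represented by a callback that is held until there is something to send; in a Node handler using the API above, that callback is where you would write the JSON body and call &lt;code&gt;res.finish()&lt;/code&gt;. The &lt;code&gt;since&lt;/code&gt; argument plays the role of a "give me everything from this message ID onwards" parameter.&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;function Broadcaster() {
  this.messages = [];  // { id, text }, with ids counting up from 0
  this.waiting = [];   // parked { since, callback } pairs
}

Broadcaster.prototype.messagesSince = function (since) {
  return this.messages.filter(function (m) { return m.id >= since; });
};

// Reply straight away if there is news, otherwise park the callback.
Broadcaster.prototype.wait = function (since, callback) {
  var news = this.messagesSince(since);
  if (news.length > 0) {
    callback(news);
  } else {
    this.waiting.push({ since: since, callback: callback });
  }
};

// Store the message, then give every parked request another chance;
// any that still have nothing new are parked again by wait().
Broadcaster.prototype.publish = function (text) {
  this.messages.push({ id: this.messages.length, text: text });
  var parked = this.waiting;
  this.waiting = [];
  var self = this;
  parked.forEach(function (w) {
    self.wait(w.since, w.callback);
  });
};

var hub = new Broadcaster();
var received = [];
hub.wait(0, function (news) { received.push(news); });  // nothing yet: parked
hub.publish('Hello everyone');                           // parked request released
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A parked request costs little more than an entry in a list, which is what makes holding many of them open at once cheap in an evented server.&lt;/p&gt;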

&lt;h4&gt;djangode&lt;/h4&gt;

&lt;p&gt;Node's core APIs are pretty low level - it has HTTP client and server libraries, DNS handling, asynchronous file I/O etc., but it doesn't give you much in the way of high-level web framework APIs. Unsurprisingly, this has led to a Cambrian explosion of lightweight web frameworks built on top of Node - the &lt;a href="http://wiki.github.com/ry/node"&gt;projects using node page&lt;/a&gt; lists a bunch of them. Rolling a framework is a great way of learning a low-level API, so I've thrown together my own - &lt;a href="http://github.com/simonw/djangode"&gt;djangode&lt;/a&gt; - which brings Django's regex-based URL handling to Node along with a few handy utility functions. Here's a simple djangode application:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;var dj = require('./djangode');

var app = dj.makeApp([
  ['^/$', function(req, res) {
    dj.respond(res, 'Homepage');
  }],
  ['^/other$', function(req, res) {
    dj.respond(res, 'Other page');
  }],
  ['^/page/(\\d+)$', function(req, res, page) {
    dj.respond(res, 'Page ' + page);
  }]
]);
dj.serve(app, 8008);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;djangode is currently a throwaway prototype, but I'll probably be extending it with extra functionality as I explore more Node-related ideas.&lt;/p&gt;
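&lt;p&gt;Regex-based URL handling comes down to very little code. Below is a minimal dispatcher in the same spirit as &lt;samp&gt;dj.makeApp&lt;/samp&gt;. It is a hypothetical sketch, not djangode's actual implementation: &lt;code&gt;makeRouter&lt;/code&gt; and &lt;code&gt;dispatch&lt;/code&gt; are invented names, and it returns the handler's result rather than writing to a response object.&lt;/p&gt;

```javascript
// Minimal regex-based URL dispatcher in the spirit of djangode's makeApp.
// Hypothetical sketch; not djangode's actual implementation.
function makeRouter(routes) {
  // Compile each pattern string once, up front
  const compiled = routes.map(([pattern, handler]) => [new RegExp(pattern), handler]);
  return function dispatch(path, ...args) {
    for (const [regex, handler] of compiled) {
      const match = regex.exec(path);
      if (match) {
        // Append captured groups after any extra arguments, the way Django
        // passes captured groups to view functions after the request
        return handler(...args, ...match.slice(1));
      }
    }
    return null; // no route matched; a real server would send a 404
  };
}

const dispatch = makeRouter([
  ['^/$', () => 'Homepage'],
  ['^/other$', () => 'Other page'],
  ['^/page/(\\d+)$', (page) => 'Page ' + page],
]);

console.log(dispatch('/page/42')); // Page 42
```

&lt;p&gt;A server would call &lt;code&gt;dispatch(path, req, res)&lt;/code&gt;, so each handler receives &lt;code&gt;(req, res, page)&lt;/code&gt; just like the djangode example above. Routes are tried in order and the first match wins.&lt;/p&gt;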

&lt;h4&gt;nodecast&lt;/h4&gt;

&lt;p&gt;My main demo in the Full Frontal talk was nodecast, an extremely simple broadcast-oriented comet application. Broadcast is my favourite "hello world" example for comet because it's both simpler than chat and more realistic - I've been involved in plenty of projects that could benefit from being able to broadcast events to their audience, but few that needed an interactive chat room.&lt;/p&gt;

&lt;p&gt;The source code for the version I demoed can be found on GitHub in &lt;a href="http://github.com/simonw/nodecast/tree/no-redis"&gt;the no-redis branch&lt;/a&gt;. It's a very simple application - the client-side JavaScript simply uses jQuery's getJSON method to perform long-polling against a simple URL endpoint:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;function fetchLatest() {
  $.getJSON('/wait?id=' + last_seen, function(d) {
    $.each(d, function() {
      last_seen = parseInt(this.id, 10) + 1;
      ul.prepend($('&amp;lt;li&amp;gt;&amp;lt;/li&amp;gt;').text(this.text));
    });
    fetchLatest();
  });
}&lt;/code&gt;&lt;/pre&gt;

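&lt;p&gt;Calling &lt;samp&gt;fetchLatest&lt;/samp&gt; from inside its own callback looks recursive. However, each call starts from a fresh asynchronous callback, so the JavaScript stack doesn't actually grow. The real weakness is that &lt;samp&gt;getJSON&lt;/samp&gt; only calls back on success, so a single failed request stops the polling loop - but it works OK for the demo.&lt;/p&gt;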
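&lt;p&gt;To keep the loop alive after a failed request, one option is to retry after a short pause. The sketch below assumes a hypothetical &lt;code&gt;fetchJSON(url, onSuccess, onError)&lt;/code&gt; helper, which in a browser could wrap jQuery's &lt;samp&gt;$.ajax&lt;/samp&gt; or a raw XMLHttpRequest. It is not code from nodecast, and the retry delay is an arbitrary parameter.&lt;/p&gt;

```javascript
// Hypothetical long-polling loop with basic error handling. fetchJSON is an
// assumed helper taking (url, onSuccess, onError).
function startPolling(fetchJSON, onMessage, retryMs) {
  let lastSeen = 0;

  function poll() {
    fetchJSON('/wait?id=' + lastSeen, function (messages) {
      messages.forEach(function (m) {
        lastSeen = parseInt(m.id, 10) + 1;
        onMessage(m);
      });
      poll(); // start the next long poll as soon as this one returns
    }, function () {
      // On failure, wait a little and try again rather than giving up
      setTimeout(poll, retryMs);
    });
  }

  poll();
}
```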

&lt;p&gt;The more interesting part is the server-side &lt;samp&gt;/wait&lt;/samp&gt; URL which is being polled. Here's the relevant Node/djangode code:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;var message_queue = new process.EventEmitter();

var app = dj.makeApp([
  // ...
  ['^/wait$', function(req, res) {
    var id = req.uri.params.id || 0;
    var messages = getMessagesSince(id);
    if (messages.length) {
      dj.respond(res, JSON.stringify(messages), 'text/plain');
    } else {
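      // Wait for the next message. Keep a reference to the listener
      // function itself: addListener() returns the emitter, not the
      // listener, so its return value can't be passed to removeListener().
      var listener = function() {
        dj.respond(res,
          JSON.stringify(getMessagesSince(id)), 'text/plain'
        );
        message_queue.removeListener('message', listener);
        clearTimeout(timeout);
      };
      message_queue.addListener('message', listener);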
      var timeout = setTimeout(function() {
        message_queue.removeListener('message', listener);
        dj.respond(res, JSON.stringify([]), 'text/plain');
      }, 10000);
    }
  }]
  // ...
]);&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The wait endpoint checks for new messages and, if any exist, returns immediately. If there are no new messages it does two things: it hooks up a listener on the &lt;samp&gt;message_queue&lt;/samp&gt; EventEmitter (Node's equivalent of jQuery/YUI/Prototype's custom events) which will respond and end the request when a new message becomes available, and also sets a timeout that will cancel the listener and end the request after 10 seconds. This ensures that long polls don't go on too long and potentially cause problems - as far as the browser is concerned it's just talking to a JSON resource which takes up to ten seconds to load.&lt;/p&gt;

&lt;p&gt;When a message does become available, calling &lt;samp&gt;message_queue.emit('message')&lt;/samp&gt; will cause all waiting requests to respond with the latest set of messages.&lt;/p&gt;
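&lt;p&gt;Here is the same wait-or-respond logic as a standalone function, so it can be exercised without a web server. It is a sketch written against the current &lt;code&gt;events&lt;/code&gt; module rather than the &lt;samp&gt;process.EventEmitter&lt;/samp&gt; API used above. &lt;code&gt;waitForMessages&lt;/code&gt;, &lt;code&gt;getMessagesSince&lt;/code&gt; and &lt;code&gt;respond&lt;/code&gt; are hypothetical names supplied by the caller.&lt;/p&gt;

```javascript
const { EventEmitter } = require('events');

// Respond immediately if messages are waiting; otherwise wait for the next
// 'message' event or give up after timeoutMs. getMessagesSince(id) and
// respond(messages) are hypothetical callbacks supplied by the caller.
function waitForMessages(queue, getMessagesSince, id, timeoutMs, respond) {
  const waiting = getMessagesSince(id);
  if (waiting.length) {
    respond(waiting);
    return;
  }
  // Keep a reference to the function itself: on()/addListener() return the
  // emitter, so their return value can't be passed to removeListener().
  const listener = () => {
    queue.removeListener('message', listener);
    clearTimeout(timer);
    respond(getMessagesSince(id));
  };
  queue.on('message', listener);
  const timer = setTimeout(() => {
    queue.removeListener('message', listener);
    respond([]); // empty result; the client simply polls again
  }, timeoutMs);
}

// Usage: an in-memory array where each message's id is its index
const queue = new EventEmitter();
const store = [];
const since = (id) => store.slice(id);

waitForMessages(queue, since, 0, 10000, (messages) => {
  console.log('got', messages.length, 'message(s)');
});
store.push({ id: 0, text: 'Hello' });
queue.emit('message'); // the waiting request responds and its timer is cleared
```

&lt;p&gt;Each waiting request removes its own listener whichever way it finishes, so abandoned polls don't pile up listeners on the emitter.&lt;/p&gt;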

&lt;h4&gt;Talking to databases&lt;/h4&gt;

&lt;p&gt;nodecast keeps track of messages using an in-memory JavaScript array, which works fine until you restart the server and lose everything. How do you implement persistent storage?&lt;/p&gt;

&lt;p&gt;For the moment, the easiest answer lies with the NoSQL ecosystem. Node's focus on non-blocking I/O makes it hard (but not impossible) to hook it up to regular database client libraries. Instead, it strongly favours databases that speak simple protocols over a TCP/IP socket - or even better, databases that communicate over HTTP. So far I've tried using CouchDB (with &lt;a href="http://github.com/sixtus/node-couch"&gt;node-couch&lt;/a&gt;) and redis (with &lt;a href="http://github.com/fictorial/redis-node-client"&gt;redis-node-client&lt;/a&gt;), and both worked extremely well. nodecast &lt;a href="http://github.com/simonw/nodecast"&gt;trunk&lt;/a&gt; now uses redis to store the message queue, and provides a nice example of working with a callback-based non-blocking database interface:&lt;/p&gt;

&lt;pre&gt;&lt;code class="javascript"&gt;var db = redis.create_client();
var REDIS_KEY = 'nodecast-queue';

function addMessage(msg, callback) {
  db.llen(REDIS_KEY, function(i) {
    msg.id = i; // ID is set to the queue length
    db.rpush(REDIS_KEY, JSON.stringify(msg), function() {
      message_queue.emit('message', msg);
      callback(msg);
    });
  });
}&lt;/code&gt;&lt;/pre&gt;
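&lt;p&gt;One caveat with &lt;samp&gt;addMessage&lt;/samp&gt;: &lt;samp&gt;LLEN&lt;/samp&gt; and &lt;samp&gt;RPUSH&lt;/samp&gt; are two separate round trips. Two messages added at nearly the same moment can therefore read the same length and end up sharing an ID. Taking the ID from Redis's atomic &lt;samp&gt;INCR&lt;/samp&gt; command avoids that. The sketch below demonstrates both versions against a made-up in-memory store whose callbacks fire on a later tick, as network replies would. It is not redis-node-client's API, and the key names are hypothetical.&lt;/p&gt;

```javascript
// Made-up in-memory store with a callback API loosely shaped like the
// redis client above. Each reply arrives on a later tick. Not a real client.
function makeFakeStore() {
  const lists = {};
  const counters = {};
  return {
    llen(key, cb) {
      setImmediate(() => cb((lists[key] || []).length));
    },
    rpush(key, value, cb) {
      setImmediate(() => {
        (lists[key] = lists[key] || []).push(value);
        cb();
      });
    },
    // Like Redis INCR, the read and the write happen in one step
    incr(key, cb) {
      setImmediate(() => {
        counters[key] = (counters[key] || 0) + 1;
        cb(counters[key]);
      });
    },
  };
}

// The approach from the text: read the length, then push in a second step
function addWithLlen(db, key, msg, done) {
  db.llen(key, (len) => {
    msg.id = len;
    db.rpush(key, JSON.stringify(msg), () => done(msg));
  });
}

// Alternative: take the ID from an atomic counter kept under its own key
function addWithIncr(db, key, msg, done) {
  db.incr(key + ':next-id', (n) => {
    msg.id = n - 1; // INCR starts counting at 1; keep IDs zero-based
    db.rpush(key, JSON.stringify(msg), () => done(msg));
  });
}

// Two messages arriving at (almost) the same moment, against each approach
const racyIds = [];
const racyDb = makeFakeStore();
addWithLlen(racyDb, 'queue', { text: 'a' }, (m) => racyIds.push(m.id));
addWithLlen(racyDb, 'queue', { text: 'b' }, (m) => racyIds.push(m.id));

const safeIds = [];
const safeDb = makeFakeStore();
addWithIncr(safeDb, 'queue', { text: 'a' }, (m) => safeIds.push(m.id));
addWithIncr(safeDb, 'queue', { text: 'b' }, (m) => safeIds.push(m.id));

setTimeout(() => console.log(racyIds, safeIds), 10);
```

&lt;p&gt;With the LLEN version, both messages come back with ID 0. With the INCR version they get 0 and 1.&lt;/p&gt;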

&lt;p&gt;Relational databases are coming to Node. Ryan has a &lt;a href="http://github.com/ry/node_postgres"&gt;PostgreSQL adapter&lt;/a&gt; in the works, thanks to that database already featuring a mature non-blocking client library. MySQL will be a bit tougher - Node will need to grow a separate thread pool to integrate with the official client libs - but you can talk to MySQL right now by dropping in &lt;a href="http://code.nytimes.com/projects/dbslayer"&gt;DBSlayer&lt;/a&gt; from the NY Times which provides an HTTP interface to a pool of MySQL servers.&lt;/p&gt;

&lt;h4&gt;Mixed environments&lt;/h4&gt;

&lt;p&gt;I don't see myself switching all of my server-side development over to JavaScript, but Node has definitely earned a place in my toolbox. It shouldn't be at all hard to mix Node into an existing server-side environment - either by running both behind a single HTTP proxy (being event-based itself, &lt;a href="http://nginx.net/"&gt;nginx&lt;/a&gt; would be an obvious fit) or by putting Node applications on a separate subdomain. Node is a tempting option for anything involving comet, file uploads or even just mashing together potentially slow-loading web APIs. Expect to hear a lot more about it in the future.&lt;/p&gt;

&lt;h4&gt;Further reading&lt;/h4&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href="http://s3.amazonaws.com/four.livejournal/20091117/jsconf.pdf"&gt;Ryan's JSConf.eu presentation&lt;/a&gt; is the best discussion I've seen anywhere of the design philosophy behind Node.&lt;/li&gt;
  &lt;li&gt;&lt;a href="http://nodejs.org/api.html"&gt;Node's API documentation&lt;/a&gt; is essential reading.&lt;/li&gt;
  &lt;li&gt;&lt;a href="http://debuggable.com/posts/streaming-file-uploads-with-node-js:4ac094b2-b6c8-4a7f-bd07-28accbdd56cb"&gt;Streaming file uploads with node.js&lt;/a&gt; illustrates how well suited Node is to accepting large file uploads.&lt;/li&gt;
  &lt;li&gt;&lt;a href="http://groups.google.com/group/nodejs"&gt;The nodejs Google Group&lt;/a&gt; is the hub of the Node community.&lt;/li&gt;
&lt;/ul&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Nov/23/node/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Nov/23/node/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="async" /><category term="comet" /><category term="couchdb" /><category term="eventio" /><category term="http" /><category term="javascript" /><category term="node" /><category term="nosql" /><category term="redis" /><category term="ryandahl" /><category term="tornado" /><category term="twisted" /><category term="v8" /></entry><entry><title>Why I like Redis</title><link href="http://simonwillison.net/2009/Oct/22/redis/" rel="alternate" /><updated>2009-10-22T10:58:21Z</updated><id>http://simonwillison.net/2009/Oct/22/redis/</id><summary type="html">&lt;p&gt;I've been getting a lot of useful work done with &lt;a href="http://code.google.com/p/redis/"&gt;Redis&lt;/a&gt; recently.&lt;/p&gt;

&lt;p&gt;Redis is typically categorised as yet another of those new-fangled NoSQL key/value stores, but if you look closer it actually has some pretty unique characteristics. It makes more sense to describe it as a "data structure server" - it provides a network service that exposes persistent storage and operations over dictionaries, lists, sets and string values. Think memcached but with list and set operations and persistence-to-disk.&lt;/p&gt;

&lt;p&gt;It's also incredibly easy to set up, &lt;a href="http://code.google.com/p/redis/wiki/Benchmarks" title="How Fast is Redis?"&gt;ridiculously fast&lt;/a&gt; (30,000 reads or writes a second on my laptop with the default configuration) and has an interesting approach to persistence. Redis runs in memory, but syncs to disk every Y seconds or after every X operations. Sounds risky, but it supports replication out of the box, so if you're worried about losing data should a server fail you can always ensure you have a replicated copy to hand. I wouldn't trust my only copy of critical data to it, but there are plenty of other cases for which it is really well suited.&lt;/p&gt;

&lt;p&gt;I'm currently not using it for data storage at all - instead, I use it as a tool for processing data using the interactive Python interpreter.&lt;/p&gt;

&lt;p&gt;I'm a huge fan of REPLs. When programming Python, I spend most of my time in an &lt;a href="http://ipython.scipy.org/"&gt;IPython&lt;/a&gt; prompt. With JavaScript, I use the &lt;a href="http://getfirebug.com/cl.html"&gt;Firebug console&lt;/a&gt;. I experiment with APIs, get something working and paste it over into a text editor. For some one-off data transformation problems I never save any code at all - I run a couple of list comprehensions, dump the results out as JSON or CSV and leave it at that.&lt;/p&gt;

&lt;p&gt;Redis is an excellent complement to this kind of programming. I can run a long-running batch job in one Python interpreter (say loading a few million lines of CSV into a Redis key/value lookup table) and run another interpreter to play with the data that's already been collected, even as the first process is streaming data in. I can quit and restart my interpreters without losing any data. And because Redis semantics map closely to Python native data types, I don't have to think for more than a few seconds about how I'm going to represent my data.&lt;/p&gt;

&lt;p&gt;Here's a 30-second guide to getting started with Redis:&lt;/p&gt;

&lt;pre&gt;&lt;code class="shell"&gt;$ wget http://redis.googlecode.com/files/redis-1.01.tar.gz
$ tar -xzf redis-1.01.tar.gz
$ cd redis-1.01
$ make
$ ./redis-server&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And that's it - you now have a Redis server running on port 6379. No need even for a &lt;samp&gt;./configure&lt;/samp&gt; or &lt;samp&gt;make install&lt;/samp&gt;. You can run &lt;samp&gt;./redis-benchmark&lt;/samp&gt; in that directory to exercise it a bit.&lt;/p&gt;

&lt;p&gt;Let's try it out from Python. In a separate terminal:&lt;/p&gt;

&lt;pre&gt;&lt;code class="shell"&gt;$ cd redis-1.01/client-libraries/python/
$ python
&amp;gt;&amp;gt;&amp;gt; import redis
&amp;gt;&amp;gt;&amp;gt; r = redis.Redis()
&amp;gt;&amp;gt;&amp;gt; r.info()
{u'total_connections_received': 1, ... }
&amp;gt;&amp;gt;&amp;gt; r.keys('*') # Show all keys in the database
[]
&amp;gt;&amp;gt;&amp;gt; r.set('key-1', 'Value 1')
'OK'
&amp;gt;&amp;gt;&amp;gt; r.keys('*')
[u'key-1']
&amp;gt;&amp;gt;&amp;gt; r.get('key-1')
u'Value 1'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now let's try something a bit more interesting:&lt;/p&gt;

&lt;pre&gt;&lt;code class="shell"&gt;&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 1', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 2', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 3', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.lrange('log', 0, 100)
[u'Log message 1', u'Log message 2', u'Log message 3']
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 4', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 5', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.push('log', 'Log message 6', tail=True)
&amp;gt;&amp;gt;&amp;gt; r.ltrim('log', -3, -1)
&amp;gt;&amp;gt;&amp;gt; r.lrange('log', 0, 100)
[u'Log message 4', u'Log message 5', u'Log message 6']&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That's a simple capped log implementation (similar to a &lt;a href="http://www.mongodb.org/display/DOCS/Capped+Collections"&gt;MongoDB capped collection&lt;/a&gt;) - &lt;samp&gt;push&lt;/samp&gt; items on to the tail of a 'log' key and use &lt;samp&gt;ltrim&lt;/samp&gt; to only retain the last X items. You could use this to keep track of what a system is doing right now without having to worry about storing ever increasing amounts of logging information.&lt;/p&gt;
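&lt;p&gt;To make the semantics concrete, here is the capped-log pattern modelled with a plain JavaScript array. It models what &lt;samp&gt;RPUSH&lt;/samp&gt; and &lt;samp&gt;LTRIM&lt;/samp&gt; do (push on to the tail, then keep only the last N items) and is for illustration only: it is not a Redis client, and the helper names are made up.&lt;/p&gt;

```javascript
// Illustrative array model of RPUSH + LTRIM; not a Redis client.
function rpush(list, value) {
  list.push(value); // RPUSH appends to the tail of the list
}

// LTRIM keeps elements start..stop inclusive; negative indexes count back
// from the end of the list, so -1 is the last element.
function ltrim(list, start, stop) {
  const n = list.length;
  const from = start >= 0 ? start : Math.max(n + start, 0);
  const to = stop >= 0 ? stop : n + stop;
  const kept = from > to ? [] : list.slice(from, to + 1);
  list.length = 0;
  list.push(...kept);
}

// Append a message, then keep only the newest maxItems entries
function logCapped(list, message, maxItems) {
  rpush(list, message);
  ltrim(list, -maxItems, -1);
}

const log = [];
['Log message 1', 'Log message 2', 'Log message 3',
 'Log message 4', 'Log message 5', 'Log message 6']
  .forEach((msg) => logCapped(log, msg, 3));
console.log(log.join(', ')); // Log message 4, Log message 5, Log message 6
```

&lt;p&gt;Trimming after every write, rather than occasionally, means the list never grows beyond the cap.&lt;/p&gt;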

&lt;p&gt;See the documentation for a &lt;a href="http://code.google.com/p/redis/wiki/CommandReference"&gt;full list of Redis commands&lt;/a&gt;. I'm particularly excited about the &lt;samp&gt;RANDOMKEY&lt;/samp&gt; and new &lt;samp&gt;SRANDMEMBER&lt;/samp&gt; commands (&lt;a href="http://github.com/antirez/redis/commit/2abb95a9a849453eeb864e919ea0b8d6495a6a2a"&gt;git trunk only&lt;/a&gt; at the moment), which help address the common challenge of picking a random item without &lt;code class="sql"&gt;ORDER BY RAND()&lt;/code&gt; clobbering your relational database. In a beautiful example of open source support in action, I &lt;a href="http://twitter.com/simonw/status/5027987857"&gt;requested SRANDMEMBER on Twitter&lt;/a&gt; yesterday and &lt;a href="http://twitter.com/antirez" title="Salvatore Sanfilippo"&gt;antirez&lt;/a&gt; committed it just 12 hours later.&lt;/p&gt;

&lt;p&gt;I used Redis this week to help create &lt;a href="http://www.guardian.co.uk/news/datablog/2009/oct/19/bnp-membership-list-constituency" title="BNP membership where you live"&gt;heat maps of the BNP's membership list&lt;/a&gt; for the Guardian. I had the leaked spreadsheet of the BNP member details and a (licensed) CSV file mapping 1.6 million postcodes to their corresponding parliamentary constituencies. I loaded the CSV file into Redis, then looped through the 12,000 postcodes from the membership and looked them up in turn, accumulating counts for each constituency. It took a couple of minutes to load the constituency data and a few seconds to run and accumulate the postcode counts. In the end, it probably involved fewer than 20 lines of actual Python code.&lt;/p&gt;
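&lt;p&gt;The lookup-and-accumulate step is simple enough to sketch. In the version below a JavaScript Map stands in for the Redis lookup table; with Redis, each lookup would presumably be a single GET against the key/value table. The postcodes and constituency names are invented sample data, and this is not the code used for the Guardian maps.&lt;/p&gt;

```javascript
// Hypothetical sketch of the postcode lookup and tally. A Map stands in for
// the Redis key/value table; all postcodes and constituencies are invented.
function countByConstituency(lookup, postcodes) {
  const counts = new Map();
  for (const postcode of postcodes) {
    const constituency = lookup.get(postcode);
    if (constituency === undefined) continue; // postcode not in the lookup table
    counts.set(constituency, (counts.get(constituency) || 0) + 1);
  }
  return counts;
}

const lookup = new Map([
  ['AA1 1AA', 'Constituency A'],
  ['AA1 2AB', 'Constituency A'],
  ['BB2 3CD', 'Constituency B'],
]);
const members = ['AA1 1AA', 'BB2 3CD', 'AA1 2AB', 'ZZ9 9ZZ'];
console.log(countByConstituency(lookup, members));
```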

&lt;p&gt;A much more interesting example of an application built on Redis is &lt;a href="http://hurl.it/"&gt;Hurl&lt;/a&gt;, a tool for debugging HTTP requests built in 48 hours by Leah Culver and Chris Wanstrath. The &lt;a href="http://github.com/defunkt/hurl"&gt;code is now open source&lt;/a&gt;, and Chris talks a bit more about the implementation (in particular their use of sort in Redis) &lt;a href="http://ozmm.org/posts/sort_in_redis.html"&gt;on his blog&lt;/a&gt;. Redis also gets a mention in Tom Preston-Werner's &lt;a href="http://github.com/blog/530-how-we-made-github-fast"&gt;epic writeup&lt;/a&gt; of the new scalable architecture behind GitHub.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Oct/22/redis/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Oct/22/redis/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="chriswanstrath" /><category term="github" /><category term="guardian" /><category term="hurl" /><category term="interactivedevelopment" /><category term="ipython" /><category term="leahculver" /><category term="opensource" /><category term="performance" /><category term="python" /><category term="redis" /></entry><entry><title>This shouldn't be the image of Hack Day</title><link href="http://simonwillison.net/2009/Oct/19/hackday/" rel="alternate" /><updated>2009-10-19T22:22:34Z</updated><id>http://simonwillison.net/2009/Oct/19/hackday/</id><summary type="html">&lt;p&gt;I love hack days. I was working in the vicinity of Chad Dickerson when he organised the first internal Yahoo! Hack Day back in 2005, and I've since participated in hack day events at Yahoo!, Global Radio and the Guardian. I've also been to every one of Yahoo!'s Open Hack Day events in London. They're fantastic, and the team that organises them should be applauded.&lt;/p&gt;

&lt;p&gt;As such, I care a great deal about the image of hack day - and the videos that emerged from last weekend's &lt;a href="http://developer.yahoo.net/blog/archives/2009/10/2009_taiwan_hac.html"&gt;Taiwan Hack Day&lt;/a&gt; are hugely disappointing.&lt;/p&gt;

&lt;p&gt;&lt;img src="/static/2009/hack-girls-0.jpg" width="440" height="247" alt="Hack Girl dancers at Open Hack Day Taiwan" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="/static/2009/hack-girls-1.jpg" width="440" height="247" alt="Hack Girl dancers at Open Hack Day Taiwan" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src="/static/2009/hack-girls-2.jpg" width="440" height="247" alt="Hack Girl dancers at Open Hack Day Taiwan" /&gt;&lt;/p&gt;

&lt;p&gt;(These are still images from the video - &lt;a href="http://www.flickr.com/photos/jeremyjohnstone/4019401218/"&gt;the original&lt;/a&gt; has been taken down).&lt;/p&gt;

&lt;p&gt;Seriously, what the hell?&lt;/p&gt;

&lt;p&gt;I've heard arguments that this kind of thing is culturally acceptable in Taiwan - in fact it may even be expected for technology events, though I'd love to hear further confirmation. I don't care. The technology industry has a serious, widely recognised problem attracting female talent. The ratio of male to female attendees at most conferences I attend is embarrassing - An Event Apart last week in Chicago was a notable and commendable exception.&lt;/p&gt;

&lt;p&gt;Our industry is still young. If we want an all-encompassing technology scene, we need to actively work to cultivate an inclusive environment. This means a zero tolerance approach to this kind of entertainment. Booth babes, tequila girls, and scantily clad gyrating women simply set the wrong tone, here or abroad. Heck, this isn't just about offending women - many guy geeks I know would be mortified by this kind of thing.&lt;/p&gt;

&lt;p&gt;Hack days are a celebration of ingenuity and creativity. Past US hack days have featured performances from Beck and Girl Talk, both of whom embody the creative spirit of the event. Sexy dancing girls? Not so much.&lt;/p&gt;

&lt;p&gt;I'm not the only one who's disappointed.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://twitter.com/Caterina/status/4967140857"&gt;Caterina Fake&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote cite="http://twitter.com/Caterina/status/4967140857"&gt;&lt;p&gt;@Yahoo, for shame : &lt;a href="http://flic.kr/p/78btX1"&gt;http://flic.kr/p/78btX1&lt;/a&gt; I'm frankly disgusted.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="http://twitter.com/chaddickerson/status/4966644906"&gt;Chad Dickerson&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote cite="http://twitter.com/chaddickerson/status/4966644906"&gt;&lt;p&gt;i am *so* disappointed: &lt;a href="http://flic.kr/p/78btX1"&gt;http://flic.kr/p/78btX1&lt;/a&gt;. remember, a team of women delivered the winning hack at the 1st one:&lt;a href="http://bit.ly/FokfF"&gt;http://bit.ly/FokfF&lt;/a&gt;&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;There was &lt;a href="http://search.twitter.com/search?q=&amp;amp;ands=hack+day+taiwan&amp;amp;since=2009-10-17&amp;amp;until=2009-10-19"&gt;a flurry of activity&lt;/a&gt; about this on Twitter yesterday. I sat on this entry for most of today, partly because writing this kind of thing is &lt;em&gt;really&lt;/em&gt; hard but also because I was hoping someone at Yahoo! would wake up and release some kind of statement. So far, nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update (1:30am): &lt;/strong&gt; Chris Yeh of YDN has responded with &lt;a href="http://developer.yahoo.net/blog/archives/2009/10/taiwan_ohd_apology.html"&gt;an appropriately worded apology&lt;/a&gt;.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Oct/19/hackday/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Oct/19/hackday/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="conferences" /><category term="events" /><category term="hackday" /><category term="hackgirls" /><category term="taiwan" /><category term="womenintechnology" /><category term="yahoo" /></entry><entry><title>Django ponies: Proposals for Django 1.2</title><link href="http://simonwillison.net/2009/Sep/28/ponies/" rel="alternate" /><updated>2009-09-28T23:32:04Z</updated><id>http://simonwillison.net/2009/Sep/28/ponies/</id><summary type="html">&lt;p&gt;I've decided to step up my involvement in Django development in the run-up to Django 1.2, so I'm currently going through several years' worth of accumulated pony requests figuring out which ones are worth advocating for. I'm also ensuring I have the code to back them up - my innocent &lt;a href="http://code.djangoproject.com/wiki/AutoEscaping"&gt;AutoEscaping proposal&lt;/a&gt; a few years ago resulted in an enormous amount of work by Malcolm and I don't think he'd appreciate a repeat performance.&lt;/p&gt;

&lt;p&gt;I'm not a big fan of branches when it comes to exploratory development - they're fine for doing the final implementation once an approach has been agreed, but I don't think they are a very effective way of discussing proposals. I'd much rather see working code in a separate application - that way I can try it out with an existing project without needing to switch to a new Django branch. Keeping code out of a branch also means people can start using it for real development work, making the API much easier to evaluate. Most of my proposals here have accompanying applications on GitHub.&lt;/p&gt;

&lt;p&gt;I've recently got into the habit of including an "examples" directory with each of my experimental applications. This is a full Django project (with settings.py, urls.py and manage.py files) which serves two purposes. Firstly, it allows developers to run the application's unit tests without needing to install it into their own pre-configured project, simply by changing into the examples directory and running &lt;samp&gt;./manage.py test&lt;/samp&gt;. Secondly, it gives me somewhere to put demonstration code that can be viewed in a browser using the runserver command - a further way of making the code easier to evaluate. &lt;a href="http://github.com/simonw/django-safeform"&gt;django-safeform&lt;/a&gt; is a good example of this pattern.&lt;/p&gt;

&lt;p&gt;Here's my current list of ponies, in rough order of priority.&lt;/p&gt;

&lt;h4&gt;Signing and signed cookies&lt;/h4&gt;

&lt;p&gt;Signing strings to ensure they have not been tampered with is a crucial technique in web application security. As with all cryptography, it's also surprisingly difficult to do correctly. &lt;a href="http://vnhacker.blogspot.com/2009/09/flickrs-api-signature-forgery.html"&gt;A vulnerability in the signing implementation&lt;/a&gt; used to protect the Flickr API was revealed just today.&lt;/p&gt;

&lt;p&gt;One of the many uses of signed strings is to implement signed cookies. Signed cookies are fantastically powerful - they allow you to send cookies safe in the knowledge that your user will not be able to alter them without you knowing. This dramatically reduces the need for sessions - most web apps use sessions for security rather than for storing large amounts of data, so moving that "logged in user ID" value to a signed cookie eliminates the need for session storage entirely, saving a round-trip to persistent storage on every request.&lt;/p&gt;
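&lt;p&gt;Here is a minimal sketch of what signing involves, using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module. It uses HMAC, the standard construction for keyed signatures, and compares signatures in constant time. It is not django-signed's code or cookie format: the &lt;code&gt;value:signature&lt;/code&gt; layout, SHA-256 and the function names are all assumptions, and it has not been vetted by a cryptographer either.&lt;/p&gt;

```javascript
const crypto = require('crypto');

// Append an HMAC-SHA256 signature to the value ("value:signature").
function sign(value, secret) {
  const sig = crypto.createHmac('sha256', secret).update(value).digest('hex');
  return value + ':' + sig;
}

// Return the original value if the signature checks out, otherwise null.
function unsign(signed, secret) {
  const i = signed.lastIndexOf(':'); // the hex signature never contains ':'
  if (i === -1) return null;
  const value = signed.slice(0, i);
  const given = Buffer.from(signed.slice(i + 1), 'hex');
  const expected = crypto.createHmac('sha256', secret).update(value).digest();
  // timingSafeEqual throws on length mismatch, so check lengths first
  if (given.length !== expected.length) return null;
  return crypto.timingSafeEqual(given, expected) ? value : null;
}

const cookie = sign('user_id=42', 'hypothetical-secret');
console.log(unsign(cookie, 'hypothetical-secret')); // user_id=42
console.log(unsign(cookie.replace('42', '43'), 'hypothetical-secret')); // null
```

&lt;p&gt;Signing only prevents tampering. The value itself is still readable by the user, so a signed cookie is no place for secrets.&lt;/p&gt;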

&lt;p&gt;This has particularly useful implications for scaling - you can push your shared secret out to all of your front end web servers and scale horizontally, with no need for shared session storage just to handle simple authentication and "You are logged in as X" messages.&lt;/p&gt;

&lt;p&gt;The latest version of my &lt;a href="http://github.com/simonw/django-openid"&gt;django-openid&lt;/a&gt; library uses signed cookies to store the OpenID you log in with, removing the need to configure Django's session storage. I've extracted that code into &lt;a href="http://github.com/simonw/django-signed"&gt;django-signed&lt;/a&gt;, which I hope to evolve into something suitable for inclusion in &lt;samp&gt;django.utils&lt;/samp&gt;.&lt;/p&gt;

&lt;p&gt;Please note that django-signed has not yet been vetted by cryptography specialists, something I plan to fix before proposing it for final inclusion in core.&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;a href="http://github.com/simonw/django-signed"&gt;django-signed&lt;/a&gt; on GitHub&lt;/li&gt;
    &lt;li&gt;&lt;a href="http://code.djangoproject.com/wiki/Signing"&gt;Details of the Signing proposal&lt;/a&gt; on the Django wiki&lt;/li&gt;
    &lt;li&gt;&lt;a href="http://groups.google.com/group/django-developers/browse_thread/thread/133509246caf1d91"&gt;Signing discussion&lt;/a&gt; on the django-developers mailing list&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Improved CSRF support&lt;/h4&gt;

&lt;p&gt;This is mainly Luke Plant's pony, but I'm very keen to see it happen. Django has shipped with CSRF protection for &lt;a href="http://code.djangoproject.com/changeset/2868"&gt;more than three years now&lt;/a&gt;, but the approach (using middleware to rewrite form HTML) is relatively crude and, crucially, the protection isn't turned on by default. Hint: if you aren't 100% positive you are protected against &lt;a href="http://en.wikipedia.org/wiki/Cross-site_request_forgery"&gt;CSRF&lt;/a&gt;, you should probably go and turn it on.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://bitbucket.org/spookylukey/django-trunk-lukeplant/src/05f0530f3207/django/contrib/csrf/"&gt;Luke's approach&lt;/a&gt; is an iterative improvement - a template tag (with a dependency on RequestContext) is used to output the hidden CSRF field, with middleware used to set the cookie and perform the extra validation. I experimented at length with an alternative solution based around extending Django's form framework to treat CSRF as just another aspect of validation - you can see the result in my &lt;a href="http://github.com/simonw/django-safeform"&gt;django-safeform&lt;/a&gt; project. My approach avoids middleware and template tags in favour of a view decorator to set the cookie and a class decorator to add a CSRF check to the form itself.&lt;/p&gt;

&lt;p&gt;While my approach works, the effort involved in upgrading existing code to it is substantial, compared to a much easier upgrade path for Luke's middleware + template tag approach. The biggest advantage of safeform is that it allows CSRF failure messages to be shown inline on the form, without losing the user's submission - the middleware check means showing errors as a full page without redisplaying the form. It looks like it should be possible to bring that aspect of safeform back to the middleware approach, and I plan to put together a patch for that over the next few days.&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;Luke's &lt;a href="http://bitbucket.org/spookylukey/django-trunk-lukeplant/src/05f0530f3207/django/contrib/csrf/"&gt;CSRF branch&lt;/a&gt; on bitbucket&lt;/li&gt;
    &lt;li&gt;My &lt;a href="http://github.com/simonw/django-signed"&gt;django-safeform&lt;/a&gt; on GitHub&lt;/li&gt;
    &lt;li&gt;&lt;a href="http://code.djangoproject.com/wiki/CsrfProtection"&gt;Details of the CSRF proposal&lt;/a&gt; on the Django wiki&lt;/li&gt;
    &lt;li&gt;&lt;a href="http://groups.google.com/group/django-developers/browse_thread/thread/3d2dc750082103dc"&gt;CSRF discussion&lt;/a&gt; on the django-developers mailing list&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Better support for outputting HTML&lt;/h4&gt;

&lt;p&gt;This is a major pet peeve of mine. Django's form framework is excellent - one of the best features of the framework. There's just one thing that bugs me about it - it outputs full form widgets (for &lt;code&gt;input&lt;/code&gt;, &lt;code&gt;select&lt;/code&gt; and the like) so that it can include the previous value when redisplaying a form during validation, but it does so using XHTML syntax.&lt;/p&gt;

&lt;p&gt;I have a strong preference for an HTML 4.01 strict doctype, and all those &amp;lt;self-closing-tags /&amp;gt; have been niggling away at me for literally &lt;em&gt;years&lt;/em&gt;. Django bills itself as a framework for "perfectionists with deadlines", so I feel justified in getting wound up out of proportion over this one.&lt;/p&gt;

&lt;p&gt;A year ago I started experimenting with a solution, and came up with &lt;a href="http://github.com/simonw/django-html"&gt;django-html&lt;/a&gt;. It introduces two new Django template tags - &lt;code&gt;{% doctype %}&lt;/code&gt; and &lt;code&gt;{% field %}&lt;/code&gt;. The doctype tag serves two purposes - it outputs a particular doctype (saving you from having to remember the syntax) and it records that doctype in Django's template context object. The field tag is then used to output form fields, but crucially it gets to take the current doctype into account.&lt;/p&gt;

&lt;p&gt;The field tag can also be used to add extra HTML attributes to form widgets from within the template itself, solving another small frustration about the existing form library. The &lt;a href="http://github.com/simonw/django-html/blob/master/README.rst"&gt;README&lt;/a&gt; describes the new tags in detail.&lt;/p&gt;

&lt;p&gt;The way the tags work is currently a bit of a hack - if merged into Django core they could be more cleanly implemented by refactoring the form library slightly. This refactoring is currently being discussed on the mailing list.&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;a href="http://github.com/simonw/django-html"&gt;django-html&lt;/a&gt; on GitHub&lt;/li&gt;
    &lt;li&gt;&lt;a href="http://groups.google.com/group/django-developers/browse_thread/thread/bbf75f0eeaf9fa64"&gt;Improved HTML discussion&lt;/a&gt; on the django-developers mailing list&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Logging&lt;/h4&gt;

&lt;p&gt;This is the only proposal for which I don't yet have any code. I want to add official support for Python's standard logging framework to Django. It's possible to use this at the moment (I've done so on several projects) but it's not at all clear what the best way of doing so is, and Django doesn't use it internally at all. I posted a &lt;a href="http://groups.google.com/group/django-developers/browse_thread/thread/8551ecdb7412ab22"&gt;full argument in favour of logging&lt;/a&gt; to the mailing list, but my favourite argument is this one:&lt;/p&gt;

&lt;blockquote cite="http://groups.google.com/group/django-developers/browse_thread/thread/8551ecdb7412ab22"&gt;&lt;p&gt;Built-in support for logging reflects a growing reality of modern Web development: more and more sites have interfaces with external web service APIs, meaning there are plenty of things that could go wrong that are outside the control of the developer. Failing gracefully and logging what happened is the best way to deal with 3rd party problems - much better than throwing a 500 and leaving no record of what went wrong.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I'm not actively pursuing this one yet, but I'm very interested in hearing people's opinions on the best way to configure and use the Python logging module in production.&lt;/p&gt;

&lt;h4&gt;A replacement for get_absolute_url()&lt;/h4&gt;

&lt;p&gt;Django has a loose convention of encouraging people to add a &lt;code&gt;get_absolute_url&lt;/code&gt; method to their models that returns that object's URL. It's a controversial feature - for one thing, it's a bit of a layering violation since URL logic is meant to live in the &lt;samp&gt;urls.py&lt;/samp&gt; file. It's incredibly convenient though, and since it's good web citizenship for everything to have one and only one URL I think there's a pretty good argument for keeping it.&lt;/p&gt;

&lt;p&gt;The problem is, the name sucks. I first took a look at this in the last few weeks before the release of Django 1.0. What started as a quick proposal to come up with a better name before we were stuck with it quickly descended into a quagmire as I realised just how broken &lt;code&gt;get_absolute_url()&lt;/code&gt; is. The short version: in some cases it means "get a relative URL starting with /", in other cases it means "get a full URL starting with http://", and the name doesn't accurately describe either.&lt;/p&gt;

&lt;p&gt;A full write-up of my investigation is &lt;a href="http://code.djangoproject.com/wiki/ReplacingGetAbsoluteUrl"&gt;available on the Wiki&lt;/a&gt;. My proposed solution was to replace it with two complementary methods - &lt;code&gt;get_url()&lt;/code&gt; and &lt;code&gt;get_url_path()&lt;/code&gt; - so that the user implements one and the other is derived automatically. My &lt;a href="http://github.com/simonw/django-urls"&gt;django-urls&lt;/a&gt; project illustrates the concept via a model mixin class. A year on, I still think it's quite a neat idea, though as far as I can tell no one has ever actually used it.&lt;/p&gt;
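
&lt;p&gt;This plain-Python sketch shows the shape of the idea without Django. It is not the django-urls code. &lt;code&gt;UrlMixin&lt;/code&gt;, the hard-coded &lt;code&gt;base_url&lt;/code&gt; and the two example classes are hypothetical, and a real implementation would take the site's domain from configuration rather than a class attribute.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;from urllib.parse import urlsplit


class UrlMixin:
    """Implement either get_url() or get_url_path(); the other is derived."""

    # Assumption: the site's base URL; a real project would configure this.
    base_url = 'http://example.com'

    def get_url(self):
        # Full URL, derived from the path if the subclass only provides a path.
        if type(self).get_url_path is not UrlMixin.get_url_path:
            return self.base_url.rstrip('/') + self.get_url_path()
        raise NotImplementedError('Implement get_url() or get_url_path()')

    def get_url_path(self):
        # Path (plus any query string), derived from the full URL.
        if type(self).get_url is not UrlMixin.get_url:
            parts = urlsplit(self.get_url())
            path = parts.path or '/'
            if parts.query:
                path += '?' + parts.query
            return path
        raise NotImplementedError('Implement get_url() or get_url_path()')


class Entry(UrlMixin):
    # Hypothetical model that only knows its own path.
    def __init__(self, slug):
        self.slug = slug

    def get_url_path(self):
        return '/blog/' + self.slug + '/'


class ExternalPage(UrlMixin):
    # Hypothetical model that only knows its full URL.
    def get_url(self):
        return 'http://docs.example.org/intro/?page=2'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each subclass writes one method and gets the other for free. For example, &lt;code&gt;Entry('ponies').get_url()&lt;/code&gt; returns &lt;code&gt;'http://example.com/blog/ponies/'&lt;/code&gt;.&lt;/p&gt;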

&lt;ul&gt;
    &lt;li&gt;&lt;a href="http://code.djangoproject.com/wiki/ReplacingGetAbsoluteUrl"&gt;ReplacingGetAbsoluteUrl&lt;/a&gt; on the wiki&lt;/li&gt;
    &lt;li&gt;&lt;a href="http://github.com/simonw/django-urls"&gt;django-urls&lt;/a&gt; on GitHub&lt;/li&gt;
    &lt;li&gt;&lt;a href="http://groups.google.com/group/django-developers/browse_thread/thread/7e69c39c23ec1079"&gt;Recent get_absolute_url discussion&lt;/a&gt; on the django-developers mailing list&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Comments on this post are open, but if you have anything to say about any of the individual proposals it would be much more useful if you posted it to the relevant mailing list thread.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Sep/28/ponies/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Sep/28/ponies/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="cookies" /><category term="cryptography" /><category term="csrf" /><category term="django" /><category term="html" /><category term="logging" /><category term="lukeplant" /><category term="markup" /><category term="ponies" /><category term="projects" /><category term="python" /><category term="security" /><category term="signedcookies" /><category term="signing" /><category term="xhtml" /></entry><entry><title>Hack Day tools for non-developers</title><link href="http://simonwillison.net/2009/Jul/28/tools/" rel="alternate" /><updated>2009-07-28T14:23:53Z</updated><id>http://simonwillison.net/2009/Jul/28/tools/</id><summary type="html">&lt;p&gt;We're about to run our second internal hack day at the Guardian. The first was &lt;a href="http://www.guardian.co.uk/global/insideguardian/2008/nov/18/guardian-hack-day-results" title="Results from Hack Day at the Guardian"&gt;an enormous amount of fun&lt;/a&gt; and the second one looks set to be even more productive.&lt;/p&gt;

&lt;p&gt;There's only one rule at hack day: build something you can demonstrate at the end of the event (PowerPoint slides don't count). Importantly, though, our hack days are not restricted to just our development team: anyone from the technology department can get involved, and we extend the invitation to other parts of the organisation as well. At the Guardian, this includes journalists.&lt;/p&gt;

&lt;p&gt;For our first hack day, I put together a list of "tools for non-developers" - sites, services and software that could be used for hacking without programming knowledge as a pre-requisite. I'm now updating that list with recommendations from elsewhere. Here's the list so far:&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://www.freebase.com/"&gt;Freebase&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;Originally a kind of structured version of Wikipedia, Freebase changed its focus last year towards being a "social database about things you know and love". In other words, it's the most powerful OCD-enabler in the history of the world. Create your own "Base" on any subject you like, set up your own types and start gathering together topics from the millions already available in Freebase - or add your own. Examples include the &lt;a href="http://battlestargalactica.freebase.com/"&gt;Battlestar Galactica base&lt;/a&gt;, the &lt;a href="http://tallships.freebase.com/"&gt;Tall Ships base&lt;/a&gt; and the fabulous &lt;a href="http://database.freebase.com/"&gt;Database base&lt;/a&gt;. If you &lt;em&gt;are&lt;/em&gt; a developer, the tools in the &lt;a href="http://www.freebase.com/make"&gt;Make Things with Freebase&lt;/a&gt; section are top-notch.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://www.dabbledb.com/"&gt;Dabble DB&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;Dabble is a weird combination of a spreadsheet, an online database and a set of visualisation tools. Watch the 8-minute demo to get an idea of how powerful this is - you can start off by loading in an existing spreadsheet and take it from there. You'll need to sign up for the free 30-day trial.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://docs.google.com/"&gt;Google Docs&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;You can always build a hack in Excel, but &lt;a href="http://docs.google.com/"&gt;Google Spreadsheets&lt;/a&gt; is surprisingly powerful and means that you can collaborate with others on your hack (including developers, who can use the Google Docs API to get at the data in your spreadsheet). Check out the following tutorials, which describe ways of using Google Spreadsheets to scrape in data from other webpages and output it in interesting formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://ouseful.wordpress.com/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/"&gt;Data Scraping Wikipedia with Google Spreadsheets&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://ouseful.wordpress.com/2008/10/23/calling-amazon-associatesecommerce-web-services-from-a-google-spreadsheet/"&gt;Calling Amazon Associates/Ecommerce Web Services from a Google Spreadsheet&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's also a simple way to &lt;a href="http://docs.google.com/support/bin/answer.py?hl=en&amp;amp;answer=87809"&gt;create a form&lt;/a&gt; that submits data into a Google Spreadsheet.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://pipes.yahoo.com/"&gt;Yahoo! Pipes&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;Visual tools for combining, filtering and modifying RSS feeds. Combine with the large number of &lt;a href="http://www.guardian.co.uk/help/insideguardian/2008/oct/22/full-fat-rss-feed-upgrade" title="Upgrading our RSS feeds"&gt;full-content feeds on guardian.co.uk&lt;/a&gt; for all sorts of interesting possibilities. Here's &lt;a href="http://ouseful.wordpress.com/2008/10/20/mashup-reuse-are-you-lazy-enough/" title="Mashup Reuse – Are You Lazy Enough?"&gt;a tutorial&lt;/a&gt; that incorporates Google Docs as well.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://maps.google.com/help/maps/mymaps/create.html"&gt;Google My Maps&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;Google provide a really neat interface for adding your own points, lines and areas to a Google Map. Outputs KML, a handy file format for carting geographic data around between different tools.&lt;/p&gt;

&lt;p&gt;If you already have a KML or GeoRSS feed URL from somewhere (e.g. the output of a Yahoo! Pipe), you can paste it directly into the Google Maps search box to see the points rendered on a map.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://sketchup.google.com/"&gt;Google SketchUp&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;An easy-to-use 3D drawing package that lets you create 3D models of real-world buildings and then import them into &lt;a href="http://earth.google.com/"&gt;Google Earth&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://www.openstreetmap.org/"&gt;OpenStreetMap&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;Try your hand at some open source cartography on OpenStreetMap, the geographic world's answer to Wikipedia. If you have the equipment you can contribute GPS traces, otherwise there's a clever online editor that will let you trace out roads from satellite photos - or you could just make sure your favourite pub is included on the map. The export tools can provide vector or static maps, and if you export as SVG you can further edit your map in Illustrator or Inkscape.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://maps.cloudmade.com/"&gt;CloudMade Maps&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;Commercial tools built on top of &lt;a href="http://www.openstreetmap.org/"&gt;OpenStreetMap&lt;/a&gt;, the most exciting of which allows you to create your own map theme by setting your preferred colours and line widths for various types of map feature.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://manyeyes.alphaworks.ibm.com/manyeyes/"&gt;Many Eyes&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;IBM Research's suite of data visualisation tools, with a wiki-style collaboration platform for publishing data and creating visualisations.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://www.dapper.net/open/"&gt;Dapper&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;Dapper provides a powerful tool for screen scraping websites, without needing to write any code. Output formats include RSS, iCalendar and Google Maps.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://www.tiddlywiki.com/"&gt;TiddlyWiki&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;TiddlyWiki is a complete wiki in a single HTML file, which you can save locally and use as a notebook, collaboration tool and much more. There's a large ecosystem of plugins and macros which can be used to extend it with new features - see &lt;a href="http://tiddlyvault.tiddlyspot.com/"&gt;TiddlyVault&lt;/a&gt; for an index.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://www.wolframalpha.com/"&gt;WolframAlpha&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;The "computational knowledge engine" with the &lt;a href="http://unqualified-reservations.blogspot.com/2009/07/wolfram-alpha-and-hubristic-user.html"&gt;hubristic search-based interface&lt;/a&gt;, potentially useful as a source of data and a tool for processing and visualising that data.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://www.tumblr.com/"&gt;Tumblr&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;Useful as both an input and an output for feeds processed using other tools, and with a smart bookmarklet for collecting bits and pieces from around the web.&lt;/p&gt;

&lt;h4&gt;&lt;a href="http://wiki.english.ucsb.edu/index.php/Toy_Chest_(Online_or_Downloadable_Tools_for_Building_Projects)"&gt;The UCSB Toy Chest&lt;/a&gt;&lt;/h4&gt;

&lt;p&gt;An outstanding list of tools that people "without programming skills (but with basic computer and Internet literacy) can use to create interesting projects", compiled by the English department at UC Santa Barbara.&lt;/p&gt;

&lt;h3&gt;Your help needed&lt;/h3&gt;

&lt;p&gt;There must be dozens, if not hundreds, of useful tools missing from the above. Tell me in the comments and I'll add them to the list.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Jul/28/tools/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Jul/28/tools/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="freebase" /><category term="google" /><category term="googlemaps" /><category term="guardian" /><category term="hackday" /><category term="mapping" /><category term="nondevelopers" /><category term="openstreetmap" /><category term="pipes" /><category term="sketchup" /><category term="tools" /><category term="yahoopipes" /></entry><entry><title>Teaching users to be secure is a shared responsibility</title><link href="http://simonwillison.net/2009/Jul/16/responsibility/" rel="alternate" /><updated>2009-07-16T20:04:45Z</updated><id>http://simonwillison.net/2009/Jul/16/responsibility/</id><summary type="html">&lt;p&gt;Ryan Janssen: &lt;a href="http://drstarcat.com/archives/133"&gt;Why an OAuth iframe is a Great Idea&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote cite="http://drstarcat.com/archives/133"&gt;&lt;p&gt;The reason the OAuth community prefers that we open up a new window is that if you look at the URL in the window (the place you type in a site’s name), you would see that it says www.netflix.com* and know that you are giving your credentials to Netflix.&lt;/p&gt;

&lt;p&gt;Or would you?  I would!  Other technologists would!  But would you?  Would you even notice?  If you noticed would you care?  The answer for the VAST majority of the world is of course, no.  In fact to an average person, getting taken to an ENTIRELY other site with some weird little dialog floating in a big page is EXTREMELY suspicious.  The real site you are trusting to do the right thing is SetJam (not weird pop-up window site).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I posted a reply comment on that post, but I'll replicate it in full here:&lt;/p&gt;

&lt;blockquote cite="http://drstarcat.com/archives/133#IDComment27455126"&gt;
&lt;p&gt;Please, please don't do this.&lt;/p&gt;

&lt;p&gt;As web developers we have a shared responsibility to help our users stay safe on the internet. This is becoming ever more important as people move more of their lives online.&lt;/p&gt;

&lt;p&gt;It's an almost sisyphean task. If you want to avoid online fraud, you need to understand an enormous stack of technologies: browsers, web pages, links, URLs, DNS, SSL, certificates... I know user education is never the right answer, but in the case of the Web I honestly can't see any other route.&lt;/p&gt;

&lt;p&gt;The last thing we need is developers making the problem worse by encouraging unsafe behaviour. That was the whole POINT of OAuth - the password anti-pattern was showing up everywhere, and was causing very real problems. OAuth provides an alternative, but we still have a long way to go convincing users not to hand their password over to any site that asks for it. Still, it's a small victory in a much bigger war.&lt;/p&gt;

&lt;p&gt;If developers start showing OAuth in an iframe, that victory was for nothing - we may as well not have bothered. OAuth isn't just a protocol, it's an ambitious attempt to help users understand the importance of protecting their credentials, and the fact that different sites should be granted different permissions with regards to accessing their stuff. This is a difficult but critical lesson for users to learn. The only real hope is if OAuth, implemented correctly, spreads far enough around the Web that people start to understand it and get a feel for how it is meant to work.&lt;/p&gt;

&lt;p&gt;By implementing OAuth in an iframe you are completely undermining this effort - and in doing so you're contributing to a tragedy of the commons where selfish behaviour on the behalf of a few causes problems for everyone else. Even worse, if the usability DOES prove to be better (which wouldn't be surprising) you'll be actively encouraging people to implement OAuth in an insecure way - your competitors will hardly want to keep doing things the secure way if you are getting higher conversion rates than they are.&lt;/p&gt;

&lt;p&gt;So once again, please don't do this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I hope my argument is convincing. In case it isn't, I'd strongly suggest that any sites offering OAuth protected APIs add frame-busting JavaScript to their OAuth verification pages. Thankfully, in this case there's a technical option for protecting the commons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; It turns out Netflix already use a frame-busting script on their OAuth authentication page.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Jul/16/responsibility/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Jul/16/responsibility/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="education" /><category term="framebusting" /><category term="iframe" /><category term="oauth" /><category term="phishing" /><category term="responsibility" /><category term="security" /></entry><entry><title>Facebook Usernames and OpenID</title><link href="http://simonwillison.net/2009/Jun/13/thefacebookdebacle/" rel="alternate" /><updated>2009-06-13T17:01:00Z</updated><id>http://simonwillison.net/2009/Jun/13/thefacebookdebacle/</id><summary type="html">&lt;p&gt;Today's launch of &lt;a href="http://search.twitter.com/search?q=%23fufacebook"&gt;Facebook Usernames&lt;/a&gt; provides an obvious and exciting opportunity for Facebook to become an OpenID provider. Facebook have clearly demonstrated their interest in becoming the key online identity for their users, and the new usernames feature is their acknowledgement that URL-based identities are an important component of that, no doubt driven in part by Twitter making usernames trendy again.&lt;/p&gt;

&lt;p&gt;It's interesting to consider Facebook's history with regards to OpenID and single sign on in general. When I started publicly advocating for OpenID &lt;a href="http://simonwillison.net/2007/talks/"&gt;back in 2007&lt;/a&gt;, my primary worry was that someone would solve the SSO problem in a proprietary way, irreparably damaging the decentralised nature of the Web - just as Microsoft had attempted a few years earlier with Passport.&lt;/p&gt;

&lt;p&gt;When Facebook Connect was announced &lt;a href="http://blog.facebook.com/blog.php?post=24577977130"&gt;a year ago&lt;/a&gt;, it seemed like my worst fears had been realised. Facebook Connect's user experience was a huge improvement over OpenID - with only one provider, the sign-in UI could be reduced to a single button. Their use of a popup window for the sign-in flow was inspired - various usability studies have since shown that users are much more likely to complete an SSO flow if they can see the site they are signing in to in a background window.&lt;/p&gt;

&lt;p&gt;Thankfully, Facebook seem to understand that the industry isn't willing to accept a single SSO provider, no matter how smooth their implementation. Mark Zuckerberg made reassuring noises about OpenID support at both &lt;a href="http://news.cnet.com/8301-13577_3-10063328-36.html"&gt;FOWA 2008&lt;/a&gt; and &lt;a href="http://www.readwriteweb.com/archives/mark_zuckerberg_on_data_portab.php"&gt;SxSW 2009&lt;/a&gt;, but things really stepped up earlier this year when &lt;a href="http://openid.net/2009/02/05/facebook-joins-openid-foundation-board/"&gt;Facebook joined the OpenID Foundation Board&lt;/a&gt; (accompanied by a substantial financial donation). Facebook's board representative, &lt;a href="http://www.sociallipstick.com/"&gt;Luke Shepherd&lt;/a&gt;, is an excellent addition and brings a refreshingly user-centric approach to OpenID. Luke was previously responsible for much of the work on Facebook Connect and has been advocating OpenID inside Facebook for a long time.&lt;/p&gt;

&lt;p&gt;Facebook may not have committed to becoming a provider yet (at least not in public), but their decision to become a consumer first is another interesting data point. They may be trying to avoid the common criticism thrown at companies who provide but don't consume - if they're not willing to eat their own dog food, why should anyone else?&lt;/p&gt;

&lt;p&gt;At any rate, their consumer implementation is fascinating. It's live right now, even though there's no OpenID login box anywhere to be seen on the site. Instead, Facebook take advantage of the little-known &lt;a href="http://openid.net/specs/openid-authentication-2_0.html#anchor28"&gt;checkid_immediate mode&lt;/a&gt;. Once you've associated your OpenID with your Facebook account (using the "Linked Accounts" section of the settings pane), Facebook sets a cookie remembering your OpenID provider, which persists even after you log out of Facebook. When you later visit the Facebook homepage, a checkid_immediate request is silently sent to your provider, logging you in automatically if you are already authenticated there.&lt;/p&gt;
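
&lt;p&gt;For reference, a checkid_immediate request is an ordinary OpenID 2.0 authentication request sent with &lt;code&gt;openid.mode=checkid_immediate&lt;/code&gt;, which asks the provider to answer without showing the user any UI. Below is a minimal sketch of building the redirect URL. The function name and example URLs are hypothetical. It assumes the claimed identifier is also the OP-local identifier (no delegation), and it leaves out discovery, associations and signature verification.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;from urllib.parse import urlencode, urlsplit, parse_qs

OPENID_NS = 'http://specs.openid.net/auth/2.0'


def checkid_immediate_url(op_endpoint, claimed_id, return_to, realm):
    """Build an OpenID 2.0 checkid_immediate redirect URL (sketch)."""
    params = {
        'openid.ns': OPENID_NS,
        # checkid_immediate: the provider must reply at once, with no UI.
        'openid.mode': 'checkid_immediate',
        # Assumes no delegation, so both identifiers are the same URL.
        'openid.claimed_id': claimed_id,
        'openid.identity': claimed_id,
        # Where the provider sends the browser back with its answer.
        'openid.return_to': return_to,
        'openid.realm': realm,
    }
    # Assumes op_endpoint has no query string of its own.
    return op_endpoint + '?' + urlencode(params)


if __name__ == '__main__':
    url = checkid_immediate_url(
        'https://openid.example.net/server',
        'http://simon.example.com/',
        'http://www.example.org/openid/complete/',
        'http://www.example.org/',
    )
    print(url)
    print(parse_qs(urlsplit(url).query)['openid.mode'])&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the user isn't already logged in at their provider, an OpenID 2.0 provider answers with &lt;code&gt;openid.mode=setup_needed&lt;/code&gt; instead of a positive assertion. The relying party can then fall back silently to its normal login page.&lt;/p&gt;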

&lt;p&gt;While it's great to see innovation with OpenID at such a large scale, I'm not at all convinced that they've got this right. The feature is virtually invisible to users (it took me a bunch of research to figure out how to use it) and not at all intuitive - if I've logged out of Facebook, how come visiting the home page logs me straight back in again? I guess this is why Luke is keen on &lt;a href="http://www.sociallipstick.com/2009/05/logout-the-other-half-of-the-identity-equation/"&gt;exploring single sign out with OpenID&lt;/a&gt;. It sounds like the current OpenID consumer support is principally intended as a developer preview, and I'm looking forward to seeing how they change it based on ongoing user research.&lt;/p&gt;

&lt;p&gt;An OpenID provider implementation is an obvious next step, and it can't be that far off - I wouldn't be surprised to hear an announcement within a month or two.&lt;/p&gt;

&lt;h3&gt;HTTP redirect codes&lt;/h3&gt;

&lt;p&gt;As an aside, I decided to check that Facebook were using the correct 3xx HTTP status code to redirect from &lt;a href="http://www.facebook.com/profile.php?id=666590500"&gt;my old profile page&lt;/a&gt; to &lt;a href="http://www.facebook.com/swillison"&gt;my new one&lt;/a&gt;. I was horrified to discover that they are using a 200 code, followed by &lt;a href="http://gist.github.com/129240"&gt;a chunk of JavaScript&lt;/a&gt; to implement the redirect! The situation for logged out users is better but still fundamentally flawed: if you enable your public search listing (using an option tucked away on &lt;a href="http://www.facebook.com/privacy/?view=search"&gt;www.facebook.com/privacy/?view=search&lt;/a&gt;) and &lt;samp&gt;curl -i&lt;/samp&gt; your old profile URL you get a 302 Found, when the correct status code is clearly a 301 Moved Permanently.&lt;/p&gt;
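
&lt;p&gt;For comparison, issuing a proper permanent redirect takes only a few lines. This is a hypothetical standard-library WSGI sketch, not Facebook's code. The &lt;code&gt;USERNAMES&lt;/code&gt; lookup table and &lt;code&gt;profile_redirect_app&lt;/code&gt; are invented for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;from urllib.parse import parse_qs
from wsgiref.util import application_uri, setup_testing_defaults

# Hypothetical table mapping legacy numeric profile IDs to new usernames.
USERNAMES = {'666590500': 'swillison'}


def profile_redirect_app(environ, start_response):
    """Send /profile.php?id=N to /username with a 301 Moved Permanently."""
    query = parse_qs(environ.get('QUERY_STRING', ''))
    username = USERNAMES.get(query.get('id', [''])[0])
    if environ.get('PATH_INFO') == '/profile.php' and username:
        # An absolute Location URL, built from the request's scheme and host.
        location = application_uri(environ).rstrip('/') + '/' + username
        # 301 says the move is permanent, so clients and search engines can
        # update their links; 302 Found only signals a temporary redirect.
        start_response('301 Moved Permanently', [
            ('Location', location),
            ('Content-Type', 'text/plain'),
        ])
        return [b'Moved Permanently']
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'Not Found']


if __name__ == '__main__':
    # Call the app directly with a fake request instead of running a server.
    environ = {'PATH_INFO': '/profile.php', 'QUERY_STRING': 'id=666590500'}
    setup_testing_defaults(environ)
    profile_redirect_app(environ, lambda status, headers: print(status, headers))&lt;/code&gt;&lt;/pre&gt;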

&lt;p&gt;One final note: it almost goes without saying, but one of the best things about OpenID is that you can register a real domain name that you can own, instead of just having another URL on Facebook.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Jun/13/thefacebookdebacle/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Jun/13/thefacebookdebacle/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="facebook" /><category term="fufacebook" /><category term="http" /><category term="openid" /><category term="sso" /><category term="thefacebookdebacle" /></entry><entry><title>djng - a Django powered microframework</title><link href="http://simonwillison.net/2009/May/19/djng/" rel="alternate" /><updated>2009-05-19T00:13:31Z</updated><id>http://simonwillison.net/2009/May/19/djng/</id><summary type="html">&lt;p&gt;&lt;a href="http://github.com/simonw/djng"&gt;djng&lt;/a&gt; is nearly two weeks old now, so it's about time I wrote a bit about the project.&lt;/p&gt;

&lt;p&gt;I presented a keynote at EuroDjangoCon in Prague earlier this month entitled &lt;a href="http://simonwillison.net/2009/talks/eurodjangocon-heresies/"&gt;Django Heresies&lt;/a&gt;. The talk followed the noble DjangoCon tradition (established last year with the help of Mark Ramm and Cal Henderson) of pointing a spotlight at Django's flaws. In my case, it was a chance to apply the benefit of hindsight to some of the design decisions I helped make back at the Lawrence Journal-World in 2004.&lt;/p&gt;

&lt;p&gt;I took a few cheap shots at things like the &lt;code class="django"&gt;{% endifequal %}&lt;/code&gt; tag and error silencing in the template system, but the three substantial topics in my talk were class-based generic views (I'm a fan), my hatred of settings.py and my interest in &lt;a href="http://en.wikipedia.org/wiki/Turtles_all_the_way_down"&gt;turtles all the way down&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;Why I hate settings.py&lt;/h4&gt;

&lt;p&gt;In the talk, I justified my dislike for settings.py by revisiting the problems behind PHP's magic quotes feature (finally going away for good in PHP 6). Magic quotes were one of the main reasons I switched to Python from PHP.&lt;/p&gt;

&lt;p&gt;My main problem with magic quotes was that they made it extremely difficult to write reusable PHP code. The feature was configured globally, which led to a quandary. What if you have two libraries, one expecting magic quotes on and the other expecting them off? Your library could check &lt;code class="php"&gt;get_magic_quotes_gpc()&lt;/code&gt; and apply &lt;code class="php"&gt;stripslashes()&lt;/code&gt; to input if the setting was turned on, but that would strip the slashes twice if other code in the same application followed the common idiom of applying &lt;code class="php"&gt;stripslashes()&lt;/code&gt; to all incoming &lt;code class="php"&gt;$_GET&lt;/code&gt; and &lt;code class="php"&gt;$_POST&lt;/code&gt; data.&lt;/p&gt;

&lt;p&gt;Unfortunately, global settings configured using settings.py have a similar smell to them. Middleware and context processors are the best example here - a specific setting might be needed by just one installed application, but the effects are felt by everything in the system. While I haven't yet seen two "reusable" Django apps that require conflicting settings, per-application settings are an obvious use case that settings.py fails to cover.&lt;/p&gt;

&lt;p&gt;Global impact aside, my bigger problem with settings.py is that I almost always end up wanting to &lt;em&gt;reconfigure them at run-time&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This is possible in Django today, but comes at a price:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;Only some settings can actually be changed at run-time - others (such as USE_I18N) are lazily evaluated once and irreversibly reconfigure parts of Django's plumbing. Figuring out which ones can be changed requires exploration of Django's source code.&lt;/li&gt;
    &lt;li&gt;If you change a setting, you need to reliably change it back at the end of a request or your application will behave strangely. Uncaught exceptions could cause problems here, unless you remember to wrap dynamic setting changes in a try/finally block.&lt;/li&gt;
    &lt;li&gt;Changing a setting isn't thread-safe (without doing some extra work).&lt;/li&gt;
&lt;/ul&gt;
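
&lt;p&gt;The second and third problems can be addressed together. The sketch below is hypothetical: &lt;code&gt;DEFAULTS&lt;/code&gt;, &lt;code&gt;get_setting()&lt;/code&gt; and &lt;code&gt;override_setting()&lt;/code&gt; are not Django APIs. It keeps per-thread overrides layered on top of global defaults and uses try/finally (via a context manager), so an override is undone even if the request raises. It does nothing for the first problem, because code that reads a setting once at import time will never see the override.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;import threading
from contextlib import contextmanager

# Hypothetical global defaults, standing in for values loaded from settings.py.
DEFAULTS = {'DEBUG': False, 'TEMPLATE_DIRS': ('templates',)}

# Overrides live in thread-local storage, so one request's changes are
# invisible to requests being handled by other threads.
_local = threading.local()

_MISSING = object()


def get_setting(name):
    overrides = getattr(_local, 'overrides', {})
    return overrides.get(name, DEFAULTS[name])


@contextmanager
def override_setting(name, value):
    """Temporarily override a setting for the current thread only."""
    if not hasattr(_local, 'overrides'):
        _local.overrides = {}
    previous = _local.overrides.get(name, _MISSING)
    _local.overrides[name] = value
    try:
        yield
    finally:
        # Always restore, even if the block raised an exception.
        if previous is _MISSING:
            del _local.overrides[name]
        else:
            _local.overrides[name] = previous&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For example, a view could wrap its work in &lt;code&gt;with override_setting('DEBUG', True):&lt;/code&gt; for admin users only, without affecting anyone else's request.&lt;/p&gt;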

&lt;p&gt;Almost every setting in Django has legitimate use-cases for modification at run-time. Here are just a few examples:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;Requests from mobile phones may need a different TEMPLATE_DIRS setting, to load the mobile-specific templates in preference to the site defaults.&lt;/li&gt;
    &lt;li&gt;Some sites offer premium accounts which in turn gain access to more reliable servers. Premium users might get to send e-mail via a separate pool of SMTP servers, for example.&lt;/li&gt;
    &lt;li&gt;Some sections of code may want to use a different cache backend, or talk to a different set of memcache servers - to reduce the chance of one rapidly changing component causing other components' cache entries to expire too early.&lt;/li&gt;
    &lt;li&gt;Errors in one area of a site might need to be sent to a different team of developers.&lt;/li&gt;
    &lt;li&gt;Admin users might want DEBUG=True, while regular site visitors get DEBUG=False.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, settings.py is behind the dreaded "Settings cannot be imported, because environment variable DJANGO_SETTINGS_MODULE is undefined" exception. Yuck.&lt;/p&gt;

&lt;h4&gt;Turtles all the way down&lt;/h4&gt;

&lt;p&gt;The final section of the talk was about turtles. More precisely, it was about their role as an "infinite regression belief about cosmology and the nature of the universe". I want to apply that idea to Django.&lt;/p&gt;

&lt;p&gt;My favourite thing about Django is something I've started to call the "Django Contract": the idea that a Django view is a callable which takes a request object and returns a response object. I want to expand that concept to other parts of Django as well:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;URLconf: takes a request, dispatches based on &lt;code&gt;request.path&lt;/code&gt;, returns a response.&lt;/li&gt;
    &lt;li&gt;Application: takes a request, returns a response&lt;/li&gt;
    &lt;li&gt;Middleware: takes a request, returns a response (conditionally transforming either)&lt;/li&gt;
    &lt;li&gt;Django-powered site: hooked into mod_wsgi/FastCGI/a Python web server, takes a request, returns a response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of a Django site consisting of a settings.py, urls.py and various applications and middlewares, a site would just be a callable that obeys the Django Contract and composes together dozens of other callables.&lt;/p&gt;
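
&lt;p&gt;This minimal, dependency-free sketch makes the idea concrete: views and middleware are all plain callables that obey the Django Contract. &lt;code&gt;Request&lt;/code&gt; and &lt;code&gt;Response&lt;/code&gt; are tiny stand-ins for Django's &lt;code&gt;HttpRequest&lt;/code&gt; and &lt;code&gt;HttpResponse&lt;/code&gt;, and both middleware factories are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;class Request:
    """Minimal stand-in for Django's HttpRequest."""
    def __init__(self, path):
        self.path = path


class Response:
    """Minimal stand-in for Django's HttpResponse."""
    def __init__(self, content, status=200):
        self.content = content
        self.status = status
        self.headers = {}


def index(request):
    # A view: takes a request, returns a response.
    return Response('Hello, world')


def add_header(name, value):
    """Hypothetical middleware: transforms the response on the way out."""
    def middleware(app):
        def wrapped(request):
            response = app(request)
            response.headers[name] = value
            return response
        return wrapped
    return middleware


def deny_prefix(prefix):
    """Hypothetical middleware: may answer a request without calling app."""
    def middleware(app):
        def wrapped(request):
            if request.path.startswith(prefix):
                return Response('Forbidden', status=403)
            return app(request)
        return wrapped
    return middleware


# The whole site is just callables composed together; it obeys the same
# contract as the view at its core.
site = add_header('X-Turtles', 'all the way down')(deny_prefix('/private/')(index))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Every layer has the same shape, so the composed &lt;code&gt;site&lt;/code&gt; can itself be wrapped again or mounted inside something larger.&lt;/p&gt;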

&lt;p&gt;At this point, Django starts to look a lot like WSGI. What if WSGI and the Django Contract were interchangeable? WSGI is a wrapper around HTTP, so what if that could be swapped in and out (through proxies) as well? Django, WSGI and HTTP, three breeds of turtle arranged on top of each other in various configurations. Turtles all the way down.&lt;/p&gt;

&lt;h4&gt;djng&lt;/h4&gt;

&lt;p&gt;djng is my experiment to see what Django would look like without settings.py and with a whole lot more turtles. It's Yet Another Python Microframework.&lt;/p&gt;

&lt;p&gt;What's a microframework? The best examples are probably &lt;a href="http://webpy.org/"&gt;web.py&lt;/a&gt; (itself a result of Aaron Swartz's frustrations with Django) and &lt;a href="http://www.sinatrarb.com/"&gt;Sinatra&lt;/a&gt;, my all time favourite example of Ruby DSL design. More recent examples in Python include &lt;a href="http://github.com/breily/juno"&gt;juno&lt;/a&gt;, &lt;a href="http://github.com/JaredKuolt/newf"&gt;newf&lt;/a&gt;, &lt;a href="http://github.com/bradleywright/mnml"&gt;mnml&lt;/a&gt; and &lt;a href="http://github.com/toastdriven/itty"&gt;itty&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Microframeworks let you build an entire web application in a single file, usually with only one import statement. They are becoming increasingly popular for building small, self-contained applications that perform only one task - Service Oriented Architecture reborn as a combination of the Unix development philosophy and RESTful API design. I first saw this idea expressed in code by &lt;a href="http://thraxil.org/users/anders/posts/2005/12/12/tasty/"&gt;Anders Pearson&lt;/a&gt; and &lt;a href="http://blog.ianbicking.org/little-apps-instead-of-little-frameworks.html"&gt;Ian Bicking&lt;/a&gt; back in 2005.&lt;/p&gt;

&lt;p&gt;Unlike most microframeworks, djng has a pretty big dependency: Django itself. The plan is to reuse everything I like about Django (the templates, the ORM, view functions, the form library etc) while replacing just the top level plumbing and removing the requirement for separate settings.py and urls.py files.&lt;/p&gt;

&lt;p&gt;This is what "Hello, world" looks like in djng:&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;import djng

def index(request):
    return djng.Response('Hello, world')

if __name__ == '__main__':
    djng.serve(index, '0.0.0.0', 8888)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code class="python"&gt;djng.Response&lt;/code&gt; is an alias for Django's &lt;code class="python"&gt;HttpResponse&lt;/code&gt;. &lt;code class="python"&gt;djng.serve&lt;/code&gt; is a utility function which converts anything fulfilling the Django Contract into a WSGI application, then exposes it over HTTP.&lt;/p&gt;
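
&lt;p&gt;This sketch does not reproduce djng's code. It shows one plausible way such a converter could work using only the standard library. &lt;code&gt;to_wsgi()&lt;/code&gt; and &lt;code&gt;serve()&lt;/code&gt; are hypothetical names, and the &lt;code&gt;Request&lt;/code&gt;/&lt;code&gt;Response&lt;/code&gt; classes are minimal stand-ins, whereas djng uses Django's real classes.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;from wsgiref.simple_server import make_server


class Request:
    """Minimal stand-in for Django's HttpRequest, built from a WSGI environ."""
    def __init__(self, environ):
        self.environ = environ
        self.method = environ.get('REQUEST_METHOD', 'GET')
        self.path = environ.get('PATH_INFO') or '/'


class Response:
    """Minimal stand-in for Django's HttpResponse."""
    def __init__(self, content, status='200 OK',
                 content_type='text/html; charset=utf-8'):
        self.content = content
        # Kept as a WSGI status line for simplicity; Django stores an
        # integer status_code instead.
        self.status = status
        self.headers = [('Content-Type', content_type)]


def to_wsgi(view):
    """Adapt a Django Contract callable (request in, response out) to WSGI."""
    def wsgi_app(environ, start_response):
        response = view(Request(environ))
        start_response(response.status, response.headers)
        return [response.content.encode('utf-8')]
    return wsgi_app


def serve(view, host, port):
    """Hypothetical djng.serve() equivalent: adapt the view, then serve HTTP."""
    httpd = make_server(host, port, to_wsgi(view))
    httpd.serve_forever()


def index(request):
    return Response('Hello, world')


if __name__ == '__main__':
    serve(index, '0.0.0.0', 8888)&lt;/code&gt;&lt;/pre&gt;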

&lt;p&gt;Let's add URL routing to the example:&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;app = djng.Router(
    (r'^hello$', lambda request: djng.Response('Hello, world')),
    (r'^goodbye$', lambda request: djng.Response('Goodbye, world')),
)

if __name__ == '__main__':
    djng.serve(app, '0.0.0.0', 8888)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The implementation of djng.Router is just &lt;a href="http://github.com/simonw/djng/blob/c892dddf064d5542c17119d02920ea4f5e9dd7f5/djng/router.py"&gt;a few lines of glue code&lt;/a&gt; adding a nicer API to Django's internal RegexURLResolver class.&lt;/p&gt;
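
&lt;p&gt;The sketch below reimplements the same idea in plain Python rather than depending on &lt;code&gt;RegexURLResolver&lt;/code&gt;, which is internal to Django. A router is itself a Django Contract callable: it tries each regular expression against the path and calls the first matching view. The &lt;code&gt;Router&lt;/code&gt; class here and its stand-in request/response objects are hypothetical.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;import re


class Request:
    """Minimal stand-in for Django's HttpRequest."""
    def __init__(self, path):
        self.path = path


class Response:
    """Minimal stand-in for Django's HttpResponse."""
    def __init__(self, content, status=200):
        self.content = content
        self.status = status


class Router:
    """Dispatch on request.path; a Router is itself a request/response callable."""

    def __init__(self, *urlpatterns):
        self.patterns = [(re.compile(regex), view) for regex, view in urlpatterns]

    def __call__(self, request):
        # Like a Django URLconf, match against the path minus its leading slash.
        path = request.path[1:] if request.path.startswith('/') else request.path
        for regex, view in self.patterns:
            match = regex.search(path)
            if match:
                # Named groups become keyword arguments; otherwise positional.
                kwargs = match.groupdict()
                args = () if kwargs else match.groups()
                return view(request, *args, **kwargs)
        return Response('Not found', status=404)


app = Router(
    (r'^hello$', lambda request: Response('Hello, world')),
    (r'^goodbye$', lambda request: Response('Goodbye, world')),
    (r'^hello/(\w+)$', lambda request, name: Response('Hello, ' + name)),
)&lt;/code&gt;&lt;/pre&gt;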

&lt;h4&gt;Services, not settings&lt;/h4&gt;

&lt;p&gt;The trickiest problem I still need to solve is how to replace settings.py. A group of developers (including &lt;a href="http://www.holovaty.com/"&gt;Adrian&lt;/a&gt;, &lt;a href="http://lucumr.pocoo.org/"&gt;Armin&lt;/a&gt;, &lt;a href="http://lazypython.blogspot.com/"&gt;Alex&lt;/a&gt; and myself) had an excellent brainstorming session at EuroDjangoCon about this. We realised that most of the stuff in settings.py can be recast as configuring &lt;em&gt;services&lt;/em&gt; which Django makes available to the applications it is hosting. Services like the following:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;Caching&lt;/li&gt;
    &lt;li&gt;Templating&lt;/li&gt;
    &lt;li&gt;Sending e-mail&lt;/li&gt;
    &lt;li&gt;Sessions&lt;/li&gt;
    &lt;li&gt;Database connection - &lt;code&gt;django.db.connection&lt;/code&gt;&lt;/li&gt;
    &lt;li&gt;Higher level ORM&lt;/li&gt;
    &lt;li&gt;File storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of the above needs to be configured, and each also might need to be reconfigured at runtime. Django already points in this direction by providing hooks for adding custom backends for caching, template loading, file storage and session support. What's missing is an official way of swapping in different backends at runtime.&lt;/p&gt;

&lt;p&gt;I'm currently leaning towards the idea of a "stack" of service implementations, one for each of the service categories listed above. A new implementation could be pushed onto the stack at any time during the Django request/response cycle, and would be automatically popped back off again before the next request is processed (all in a thread-safe manner). Applications would also be able to instantiate and use a particular service implementation directly should they need to do so.&lt;/p&gt;
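
&lt;p&gt;The stack idea might look something like the following minimal sketch. It uses &lt;code&gt;threading.local&lt;/code&gt; so that each thread, and therefore each in-flight request, sees its own stack. &lt;code&gt;ServiceStack&lt;/code&gt;, its methods and the toy &lt;code&gt;DictCache&lt;/code&gt; backend are hypothetical names, not part of Django or djng.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;import threading


class ServiceStack(object):
    """Per-thread stack of implementations for one service category."""

    def __init__(self, default):
        self.default = default           # used when nothing has been pushed
        self._local = threading.local()  # each thread gets its own .stack

    def _stack(self):
        if not hasattr(self._local, 'stack'):
            self._local.stack = []
        return self._local.stack

    def push(self, implementation):
        """Override the service for the rest of the current request."""
        self._stack().append(implementation)

    def pop(self):
        return self._stack().pop()

    def reset(self):
        """Meant to be called by the framework once each response is sent."""
        del self._stack()[:]

    def current(self):
        stack = self._stack()
        return stack[-1] if stack else self.default


class DictCache(object):
    """Toy in-memory cache backend, just enough to show swapping."""
    def __init__(self, name):
        self.name = name
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value


cache = ServiceStack(DictCache('default'))&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Application code would always go through &lt;code&gt;cache.current()&lt;/code&gt;. A real implementation would also need the framework to call &lt;code&gt;reset()&lt;/code&gt; at the end of every request, for example from a WSGI wrapper, so that overrides never leak into the next request.&lt;/p&gt;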

&lt;p&gt;A few days ago I heard about &lt;a href="http://pypi.python.org/pypi/Contextual"&gt;Contextual&lt;/a&gt;, which appears to be trying to solve a similar problem. Just a few minutes ago I stumbled across &lt;a href="http://pythonpaste.org/modules/registry.html"&gt;paste.registry's StackedObjectProxy&lt;/a&gt; which seems to be exactly what I've been busily reinventing.&lt;/p&gt;

&lt;p&gt;My current rough thoughts on an API for this can be found in &lt;a href="http://github.com/simonw/djng/blob/c892dddf064d5542c17119d02920ea4f5e9dd7f5/services_api_ideas.txt"&gt;services_api_ideas.txt&lt;/a&gt;. I'm eager to hear suggestions on how to tackle this problem.&lt;/p&gt;

&lt;p&gt;djng is very much an experiment at the moment - I wouldn't suggest building anything against it unless you're willing to maintain your own fork. That said, the code is all on GitHub partly because I want people to fork it and experiment with their own API concepts as much as possible.&lt;/p&gt;

&lt;p&gt;If you're interested in exploring these concepts with me, please join me on the brand new &lt;a href="http://groups.google.com/group/djng"&gt;djng mailing list&lt;/a&gt;.&lt;/p&gt;


&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/May/19/djng/#comments"&gt;&lt;img src="http://simonwillison.net/2009/May/19/djng/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="django" /><category term="djangoheresies" /><category term="djng" /><category term="eurodjangocon" /><category term="github" /><category term="php" /><category term="projects" /><category term="python" /><category term="services" /><category term="settingspy" /><category term="talks" /><category term="webframeworks" /></entry><entry><title>rev=canonical bookmarklet and designing shorter URLs</title><link href="http://simonwillison.net/2009/Apr/11/revcanonical/" rel="alternate" /><updated>2009-04-11T17:37:55Z</updated><id>http://simonwillison.net/2009/Apr/11/revcanonical/</id><summary type="html">&lt;p&gt;I've watched the proliferation of URL shortening services over the past year with a certain amount of dismay. I care about the health of the web and try to ensure that URLs I am responsible for will last for as long as possible, and I think it's very unlikely that all of these new services will still be around in twenty years' time. Last month &lt;a href="http://simonwillison.net/2009/Mar/8/twitter/"&gt;I suggested&lt;/a&gt; that the Internet Archive start mirroring redirect databases, and last week I was &lt;a href="http://simonwillison.net/2009/Apr/3/tinyurl/"&gt;pleased to hear&lt;/a&gt; that Archiveteam, a different organisation, had &lt;a href="http://archiveteam.org/index.php?title=TinyURL"&gt;already started crawling&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The most recent discussion was kicked off by &lt;a href="http://joshua.schachter.org/2009/04/on-url-shorteners.html"&gt;Joshua Schachter&lt;/a&gt; and &lt;a href="http://www.scripting.com/stories/2009/03/07/solvingTheTinyurlCentraliz.html"&gt;Dave Winer&lt;/a&gt;, and &lt;a href="http://laughingmeme.org/2009/04/03/url-shortening-hinting/" title="URL Shortening Hinting"&gt;a solution has emerged&lt;/a&gt; driven by some lightning-fast hacking by Kellan Elliott-McCrea. The idea is simple: sites get to choose their preferred source of shortened URLs (including self-hosted solutions) and specify it from individual pages using &lt;code&gt;&amp;lt;link rev="canonical" href="... shorter URL here ..."&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Sites that host their own shorteners get short URLs that are as reliable as the site itself - and the amount of damage caused by a major shortener going missing can be dramatically reduced.&lt;/p&gt;

&lt;p&gt;I've been experimenting with this new pattern today. Here are a few small contributions to the wider discussion.&lt;/p&gt;

&lt;h4&gt;A URL shortening bookmarklet&lt;/h4&gt;

&lt;p&gt;Kellan's &lt;a href="http://revcanonical.appspot.com/"&gt;rev=canonical service&lt;/a&gt; exposes rev=canonical links using a server-side script running on App Engine. An obvious next step is to distil that logic into a bookmarklet. I decided to combine the rev=canonical logic with my &lt;a href="http://simonwillison.net/2008/Aug/27/jsontinyurl/"&gt;json-tinyurl&lt;/a&gt; web service (also on App Engine), which allows browsers to look up or create TinyURLs using a cross-domain JSONP request. The resulting bookmarklet will display the site's rev=canonical link if it exists, or create and display a TinyURL link otherwise:&lt;/p&gt;

&lt;p&gt;Bookmarklet: &lt;a href="javascript:(function(){var url=document.location;var links=document.getElementsByTagName('link');var found=0;for(var i = 0, l; l = links[i]; i++){if(l.getAttribute('rev')=='canonical'||(/alternateshort/).exec(l.getAttribute('rel'))) {found=l.getAttribute('href');break;}}if (!found) {for (var i = 0; l = document.links[i]; i++) {if (l.getAttribute('rev') == 'canonical') {found = l.getAttribute('href');break;}}}if (found) {prompt('URL:', found);} else {window.onTinyUrlGot = function(r) {if (r.ok) {prompt('URL:', r.tinyurl);} else {alert('Could not shorten with tinyurl');}};var s = document.createElement('script');s.type='text/javascript';s.src='http://json-tinyurl.appspot.com/?callback=onTinyUrlGot&amp;amp;url=' +encodeURIComponent(document.location.href);document.getElementsByTagName('head')[0].appendChild(s);}})();"&gt;Shorten&lt;/a&gt; (drag to your browser toolbar)&lt;/p&gt;

&lt;p&gt;You can also grab the &lt;a href="http://gist.github.com/93591"&gt;uncompressed source code&lt;/a&gt;.&lt;/p&gt;
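
&lt;p&gt;The bookmarklet's lookup order can also be reproduced on the server side, for example in a crawler that wants to respect a site's preferred short URLs. This is a minimal sketch using Python's standard &lt;code&gt;html.parser&lt;/code&gt;. It checks only rev="canonical" and skips the bookmarklet's extra rel test. &lt;code&gt;RevCanonicalFinder&lt;/code&gt; and &lt;code&gt;find_short_url&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;from html.parser import HTMLParser


class RevCanonicalFinder(HTMLParser):
    """Collect rev="canonical" hrefs from LINK and A elements separately."""
    def __init__(self):
        super().__init__()
        self.link_hrefs = []
        self.anchor_hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (attrs.get('rev') or '').lower() != 'canonical' or not attrs.get('href'):
            return
        if tag == 'link':
            self.link_hrefs.append(attrs['href'])
        elif tag == 'a':
            self.anchor_hrefs.append(attrs['href'])


def find_short_url(html):
    """Return the page's preferred short URL, or None if it declares none."""
    finder = RevCanonicalFinder()
    finder.feed(html)
    finder.close()
    # LINK elements win; A elements are only consulted as a fallback,
    # mirroring the order the bookmarklet checks them in.
    for hrefs in (finder.link_hrefs, finder.anchor_hrefs):
        if hrefs:
            return hrefs[0]
    return None&lt;/code&gt;&lt;/pre&gt;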

&lt;h4&gt;Designing short URLs&lt;/h4&gt;

&lt;p&gt;I've also implemented rev=canonical on this site. I ended up buying a new domain for this, since simonwillison.net is both difficult to spell and 17 characters long. I went with swtiny.eu - 9 characters, and keeping "tiny" in the domain helps people guess the nature of the site from just the URLs it generates. Be warned: the DNS doesn't appear to have finished propagating yet.&lt;/p&gt;

&lt;p&gt;For the path component, I turned to a variant of base 62 encoding. Decimal integers are represented using 10 digits (0-9), but base 62 uses those digits plus the letters of the alphabet in both lower and upper case. A 13-digit integer such as 7250397214971 compresses down to just 8 characters (CDeIPpOD) using base 62. My &lt;a href="http://www.djangosnippets.org/snippets/1431/"&gt;baseconv.py module&lt;/a&gt; implements base 62, among others. I considered using base 57, excluding o, O, 0, 1 and l as too easily confused, but decided against it.&lt;/p&gt;
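
&lt;p&gt;A minimal sketch of the encoding follows. The digit order is an assumption, not a copy of baseconv.py: upper case letters first, then digits, then lower case letters. That ordering does reproduce the CDeIPpOD example, along with the short URLs listed below.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;# Assumed digit order (upper case, digits, lower case); inferred from the
# examples in this post rather than taken from baseconv.py.
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz'
BASE = len(ALPHABET)  # 62


def encode(number):
    """Encode a non-negative integer as a base 62 string."""
    if number != abs(number):  # i.e. number is negative
        raise ValueError('only non-negative integers can be encoded')
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return ''.join(reversed(digits))


def decode(text):
    """Decode a base 62 string back into an integer."""
    number = 0
    for char in text:
        number = number * BASE + ALPHABET.index(char)
    return number&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With this alphabet, &lt;code&gt;encode(7250397214971)&lt;/code&gt; returns &lt;code&gt;'CDeIPpOD'&lt;/code&gt; and &lt;code&gt;decode('Z8')&lt;/code&gt; returns &lt;code&gt;1584&lt;/code&gt;.&lt;/p&gt;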

&lt;p&gt;This site has three key types of content: entries, blogmarks and quotations. Each one is a separate Django model, and hence each has its own underlying database table and individual ID sequence. Since the IDs overlap, I need a way of separating out the shortened URLs for each content type.&lt;/p&gt;

&lt;p&gt;I decided to spend a byte on namespacing my shortened URLs. A prefix of E means an entry, Q means a quotation and B means a blogmark. For example:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/EZ8&lt;/samp&gt;: Entry with ID 1584&lt;/li&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/BBEQ&lt;/samp&gt;: Blogmark with ID 4108&lt;/li&gt;
    &lt;li&gt;&lt;samp&gt;http://swtiny.eu/QE5&lt;/samp&gt;: Quotation with ID 279&lt;/li&gt;
&lt;/ul&gt;
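
&lt;p&gt;Resolving a short path means splitting off the prefix letter and decoding the rest. This is a hypothetical sketch: &lt;code&gt;PREFIXES&lt;/code&gt; and &lt;code&gt;resolve&lt;/code&gt; are illustrative names. The base 62 alphabet is assumed to run upper case, digits, lower case, which matches the three examples above. A real view would then look up the object ID in the matching model.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;# Assumed base 62 digit order, consistent with the example URLs.
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz'

# One upper case prefix letter per content type; 23 letters remain free.
PREFIXES = {'E': 'entry', 'B': 'blogmark', 'Q': 'quotation'}


def resolve(path):
    """Split a short path such as 'EZ8' into (content_type, object_id).

    Returns None for anything that is not a known prefix followed by a
    valid base 62 number, leaving lower case paths free for custom use.
    """
    prefix, encoded = path[:1], path[1:]
    if prefix not in PREFIXES or not encoded:
        return None
    object_id = 0
    for char in encoded:
        if char not in ALPHABET:
            return None
        object_id = object_id * len(ALPHABET) + ALPHABET.index(char)
    return PREFIXES[prefix], object_id&lt;/code&gt;&lt;/pre&gt;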

&lt;p&gt;By using upper case letters for the prefixes, I can later define custom paths starting with a lower case letter. I also have another 23 upper case prefix letters reserved in case I need them.&lt;/p&gt;

&lt;p&gt;I &lt;a href="http://twitter.com/simonw/status/1496864191"&gt;asked on Twitter&lt;/a&gt; and consensus opinion was that a 301 permanent redirect was the right thing to do (as opposed to a 302), both for SEO reasons and because the content will never exist at the shorter URL.&lt;/p&gt;

&lt;h4&gt;Implementation using Django and nginx&lt;/h4&gt;

&lt;p&gt;I run all of my Django sites using Apache and &lt;a href="http://code.google.com/p/modwsgi/"&gt;mod_wsgi&lt;/a&gt;, proxied behind &lt;a href="http://nginx.net/"&gt;nginx&lt;/a&gt;. Each site gets an Apache running on a high port, and nginx deals with virtual host configuration (proxying each domain to a different Apache backend) and static file serving. I didn't want to set up a full Django site just to run swtiny.eu, especially since my existing blog engine was required in order to resolve the shortened URLs.&lt;/p&gt;

&lt;p&gt;Instead, I implemented the shortened URL redirection as just another view within my existing site: &lt;samp&gt;http://simonwillison.net/shorter/EZ8&lt;/samp&gt;. I then configured nginx to invisibly proxy requests for &lt;samp&gt;swtiny.eu&lt;/samp&gt; through to that URL. The correct incantation took a while to figure out, so here's the relevant section of my nginx.conf:&lt;/p&gt;

&lt;pre&gt;&lt;code class="nginx-conf"&gt;server {
    listen 80;
    server_name www.swtiny.eu swtiny.eu;
    location / {
        rewrite (.*) /shorter$1 break;
        proxy_pass http://simonwillison.net;
        proxy_redirect off;
    }
}&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;code&gt;proxy_redirect off&lt;/code&gt; is needed to prevent nginx from replacing &lt;samp&gt;simonwillison.net&lt;/samp&gt; in the resulting Location header with &lt;samp&gt;swtiny.eu&lt;/samp&gt;. My Django view code is relatively shonky, but if you're interested you can &lt;a href="http://www.djangosnippets.org/snippets/1430/"&gt;find it here&lt;/a&gt;.&lt;/p&gt;
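
&lt;p&gt;The view behind &lt;samp&gt;/shorter/&lt;/samp&gt; only has to do two things: turn the short code into a full URL, and answer with a 301. This framework-free WSGI sketch is not the djangosnippets code. &lt;code&gt;make_redirect_app&lt;/code&gt; and its &lt;code&gt;lookup&lt;/code&gt; argument are hypothetical, with &lt;code&gt;lookup&lt;/code&gt; standing in for the real database queries.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;PREFIX = '/shorter/'


def make_redirect_app(lookup):
    """Build a WSGI app that answers /shorter/CODE with a 301 redirect.

    lookup is a hypothetical callable mapping a short code to a full URL,
    or to None when the code is unknown.
    """
    def app(environ, start_response):
        path = environ.get('PATH_INFO', '')
        code = path[len(PREFIX):] if path.startswith(PREFIX) else ''
        target = lookup(code) if code else None
        if target is None:
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'Unknown short URL']
        # 301 rather than 302: the content will never live at the short URL.
        start_response('301 Moved Permanently', [('Location', target)])
        return [b'']
    return app&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With the nginx rewrite above, a request for &lt;samp&gt;swtiny.eu/EZ8&lt;/samp&gt; arrives at this app with a path of &lt;samp&gt;/shorter/EZ8&lt;/samp&gt;. The absolute &lt;samp&gt;simonwillison.net&lt;/samp&gt; Location header then passes back through untouched.&lt;/p&gt;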

&lt;p&gt;The nice thing about this approach is that it makes it trivial to add custom URL shortening domains to other projects - a quick view function and a few lines of nginx configuration are all that is needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; The bookmarklet now supports the rev attribute on A elements as well - &lt;a href="http://simonwillison.net/2009/Apr/11/revcanonical/#c44088"&gt;thanks for the suggestion&lt;/a&gt;, Jeremy.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Apr/11/revcanonical/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Apr/11/revcanonical/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="bookmarklets" /><category term="davewiner" /><category term="django" /><category term="joshuaschachter" /><category term="kellanelliottmccrea" /><category term="projects" /><category term="python" /><category term="revcanonical" /><category term="tinyurl" /><category term="urls" /></entry><entry><title>List of SxSW 2009 panels with "social" in the title</title><link href="http://simonwillison.net/2009/Mar/14/sxsw/" rel="alternate" /><updated>2009-03-14T23:02:54Z</updated><id>http://simonwillison.net/2009/Mar/14/sxsw/</id><summary type="html">&lt;ul&gt;
	&lt;li&gt;A Hard Sell? Social Media &amp;amp; Your Boss&lt;/li&gt;
	&lt;li&gt;Can Social Media End Racism?&lt;/li&gt;
	&lt;li&gt;Digital Urbanites: How To Become Part of the New Social Capital&lt;/li&gt;
	&lt;li&gt;The Future Of Social Networks&lt;/li&gt;
	&lt;li&gt;How Social Networks Are Killing the Revolution&lt;/li&gt;
	&lt;li&gt;Making Whuffie: Raising Social Capital in Online Communities&lt;/li&gt;
	&lt;li&gt;The Mix at Six Hosted by Social Media Group&lt;/li&gt;
	&lt;li&gt;Mobile Social SXSW BBQ&lt;/li&gt;
	&lt;li&gt;My Boss Doesn't Get It: Championing Social Media to the Man&lt;/li&gt;
	&lt;li&gt;PBS' Interactive Social Media &amp;amp; Online Video Studio&lt;/li&gt;
	&lt;li&gt;The Search for a More Social Web&lt;/li&gt;
	&lt;li&gt;Security for the Social Set&lt;/li&gt;
	&lt;li&gt;Social Engineering: Scam Your Way Into Anything or From Anybody&lt;/li&gt;
	&lt;li&gt;Social Gamers: Away From the Keyboard&lt;/li&gt;
	&lt;li&gt;Social Media For Social Good&lt;/li&gt;
	&lt;li&gt;Social Media Marketing&lt;/li&gt;
	&lt;li&gt;Social Media Marketing: An Hour a Day&lt;/li&gt;
	&lt;li&gt;Social Media Nonprofit ROI Poetry Slam&lt;/li&gt;
	&lt;li&gt;Social Media: If You Liked it, Then You Should Have Put a Digg on it...&lt;/li&gt;
	&lt;li&gt;Social Networking in Health: e-Patients, Data &amp;amp; Privacy&lt;/li&gt;
	&lt;li&gt;Social Patterns and Antipatterns For the Win&lt;/li&gt;
	&lt;li&gt;Suxorz '09: The Ten Worst Social Media Campaigns&lt;/li&gt;
	&lt;li&gt;Twitter for Marketers: Is It Still Social Media?&lt;/li&gt;
	&lt;li&gt;Using GPS &amp;amp; Location to Enhance Social Networking&lt;/li&gt;
	&lt;li&gt;Using the New Digital Social Media to Accelerate Sustainability&lt;/li&gt;
&lt;/ul&gt;


&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Mar/14/sxsw/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Mar/14/sxsw/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="presentedwithoutcomment" /><category term="social" /><category term="socialmedia" /><category term="sxsw" /></entry><entry><title>A few notes on the Guardian Open Platform</title><link href="http://simonwillison.net/2009/Mar/10/openplatform/" rel="alternate" /><updated>2009-03-10T14:28:39Z</updated><id>http://simonwillison.net/2009/Mar/10/openplatform/</id><summary type="html">&lt;p&gt;This morning we launched the &lt;a href="http://www.guardian.co.uk/open-platform"&gt;Guardian Open Platform&lt;/a&gt; at a well-attended event in our new offices in &lt;a href="http://www.kingsplace.co.uk/"&gt;Kings Place&lt;/a&gt;. This is one of the main projects I've been helping out with since joining the Guardian last year, and it's fantastic to finally have it out in the open.&lt;/p&gt;

&lt;p&gt;There are two components to the launch today: the Content API and the Data Store. I'll describe the Data Store first as it deserves not to get buried in the discussion about its larger cousin.&lt;/p&gt;

&lt;h4&gt;The Data Store&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://www.guardian.co.uk/profile/simonrogers"&gt;Simon Rogers&lt;/a&gt; is the Guardian news editor who is principally responsible for gathering data about the world. If you ever see an infographic in the paper, the chances are Simon had a hand in researching the data for it. His delicious feed is a &lt;a href="http://delicious.com/smfrogers"&gt;positive gold mine&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As of today, a sizeable portion of the data he collects for the newspaper will also be published online. As a starting point, we're publishing over &lt;a href="http://www.guardian.co.uk/data-store"&gt;80 data sets&lt;/a&gt;, all hosted in Google Spreadsheets, which means they're all accessible through the &lt;a href="http://code.google.com/apis/spreadsheets/overview.html"&gt;Spreadsheets Data API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's Simon's take on it, from &lt;a href="http://www.guardian.co.uk/news/datablog/2009/mar/10/blogpost1"&gt;Welcome to the Datablog&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote cite="http://www.guardian.co.uk/news/datablog/2009/mar/10/blogpost1"&gt;&lt;p&gt;Everyday we work with datasets from around the world. We have had to check this data and make sure it's the best we can get, from the most credible sources. But then it lives for the moment of the paper's publication and afterward disappears into a hard drive, rarely to emerge again before updating a year later.&lt;/p&gt;

&lt;p&gt;So, together with its companion site, the Data Store – a directory of all the stats we post – we are opening up that data for everyone. Whenever we come across something interesting or relevant or useful, we'll post it up here and let you know what we're planning to do with it.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;It's worth spending quite a while digging around the data. Most sets come with a full description, including where the data was sourced from. New data sets will be announced &lt;a href="http://www.guardian.co.uk/news/datablog"&gt;on the Datablog&lt;/a&gt;, which is cleverly subtitled "Facts are sacred".&lt;/p&gt;

&lt;h4&gt;The Content API&lt;/h4&gt;

&lt;p&gt;&lt;a href="http://api.guardianapis.com/docs/"&gt;The Content API&lt;/a&gt; provides REST-ish access to over a million items of content, mostly from the last decade but with a few gems that are &lt;a href="http://www.guardian.co.uk/world/1944/aug/26/france.secondworldwar"&gt;a little bit older&lt;/a&gt;. Various types of content are available - article is the most common, but you can grab information (though not necessarily content) about audio, video, galleries and more. You can retrieve 50 items at a time, and pagination is unlimited (provided you stay below the API's rate limit).&lt;/p&gt;

&lt;p&gt;Articles are provided with their full body content, though this does not currently include any HTML tags (a known issue). It's a good idea to review &lt;a href="http://www.guardian.co.uk/open-platform/terms-and-conditions"&gt;our terms and conditions&lt;/a&gt;, but you should know that if you opt to republish our article bodies on your site we may ask you to include our ads alongside our content in the future.&lt;/p&gt;

&lt;p&gt;We serve 15-minute HTTP cache headers, but you are allowed to store our content for up to 24 hours. You really, really don't want to store content for longer than that, as in addition to violating our T&amp;amp;Cs you might find yourself inadvertently publishing an article that has been retracted for legal reasons. UK libel laws can be pretty scary.&lt;/p&gt;

&lt;p&gt;In addition to regular search, you can also filter our content using tags. Tags are a core aspect of the Guardian's &lt;a href="http://www.guardian.co.uk/help/insideguardian+series/an-abc-of-r2"&gt;R2 platform&lt;/a&gt;, being used for keywords, contributors, "series" (used to implement blogs), content types and more. Every item returned by the API includes tags, and the tags can be used to further filter the results.&lt;/p&gt;

&lt;p&gt;We also return a list of filters at the bottom of each page of search results showing the tags that could be used to filter that result set, ordered by the number of results (you may have seen this feature referred to as faceted search or guided navigation). Handy tip: you can add ?count=0 to a search API request to turn off results entirely and just get back the filters section. The race is on to be first to release a tag relationship browser based on this feature.&lt;/p&gt;

&lt;p&gt;API responses are available in custom XML, JSON or Atom. The Atom format is the least mature at the moment, and we'd welcome suggestions from the community for improving it.&lt;/p&gt;

&lt;p&gt;I released &lt;a href="http://code.google.com/p/openplatform-python/"&gt;a Python client library&lt;/a&gt; for the API this morning, and we also have libraries for &lt;a href="http://code.google.com/p/openplatform-ruby/"&gt;Ruby&lt;/a&gt;, &lt;a href="http://code.google.com/p/openplatform-java/"&gt;Java&lt;/a&gt; and &lt;a href="http://code.google.com/p/openplatform-php/"&gt;PHP&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We also have an API Explorer (written in JavaScript and jQuery, hosted on the same domain as the API so that it can make Ajax requests) but you'll need an API key to try it out.&lt;/p&gt;

&lt;h4&gt;The bad news&lt;/h4&gt;

&lt;p&gt;The response to the API release has been terrific (check out what &lt;a href="http://www.tom-watson.co.uk/2009/03/guardian-open-platform/"&gt;Tom Watson&lt;/a&gt; had to say), but as a result it's likely that we will be able to issue significantly fewer API keys than people are asking for. Please bear with us while we work towards a more widely accessible release.&lt;/p&gt;

&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Mar/10/openplatform/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Mar/10/openplatform/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="apis" /><category term="atom" /><category term="contentapi" /><category term="data" /><category term="datastore" /><category term="guardian" /><category term="javascript" /><category term="journalism" /><category term="jquery" /><category term="json" /><category term="openplatform" /><category term="python" /><category term="simonrogers" /><category term="tomwatson" /><category term="xml" /></entry><entry><title>Pragmatism, purity and JSON content types</title><link href="http://simonwillison.net/2009/Feb/6/json/" rel="alternate" /><updated>2009-02-06T10:19:55Z</updated><id>http://simonwillison.net/2009/Feb/6/json/</id><summary type="html">&lt;p&gt;I started a conversation about this on Twitter the other day, but Twitter is a horrible place to have an archived discussion so I'm going to try again here.&lt;/p&gt;

&lt;p&gt;If you're producing a JSON API for other people to use (as opposed to an API that's only really meant for your own local Ajax responses), you need to decide which Content-Type to use. The best option is not entirely obvious.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.ietf.org/rfc/rfc4627.txt"&gt;RFC 4672&lt;/a&gt; defines JSON and reserves &lt;code&gt;application/json&lt;/code&gt; as the preferred media type. The problem is that most browsers will prompt you to download the file rather than displaying it inline (as they would for &lt;code&gt;text/plain&lt;/code&gt; or &lt;code&gt;application/javascript&lt;/code&gt;). One of my favourite qualities of REST-style APIs is that they enable exploration and debugging using just a browser - using &lt;code&gt;application/json&lt;/code&gt; throws a big, frustrating road block in the way. There are ways of telling your browser to treat &lt;code&gt;application/json&lt;/code&gt; in the same way as &lt;code&gt;text/plain&lt;/code&gt; but that doesn't really help you if your aim is to create an API that's easy for other developers to use.&lt;/p&gt;

&lt;p&gt;It's also worth mentioning that if you are returning JSONP (with an extra callback function wrapped around the JSON response to enable the dynamic script tag hack) you HAVE to serve it as &lt;code&gt;application/javascript&lt;/code&gt; - otherwise the script you are providing won't be executed by the browser. Don't forget to include &lt;code&gt;charset=utf-8&lt;/code&gt; as well (for both types of response).&lt;/p&gt;
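
&lt;p&gt;Here is a minimal sketch of building both kinds of response, assuming the callback name arrives as a query string parameter. Restricting callback names to dotted JavaScript identifiers is an added precaution, not part of JSONP itself. &lt;code&gt;json_response&lt;/code&gt; and &lt;code&gt;CALLBACK_RE&lt;/code&gt; are hypothetical names.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;import json
import re

# Dotted JavaScript identifiers such as "onTinyUrlGot" or "app.handle".
CALLBACK_RE = re.compile(r'[A-Za-z_$][A-Za-z0-9_$]*(?:\.[A-Za-z_$][A-Za-z0-9_$]*)*')


def json_response(data, callback=None):
    """Return (content_type, body) for a plain JSON or a JSONP response."""
    body = json.dumps(data)
    if callback is None:
        return 'application/json; charset=utf-8', body
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError('invalid JSONP callback name')
    # Wrapping the JSON in a function call turns it into executable script.
    return 'application/javascript; charset=utf-8', '%s(%s)' % (callback, body)&lt;/code&gt;&lt;/pre&gt;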

&lt;p&gt;So, it's pragmatism vs. purity. The &lt;em&gt;correct&lt;/em&gt; thing to do is to return &lt;code&gt;application/json&lt;/code&gt;, but doing so makes your API harder for developers to use.&lt;/p&gt;

&lt;p&gt;In a brief, non-comprehensive review of some existing JSON APIs (FriendFeed, Flickr, Google Social Graph etc) I couldn't find any that were using &lt;code&gt;application/json&lt;/code&gt;, presumably for this exact reason.&lt;/p&gt;

&lt;h4&gt;Using the Accept: header&lt;/h4&gt;

&lt;p&gt;The Accept: header is one of my least favourite parts of HTTP. I like to be confident that if I send a URL to someone, they'll get back exactly the same bytes as I did when I retrieved it myself (I distrust language negotiation for the same reason). However, a number of people suggested it on Twitter and it looks like it could be a useful solution to this problem.&lt;/p&gt;

&lt;p&gt;I'm currently considering the following: ONLY use the &lt;code&gt;application/json&lt;/code&gt; Content-Type in reply to requests that include &lt;code&gt;application/json&lt;/code&gt; in their Accept header - essentially allowing clients that care about the correct content type to opt in to receiving it. Everyone else (browsers included) gets &lt;code&gt;application/javascript&lt;/code&gt;, which is less correct (though not an all-out lie, since JSON is very nearly a subset of JavaScript) but solves the usability problem.&lt;/p&gt;

&lt;p&gt;A couple of things worry me about this. Firstly, is this a reasonable thing to use Accept for? Secondly, is there a chance that browsers might add &lt;code&gt;application/json&lt;/code&gt; to their Accept header at some point in the future? Safari currently sends &lt;code&gt;text/xml,application/xml,application/xhtml+xml,text/html; q=0.9,text/plain; q=0.8,image/png,*/*; q=0.5&lt;/code&gt; while Firefox sends &lt;code&gt;text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8&lt;/code&gt;. Would it be smarter to look for &lt;code&gt;*/*&lt;/code&gt; and serve the incorrect Content-Type to those requests and the correct one to everything else?&lt;/p&gt;
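
&lt;p&gt;The opt-in rule can be sketched in a few lines. This version deliberately ignores wildcards, so a browser's &lt;code&gt;*/*&lt;/code&gt; never counts as asking for JSON. &lt;code&gt;parse_accept&lt;/code&gt; and &lt;code&gt;choose_json_content_type&lt;/code&gt; are hypothetical names. The parser is a simplification of the full Accept grammar in the HTTP specification.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;def parse_accept(header):
    """Parse an Accept header into a {media_type: quality} dict."""
    prefs = {}
    for part in header.split(','):
        pieces = [piece.strip() for piece in part.split(';')]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        prefs[media_type] = quality
    return prefs


def choose_json_content_type(accept_header):
    """Serve application/json only to clients that explicitly ask for it."""
    prefs = parse_accept(accept_header or '')
    if prefs.get('application/json', 0.0):  # present with a non-zero q value
        return 'application/json; charset=utf-8'
    return 'application/javascript; charset=utf-8'&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Both the Safari and Firefox headers quoted above fall through to &lt;code&gt;application/javascript&lt;/code&gt;. A client sending &lt;code&gt;Accept: application/json&lt;/code&gt; gets the correct type.&lt;/p&gt;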

&lt;p&gt;An alternative is to let people request "JSON with a browsable Content-Type" as a separate format option, or to support a "pretty=1" query string argument which returns the response as &lt;code&gt;text/plain&lt;/code&gt; and potentially pretty prints it as well. I haven't yet decided if this is better than messing around with the Accept header.&lt;/p&gt;
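
&lt;p&gt;The query string option is simple to sketch, assuming the raw query string is available to the view. &lt;code&gt;render_json&lt;/code&gt; is a hypothetical helper built on the standard library's &lt;code&gt;parse_qs&lt;/code&gt; and &lt;code&gt;json.dumps&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;import json
from urllib.parse import parse_qs


def render_json(data, query_string=''):
    """Serialise data, switching to browsable output when ?pretty=1 is given."""
    params = parse_qs(query_string)
    if params.get('pretty') == ['1']:
        # text/plain displays inline in browsers; indentation aids reading.
        body = json.dumps(data, indent=2, sort_keys=True)
        return 'text/plain; charset=utf-8', body
    return 'application/json; charset=utf-8', json.dumps(data)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Unlike Accept negotiation, the format is then visible in the URL itself, so anyone you send the link to gets back exactly the same bytes.&lt;/p&gt;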


&lt;!-- &lt;p&gt;&lt;a href="http://simonwillison.net/2009/Feb/6/json/#comments"&gt;&lt;img src="http://simonwillison.net/2009/Feb/6/json/badge.png" alt="Number of comments"&gt;&lt;/a&gt;&lt;/p&gt; --&gt;
</summary><category term="accept" /><category term="apis" /><category term="contenttypes" /><category term="http" /><category term="json" /><category term="pragmatism" /><category term="purity" /></entry></feed>
