Simon Willison’s Weblog

Subscribe
Atom feed for databases

102 items tagged “databases”

2010

What are people’s experiences using Memcached?

That it’s so obviously a good idea (and works so well) that you’d be crazy not to use it. As far as I’m concerned, it’s part of the default stack for any web application.

[... 46 words]

Where can I find a database of the cities in the United States, their populations, and square miles?

On Freebase: http://www.freebase.com/view/use...—if you sign up for a Freebase account you can further filter this report to include areas.

[... 46 words]

A Gentle Introduction to CouchDB for Relational Practitioners. By “High Performance MySQL” author Baron Schwartz—a smart, concise overview that touches pretty much everything that’s interesting about CouchDB.

# 22nd September 2010, 9:51 pm / couchdb, databases, recovered

How do Solr, Lucene, Sphinx and Searchify compare?

Lucene is a Java library for creating and searching through a full text index. If you want to make use of it, you’ll need to write your own Java code that integrates with it.

[... 109 words]

What is the largest production deployment of CouchDB for online use?

The BBC have a pretty big CouchDB cluster, which they use mostly as a replicated key-value store. It’s used by their new identity platform which includes customisation features for iPlayer.

[... 47 words]

PostgreSQL 9.0 Beta 1 Now Available. With asynchronous streaming replication.

# 5th May 2010, 2:36 pm / databases, postgresql, replication, recovered

grant XXX on * ? (via) PostgreSQL doesn’t have a way to say “this user is allowed to select/update/etc on all tables in database X”. That kind of sucks. UPDATE: This is fixed in PostgreSQL 9, see the comments.

# 16th March 2010, 6:26 pm / grant, permissions, databases, postgresql, sql

Johnny Cache. Clever twist on ORM-level caching for Django. Johnny Cache (great name) monkey-patches Django’s QuerySet classes and caches the result of every single SELECT query in memcached with an infinite expiry time. The cache key includes a “generation” ID for each dependent database table, and the generation is changed every single time a table is updated. For apps with infrequent writes, this strategy should work really well—but if a popular table is being updated constantly the cache will be all but useless. Impressively, the system is transaction-aware—cache entries created during a transaction are held in local memory and only pushed to memcached should the transaction complete successfully.

# 28th February 2010, 10:55 pm / memcached, django, databases, caching, orm, performance, python, ormcaching

Redis Virtual Memory: the story and the code. Fascinating overview of the virtual memory feature coming to Redis 2.0, which will remove the requirement that all Redis data fit in RAM. Keys still stay in RAM, but rarely accessed values will be swapped to disk. 16 GB of RAM will be enough to hold 100 million keys, each with a value as large as you like.

# 9th February 2010, 3:59 pm / redis, virtualmemory, keyvaluestores, databases, salvatore-sanfilippo

FleetDB (via) Yet Another Key-Value Store: Schema-free, JSON protocol, everything cached in RAM, append-only log for durability, multi-record transactions... but what’s really interesting about this one is that it’s written in Clojure and takes full advantage of that language’s concurrency primitives. The prefix operators used by the select API hint at its Lisp heritage.

# 5th January 2010, 11:21 am / nosql, clojure, lisp, keyvaluestore, fleetdb, databases

2009

Django | Multiple Databases. Russell just checked in the final patch developed from Alex Gaynor’s Summer of Code project to add multiple database support to Django. I’d link to the 21,000 line changeset but it crashed our Trac, so here’s the documentation instead.

# 22nd December 2009, 5:22 pm / django, multidb, russell-keith-magee, alex-gaynor, python, databases, scaling

Another leak, the worst so far (via) “Arweena, a spokes-elf for Santa Claus, admitted a few hours ago that the database posted at WikiLeaks yesterday is indeed the comprehensive 2009 list of which kids have been naughty, and which were nice.” The first comment is great too.

# 22nd December 2009, 10:42 am / wikileaks, security, databases, leaks, christmas, funny

PostgreSQL 8.5 alpha 2 is out. “P.S. If you’re wondering about Hot Standby and Synchronous Replication, they’re still under heavy development and still (at this point) expected to be in 8.5.”—Hot Standby is PostgreSQL-speak for MySQL-style master/slave replication for scaling your reads.

# 28th October 2009, 9:02 am / scaling, postgresql, replication, hotstandby, databases

MySQL Connector/Python. A pure Python implementation of the MySQL client/server protocol, meaning you can talk to a MySQL server from Python without needing to first install the MySQL client libraries (which often requires compiling from source).

# 2nd October 2009, 2:16 pm / mysql, python, databases, django

Why we migrated from MySQL to MongoDB. Includes some useful information on MongoDB’s limitations—for example, running many different collections can waste disk space and repairing large datasets or bulk deleting many rows can block and lock the database for the duration of the operation.

# 27th July 2009, 10:49 am / mongodb, databases, mysql, documentstores

Keyspace. Yet Another Key-Value Store—this one focuses on high availability, with one server in the cluster serving as master (and handling all writes), and the paxos algorithm handling replication and ensuring a new master can be elected should the existing master become unavailable. Clients can chose to make dirty reads against replicated servers or clean reads by talking directly to the master. Underlying storage is BerkeleyDB, and the authors claim 100,000 writes/second. Released under the AGPL.

# 16th July 2009, 10:30 am / keyvaluepairs, keyspace, databases, agpl, berkeleydb, scaling, replication, paxos

Using Mongo for Real-Time Analytics. MongoDB supports an “upsert” query, which when combined with the $inc operator can cause counter fields to be incremented if they exist and created otherwise. This makes it a great fit for real-time analytics applications (one increment per page view), something that regular relational databases aren’t particularly good at.

# 30th June 2009, 7:28 pm / mongodb, increment, counters, databases, upsert

PostgreSQL Development Priorities. The top two for 8.4 are “Simple built-in replication” and “Upgrade-in-place”, Josh Berkus is seeking feedback on priorities for future work on 8.5.

# 28th May 2009, 8:08 pm / postgresql, replication, josh-berkus, databases, open-source

ericflo’s django-tokyo-sessions. A Django sessions backend using Tokyo Cabinet, via Tokyo Tyrant and the PyTyrant library. A fast key/value store is a much better solution for sessions than a relational database.

# 7th May 2009, 7:30 am / django, sessions, keyvaluestores, tokyocabinet, tokyotyrant, pytyrant, databases, eric-florenzano

Drop ACID and think about data. I’ve been very impressed with the quality and speed with which the PyCon 2009 videos have been published. Here’s Bob Ippolito on distributed databases and key/value stores.

# 17th April 2009, 5:13 pm / bobippolito, python, acid, databases, data, pycon, pycon2009

Introducing Digg’s IDDB Infrastructure. IDDB is Digg’s new infrastructure component for sharding data across multiple databases, with support for both MySQL and memcachedb. “The DiggBar and URL minifying service is powered by a 16 machine IDDB cluster, which includes 8 write masters in the index and 8 MySQL storage nodes.”

# 3rd April 2009, 8:42 pm / joe-stump, databases, digg, iddb, memcachedb, mysql, scaling, sharding

Southerly Breezes. Andrew Godwin is slowly assimilating the best ideas from other Django migration systems in to South—the latest additions include ORM Freezing from Migratory and automatic change detection. Exciting stuff.

# 15th March 2009, 1:17 pm / andrew-godwin, django, south, migrations, orm, databases

[Drizzle] won’t be a get-out-of-jail-free card for very write-heavy applications but I bet it will do wonders for heavily replicated, heavily federated, read-heavy architectures (you know, normal stuff).

Richard Crowley

# 8th March 2009, 6:05 pm / richard-crowley, drizzle, mysql, databases, replication

What happened to Hot Standby? Hot Standby (the ability to have read-only replication slaves) has been dropped from PostgreSQL 8.4 and is now scheduled for 8.5. “Making hard decisions to postpone features which aren’t quite ready is how PostgreSQL makes sure that our DBMS is ”bulletproof“ and that we release close to on-time every year”.

# 8th March 2009, 9:28 am / postgresql, replication, hotstandby, josh-berkus, scaling, databases

Database Sharding at Netlog, with MySQL and PHP. Detailed MySQL sharding case study from Netlog, who serve five billion page requests a month using thousands of shards across more than 80 database servers.

# 2nd March 2009, 10:22 am / mysql, sharding, scaling, netlog, php, databases

How FriendFeed uses MySQL to store schema-less data. The pain of altering/ adding indexes to tables with 250 million rows was killing their ability to try out new features, so they’ve moved to storing pickled Python objects and manually creating the indexes they need as denormalised two column tables. These can be created and dropped much more easily, and are continually populated by an off-line index building process.

# 27th February 2009, 2:33 pm / mysql, friendfeed, databases, bret-taylor, scaling, sharding, python

DB2 support for Django is coming. From IBM, under the Apache 2.0 License. I’m not sure if this makes it hard to bundle it with the rest of Django, which uses the BSD license.

# 18th February 2009, 10:58 pm / bsd, open-source, licenses, ibm, db2, django, python, databases, orm, antonio-cangiano

Tokyo Tyrant Tutorial. Buried at the bottom of the Tokyo Tyrant protocol documentation, this is the best resource I’ve seen yet for getting up and running with the database server (including setting up replication).

# 14th February 2009, 11:29 am / replication, tokyotyrant, tokyocabinet, databases, keyvaluepairs

Tokyo Cabinet: Beyond Key-Value Store. Useful overview of Yet Another Scalable Key Value Store. Interesting points: multiple backends (hash table, B-Tree, in memory, on disk), a “table” engine which enables more advanced queries, a network server that supports HTTP, memcached or its own binary protocol and the ability to extend the engine with Lua scripts.

# 14th February 2009, 11:17 am / keyvaluepairs, tokyocabinet, http, memcached, lua, hash, databases

Rules of Database App Aging. Peter Harkins: All fields become optional, all relationships become many-to-many, chatter always expands. This is why document oriented databases such as CouchDB are looking more and more attractive.

# 18th January 2009, 9:09 am / databases, peter-harkins, couchdb