Simon Willison on search

91 posts tagged “search”

2010

ElasticSearch: Your Data, Your Search. A neat example of how ElasticSearch’s schemaless indexes and native JSON support make it ridiculously easy to index different types of data and run queries across them.

# 12th February 2010, 3:22 pm / elasticsearch, java, json, schemaless, search

Elastic Search (via) Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect: basic faceted queries are already supported.

# 11th February 2010, 6:33 pm / elasticsearch, http, java, json, lucene, rest, scaling, search, sharding, solr

The Seven Deadly Sins of Solr. Useful advice on managing and deploying Solr.

# 24th January 2010, 1:30 pm / lucidimagination, search, solr

2009

Haystack 1.0 Final Released. I’ve used Haystack on a number of projects recently, and it has proved itself as a completely painless way of adding full-text search (using Solr or Whoosh—I haven’t tried the Xapian backend yet) to a Django ORM powered project in just a few minutes. Congratulations, Daniel + contributors.

# 30th November 2009, 8:07 am / daniel-lindsley, django, haystack, python, search, solr, whoosh

Large Problems in Django, Mostly Solved: Search. Eric Holscher shows how Haystack uses a number of common Django patterns (object registration, pluggable backends, QuerySet-style chaining and class-based views) to great effect in creating a powerful search application for Django. Makes me wonder if more of those patterns should be promoted to first class concepts within Django.

# 3rd November 2009, 10:42 am / classbasedviews, django, eric-holscher, haystack, patterns, python, search

So’s your facet: Faceted global search for Mozilla Thunderbird. Yes! This is the kind of innovation I’ve been hoping would show up in e-mail clients for years. Faceting is a really natural fit for e-mail.

# 4th September 2009, 10:29 am / email, faceting, mozilla, search, thunderbird

Collection: Search Patterns. Peter Morville’s enormous collection of screenshots of search engine interfaces.

# 30th July 2009, 12:35 pm / design, patterns, peter-morville, search, ui, usability

MongoDB—Capped Collections. Collections with a size limit that automatically expire older entries are interesting—useful for things like a “recent searches on this site” feature.

# 7th June 2009, 12:50 pm / cappedcollections, mongodb, search
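
To make the “recent searches” idea concrete, here is a minimal in-memory sketch of the same behaviour. It is not MongoDB code: the `RecentSearches` class and its method names are hypothetical, and it caps by number of entries for simplicity, whereas a capped collection is sized in bytes.

```python
from collections import deque


class RecentSearches:
    """In-memory stand-in for a capped collection (hypothetical helper).

    Holds at most max_entries queries; the oldest is dropped automatically
    when a new one arrives and the container is full.
    """

    def __init__(self, max_entries=5):
        # deque(maxlen=...) discards items from the opposite end once full.
        self._entries = deque(maxlen=max_entries)

    def record(self, query):
        self._entries.append(query)

    def latest(self):
        # Return the stored queries, newest first.
        return list(reversed(self._entries))


if __name__ == "__main__":
    recent = RecentSearches(max_entries=3)
    for q in ["solr", "lucene", "sphinx", "whoosh"]:
        recent.record(q)
    print(recent.latest())
```

The point of the design is that nobody has to write cleanup code: expiry is a property of the container itself.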

Haystack (via) A brand new modular search plugin for Django, by Daniel Lindsley. The interface is modelled after the Django ORM (complete with declarative classes for defining your search schema) and it ships with backends for both Solr and pure-Python Whoosh, with more on the way. Excellent documentation.

# 17th April 2009, 9:53 pm / daniel-lindsley, django, haystack, orm, python, search, solr, whoosh

Digg Search: Now With 99.987% Less Suck. Really nice implementation of faceted search, still using Lucene and Solr under the hood.

# 10th April 2009, 10:17 pm / digg, facets, full-text-search, lucene, search, solr

Sphinx 0.9.9-rc2 is out. Interesting new feature: the Sphinx search server now supports the MySQL binary protocol, so you can talk to it using a regular MySQL client library and fire off search queries using SELECT syntax and the new SphinxQL query language.

# 8th April 2009, 1:59 pm / full-text-search, mysql, search, sphinx-search, sql

Guardian + Lucene = Similar Articles + Categorisation. Alf Eaton loaded 13,000 Guardian articles tagged Science into Solr and Lucene and is using Solr’s MoreLikeThisHandler to find related articles and automatically apply Guardian tags to Nature News articles.

# 11th March 2009, 12:53 pm / alf-eaton, full-text-search, guardian, lucene, naturenews, openplatform, search, solr, tagging

How search.twitter.com uses Varnish. Includes examples of the configuration options they use.

# 2nd March 2009, 5:08 pm / caching, search, twitter, varnish

django-springsteen and Distributed Search. Will Larson’s Django search library currently just talks to Yahoo! BOSS, but is designed to be extensible for other external search services. Interestingly, it uses threads to fire off several HTTP requests in parallel from within the Django view.

# 25th February 2009, 10:28 pm / concurrency, django, djangospringsteen, http, python, search, threads, will-larson, yahooboss
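
The parallel-requests pattern can be sketched roughly as follows. This is not django-springsteen’s code: `search_all` and the stub backends are hypothetical, and it uses `concurrent.futures` from the modern Python standard library, with plain functions standing in for the HTTP calls to external search services.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(query, backends, timeout=5.0):
    """Call every backend concurrently and return results keyed by backend name.

    backends maps a name to a callable taking the query string. In a real
    view each callable would make a blocking HTTP request.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        # Submit everything first so the calls overlap, then collect.
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        return {name: f.result(timeout=timeout) for name, f in futures.items()}


if __name__ == "__main__":
    # Stub backends for illustration only.
    stubs = {
        "web": lambda q: [f"web result for {q}"],
        "news": lambda q: [f"news result for {q}"],
    }
    print(search_all("django", stubs))
```

Because the work is I/O-bound, threads are enough here: total latency is close to that of the slowest backend rather than the sum of all of them.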

Xapian performance comparison with Whoosh. Whoosh appears to be around four times slower than Xapian for indexing and empty-cache searches, but Xapian with a full cache blows Whoosh out of the water (5408 searches/second compared to 26.3). Considering how fast Xapian is, that’s still a pretty impressive result for the pure-Python Whoosh.

# 14th February 2009, 1:15 pm / full-text-search, python, richard-boulton, search, whoosh, xapian

Whoosh. A brand new, pure-Python full-text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine.

# 12th February 2009, 12:49 pm / full-text-search, lucene, open-source, python, search, whoosh

Introduction to Information Retrieval (via) This looks excellent—a modern guide to implementing search engines written by some of the engineers behind Yahoo! Search. The full text is available online, but it looks like it’s well worth investing in the dead tree edition.

# 9th February 2009, 8:54 pm / books, freebooks, search, yahoosearch

Announcing the Article Search API. The most interesting API from the NYTimes yet—search against 2.8 million articles from 1981 until today using 35 searchable fields and get back detailed metadata as well as the first paragraph of the articles themselves.

# 5th February 2009, 11:06 pm / apis, newspapers, new-york-times, search

solango. Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). It is well documented, uses a registration model clearly inspired by the Django admin (which keeps search-related metadata out of your regular models), and includes management commands for re-indexing and generating Solr schema.xml files.

# 4th February 2009, 12:22 pm / django, lucene, python, search, solr

All you ever wanted to know about writing bloom filters. This helped me understand a key use case for bloom filters: reducing the impact of the “worst case search is when there are no matching results so everything gets scanned” problem.

# 30th January 2009, 8:26 am / bloom-filters, jonathan-ellis, search
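
A minimal sketch of that use case, assuming a toy filter rather than anything from the linked article: the `BloomFilter` class, its default sizes, and the `search` and `slow_scan` names are all hypothetical. The filter can return false positives but never false negatives, so a “no” answer lets the expensive scan be skipped safely.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real one would pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting a SHA-256 digest.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def search(term, bloom, slow_scan):
    """Only run the expensive scan if the filter says the term may be present."""
    if not bloom.might_contain(term):
        return []  # definitely absent, so nothing gets scanned
    return slow_scan(term)


if __name__ == "__main__":
    bloom = BloomFilter()
    for word in ["solr", "lucene"]:
        bloom.add(word)
    print(bloom.might_contain("solr"))
    print(search("solr", bloom, lambda t: [f"doc containing {t}"]))
```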

2008

How-to: Full-text search in Google App Engine. Use search.SearchableModel instead of db.Model—it’s pretty rough at the moment which is probably why it’s still undocumented.

# 27th June 2008, 8:25 am / appengine, full-text-search, googleappengine, python, search

Google AJAX Search API: Flash and Server Side Access. Over a year after Google shut down their SOAP Search API, they’ve quietly released a JSON-based one under the guise of supporting “Flash and other non JavaScript environments”. Comes with the strange requirement that an HTTP referer be sent with every request; the API key is optional.

# 22nd April 2008, 7:16 pm / ajax, apis, google, json, search, soap, web-services

In-Depth django-sphinx Tutorial. Another neat Django extension from the guys at Curse: easy integration with the Sphinx full-text search engine.

# 5th March 2008, 12:03 am / curse, david-cramer, django, python, search, sphinx-search

pysolr. Python wrapper for Solr, the search web service wrapper for Lucene. One thing I’m not clear on: do you need to configure Solr with the fields you’ll be indexing in advance, or can Solr create new fields on the fly to match the data you send it?

# 9th January 2008, 8:50 pm / apache, lucene, pysolr, python, search, solr

2007

Opera 9.5 alpha, Kestrel, released. “With history search, Opera creates a full-text index of each and every page you visit, and when you go to the address bar, you can simply start entering words you know have been on pages you’ve visited before, and items matching your search show up.” I just tried this; it’s magic. I’m switching back to Opera from Camino.

# 16th September 2007, 8:34 pm / browsers, camino, full-text-search, history, kestrel, opera, search
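
The general idea behind a history search like this can be illustrated with a tiny inverted index. This is only a sketch of the technique, not how Opera implements it: the `HistoryIndex` class is hypothetical, tokenisation is a naive word split, and a real implementation would also handle prefix matching and ranking.

```python
import re
from collections import defaultdict


class HistoryIndex:
    """Toy inverted index mapping each word to the URLs whose text contains it."""

    def __init__(self):
        self._pages_by_word = defaultdict(set)

    def add_page(self, url, text):
        for word in re.findall(r"\w+", text.lower()):
            self._pages_by_word[word].add(url)

    def search(self, query):
        """Return the URLs of pages containing every word in the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        # .get avoids inserting empty entries into the index while searching.
        result = set(self._pages_by_word.get(words[0], set()))
        for word in words[1:]:
            result &= self._pages_by_word.get(word, set())
        return result


if __name__ == "__main__":
    index = HistoryIndex()
    index.add_page("http://example.com/a", "Full-text search with Lucene")
    index.add_page("http://example.com/b", "Search patterns in user interfaces")
    print(index.search("search lucene"))
```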

django-sphinx (via) More code from Curse Gaming; this time a really nice API for adding Sphinx full-text search to a Django model.

# 9th September 2007, 12:35 am / cursegaming, david-cramer, django, full-text-search, orm, python, search, sphinx-search

Grub. Jimmy Wales just announced at OSCON that Wikia have acquired Grub from LookSmart, and will be releasing it as open source.

# 27th July 2007, 5:24 pm / grub, jimmywales, looksmart, open-source, oscon, oscon07, search, wikia

Apache Solr 1.1. Solr is the search Web Service built on top of Lucene. The latest release introduces JSON, Python and Ruby response formats in addition to XML.

# 13th January 2007, 1:16 am / json, lucene, python, ruby, search, solr, webservice, xml
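
As a rough illustration, the response format is chosen per request with Solr’s `wt` query parameter. The helper below only builds the URL; `solr_select_url` is a hypothetical name, and the base URL in the usage example is an assumption about a local install, not something taken from the release notes.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, response_format="json"):
    """Build a Solr select URL asking for the given response writer.

    response_format is passed as the wt parameter, e.g. "json", "python",
    "ruby" or "xml".
    """
    params = urlencode({"q": query, "wt": response_format})
    return f"{base_url}/select?{params}"


if __name__ == "__main__":
    # Assumed local Solr address, for illustration only.
    print(solr_select_url("http://localhost:8983/solr", "django search"))
```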

2005

Giving away the index

My final year project is due in two weeks, and I’m going to be running on silent for most of them. I have, however, upgraded to Tiger and playing with Spotlight has given me plenty to think about.

[... 414 words]

1:16 am / 4th May 2005 / oreilly, search, security

Google cruft

New Google feature: Google Movies. Displays aggregated movie reviews (like Rotten Tomatoes), looks up local movie times based on your zip code saved in Google Local (more evidence of the fabled Google cookie), and even handles recommendations.

[... 120 words]

12:34 am / 24th February 2005 / cruft, google, search


Simon Willison’s Weblog