Simon Willison’s Weblog


84 items tagged “search”

2019

Guide To Using Reverse Image Search For Investigations (via) Detailed guide from Bellingcat’s Aric Toler on using reverse image search for investigative reporting. Surprisingly, Google Image Search isn’t the state of the art: the Russian search engine Yandex offers a much more powerful solution, mainly because it’s the largest public-facing image search engine to integrate scary levels of face recognition.

# 30th December 2019, 10:23 pm / bellingcat, journalism, search

Falsehoods Programmers Believe About Search (via) These are great. “When you find the boolean operator ‘OR’, you always know it doesn’t mean Oregon”.

# 29th May 2019, 8:09 pm / search

Discussion about Altavista on Hacker News. Fascinating thread on Hacker News where Bryant Durrell, a former director at AltaVista, provides some insider thoughts on how they lost to Google.

# 16th February 2019, 6:57 pm / search, internet-history, google, computer-history

Exploring search relevance algorithms with SQLite

SQLite isn’t just a fast, high quality embedded database: it also incorporates a powerful full-text search engine in the form of the FTS4 and FTS5 extensions. You’ve probably used these a bunch of times already: many iOS, Android and desktop applications use SQLite under the hood and use it to implement their built-in search.

[... 1,390 words]
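As a rough sketch of what FTS5 looks like from Python, here is the standard library's sqlite3 module creating a full-text table and running a ranked query. It assumes your SQLite build has FTS5 compiled in (most do), and the `docs` table and its rows are made up for illustration. Ordering by FTS5's hidden `rank` column sorts results by its default BM25 relevance score.

```python
import sqlite3

# In-memory database; assumes this SQLite build includes the FTS5 extension
conn = sqlite3.connect(":memory:")
# "docs" and its columns are hypothetical names for illustration
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("SQLite search", "SQLite includes full-text search via FTS5"),
        ("Cooking", "A recipe that mentions search only once"),
        ("Postgres", "A different database entirely"),
    ],
)

def search(term):
    # MATCH runs the full-text query; the hidden "rank" column defaults to
    # bm25(), where lower values mean better matches, so ascending order
    # puts the most relevant rows first
    rows = conn.execute(
        "SELECT title, rank FROM docs WHERE docs MATCH ? ORDER BY rank",
        (term,),
    )
    return [title for title, _ in rows]

print(search("search"))
```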

2018

Fast Autocomplete Search for Your Website (via) I wrote a tutorial for the 24 ways advent calendar on building fast autocomplete search for a website on top of Datasette and SQLite. I built the demo against 24 ways itself—I used wget to recursively fetch all 330 articles as HTML, then wrote code in a Jupyter notebook to extract the raw data from them (with BeautifulSoup) and load them into SQLite using my sqlite-utils Python library. I deployed the resulting database using Datasette, then wrote some vanilla JavaScript to implement autocomplete using fast SQL queries against the Datasette JSON API.

# 19th December 2018, 12:26 am / jupyter, 24-ways, sqlite, search, autocomplete, datasette, beautifulsoup
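One way to get fast autocomplete out of SQLite is an FTS5 prefix query: a `*` after the final token matches any word starting with it. This is a sketch under assumptions: the `articles` table and its titles are hypothetical, and the actual 24 ways demo may structure its queries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of article titles; assumes FTS5 is available
conn.execute("CREATE VIRTUAL TABLE articles USING fts5(title)")
conn.executemany(
    "INSERT INTO articles (title) VALUES (?)",
    [("Designing for Autocomplete",), ("Automating Your Workflow",), ("CSS Grid Layouts",)],
)

def autocomplete(prefix, limit=5):
    # Drop double quotes so user input can't break the FTS5 query syntax
    cleaned = prefix.replace('"', "").strip()
    if not cleaned:
        return []
    # Quote the input as an FTS5 string and mark its last token as a prefix
    query = f'"{cleaned}"*'
    rows = conn.execute(
        "SELECT title FROM articles WHERE articles MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]

print(autocomplete("auto"))
```

A browser front end would call an endpoint backed by a query like this on each keystroke, which is why it pays for the query to be a cheap index lookup.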

Datasette: Full-text search. I wrote some documentation for Datasette’s full-text search feature, which detects tables which have been configured to use the SQLite FTS module and adds a search input box and support for a _search= querystring parameter.

# 12th May 2018, 12:09 pm / datasette, search, sqlite, full-text-search
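Building a search URL against a Datasette instance only needs the `_search=` parameter described above. The host, database and table names below are placeholders; the `/database/table.json` path is Datasette's JSON endpoint for a table.

```python
from urllib.parse import quote, urlencode

def datasette_search_url(base, database, table, term):
    # base, database and table are placeholders for your own deployment;
    # _search only works on tables configured with SQLite FTS
    path = f"{base.rstrip('/')}/{quote(database)}/{quote(table)}.json"
    return f"{path}?{urlencode({'_search': term})}"

print(datasette_search_url("https://datasette.example.com", "blog", "entries", "full text"))
# https://datasette.example.com/blog/entries.json?_search=full+text
```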

Typesense (via) A new (to me) open source search engine, with a focus on being “typo-tolerant” and offering great, fast autocomplete, which is incredibly important now that most searches take place using a mobile phone keyboard. Similar to Elasticsearch or Solr in that it runs as an HTTP server that you send JSON to via POST and GET, and it offers read-only replicas for scaling and high availability. And since it’s 2018, if you have Docker running (I use Docker for Mac) you can start up a test instance with a one-line shell command.

# 6th April 2018, 5:07 pm / open-source, search, autocomplete
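Typesense's own typo-tolerance implementation isn't described here, so this is a swapped-in illustration of the general idea: treat a partially typed query as matching a word if it is within one edit (Levenshtein distance) of that word's prefix. The word list is made up.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

def typo_tolerant_complete(query, words, max_typos=1):
    q = query.lower()
    results = []
    for word in words:
        w = word.lower()
        # Compare against prefixes slightly shorter or longer than the query,
        # so dropped or extra letters are tolerated as well as wrong ones
        lengths = range(max(0, len(q) - max_typos), len(q) + max_typos + 1)
        if min(edit_distance(q, w[:n]) for n in lengths) <= max_typos:
            results.append(word)
    return results

words = ["search", "seattle", "sqlite", "solr"]
print(typo_tolerant_complete("seach", words))  # ['search']
```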

2017

New in Datasette: filters, foreign keys and search

I’ve released Datasette 0.13 with a number of exciting new features (Datasette previously).

[... 1,143 words]

Implementing faceted search with Django and PostgreSQL


I’ve added a faceted search engine to this blog, powered by PostgreSQL. It supports regular text search (proper search, not just SQL “like” queries), filter by tag, filter by date, filter by content type (entries vs blogmarks vs quotations) and any combination of the above. Some example searches:

[... 3,103 words]
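The facet counts behind a page like this boil down to one GROUP BY per facet, restricted to the items matching the current filters. The post uses PostgreSQL; to keep this sketch self-contained it uses SQLite from Python's standard library instead, and the two-table schema is hypothetical.

```python
import sqlite3

# Hypothetical schema: items with a content type and year, plus a tags table
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT, year INTEGER);
CREATE TABLE tags (item_id INTEGER, tag TEXT);
INSERT INTO items VALUES (1, 'entry', 2017), (2, 'blogmark', 2017), (3, 'blogmark', 2018);
INSERT INTO tags VALUES (1, 'search'), (1, 'postgresql'), (2, 'search'), (3, 'django');
""")

def facets(selected_tag=None):
    """Facet counts by type, tag and year for items matching an optional tag filter."""
    if selected_tag:
        matching, params = "SELECT item_id FROM tags WHERE tag = ?", (selected_tag,)
    else:
        matching, params = "SELECT id FROM items", ()
    # Each facet is a GROUP BY count over the currently matching items
    # (the subquery text is a fixed string; only the tag is user-supplied)
    def count(sql):
        return dict(conn.execute(sql.format(matching=matching), params))
    return {
        "type": count("SELECT type, COUNT(*) FROM items WHERE id IN ({matching}) GROUP BY type"),
        "tag": count("SELECT tag, COUNT(*) FROM tags WHERE item_id IN ({matching}) GROUP BY tag"),
        "year": count("SELECT year, COUNT(*) FROM items WHERE id IN ({matching}) GROUP BY year"),
    }

print(facets("search"))
```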

2013

Why is site search so bad on most websites?

It’s not so much that site search is bad, it’s that your expectations have been raised enormously high by the incredible quality of search provided by search engines like Google.

[... 125 words]

2012

Is there a place or portal where I can search for hashtags which track possible upcoming events or topics?

Our site http://lanyrd.com/ includes hashtags for thousands of upcoming conferences and professional events.

[... 39 words]

What are the best events search engines?

Since I co-founded one, I’m certainly not qualified to express an opinion on which ones are best, but here are a few of my favourites:

[... 233 words]

What kind of publicly available search software is able to be purchased or used freely as part of a website, and how good is it?

There are plenty of good open source options—Solr is currently my favourite. It’s extremely powerful but you do need to do some programming on top of it—I use Django and Haystack to build the search UI on most of my projects.

[... 115 words]

2011

Why does Wolfram|Alpha present all search results as pictures rather than text?

Wolfram Alpha is essentially a web interface to Mathematica (plus a huge corpus of structured data). Mathematica has been around for decades, and has an extremely sophisticated visualisation engine (try typing “sin(x)/cos(x)” into Wolfram Alpha and see what happens). It’s also very good at rendering mathematical formulae that would be very hard to represent in plain HTML (without using MathML, which isn’t supported by IE).

[... 137 words]

Twitter API: What is the best data storage mechanism and client library for analysing tweets using Python?

It depends on how much data you intend to collect, and how you intend to then share that data.

[... 182 words]

elasticsearch: Percolator. Another fascinating elasticsearch feature: Percolator lets you register searches with your elasticsearch cluster, then pass in a document and have the matching query IDs returned. It’s an upside down search engine. I’m sure there are some very neat things you could build with this, I just haven’t figured out what they are yet.

# 8th February 2011, 11:16 pm / elasticsearch, search, recovered
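The inversion is easy to see in a toy version: queries are stored up front, and each incoming document is checked against all of them. This is not how Elasticsearch implements percolation; here each query is just a set of required words, whereas the real feature works with full search queries.

```python
import re

class Percolator:
    """Toy "upside down" search: register queries, then match documents against them."""

    def __init__(self):
        self.queries = {}  # query ID -> set of required lowercase words

    def register(self, query_id, query):
        self.queries[query_id] = set(query.lower().split())

    def percolate(self, document):
        words = set(re.findall(r"\w+", document.lower()))
        # A query matches when every one of its words appears in the document
        return sorted(qid for qid, terms in self.queries.items() if terms <= words)

p = Percolator()
p.register("q1", "solr lucene")
p.register("q2", "redis")
p.register("q3", "lucene")
print(p.percolate("Solr is built on Lucene"))  # ['q1', 'q3']
```

Alerting is one obvious use: register a user's saved search once, then percolate each new document as it arrives.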

2010

Indexing JSON in Solr 3.1. The next release of Solr will support indexing documents provided as JSON—Solr currently requires incoming documents to be formatted as XML.

# 10th December 2010, 9:46 am / json, search, solr, xml, recovered
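To illustrate the difference, here is the same document serialized both ways. The XML follows Solr's `<add><doc><field name="...">` update message format; the JSON shown is the simplest form, a plain array of documents. The field names are made up, and no request is actually sent to Solr.

```python
import json
import xml.etree.ElementTree as ET

def to_solr_xml(docs):
    # <add><doc><field name="...">value</field>...</doc>...</add>
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def to_solr_json(docs):
    # The JSON equivalent: just an array of flat documents
    return json.dumps(docs)

docs = [{"id": "1", "title": "Indexing JSON in Solr"}]
print(to_solr_xml(docs))
print(to_solr_json(docs))
```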

Who are major competitors to Solr?

ElasticSearch is a really interesting one: it’s the same underlying search library (Lucene) and the same integration model (an HTTP interface) but takes quite a different approach. It hasn’t been around for long, but it looks very impressive: http://www.elasticsearch.com/

[... 95 words]

How do Solr, Lucene, Sphinx and Searchify compare?

Lucene is a Java library for creating and searching through a full text index. If you want to make use of it, you’ll need to write your own Java code that integrates with it.

[... 109 words]
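The core data structure behind a full-text index like Lucene's is an inverted index, which maps each term to the documents containing it. A toy Python version (nothing like Lucene's actual on-disk format or Java API) shows the idea, with AND queries answered by intersecting posting sets:

```python
import re
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document IDs

    def add(self, doc_id, text):
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: intersect the posting sets of every query term
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

idx = InvertedIndex()
idx.add(1, "Lucene is a Java library")
idx.add(2, "Solr wraps Lucene in an HTTP server")
print(idx.search("lucene http"))  # {2}
```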

Which major companies are using Solr for search?

The Guardian newspaper uses Solr for its Open Platform Content API. http://www.guardian.co.uk/open-p...

[... 27 words]

[UPDATE] Spatial Search in Apache Lucene and Solr. Spatial search is finally coming (back) to Solr: trunk now supports sorting and boosting by distance.

# 20th July 2010, 6:28 pm / lucene, search, solr, spatialsearch, recovered
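Sorting by distance needs a distance function between two latitude/longitude points. This sketch uses the haversine great-circle formula; it illustrates the idea rather than describing how Solr computes distances internally, and the coordinates are approximate.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def sort_by_distance(places, lat, lon):
    """Return places ordered nearest-first from the point (lat, lon)."""
    return sorted(places, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))

places = [
    {"name": "New York", "lat": 40.7128, "lon": -74.0060},
    {"name": "Paris", "lat": 48.8566, "lon": 2.3522},
    {"name": "London", "lat": 51.5074, "lon": -0.1278},
]
# Sorted from Brighton
print([p["name"] for p in sort_by_distance(places, 50.8225, -0.1372)])
# ['London', 'Paris', 'New York']
```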

A fast, fuzzy, full-text index using Redis. Interesting twist on building a reverse index using Redis sets: this one indexes only the metaphones of the words, resulting in a phonetic fuzzy search.

# 5th May 2010, 5:51 pm / fuzzy, metaphone, redis, search, recovered, full-text-search
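The same trick can be sketched in Python, with a dict of sets standing in for the Redis sets. One substitution to flag: to keep the example short it uses the simpler Soundex phonetic code rather than Metaphone. Intersecting the sets mirrors what Redis's SINTER command would do.

```python
import re
from collections import defaultdict

# Soundex digit for each consonant; vowels, y, h and w get no digit
CODES = {ch: digit
         for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                "4": "l", "5": "mn", "6": "r"}.items()
         for ch in letters}

def soundex(word):
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result, prev = word[0].upper(), CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w don't separate letters sharing a digit
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated digit after it counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")

index = defaultdict(set)  # phonetic key -> document IDs (stand-in for Redis sets)

def add_document(doc_id, text):
    for word in re.findall(r"[A-Za-z]+", text):
        index[soundex(word)].add(doc_id)

def fuzzy_search(query):
    keys = [soundex(w) for w in re.findall(r"[A-Za-z]+", query)]
    if not keys:
        return set()
    # Intersect the per-key sets, as SINTER would in Redis
    result = set(index.get(keys[0], set()))
    for key in keys[1:]:
        result &= index.get(key, set())
    return result

add_document(1, "John Smith")
add_document(2, "Jane Doe")
print(fuzzy_search("Jon Smyth"))  # {1}
```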

Search Engine Time Machine. Detailed explanation of how ElasticSearch provides high availability, through clever sharding and replication strategies and configurable gateways for long-term persistent storage.

# 17th February 2010, 10:32 pm / elasticsearch, highavailability, scaling, search
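The basic idea behind this kind of sharding is that each document's shard is derived from a hash of its routing key (the document ID by default) modulo the number of primary shards. This sketch uses CRC32 as a stand-in hash (Elasticsearch uses its own), and `place_replicas` is a hypothetical round-robin scheme whose only goal is keeping copies of a shard on different nodes.

```python
import zlib

def shard_for(doc_id, num_primary_shards):
    # Stable hash of the routing key, modulo the number of primary shards.
    # zlib.crc32 is an assumed stand-in for the real hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def place_replicas(shard, num_nodes, num_replicas):
    # Hypothetical placement: primary on one node, replicas on the following
    # nodes, so no node ever holds two copies of the same shard
    if num_replicas >= num_nodes:
        raise ValueError("need more nodes than replicas to keep copies apart")
    return [(shard + i) % num_nodes for i in range(num_replicas + 1)]

shard = shard_for("doc-42", 5)
print(shard, place_replicas(shard, 3, 1))
```

Because the shard number depends on the shard count, changing the number of primary shards would move most documents, which is why systems using this scheme generally fix that number when an index is created.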

ElasticSearch: Your Data, Your Search. A neat example of how ElasticSearch’s schemaless indexes and native JSON support make it ridiculously easy to index different types of data and run queries across them.

# 12th February 2010, 3:22 pm / elasticsearch, java, search, schemaless, json

Elastic Search (via) Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect, basic faceted queries are already supported.

# 11th February 2010, 6:33 pm / search, scaling, rest, lucene, java, elasticsearch, json, http, sharding, solr

The Seven Deadly Sins of Solr. Useful advice on managing and deploying Solr.

# 24th January 2010, 1:30 pm / solr, lucidimagination, search

2009

Haystack 1.0 Final Released. I’ve used Haystack on a number of projects recently, and it has proved itself as a completely painless way of adding full-text search (using Solr or Whoosh—I haven’t tried the Xapian backend yet) to a Django ORM powered project in just a few minutes. Congratulations, Daniel + contributors.

# 30th November 2009, 8:07 am / django, haystack, daniel-lindsley, search, solr, whoosh, python

Large Problems in Django, Mostly Solved: Search. Eric Holscher shows how Haystack uses a number of common Django patterns (object registration, pluggable backends, QuerySet-style chaining and class-based views) to great effect in creating a powerful search application for Django. Makes me wonder if more of those patterns should be promoted to first class concepts within Django.

# 3rd November 2009, 10:42 am / django, eric-holscher, search, haystack, patterns, classbasedviews, python
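QuerySet-style chaining is the easiest of those patterns to show in isolation: every call returns a new, narrower query object and leaves the original untouched. This is a hypothetical minimal class, not Haystack's actual SearchQuerySet, and it filters an in-memory list rather than calling a search backend.

```python
class SearchQuery:
    """Hypothetical chainable query: each method returns a refined copy."""

    def __init__(self, filters=(), order=None):
        self._filters = tuple(filters)
        self._order = order

    def filter(self, **kwargs):
        # Return a new query; self is never mutated, so partial queries can be reused
        return SearchQuery(self._filters + tuple(sorted(kwargs.items())), self._order)

    def order_by(self, field):
        return SearchQuery(self._filters, field)

    def run(self, docs):
        # Stand-in for a pluggable backend: evaluate against a plain list of dicts
        results = [d for d in docs if all(d.get(k) == v for k, v in self._filters)]
        if self._order:
            results.sort(key=lambda d: d[self._order])
        return results

docs = [
    {"title": "a", "type": "entry", "year": 2019},
    {"title": "b", "type": "blogmark", "year": 2018},
    {"title": "c", "type": "entry", "year": 2017},
]
base = SearchQuery()
entries = base.filter(type="entry").order_by("year")
print([d["title"] for d in entries.run(docs)])  # ['c', 'a']
```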

So’s your facet: Faceted global search for Mozilla Thunderbird. Yes! This is the kind of innovation I’ve been hoping would show up in e-mail clients for years. Faceting is a really natural fit for e-mail.

# 4th September 2009, 10:29 am / faceting, search, email, thunderbird

Collection: Search Patterns. Peter Morville’s enormous collection of screenshots of search engine interfaces.

# 30th July 2009, 12:35 pm / peter-morville, search, ui, patterns, design, usability