How can you build a search engine for a website built in PHP/MySQL?
11th February 2012
My answer to How can you build a search engine for a website built in PHP/MySQL? on Quora
There are a bunch of options.
The easiest to implement is to build search on top of MySQL LIKE queries—performance will be pretty terrible (since every search will require a full table scan) but provided your tables only have a few thousand records on them and your site doesn’t have to cope with more than a dozen or so hits a second it should work fine.
Next easiest: use MySQL’s built-in full text indexing feature. It’s not particularly good, and it requires you to use MyISAM tables (InnoDB is much more reliable, but doesn’t support full text indexing)—but it will do the job. You could always keep your main site data in InnoDB and denormalise in to a MyISAM table just for search—or you could use the trick Flickr used to use, which is to set up MySQL replication and run MyISAM on one of the slaves purely to support fulltext search.
Past that, you’re looking at adding another component to the stack. Sphinx can integrate directly with MySQL and lets you run SQL-style queries against a proper full text index. Personally I’m a big fan of Solr, which runs as a separate (Java) server and requires you to index documents over HTTP. The great thing about Solr is that you can talk to it from any language that has an HTTP client library.
The last option is to go for a hosted solution. Google Custom Search is free, but not particularly flexible. IndexTank was a good option here but they were acquired by LinkedIn and are shutting down the hosted service—they’ve since open sourced their software and other companies such as http://www.searchify.com/ are starting to offer it as a hosted solution.
More recent articles
- Slop is the new name for unwanted AI-generated content - 8th May 2024
- Weeknotes: more datasette-secrets, plus a mystery video project - 7th May 2024
- Weeknotes: Llama 3, AI for Data Journalism, llm-evals and datasette-secrets - 23rd April 2024
- Options for accessing Llama 3 from the terminal using LLM - 22nd April 2024
- AI for Data Journalism: demonstrating what we can do with this stuff right now - 17th April 2024
- Three major LLM releases in 24 hours (plus weeknotes) - 10th April 2024
- Building files-to-prompt entirely using Claude 3 Opus - 8th April 2024
- Running OCR against PDFs and images directly in your browser - 30th March 2024
- llm cmd undo last git commit - a new plugin for LLM - 26th March 2024
- Building and testing C extensions for SQLite with ChatGPT Code Interpreter - 23rd March 2024