Any source available to download sample data (in 10+ GB) for testing?
15th October 2012
My answer to Any source available to download sample data (in 10+ GB) for testing? on Quora
Wikipedia has some pretty interesting dumps, in both XML and SQL format: http://meta.wikimedia.org/wiki/I...
It’s pretty easy to generate 10GB of random data for testing though, which may be a better option as you could better approximate the kind of data your application will be dealing with. There’s a neat Ruby module for doing this called Faker (itself a port of the Perl module of the same name): http://faker.rubyforge.org/—and here’s a Python port of the Ruby one: https://github.com/threadsafelab...
More recent articles
- Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac - 12th November 2024
- Visualizing local election results with Datasette, Observable and MapLibre GL - 9th November 2024
- Project: VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5 - 7th November 2024