Simon Willison’s Weblog


Any source available to download sample data (in 10+ GB) for testing?

15th October 2012

My answer to Any source available to download sample data (in 10+ GB) for testing? on Quora

Wikipedia has some pretty interesting dumps, in both XML and SQL format: http://meta.wikimedia.org/wiki/I...

That said, it’s pretty easy to generate 10GB of random data yourself, which may be a better option because you can shape it to match the kind of data your application will actually be dealing with. There’s a neat Ruby module for doing this called Faker (itself a port of the Perl module Data::Faker): http://faker.rubyforge.org/ and here’s a Python port of the Ruby one: https://github.com/threadsafelab...
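To give a rough idea of what generating your own test data looks like, here’s a minimal sketch that uses only the Python standard library rather than any of the Faker ports. Everything in it is an assumption made for illustration: the column names, the tiny name lists, and the `write_fake_csv` helper are all hypothetical, and a real Faker library would give you far more varied and realistic values.

```python
import csv
import io
import random

# Hypothetical sample values; swap in whatever resembles your own data.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank"]
LAST_NAMES = ["Smith", "Jones", "Nguyen", "Garcia", "Patel", "Kim"]
HEADER = ["first_name", "last_name", "email", "signup_year"]


def random_row(rng):
    """Build one fake user record from the sample values above."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    email = f"{first}.{last}{rng.randint(1, 9999)}@example.com".lower()
    return [first, last, email, rng.randint(2000, 2012)]


def encode_row(row):
    """Format a single row as CSV and return it as UTF-8 bytes."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().encode("utf-8")


def write_fake_csv(out, target_bytes, seed=None):
    """Write CSV rows to the binary stream `out` until at least
    `target_bytes` bytes have been written.

    Returns a (row_count, bytes_written) tuple. Passing a seed makes
    the output reproducible between runs.
    """
    rng = random.Random(seed)
    written = out.write(encode_row(HEADER))
    rows = 0
    while written < target_bytes:
        written += out.write(encode_row(random_row(rng)))
        rows += 1
    return rows, written
```

To use it, open a file in binary mode and pass the size you want, for example `write_fake_csv(open("fake_users.csv", "wb"), 10 * 1024 ** 3)` for roughly 10GB. Formatting one row at a time keeps the sketch simple but is not fast, so for a file that large you would probably want to batch rows before writing.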

This is Any source available to download sample data (in 10+ GB) for testing? by Simon Willison, posted on 15th October 2012.
