Simon Willison’s Weblog

Subscribe
Atom feed for reddit

24 items tagged “reddit”

2024

[On Reddit] we had to look up every single comment on the page to see if you had voted on it [...]

But with a bloom filter, we could very quickly look up all the comments and get back a list of all the ones you voted on (with a couple of false positives in there). Then we could go to the cache and see if your actual vote was there (and if it was an upvote or a downvote). It was only after a failed cache hit did we have to actually go to the database.

But that bloom filter saved us from doing sometimes 1000s of cache lookups.

Jeremy Edberg

# 24th December 2024, 7:13 am / reddit, bloom-filters, scaling

[Reddit is] mostly ported over entirely to Lit now. There are a few straggling pages that we're still working on, but most of what everyday typical users see and use is now entirely Lit based. This includes both logged out and logged in experiences.

Jim Simon, Reddit

# 1st October 2024, 12:09 am / web-components, reddit, lit-html, javascript

Google is the only search engine that works on Reddit now thanks to AI deal (via) This is depressing. As of around June 25th reddit.com/robots.txt contains this:

User-agent: *
Disallow: /

Along with a link to Reddit's Public Content Policy.

Is this a direct result of Google's deal to license Reddit content for AI training, rumored at $60 million? That's not been confirmed but it looks likely, especially since accessing that robots.txt using the Google Rich Results testing tool (hence proxied via their IP) appears to return a different file, via this comment, my copy here.

# 24th July 2024, 6:29 pm / google, seo, reddit, ai, search-engines, llms

In 2006, reddit was sold to Conde Nast. It was soon obvious to many that the sale had been premature, the site was unmanaged and under-resourced under the old-media giant who simply didn't understand it and could never realize its full potential, so the founders and their allies in Y-Combinator (where reddit had been born) hatched an audacious plan to re-extract reddit from the clutches of the 100-year-old media conglomerate. [...]

Yishan Wong

# 20th February 2024, 4:23 pm / reddit, y-combinator, startups

2023

Examples of weird GPT-4 behavior for the string “ davidjl”. GPT-4, when told to repeat or otherwise process the string “ davidjl” (note the leading space character), treats it as “jndl” or “jspb” or “JDL” instead. It turns out “ davidjl” has its own single token in the tokenizer: token ID 23282, presumably dating back to the GPT-2 days.

Riley Goodside refers to these as “glitch tokens”.

This token might refer to Reddit user davidjl123 who ranks top of the league for the old /r/counting subreddit, with 163,477 posts there which presumably ended up in older training data.

# 8th June 2023, 9:29 am / riley-goodside, reddit, generative-ai, openai, gpt-4, ai, llms

2022

r/MachineLearning: What is the SOTA explanation for why deep learning works? The thing I find fascinating about this Reddit conversation is that it makes it clear that the machine learning research community has very little agreement on WHY the state of the art techniques that are being used today actually work as well as they do.

# 5th September 2022, 5:46 pm / machine-learning, reddit, ai, generative-ai

2018

The original Reddit source code, written in Lisp in 2005 (via) “If anyone’s interested, I found a hard drive in my garage with the original Reddit Lisp code from 2005. Been looking for it for years. Enjoy.”—spez

# 29th March 2018, 10:13 pm / reddit, lisp

2010

Three new features for reddit gold. Reddit’s experiments with a subscriber program are interesting to watch. 9,000 people signed up as subscribers without there being any benefit at all, and they’re now being rewarded with the ability to opt out of ads and access to computationally expensive features (like different ways of sorting their own user page) that wouldn’t scale for the entire user base.

# 20th July 2010, 5:54 pm / ads, reddit, scaling, subscriptions, recovered

reddit’s May 2010 “State of the Servers” report. An interesting Cassandra war story: Cassandra scales up, but it doesn’t scale down very well: running with just three nodes can make recovery from problems a lot more tricky.

# 18th May 2010, 6:37 pm / cassandra, nosql, reddit, recovered

The Onion Uses Django, And Why It Matters To Us. The Onion ported their main site from PHP and Drupal to Django in three months with a team of four developers, including a full migration of their archived content. Their developers answer questions about the switch in this thread on the Django sub-reddit.

# 25th March 2010, 6:43 pm / reddit, django, python, the-onion, php, drupal

Reddit is now running on Cassandra. Migrating their persistent cache over from memcacheDB to Cassandra took one developer just ten days.

# 13th March 2010, 12:14 am / cassandra, memcachedb, caching, reddit

Since we moved to EC2, the number of unique users has gone up 50%, and pageviews are up more than 100%. To support this growth, we have added 30% more ram and 50% more CPU, yet because of Amazon's constant price reductions, we are actually paying less per month now than when we started.

Jeremy from Reddit

# 7th January 2010, 10:10 pm / reddit, ec2, amazon, pricing, cloud-computing

2008

Heck, I practically invented the formula of "tell a funny story and then get all serious and show how this is amusing anecdote just goes to show that (one thing|the other) is a universal truth." And everybody is like, oh yes! how true! and they link to it with approval, and it zooms to the top of Slashdot. And six years later, a new king arises who did not know Joel, and he writes up another amusing anecdote, really, it's the same anecdote, and he uses it to prove the exact opposite, and everyone is like, oh yes! how true! and it zooms to the top of Reddit.

Joel Spolsky

# 19th November 2008, 8:41 am / joel-spolsky, reddit, slashdot, anecdotes

Low level hooks for multi-database support in Django. As discussed in this sub-thread on reddit: The internal Django Query class has a ’connection’ attribute which can be set by the constructor. This low level hook is the secret to talking to more than one database at once, but higher level APIs have not yet been defined. Jacob Kaplan-Moss: “As a matter of fact, at least a couple high-traffic Django sites are using the new hooks.”

# 3rd September 2008, 11:33 pm / reddit, django, query, multidb, python, jacob-kaplan-moss

Dissecting today’s Internet traffic spikes (via) Theo Schlossnagle on how the increasing popularity of interest aggregation services such as Digg and Reddit result in traffic spikes that dwarf the old Slashdot effect, making a the old rules of thumb for capacity planning irrelevant.

# 29th June 2008, 2:12 pm / slashdotting, reddit, digg, theoschlossnagle, capacityplanning, scaling

This is the new blog-spam. [...] 'web design company' takes the highest ranking comment from reddit, and posts it on the site that the original comment is based on. [...] Neat eh? They get to have links on a site that won't get blog-spam filtered, because the comment is 'relevant', since the comment originates from a comment thread about the site.

ator_fighting_eagle

# 20th June 2008, 6:55 pm / reddit, spam, commentspam

Reddit release their codebase. Under the same Common Public Attribution License used by Facebook for their recent source release.

# 18th June 2008, 2:32 pm / open-source, reddit, python, cpal

Django sub-reddit. Reddit are trialling the ability to create custom sub-reddits, so I put one up for Django links and discussions.

# 26th January 2008, 11:56 pm / django, reddit, python, community

2007

Techniques for safely consuming external HTTP on demand? I asked this question on programming.reddit.com yesterday and got some really insightful answers, including Joe Stump from Digg describing how Digg Images uses Danga’s Gearman worker queue.

# 15th December 2007, 12:29 pm / http, queue, workers, gearman, reddit, askreddit, joe-stump, digg, danga, scaling

An OpenID provider should catalogue the sites that a user logs into and automatically construct a homepage for them. That way, not only do the users have the convenience of having their favourite websites automatically bookmarked and readily available, but (with a little help from the consumers), they don't have to log into the individual sites at all.

Bogtha

# 13th July 2007, 7:26 am / openid, ideas, reddit

The Beauty Of The Diffie-Hellman Protocol. Some useful explanations here. Diffie-Hellman is used by OpenID to establish a shared secret between the provider and the consumer.

# 1st March 2007, 10:08 pm / openid, diffiehellman, reddit, cryptography

2006

Three steps to OpenID. Maybe explaining OpenID isn’t as hard as I thought... Jacob Kaplan-Moss nails it in three.

# 20th December 2006, 12:44 pm / openid, jacob-kaplan-moss, reddit

Never store passwords in a database! The reddit.com developers just learnt this the hard way. It might be time to change some of your passwords.

# 16th December 2006, 12:01 am / reddit, security

Why do so many reddit users hate java? The answers provide a good overview as to why Java has fallen out of favour with the alpha-hacker crowd.

# 15th December 2006, 2:20 pm / java, reddit