April 2024
147 posts: 5 entries, 59 links, 26 quotes, 57 beats
April 25, 2024
I’ve been at OpenAI for almost a year now. In that time, I’ve trained a lot of generative models. [...] It’s becoming awfully clear to me that these models are truly approximating their datasets to an incredible degree. [...] What this manifests as is – trained on the same dataset for long enough, pretty much every model with enough weights and training time converges to the same point. [...] This is a surprising observation! It implies that model behavior is not determined by architecture, hyperparameters, or optimizer choices. It’s determined by your dataset, nothing else. Everything else is a means to an end in efficiently delivery compute to approximating that dataset.
The only difference between screwing around and science is writing it down
— Alex Jason, via Adam Savage
April 26, 2024
Food Delivery Leak Unmasks Russian Security Agents. This story is from April 2022 but I realize now I never linked to it.
Yandex Food, a popular food delivery service in Russia, suffered a major data leak.
The data included an order history with names, addresses and phone numbers of people who had placed food orders through that service.
Bellingcat were able to cross-reference this leak with addresses of Russian security service buildings—including those linked to the GRU and FSB.This allowed them to identify the names and phone numbers of people working for those organizations, and then combine that information with further leaked data as part of their other investigations.
If you look closely at the screenshots in this story they may look familiar: Bellingcat were using Datasette internally as a tool for exploring this data!
If you’re auditioning for your job every day, and you’re auditioning against every other brilliant employee there, and you know that at the end of the year, 6% of you are going to get cut no matter what, and at the same time, you have access to unrivaled data on partners, sellers, and competitors, you might be tempted to look at that data to get an edge and keep your job and get to your restricted stock units.
It's very fast to build something that's 90% of a solution. The problem is that the last 10% of building something is usually the hard part which really matters, and with a black box at the center of the product, it feels much more difficult to me to nail that remaining 10%. With vibecheck, most of the time the results to my queries are great; some percentage of the time they aren't. Closing that gap with gen AI feels much more fickle to me than a normal engineering problem. It could be that I'm unfamiliar with it, but I also wonder if some classes of generative AI based products are just doomed to mediocrity as a result.
April 27, 2024
Everything Google’s Python team were responsible for. In a questionable strategic move, Google laid off the majority of their internal Python team a few days ago. Someone on Hacker News asked what the team had been responsible for, and team member zem relied with this fascinating comment providing detailed insight into how the team worked and indirectly how Python is used within Google.
I've worked out why I don't get much value out of LLMs. The hardest and most time-consuming parts of my job involve distinguishing between ideas that are correct, and ideas that are plausible-sounding but wrong. Current AI is great at the latter type of ideas, and I don't need more of those.
April 28, 2024
Zed Decoded: Rope & SumTree (via) Text editors like Zed need in-memory data structures that are optimized for handling large strings where text can be inserted or deleted at any point without needing to copy the whole string.
Ropes are a classic, widely used data structure for this.
Zed have their own implementation of ropes in Rust, but it's backed by something even more interesting: a SumTree, described here as a thread-safe, snapshot-friendly, copy-on-write B+ tree where each leaf node contains multiple items and a Summary for each Item, and internal tree nodes contain a Summary of the items in its subtree.
These summaries allow for some very fast traversal tree operations, such as turning an offset in the file into a line and row coordinate and vice-versa. The summary itself can be anything, so each application of SumTree in Zed collects different summary information.
Uses in Zed include tracking highlight regions, code folding state, git blame information, project file trees and more - over 20 different classes and counting.
Zed co-founder Nathan Sobo calls SumTree "the soul of Zed".
Also notable: this detailed article is accompanied by an hour long video with a four-way conversation between Zed maintainers providing a tour of these data structures in the Zed codebase.
April 29, 2024
How do you accidentally run for President of Iceland? (via) Anna Andersen writes about a spectacular user interface design case-study from this year's Icelandic presidential election.
Running for President requires 1,500 endorsements. This year, those endorsements can be filed online through a government website.
The page for collecting endorsements originally had two sections - one for registering to collect endorsements, and another to submit your endorsement. The login link for the first came higher on the page, and at least 11 people ended up accidentally running for President!
The creator of a model can not ensure that a model is never used to do something harmful – any more so that the developer of a web browser, calculator, or word processor could. Placing liability on the creators of general purpose tools like these mean that, in practice, such tools can not be created at all, except by big businesses with well funded legal teams.
[...] Instead of regulating the development of AI models, the focus should be on regulating their applications, particularly those that pose high risks to public safety and security. Regulate the use of AI in high-risk areas such as healthcare, criminal justice, and critical infrastructure, where the potential for harm is greatest, would ensure accountability for harmful use, whilst allowing for the continued advancement of AI technology.
My notes on gpt2-chatbot.
There's a new, unlabeled and undocumented model on the LMSYS Chatbot Arena today called gpt2-chatbot. It's been giving some impressive responses - you can prompt it directly in the Direct Chat tab by selecting it from the big model dropdown menu.
It looks like a stealth new model preview. It's giving answers that are comparable to GPT-4 Turbo and in some cases better - my own experiments lead me to think it may have more "knowledge" baked into it, as ego prompts ("Who is Simon Willison?") and questions about things like lists of speakers at DjangoCon over the years seem to hallucinate less and return more specific details than before.
The lack of transparency here is both entertaining and infuriating. Lots of people are performing a parallel distributed "vibe check" and sharing results with each other, but it's annoying that even the most basic questions (What even IS this thing? Can it do RAG? What's its context length?) remain unanswered so far.
The system prompt appears to be the following - but system prompts just influence how the model behaves, they aren't guaranteed to contain truthful information:
You are ChatGPT, a large language model trained
by OpenAI, based on the GPT-4 architecture.
Knowledge cutoff: 2023-11
Current date: 2024-04-29
Image input capabilities: Enabled
Personality: v2
My best guess is that this is a preview of some kind of OpenAI "GPT 4.5" release. I don't think it's a big enough jump in quality to be a GPT-5.
Update: LMSYS do document their policy on using anonymized model names for tests of unreleased models.
Update May 7th: The model has been confirmed as belonging to OpenAI thanks to an error message that leaked details of the underlying API platform.
# All the code is wrapped in a main function that gets called at the bottom of the file, so that a truncated partial download doesn't end up executing half a script.
April 30, 2024
Why SQLite Uses Bytecode (via) Brand new SQLite architecture documentation by D. Richard Hipp explaining the trade-offs between a bytecode based query plan and a tree of objects.
SQLite uses the bytecode approach, which provides an important characteristic that SQLite can very easily execute queries incrementally—stopping after each row, for example. This is more useful for a local library database than for a network server where the assumption is that the entire query will be executed before results are returned over the wire.
My approach to HTML web components. Some neat patterns here from Jeremy Keith, who is using Web Components extensively for progressive enhancement of existing markup.
The reactivity you get with full-on frameworks [like React and Vue] isn’t something that web components offer. But I do think web components can replace jQuery and other approaches to scripting the DOM.
Jeremy likes naming components with their element as a prefix (since all element names must contain at least one hyphen), and suggests building components under the single responsibility principle - so you can do things like <button-confirm><button-clipboard><button>....
He configures these buttons with data- attributes and has them communicate with each other using custom events.
Something I hadn't realized is that since the connectedCallback function on a custom element is fired any time that element is attached to a page you can fetch() and then insertHTML content that includes elements and know that they will initialize themselves without needing any extra logic - great for the kind of pattern encourages by systems such as HTMX.
How an empty S3 bucket can make your AWS bill explode (via) Maciej Pocwierz accidentally created an S3 bucket with a name that was already used as a placeholder value in a widely used piece of software. They saw 100 million PUT requests to their new bucket in a single day, racking up a big bill since AWS charges $5/million PUTs.
It turns out AWS charge that same amount for PUTs that result in a 403 authentication error, a policy that extends even to "requester pays" buckets!
So, if you know someone's S3 bucket name you can DDoS their AWS bill just by flooding them with meaningless unauthenticated PUT requests.
AWS support refunded Maciej's bill as an exception here, but I'd like to see them reconsider this broken policy entirely.
Update from Jeff Barr:
We agree that customers should not have to pay for unauthorized requests that they did not initiate. We’ll have more to share on exactly how we’ll help prevent these charges shortly.
Performance analysis indicates that SQLite spends very little time doing bytecode decoding and dispatch. Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records - all of which happens in compiled C code. Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements.
So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost 3% or less. That's not very much, considering the size, complexity, and portability costs involved.
We collaborate with open-source and commercial model providers to bring their unreleased models to community for preview testing.
Model providers can test their unreleased models anonymously, meaning the models' names will be anonymized. A model is considered unreleased if its weights are neither open, nor available via a public API or service.
— LMSYS