Simon Willison on andrej-karpathy

40 posts tagged “andrej-karpathy”

2024

When I first published the micrograd repo, it got some traction on GitHub but then somewhat stagnated and it didn't seem that people cared much. [...] When I made the video that built it and walked through it, it suddenly almost 100X'd the overall interest and engagement with that exact same piece of code.

[...] you might be leaving somewhere 10-100X of the potential of that exact same piece of work on the table just because you haven't made it sufficiently accessible.

— Andrej Karpathy

# 21st February 2024, 9:26 pm / andrej-karpathy, video

Let’s build the GPT Tokenizer. When Andrej Karpathy left OpenAI last week a lot of people expressed hope that he would be increasing his output of educational YouTube videos.

Here’s an in-depth 2 hour dive into how tokenizers work and how to build one from scratch, published this morning.

The section towards the end, “revisiting and explaining the quirks of LLM tokenization”, helps explain a number of different LLM weaknesses—inability to reverse strings, confusion over arithmetic and even a note on why YAML can work better than JSON when providing data to LLMs (the same data can be represented in less tokens).

# 20th February 2024, 6:02 pm / ai, andrej-karpathy, generative-ai, llms, tokenization

2023

I always struggle a bit with I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.

We direct their dreams with prompts. The prompts start the dream, and based on the LLM's hazy recollection of its training documents, most of the time the result goes someplace useful.

It's only when the dreams go into deemed factually incorrect territory that we label it a "hallucination". It looks like a bug, but it's just the LLM doing what it always does.

— Andrej Karpathy

# 9th December 2023, 6:08 am / andrej-karpathy, llms, ai, generative-ai, hallucinations

YouTube: Intro to Large Language Models. Andrej Karpathy is an outstanding educator, and this one hour video offers an excellent technical introduction to LLMs.

At 42m Andrej expands on his idea of LLMs as the center of a new style of operating system, tying together tools and and a filesystem and multimodal I/O.

There’s a comprehensive section on LLM security—jailbreaking, prompt injection, data poisoning—at the 45m mark.

I also appreciated his note on how parameter size maps to file size: Llama 70B is 140GB, because each of those 70 billion parameters is a 2 byte 16bit floating point number on disk.

# 23rd November 2023, 5:02 pm / jailbreaking, ai, andrej-karpathy, prompt-injection, generative-ai, llms

Looking at LLMs as chatbots is the same as looking at early computers as calculators. We're seeing an emergence of a whole new computing paradigm, and it is very early.

— Andrej Karpathy

# 28th September 2023, 8:50 pm / andrej-karpathy, llms, ai, generative-ai

llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers [...] TLDR at batch_size=1 (i.e. just generating a single stream of prediction on your computer), the inference is super duper memory-bound. The on-chip compute units are twiddling their thumbs while sucking model weights through a straw from DRAM. [...] A100: 1935 GB/s memory bandwidth, 1248 TOPS. MacBook M2: 100 GB/s, 7 TFLOPS. The compute is ~200X but the memory bandwidth only ~20X. So the little M2 chip that could will only be about ~20X slower than a mighty A100.

— Andrej Karpathy

# 16th August 2023, 4:13 am / andrej-karpathy, generative-ai, llama, ai, llms, llama-cpp

The most dramatic optimization to nanoGPT so far (~25% speedup) is to simply increase vocab size from 50257 to 50304 (nearest multiple of 64). This calculates added useless dimensions but goes down a different kernel path with much higher occupancy. Careful with your Powers of 2.

— Andrej Karpathy

# 4th February 2023, 12:08 am / andrej-karpathy, performance, gpt-3, generative-ai, ai, llms

nanoGPT. “The simplest, fastest repository for training/finetuning medium-sized GPTs”—by Andrej Karpathy, in about 600 lines of Python.

# 2nd January 2023, 11:27 pm / python, ai, gpt-3, andrej-karpathy, generative-ai, llms

2022

karpathy/minGPT (via) A “minimal PyTorch re-implementation” of the OpenAI GPT training and inference model, by Andrej Karpathy. It’s only a few hundred lines of code and includes extensive comments, plus notebook demos.

# 6th September 2022, 2:52 pm / machine-learning, ai, gpt-3, andrej-karpathy, generative-ai, llms

To make the analogy explicit, in Software 1.0, human-engineered source code (e.g. some .cpp files) is compiled into a binary that does useful work. In Software 2.0 most often the source code comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code, but with many details (the weights) to be filled in. The process of training the neural network compiles the dataset into the binary — the final neural network. In most practical applications today, the neural net architectures and the training systems are increasingly standardized into a commodity, so most of the active “software development” takes the form of curating, growing, massaging and cleaning labeled datasets.

— Andrej Karpathy

# 24th August 2022, 9:28 pm / machine-learning, ai, data, andrej-karpathy

«« first « previous page 2 / 2

Simon Willison’s Weblog

40 posts tagged “andrej-karpathy”

2024

2023

2022