Simon Willison’s Weblog

Subscribe
Atom feed

Research

Filters: Sorted by date

Research Monty WASM + Pyodide — Monty WASM + Pyodide explores compiling Monty—a Rust-based, sandboxed Python interpreter—into WebAssembly for seamless browser access. It provides two integration paths: a standalone WASM module accessible directly from JavaScript, and a Pyodide-compatible wheel for usage in Python-in-the-browser environments. The project enables safe, dependency-free Python code execution with features like variable injection, output capturing (including print statements), and robust error handling.
Research Building PyO3/Maturin Rust Extension Modules as WebAssembly Wheels for Pyodide — Compiling Rust-based Python extension modules (via PyO3 and maturin) into WebAssembly wheels for Pyodide involves precise coordination of toolchain versions and build flags to ensure compatibility. The process relies on maturin (≥1.0) for packaging, the Emscripten SDK (with the exact version used by Pyodide), and a Rust nightly toolchain matching Pyodide's ABI, particularly the `-Z emscripten-wasm-eh` flag and a compatible sysroot for Python 3.13 (Pyodide 0.28+).
Research just-bash: Deno JSONL Server + Python Client — Exploring the capabilities of just-bash, this project integrates the TypeScript-based bash emulator into a persistent, JSONL-over-stdio server in Deno, accessible via a robust Python client library. The solution enables sandboxed bash scripting with comprehensive built-in commands, a virtual filesystem, and optional network access, with persistent state and fine-grained request control (env, cwd, timeout) supported.
Research WASM REPL CLI Tools — WASM REPL CLI Tools enable JavaScript and Python REPLs from the command line by leveraging WebAssembly runtimes in Go, built on the wazero engine. The project supplies separate binaries for each language—one using QuickJS WASI and the other CPython WASI—offering direct code execution, interactive shells, and a JSONL mode. JSONL mode lets external applications submit code for execution while maintaining persistent state across requests, facilitating programmatic integration.
Research What we learned today about this ChatGPT “container” environment — Experiments in the ChatGPT sandbox reveal that general outbound internet access from Python and other user code (such as HTTP requests) is entirely blocked, while package managers like pip and npm are permitted to fetch dependencies using curated internal registry proxies. The container provides a privileged fetching mechanism (`container.download`) for select public URLs, which is more powerful than standard code-based networking.
Research Cloudflare Workers with Python and SQLite — Exploring the intersection of Cloudflare Workers, Python (via Pyodide), and SQLite persistence, this project demonstrates practical techniques for building serverless applications with both JavaScript and Python runtimes on the Cloudflare platform. JavaScript Workers, paired with D1 for persistent SQLite storage, handled form input, basic routing, and a page view counter.
Research DuckDB Security Constraints for Untrusted Query Execution — Evaluating DuckDB’s sandboxing features for secure untrusted query execution, this project demonstrates how to configure read-only access, restrict file and network operations, and enforce query timeouts in Python environments. Native settings like `read_only`, `enable_external_access`, and `allowed_paths` effectively limit users to preapproved data sources, while locking configuration via `lock_configuration=true` ensures that these controls cannot be altered by malicious queries.
Research String Redaction Library — Designed to detect secrets in text, the String Redaction Library leverages statistical analysis of character patterns—such as vowel/consonant ratios and digit presence—rather than relying on specific secret formats or regular expressions. It identifies highly random or non-English-like alphanumeric strings, hashes, and tokens without context awareness, making it easy to scan for hard-to-spot secrets in source code or logs.
Research whenwords Esoteric Languages — Showcasing the versatility of the whenwords time formatting specification, this project features parallel implementations in three esoteric programming languages: LOLCODE, Rockstar, and WebAssembly Text (WAT). Each version adapts the time formatting logic—such as "3 hours ago" and duration parsing—using the idiomatic constructs and limitations of its language, producing transpiled or compiled code for JavaScript, Python, or a compact WASM binary.
Research pymemchr-c: C Implementation of memchr Library — Offering a pure C reimplementation of the Rust-based pymemchr, pymemchr-c delivers high-performance byte and substring search functions to Python with extensive SIMD (SSE2/AVX2/NEON) optimizations and runtime CPU feature detection. Its unique "Packed Pair" substring search algorithm enables the C version to outperform both Python's built-in methods (up to 28x faster) and the original Rust extension (up to 1.5x faster for substring operations), all while removing the need for a Rust toolchain.
Research sqlite3-wasm Investigation Report — Seeking to enable Python's SQLite interface with WebAssembly, the project developed a `sqlite3_wasm` library—a drop-in replacement for Python's standard `sqlite3` module. By compiling SQLite 3.45.3 to WASM with wasi-sdk and wrapping the resulting binary with a Python API, the solution delivers fully functional, in-memory, WASM-powered database operations using the wasmtime runtime.
Research pymemchr — pymemchr is a Python library that provides ultra-fast byte and substring search functions by binding to the memchr Rust crate, leveraging SIMD optimizations for superior performance. Using PyO3 and Maturin for cross-language integration, pymemchr offers efficient routines for finding single bytes, searching for multiple bytes, and locating substring patterns, both forwards and backwards, with highly competitive speedup over native Python methods.
Research SQLite Time Limit Extension — Designed as a Python C extension, the SQLite Time Limit Extension introduces a function, execute_with_timeout, enabling SQL queries against a SQLite database to be terminated if they exceed a specified millisecond threshold. This is achieved using SQLite's progress handler, ensuring that long-running queries do not block application responsiveness. Usage is simple via standard import, and rigorous tests are provided with pytest to validate both normal operation and timeouts.
Research HTTP Range Requests for Wheel Metadata — Leveraging ZIP file structure and HTTP range requests, tools like uv efficiently extract wheel metadata for Python packages without downloading entire archives. By fetching just the last 16KB of the wheel (central directory and EOCD), parsing for the METADATA file offset, and then requesting exactly its byte range, uv and the accompanying Python prototype routinely reduce bandwidth usage by over 70%.
Research Vibium Python Client Investigation — Examining the Vibium browser automation project, this investigation developed a Python client library that interoperates with Vibium’s Go-powered "clicker" binary and existing Node.js tools. The Python client exposes both synchronous and asynchronous APIs, replicating advanced browser automation features such as auto-waiting, visibility checks, and custom commands (e.g., `vibium:find`, `vibium:click`) via WebDriver BiDi over WebSocket.
Research Debug: Failed Cog Code Fix — Debugging investigation into why commit 0dcfad4's fix for cog code rendering didn't work. The fix correctly used string concatenation to avoid `-->` in Python strings, but the explanatory comment itself contained the literal `-->` sequence, which closed the HTML comment early. Solution: rewrote the comment to avoid the problematic character sequence.

I had Claude Code build an experimental Redis module for executing JavaScript code in a similar way to Lua using JS.EVAL and similar functions.

I was inspired to try this by this comment from antirez about how he would have used JavaScript in place of Lua as the embedded Redis scripting language if MicroQuickJS had been available in 2010.

Research Browser URL Size Limits Investigation — Major browser engines demonstrate significant differences in how they enforce URL length limits. Chromium sets a 2 MB cap at its inter-process communication boundary, rejecting longer URLs when crossing processes. Firefox relies on user-configurable preferences, employing a 1 MB "standard" limit but permitting up to 512 MB in absolute terms, with stricter limits (2,000 characters) for history and bookmarks.
Research mquickjs Sandbox Investigation — Exploring mquickjs, a highly minimal JavaScript engine, this project rigorously evaluates its suitability as a safe sandbox for running untrusted code. Various integration approaches are implemented, including Python FFI, C extensions, subprocess invocation, and WebAssembly runtimes—each tested for startup and execution performance, security isolation, and feature compatibility.
Research Claude Code for Web Environment — Running Claude Code on the web offers developers a versatile coding sandbox on Ubuntu 24.04, leveraging a broad toolkit that includes Python 3.11, Node.js 22, Go, Rust, and more, alongside developer utilities (Git, Make) and database clients (SQLite, PostgreSQL).
Research Litestream S3 Replication Experiments — Experiments in this project evaluate Litestream’s robustness when SQLite writes occur while Litestream is stopped and later restarted, with focus on replication to S3. Both the simple restart and the scenario where the WAL is checkpointed (truncated) while Litestream is offline confirm no data loss: Litestream either streams pending WAL changes upon restart or detects a database change and uploads a new full snapshot (“generation”).
Research Can BeautifulSoup Use JustHTML as a Parser? — BeautifulSoup 4 can be integrated with JustHTML, a pure Python HTML5 parser, enabling full compliance with the HTML5 parsing algorithm according to the WHATWG specification. By implementing a custom `JustHTMLTreeBuilder`, BeautifulSoup’s parser plugin system can leverage JustHTML for parsing, allowing seamless use of BeautifulSoup’s familiar API and features—like `find_all()` and CSS selectors—while inheriting robust, standards-adherent HTML handling.
Research Streaming File Upload Prototype — Demonstrating efficient large file uploads, this prototype integrates the streaming-form-data library with a Starlette-based ASGI server to enable true streaming of multipart file data directly to disk, bypassing memory bottlenecks. It incrementally parses incoming form data and supports checksum calculation on-the-fly, handling multiple simultaneous file uploads via async workflows.
Research Automatic JavaScript API Tagging for simonw/tools — Efficiently categorizing the 155 HTML tools in simonw/tools by their JavaScript API usage, this project developed an automated pipeline combining Cheerio for HTML parsing and Acorn for JavaScript AST analysis. The solution robustly filters out false positives from comments, strings, and non-code regions, accurately tagging over 60 Web APIs and handling modern ES modules and edge script types.
Research Browser-Based Single File Bundler — Investigating the feasibility of Vite as a browser-based bundler, this project demonstrates that while Vite itself cannot operate directly in the browser due to its Node.js dependencies, client-side file bundling is achievable using alternative strategies. Three approaches were prototyped: a pure JavaScript "simple" bundler for inlining assets, an esbuild-wasm browser integration for ES module support, and full Vite bundling via StackBlitz WebContainers using vite-plugin-singlefile.
Research AST-Grep Import Rewriter — Leveraging ast-grep and custom YAML rules, the AST-Grep Import Rewriter offers a structured approach to automatically extract, analyze, and rewrite obfuscated JavaScript import statements across ES6, CommonJS, dynamic imports, and webpack bundles. By parsing source files, it generates mapping templates and applies user-defined mappings, converting unreadable module paths into meaningful names with either regex- or AST-based transformations.
Research Offline Notes Sync System — Building on offline-first principles, this notes sync system enables robust note creation and editing without active internet connectivity, using IndexedDB and service workers on the client side. It employs operation-based sync and vector clocks for fine-grained conflict detection and resolution, and features a three-way character-level merge algorithm inspired by Apple Notes.
Research Epsilon Python Wrapper — Epsilon Python Wrapper provides seamless Python bindings to Epsilon, Google's pure Go WebAssembly 2.0 runtime, enabling efficient and dependency-free WASM execution within Python projects. The wrapper exposes a simple API for module instantiation, function calls (with type safety), memory operations, and export inspection, supporting advanced features like SIMD and resource limiting.
Research Datasette-lite JavaScript Initialization: Problem Analysis & Solution Proposal — Datasette-lite faces a core limitation: HTML content injected via `innerHTML` does not execute embedded JavaScript, breaking interactive features and plugin functionality. The proposed solution introduces a standardized initialization event (`datasette_init`) triggered after each content update, allowing dependent scripts and plugins to reinitialize reliably.
Research Datasette Lite NPM Package Investigation — Converting Datasette Lite into a self-hostable NPM package enables seamless client-side data exploration using SQLite, CSV, JSON, and Parquet files directly in the browser, powered by Pyodide. The project removes analytics, adds a CLI server for local testing, and exposes all necessary static assets for easy deployment to platforms like GitHub Pages, Netlify, or Vercel.