Thursday, 19th September 2024
The web’s clipboard, and how it stores data of different types.
Alex Harri's deep dive into the Web clipboard API, the more recent alternative to the old document.execCommand() mechanism for accessing the clipboard.
There's a lot to understand here! Some of these APIs have a history dating back to Internet Explorer 4 in 1997, and there have been plenty of changes over the years to account for improved understanding of the security risks of allowing untrusted code to interact with the system clipboard.
Today, the most reliable data formats for interacting with the clipboard are the "standard" formats of text/plain, text/html and image/png.
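As a rough illustration of writing two of those standard formats at once, here is a minimal sketch using the asynchronous Clipboard API. The function names buildClipboardParts and copyTextAndHtml are made up for this example; the copy step only works in a browser, in a secure context, and usually needs to be triggered by a user gesture.

```typescript
// Hypothetical helper: build one Blob per standard MIME type.
function buildClipboardParts(text: string, html: string): Record<string, Blob> {
  return {
    "text/plain": new Blob([text], { type: "text/plain" }),
    "text/html": new Blob([html], { type: "text/html" }),
  };
}

// Browser-only: write both representations as a single clipboard item.
// Assumes a secure context (HTTPS) and clipboard-write permission.
async function copyTextAndHtml(text: string, html: string): Promise<void> {
  const item = new ClipboardItem(buildClipboardParts(text, html));
  await navigator.clipboard.write([item]);
}
```

An application pasting this content can then pick whichever representation it understands, so a plain text editor would take text/plain while a rich text editor would take text/html.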
Figma does a particularly clever trick where they share custom Figma binary data structures by encoding them as base64 in data-metadata and data-buffer attributes on a <span> element, then write the result to the clipboard as HTML. This enables copy-and-paste between the Figma web and native apps via the system clipboard.
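The general shape of that trick can be sketched as follows. This is not Figma's actual code or payload format: only the data-metadata and data-buffer attribute names come from the description above, while the helper names, the ASCII-only metadata string and the regex-based decoding are assumptions made for illustration.

```typescript
// Convert raw bytes to base64 (btoa expects a "binary string").
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Convert base64 back to raw bytes.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical encoder: wrap both payloads in attributes on a <span>.
// Assumes metadata is ASCII-only (btoa throws on non-Latin-1 input).
function encodeAsHtml(metadata: string, buffer: Uint8Array): string {
  const meta = btoa(metadata);
  const buf = bytesToBase64(buffer);
  return `<span data-metadata="${meta}" data-buffer="${buf}"></span>`;
}

// Hypothetical decoder: pull the attributes back out of pasted HTML.
// Returns null when the HTML does not carry both attributes.
function decodeFromHtml(
  html: string
): { metadata: string; buffer: Uint8Array } | null {
  const meta = /data-metadata="([^"]*)"/.exec(html);
  const buf = /data-buffer="([^"]*)"/.exec(html);
  if (!meta || !buf) return null;
  return { metadata: atob(meta[1]), buffer: base64ToBytes(buf[1]) };
}
```

Because the base64 alphabet contains no quote characters, the encoded values are safe inside double-quoted attributes, and the resulting string can be written to the clipboard as ordinary text/html.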
Moshi.
Moshi is "a speech-text foundation model and full-duplex spoken dialogue framework". It works like a text-to-text LLM, except that you input audio directly to it and it replies with its own audio.
It's fun to play around with, but it's not particularly useful in comparison to pure text models: I tried to talk to it about California Brown Pelicans and it gave me some very basic hallucinated thoughts about California Condors instead.
It's very easy to run locally, at least on a Mac (and likely on other systems too). I used uv and got the 8-bit quantized version running as a local web server using this one-liner:
uv run --with moshi_mlx python -m moshi_mlx.local_web -q 8
That downloads ~8.17G of model to a folder in ~/.cache/huggingface/hub/ - or you can use -q 4 and get a 4.81G version instead (albeit even lower quality).