89 items tagged “apis”
2024
openai/openai-openapi. Seeing as the LLM world has semi-standardized on imitating OpenAI's API format for a whole host of different tools, it's useful to note that OpenAI themselves maintain a dedicated repository for a OpenAPI YAML representation of their current API.
(I get OpenAI and OpenAPI typo-confused all the time, so openai-openapi
is a delightfully fiddly repository name.)
The openapi.yaml file itself is over 26,000 lines long, defining 76 API endpoints ("paths" in OpenAPI terminology) and 284 "schemas" for JSON that can be sent to and from those endpoints. A much more interesting view onto it is the commit history for that file, showing details of when each different API feature was released.
Browsing 26,000 lines of YAML isn't pleasant, so I got Claude to build me a rudimentary YAML expand/hide exploration tool. Here's that tool running against the OpenAI schema, loaded directly from GitHub via a CORS-enabled fetch()
call: https://tools.simonwillison.net/yaml-explorer#.eyJ1c... - the code after that fragment is a base64-encoded JSON for the current state of the tool (mostly Claude's idea).
The tool is a little buggy - the expand-all option doesn't work quite how I want - but it's useful enough for the moment.
Update: It turns out the petstore.swagger.io demo has an (as far as I can tell) undocumented ?url=
parameter which can load external YAML files, so here's openai-openapi/openapi.yaml in an OpenAPI explorer interface.
Private School Labeler on Bluesky. I am utterly delighted by this subversive use of Bluesky's labels feature, which allows you to subscribe to a custom application that then adds visible labels to profiles.
The feature was designed for moderation, but this labeler subverts it by displaying labels on accounts belonging to British public figures showing which expensive private school they went to and what the current fees are for that school.
Here's what it looks like on an account - tapping the label brings up the information about the fees:
These labels are only visible to users who have deliberately subscribed to the labeler. Unsurprisingly, some of those labeled aren't too happy about it!
In response to a comment about attending on a scholarship, the label creator said:
I'm explicit with the labeller that scholarship pupils, grant pupils, etc, are still included - because it's the later effects that are useful context - students from these schools get a leg up and a degree of privilege, which contributes eg to the overrepresentation in British media/politics
On the one hand, there are clearly opportunities for abuse here. But given the opt-in nature of the labelers, this doesn't feel hugely different to someone creating a separate webpage full of information about Bluesky profiles.
I'm intrigued by the possibilities of labelers. There's a list of others on bluesky-labelers.io, including another brilliant hack: Bookmarks, which lets you "report" a post to the labeler and then displays those reported posts in a custom feed - providing a private bookmarks feature that Bluesky itself currently lacks.
Update: @us-gov-funding.bsky.social is the inevitable labeler for US politicians showing which companies and industries are their top donors, built by Andrew Lisowski (source code here) using data sourced from OpenScrets. Here's what it looks like on this post:
Bluesky WebSocket Firehose. Very quick (10 seconds of Claude hacking) prototype of a web page that attaches to the public Bluesky WebSocket firehose and displays the results directly in your browser.
Here's the code - there's very little to it, it's basically opening a connection to wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post
and logging out the results to a <textarea readonly>
element.
Bluesky's Jetstream isn't their main atproto firehose - that's a more complicated protocol involving CBOR data and CAR files. Jetstream is a new Go proxy (source code here) that provides a subset of that firehose over WebSocket.
Jetstream was built by Bluesky developer Jaz, initially as a side-project, in response to the surge of traffic they received back in September when Brazil banned Twitter. See Jetstream: Shrinking the AT Proto Firehose by >99% for their description of the project when it first launched.
The API scene growing around Bluesky is really exciting right now. Twitter's API is so expensive it may as well not exist, and Mastodon's community have pushed back against many potential uses of the Mastodon API as incompatible with that community's value system.
Hacking on Bluesky feels reminiscent of the massive diversity of innovation we saw around Twitter back in the late 2000s and early 2010s.
Here's a much more fun Bluesky demo by Theo Sanderson: firehose3d.theo.io (source code here) which displays the firehose from that same WebSocket endpoint in the style of a Windows XP screensaver.
How streaming LLM APIs work.
New TIL. I used curl
to explore the streaming APIs provided by OpenAI, Anthropic and Google Gemini and wrote up detailed notes on what I learned.
Also includes example code for receiving streaming events in Python with HTTPX and receiving streaming events in client-side JavaScript using fetch().
Claude’s API now supports CORS requests, enabling client-side applications
Anthropic have enabled CORS support for their JSON APIs, which means it’s now possible to call the Claude LLMs directly from a user’s browser.
[... 625 words]Third, X fails to provide access to its public data to researchers in line with the conditions set out in the DSA. In particular, X prohibits eligible researchers from independently accessing its public data, such as by scraping, as stated in its terms of service. In addition, X's process to grant eligible researchers access to its application programming interface (API) appears to dissuade researchers from carrying out their research projects or leave them with no other choice than to pay disproportionally high fees.
Deactivating an API, one step at a time (via) Bruno Pedro describes a sensible approach for web API deprecation, using API keys to first block new users from using the old API, then track which existing users are depending on the old version and reaching out to them with a sunset period.
The only suggestion I'd add is to implement API brownouts - short periods of time where the deprecated API returns errors, several months before the final deprecation. This can help give users who don't read emails from you notice that they need to pay attention before their integration breaks entirely.
I've seen GitHub use this brownout technique successfully several times over the last few years - here's one example.
Jina AI Reader. Jina AI provide a number of different AI-related platform products, including an excellent family of embedding models, but one of their most instantly useful is Jina Reader, an API for turning any URL into Markdown content suitable for piping into an LLM.
Add r.jina.ai
to the front of a URL to get back Markdown of that page, for example https://r.jina.ai/https://simonwillison.net/2024/Jun/16/jina-ai-reader/ - in addition to converting the content to Markdown it also does a decent job of extracting just the content and ignoring the surrounding navigation.
The API is free but rate-limited (presumably by IP) to 20 requests per minute without an API key or 200 request per minute with a free API key, and you can pay to increase your allowance beyond that.
The Apache 2 licensed source code for the hosted service is on GitHub - it's written in TypeScript and uses Puppeteer to run Readabiliy.js and Turndown against the scraped page.
It can also handle PDFs, which have their contents extracted using PDF.js.
There's also a search feature, s.jina.ai/search+term+goes+here
, which uses the Brave Search API.
Macaroons Escalated Quickly (via) Thomas Ptacek’s follow-up on Macaroon tokens, based on a two year project to implement them at Fly.io. The way they let end users calculate new signed tokens with additional limitations applied to them (“caveats” in Macaroon terminology) is fascinating, and allows for some very creative solutions.
2023
Getting started with the Datasette Cloud API. I wrote an introduction to the Datasette Cloud API for the company blog, with a tutorial showing how to use Python and GitHub Actions to import data from the Federal Register into a table in Datasette Cloud, then configure full-text search against it.
babelmark3 (via) I found this tool today while investigating an bug in Datasette’s datasette-render-markdown plugin: it lets you run a fragment of Markdown through dozens of different Markdown libraries across multiple different languages and compare the results. Under the hood it works with a registry of API URL endpoints for different implementations, most of which are encrypted in the configuration file on GitHub because they are only intended to be used by this comparison tool.
2022
Datasette’s new JSON write API: The first alpha of Datasette 1.0
This week I published the first alpha release of Datasette 1.0, with a significant new feature: Datasette core now includes a JSON API for creating and dropping tables and inserting, updating and deleting data.
[... 2,817 words]2021
API Tokens: A Tedious Survey. Thomas Ptacek reviews different approaches to implementing secure API tokens, from simple random strings stored in a database through various categories of signed token to exotic formats like Macaroons and Biscuits, both new to me.
Macaroons carry a signed list of restrictions with them, but combine it with a mechanism where a client can add their own additional restrictions, sign the combination and pass the token on to someone else.
Biscuits are similar, but “embed Datalog programs to evaluate whether a token allows an operation”.
Notes on streaming large API responses
I started a Twitter conversation last week about API endpoints that stream large amounts of data as an alternative to APIs that return 100 results at a time and require clients to paginate through all of the pages in order to retrieve all of the data:
[... 1,692 words]Replaying logs to exercise the new API
22 days ago n1mmy pushed a change to help.vaccinate
which logged full details of inoming Netlify function API traffic to an Airtable database.
APIs from CSS without JavaScript: the datasette-css-properties plugin
I built a new Datasette plugin called datasette-css-properties. It’s very, very weird—it adds a .css
output extension to Datasette which outputs the result of a SQL query using CSS custom property format. This means you can display the results of database queries using pure CSS and HTML, no JavaScript required!
Custom Properties as State.
Fascinating thought experiment by Chris Coyier: since CSS custom properties can be defined in an external stylesheet, we can APIs that return stylesheets defining dynamically server-side generated CSS values for things like time-of-day colour schemes or even strings that can be inserted using ::after { content: var(--my-property)
.
This gave me a very eccentric idea for a Datasette plugin...
2020
GraphQL in Datasette with the new datasette-graphql plugin
This week I’ve mostly been building datasette-graphql, a plugin that adds GraphQL query support to Datasette.
[... 1,249 words]PostGraphile: Production Considerations. PostGraphile is a tool for building a GraphQL API on top of an existing PostgreSQL schema. Their “production considerations” documentation is particularly interesting because it directly addresses some of my biggest worries about GraphQL: the potential for someone to craft an expensive query that ties up server resources. PostGraphile suggests a number of techniques for avoiding this, including a statement timeout, a query allowlist, pagination caps and (in their “pro” version) a cost limit that uses a calculated cost score for the query.
2019
Building a stateless API proxy (via) This is a really clever idea. The GitHub API is infuriatingly coarsely grained with its permissions: you often end up having to create a token with way more permissions than you actually need for your project. Thea Flowers proposes running your own proxy in front of their API that adds more finely grained permissions, based on custom encrypted proxy API tokens that use JWT to encode the original API key along with the permissions you want to grant to that particular token (as a list of regular expressions matching paths on the underlying API).
2017
Datasette: instantly create and publish an API for your SQLite databases
I just shipped the first public version of datasette, a new tool for creating and publishing JSON APIs for SQLite databases.
[... 968 words]2013
Which format for API documentation programmers prefer: PDF or Web?
HTML is a better format for documentation than PDF.
[... 160 words]Does the Google Maps API let you remove details of the map such as street names to focus on pins on the map?
Yes—you can do this with map styles (which allow you to set the visibility if road labels, among other things): http://developers.google.com/map...
[... 53 words]Which is the most complete and up to date API for restaurants/nightlife?
The foursquare API is pretty great for restaurants and nightlife these days. No chance if revenue share though—how would you envisage revenue share working?
[... 44 words]Which free encyclopedias offer free APIs?
Wikipedia runs using Mediawiki, and Mediawiki has an API: http://www.mediawiki.org/wiki/API
[... 23 words]What information do you feel is most valuable when integrating a Web API (REST or SOAP)?
- A really good API explorer
- Comprehensive documentation of the response format, including what happens if certain fields are missing (empty string, null value, missing key?)
- Comprehensive documentation of the available request parameters, including allowed values
- What are the rate limits?
- What is returned if there is an error?
2012
Is it possible to embed Skype into a webpage to use as live chat support for free?
Olark offer a very neat JavaScript widget that does exactly this (it’s text-based messaging, not video or voice): http://www.olark.com/—you can try their demo at the bottom of their page.
[... 72 words]Does Amazon have a API for websites to utilize order and delivery fulfillment?
The Amazon Fulfillment Web Service used to handle this http://aws.amazon.com/fws/—but their site now says "Effective June 2012, Amazon Services will no longer support Amazon Fulfillment Web Service (Amazon FWS). All functions and services currently supported by Amazon FWS are currently available through Amazon Marketplace Web Service (Amazon MWS)." So I guess you want the Amazon Marketplace Web Service: https://developer.amazonservices...
[... 82 words]Are there any website thumbnail services that generate images in real-time?
http://url2png.com/ generates images on demand—you pass the URL directly to the service and it replies with a PNG image. The first load can take a few seconds (depending on how long it takes the originating site to serve up the assets etc) but they cache the generated images so future requests for the same URL will be served instantly.
Is there an API that returns metadata for a given URL?
I suggest taking a look at http://embed.ly/—it can take a huge range of URLs and turn them in to JSON metadata. Here’s what it can do with a Wikipedia page: http://embed.ly/docs/explore/obj...—and here’s Google Maps URL (not as useful, but still some interesting metadata extracted) http://embed.ly/docs/explore/obj...
[... 69 words]