Single sign-on against GitHub using ASGI middleware
14th July 2019
I released Datasette 0.29 last weekend, the first version of Datasette to be built on top of ASGI (discussed previously in Porting Datasette to ASGI, and Turtles all the way down).
This also marked the introduction of the new asgi_wrapper plugin hook, which allows plugins to wrap the entire Datasette application in their own piece of ASGI middleware.
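To make the idea of wrapping an application concrete, here is a minimal sketch of a piece of ASGI middleware. It assumes nothing about Datasette itself: `add_header_middleware`, `hello_app`, `call_app` and the `x-wrapped-by` header are names invented for this illustration. A plugin using the `asgi_wrapper` hook would hand Datasette a wrapping function along these lines.

```python
import asyncio


def add_header_middleware(app, name=b"x-wrapped-by", value=b"example"):
    """Return a new ASGI app that appends one header to every HTTP response."""

    async def wrapped(scope, receive, send):
        if scope["type"] != "http":
            # Pass lifespan and websocket traffic through untouched
            await app(scope, receive, send)
            return

        async def send_with_header(event):
            if event["type"] == "http.response.start":
                headers = list(event.get("headers", []))
                headers.append((name, value))
                event = dict(event, headers=headers)
            await send(event)

        await app(scope, receive, send_with_header)

    return wrapped


async def hello_app(scope, receive, send):
    # Hypothetical stand-in for the application being wrapped
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"hello"})


async def call_app(app):
    """Drive an ASGI app once and collect the events it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(event):
        sent.append(event)

    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent
```

The wrapped application never needs to know the middleware exists, which is what makes this pattern a good fit for plugins.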
To celebrate this new capability, I also released two new plugins: datasette-cors, which provides fine-grained control over CORS headers (using my asgi-cors library from a few months ago) and datasette-auth-github, the first of hopefully many authentication plugins for Datasette.
datasette-auth-github
The new plugin is best illustrated with a demo.
Visit https://datasette-auth-demo.now.sh/ and you will be redirected to GitHub and asked to approve access to your account (just your e-mail address, not repository access).
Agree, and you’ll be redirected back to the demo with a new element in the Datasette header: your GitHub username, plus a “log out” link in the navigation bar at the top of the screen.
Controlling who can access
The default behaviour of the plugin is to allow in anyone with a GitHub account. Since the primary use-case for the plugin (at least for the moment) is restricting access to view data to a trusted subset of people, the plugin lets you configure who is allowed to view your data in three different ways:
- You can restrict access to a specific list of GitHub accounts, using the allow_users configuration option.
- You can restrict access to members of one or more GitHub organizations, with allow_orgs.
- You can restrict access to members of specific teams within an organization, using allow_teams.
Datasette inherits quite a sophisticated user management system from GitHub, with very little effort required from the plugin. The user_is_allowed() method implements all three of the above options against the GitHub API in just 40 lines of code.
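The shape of that check can be sketched roughly as follows. This is not the plugin's actual code: the function name `is_allowed_sketch` and the `is_org_member` / `is_team_member` callbacks are hypothetical, standing in for the GitHub API calls, and treating the three options as alternatives (any match grants access) is an assumption.

```python
def is_allowed_sketch(
    login,
    allow_users=None,
    allow_orgs=None,
    allow_teams=None,
    is_org_member=None,
    is_team_member=None,
):
    """Decide whether a GitHub login may view the data.

    is_org_member(org, login) and is_team_member(team, login) are
    hypothetical callbacks standing in for GitHub API lookups.
    """
    # No restrictions configured: anyone with a GitHub account gets in
    if not (allow_users or allow_orgs or allow_teams):
        return True
    if allow_users and login in allow_users:
        return True
    if allow_orgs and any(is_org_member(org, login) for org in allow_orgs):
        return True
    if allow_teams and any(is_team_member(team, login) for team in allow_teams):
        return True
    return False
```

The cheap list lookup comes first so that the (assumed) API-backed checks only run when they are actually needed.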
These options can be set using the "plugins" section of the Datasette metadata.json configuration file. Here’s an example:
{
    "plugins": {
        "datasette-auth-github": {
            "client_id": {"$env": "GITHUB_CLIENT_ID"},
            "client_secret": {"$env": "GITHUB_CLIENT_SECRET"},
            "allow_users": ["simonw"]
        }
    }
}
This also illustrates a new Datasette feature: the ability to set secret plugin configuration values. {"$env": "GITHUB_CLIENT_SECRET"} means "read this configuration option from the environment variable GITHUB_CLIENT_SECRET".
Automatic log in
Like many OAuth providers, GitHub only asks the user for their approval the first time they log into a given app. Any subsequent times they are redirected to GitHub it will skip the permission screen and redirect them right back again with a token.
This means we can implement automatic log in: any time a visitor arrives who does not have a cookie we can bounce them directly to GitHub, and if they have already consented they will be logged in instantly.
This is a great user-experience—provided the user is logged into GitHub they will be treated as if they are logged into your application—but it does come with a downside: what if the user clicks the “log out” link?
For the moment I’ve implemented this using another cookie: if the user clicks “log out”, I set an asgi_auth_logout cookie marking the user as having explicitly logged out. While they have that cookie they won’t be logged in automatically, instead having to click an explicit link. See issue 41 for thoughts on how this could be further improved.
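The resulting decision for each incoming request can be sketched like this. The `login_action` function and its return values are invented for illustration, the `asgi_auth` cookie name is an assumption, and a real implementation would also verify the auth cookie's signature rather than just checking that it is present.

```python
def login_action(cookies, auth_cookie="asgi_auth", logout_cookie="asgi_auth_logout"):
    """Pick what to do with a request, given its cookies as a dict.

    Sketch only: the auth cookie name is an assumption, and signature
    verification is deliberately left out.
    """
    if cookies.get(auth_cookie):
        # Already logged in
        return "serve"
    if cookies.get(logout_cookie):
        # The user explicitly logged out: do not bounce them to GitHub
        return "show_login_link"
    # No cookie at all: automatic log in via a redirect to GitHub
    return "redirect_to_github"
```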
One pleasant side-effect of all of this is that datasette-auth-github doesn’t need to persist the user’s GitHub access_token anywhere. It uses the token during the initial authentication to check for any required organizations or teams, but then it deliberately forgets the token entirely.
OAuth access tokens are like passwords, so the most responsible thing for a piece of software to do with them is avoid storing them anywhere at all unless they are explicitly needed.
What happens when a user leaves an organization?
When building against a single sign-on provider, consideration needs to be given to offboarding: when a user is removed from a team or organization they should also lose access to their SSO applications.
This is difficult when an application sets its own authentication cookies, like datasette-auth-github does.
One solution would be to make an API call on every request to the application, to verify that the user should still have access. This would slow everything down and is likely to blow through rate limits as well, so we need a more efficient solution.
I ended up solving this with two mechanisms. Since we have automatic log in, our cookies don’t actually need to last very long—so by default the signed cookies set by the plugin last for just one hour. When a user’s cookie has expired they will be redirected back through GitHub—they probably won’t even notice the redirect, and their permissions will be re-verified as part of that flow.
But what if you need to invalidate those cookies instantly?
To cover that case, I’ve incorporated an optional cookie_version configuration option into the signatures on the cookies. If you need to invalidate every signed cookie that is out there (to lock out a compromised GitHub account owner, for example) you can do so by changing the cookie_version configuration option and restarting (or re-deploying) Datasette.
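Both mechanisms can be sketched with the standard library. The code below is an illustration under stated assumptions, not the plugin's implementation: `derive_secret`, `sign` and `unsign` are hypothetical names, the way the inputs are combined is made up, and the cookie format is invented. It does use pbkdf2_hmac with 100,000 iterations, matching the key derivation mentioned later in this post.

```python
import base64
import hashlib
import hmac
import json
import time


def derive_secret(client_id, client_secret, cookie_version="default"):
    # Assumption: how the inputs are combined here is purely illustrative
    password = "{}:{}".format(client_secret, cookie_version).encode("utf8")
    return hashlib.pbkdf2_hmac("sha256", password, client_id.encode("utf8"), 100000)


def sign(payload, secret, now=None):
    """Return a cookie value carrying the payload, a timestamp and a signature."""
    now = int(time.time()) if now is None else now
    body = base64.urlsafe_b64encode(
        json.dumps({"p": payload, "ts": now}).encode("utf8")
    )
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest().encode("ascii")
    return (body + b"." + signature).decode("ascii")


def unsign(cookie, secret, max_age=3600, now=None):
    """Return the payload, or None if the cookie is tampered with or expired."""
    now = int(time.time()) if now is None else now
    try:
        body, signature = cookie.encode("ascii").rsplit(b".", 1)
    except ValueError:
        return None
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode("ascii")
    # Constant-time comparison to avoid leaking signature bytes via timing
    if not hmac.compare_digest(signature, expected):
        return None
    data = json.loads(base64.urlsafe_b64decode(body).decode("utf8"))
    if now - data["ts"] > max_age:
        return None
    return data["p"]
```

Because cookie_version feeds into the derived secret, changing it makes every previously issued signature fail verification at once, while the timestamp check handles the routine one-hour expiry.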
These options are all described in detail in the project README.
Integration with datasette publish
The datasette publish command-line tool lets users instantly publish a SQLite database to the internet, using Heroku, Cloud Run or Zeit Now v1. I’ve added support for setting secret plugin configuration directly to that tool, which means you can publish an authentication-protected SQLite database to the internet with a shell one-liner, using --install=datasette-auth-github to install the plugin and --plugin-secret to configure it:
$ datasette publish cloudrun fixtures.db \
    --install=datasette-auth-github \
    --name datasette-auth-protected \
    --service datasette-auth-protected \
    --plugin-secret datasette-auth-github allow_users simonw \
    --plugin-secret datasette-auth-github client_id 85f6224cb2a44bbad3fa \
    --plugin-secret datasette-auth-github client_secret ...
This creates a Cloud Run instance which only allows GitHub user simonw to log in. You could instead use --plugin-secret datasette-auth-github allow_orgs my-org to allow any users from a specific GitHub organization.
Note that Cloud Run does not yet give you full control over the URL that will be assigned to your deployment. In this case it gave me https://datasette-auth-protected-j7hipcg4aq-uc.a.run.app, which works fine, but I needed to update my GitHub OAuth application’s callback URL manually to https://datasette-auth-protected-j7hipcg4aq-uc.a.run.app/-/auth-callback after deploying the application in order to get the authentication flow to work correctly.
Add GitHub authentication to any ASGI application!
datasette-auth-github isn’t just for Datasette: I deliberately wrote the plugin as ASGI middleware first, with only a very thin layer of extra code to turn it into an installable plugin.
This means that if you are building any other kind of ASGI app (or using an ASGI-compatible framework such as Starlette or Sanic) you can wrap your application directly with the middleware and get the same authentication behaviour as when the plugin is added to Datasette!
Here’s what that looks like:
from datasette_auth_github import GitHubAuth
from starlette.applications import Starlette
from starlette.responses import HTMLResponse
import uvicorn

app = Starlette(debug=True)


@app.route("/")
async def homepage(request):
    return HTMLResponse("Hello, {}".format(
        repr(request.scope["auth"])
    ))


authenticated_app = GitHubAuth(
    app,
    client_id="986f5d837b45e32ee6dd",
    client_secret="...",
    require_auth=True,
    allow_users=["simonw"],
)

if __name__ == "__main__":
    uvicorn.run(authenticated_app, host="0.0.0.0", port=8000)
The middleware adds a scope["auth"] key describing the logged-in user, which is then passed through to your application. More on this in the README.
Your security reviews needed!
Since datasette-auth-github adds authentication to Datasette, it is an extremely security-sensitive piece of code. So far I’m the only person who has looked at it: before I start widely recommending it to people I’d really like to get some more eyes on it to check for any potential security problems.
I’ve opened issue #44 encouraging security-minded developers to have a dig through the code and see if there’s anything that can be tightened up or any potential vulnerabilities that need to be addressed. Please get involved!
It’s a pretty small codebase, but here are some areas you might want to inspect:
- At a high level: is the way I’m verifying the user through the GitHub API and then storing their identity in a signed cookie the right way to go?
- The cookie signing secret is derived from the GitHub OAuth application’s client_id and client_secret (because that secret is already meant to be a secret), combined with the cookie_version option described above (implementation here). Since this is a derived secret I’m using pbkdf2_hmac with 100,000 iterations. This is by far the most cryptographically interesting part of the code, and could definitely do with some second opinions.
- The code used to sign and verify cookies is based on Django’s (thoroughly reviewed) implementation, but could benefit from a sanity check.
- I wanted this library to work on Glitch, which currently only provides Python 3.5.2. Python’s asyncio HTTP libraries such as http3 and aiohttp both require more modern Pythons, so I ended up rolling my own very simple async HTTP function which uses urllib.request inside a loop.run_in_executor thread pool. Is that approach sound? Rolling my own HTTP client in this way feels a little hairy.
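For anyone reviewing that last point, the general pattern looks something like the sketch below. It is not the plugin's code: `_blocking_get` and `http_get` are hypothetical names, and it only handles a simple GET. The blocking urllib.request call runs in the event loop's default thread pool so that it does not stall other requests.

```python
import asyncio
import urllib.request


def _blocking_get(url, headers=None, timeout=10):
    # Ordinary synchronous fetch; this is the part that runs in a worker thread
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


async def http_get(url, headers=None):
    """Fetch a URL without blocking the event loop (hypothetical helper)."""
    loop = asyncio.get_event_loop()
    # None selects the loop's default ThreadPoolExecutor
    return await loop.run_in_executor(None, _blocking_get, url, headers)
```

One trade-off worth keeping in mind: each in-flight request occupies a thread, so concurrency is bounded by the size of the executor's pool.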
This has been a really fun project so far, and I’m very excited about the potential for authenticated Datasette moving forward, not to mention the possibilities unlocked by an ASGI middleware ecosystem with strong support for wrapping any application in an authentication layer.