Saturday, 5th October 2024
Wikidata is a Giant Crosswalk File.
Drew Breunig shows how to take the 140GB Wikidata JSON export, use sed 's/,$//' to convert it to newline-delimited JSON, then use DuckDB to run queries and extract external identifiers, including a query that pulls out 500MB of latitude and longitude points.
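For anyone who would rather not reach for sed, here is a minimal Python sketch of the same trailing-comma trick. It assumes the export is one large JSON array with a single entity per line, and it also skips the opening and closing bracket lines, which the sed expression on its own does not do. The function name to_ndjson and the sample data are hypothetical, made up for illustration.

```python
from typing import Iterable, Iterator


def to_ndjson(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON document per line from an array-per-line dump.

    Assumption: each entity sits on its own line, followed by a comma,
    with "[" and "]" alone on the first and last lines.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Equivalent of sed 's/,$//': drop a single trailing comma.
        if line.endswith(","):
            line = line[:-1]
        # Extra step beyond the sed command: skip the array brackets and blanks.
        if line in ("[", "]", ""):
            continue
        yield line


# Tiny made-up sample standing in for the real 140GB file.
sample = ["[\n", '{"id": "Q1"},\n', '{"id": "Q2"}\n', "]\n"]
records = list(to_ndjson(sample))
```

Because this is a generator it can stream a file handle line by line without loading the whole export into memory.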
marimo v0.9.0 with mo.ui.chat. The latest release of the Marimo Python reactive notebook project includes a neat new feature: you can now easily embed a custom chat interface directly inside of your notebook.
Marimo co-founder Myles Scolnick posted this intriguing demo on Twitter, demonstrating a chat interface to my LLM library “in only 3 lines of code”:
import marimo as mo
import llm

model = llm.get_model()
conversation = model.conversation()
mo.ui.chat(lambda messages: conversation.prompt(messages[-1].content))
I tried that out today - here’s the result:
marimo.ui.chat() takes a function which is passed a list of Marimo chat messages (representing the current state of that widget) and returns a string - or other type of renderable object - to add as the next message in the chat. This makes it trivial to hook in any custom chat mechanism you like.
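To make that contract concrete, here is a small sketch of a handler written as a plain function. It relies only on what is described above plus the role and content attributes on each message. The Message dataclass is a hypothetical stand-in so the example runs outside a notebook, and count_handler is an invented name.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a Marimo chat message (role + content)."""
    role: str
    content: str


def count_handler(messages):
    """Return the next chat message as a string, given the full history."""
    last = messages[-1].content
    user_turns = sum(1 for m in messages if m.role == "user")
    return f"Turn {user_turns}: you said {last!r}"


# Inside a notebook this would be wired up as: mo.ui.chat(count_handler)
reply = count_handler([Message("user", "hi")])
```

Since the handler sees the whole message list on every call, anything from a canned reply to a call out to a model can sit behind the same function signature.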
Marimo also ship their own built-in chat handlers for OpenAI, Anthropic and Google Gemini which you can use like this:
mo.ui.chat(
    mo.ai.llm.anthropic(
        "claude-3-5-sonnet-20240620",
        system_message="You are a helpful assistant.",
        api_key="sk-ant-...",
    ),
    show_configuration_controls=True
)
UV with GitHub Actions to run an RSS to README project.
Jeff Triplett demonstrates a very neat pattern for using uv to run Python scripts with their dependencies inside of GitHub Actions. First, add uv to the workflow using the setup-uv action:
- uses: astral-sh/setup-uv@v3
  with:
    enable-cache: true
    cache-dependency-glob: "*.py"
This enables the caching feature, which stores uv's own cache of downloads from PyPI between runs. The cache-dependency-glob key ensures that this cache will be invalidated if any .py file in the repository is updated.
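The general idea behind a glob-keyed cache can be sketched in a few lines of Python: hash every file matching the pattern, and treat any change in that hash as a cache miss. This illustrates the concept only and is not how the setup-uv action is implemented. The cache_key function is hypothetical.

```python
import hashlib
from pathlib import Path


def cache_key(root: Path, pattern: str = "*.py") -> str:
    """Hash the names and contents of files under root that match pattern.

    Illustration only: the key changes whenever a matching file changes.
    """
    digest = hashlib.sha256()
    # Sort so the key does not depend on directory listing order.
    for path in sorted(root.glob(pattern)):
        if path.is_file():
            digest.update(path.name.encode("utf-8"))
            digest.update(path.read_bytes())
    return digest.hexdigest()
```

Editing fetch-rss.py would change the key and trigger a fresh cache, while editing a README would leave it alone.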
Now you can run Python scripts using steps that look like this:
- run: uv run fetch-rss.py
If that Python script begins with some dependency definitions (PEP 723) they will be automatically installed by uv run on the first run and reused from the cache in the future. From the start of fetch-rss.py:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "feedparser",
#     "typer",
# ]
# ]
# ///
uv will download the required Python version and cache that as well.