5 items tagged “dsl”
2024
yet-another-applied-llm-benchmark. Nicholas Carlini introduced this personal LLM benchmark suite back in February as a collection of over 100 automated tests he runs against new LLM models to evaluate their performance against the kinds of tasks he uses them for.
There are two defining features of this benchmark that make it interesting. Most importantly, I've implemented a simple dataflow domain specific language to make it easy for me (or anyone else!) to add new tests that realistically evaluate model capabilities. This DSL allows for specifying both how the question should be asked and also how the answer should be evaluated. [...] And then, directly as a result of this, I've written nearly 100 tests for different situations I've actually encountered when working with LLMs as assistants
The DSL he's using is fascinating. Here's an example:
"Write a C program that draws an american flag to stdout." >> LLMRun() >> CRun() >> \
VisionLLMRun("What flag is shown in this image?") >> \
(SubstringEvaluator("United States") | SubstringEvaluator("USA")))
This triggers an LLM to execute the prompt asking for a C program that renders an American Flag, runs that through a C compiler and interpreter (executed in a Docker container), then passes the output of that to a vision model to guess the flag and checks that it returns a string containing "United States" or "USA".
The DSL itself is implemented entirely in Python, using the __rshift__
magic method for >>
and __rrshift__
to enable strings to be piped into a custom object using "command to run" >> LLMRunNode
.
2023
Building Search DSLs with Django (via) Neat tutorial by Dan Lamanna: how to build a GitHub-style search feature—supporting modifiers like “is:open author:danlamanna”—using PyParsing and the Django ORM.
2022
Datasette table diagram using Mermaid (via) Mermaid is a DSL for generating diagrams from plain text, designed to be embedded in Markdown. GitHub just added support for Mermaid to their Markdown pipeline, which inspired me to try it out. Here’s an Observable Notebook I built which uses Mermaid to visualize the relationships between Datasette tables based on their foreign keys.
2009
Richard Jones: Something I’m working on... Python’s with statement appears to provide just enough syntactic sugar to create some really interesting DSL-style APIs—here’s a very promising example for laying out GUI applications.
2007
Metaprogramming JavaScript Presentation. Adam McCrea demonstrates some incredibly elegant DSL -style JavaScript based on chaining method calls together.