2 posts tagged “dspy”
2025
Let the LLM Write the Prompts: An Intro to DSPy in Compound Al Pipelines. I've had trouble getting my head around DSPy in the past. This half hour talk by Drew Breunig at the recent Databricks Data + AI Summit is the clearest explanation I've seen yet of the kinds of problems it can help solve.
Here's Drew's written version of the talk.
Drew works on Overture Maps, which combines Point Of Interest data from numerous providers to create a single unified POI database. This is an example of conflation, a notoriously difficult task in GIS where multiple datasets are deduped and merged together.
Drew uses an inexpensive local model, Qwen3-0.6B, to compare 70 million addresses and identity matches, for example between Place(address="3359 FOOTHILL BLVD", name="RESTAURANT LOS ARCOS")
and Place(address="3359 FOOTHILL BLVD", name="Los Arcos Taqueria"')
.
DSPy's role is to optimize the prompt used for that smaller model. Drew used GPT-4.1 and the dspy.MIPROv2 optimizer, producing a 700 token prompt that increased the score from 60.7% to 82%.
Why bother? Drew points out that having a prompt optimization pipeline makes it trivial to evaluate and switch to other models if they can score higher with a custom optimized prompt - without needing to execute that trial-and-error optimization by hand.
2024
AI Prompt Engineering Is Dead. Long live AI prompt engineering. Ignoring the clickbait in the title, this article summarizes research around the idea of using machine learning models to optimize prompts—as seen in tools such as Stanford’s DSPy and Google’s OPRO.
The article includes possibly the biggest abuse of the term “just” I have ever seen:
“But that’s where hopefully this research will come in and say ‘don’t bother.’ Just develop a scoring metric so that the system itself can tell whether one prompt is better than another, and then just let the model optimize itself.”
Developing a scoring metric to determine which prompt works better remains one of the hardest challenges in generative AI!
Imagine if we had a discipline of engineers who could reliably solve that problem—who spent their time developing such metrics and then using them to optimize their prompts. If the term “prompt engineer” hadn’t already been reduced to basically meaning “someone who types out prompts” it would be a pretty fitting term for such experts.