<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: webrtc</title><link href="http://feeds.simonwillison.net/" rel="alternate"/><link href="http://feeds.simonwillison.net/tags/webrtc.atom" rel="self"/><id>http://feeds.simonwillison.net/</id><updated>2026-05-09T01:03:58+00:00</updated><author><name>Simon Willison</name></author><entry><title>Quoting Luke Curley</title><link href="https://simonwillison.net/2026/May/9/luke-curley/#atom-tag" rel="alternate"/><published>2026-05-09T01:03:58+00:00</published><updated>2026-05-09T01:03:58+00:00</updated><id>https://simonwillison.net/2026/May/9/luke-curley/#atom-tag</id><summary type="html">
    &lt;blockquote cite="https://moq.dev/blog/webrtc-is-the-problem/"&gt;&lt;p&gt;WebRTC is designed to &lt;strong&gt;degrade and drop my prompt&lt;/strong&gt; during poor network conditions.&lt;/p&gt;
&lt;p&gt;wtf my dude&lt;/p&gt;
&lt;p&gt;WebRTC aggressively drops audio packets to keep latency low. If you’ve ever heard distorted audio on a conference call, that’s WebRTC baybee. The idea is that conference calls depend on rapid back-and-forth, so pausing to wait for audio is unacceptable.&lt;/p&gt;
&lt;p&gt;…but as a user, I would much rather wait an extra 200ms for my slow/expensive prompt to be accurate. After all, I’m paying good money to boil the ocean, and a garbage prompt means a garbage response. It’s not like LLMs are particularly responsive anyway.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;But I’m not allowed to wait&lt;/strong&gt;. It’s &lt;em&gt;impossible&lt;/em&gt; to even retransmit a WebRTC audio packet within a browser; we tried at Discord. The &lt;em&gt;implementation&lt;/em&gt; is hard-coded for real-time latency &lt;strong&gt;or else&lt;/strong&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p class="cite"&gt;&amp;mdash; &lt;a href="https://moq.dev/blog/webrtc-is-the-problem/"&gt;Luke Curley&lt;/a&gt;, OpenAI’s WebRTC Problem, in response to &lt;a href="https://openai.com/index/delivering-low-latency-voice-ai-at-scale/"&gt;How OpenAI delivers low-latency voice AI at scale&lt;/a&gt;&lt;/p&gt;

    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webrtc"&gt;webrtc&lt;/a&gt;&lt;/p&gt;



</summary><category term="openai"/><category term="webrtc"/></entry><entry><title>openai-realtime-solar-system</title><link href="https://simonwillison.net/2025/Jan/31/openai-realtime-solar-system/#atom-tag" rel="alternate"/><published>2025-01-31T19:13:25+00:00</published><updated>2025-01-31T19:13:25+00:00</updated><id>https://simonwillison.net/2025/Jan/31/openai-realtime-solar-system/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/openai/openai-realtime-solar-system"&gt;openai-realtime-solar-system&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This was my favourite demo from OpenAI DevDay &lt;a href="https://simonwillison.net/2024/Oct/1/openai-devday-2024-live-blog/#live-update-100"&gt;back in October&lt;/a&gt; - a voice-driven exploration of the solar system, developed by Katia Gil Guzman, where you could say things out loud like "show me Mars" and it would zoom around showing you different planetary bodies.&lt;/p&gt;
&lt;p&gt;&lt;img alt="Zoomed in on Mars. A log panel shows JSON on the right." src="https://static.simonwillison.net/static/2025/openai-solar-mars.jpg" /&gt;&lt;/p&gt;
&lt;p&gt;OpenAI &lt;em&gt;finally&lt;/em&gt; released the code for it, now upgraded to use the new, easier to use WebRTC API they &lt;a href="https://simonwillison.net/2024/Dec/17/openai-webrtc/"&gt;released in December&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I ran it like this, loading my OpenAI API key using &lt;a href="https://llm.datasette.io/en/stable/help.html#llm-keys-get-help"&gt;llm keys get&lt;/a&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd /tmp
git clone https://github.com/openai/openai-realtime-solar-system
cd openai-realtime-solar-system
npm install
OPENAI_API_KEY="$(llm keys get openai)" npm run dev
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You need to click on both the Wifi icon and the microphone icon before you can instruct it with your voice. Try "Show me Mars".&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webrtc"&gt;webrtc&lt;/a&gt;&lt;/p&gt;



</summary><category term="ai"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="webrtc"/></entry><entry><title>OpenAI WebRTC Audio demo</title><link href="https://simonwillison.net/2024/Dec/17/openai-webrtc/#atom-tag" rel="alternate"/><published>2024-12-17T23:50:12+00:00</published><updated>2024-12-17T23:50:12+00:00</updated><id>https://simonwillison.net/2024/Dec/17/openai-webrtc/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://tools.simonwillison.net/openai-webrtc"&gt;OpenAI WebRTC Audio demo&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OpenAI announced &lt;a href="https://openai.com/index/o1-and-new-tools-for-developers/"&gt;a bunch of API features&lt;/a&gt; today, including a brand new &lt;a href="https://platform.openai.com/docs/guides/realtime-webrtc"&gt;WebRTC API&lt;/a&gt; for setting up a two-way audio conversation with their models.&lt;/p&gt;
&lt;p&gt;They &lt;a href="https://twitter.com/OpenAIDevs/status/1869116585044259059"&gt;tweeted this opaque code example&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;pre&gt;&lt;code&gt;async function createRealtimeSession(inStream, outEl, token) {
  const pc = new RTCPeerConnection();
  pc.ontrack = e =&amp;gt; outEl.srcObject = e.streams[0];
  pc.addTrack(inStream.getTracks()[0]);
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const headers = { Authorization: `Bearer ${token}`, 'Content-Type': 'application/sdp' };
  const opts = { method: 'POST', body: offer.sdp, headers };
  const resp = await fetch('https://api.openai.com/v1/realtime', opts);
  await pc.setRemoteDescription({ type: 'answer', sdp: await resp.text() });
  return pc;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;p&gt;So I &lt;a href="https://gist.github.com/simonw/69151091f7672adb9b42f5b17bd45d44"&gt;pasted that into Claude&lt;/a&gt; and had it build me &lt;a href="https://tools.simonwillison.net/openai-webrtc"&gt;this interactive demo&lt;/a&gt; for trying out the new API.&lt;/p&gt;
&lt;div style="max-width: 100%; margin: 1em 0"&gt;
    &lt;video 
        controls 
        preload="none"
        poster="https://static.simonwillison.net/static/2024/webrtc-demo.jpg" loop
        style="width: 100%; height: auto;"&gt;
        &lt;source src="https://static.simonwillison.net/static/2024/webrtc-demo.mp4" type="video/mp4"&gt;
    &lt;/video&gt;
&lt;/div&gt;
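&lt;p&gt;For a sense of how the tweeted function gets used, here is a rough sketch of the browser-side wiring - the element ID, the &lt;code&gt;ephemeralToken&lt;/code&gt; variable and the microphone capture step are my own illustrative assumptions, not part of OpenAI's example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// Hypothetical wiring for createRealtimeSession() above.
// Assumes an &amp;lt;audio id="output" autoplay&amp;gt; element on the page and an
// ephemeralToken fetched from your own server - both names are invented.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const outEl = document.getElementById('output');
const pc = await createRealtimeSession(stream, outEl, ephemeralToken);
// Call pc.close() when you want to end the session.
&lt;/code&gt;&lt;/pre&gt;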

&lt;p&gt;My demo uses an OpenAI key directly, but the most interesting aspect of the new WebRTC mechanism is its support for &lt;a href="https://platform.openai.com/docs/guides/realtime-webrtc#creating-an-ephemeral-token"&gt;ephemeral tokens&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This solves a major problem with their previous realtime API: in order to connect to their endpoint you need to provide an API key, but that meant making that key visible to anyone who uses your application. The only secure way to handle this was to roll a full server-side proxy for their WebSocket API, just so you could hide your API key in your own server. &lt;a href="https://github.com/cloudflare/openai-workers-relay"&gt;cloudflare/openai-workers-relay&lt;/a&gt; is an example implementation of that pattern.&lt;/p&gt;
&lt;p&gt;Ephemeral tokens solve that by letting you make a server-side call to request an ephemeral token, which can only be used to initiate a connection to their WebRTC endpoint for the next 60 seconds. The user's browser then starts the connection, which can last for up to 30 minutes.&lt;/p&gt;
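&lt;p&gt;The token-minting step is a single server-side POST to OpenAI's &lt;code&gt;/v1/realtime/sessions&lt;/code&gt; endpoint. A minimal sketch - the model name and voice here are illustrative, so check the linked docs for current values:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Server-side: exchange your real API key for a short-lived client token
curl -s https://api.openai.com/v1/realtime/sessions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-realtime-preview-2024-12-17", "voice": "verse"}'
# The response includes client_secret.value - hand that to the browser,
# which must start the WebRTC connection within 60 seconds.
&lt;/code&gt;&lt;/pre&gt;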


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/api"&gt;api&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/audio"&gt;audio&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/security"&gt;security&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tools"&gt;tools&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai"&gt;ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/cloudflare"&gt;cloudflare&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openai"&gt;openai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/generative-ai"&gt;generative-ai&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/llms"&gt;llms&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ai-assisted-programming"&gt;ai-assisted-programming&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/claude"&gt;claude&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/multi-modal-output"&gt;multi-modal-output&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webrtc"&gt;webrtc&lt;/a&gt;&lt;/p&gt;



</summary><category term="api"/><category term="audio"/><category term="security"/><category term="tools"/><category term="ai"/><category term="cloudflare"/><category term="openai"/><category term="generative-ai"/><category term="llms"/><category term="ai-assisted-programming"/><category term="claude"/><category term="multi-modal-output"/><category term="webrtc"/></entry><entry><title>OpenAI WebRTC Audio Session</title><link href="https://simonwillison.net/2024/Dec/17/openai-webrtc-2/#atom-tag" rel="alternate"/><published>2024-12-17T22:06:58+00:00</published><updated>2024-12-17T22:06:58+00:00</updated><id>https://simonwillison.net/2024/Dec/17/openai-webrtc-2/#atom-tag</id><summary type="html">
    
        &lt;p&gt;&lt;strong&gt;Tool:&lt;/strong&gt; &lt;a href="https://tools.simonwillison.net/openai-webrtc"&gt;OpenAI WebRTC Audio Session&lt;/a&gt;&lt;/p&gt;
        
    
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/webrtc"&gt;webrtc&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="webrtc"/></entry><entry><title>How Zoom’s web client avoids using WebRTC</title><link href="https://simonwillison.net/2019/Apr/18/zoom-wasm/#atom-tag" rel="alternate"/><published>2019-04-18T18:20:16+00:00</published><updated>2019-04-18T18:20:16+00:00</updated><id>https://simonwillison.net/2019/Apr/18/zoom-wasm/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://webrtchacks.com/zoom-avoids-using-webrtc/"&gt;How Zoom’s web client avoids using WebRTC&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;It turns out video conferencing app Zoom uses its own WebAssembly-compiled video and audio codecs and transmits H.264 over WebSockets.&lt;/p&gt;

    &lt;p&gt;&lt;small&gt;Via &lt;a href="https://twitter.com/simonw/status/1118941811391651840"&gt;@simonw&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/websockets"&gt;websockets&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webassembly"&gt;webassembly&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webrtc"&gt;webrtc&lt;/a&gt;&lt;/p&gt;



</summary><category term="websockets"/><category term="webassembly"/><category term="webrtc"/></entry></feed>