Simon Willison’s Weblog

Subscribe

ChatGPT agent’s user-agent

4th August 2025

I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it was leaking its URLs to Bingbot and Yandex... but it turned out that was a Cloudflare feature that had nothing to do with ChatGPT.

ChatGPT agent is the recently released (and confusingly named) ChatGPT feature that provides browser automation combined with terminal access as a feature of ChatGPT—replacing their previous Operator research preview which is scheduled for deprecation on August 31st.

Investigating ChatGPT agent’s user-agent

I decided to dig into how it works by creating a logged web URL endpoint using django-http-debug. Then I told ChatGPT agent mode to explore that new page:

ChatGPT screenshot. My prompt was "Visit https://simonwillison.net/test-url-context and tell me what you see there" - it said "Worked for 15 seconds" with an arrow, then a screnshot of the webpage content showing "simonwillison.net" with a favicon, heading "This is a heading", text "Text and text and more text." and "this came from javascript". The bot then responds with: The webpage displays a simple layout with a large heading at the top that reads “This is a heading.” Below it, there's a short paragraph that says “Text and text and more text.” A final line appears underneath saying “this came from javascript,” indicating that this last line was inserted via a script. The page contains no interactive elements or instructions—just these lines of plain text displayed on a white background.

My logging captured these request headers:

Via: 1.1 heroku-router
Host: simonwillison.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Cf-Ray: 96a0f289adcb8e8e-SEA
Cookie: cf_clearance=zzV8W...
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Priority: u=0, i
Sec-Ch-Ua: "Not)A;Brand";v="8", "Chromium";v="138"
Signature: sig1=:1AxfqHocTf693inKKMQ7NRoHoWAZ9d/vY4D/FO0+MqdFBy0HEH3ZIRv1c3hyiTrzCvquqDC8eYl1ojcPYOSpCQ==:
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36
Cf-Ipcountry: US
X-Request-Id: 45ef5be4-ead3-99d5-f018-13c4a55864d3
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Accept-Encoding: gzip, br
Accept-Language: en-US,en;q=0.9
Signature-Agent: "https://chatgpt.com"
Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"
X-Forwarded-For: 2a09:bac5:665f:1541::21e:154, 172.71.147.183
X-Request-Start: 1754340840059
Cf-Connecting-Ip: 2a09:bac5:665f:1541::21e:154
Sec-Ch-Ua-Mobile: ?0
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Sec-Ch-Ua-Platform: "Linux"
Upgrade-Insecure-Requests: 1

That Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36 user-agent header is the one used by the most recent Chrome on macOS—which is a little odd here as the Sec-Ch-Ua-Platform : “Linux” indicates that the agent browser runs on Linux.

At first glance it looks like ChatGPT is being dishonest here by not including its bot identity in the user-agent header. I thought for a moment it might be reflecting my own user-agent, but I’m using Firefox on macOS and it identified itself as Chrome.

Then I spotted this header:

Signature-Agent: "https://chatgpt.com"

Which is accompanied by a much more complex header called Signature-Input:

Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"

And a Signature header too.

These turn out to come from a relatively new web standard: RFC 9421 HTTP Message Signatures’ published February 2024.

The purpose of HTTP Message Signatures is to allow clients to include signed data about their request in a way that cannot be tampered with by intermediaries. The signature uses a public key that’s provided by the following well-known endpoint:

https://chatgpt.com/.well-known/http-message-signatures-directory

Add it all together and we now have a rock-solid way to identify traffic from ChatGPT agent: look for the Signature-Agent: "https://chatgpt.com" header and confirm its value by checking the signature in the Signature-Input and Signature headers.

And then came Bingbot and Yandex

Just over a minute after it captured that request, my logging endpoint got another request:

Via: 1.1 heroku-router
From: bingbot(at)microsoft.com
Host: simonwillison.net
Accept: */*
Cf-Ray: 96a0f4671d1fc3c6-SEA
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
Cf-Ipcountry: US
X-Request-Id: 6214f5dc-a4ea-5390-1beb-f2d26eac5d01
Accept-Encoding: gzip, br
X-Forwarded-For: 207.46.13.9, 172.71.150.252
X-Request-Start: 1754340916429
Cf-Connecting-Ip: 207.46.13.9
X-Forwarded-Port: 80
X-Forwarded-Proto: http

I pasted 207.46.13.9 into Microsoft’s Verify Bingbot tool (after solving a particularly taxing CAPTCHA) and it confirmed that this was indeed a request from Bingbot.

I set up a second URL to confirm... and this time got a visit from Yandex!

Via: 1.1 heroku-router
From: support@search.yandex.ru
Host: simonwillison.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Cf-Ray: 96a16390d8f6f3a7-DME
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Cf-Ipcountry: RU
X-Request-Id: 3cdcbdba-f629-0d29-b453-61644da43c6c
Accept-Encoding: gzip, br
X-Forwarded-For: 213.180.203.138, 172.71.184.65
X-Request-Start: 1754345469921
Cf-Connecting-Ip: 213.180.203.138
X-Forwarded-Port: 80
X-Forwarded-Proto: http

Yandex suggest a reverse DNS lookup to verify, so I ran this command:

dig -x 213.180.203.138 +short

And got back:

213-180-203-138.spider.yandex.com.

Which confirms that this is indeed a Yandex crawler.

I tried a third experiment to be sure... and got hits from both Bingbot and YandexBot.

It was Cloudflare Crawler Hints, not ChatGPT

So I wrote up and posted about my discovery... and Jatan Loya asked:

do you have crawler hints enabled in cf?

And yeah, it turned out I did. I spotted this in my caching configuration page (and it looks like I must have turned it on myself at some point in the past):

Screenshot of Cloudflare settings panel showing "Crawler Hints Beta" with description text explaining that Crawler Hints provide high quality data to search engines and other crawlers when sites using Cloudflare change their content. This allows crawlers to precisely time crawling, avoid wasteful crawls, and generally reduce resource consumption on origins and other Internet infrastructure. Below states "By enabling this service, you agree to share website information required for feature functionality and agree to the Supplemental Terms for Crawler Hints." There is a toggle switch in the on position on the right side and a "Help" link in the bottom right corner.

Here’s the Cloudflare documentation for that feature.

I deleted my posts on Twitter and Bluesky (since you can’t edit those and I didn’t want the misinformation to continue to spread) and edited my post on Mastodon, then updated this entry with the real reason this had happened.

I also changed the URL of this entry as it turned out Twitter and Bluesky were caching my social media preview for the previous one, which included the incorrect information in the title.

Original “So what’s going on here?” section from my post

Here’s a section of my original post with my theories about what was going on before learning about Cloudflare Crawler Hints.

So what’s going on here?

There are quite a few different moving parts here.

  1. I’m using Firefox on macOS with the 1Password and Readwise Highlighter extensions installed and active. Since I didn’t visit the debug pages at all with my own browser I don’t think any of these are relevant to these results.
  2. ChatGPT agent makes just a single request to my debug URL ...
  3. ... which is proxied through both Cloudflare and Heroku.
  4. Within about a minute, I get hits from one or both of Bingbot and Yandex.

Presumably ChatGPT agent itself is running behind at least one proxy—I would expect OpenAI to keep a close eye on that traffic to ensure it doesn’t get abused.

I’m guessing that infrastructure is hosted by Microsoft Azure. The OpenAI Sub-processor List—though that lists Microsoft Corporation, CoreWeave Inc, Oracle Cloud Platform and Google Cloud Platform under the “Cloud infrastructure” section so it could be any of those.

Since the page is served over HTTPS my guess is that any intermediary proxies should be unable to see the path component of the URL, making the mystery of how Bingbot and Yandex saw the URL even more intriguing.

This is ChatGPT agent’s user-agent by Simon Willison, posted on 4th August 2025.

Previous: The ChatGPT sharing dialog demonstrates how difficult it is to design privacy preferences

Monthly briefing

Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.

Pay me to send you less!

Sponsor & subscribe