ChatGPT agent’s user-agent
4th August 2025
I was exploring how ChatGPT agent works today. I learned some interesting things about how it exposes its identity through HTTP headers, then made a huge blunder in thinking it was leaking its URLs to Bingbot and Yandex... but it turned out that was a Cloudflare feature that had nothing to do with ChatGPT.
ChatGPT agent is the recently released (and confusingly named) ChatGPT feature that combines browser automation with terminal access. It replaces their previous Operator research preview, which is scheduled for deprecation on August 31st.
Investigating ChatGPT agent’s user-agent
I decided to dig into how it works by creating a logged web URL endpoint using django-http-debug. Then I told ChatGPT agent mode to explore that new page:
My logging captured these request headers:
Via: 1.1 heroku-router
Host: simonwillison.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Cf-Ray: 96a0f289adcb8e8e-SEA
Cookie: cf_clearance=zzV8W...
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Priority: u=0, i
Sec-Ch-Ua: "Not)A;Brand";v="8", "Chromium";v="138"
Signature: sig1=:1AxfqHocTf693inKKMQ7NRoHoWAZ9d/vY4D/FO0+MqdFBy0HEH3ZIRv1c3hyiTrzCvquqDC8eYl1ojcPYOSpCQ==:
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36
Cf-Ipcountry: US
X-Request-Id: 45ef5be4-ead3-99d5-f018-13c4a55864d3
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Accept-Encoding: gzip, br
Accept-Language: en-US,en;q=0.9
Signature-Agent: "https://chatgpt.com"
Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"
X-Forwarded-For: 2a09:bac5:665f:1541::21e:154, 172.71.147.183
X-Request-Start: 1754340840059
Cf-Connecting-Ip: 2a09:bac5:665f:1541::21e:154
Sec-Ch-Ua-Mobile: ?0
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Sec-Ch-Ua-Platform: "Linux"
Upgrade-Insecure-Requests: 1
That Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36 user-agent header is the one used by the most recent Chrome on macOS, which is a little odd here as the Sec-Ch-Ua-Platform: "Linux" header indicates that the agent browser runs on Linux.
At first glance it looks like ChatGPT is being dishonest here by not including its bot identity in the user-agent header. I thought for a moment it might be reflecting my own user-agent, but I’m using Firefox on macOS and it identified itself as Chrome.
Then I spotted this header:
Signature-Agent: "https://chatgpt.com"
Which is accompanied by a much more complex header called Signature-Input:
Signature-Input: sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";tag="web-bot-auth";alg="ed25519"
And a Signature header too.
These turn out to come from a relatively new web standard: RFC 9421 HTTP Message Signatures, published February 2024.
The purpose of HTTP Message Signatures is to allow clients to include signed data about their request in a way that cannot be tampered with by intermediaries. The signature is created with a private key, and anyone can verify it using the matching public key published at the following well-known endpoint:
https://chatgpt.com/.well-known/http-message-signatures-directory
Add it all together and we now have a rock-solid way to identify traffic from ChatGPT agent: look for the Signature-Agent: "https://chatgpt.com" header and confirm its value by checking the signature in the Signature-Input and Signature headers.
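Here's a rough sketch of the first half of that check in Python: parsing Signature-Input and rebuilding the "signature base" string that RFC 9421 says the signature covers. The function names are my own, the parser is deliberately simplified (one signature label, no semicolons inside quoted values), and the GET method and /debug-page/ path are placeholders since I haven't shown the real ones here.

```python
import re


def parse_signature_input(header_value):
    """Split 'label=(components);key=value;...' into its parts.

    Simplified: assumes one signature label and no ';' inside quoted values.
    """
    label, _, raw_params = header_value.partition("=")
    match = re.fullmatch(r"\(([^)]*)\)(.*)", raw_params)
    if match is None:
        raise ValueError("unexpected Signature-Input format")
    components = re.findall(r'"([^"]+)"', match.group(1))
    params = {}
    for part in match.group(2).split(";"):
        if part:
            key, _, value = part.partition("=")
            params[key] = value.strip('"')
    return label, components, params, raw_params


def build_signature_base(components, raw_params, method, authority, path, headers):
    """Recreate the signed string: one line per covered component."""
    derived = {
        "@method": method.upper(),
        "@authority": authority.lower(),
        "@path": path,
    }
    lines = []
    for name in components:
        # Names starting with "@" are derived from the request, the rest are headers.
        value = derived[name] if name.startswith("@") else headers[name]
        lines.append(f'"{name}": {value}')
    lines.append(f'"@signature-params": {raw_params}')
    return "\n".join(lines)


SIGNATURE_INPUT = (
    'sig1=("@authority" "@method" "@path" "signature-agent");created=1754340838;'
    'keyid="otMqcjr17mGyruktGvJU8oojQTSMHlVm7uO-lrcqbdg";expires=1754344438;'
    'nonce="_8jbGwfLcgt_vUeiZQdWvfyIeh9FmlthEXElL-O2Rq5zydBYWivw4R3sV9PV-zGwZ2OEGr3T2Pmeo2NzmboMeQ";'
    'tag="web-bot-auth";alg="ed25519"'
)

label, components, params, raw_params = parse_signature_input(SIGNATURE_INPUT)
base = build_signature_base(
    components,
    raw_params,
    method="GET",  # assumed
    authority="simonwillison.net",
    path="/debug-page/",  # hypothetical path
    headers={"signature-agent": '"https://chatgpt.com"'},
)
print(base)
```

The remaining step, which this sketch leaves out, is to fetch the key matching keyid from the directory URL above, base64-decode the value between the colons in the Signature header, and run an Ed25519 verification of those bytes against base.encode() (the cryptography package's Ed25519PublicKey.verify() is one way to do that). It's also worth rejecting requests where expires is in the past.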
And then came Bingbot and Yandex
Just over a minute after it captured that request, my logging endpoint got another request:
Via: 1.1 heroku-router
From: bingbot(at)microsoft.com
Host: simonwillison.net
Accept: */*
Cf-Ray: 96a0f4671d1fc3c6-SEA
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
Cf-Ipcountry: US
X-Request-Id: 6214f5dc-a4ea-5390-1beb-f2d26eac5d01
Accept-Encoding: gzip, br
X-Forwarded-For: 207.46.13.9, 172.71.150.252
X-Request-Start: 1754340916429
Cf-Connecting-Ip: 207.46.13.9
X-Forwarded-Port: 80
X-Forwarded-Proto: http
I pasted 207.46.13.9 into Microsoft’s Verify Bingbot tool (after solving a particularly taxing CAPTCHA) and it confirmed that this was indeed a request from Bingbot.
I set up a second URL to confirm... and this time got a visit from Yandex!
Via: 1.1 heroku-router
From: support@search.yandex.ru
Host: simonwillison.net
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Cf-Ray: 96a16390d8f6f3a7-DME
Server: Heroku
Cdn-Loop: cloudflare; loops=1
Cf-Visitor: {"scheme":"https"}
User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Cf-Ipcountry: RU
X-Request-Id: 3cdcbdba-f629-0d29-b453-61644da43c6c
Accept-Encoding: gzip, br
X-Forwarded-For: 213.180.203.138, 172.71.184.65
X-Request-Start: 1754345469921
Cf-Connecting-Ip: 213.180.203.138
X-Forwarded-Port: 80
X-Forwarded-Proto: http
Yandex suggest a reverse DNS lookup to verify, so I ran this command:
dig -x 213.180.203.138 +short
And got back:
213-180-203-138.spider.yandex.com.
Which confirms that this is indeed a Yandex crawler.
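The same check can be scripted. This is a minimal sketch of a forward-confirmed reverse DNS lookup in Python: resolve the IP to a hostname, check the hostname's suffix, then resolve that hostname back and confirm it includes the original IP. The function name is my own, and the list of Yandex suffixes is my assumption about what their documentation accepts, so check it before relying on it.

```python
import socket

# Assumed list of domains that Yandex crawler hostnames end in.
YANDEX_SUFFIXES = ("yandex.ru", "yandex.net", "yandex.com")


def is_verified_crawler(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a crawler IP address.

    The lookup functions can be swapped out, which makes this testable
    without touching the network.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False
    # Require a dot before the suffix so "evilyandex.com" does not match.
    if not any(hostname.endswith("." + suffix) for suffix in allowed_suffixes):
        return False
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False


# Offline example using the hostname that dig returned above.
def fake_reverse(addr):
    return "213-180-203-138.spider.yandex.com."


def fake_forward(host):
    return ["213.180.203.138"]


result = is_verified_crawler(
    "213.180.203.138", YANDEX_SUFFIXES, fake_reverse, fake_forward
)
print(result)
```

Calling is_verified_crawler("213.180.203.138", YANDEX_SUFFIXES) without the fake lookups performs live DNS queries instead. The default forward lookup only handles IPv4 addresses.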
I tried a third experiment to be sure... and got hits from both Bingbot and YandexBot.
It was Cloudflare Crawler Hints, not ChatGPT
So I wrote up and posted about my discovery... and Jatan Loya asked:
do you have crawler hints enabled in cf?
And yeah, it turned out I did. I spotted this in my caching configuration page (and it looks like I must have turned it on myself at some point in the past):
Here’s the Cloudflare documentation for that feature.
I deleted my posts on Twitter and Bluesky (since you can’t edit those and I didn’t want the misinformation to continue to spread) and edited my post on Mastodon, then updated this entry with the real reason this had happened.
I also changed the URL of this entry as it turned out Twitter and Bluesky were caching my social media preview for the previous one, which included the incorrect information in the title.
Original “So what’s going on here?” section from my post
Here’s a section of my original post with my theories about what was going on before learning about Cloudflare Crawler Hints.
So what’s going on here?
There are quite a few different moving parts here.
- I’m using Firefox on macOS with the 1Password and Readwise Highlighter extensions installed and active. Since I didn’t visit the debug pages at all with my own browser I don’t think any of these are relevant to these results.
- ChatGPT agent makes just a single request to my debug URL ...
- ... which is proxied through both Cloudflare and Heroku.
- Within about a minute, I get hits from one or both of Bingbot and Yandex.
Presumably ChatGPT agent itself is running behind at least one proxy—I would expect OpenAI to keep a close eye on that traffic to ensure it doesn’t get abused.
I’m guessing that infrastructure is hosted by Microsoft Azure, though the OpenAI Sub-processor List names Microsoft Corporation, CoreWeave Inc, Oracle Cloud Platform and Google Cloud Platform under the “Cloud infrastructure” section, so it could be any of those.
Since the page is served over HTTPS my guess is that any intermediary proxies should be unable to see the path component of the URL, making the mystery of how Bingbot and Yandex saw the URL even more intriguing.