Simon Willison’s Weblog

12 items tagged “internet-archive”

2024

Wayback Machine: Models—Anthropic (8th October 2024). The Internet Archive is only intermittently available at the moment, but the Wayback Machine just came back long enough for me to confirm that the Anthropic Models documentation page listed Claude 3.5 Opus as coming “Later this year” at least as recently as the 8th of October, but today makes no mention of that model at all.

October 8th 2024

Internet Archive capture of the Claude models page, showing both Claude 3.5 Haiku and Claude 3.5 Opus as “Later this year”

October 22nd 2024

That same page today shows Claude 3.5 Haiku as “Later this year” but no longer mentions Claude 3.5 Opus at all

Claude 3 came in three flavors: Haiku (fast and cheap), Sonnet (mid-range) and Opus (best). We were expecting 3.5 to have the same three levels, and both 3.5 Haiku and 3.5 Sonnet fitted those expectations, matching their prices to the Claude 3 equivalents.

It looks like 3.5 Opus may have been entirely cancelled, or at least delayed for an unpredictable amount of time. I guess that means the new 3.5 Sonnet will be Anthropic's best overall model for a while, maybe until Claude 4.

# 22nd October 2024, 10:42 pm / internet-archive, ai, generative-ai, llms, anthropic, claude

Today’s research challenge: why is August 1st “World Wide Web Day”? Here's a fun mystery. A bunch of publications will tell you that today, August 1st, is "World Wide Web Day"... but where did that idea come from?

It's not an official day marked by any national or international organization. It's not celebrated by CERN or the W3C.

The date August 1st doesn't appear to hold any specific significance in the history of the web. The first website was launched on August 6th 1991.

I posed the following three questions this morning on Mastodon:

  1. Who first decided that August 1st should be "World Wide Web Day"?
  2. Why did they pick that date?
  3. When was the first World Wide Web Day celebrated?

Finding answers to these questions has proven stubbornly difficult. Google searches have been futile, and they illustrate the growing impact of LLM-generated slop on the web, turning up dozens of articles celebrating the day: many from news publications playing the "write about what people might search for" game, and many others with distinctive ChatGPT vibes.

One early hint we've found is in the "Bylines 2010 Writer's Desk Calendar" by Snowflake Press, published in January 2009. Jessamyn West spotted it on the book's page in the Internet Archive, but the calendar merely lists "World Wide Web Day" at the bottom of the July page (clearly a printing mistake: the heading is meant to align with August 1st on the next page) without any hint as to its origin:

Screenshot of a section of the calendar showing July 30th (Friday) and 31st (Saturday); at the very bottom of the Saturday block is the text “World Wide Web Day”

I found two earlier mentions from August 1st 2008 on Twitter, from @GabeMcCauley and from @iJess.

Our earliest news media reference, spotted by Hugo van Kemenade, is also from August 1st 2008: this opinion piece in The Sun Chronicle of Attleboro, Massachusetts, which has no byline and so was presumably written by the paper's editorial board:

Today is World Wide Web Day, but who cares? We'd rather nap than surf. How about you? Better relax while you can: August presages the start of school, a new season of public meetings, worries about fuel costs, the rundown to the presidential election and local races.

So the mystery remains! Who decided that August 1st should be "World Wide Web Day", why that date and how did it spread so widely without leaving a clear origin story?

If you think your research skills are up to it, join the challenge!

# 1st August 2024, 5:34 pm / history, internet-archive, w3c, web, mastodon, slop

People share a lot of sensitive material on Quora - controversial political views, workplace gossip and compensation, and negative opinions held of companies. Over many years, as they change jobs or change their views, it is important that they can delete or anonymize their previously-written answers.

We opt out of the wayback machine because inclusion would allow people to discover the identity of authors who had written sensitive answers publicly and later had made them anonymous, and because it would prevent authors from being able to remove their content from the internet if they change their mind about publishing it.

quora.com/robots.txt
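The quote above comes from Quora's robots.txt, which is where crawler opt-outs like this are expressed as rules. Historically the Wayback Machine has honoured rules aimed at the `ia_archiver` user agent; treat that agent name as an assumption and check the Internet Archive's current guidance. Here is a minimal sketch using Python's standard-library `urllib.robotparser` to test whether a robots.txt blocks that agent. The robots.txt text and the `is_blocked` helper are invented for illustration, not Quora's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks the assumed Internet Archive crawler
# ("ia_archiver") from the whole site while allowing everyone else.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file as a sequence of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/some-answer"
    print(is_blocked(EXAMPLE_ROBOTS_TXT, "ia_archiver", page))   # True
    print(is_blocked(EXAMPLE_ROBOTS_TXT, "SomeOtherBot", page))  # False
```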

# 19th March 2024, 11:09 pm / internet-archive, robots-txt, quora

2020

Internet Archive Software Library: Flash (via) A fantastic new initiative from the Internet Archive: they’re now archiving Flash (.swf) files and serving them to modern browsers using Ruffle, a Flash Player emulator written in Rust and compiled to WebAssembly. The files are fully interactive, and audio works too. Considering the enormous quantity of creative material released in Flash over the decades, this helps fill a big hole in the Internet’s cultural memory.

# 19th November 2020, 9:19 pm / flash, internet-archive, jason-scott, rust, webassembly

2018

Usage of ARIA attributes via HTTP Archive. A neat example of a Google BigQuery query you can run against the HTTP Archive public dataset (a crawl of the “top” websites run periodically by the Internet Archive, which captures the full details of every resource fetched) to see which ARIA attributes are used the most often. Linking to this because I used it successfully today as the basis for my own custom query—I love that it’s possible to analyze a huge representative sample of the modern web in this way.
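The linked query runs on BigQuery against the HTTP Archive dataset. As a rough local illustration of the same counting idea (not the query itself), here is a minimal Python sketch that tallies `aria-*` attributes across a few HTML documents with the standard-library `html.parser`. The `AriaCounter` class, `count_aria` helper and sample pages are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class AriaCounter(HTMLParser):
    """Tallies every aria-* attribute on the start tags it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter[str] = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so ARIA-LABEL counts as aria-label.
        # Self-closing tags (<input ... />) are routed here by the default
        # handle_startendtag, so they are counted as well.
        for name, _value in attrs:
            if name.startswith("aria-"):
                self.counts[name] += 1


def count_aria(pages: list[str]) -> Counter[str]:
    """Aggregate aria-* attribute counts across several HTML documents."""
    total: Counter[str] = Counter()
    for html in pages:
        parser = AriaCounter()
        parser.feed(html)
        parser.close()
        total.update(parser.counts)
    return total


if __name__ == "__main__":
    # Invented sample pages, standing in for crawled responses.
    pages = [
        '<button aria-label="Close" aria-expanded="false">x</button>',
        '<nav aria-label="Main"><a href="/" aria-current="page">Home</a></nav>',
    ]
    print(count_aria(pages).most_common(1))  # [('aria-label', 2)]
```

At HTTP Archive scale this loop would be far too slow, which is exactly why running the aggregation as a SQL query over a pre-crawled public dataset is so appealing.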

# 12th July 2018, 3:16 am / aria, http, internet-archive, big-data

2017

Elaborate Halloween Costume Tips from a 19th-Century Guide to Fancy Dress (via) The gilded age had some ridiculous parties. Here are highlights of the most popular costume guide of the era, now available on the Internet Archive.

# 26th October 2017, 2:01 pm / history, internet-archive

Recovering missing content from the Internet Archive

When I restored my blog last weekend I used the most recent SQL backup of my blog’s database, from back in 2010. I thought it had all of my content from before I started my 7-year hiatus, but while watching the 404 logs I started seeing the occasional hit to something that really should have been there but wasn’t. It turns out the SQL backup I was working from was missing some content.

[... 636 words]
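The rest of this post is truncated here, so this is not necessarily how the content was actually recovered. As one hedged illustration of the general approach: the Internet Archive offers a public availability endpoint, `https://archive.org/wayback/available?url=...`, which returns JSON describing the closest archived snapshot of a URL. The sketch below builds that request and pulls out the snapshot link; the helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str) -> str:
    """Build the availability query URL for a single page."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(response_json: str) -> Optional[str]:
    """Extract the closest snapshot URL from an availability response.

    Returns None when there is no capture, or when the capture was not
    an HTTP 200 (for example an archived 404 page).
    """
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and str(closest.get("status")) == "200":
        return closest.get("url")
    return None


def find_snapshot(page_url: str) -> Optional[str]:
    """Ask the live API about one page (needs network access)."""
    with urlopen(availability_url(page_url), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Looping over the missing paths from a 404 log and calling `find_snapshot("example.com" + path)`, with `example.com` standing in for your own domain, yields either a `web.archive.org` link to inspect or `None`.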

2009

tr.im is “discontinuing service”. “However, all tr.im links will continue to redirect, and will do so until at least December 31, 2009. Your tweets with tr.im URLs in them will not be affected.” These two statements seem to contradict each other: will tr.im URLs in tweets stop working after December 31st or not? Any chance they could hand the domain over to the Internet Archive? At any rate, this is exactly why centralised URL shorteners are a harmful trend.

# 10th August 2009, 11:06 am / internet-archive, redirects, trim, twitter, urls, urlshorteners

A new leaf. George Oates is now heading up the Open Library project at the Internet Archive. Sounds like a perfect match.

# 28th April 2009, 12:55 am / george-oates, internet-archive, openlibrary

TinyURL—Archiveteam. Excellent: the Internet Archive are crawling TinyURL (and hopefully other URL shortening services as well). The wiki page was created back in January. UPDATE from comments: Archiveteam are a separate organisation from the Internet Archive.

# 3rd April 2009, 11:11 pm / archive, archiveteam, internet-archive, tinyurl

The Internet Archive should actively partner with bit.ly / tinyurl.com / icanhaz.com etc. and maintain a mirror database of their redirects

Me, on Twitter
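To make the idea concrete, here is a minimal sketch of what a mirror of shortener redirects could look like: a SQLite table mapping short URLs to their targets, populated by issuing a `HEAD` request and reading the `Location` header without following it. The table layout, `fetch_location` and `RedirectMirror` are hypothetical; real shorteners may answer `HEAD` differently or rate-limit bulk lookups.

```python
import http.client
import sqlite3
from typing import Optional
from urllib.parse import urlsplit


def fetch_location(short_url: str) -> Optional[str]:
    """Return the redirect target of short_url, or None if it doesn't redirect.

    http.client never follows redirects, so the Location header is the
    shortener's own answer. Requires network access.
    """
    parts = urlsplit(short_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=30)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        if resp.status in (301, 302, 303, 307, 308):
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


class RedirectMirror:
    """A tiny local copy of short-URL -> target mappings."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS redirects ("
            " short_url TEXT PRIMARY KEY, target TEXT NOT NULL)"
        )

    def record(self, short_url: str, target: str) -> None:
        # Upsert, so re-crawling a link refreshes its stored target.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO redirects VALUES (?, ?)",
                (short_url, target),
            )

    def lookup(self, short_url: str) -> Optional[str]:
        row = self.db.execute(
            "SELECT target FROM redirects WHERE short_url = ?", (short_url,)
        ).fetchone()
        return row[0] if row else None


if __name__ == "__main__":
    mirror = RedirectMirror()
    # Hypothetical short link; in practice the target would come from fetch_location().
    mirror.record("https://short.example/abc", "https://example.com/long/page")
    print(mirror.lookup("https://short.example/abc"))  # https://example.com/long/page
```

The point of the mirror is that lookups keep working from the local table even if the shortener itself disappears.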

# 8th March 2009, 2:59 pm / bitly, icanhaz, internet-archive, me, tinyurl, twitter, urlshorteners

2007

My Future of Web Apps talk as a slidecast

The team at Carson Systems have a pretty quick turnaround on their podcasts; they’ve had full recordings of every speaker up for a few days now. I spent a bunch of time over the weekend splicing the recording of my talk together with my slides, and the result is now available at The Future of OpenID (a slidecast).

[... 177 words]