Simon Willison’s Weblog

Subscribe
Atom feed for slop

12 items tagged “slop”

Slop describes AI-generated content that is both unrequested and unreviewed. See Slop is the new name for unwanted AI-generated content.

2024

SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL (via) A new paper from Google Research describing custom syntax for analytical SQL queries that has been rolling out inside Google since February, reaching 1,600 "seven-day-active users" by August 2024.

A key idea is here is to fix one of the biggest usability problems with standard SQL: the order of the clauses in a query. Starting with SELECT instead of FROM has always been confusing, see SQL queries don't start with SELECT by Julia Evans.

Here's an example of the new alternative syntax, taken from the Pipe query syntax documentation that was added to Google's open source ZetaSQL project last week.

For this SQL query:

SELECT component_id, COUNT(*)
FROM ticketing_system_table
WHERE
  assignee_user.email = 'username@email.com'
  AND status IN ('NEW', 'ASSIGNED', 'ACCEPTED')
GROUP BY component_id
ORDER BY component_id DESC;

The Pipe query alternative would look like this:

FROM ticketing_system_table
|> WHERE
    assignee_user.email = 'username@email.com'
    AND status IN ('NEW', 'ASSIGNED', 'ACCEPTED')
|> AGGREGATE COUNT(*)
   GROUP AND ORDER BY component_id DESC;

The Google Research paper is released as a two-column PDF. I snarked about this on Hacker News:

Google: you are a web company. Please learn to publish your research papers as web pages.

This remains a long-standing pet peeve of mine. PDFs like this are horrible to read on mobile phones, hard to copy-and-paste from, have poor accessibility (see this Mastodon conversation) and are generally just bad citizens of the web.

Having complained about this I felt compelled to see if I could address it myself. Google's own Gemini Pro 1.5 model can process PDFs, so I uploaded the PDF to Google AI Studio and prompted the gemini-1.5-pro-exp-0801 model like this:

Convert this document to neatly styled semantic HTML

This worked surprisingly well. It output HTML for about half the document and then stopped, presumably hitting the output length limit, but a follow-up prompt of "and the rest" caused it to continue from where it stopped and run until the end.

Here's the result (with a banner I added at the top explaining that it's a conversion): Pipe-Syntax-In-SQL.html

I haven't compared the two completely, so I can't guarantee there are no omissions or mistakes.

The figures from the PDF aren't present - Gemini Pro output tags like <img src="figure1.png" alt="Figure 1: SQL syntactic clause order doesn't match semantic evaluation order. (From [25].)"> but did nothing to help me create those images.

Amusingly the document ends with <p>(A long list of references, which I won't reproduce here to save space.)</p> rather than actually including the references from the paper!

So this isn't a perfect solution, but considering it took just the first prompt I could think of it's a very promising start. I expect someone willing to spend more than the couple of minutes I invested in this could produce a very useful HTML alternative version of the paper with the assistance of Gemini Pro.

One last amusing note: I posted a link to this to Hacker News a few hours ago. Just now when I searched Google for the exact title of the paper my HTML version was already the third result!

I've now added a <meta name="robots" content="noindex, follow"> tag to the top of the HTML to keep this unverified AI slop out of their search index. This is a good reminder of how much better HTML is than PDF for sharing information on the web!

# 24th August 2024, 11 pm / google, pdf, seo, sql, ai, julia-evans, generative-ai, llms, gemini, slop

Where Facebook’s AI Slop Comes From. Jason Koebler continues to provide the most insightful coverage of Facebook's weird ongoing problem with AI slop (previously).

Who's creating this stuff? It looks to primarily come from individuals in countries like India and the Philippines, inspired by get-rich-quick YouTube influencers, who are gaming Facebook's Creator Bonus Program and flooding the platform with AI-generated images.

Jason highlights this YouTube video by YT Gyan Abhishek (136,000 subscribers) and describes it like this:

He pauses on another image of a man being eaten by bugs. “They are getting so many likes,” he says. “They got 700 likes within 2-4 hours. They must have earned $100 from just this one photo. Facebook now pays you $100 for 1,000 likes … you must be wondering where you can get these images from. Don’t worry. I’ll show you how to create images with the help of AI.”

That video is in Hindi but you can request auto-translated English subtitles in the YouTube video settings. The image generator demonstrated in the video is Ideogram, which offers a free plan. (Here's pelicans having a tea party on a yacht.)

Screenshot of a YouTube influencer demonstrating Ideogram generating "BMPoor people with thin body" - caption reads along with this you can also see this image

Jason's reporting here runs deep - he goes as far as buying FewFeed, dedicated software for scraping and automating Facebook, and running his own (unsuccessful) page using prompts from YouTube tutorials like:

an elderly woman celebrating her 104th birthday with birthday cake realistic family realistic jesus celebrating with her

I signed up for a $10/month 404 Media subscription to read this and it was absolutely worth the money.

# 10th August 2024, 12:26 am / ethics, facebook, ai, slop, jason-koebler, meta

Today’s research challenge: why is August 1st “World Wide Web Day”? Here's a fun mystery. A bunch of publications will tell you that today, August 1st, is "World Wide Web Day"... but where did that idea come from?

It's not an official day marked by any national or international organization. It's not celebrated by CERN or the W3C.

The date August 1st doesn't appear to hold any specific significance in the history of the web. The first website was launched on August 6th 1991.

I posed the following three questions this morning on Mastodon:

  1. Who first decided that August 1st should be "World Wide Web Day"?
  2. Why did they pick that date?
  3. When was the first World Wide Web Day celebrated?

Finding answers to these questions has proven stubbornly difficult. Searches on Google have proven futile, and illustrate the growing impact of LLM-generated slop on the web: they turn up dozens of articles celebrating the day, many from news publications playing the "write about what people might search for" game and many others that have distinctive ChatGPT vibes to them.

One early hint we've found is in the "Bylines 2010 Writer's Desk Calendar" by Snowflake Press, published in January 2009. Jessamyn West spotted that on the book's page in the Internet Archive, but it merely lists "World Wide Web Day" at the bottom of the July calendar page (clearly a printing mistake, the heading is meant to align with August 1st on the next page) without any hint as to the origin:

Screenshot of a section of the calendar showing July 30 (Friday) and 31st (Saturday) - at the very bottom of the Saturday block is the text World Wide Web Day

I found two earlier mentions from August 1st 2008 on Twitter, from @GabeMcCauley and from @iJess.

Our earliest news media reference, spotted by Hugo van Kemenade, is also from August 1st 2008: this opinion piece in the Attleboro Massachusetts Sun Chronicle, which has no byline so presumably was written by the paper's editorial board:

Today is World Wide Web Day, but who cares? We'd rather nap than surf. How about you? Better relax while you can: August presages the start of school, a new season of public meetings, worries about fuel costs, the rundown to the presidential election and local races.

So the mystery remains! Who decided that August 1st should be "World Wide Web Day", why that date and how did it spread so widely without leaving a clear origin story?

If your research skills are up to the challenge, join the challenge!

# 1st August 2024, 5:34 pm / history, internet-archive, w3c, web, mastodon, slop

Facebook Is the ’Zombie Internet’. Ever since Facebook started to become infested with weird AI-generated images of shrimp Jesus - with thousands of comments and likes - I've been wondering how much of that activity is real humans as opposed to yet more bots.

Jason Koebler has been on the Facebook AI slop beat for a while. In this superb piece of online investigative reporting he dives deep into an attempt to answer that question, using multiple Facebook burner accounts and contacting more than 300 users who have commented on that kind of image.

I endlessly tried to talk to people who commented on these images, but I had no luck at all. Over the course of several months, I messaged 300 people who commented on bizarre AI-generated images, which I could only do 20 or so at a time before Facebook stopped letting me send messages for several hours. I also commented on dozens of images myself, asking for any human who had also commented on the image to respond to me. Across those hundreds of messages, I got four total responses.

Jacob also talked to Khan Schoolcraft, a moderator of the Um, isn’t that AI? group, who said:

In my experience, the supermajority of engagement on viral AI Facebook pages is just as artificially-generated as the content they publish. When exploring their comment sections, one will often see hundreds of bot-like comments interspersed with a few ‘real’ people sounding the alarm to no avail. [...]

Whether it's a child transforming into a water bottle cyborg, a three-armed flight attendant rescuing Tiger Jesus from a muddy plane crash, or a hybrid human-monkey baby being stung to death by giant hornets, all tend to have copy+pasted captions, reactions & comments which usually make no sense in the observed context.

# 15th July 2024, 6:56 pm / facebook, ai, generative-ai, slop, jason-koebler

Early Apple tech bloggers are shocked to find their name and work have been AI-zombified (via)

TUAW (“The Unofficial Apple Weblog”) was shut down by AOL in 2015, but this past year, a new owner scooped up the domain and began posting articles under the bylines of former writers who haven’t worked there for over a decade.

They're using AI-generated images against real names of original contributors, then publishing LLM-rewritten articles because they didn't buy the rights to the original content!

# 10th July 2024, 10:48 pm / ethics, ai, slop

Content slop has three important characteristics. The first being that, to the user, the viewer, the customer, it feels worthless. This might be because it was clearly generated in bulk by a machine or because of how much of that particular content is being created. The next important feature of slop is that feels forced upon us, whether by a corporation or an algorithm. It’s in the name. We’re the little piggies and it’s the gruel in the trough. But the last feature is the most crucial. It not only feels worthless and ubiquitous, it also feels optimized to be so. The Charli XCX “Brat summer” meme does not feel like slop, nor does Kendrick Lamar’s extremely long “Not Like Us” roll out. But Taylor Swift’s cascade of alternate versions of her songs does. The jury’s still out on Sabrina Carpenter. Similarly, last summer’s Barbenheimer phenomenon did not, to me, feel like slop. Dune: Part Two didn’t either. But Deadpool & Wolverine, at least in the marketing, definitely does.

Ryan Broderick

# 10th July 2024, 5:43 pm / marketing, ai, slop

Open challenges for AI engineering

Visit Open challenges for AI engineering

I gave the opening keynote at the AI Engineer World’s Fair yesterday. I was a late addition to the schedule: OpenAI pulled out of their slot at the last minute, and I was invited to put together a 20 minute talk with just under 24 hours notice!

[... 5,640 words]

First Came ‘Spam.’ Now, With A.I., We’ve Got ‘Slop’. First the Guardian, now the NYT. I've apparently made a habit of getting quoted by journalists talking about slop!

I got the closing quote in this one:

Society needs concise ways to talk about modern A.I. — both the positives and the negatives. ‘Ignore that email, it’s spam,’ and ‘Ignore that article, it’s slop,’ are both useful lessons.

# 11th June 2024, 4:12 pm / ethics, new-york-times, ai, generative-ai, slop

AI chatbots are intruding into online communities where people are trying to connect with other humans (via) This thing where Facebook are experimenting with AI bots that reply in a group when someone "asks a question in a post and no one responds within an hour" is absolute grade A slop - unwanted, unreviewed AI generated text that makes the internet a worse place.

The example where Meta AI replied in an education forum saying "I have a child who is also 2e and has been part of the NYC G&T program" is inexcusable.

# 9th June 2024, 3:14 am / ethics, facebook, ai, generative-ai, llms, slop

Spam, junk … slop? The latest wave of AI behind the ‘zombie internet’. I'm quoted in this piece in the Guardian about slop:

I think having a name for this is really important, because it gives people a concise way to talk about the problem.

Before the term ‘spam’ entered general use it wasn’t necessarily clear to everyone that unwanted marketing messages were a bad way to behave. I’m hoping ‘slop’ has the same impact – it can make it clear to people that generating and publishing unreviewed AI-generated content is bad behaviour.

# 19th May 2024, 7:54 pm / ethics, ai, generative-ai, slop

Slop is the new name for unwanted AI-generated content

Visit Slop is the new name for unwanted AI-generated content

I saw this tweet yesterday from @deepfates, and I am very on board with this:

[... 329 words]

Watching in real time as "slop" becomes a term of art. the way that "spam" became the term for unwanted emails, "slop" is going in the dictionary as the term for unwanted AI generated content

@deepfates

# 7th May 2024, 3:59 pm / ethics, ai, generative-ai, llms, slop