There’s reasonable concern in the world that AI is going to fill the internet with such a torrent of search-engine-optimized crap that there’s not going to be much left of interest for actual human readers. The problem is compounded by the reality that AI models are trained on public content on the internet, which means there is a distinct possibility that future models could be fed, in large measure, by AI-generated content.

Way back when I taught internet journalism, I would do an experiment with my students to draw out the distinction between analog and digital media. We’d play a form of the telephone game where, in one round, they had to whisper information from one person to the next and experience the inevitable data loss by the time the initial message had cycled through 15 or so people. Then we’d repeat the process, except that they could break the message into discrete, simple blocks of information and could ask the previous person to repeat themselves if they weren’t confident of the message. The idea, obviously, was that digital information and internet protocols were designed to replicate perfectly and infinitely, while analog copies of information were prone to the inevitable rot inherent in trying to duplicate atoms. Not a perfect analogy, but I wasn’t a great teacher.
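For the programmatically inclined, here is a minimal toy sketch of those two rounds (nothing from the actual class; the message, hop count, and error rate are all made up for illustration): the analog copy gets a little worse with every hop, while each digital hop checksums the message and keeps asking for a resend until it matches.

```python
import hashlib
import random

random.seed(42)

MESSAGE = "last night her teeth were in my dreams"
HOPS = 15  # roughly the size of the class


def analog_hop(text, error_rate=0.05):
    """Copy the message by ear: any character might get misheard."""
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    return "".join(
        random.choice(alphabet) if random.random() < error_rate else ch
        for ch in text
    )


def digital_hop(text, checksum):
    """Copy the message over the same noisy channel, but verify it and
    ask the previous person to repeat it until the checksum matches."""
    copy = analog_hop(text)
    while hashlib.sha256(copy.encode()).hexdigest() != checksum:
        copy = analog_hop(text)
    return copy


# Round one: analog, where errors compound with every hop.
analog = MESSAGE
for _ in range(HOPS):
    analog = analog_hop(analog)

# Round two: digital, where every hop re-requests until the message verifies.
checksum = hashlib.sha256(MESSAGE.encode()).hexdigest()
digital = MESSAGE
for _ in range(HOPS):
    digital = digital_hop(digital, checksum)

print("analog after 15 hops: ", analog)   # garbled
print("digital after 15 hops:", digital)  # identical to the original
```

Run it and the whispered version comes out mangled, while the checksummed version arrives character for character intact, which was the whole point of the exercise.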
I say all of this to say that we run the risk of an internet full of terrible AI-generated content that could, in turn, become the fodder for future AI models, effectively turning them into coherent-sounding idiots, while ruining the Web for the rest of us humans.
I experienced this firsthand a couple of weeks ago. While I was working, I was listening to The Hold Steady’s “Teeth Dreams” album (which I’ll write about in the future at greater length) and a song lyric caught my ear: “last night her teeth were in my dreams.” In my urge to self-distract, I asked the GPT-backed model I had open what that song’s lyrics meant.
It gave me some plausible perspective that dreaming about teeth was often connected to feelings of anxiety, but then it started to say something that was just… wrong. Like, factually wrong. It referenced a lyric that didn’t exist in the song or, as far as I know, any Hold Steady song. I clicked through to the source article — “10 Best Hold Steady Songs of All Time” — at a website called SingersRoom, which said the following:
“I Hope This Whole Thing Didn’t Frighten You” is a track from The Hold Steady’s 2014 album, “Teeth Dreams.” The song features the band’s signature sound of driving guitars and pounding drums, along with frontman Craig Finn’s distinctive vocal style. The lyrics tell the story of a character who is struggling to come to terms with a past mistake and the consequences that come with it. The chorus of the song features a refrain of “Let’s not get too fucked up tonight,” which serves as both a warning and a call to action. With its powerful instrumentation and introspective lyrics, “I Hope This Whole Thing Didn’t Frighten You” is a standout track on “Teeth Dreams” and a testament to The Hold Steady’s ability to create music that is both energetic and thought-provoking.
The whole top 10 list is written that way, with some sentences that are kind of correct, lots of statements that are so anodyne as to be meaningless (“The song features a driving guitar riff and pounding drums that create a sense of urgency”) and no shortage of information that’s just clearly not correct. Songs with nonexistent lyrics, songs said to be from the wrong albums; it just all feels machine-generated.
At first I thought that it was an AI hallucination, but digging deeper I realized that something was up.
The site contains hundreds of “top 10 songs by” different artists, all written by the same person: a writer purportedly named Darren Jamison. Darren is, by all accounts, an expert in every genre of music with deep knowledge of the catalogs of many hundreds of artists.
So who is this person? Does he even exist?
It would seem not. I was unable to find a journalist named “Darren Jamison” on LinkedIn or Twitter… but here’s where things get weird.
At times the article has shown up as having been written by one Edward Tomlin, who does have a profile with a photograph on Muck Rack (Darren only has a placeholder). But a reverse image search on Edward’s photograph turns up what looks like a stock photo. There’s apparently a Twitter account, @dvheadlines, but there’s no sign that a real person is behind it.
Reading through the lists, it’s abundantly clear that they were generated by AI or some other automated content-scripting setup. There are too many obvious errors and too much formula for any human hands to have touched this content. I’m not convinced that there’s any human-written content anywhere on this site, and I’m sure it’s far from the only one.
The problem isn’t that it’s obvious to you and me; it’s that it’s not obvious to the AI models consuming it. Which means that they’ll use the garbage to create even more garbagey garbage until it’s garbage all the way down.
Now, clearly one way to think about this is “who cares?” It’s listicle articles about bands, and most of the Web is crap anyway. But the problem is that AI models are training on this machine-generated content. Even now, in the early days of generative AI, this kind of garbage is polluting the models’ training data, and as more of the Web gets generated by the models themselves, what comes after is just going to be more inaccurate and cruddy.
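If you want a feel for how fast that loop flattens things out, here is a minimal sketch (a toy illustration, not how any real model is trained; the corpus and generation count are invented): a “model” that just re-samples the word frequencies it was trained on, fed its own output generation after generation, steadily loses vocabulary it can never get back.

```python
import random
from collections import Counter

random.seed(0)

# A stand-in "human-written" corpus, entirely made up for illustration.
corpus = ("the band plays driving guitars and pounding drums while craig finn "
          "sings about teeth dreams anxiety bars and basements").split()


def train(words):
    """'Train' a toy unigram model: just count word frequencies."""
    return Counter(words)


def generate(model, n_words):
    """Sample a new 'corpus' from the model's word frequencies."""
    vocab = list(model)
    weights = [model[w] for w in vocab]
    return random.choices(vocab, weights=weights, k=n_words)


# Feed each generation's output back in as the next generation's training data.
data = corpus
for generation in range(10):
    model = train(data)
    data = generate(model, n_words=len(corpus))
    print(f"generation {generation}: {len(set(data))} distinct words left")
```

Words that happen to get skipped in one generation can never come back, so every pass is a little flatter and more repetitive than the last. That’s the toy version of garbage all the way down.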
What’s to be done about it? Can’t say as I’m sure. Even on legitimate sites that we use every day, lots of content is already created purely to hack search engine algorithms and drive advertising or affiliate traffic, and plenty of those same sites clutter themselves with weird clickbait on top of that.
I think the only thing we can do is continue to write and make stuff as actual human people and try to ensure that there’s actual, real content on the internet. Use AI when it makes sense, to automate boring stuff or to ideate, but don’t use it to do your creative work for you. Even as we enter a likely age of AI, we need humans more than ever, and if we don’t train the AI models, then who will?