For over a decade, platforms like Amazon Mechanical Turk (MTurk) have supported scientific research and machine learning projects. These platforms rely on human workers to summarize, annotate, or rate data. Researchers and companies have trusted that these responses are written by real people.
In 2023, researchers at EPFL discovered something that changes that assumption. They ran a simple task, asking MTurk workers to summarize medical research abstracts. These abstracts, taken from the New England Journal of Medicine, covered topics such as breast cancer, cardiovascular disease, nutrition, and vaccination.
Writing these summaries required time and understanding. Yet many workers didn’t write them at all. Instead, they pasted outputs from ChatGPT: keystroke logs showed large blocks of text appearing in the answer box without any typing.
Out of the 46 collected summaries, only five were typed entirely by hand; the rest involved pasting, often without further editing. Pasting alone does not prove AI use, but by combining this behavioral evidence with a text classifier, the researchers estimated that between 33% and 46% of all submissions were likely generated by an LLM.
When AI Starts Learning from Itself
This discovery isn’t just about workers taking shortcuts. It reveals a deeper issue in how modern AI systems are trained.
Large Language Models (LLMs) like GPT-4 or Claude improve through feedback. That feedback often comes from humans on platforms like MTurk. The idea is simple. Humans help align AI with real-world expectations by giving it examples, corrections, or ratings.
However, if workers are secretly using AI tools like ChatGPT to complete those tasks, then AI is not learning from humans. It is learning from other AIs pretending to be human.
This creates a feedback loop. An LLM generates an output. That output is then reused by workers in training data. A newer LLM then trains on this data, unknowingly learning from itself.
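The paper does not model this loop quantitatively, but a toy simulation can make the mechanism concrete. The sketch below is purely illustrative and not from the study: the "model" is just a word-frequency table, and each generation is refit only on text sampled from the previous one, the extreme case where all supposedly human data is actually model output. Rare human word choices drop out and never return.

```python
# Toy illustration (not from the paper) of a model retrained on its own output.
import random

random.seed(0)

VOCAB = list(range(100))                          # stand-in for a vocabulary of 100 words
zipf = [1 / (rank + 1) for rank in VOCAB]
human_probs = [w / sum(zipf) for w in zipf]       # Zipf-like "human" word frequencies

probs = human_probs
for gen in range(1, 6):
    # "Training data" for this generation is sampled entirely from the previous model.
    sample = random.choices(VOCAB, weights=probs, k=500)
    counts = [sample.count(w) for w in VOCAB]
    probs = [c / len(sample) for c in counts]     # refit the next generation on it
    print(f"gen {gen}: distinct words remaining = {sum(c > 0 for c in counts)}")
```

Each generation keeps fewer distinct words than the last: the distribution narrows toward its own most common outputs rather than the original human one.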

To detect this loop, the researchers trained a custom classifier to distinguish real human summaries from AI-written ones. The model was tested in two ways: on the same topics it was trained on and on completely new ones.
In both cases, it performed impressively, reaching 99% accuracy on topics seen during training and 97% on entirely new, unseen topics. This high performance suggested that ChatGPT leaves subtle, consistent patterns in its writing.
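The paper does not spell out its classifier here, so the snippet below is only a minimal sketch of the general recipe, not the authors' actual architecture. The file names, column names, and the choice of held-out topic are assumptions made for illustration.

```python
# Minimal sketch of a human-vs-ChatGPT text classifier; NOT the authors' actual model.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

human = pd.read_csv("human_summaries.csv").assign(label=0)        # columns: text, topic (assumed)
synthetic = pd.read_csv("chatgpt_summaries.csv").assign(label=1)
data = pd.concat([human, synthetic], ignore_index=True)

# Mirror the two evaluations in the study: topics seen in training vs. entirely new topics.
seen = data[data["topic"] != "vaccination"]
unseen = data[data["topic"] == "vaccination"]
train, test_in = train_test_split(seen, test_size=0.2, random_state=0, stratify=seen["label"])

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # word and bigram frequency features
    LogisticRegression(max_iter=1000),
)
clf.fit(train["text"], train["label"])

print("in-domain accuracy:    ", accuracy_score(test_in["label"], clf.predict(test_in["text"])))
print("out-of-domain accuracy:", accuracy_score(unseen["label"], clf.predict(unseen["text"])))
```

Holding out an entire topic, rather than random rows, is what makes the second number meaningful: it tests whether the detector has learned ChatGPT's general style rather than topic-specific vocabulary.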
Can We Still Trust Human-in-the-Loop Data?
AI alignment depends heavily on trusted, human-authored feedback. Researchers collect these responses to adjust models and ensure they reflect human values and reasoning. This feedback is considered the gold standard.
However, what happens when that gold standard is synthetic?
The EPFL study shows that AI-generated summaries were often accepted as real. In many cases, workers appeared to feed the exact task instructions into ChatGPT and paste the result unchanged, which reinforced the researchers’ suspicion that many responses were AI-written.
These fake human answers are dangerous because they hide in plain sight. They don’t just reduce quality. They actively mislead training systems.
The team also found that many AI-generated summaries had very little in common with the original medical abstracts. Some workers reused a few phrases, but others barely included any overlapping content. This suggested they weren’t rewriting human content. They were pasting entirely new, AI-written summaries.
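How little is "very little in common"? One simple way to quantify it is lexical overlap between each submitted summary and its source abstract. The sketch below uses word-bigram overlap with an arbitrary cutoff; both the metric and the 0.05 threshold are illustrative assumptions, not the measure the authors report.

```python
# Illustrative lexical-overlap check between a submitted summary and its source abstract.
# The bigram metric and the 0.05 cutoff are assumptions, not the paper's exact measure.
import re

def bigrams(text: str) -> set[tuple[str, str]]:
    words = re.findall(r"[a-z']+", text.lower())
    return set(zip(words, words[1:]))

def overlap_ratio(summary: str, abstract: str) -> float:
    s = bigrams(summary)
    # Share of the summary's bigrams that also appear in the abstract.
    return len(s & bigrams(abstract)) / max(len(s), 1)

def low_overlap(summary: str, abstract: str, threshold: float = 0.05) -> bool:
    # Very low overlap suggests freshly generated text rather than a condensed
    # rewrite of the abstract's own wording.
    return overlap_ratio(summary, abstract) < threshold
```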
A Canary in the Coal Mine
The researchers described their findings as a canary in the coal mine: a warning sign of deeper problems that may arise as LLM use spreads among crowdworkers.
Their experiment focused on text summarization, a task that is especially vulnerable to AI outsourcing. It is time-consuming for humans but easily handled by LLMs. However, the problem is not limited to summarization.
Any crowdsourced task that can be converted into a text prompt is at risk. This includes tasks like content rewriting, product reviews, short stories, and even survey responses.
As LLMs grow more capable, especially in handling images, video, and sound, this issue could spread to new areas. Tasks such as image captioning, audio transcription, and video description could also be affected.
If synthetic data continues to enter training pipelines without being flagged, future AI models could drift further from actual human values.
Inside the GPTurk Study: How Researchers Caught It
To catch the behavior, and to limit its impact, the researchers combined detection techniques with a proposal for rethinking crowd work.
First, they showed that specialized detection models can outperform generic tools like GPTZero or OpenAI’s basic classifiers. Their own model, trained on real and AI-written examples, detected synthetic summaries with nearly perfect accuracy.
Second, they logged keystrokes to see how workers interacted with the task, looking for signs such as pasting behavior, typing speed, and editing activity. These clues helped them distinguish genuine effort from AI-assisted shortcuts.
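The study's keystroke capture was built into the task page itself; the event schema below (timestamped typing and paste events) and the flagging thresholds are hypothetical, meant only to show how such logs can be turned into behavioral features.

```python
# Sketch of deriving behavioral features from a keystroke log.
# The Event schema and the flagging thresholds are hypothetical, not the study's.
from dataclasses import dataclass

@dataclass
class Event:
    t: float      # seconds since the task started
    kind: str     # "key" for typed input, "paste" for clipboard input
    chars: int    # characters added by this event

def features(events: list[Event]) -> dict:
    typed = sum(e.chars for e in events if e.kind == "key")
    pasted = sum(e.chars for e in events if e.kind == "paste")
    total = max(typed + pasted, 1)
    duration = max((e.t for e in events), default=1.0)
    return {
        "pasted_fraction": pasted / total,                    # share of text that arrived via paste
        "paste_events": sum(1 for e in events if e.kind == "paste"),
        "chars_per_second": total / max(duration, 1.0),       # implausibly fast production is a red flag
    }

def likely_llm_assisted(f: dict) -> bool:
    # Crude rule of thumb: almost everything pasted in one or two large blocks.
    return f["pasted_fraction"] > 0.9 and f["paste_events"] <= 2
```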
Third, they suggested changing how we use crowdworkers. Rather than expecting them to write content from scratch, they could be tasked with evaluating, reviewing, or filtering AI outputs. This could help preserve human insight while reducing the risks posed by synthetic data.
Why This Feedback Loop Could Break AI Alignment
This study raises a critical question. When you interact with an AI system, whose thinking are you really seeing? Is it the result of careful human guidance? Or is it the echo of earlier machine generations disguised as people?
As more AIs are trained using feedback that may have originated from other AIs, it becomes harder to track the human signal. This blurs the boundary between human knowledge and synthetic approximation.
The GPTurk findings show that this concern is not theoretical: the loop is already forming.
To build trustworthy AI, we must pay closer attention to the invisible humans and synthetic agents shaping their behavior.
If we are not careful, the AI you rely on tomorrow may reflect not your values, but a machine’s memory of itself.
Reference: “Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks” by Veniamin Veselovsky, Manoel Horta Ribeiro, and Robert West, 13 June 2023, arXiv preprint.
DOI: 10.48550/arXiv.2306.07899
TL;DR
A new study found crowdworkers using ChatGPT to do tasks meant for training AI, creating a feedback loop where AI learns from itself. This challenges the trustworthiness of human feedback and raises concerns about future AI alignment.