The Ghost in the Machine Was a Student Named Alex

The First Anomaly

It was the third iteration of the Binary Search Tree implementation in CS 301: Data Structures and Algorithms at Carlton University. Professor Anya Sharma, a veteran of fifteen years, was reviewing the auto-grader output. The assignment was straightforward: implement insertion, deletion, and an inorder traversal. The usual spread of results was there—the flawless, the buggy, the creatively over-engineered.

Then she saw it. Two submissions, from students "Alex Chen" and "Jamie Rivera," had failed the same three edge-case tests. Not unusual in itself. But the nature of the failure was identical. Not just the same test cases, but the same incorrect output, down to the formatting of the error message their code printed.

// From Alex Chen's submission, line 47:
if (node.left == null && node.right == null) {
    return node; // Incorrectly returns node instead of null
}

// From Jamie Rivera's submission, line 52:
if (current.left == null && current.right == null) {
    return current; // Same logical error, different variable name
}
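For contrast, here is what the correct leaf case looks like. This is a minimal sketch with a hypothetical `Node` class and `delete` method, not either student's actual submission: when a leaf is deleted, the recursive call must return null so the parent's pointer is cleared, rather than returning the node itself.

```java
// Hypothetical minimal BST node, for illustration only.
class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

class BstDelete {
    // Delete `key` from the subtree rooted at `node`; returns the new subtree root.
    static Node delete(Node node, int key) {
        if (node == null) return null;
        if (key < node.key) node.left = delete(node.left, key);
        else if (key > node.key) node.right = delete(node.right, key);
        else {
            // Leaf case: return null so the parent's link is severed.
            // The buggy submissions returned `node` here, leaving it in the tree.
            if (node.left == null && node.right == null) return null;
            if (node.left == null) return node.right;
            if (node.right == null) return node.left;
            // Two children: copy the inorder successor's key, then delete it.
            Node succ = node.right;
            while (succ.left != null) succ = succ.left;
            node.key = succ.key;
            node.right = delete(node.right, succ.key);
        }
        return node;
    }
}
```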

Anya ran the standard departmental tool, a venerable MOSS server, on the pair. It returned a 12% similarity score, well below the 70% threshold that flags a pair for manual review. Different variable names, slightly altered control flow, distinct commenting styles: MOSS saw two unique solutions. Anya's intuition saw a ghost.

"MOSS looks for copied text. It's fantastic for the 2005 problem of students emailing .c files to each other. It's blind to the 2024 problem of students using a shared, non-human template." - Prof. Anya Sharma

The Pattern Emerges

She expanded her search, manually comparing the "fingerprint" of the bug. She found it again. And again. A third student, "Samir Gupta," had the same elegant, logical misstep buried in their deletion method. Three students, who according to the seating chart and TA sections had no obvious connection, had produced the same wrong answer in the same structurally specific way.

Anya called her head TA, Mark, a PhD candidate in software engineering. "Run the full assignment set through MOSS with a lower threshold, 30%. Look for clusters."

The results were a mess of false positives—common textbook solutions and legitimate code overlap. The signal was drowned in noise. The bug, however, was a sharper tool. By writing a simple script to search for its syntactic pattern, they identified 11 submissions out of 187 that contained it.
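The article does not reproduce Mark's script, but a search for the bug's syntactic pattern can be sketched with a regular expression that is blind to the variable name: capture the identifier in the leaf-node check and require the same identifier, rather than `null`, in the return statement. Everything below (class and method names included) is a hypothetical reconstruction.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical reconstruction of a bug-fingerprint scanner: flag any file where
// `if (x.left == null && x.right == null) { return x;` appears, for any identifier x.
public class BugFingerprint {
    // \1 back-references the captured identifier, so `node`, `current`, etc. all match.
    static final Pattern LEAF_BUG = Pattern.compile(
        "if\\s*\\(\\s*(\\w+)\\.left\\s*==\\s*null\\s*&&\\s*"
        + "\\1\\.right\\s*==\\s*null\\s*\\)\\s*\\{\\s*return\\s+\\1\\s*;");

    static boolean hasBug(String source) {
        return LEAF_BUG.matcher(source).find();
    }

    public static void main(String[] args) throws IOException {
        // Print every .java file under the given submissions directory that carries the bug.
        try (var paths = Files.walk(Path.of(args[0]))) {
            paths.filter(p -> p.toString().endsWith(".java"))
                 .filter(p -> {
                     try { return hasBug(Files.readString(p)); }
                     catch (IOException e) { return false; }
                 })
                 .forEach(System.out::println);
        }
    }
}
```

Because `\s*` also matches newlines, the pattern tolerates the formatting differences that defeated token-level comparison.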

"It's not plagiarism in the classical sense," Mark said, staring at the list. "They didn't copy each other's code. It's like they all copied from a source that itself had a very specific misunderstanding of node deletion."

The source wasn't a person. It was a model.

The Architecture of a Cheat

Anya and Mark devised a hypothesis. One student—or a small group—had used an LLM like ChatGPT or GitHub Copilot to generate a solution. That solution contained a subtle, non-obvious bug. This "seed solution" was then distributed. The recipients weren't copying the code verbatim. They were using the same AI tool, feeding it the same or a similar prompt, and receiving functionally identical logic, which they then manually "transcribed" into their own coding style. Or, they were lightly paraphrasing the seed solution itself.

The result was a distributed plagiarism ring with a single, non-human author. Traditional similarity checkers failed because the surface-level text differed. The conceptual fingerprint, however—the architecture of the solution, including its flaws—was a perfect match.

They needed to prove it. They needed to move from hunting for similar *code* to hunting for similar *logic and structure*.

Anya secured a small departmental grant to trial a more modern analysis platform. She uploaded the 11 suspect files and 20 known-clean samples for a baseline. The system, which included Codequiry's newer analysis engines, didn't just compare raw tokens. It built abstract syntax trees (ASTs), analyzed control flow graphs, and normalized logic.

The report it generated was damning. It clustered the 11 submissions into a single, tight group with a 94% structural similarity score. The report highlighted not just the bug, but the identical traversal order, the same unnecessary helper function pattern, and a peculiar preference for ternary operators in situations where if-else blocks were clearer.

// A normalized logic pattern found in 9 of the 11 submissions
return (leftHeight > rightHeight) ? leftHeight + 1 : rightHeight + 1;
// Versus the more common student pattern:
if (leftHeight > rightHeight) {
    return leftHeight + 1;
} else {
    return rightHeight + 1;
}

The AI had a style. And the students had adopted it.
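The core idea behind identifier-blind comparison can be shown in miniature. The toy normalizer below (not the commercial engine's actual algorithm, and far cruder than a real AST comparison) replaces every non-keyword identifier with a canonical `ID` token, so Alex's `node` and Jamie's `current` produce the same normalized stream:

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration of structural normalization: identifiers and numeric
// literals collapse to placeholder tokens, leaving only the code's "shape".
public class StructuralHash {
    // Small keyword list for the sketch; a real tool would use the full grammar.
    static final Set<String> KEYWORDS = Set.of(
        "if", "else", "return", "null", "while", "for", "int", "void");

    static String normalize(String code) {
        // Tokenize into words, numbers, and single punctuation characters.
        Matcher m = Pattern.compile("[A-Za-z_]\\w*|\\d+|\\S").matcher(code);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String tok = m.group();
            if (tok.matches("\\d+")) out.append("NUM ");
            else if (tok.matches("[A-Za-z_]\\w*") && !KEYWORDS.contains(tok))
                out.append("ID ");
            else out.append(tok).append(' ');
        }
        return out.toString();
    }
}
```

Under this normalization, the two "different" snippets from Alex and Jamie become byte-for-byte identical, which is roughly the signal a structural engine clusters on.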

The Confrontation

Anya brought in Alex Chen first. She showed him his code and Jamie's, side by side, with the structural analysis overlay. She pointed to the AST comparison, which showed the two codebases as nearly identical skeletons.

"We didn't copy," Alex insisted, his initial defiance fading as he scanned the technical report. "We just... used the same resources to study."

"What resources, Alex?"

A long silence. "A Discord server. Someone posted a 'model answer' for the BST problem. They said it was from a tutor. We all used it to check our work."

"And this model answer had the bug? The one where the leaf node isn't properly nullified?"

Alex looked at the floor. "I guess. I thought it was right. It looked so clean."

The story unfolded. A student in a previous semester had built a private Discord bot. For a small Venmo payment, you could DM the bot an assignment prompt. It would return a complete, commented solution. The bot was a wrapper for the GPT-4 API. The "model answer" was AI-generated, every time. The bug was a consistent hallucination of that particular model version when given that specific prompt.

The 11 students had purchased the service. Some copied the code directly and changed variable names. Others used it as a guide, effectively re-implementing the AI's flawed logic. To MOSS, they were innocent. To an analysis that understood *how* code thinks, not just *what* it says, they were a textbook case of modern collusion.

The Aftermath and the New Policy

Carlton's academic integrity board suspended the 11 students. The student running the Discord bot was expelled. The story, kept internal to the department, sent shockwaves through the faculty lounge.

The old policy—"Don't copy code"—was obsolete. The new threat was conceptual contamination. Students could cheat without ever seeing another student's file, by sharing a prompt or outsourcing their thinking to the same black box.

Carlton's CS department made three major changes:

  1. Tooling Shift: They supplemented MOSS with a detection system capable of structural fingerprinting. The goal wasn't to find matching strings, but matching *decision trees* within the code.
  2. Assignment Design: Anya now designs problems with "logic traps." She asks for implementations that are slightly off-textbook, requiring a nuanced understanding no boilerplate AI response can provide. She incorporates recent lecture-specific examples into the requirements.
  3. Transparent Education: On day one, she now shows students a side-by-side: a MOSS report of two AI-derived solutions (low similarity) and an AST/control-flow graph report (near-perfect match). "This is what we see now," she tells them. "You are not cheating a person. You are cheating a pattern recognition system that understands the machine's fingerprints better than you do."

The incident cost a semester of turmoil, but it provided a blueprint. The cheat of the future isn't a copied file. It's a shared cognitive shortcut. Detecting it requires looking past the words and into the architecture of the solution itself. As Anya puts it, "We stopped looking for plagiarized sentences and started looking for plagiarized thought."

The ghost in the machine, it turns out, leaves very clear footprints. You just need to know where, and how, to look.