The Assignment That Broke Every Plagiarism Checker

The Perfect Assignment

Professor Elena Vance of Carnegie Mellon's School of Computer Science prided herself on her assignment design. Her Spring 2024 Data Structures course featured what she called "the binary tree bomb"—a deceptively simple problem that required implementing a self-balancing AVL tree with a custom iterator. The specification was precise: 12 specific methods, exact signatures, comprehensive JUnit tests provided. She'd used variations for three years with minimal cheating.

"The tests were the gatekeeper," she told me in her office, surrounded by whiteboards covered in algorithms. "If your code passed all 47 test cases, you understood the mechanics. If it failed even one, you'd missed something fundamental about rotation or height balancing."

She ran submissions through the standard stack: MOSS for initial similarity screening, then Codequiry for deeper structural analysis. The process caught the usual offenders—students who copied from GitHub repos, shared solutions via Discord, or paraphrased Stack Overflow snippets. Then came Submission #307.

The Ghost in the Machine

TA Mark Chen noticed it first during manual review. "The code looked wrong," he said. "Not wrong as in incorrect—it passed all tests—but wrong as in alien. The variable names were bizarre but consistent. The structure was… rearranged."

He pulled up the suspect submission alongside a known correct solution from a previous semester. Functionally identical. Logically equivalent. But to MOSS and every other similarity detector, they appeared completely different.

"The tools showed 8% similarity. My eyes saw 95%. That's when I realized we were dealing with something new."

Here's a simplified example. A normal AVL tree insertion method might look like this:

public Node insert(Node node, int key) {
    if (node == null) return new Node(key);
    
    if (key < node.key) node.left = insert(node.left, key);
    else if (key > node.key) node.right = insert(node.right, key);
    else return node; // Duplicate keys not allowed
    
    node.height = 1 + Math.max(height(node.left), height(node.right));
    int balance = getBalance(node);
    
    // Left Left Case
    if (balance > 1 && key < node.left.key) return rightRotate(node);
    // Right Right Case
    if (balance < -1 && key > node.right.key) return leftRotate(node);
    // Left Right Case
    if (balance > 1 && key > node.left.key) {
        node.left = leftRotate(node.left);
        return rightRotate(node);
    }
    // Right Left Case
    if (balance < -1 && key < node.right.key) {
        node.right = rightRotate(node.right);
        return leftRotate(node);
    }
    return node;
}

Submission #307 contained this equivalent logic:

public Node place(Node current, int value) {
    if (current == null) return new Node(value);
    
    int comparison = Integer.compare(value, current.value);
    Node alteredChild = null;
    
    if (comparison < 0) alteredChild = place(current.lesser, value);
    else if (comparison > 0) alteredChild = place(current.greater, value);
    else return current;
    
    if (comparison < 0) current.lesser = alteredChild;
    else if (comparison > 0) current.greater = alteredChild;
    
    current.tallness = 1 + maximum(tallness(current.lesser), tallness(current.greater));
    int equilibrium = equilibriumFactor(current);
    
    if (equilibrium > 1 && Integer.compare(value, current.lesser.value) < 0) 
        return rotateClockwise(current);
    if (equilibrium < -1 && Integer.compare(value, current.greater.value) > 0) 
        return rotateCounter(current);
    if (equilibrium > 1 && Integer.compare(value, current.lesser.value) > 0) {
        current.lesser = rotateCounter(current.lesser);
        return rotateClockwise(current);
    }
    if (equilibrium < -1 && Integer.compare(value, current.greater.value) < 0) {
        current.greater = rotateClockwise(current.greater);
        return rotateCounter(current);
    }
    return current;
}

"This wasn't simple find-and-replace," Vance explained. "This was systematic transformation following rules a human wouldn't naturally choose. The algorithm was preserved perfectly, but every surface feature was altered. Control structures rewritten. Method extraction patterns changed. Even the error conditions were expressed differently."

The Investigation

Vance's department launched what became known internally as "Operation Binary." They discovered the technique wasn't isolated. Across 15 submissions in three different courses, they found similarly transformed code. The common thread: all students had accessed a private Discord server called "Code Alchemy."

The server offered a service: upload any working Java, Python, or C++ solution, and their "obfuscation engine" would return functionally identical code with near-zero similarity scores against the original. For $50 per assignment, students received what the server advertised as "plagiarism-proof code."

The engine used a multi-layer approach:

  1. Semantic-preserving transformations: Loop unrolling, function inlining, dead code insertion that didn't affect output
  2. Systematic renaming: Not random, but following thematic dictionaries (astronomy terms, Greek letters, fantasy characters)
  3. Control flow restructuring: Converting while loops to for loops with complex conditions, swapping if-else chains for switch statements
  4. Comment and formatting obfuscation: Strategic whitespace, misleading comments, inconsistent formatting
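The renaming layer (item 2) can be sketched in a few lines. This toy pass is purely illustrative: the dictionary entries, the regexes, and the i++ rewrite are assumptions about how such an engine might work, and a real tool would transform the parse tree rather than raw text to avoid mangling strings and comments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: a toy "systematic renaming" pass in the
// spirit of layer 2. The dictionary and rules here are hypothetical;
// the actual service's transformation rules were never published.
public class ToyRenamer {
    public static String transform(String source) {
        // Thematic renaming dictionary (invented for illustration).
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("node", "current");
        dictionary.put("key", "value");
        dictionary.put("left", "lesser");
        dictionary.put("right", "greater");
        dictionary.put("balance", "equilibrium");

        String out = source;
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            // \b keeps whole identifiers intact (won't touch "keyboard").
            out = out.replaceAll("\\b" + e.getKey() + "\\b", e.getValue());
        }
        // One surface tweak in the reported style: i++ becomes i += 1.
        out = out.replaceAll("\\b(\\w+)\\+\\+", "$1 += 1");
        return out;
    }

    public static void main(String[] args) {
        String original = "if (key < node.key) { balance++; }";
        System.out.println(transform(original));
    }
}
```

Even this naive version illustrates why token-based detectors struggle: every identifier the tokenizer sees has changed, while the program's behavior has not.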

"What made it dangerous," said Dr. Arjun Patel, who led the technical analysis, "was that it wasn't just obfuscation. Obfuscated code looks suspicious. This looked like competent, if slightly odd, student work. The transformations were just within the bounds of what a creative but struggling student might produce."

The Tools That Failed

The department tested their entire detection arsenal against known pairs of original and transformed code:

| Tool | Reported Similarity | Why It Failed |
| --- | --- | --- |
| MOSS | 7-12% | Token-based matching defeated by systematic renaming and restructuring |
| JPlag | 9-15% | AST comparison confused by control flow alterations |
| Simple fingerprinting | 3-8% | Hash-based approaches shattered by surface changes |
| Manual review (initial) | Undetected | TAs assumed "eccentric style," not plagiarism |

Even Codequiry's initial configuration, set to standard academic sensitivity, flagged only 14% similarity. "We were looking for copying, not translation," Vance admitted. "The assumption was that students would copy with minor modifications. This was something else entirely—automated, systematic code translation designed specifically to defeat our tools."

The Breakthrough

The turning point came when Patel's team stopped looking for similarity and started looking for unnatural consistency. "The transformations followed patterns too perfect for human work," he said. "When we analyzed 20 'obfuscated' submissions, we found the same transformation rules applied across different codebases."

For example, every transformed piece:

  • Always converted i++ to i += 1 in for loops
  • Never used ternary operators, even when natural
  • Consistently placed opening braces on new lines in Java (unlike most student code)
  • Used the same unusual variable name patterns across unrelated submissions

They developed a new detection heuristic: transformation fingerprinting. Instead of comparing code to code, they compared transformation patterns to known obfuscation signatures. When they ran this analysis against the Spring 2024 submissions, they found 47 students across four courses had used the service.
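A scorer in that spirit might check a single submission against engine signatures rather than against other submissions. The features, regexes, and threshold below are invented for illustration; the detector Patel's team actually built was not published.

```java
// Minimal sketch of "transformation fingerprinting": score one submission
// against tell-tale signatures of an obfuscation engine. The feature list
// and length threshold are assumptions made for this example.
public class FingerprintScorer {
    public static int score(String source) {
        int hits = 0;
        // Signature 1: "i += 1" in for-loop headers where i++ is idiomatic.
        if (source.matches("(?s).*for\\s*\\([^)]*\\+=\\s*1\\s*\\).*")) hits++;
        // Signature 2: total absence of ternaries in nontrivial code
        // (here "nontrivial" is an invented 100-character cutoff).
        if (!source.contains("?") && source.length() > 100) hits++;
        // Signature 3: opening braces on their own line, rare in student Java.
        if (source.matches("(?s).*\\)\\s*\\n\\s*\\{.*")) hits++;
        return hits; // higher = more engine-like; threshold would be tuned
    }

    public static void main(String[] args) {
        String suspicious =
            "public int sum(int n)\n{\n" +
            "    int total = 0;\n" +
            "    for (int i = 0; i < n; i += 1)\n    {\n" +
            "        total += i;\n    }\n" +
            "    return total;\n}\n";
        System.out.println(score(suspicious));
    }
}
```

The key design shift is that the comparison target is a signature set, not another codebase, so the technique works even when no original exists in the corpus.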

The Aftermath

CMU's academic integrity committee faced a dilemma. The students had clearly violated the honor code, but the method was novel enough that existing policies didn't explicitly cover automated code transformation services. In the end, 39 students accepted reduced grades and mandatory integrity seminars. Eight contested the charges, arguing they'd merely used "a coding style helper."

The university made three immediate changes:

  1. Assignment redesign: Vance now requires students to explain their design choices in inline comments. "If you can't explain why you used a HashMap instead of a TreeMap, you didn't write the code."
  2. Tool recalibration: The department worked with Codequiry to implement transformation-aware detection, lowering thresholds for certain language patterns.
  3. Process changes: All high-similarity and low-similarity outliers now trigger manual review. "We look for code that's too different as carefully as code that's too similar," said Chen.
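The two-tailed trigger in item 3 might look something like this sketch, which flags any score more than two standard deviations from the class mean in either direction. The threshold is illustrative, not CMU's actual policy.

```java
import java.util.List;

// Sketch of the "too different" review trigger: flag submissions whose
// similarity score is an outlier on EITHER tail of the class distribution.
// The 2-sigma cutoff is an assumption made for this example.
public class OutlierFlagger {
    public static boolean needsReview(double score, List<Double> classScores) {
        double mean = classScores.stream()
            .mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = classScores.stream()
            .mapToDouble(s -> (s - mean) * (s - mean)).average().orElse(0);
        double std = Math.sqrt(variance);
        // Suspiciously similar AND suspiciously dissimilar both qualify.
        return std > 0 && Math.abs(score - mean) > 2 * std;
    }

    public static void main(String[] args) {
        List<Double> scores = List.of(30.0, 35.0, 32.0, 28.0, 33.0, 31.0, 8.0);
        System.out.println(needsReview(8.0, scores));  // far below the mean
        System.out.println(needsReview(31.0, scores)); // typical score
    }
}
```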

The New Arms Race

What happened at CMU wasn't isolated. In the six months since, Stanford, MIT, and the University of Washington have reported similar incidents. The "Code Alchemy" Discord server was shut down, but three others have appeared. The price has dropped to $20 per assignment.

"This isn't cheating in the traditional sense. It's code laundering. And it forces us to rethink what originality means in programming education."

Vance now begins each semester with a frank discussion about these services. She shows students side-by-side comparisons of original and "laundered" code. "I tell them: We will catch you. Not because our tools are perfect, but because we understand what the tools miss."

The incident revealed a fundamental truth: plagiarism detection can't just find copied code. It must identify derived code—code that preserves logic while disguising origin. This requires combining structural analysis with behavioral patterns, something human reviewers spotted long before algorithms did.

As Patel put it: "The binary tree assignment didn't break our plagiarism checkers. It showed us they were looking for the wrong thing. Students aren't just copying code anymore. They're having it professionally laundered. And we need detectors that understand the difference between creative variation and systematic obfuscation."

CMU's experience serves as a warning to every computer science department. The old models of plagiarism detection—finding similar strings, tokens, or structures—are becoming obsolete. The new challenge isn't detecting copying. It's detecting derivation. And that requires tools and humans working in ways we're just beginning to understand.