It starts with a gut feeling. You’re grading the third implementation of a hash table for CS 201, and something feels off. The code works. It passes all the unit tests. But the style is… alien. Variable names like a1, tmpVar7, and dataHolderX. Control flow that uses nested ternary operators for no apparent reason. Logic that’s been refactored into a series of unnecessary private methods. You run it through your department’s standard plagiarism checker—maybe MOSS or JPlag—and it comes back clean. No significant match with any other submission or with the obvious GitHub repositories.
You dismiss it as a student with poor coding hygiene or a peculiar learning style. You’re wrong.
What you’ve just witnessed is not bad code. It’s deliberately obfuscated code, engineered specifically to defeat the lexical and token-based similarity detectors that have been the bedrock of academic integrity in computer science for two decades. The student didn’t copy the code. They transformed it. They took a canonical solution—from a peer, from a previous year’s submission leaked on a forum, from a popular GitHub repo—and put it through a series of semantic-preserving mutations designed to break fingerprinting algorithms. The assignment’s functional requirements are met, but the code’s syntactic signature is utterly altered.
“We caught a student who had written a Python script that automatically refactored stolen code—renaming variables, swapping loops, adding redundant operations. It was a plagiarism tool designed to beat our plagiarism tool.” – Dr. Anya Sharma, Associate Professor of Computer Science, University of Toronto.
This is the new frontier of academic dishonesty. It’s no longer about copy-pasting from Stack Overflow. It’s about algorithmic cheating, where the act of plagiarism itself is automated and optimized against the very systems built to catch it. For CS educators, it creates a chilling paradox: the more sophisticated your detection tools become, the more sophisticated the evasion techniques grow. We’re no longer in an era of simple cheating; we’re in an era of cheating-as-a-service and adversarial code generation.
How Traditional Checkers Work (And Why They Fail)
To understand the evasion, you must first understand the hunt. Most academic code plagiarism detectors, like the venerable MOSS (Measure of Software Similarity) or JPlag, operate on a similar principle: reduce code to a comparable set of fingerprints.
MOSS, for instance, primarily uses a winnowing algorithm over k-grams (contiguous sequences of tokens). It strips comments and whitespace, normalizes identifiers (turning all variable names into a placeholder like ‘V’), and then hashes overlapping sequences of tokens. These hashes form a “fingerprint” of the document. If two submissions share enough matching fingerprints, they are flagged as similar.
Here’s a simplified view of the process for a small function:
// Original Code
public int calculateSum(int[] numbers) {
    int total = 0;
    for (int i = 0; i < numbers.length; i++) {
        total += numbers[i];
    }
    return total;
}
After tokenization and identifier normalization, the core “skeleton” MOSS sees might be:
public int V ( int [ ] V ) {
    int V = 0 ;
    for ( int V = 0 ; V < V . length ; V ++ ) {
        V += V [ V ] ;
    }
    return V ;
}
The fingerprint is derived from this normalized token stream. This approach is brilliant for catching direct copying, renaming of variables, or minor comment alterations. But its weakness is fundamental: it is syntactically naive. It understands sequence, not structure. It sees a flat stream of tokens, not the abstract syntax tree (AST). This makes it vulnerable to any transformation that alters the token stream while preserving runtime behavior.
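The normalize-then-fingerprint idea can be illustrated in miniature. The following is a toy sketch, not MOSS's actual algorithm: it collapses identifier-like tokens to "V", hashes overlapping k-grams, and keeps the minimum hash in each sliding window (the winnowing step). The class and method names (Winnow, fingerprints) are invented for illustration, and the tokenizer is deliberately naive.

```java
import java.util.*;

// Toy sketch of token normalization + k-gram winnowing.
// NOT MOSS's real implementation: tokenization and identifier
// detection here are deliberately naive.
public class Winnow {

    // Replace every identifier-like token with "V"; keep keywords and punctuation.
    static List<String> normalize(String code) {
        Set<String> keywords = Set.of("public", "int", "for", "while", "return", "if", "else");
        List<String> out = new ArrayList<>();
        for (String tok : code.split("(?<=\\W)|(?=\\W)")) {
            tok = tok.trim();
            if (tok.isEmpty()) continue;
            out.add(tok.matches("[A-Za-z_]\\w*") && !keywords.contains(tok) ? "V" : tok);
        }
        return out;
    }

    // Hash every k-gram of the normalized stream, then keep the minimum hash
    // in each sliding window of w hashes -- the "winnowing" thinning step.
    static Set<Integer> fingerprints(String code, int k, int w) {
        List<String> toks = normalize(code);
        List<Integer> hashes = new ArrayList<>();
        for (int i = 0; i + k <= toks.size(); i++)
            hashes.add(String.join(" ", toks.subList(i, i + k)).hashCode());
        Set<Integer> picked = new HashSet<>();
        for (int i = 0; i + w <= hashes.size(); i++)
            picked.add(Collections.min(hashes.subList(i, i + w)));
        return picked;
    }

    public static void main(String[] args) {
        String a = "int total = 0; for (int i = 0; i < n; i++) total += x[i]; return total;";
        String b = "int sum = 0; for (int j = 0; j < n; j++) sum += arr[j]; return sum;";
        // Pure renaming leaves the normalized token stream identical,
        // so the fingerprint sets match exactly.
        System.out.println(fingerprints(a, 5, 4).equals(fingerprints(b, 5, 4))); // prints true
    }
}
```

This also makes the weakness concrete: any transformation that changes the token order (a reversed loop, an inserted dead statement) changes the k-grams and shrinks the shared fingerprint set.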
The Obfuscator's Toolkit
The student aiming to evade detection has a powerful arsenal of semantics-preserving transformations. Individually, each is a simple refactoring. In combination, they create a code doppelgänger—identical in output, unrecognizable to a token matcher.
1. Identifier Obfuscation & Systematic Renaming
The simplest step. Instead of calculateSum, total, numbers, and i, use func_01, var_a, input_array, and index_j. While basic checkers normalize these, advanced obfuscators use contextual renaming to break pattern matching. They might rename all variables in one function using a dictionary, and use a completely different dictionary for another function, destroying any consistent naming pattern a checker might latch onto.
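A minimal sketch of dictionary-based contextual renaming is below. Each call uses its own substitution dictionary, simulating one dictionary per function; the class and method names (Renamer, rename) are invented, and the identifier matcher is simplistic (the dictionary must have at least as many entries as there are distinct identifiers).

```java
import java.util.*;
import java.util.regex.*;

// Sketch of per-function "contextual renaming": each function body gets its
// own substitution dictionary, so no naming pattern stays consistent
// across the file.
public class Renamer {
    static final Set<String> KEYWORDS = Set.of("int", "for", "return", "public", "if", "else");

    // Rewrite identifiers in one snippet using the given dictionary,
    // assigning names in order of first occurrence.
    static String rename(String body, List<String> dictionary) {
        Map<String, String> seen = new LinkedHashMap<>();
        Matcher m = Pattern.compile("[A-Za-z_]\\w*").matcher(body);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(body, last, m.start());
            String id = m.group();
            if (KEYWORDS.contains(id)) out.append(id);
            else out.append(seen.computeIfAbsent(id, k -> dictionary.get(seen.size())));
            last = m.end();
        }
        return out.append(body.substring(last)).toString();
    }

    public static void main(String[] args) {
        String f1 = "int total = 0; for (int i = 0; i < n; i = i + 1) total = total + i; return total;";
        // Two different dictionaries for two "functions" of the same file:
        System.out.println(rename(f1, List.of("acc", "idx", "limit")));
        System.out.println(rename(f1, List.of("var_a", "var_b", "var_c")));
    }
}
```

Run against two halves of the same stolen file, the two dictionaries leave no cross-function naming pattern for a checker to latch onto.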
2. Control Flow Alteration
This is where the real magic happens. The logical flow is rearranged without changing the outcome.
- Loop Transformations: Convert a for loop to a while loop. Unroll a loop partially. Invert loop conditions.
- Branching Reorganization: Replace an if-else chain with a switch statement (or vice-versa). Use ternary operators (? :) to replace simple ifs.
- Dead Code Insertion: Add code that has no effect, such as if (true) { /* real code */ } or int dummy = x * 0; // dead store. This inserts unique token sequences that poison the fingerprint.
// Original
if (x > 0) {
    return true;
} else {
    return false;
}

// Obfuscated Version 1 (Ternary)
return x > 0 ? true : false;

// Obfuscated Version 2 (Boolean Expression)
boolean result = false;
if (x > 0) result = true;
return result;

// Obfuscated Version 3 (With Dead Code)
int flag = (x > 0) ? 1 : 0;
if (flag == 1) {
    int unused = 42 * 0; // Dead code
    return true;
}
if (flag != 1) {
    return false;
}
return false; // Unreachable, but adds tokens
3. Statement Permutation and Block Splitting
Where order doesn’t matter, scramble it. Break a single logical block into multiple smaller methods that do trivial work, artificially inflating the structure.
// Original cohesive function
public Node insert(Node root, int key) {
    if (root == null) return new Node(key);
    if (key < root.key) root.left = insert(root.left, key);
    else if (key > root.key) root.right = insert(root.right, key);
    return root;
}

// Obfuscated via method splitting
public Node insert(Node root, int key) {
    root = handleNullRoot(root, key);
    return performInsertion(root, key);
}

private Node handleNullRoot(Node r, int k) { return (r == null) ? new Node(k) : r; }

private Node performInsertion(Node r, int k) {
    if (k < r.key) r.left = delegateLeftInsert(r.left, k);
    else r = handleRightSide(r, k);
    return r;
}

private Node delegateLeftInsert(Node n, int k) { return insert(n, k); }

private Node handleRightSide(Node r, int k) {
    if (k > r.key) r.right = insert(r.right, k);
    return r;
}
The recursive logic is identical, but the token stream and call graph are completely different. A k-gram matcher will see almost zero overlap.
4. Data Structure Substitution and Opaque Predicates
Replace a HashMap with a TreeMap if order isn’t critical. Use an ArrayList where a plain array would suffice. Introduce opaque predicates—boolean expressions that always evaluate to true or false but are non-trivial for a static analyzer to prove (e.g., x * (x + 1) % 2 == 0, which always holds because the product of two consecutive integers is even). This adds unique, complex syntactic noise.
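Here is a small sketch of how an opaque predicate wraps real logic in noise tokens; the class and method names (Opaque, opaqueTrue, isPositive) are invented for illustration. The predicate x * (x + 1) % 2 == 0 is a deliberately safe choice in Java: the product of consecutive integers is even, and parity survives two's-complement overflow, whereas parity checks built on % can misbehave for negative values because Java's remainder is signed.

```java
// Sketch: wrapping real logic in an opaque predicate. A token matcher sees
// new structure; at runtime the same branch is always taken.
public class Opaque {

    // x * (x + 1) is always even (consecutive integers), and evenness
    // survives two's-complement overflow, so this is always true.
    static boolean opaqueTrue(int x) {
        return x * (x + 1) % 2 == 0;
    }

    // Original logic was just: return n > 0;
    static boolean isPositive(int n) {
        if (opaqueTrue(n)) {
            return n > 0;
        }
        // Dead branch -- never executes, but poisons the fingerprint.
        return n % 7 == 3;
    }

    public static void main(String[] args) {
        System.out.println(isPositive(5));   // prints true
        System.out.println(isPositive(-2));  // prints false
    }
}
```

A static analyzer must prove the number-theoretic fact to know the second return is dead; a k-gram matcher simply sees a pile of unfamiliar tokens.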
The Detection Arms Race: Moving Beyond Tokens
If token-based checkers are the Maginot Line, then the next generation of defenses must be mobile and intelligent. Catching obfuscated plagiarism requires analyzing what the code does, not just what it looks like. This pushes us into the realm of program analysis, a field traditionally reserved for compiler optimization and formal verification.
Strategy 1: Abstract Syntax Tree (AST) Comparison with Normalization
Instead of tokens, compare the underlying tree structure of the program. An AST represents the nested, hierarchical nature of code. An obfuscator that scrambles tokens often leaves the AST largely intact. The key is to normalize the AST before comparison.
- Canonicalize Control Flow: Convert all loops to a canonical form (e.g., all become while (true) loops with explicit breaks). Normalize if-else and switch statements.
- Flatten Structure: Inline trivial helper methods (like those created solely for obfuscation).
- Semantic Labeling: Instead of normalizing all identifiers to ‘V’, label them by their role and data flow. The variable that holds the running total in a sum function gets the label “ACCUMULATOR” regardless of whether it’s called total, sum, or var_xyz.
Comparing normalized ASTs using tree-edit distance or subtree hashing can reveal similarities that token streams miss entirely.
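The subtree-hashing idea can be shown in miniature. This sketch skips parsing entirely (real tools build the AST from source); the trees are constructed by hand, and the names (AstHash, subtreeHash, the LOOP label) are invented. The point is the canonicalization step: for and while both map to a generic LOOP label before hashing, so a loop-style swap no longer changes the hash.

```java
import java.util.*;

// Sketch of normalized-AST subtree hashing. Nodes are built by hand here;
// a real tool would parse source code into this tree.
public class AstHash {
    // Canonicalization table: all loop keywords collapse to one label.
    static final Map<String, String> CANON = Map.of("for", "LOOP", "while", "LOOP", "do", "LOOP");

    record Node(String label, List<Node> children) {
        Node(String label, Node... kids) { this(label, List.of(kids)); }
    }

    // Structural hash over the canonicalized tree.
    static int subtreeHash(Node n) {
        int h = CANON.getOrDefault(n.label(), n.label()).hashCode();
        for (Node c : n.children()) h = 31 * h + subtreeHash(c);
        return h;
    }

    public static void main(String[] args) {
        // for (...) { ACC += ARR[i]; } -- identifiers already role-labeled
        Node forLoop = new Node("for", new Node("cmp"),
                new Node("+=", new Node("ACC"), new Node("ARR")));
        // while (...) { ACC += ARR[i]; } -- different tokens, same canonical tree
        Node whileLoop = new Node("while", new Node("cmp"),
                new Node("+=", new Node("ACC"), new Node("ARR")));
        System.out.println(subtreeHash(forLoop) == subtreeHash(whileLoop)); // prints true
    }
}
```

With role labels like ACC standing in for total/sum/var_xyz, renaming and loop swapping both leave the hash untouched; only a genuine structural change does not.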
Strategy 2: Program Dependence Graphs (PDG)
This is the heavy artillery. A PDG represents the data and control dependencies between statements in a program. It answers: “Does the value computed at statement A influence the execution or result of statement B?”
“Two programs that compute the same function will have isomorphic or highly similar program dependence graphs, even if their surface syntax is wildly different. It’s like comparing the blueprints of two engines instead of the paint jobs on the cars.” – Professor Michael Godfrey, University of Waterloo.
Building and comparing PDGs is computationally expensive, but it is remarkably resilient to obfuscation. Swapping a for loop for a while loop doesn’t change the dependence edges. Splitting a method creates new nodes, but the essential dependence relationships between the core operations remain.
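A toy version of the data-dependence half of a PDG is sketched below. Each statement records the variable it defines and the variables it reads; an edge A->B means B reads a value A defined. All names (DepGraph, Stmt, edges) are invented for illustration, and real PDGs also carry control-dependence edges that this sketch omits.

```java
import java.util.*;

// Toy data-dependence extraction. Loop style, statement splitting, and
// dead stores do not change the def-use edges between the core operations.
public class DepGraph {
    // A statement: its id, the variable it defines (null if none),
    // and the variables it reads.
    record Stmt(String id, String def, Set<String> uses) {}

    // Flow edges "definer->user", computed by tracking the last definition
    // of each variable in statement order.
    static Set<String> edges(List<Stmt> stmts) {
        Map<String, String> lastDef = new HashMap<>();
        Set<String> out = new TreeSet<>();
        for (Stmt s : stmts) {
            for (String u : s.uses())
                if (lastDef.containsKey(u)) out.add(lastDef.get(u) + "->" + s.id());
            if (s.def() != null) lastDef.put(s.def(), s.id());
        }
        return out;
    }

    public static void main(String[] args) {
        // acc = 0; acc = acc + x; return acc;
        List<Stmt> original = List.of(
                new Stmt("init", "acc", Set.of()),
                new Stmt("add", "acc", Set.of("acc", "x")),
                new Stmt("ret", null, Set.of("acc")));
        // Same computation with an inserted dead store: it defines a
        // variable nobody reads, so it contributes no edges at all.
        List<Stmt> obfuscated = List.of(
                new Stmt("init", "acc", Set.of()),
                new Stmt("dead", "junk", Set.of()),
                new Stmt("add", "acc", Set.of("acc", "x")),
                new Stmt("ret", null, Set.of("acc")));
        System.out.println(edges(original).equals(edges(obfuscated))); // prints true
    }
}
```

This is exactly why dead-code insertion, so effective against k-grams, is invisible here: a store that is never read produces no dependence edge.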
Strategy 3: Runtime Behavior Profiling (The Nuclear Option)
When static analysis hits its limits, you can analyze what the code does. For a given assignment, create a comprehensive, randomized test suite. Run all student submissions through it and profile not just pass/fail, but:
- Execution Traces: Log the sequence of function calls (a dynamic call graph).
- Memory Access Patterns: For data structure assignments, profile the sequence of inserts, searches, and deletions.
- Corner Case Behavior: How does the code handle null input, overflow, or empty collections?
Two programs that are obfuscated copies of each other will produce identical or nearly identical runtime profiles across a wide range of inputs. This approach is resource-intensive and requires careful test design to avoid false positives from logically similar but independently derived solutions, but its evidential power is immense.
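A minimal sketch of this behavioral fingerprinting is below: run each submission on the same seeded, randomized inputs and compare output traces. The names (BehaviorProfile, Submission, profile) are invented, and the "submissions" are stand-in lambdas; a real harness would also record call sequences and corner-case behavior, not just return values.

```java
import java.util.*;

// Sketch of behavioral fingerprinting: same seeded random inputs for every
// submission, then compare the resulting output traces.
public class BehaviorProfile {
    interface Submission { int run(int[] data); }

    // Deterministic trace: the seed fixes the input sequence, so two runs
    // with the same seed exercise identical inputs.
    static List<Integer> profile(Submission s, long seed, int trials) {
        Random rng = new Random(seed);
        List<Integer> trace = new ArrayList<>();
        for (int t = 0; t < trials; t++) {
            int[] data = rng.ints(rng.nextInt(20), -100, 100).toArray();
            trace.add(s.run(data));
        }
        return trace;
    }

    public static void main(String[] args) {
        // "Submission A": straightforward sum.
        Submission a = data -> { int s = 0; for (int v : data) s += v; return s; };
        // "Submission B": obfuscated copy -- reversed loop, dead variable.
        Submission b = data -> {
            int s = 0, junk = 0;
            for (int i = data.length - 1; i >= 0; i--) { s += data[i]; junk ^= i; }
            return s;
        };
        System.out.println(profile(a, 42, 50).equals(profile(b, 42, 50))); // prints true
    }
}
```

The randomized sizes (including the empty array) matter: they are what separates a true behavioral clone from an independent solution that merely agrees on the happy path.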
A Case Study: The Binary Search Tree That Wasn't
At a large public university in the Midwest, a TA for Data Structures grew suspicious of two BST implementations. Submission A was clean, well-commented, and used standard recursive algorithms. Submission B was a mess of one-letter variables, contained a bizarre switch statement inside its insertion logic, and had its inOrderTraversal logic spread across three methods.
MOSS reported 12% similarity—below the 15% departmental threshold. The TA’s intuition pushed her to use a more advanced tool (like Codequiry’s backend, which employs multi-layered analysis including AST and control-flow metrics). That analysis revealed a 94% similarity at the normalized AST level. The switch in Submission B was just a rewritten series of if-else checks from Submission A. The three traversal methods were, when inlined, algorithmically identical to the single method in A.
Confronted with the AST diff visualization, the student admitted to using an online “code obfuscation service” meant for “protecting intellectual property” on a friend’s solution. This wasn’t a lazy copy-paste; it was a premeditated, technical circumvention of the academic integrity system.
Designing Plagiarism-Resistant Assignments
Detection is a reactive game. The proactive strategy is to design assignments where obfuscation is difficult or pointless.
- Customized Specifications: Instead of “implement a BST,” require “implement a BST where each node also stores the count of nodes in its left subtree, and provide a method findKthSmallest() that uses this count.” Unique, nuanced requirements drastically reduce the availability of canonical solutions online.
- Focus on Process, Not Just Product: Require intermediate submissions—design documents, unit tests written before implementation, commit histories from a provided Git repository. Obfuscation typically happens at the final step; the process artifacts remain similar.
- Incorporate Unique Data: Base the assignment on a unique dataset you provide (e.g., “build a graph from the campus map data I’m giving you and find the shortest path from the library to the engineering building”). This anchors the problem in a context that doesn’t exist on GitHub.
- Oral Examinations (The Ultimate Defense): A 10-minute “code walkthrough” where a student explains their submission. Someone who obfuscated stolen code will be unable to explain the rationale behind the bizarre structure they created.
The Ethical and Pedagogical Tightrope
This arms race forces us to ask uncomfortable questions. Are we teaching students to be ethical engineers, or are we training them to be better adversaries? Every hour spent developing a more complex detector is an hour not spent on pedagogy.
The goal cannot be a perfect panopticon. It must be to raise the cost and complexity of cheating to the point where doing the work becomes the path of least resistance. By combining robust, multi-faceted detection capable of seeing through obfuscation with thoughtfully designed, plagiarism-resistant assignments, we can reclaim the focus of the classroom. We can stop policing syntax and start evaluating understanding.
The student copying code you can’t see is betting that you’re only looking at the surface. It’s time to look deeper.