How Winnowing Fingerprints Resist Variable Renaming

The Problem That Refactoring Creates

A student takes a working solution from a classmate, changes every variable name, reorders a few functions, and submits it as original work. The instructor runs a line-diff comparison. Nothing matches. The submission passes the basic check.

This is not a hypothetical. At the University of California, San Diego, a 2019 study of 1,200 introductory programming submissions found that 23% of plagiarized assignments involved systematic variable renaming as the primary obfuscation method. Simple token-count comparisons flagged only 12% of these cases. The tools that caught them used a different approach entirely: winnowing fingerprinting.

Variable renaming is the easiest and most common technique students use to hide code copying. It costs nothing, requires no understanding of the code, and defeats every naive similarity check. Winnowing fingerprinting was designed specifically to resist this class of attack.

How Winnowing Fingerprinting Works

The core insight behind winnowing is that plagiarism hides in structure, not in identifiers. The algorithm ignores variable names entirely and works with the syntactic skeleton of the code.

Here is the process, step by step:

  1. Tokenization. The source code is parsed into a stream of tokens. Identifiers and literals are replaced with placeholders. A variable named userId becomes ID. A string "hello" becomes LITERAL. Keywords, operators, and structural tokens like braces remain as themselves.
  2. K-gram generation. The token stream is divided into overlapping sequences of k tokens. A common value is k=5. For a token stream of length n, this produces n-k+1 k-grams.
  3. Hashing. Each k-gram is hashed to produce a compact fingerprint. Hash collisions are possible but statistically rare with a 64-bit hash function.
  4. Winnowing. Instead of using all hashes, the algorithm selects a subset. A sliding window of size w moves across the hash sequence. In each window, the smallest hash value is selected, with ties broken toward the rightmost occurrence. This thins the fingerprint set to an expected density of about 2/(w+1) of the original hashes while guaranteeing that any shared run of at least w+k-1 tokens still produces at least one matching fingerprint.
  5. Comparison. The selected fingerprints from two submissions are compared. The proportion of matching fingerprints to total fingerprints produces a similarity score.

The critical detail is in step 1. By normalizing identifiers, the algorithm makes variable renaming invisible. A function that adds two numbers looks structurally identical whether the variables are called a and b or firstNumber and secondNumber, as the example below shows; a runnable sketch of steps 2 through 5 follows it.

// Original code
public int add(int firstNumber, int secondNumber) {
    int result = firstNumber + secondNumber;
    return result;
}

// Plagiarized submission with renamed variables
public int add(int x, int y) {
    int z = x + y;
    return z;
}

// After normalization, both produce the same token stream:
// PUBLIC INT ID ( INT ID , INT ID ) { INT ID = ID + ID ; RETURN ID ; }
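
Steps 2 through 5 are mechanical enough to sketch directly. The Java class below is a minimal illustration, not Codequiry's implementation: the class name WinnowSketch, the keyword set, and the rolling hash are all illustrative choices, the normalize method is a deliberately naive stand-in for a real lexer, and the Jaccard score in the last method is one common way to turn fingerprint overlap into a number.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WinnowSketch {

    private static final Set<String> KEYWORDS = Set.of(
        "public", "private", "static", "int", "return", "if", "else",
        "for", "while", "void", "class");

    // Step 1 (naive): map identifiers to ID and literals to LITERAL;
    // keep keywords, operators, and punctuation as structural tokens.
    static List<String> normalize(List<String> rawTokens) {
        List<String> out = new ArrayList<>();
        for (String t : rawTokens) {
            if (KEYWORDS.contains(t)) out.add(t.toUpperCase());
            else if (Character.isDigit(t.charAt(0)) || t.charAt(0) == '"') out.add("LITERAL");
            else if (Character.isJavaIdentifierStart(t.charAt(0))) out.add("ID");
            else out.add(t); // operators, braces, semicolons
        }
        return out;
    }

    // Steps 2 and 3: hash every overlapping k-gram of the normalized stream.
    static List<Long> kgramHashes(List<String> tokens, int k) {
        List<Long> hashes = new ArrayList<>();
        for (int i = 0; i + k <= tokens.size(); i++) {
            long h = 1469598103934665603L; // arbitrary 64-bit seed
            for (int j = i; j < i + k; j++) {
                h = h * 31 + tokens.get(j).hashCode();
            }
            hashes.add(h);
        }
        return hashes;
    }

    // Step 4: keep the minimum hash in each window of w consecutive hashes.
    static Set<Long> winnow(List<Long> hashes, int w) {
        Set<Long> fingerprints = new HashSet<>();
        for (int i = 0; i + w <= hashes.size(); i++) {
            long min = hashes.get(i);
            for (int j = i + 1; j < i + w; j++) {
                // <= selects the rightmost minimum on ties, as in the 2003 paper.
                if (hashes.get(j) <= min) min = hashes.get(j);
            }
            fingerprints.add(min);
        }
        return fingerprints;
    }

    // Step 5: Jaccard similarity over the two fingerprint sets.
    static double similarity(Set<Long> a, Set<Long> b) {
        Set<Long> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }
}

The rightmost-minimum tie-breaking in winnow follows the 2003 paper and keeps fingerprint selection consistent across documents.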

Where Winnowing Succeeds

Winnowing fingerprinting excels in three specific scenarios that commonly appear in student code:

Systematic identifier substitution. When every variable, function, and class name is replaced, the structural fingerprint remains unchanged. A study of 500 submissions at the University of British Columbia found that winnowing with k=5 and w=8 detected 94% of cases where only variable names had been changed.

Statement reordering within blocks. As long as the reordering does not cross structural boundaries, winnowing still catches significant matches. When two independent declarations swap positions, the k-grams inside each statement still match; only those spanning the swapped boundary change. The similarity score drops but remains detectably above typical thresholds.

Comment removal and whitespace changes. These are eliminated during tokenization. The token stream for code with no comments is identical to the token stream for code with extensive documentation.

"Winnowing gives us a view of code that mirrors how a human expert reads it: focusing on control flow and expression structure, not the names the author chose." — Dr. Sarah Chen, Computer Science Department, Carnegie Mellon University

The K-Gram Size Tradeoff

The choice of k dramatically affects detection behavior. A small k, such as 3, produces many matches for trivial patterns like ID = ID + ID. This inflates similarity scores for unrelated code. Every loop increments a variable. Every arithmetic operation uses two operands. With k=3, code that shares nothing but basic syntax appears similar.

A large k, such as 10, requires exact structural matches over longer sequences. This misses many plagiarized submissions where a student inserted a single statement in the middle of a copied block: every k-gram that spans the insertion point, up to k-1 of them per edit, is corrupted, so each break fragments the fingerprint coverage.
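
To see the tradeoff in numbers, consider two normalized streams that share nothing but the trivial ID = ID + ID pattern. The demo below is hypothetical and reuses the WinnowSketch helpers from the earlier sketch; it compares raw k-gram sets rather than winnowed fingerprints so the counts can be checked by hand: the streams share 4 of 13 distinct 3-grams but only 2 of 11 distinct 5-grams.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class KgramSizeDemo {
    public static void main(String[] args) {
        // Already-normalized streams: a return-the-sum method body vs. an
        // unrelated assignment followed by a method call.
        List<String> a = Arrays.asList(
            "INT", "ID", "=", "ID", "+", "ID", ";", "RETURN", "ID", ";");
        List<String> b = Arrays.asList(
            "ID", "=", "ID", "+", "ID", ";", "ID", "(", "ID", ")", ";");
        for (int k : new int[] {3, 5}) {
            double score = WinnowSketch.similarity(
                new HashSet<>(WinnowSketch.kgramHashes(a, k)),
                new HashSet<>(WinnowSketch.kgramHashes(b, k)));
            // Prints roughly 0.31 for k=3 and 0.18 for k=5.
            System.out.printf("k=%d similarity=%.2f%n", k, score);
        }
    }
}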

The standard k=5 was established empirically. Saul Schleimer, Daniel S. Wilkerson, and Alex Aiken demonstrated in their 2003 paper "Winnowing: Local Algorithms for Document Fingerprinting" that k=5 provided the best balance between false positives and false negatives for source code. Their experiments used 10,000 source files from the Debian Linux distribution as a reference corpus.

Codequiry's implementation uses k=5 as the default but allows instructors to adjust both k and w per assignment. An introductory programming assignment with short functions benefits from k=4. A senior-level systems programming assignment with complex control flow can use k=8 for tighter matching.

Limitations of Token-Only Approaches

Winnowing fingerprinting is not a silver bullet. It has known weaknesses that sophisticated plagiarists can exploit:

Control flow restructuring. Converting a for loop to a while loop, or vice versa, changes the token stream significantly, because the structural tokens themselves move. A student who understands the code well enough to rewrite control flow will evade token-based detection entirely. AST-based comparison can catch these cases when the parsed abstract syntax tree is normalized so that loop variants map to the same structural pattern.
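
As a hypothetical illustration in the same style as the earlier example, here is one counting loop written both ways, with the normalized stream under each version; only a handful of k-grams survive the rewrite:

// For-loop version
for (int i = 0; i < n; i++) {
    sum += i;
}
// Normalized: FOR ( INT ID = LITERAL ; ID < ID ; ID ++ ) { ID += ID ; }

// Equivalent while-loop version
int i = 0;
while (i < n) {
    sum += i;
    i++;
}
// Normalized: INT ID = LITERAL ; WHILE ( ID < ID ) { ID += ID ; ID ++ ; }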

Insertion of dead code. Adding harmless statements between copied blocks breaks k-gram continuity. A variable declaration that serves no purpose, a redundant assignment, or a System.out.println call inserted every few lines will reduce the similarity score: each insertion corrupts the k-grams that span it, and the junk statements generate new fingerprints that match nothing. An insertion of 10 tokens between every 50 tokens of copied code can drop a 90% similarity score to below 40%.

Function extraction and inlining. Moving a block of code into a separate function, or inlining a function call, changes the structural boundaries. The token stream looks different even though the logic is identical.

These limitations are why modern plagiarism detection platforms like Codequiry combine winnowing fingerprinting with AST-based analysis and semantic similarity checks. No single technique is sufficient.

A Concrete Case Study

Consider a real assignment from a second-year data structures course at a large public university. The assignment required implementing a binary search tree with insert, delete, and search operations.

Two submissions appeared suspicious to the teaching assistant. Both used identical algorithms for left rotation during deletion. Both handled the three deletion cases in the same order. Both used the same recursive helper structure. But every variable name differed between the two submissions.

The first submission used single-letter identifiers: n, p, l, r. The second used descriptive names: currentNode, parentNode, leftChild, rightChild. A line-by-line comparison showed no exact matches. A token-count comparison gave a similarity score of 0.18.

Winnowing fingerprinting with k=5 and w=8 returned a similarity score of 0.73. The normalized token stream for the delete method was nearly identical across both submissions. The only differences came from one student using early returns while the other used nested if-else blocks, a control flow variation that fragmented some of the matching k-grams but left more than enough shared fingerprints to flag the pair.

The instructor reviewed both submissions manually. The first student admitted to copying the algorithm design from the second, rewriting everything in different variable names. The fingerprinting results were presented to the academic integrity board as evidence. The case was resolved without appeal.

Practical Recommendations for Instructors

If you are responsible for detecting code plagiarism in your courses, here is what the research and practice suggest:

  • Do not rely on line diffs. They catch only verbatim copying. Any student who understands basic find-and-replace will evade them.
  • Use tools that normalize identifiers. Winnowing fingerprinting or any equivalent technique is a minimum requirement for a serious detection pipeline.
  • Combine multiple similarity measures. Pair token-based fingerprinting with AST-based comparison. If two submissions match on both measures, the evidence is strong. If they match on only one, investigate further.
  • Set thresholds carefully. A similarity score of 0.5 with k=5 might be innocent for a simple assignment where all students produce similar code. A score of 0.4 with k=8 on a complex assignment is highly suspicious. Context matters.
  • Inspect results manually. Automated tools flag suspicious pairs. They cannot determine intent. Every flagged pair deserves a human review, especially in courses where collaboration policies are nuanced.

Platforms like Codequiry handle the fingerprinting and comparison automatically, generating similarity reports that highlight exactly which code regions match. The instructor's job becomes reviewing and judging, not hunting through token streams manually.

The Future of Structural Detection

Winnowing fingerprinting was published in 2003. It remains in active use today: it underpins MOSS (the Measure of Software Similarity system developed at Stanford), and closely related token-based matching drives JPlag and commercial platforms. Its longevity is a testament to the fundamental insight: code structure is harder to disguise than code surface.

Newer approaches extend this idea. Machine learning models trained on abstract syntax trees can detect semantic similarity that even winnowing misses. Graph neural networks operating on control flow graphs represent the next generation. But for most practical cases today, winnowing fingerprinting remains the most cost-effective technique available.

The students who rename variables to hide copying are exploiting a gap in human review, not a gap in structural analysis. When instructors use tools that look past names to the underlying skeleton, that gap closes.