The Code That Broke a University's Honor Code

The email landed in Professor Martin Reeves’s inbox at 11:47 PM on a Sunday. The subject line was stark: “Formal Complaint - CS106B Assignment 3.” It was from a group of four students, and their tone was one of betrayed fury. They had spent 40 hours on the Pathfinder graph algorithm assignment, only to discover another team’s final submission was, in their words, “a blatant, obfuscated copy of our public GitHub repository.”

Reeves, a 20-year veteran of teaching Stanford’s flagship data structures course, felt a familiar dread. He pulled up the two submissions in question. On the surface, they looked nothing alike. The complaining team’s code was clean, modular, and used a recursive depth-first search approach. The accused team’s submission was a sprawling, single-file script littered with oddly named variables like tempVarAlpha and dataHolderX, used iterative loops, and had a completely different control flow. He ran both through MOSS (Measure of Software Similarity), the plagiarism detection tool Stanford had relied on for a decade. The result came back: 12% similarity – well below the 40% threshold that typically triggered a manual review.

“The students were adamant,” Reeves told me later. “They said, ‘Run our original repo against their final submission, but ignore variable names and whitespace.’ That’s when I realized our process was blind to a whole class of cheating.”

“We were in an arms race, and the students had just unveiled a stealth bomber while we were still checking for stolen bicycles.”

The students were right. The accused team had employed a multi-layered obfuscation technique. They had taken the core logic from the public GitHub repo, run it through a crude auto-refactoring script that converted recursion to iteration, inlined functions, added redundant control structures, and then used a simple dictionary to replace every meaningful identifier. The algorithmic heart – the adjacency list traversal, the cycle detection condition, the backtracking logic – was identical. MOSS, which primarily uses a winnowing algorithm on tokenized code (shingles of normalized tokens), was completely fooled by the structural and lexical noise.
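The winnowing scheme the article describes is simple enough to sketch in a few lines of Python. This is a minimal illustration of the idea, not MOSS’s actual implementation, and the parameter choices (k-gram size, window size) are arbitrary: hash every k-token window of the normalized token stream, then keep only the minimum hash from each sliding window as the document’s fingerprint.

```python
import hashlib

def kgram_hashes(tokens, k=5):
    """Hash every k-token window of a normalized token stream."""
    grams = ["\x00".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
    return [int(hashlib.md5(g.encode()).hexdigest(), 16) & 0xFFFFFFFF
            for g in grams]

def winnow(hashes, w=4):
    """Keep the minimum hash from each window of w consecutive k-gram
    hashes; the surviving hashes form the submission's fingerprint."""
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}

def similarity(fp_a, fp_b):
    """Jaccard overlap of two fingerprint sets, in [0.0, 1.0]."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```

Because identifiers are normalized before hashing (every name collapses to the same placeholder token), renaming find_shortest_path to calculateRoute changes nothing. What fooled this scheme was the structural rewrite: converting recursion to iteration and inlining functions replaces and reorders the token k-grams themselves, so few fingerprints survive in common.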

Here’s a simplified illustration. The original, clean code snippet:

def find_shortest_path(graph, start, end, path=[]):
    path = path + [start]
    if start == end:
        return path
    shortest = None
    for node in graph[start]:
        if node not in path:
            newpath = find_shortest_path(graph, node, end, path)
            if newpath:
                if not shortest or len(newpath) < len(shortest):
                    shortest = newpath
    return shortest

It was transformed into this obfuscated, but functionally identical, version:

def calculateRoute(network, pointA, pointB, previousRoute=[]):
    currentRoute = previousRoute + [pointA]
    if pointA == pointB:
        return currentRoute
    bestRouteSoFar = None
    vertexList = network[pointA]
    for vertex in vertexList:
        if vertex not in currentRoute:
            tempRoute = calculateRoute(network, vertex, pointB, currentRoute)
            if tempRoute is not None:
                if bestRouteSoFar is None or (len(tempRoute) < len(bestRouteSoFar)):
                    bestRouteSoFar = tempRoute
    return bestRouteSoFar

“The logic, the recursion pattern, the condition checks – it’s a direct translation,” Reeves explained. “MOSS sees different tokens, different structure. But any human, or a more sophisticated tool looking at abstract syntax trees or program dependence graphs, would see the clone.”
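The kind of AST comparison Reeves describes is easy to prototype with Python’s built-in ast module. The sketch below is my illustration, not his actual tool: it canonicalizes every identifier in first-seen order and compares the dumped trees, so renaming alone cannot hide a clone. Catching the recursion-to-iteration rewrite would still require the deeper control-flow or program-dependence analysis he mentions.

```python
import ast

class Canonicalize(ast.NodeTransformer):
    """Rewrite every identifier to a canonical name in first-seen order,
    so two clones that differ only in naming dump to identical trees."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        if name not in self.names:
            self.names[name] = f"v{len(self.names)}"
        return self.names[name]

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node

def structural_fingerprint(source):
    """Parse, canonicalize identifiers, and dump the tree to a string."""
    return ast.dump(Canonicalize().visit(ast.parse(source)))

# Two versions that differ only in naming produce the same fingerprint.
original = "def f(xs):\n    total = 0\n    for x in xs:\n        total = total + x\n    return total\n"
renamed = "def accumulate(items):\n    s = 0\n    for item in items:\n        s = s + item\n    return s\n"
assert structural_fingerprint(original) == structural_fingerprint(renamed)
```

A renamed clone is caught immediately, while an honest structural difference (say, a while loop instead of a for loop) still yields a distinct tree.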

The Aftermath and the Audit

The immediate case was resolved: the accused team confessed after being confronted with a side-by-side AST (Abstract Syntax Tree) visualization Reeves had manually generated. But the damage was done. The incident triggered a department-wide audit of the previous two semesters’ submissions for CS106B. Teaching assistants, armed with a new suspicion, began looking for patterns MOSS missed.

They found them. Not widespread, but insidious. A student who had translated a Java solution from GitHub into Python, preserving the algorithm but not a single line of syntactic similarity. Another who had used a code obfuscator meant for production JavaScript on their C++ homework. The audit revealed a 3.7% incidence of sophisticated, tool-evading plagiarism that their old system had missed entirely.

“It was a wake-up call,” said Dr. Anya Sharma, the department head. “We were teaching cutting-edge graph algorithms but using decade-old technology to protect academic integrity. We had focused on catching lazy copy-paste. The sophisticated cheaters had moved on.”

Rebuilding the Defenses: A Multi-Layered Approach

Stanford’s CS department didn’t just buy a new tool. They redesigned their entire integrity framework around a core principle: detection is a last line of defense; the primary goal is to make plagiarism less viable and more risky.

1. Assignment Design as a Deterrent

They moved away from generic, “solve this known problem” assignments. The new Pathfinder assignment, for instance, used a unique, procedurally generated “mythical creature migration map” as its graph data, different for each student. The core algorithm was the same, but the input domain and required output format were personalized. Copying a solution required understanding and adapting it, which defeated most casual cheaters.
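Per-student generation of this kind can be approximated with a seeded random graph generator. This is a hypothetical sketch (the function name and “habitat” labels are my invention, not the course’s actual generator), but it shows the key properties: deterministic for a given student ID, different across students, and guaranteed connected.

```python
import random

def student_graph(student_id, n_nodes=12, extra_edges=8):
    """Generate a unique, connected adjacency-list graph per student.

    Seeding the RNG with the student's ID makes the graph deterministic,
    so regrading reproduces it exactly, while keeping every student's
    input data different.
    """
    rng = random.Random(student_id)
    nodes = [f"habitat_{i}" for i in range(n_nodes)]
    graph = {node: set() for node in nodes}

    # A shuffled spanning path guarantees the graph is connected.
    order = nodes[:]
    rng.shuffle(order)
    for a, b in zip(order, order[1:]):
        graph[a].add(b)
        graph[b].add(a)

    # Extra random edges vary the topology from student to student.
    for _ in range(extra_edges):
        a, b = rng.sample(nodes, 2)
        graph[a].add(b)
        graph[b].add(a)

    return {node: sorted(neighbors) for node, neighbors in graph.items()}
```

Every solution still implements the same shortest-path algorithm, but a submission copied from a classmate fails immediately on the copier’s own map, because the expected output depends on the copier’s seed.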

2. The Tooling Stack Upgrade

They supplemented MOSS with a commercial platform, Codequiry, that offered the AST and semantic analysis they needed. The new workflow was layered:

  1. Initial MOSS Scan: Quick filter for blatant copy-paste.
  2. Semantic Similarity Analysis: Tools like Codequiry compared submissions based on control flow, data flow, and logic structure, not just tokens.
  3. Cross-Language & Web Source Checks: Scanning against a corpus of known solutions from GitHub, Stack Overflow, and Chegg.

“The key was the semantic layer,” said a lead TA. “We finally had a way to flag the ‘translated’ or ‘refactored’ clones. It wasn’t automatic guilt, but it perfectly highlighted submissions for us to scrutinize.”

3. The Cultural Shift: Transparency and Education

On the first day of class, Reeves now shows students the obfuscated code example from the scandal. He walks them through how the detection works. He explicitly defines prohibited behavior: “Using any tool to mechanically alter code to evade similarity detection is a direct violation of the Honor Code, worse than simple copying.”

The department also implemented a “code provenance” statement with each submission, where students must declare if they used AI pair-programmers (like Copilot) or consulted specific online resources. Lying on this statement is itself an honor code violation.

“We stopped pretending we could create a perfect plagiarism forcefield. Instead, we focused on raising the cost of cheating while lowering the cost of doing honest work.”

The Results and the Lingering Questions

Two semesters after the overhaul, the department’s data shows a shift. Overt, detectable plagiarism dropped by 60%. The number of honor code cases related to programming assignments remained steady, but their nature changed – they were now almost entirely the sophisticated cases the old system missed. The teaching staff spent less time playing detective and more time on substantive code review.

But new questions emerged. Was the 3.7% they found the full extent, or just the visible tip of an iceberg? How do you fairly assess the “provenance” of code when AI assistants suggest lines that might match another student’s solution? The arms race continues, but on a different battlefield.

“The scandal wasn’t about one group of cheaters,” Reeves concluded. “It was about a systemic failure to adapt. Our tools defined what we could catch, and students quickly learned the boundaries of that blindness. Now, we’ve made the boundaries much harder to find. More importantly, we’re having an honest conversation with students about why those boundaries exist in the first place.”

The broken honor code was repaired, not just with better software, but with a harder look in the mirror. The real lesson for universities and engineering teams alike is that integrity isn’t a feature you can bolt on. It has to be designed into the process, from the first line of the assignment spec to the final scan of the submitted code.