Why Automate? The Grading Bottleneck
Every semester, the same scene plays out in CS departments around the world. A TA spends twelve hours manually diffing forty Java submissions for a single assignment. They catch a few obvious copy-paste jobs. They miss the ones where students changed variable names and reordered methods. Meanwhile, the professor has no systematic way to flag submissions sourced from Stack Overflow, GitHub repositories, or—increasingly—a ChatGPT session.
Scaling a course from 50 to 300 students without an automated pipeline for code integrity isn't just risky. It's untenable. A well-designed grading workflow that plugs code similarity checks—and optionally AI-generation detection—into your existing LMS can reduce manual review to a fraction of the original time while catching significantly more violations.
"We moved from random spot-checks to a pipeline that flags every submission above a configurable similarity threshold. Coursewide cheating dropped by roughly 40% in two semesters." — instructor at a large public R1 university, 2024 survey
Designing the Pipeline: Four Essential Stages
An effective code integrity workflow has four stages:
- Submission collection — usually through GitHub Classroom, Gradescope, or a direct LMS upload.
- Structural similarity analysis — token‑based or AST‑based comparison of every pair of submissions.
- Web-source cross‑check — scanning flagged submissions against a corpus of public repositories and Q&A sites.
- AI-generation screening — statistical analysis for code that exhibits LLM‑typical patterns (low perplexity, uniform comment style, unnatural structure).
You don't have to implement all four at once. Start with stage 2, then add web and AI checks as the semester progresses. Each stage feeds the next with prioritized candidates for manual review.
Stage 1: Collecting Submissions Uniformly
GitHub Classroom is the most popular choice for medium-to-large programming courses. It creates a repository per student from a template assignment, and students submit by pushing commits to that repository. A simple webhook or scheduled GitHub Action can then trigger the next pipeline step.
If your university uses Canvas or Moodle, you can also use their API to download submissions. For example, a small Python script using canvasapi:
import os
import zipfile

from canvasapi import Canvas

API_URL = "https://youruniversity.instructure.com"
API_KEY = os.getenv("CANVAS_TOKEN")  # keep the token out of source control

course = Canvas(API_URL, API_KEY).get_course(12345)
assignment = course.get_assignment(67890)
os.makedirs("submissions", exist_ok=True)

for submission in assignment.get_submissions():
    # Skip students who have not submitted or who uploaded no file
    attachments = getattr(submission, "attachments", [])
    if submission.workflow_state != "submitted" or not attachments:
        continue
    zip_path = f"submissions/{submission.user_id}.zip"
    attachments[0].download(zip_path)  # assumes a single zip upload per student
    with zipfile.ZipFile(zip_path, "r") as z:
        z.extractall(f"submissions/{submission.user_id}")
This gives you one directory of extracted code per student, named by Canvas user ID and ready for comparison.
Stage 2: Choosing a Similarity Detector
You have several solid options. MOSS (Measure Of Software Similarity) is the classic choice — free for academic use, token‑based, resistant to many obfuscations. It compares files using Winnowing (a fingerprinting algorithm) and returns a color‑coded HTML report. The catch: it's a black‑box service run by Stanford. You cannot modify its tokenization.
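To make the fingerprinting idea concrete, here is a minimal sketch of winnowing in Python. It is a character-level simplification written for illustration, not MOSS's actual implementation; the function names, the k-gram length, and the window size are all assumptions.

```python
import hashlib


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-character substring of the normalized text."""
    normalized = "".join(text.lower().split())  # drop whitespace and case
    return [
        int(hashlib.md5(normalized[i:i + k].encode()).hexdigest(), 16) % (1 << 32)
        for i in range(len(normalized) - k + 1)
    ]


def winnow(hashes: list[int], window: int) -> set[tuple[int, int]]:
    """Keep the minimum hash of each window (rightmost on ties)."""
    fingerprints = set()
    for start in range(len(hashes) - window + 1):
        chunk = hashes[start:start + window]
        smallest = min(chunk)
        # Position of the rightmost occurrence of the minimum in this window
        offset = window - 1 - chunk[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


def similarity(a: str, b: str, k: int = 5, window: int = 4) -> float:
    """Jaccard overlap of the two fingerprint sets, ignoring positions."""
    fa = {h for h, _ in winnow(kgram_hashes(a, k), window)}
    fb = {h for h, _ in winnow(kgram_hashes(b, k), window)}
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because whitespace and case are removed before hashing, reformatting a file leaves its fingerprints unchanged. A production detector would hash language tokens rather than raw characters.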
JPlag (Karlsruhe Institute of Technology) is open‑source and supports many languages out of the box (Java, Python, C++, JavaScript, TypeScript, and more). It can operate both locally and as a server. It compares token streams derived from each program's parse tree, so it catches structural similarities even after identifier renaming.
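Why does renaming identifiers not defeat this kind of comparison? The sketch below illustrates the general idea using Python's standard `tokenize` module: every identifier and literal is replaced by a placeholder before comparison. It is not JPlag's code; `normalized_tokens` and the placeholder names are hypothetical.

```python
import io
import keyword
import tokenize


def normalized_tokens(source: str) -> list[str]:
    """Replace identifiers and literals with placeholders; drop comments and layout."""
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME:
            # Keywords carry structure; every other name becomes "ID"
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)
    return out


original = "def total(items):\n    s = 0\n    for x in items:\n        s += x\n    return s\n"
renamed = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"

# Both versions reduce to the same token stream despite the renaming
print(normalized_tokens(original) == normalized_tokens(renamed))
```

The two functions differ in every name yet produce identical streams, which is why a renamed copy still scores as a near-total match.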
Codequiry goes a step further by combining token‑based and web‑source scanning in a single dashboard. It directly compares student code against a pre‑indexed corpus of GitHub repositories, Stack Overflow answers, and academic archives — which is exactly what you need when a student grabbed a solution from a public repo. It also includes an AI‑generated code detector that reports a confidence score based on perplexity and burstiness metrics. For instructors who want one tool that handles all three angles, it significantly reduces pipeline complexity.
Stage 3: Running a Scripted Pipeline
You can orchestrate the entire workflow with a simple shell script. Below is a minimal example that uses MOSS for similarity and then feeds suspicious submissions into Codequiry's web‑scanning API for a second check. (The same script can call the JPlag JAR instead of MOSS.)
#!/bin/bash
# Usage: ./pipeline.sh path/to/submissions
SUBMISSIONS=$1
REPORT_DIR="reports/$(date +%Y%m%d_%H%M)"
mkdir -p "$REPORT_DIR"
# Step 1: Run MOSS (needs the moss client script in PATH).
# -d treats each student directory as one submission. The client has no
# output-file option; it prints the URL of the results page as its last line.
# Assumes each student's .py files sit directly inside their directory.
MOSS_URL=$(moss -l python -d "$SUBMISSIONS"/*/*.py | tail -n 1)
curl -s "$MOSS_URL" -o "$REPORT_DIR/moss_report.html"
# Step 2: Extract top scores from MOSS output (simplified)
# Keep scores above 70% similarity; assumes link text like "path/ (85%)"
grep -oP '\(\d+%\)' "$REPORT_DIR/moss_report.html" | tr -d '()%' | \
  awk '$1 > 70' | sort -rn | head -20 > "$REPORT_DIR/top_scores.txt"
# Step 3: For each suspicious pair, check against web sources
while IFS= read -r score; do
  # Map back to file names (omitted for brevity)
  # Placeholder command: substitute your actual Codequiry API or CLI call
  codequiry check --file "file1.py" --web-scan
done < "$REPORT_DIR/top_scores.txt"
This is a proof of concept. In practice, you'll want to parse MOSS output more robustly and integrate with your LMS to auto‑flag submissions above a threshold.
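As one way to parse the report more robustly, the sketch below pulls both file names, the score, and the match URL out of each result row. It assumes each row of the MOSS results page holds two links whose text looks like `path (NN%)`; check that against a real report before relying on it. The function name, the sample HTML, and its URLs are made up for illustration.

```python
import re

# Assumed row shape: two links to the same match page, each with the
# link text "<path> (<percent>%)".
ROW = re.compile(
    r'<A HREF="(?P<url>[^"]+)">(?P<a>[^<]+?) \((?P<pa>\d+)%\)</A>\s*'
    r'<TD><A HREF="[^"]+">(?P<b>[^<]+?) \((?P<pb>\d+)%\)</A>',
    re.IGNORECASE,
)


def flagged_pairs(html: str, threshold: int = 70) -> list[dict]:
    """Return pairs whose higher score exceeds the threshold, highest first."""
    pairs = []
    for m in ROW.finditer(html):
        score = max(int(m["pa"]), int(m["pb"]))
        if score > threshold:
            pairs.append({"a": m["a"], "b": m["b"], "score": score, "url": m["url"]})
    return sorted(pairs, key=lambda p: p["score"], reverse=True)


# Hypothetical sample in the assumed format
sample = """
<TR><TD><A HREF="http://moss.stanford.edu/results/0/1/match0.html">submissions/101/ (91%)</A>
    <TD><A HREF="http://moss.stanford.edu/results/0/1/match0.html">submissions/102/ (88%)</A>
<TD ALIGN=right>140
<TR><TD><A HREF="http://moss.stanford.edu/results/0/1/match1.html">submissions/103/ (35%)</A>
    <TD><A HREF="http://moss.stanford.edu/results/0/1/match1.html">submissions/104/ (40%)</A>
<TD ALIGN=right>22
"""

for pair in flagged_pairs(sample):
    print(pair["a"], pair["b"], pair["score"])
```

Keeping the names attached to each score removes the "map back to file names" step that the shell version leaves open.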
Stage 4: Layering in AI‑Generation Detection
AI‑generated code detection is not a replacement for similarity checking — it's a complement. A student who writes a solution from scratch with ChatGPT will not show up in MOSS or JPlag (unless they also copy from a classmate). But they will produce code with statistical signatures different from human‑written code: lower perplexity, more uniform comment style, and a tendency to use overly descriptive variable names that feel generic.
Tools like Codequiry's AI detector evaluate these signals and give a probability score. We've found that stacking the three checks — structural similarity (MOSS/JPlag), web‑source cross‑reference, and AI probability — yields a far stronger signal than any single method alone. An example scenario:
- Student A: 15% similarity with classmates, 85% web‑source match to a 2023 GitHub gist. → Likely copy‑paste from internet.
- Student B: 95% similarity with classmates, 2% web match, 8% AI probability. → Likely collusion.
- Student C: 2% similarity, 3% web match, 92% AI probability. → Likely AI‑written, but worth an interview.
No single flag is perfect. The combination, reviewed by a human, is where the power lies.
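The scenario above can be sketched as a small triage function that lists every reason a submission deserves a human look. The field names, the labels, and the 70% default threshold are assumptions for illustration, not output from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    peer_similarity: float  # highest similarity to a classmate, 0-100
    web_match: float        # best match against public sources, 0-100
    ai_probability: float   # AI detector output, 0-100


def triage(s: Scores, threshold: float = 70.0) -> list[str]:
    """Return every review reason whose score exceeds the threshold."""
    reasons = []
    if s.web_match > threshold:
        reasons.append("possible copy from a public source")
    if s.peer_similarity > threshold:
        reasons.append("possible collusion")
    if s.ai_probability > threshold:
        reasons.append("possible AI generation")
    return reasons


student_b = Scores(peer_similarity=95, web_match=2, ai_probability=8)
student_c = Scores(peer_similarity=2, web_match=3, ai_probability=92)
print(triage(student_b))  # ['possible collusion']
print(triage(student_c))  # ['possible AI generation']
```

Returning a list rather than a single label keeps overlapping signals visible, and an empty list simply means no review is queued.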
Integrating with Your LMS for Auto‑Flagging
Once the pipeline produces a CSV of flags, you can push results back into Canvas, Moodle, or Blackboard via their APIs. For Canvas, a simple script can attach a comment to a student's submission:
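# Post a plagiarism alert as an instructor comment
import os

from canvasapi import Canvas

canvas = Canvas("https://youruniversity.instructure.com", os.getenv("CANVAS_TOKEN"))
course = canvas.get_course(12345)
# get_submission takes the Canvas user ID as its first argument
submission = course.get_assignment(67890).get_submission(1234)
submission.edit(comment={
    "text_comment": "AI-generated probability: 92%. Similarity: 2%. Review recommended.",
    "group_comment": False,
})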
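Many instructors prefer to keep the flagging private, visible only to the grading team, until a conversation with the student is scheduled. Canvas submission comments are normally visible to the student, so that may mean keeping flags in the pipeline's CSV rather than in the LMS until then.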
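Going from the flags CSV to comment text takes only a few lines. In this sketch the column names (`user_id`, `similarity`, `ai_probability`) are hypothetical, so adapt them to whatever your pipeline writes. Each resulting string could later be sent as the `text_comment` value once the grading team decides to share it.

```python
import csv
import io


def comments_from_flags(csv_text: str) -> dict[int, str]:
    """Map each Canvas user ID to the comment text for its flag row."""
    comments = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        comments[int(row["user_id"])] = (
            f"AI-generated probability: {row['ai_probability']}%. "
            f"Similarity: {row['similarity']}%. Review recommended."
        )
    return comments


# Hypothetical flags file contents
flags = "user_id,similarity,ai_probability\n1234,2,92\n"
print(comments_from_flags(flags)[1234])
```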
Handling False Positives and Student Privacy
No automated pipeline is perfect. Token‑based detectors can flag boilerplate code (imports, main function signatures) as similar. Web‑source checks may match code that is legitimately common (e.g., a standard sorting algorithm). AI detectors are the newest and most controversial: a highly organized human writer can look like an AI, and a heavily edited AI draft can look human.
Set conservative thresholds. Flag, don't accuse. And always grant students the opportunity to explain their process. A brief interview can distinguish a student who copy‑pasted from one who followed a tutorial and then rewrote it — a blurry line, but one worth discussing pedagogically.
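One way to cut boilerplate false positives is to remove instructor-provided starter code before comparing. The MOSS client accepts a base file for this purpose; the sketch below shows the same idea done locally with a crude line-level filter. The function name is hypothetical, and matching whole lines is a simplification.

```python
def strip_starter_lines(submission: str, starter: str) -> str:
    """Drop blank lines and lines that also appear in the starter code."""
    # Compare lines with surrounding whitespace removed
    starter_lines = {line.strip() for line in starter.splitlines() if line.strip()}
    kept = [
        line for line in submission.splitlines()
        if line.strip() and line.strip() not in starter_lines
    ]
    return "\n".join(kept)


starter = "import sys\n\ndef main():\n    pass\n"
submitted = "import sys\n\ndef main():\n    print(sum(range(10)))\n"
print(strip_starter_lines(submitted, starter))
```

Only the student's own line survives, so two submissions that share nothing but the template no longer look alike.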
Frequently Asked Questions
How much time does it take to set up this pipeline for a 200‑student course?
Initial setup of the script and API integration takes about four hours for a developer‑friendly instructor. Using a unified tool like Codequiry reduces that to under an hour because the scanning and reporting are pre‑integrated.
Can I run these checks on take‑home exams without violating student privacy?
Yes, as long as your institutional policy permits plagiarism detection. Most universities include a clause in the course syllabus granting permission. Never upload submissions to a third‑party service without disclosing it — check your contract with the tool vendor on data retention.
Does MOSS support cross‑language detection? What about a Python solution that was translated into Java?
MOSS does not support cross‑language comparison because it tokenizes per language. JPlag is similarly limited to single‑language pairs. Codequiry's web‑source scan can match code translated between languages (e.g., a Java solution rewritten in Python) by comparing abstract syntax patterns, though this feature is still evolving.
Should I run similarity checks on every assignment or just major projects?
Running on every assignment gives you a baseline and deters cheating from day one. However, low‑stakes homework often has high similarity from collaboration that is permitted. Use the pipeline to enforce the collaboration policy you define — not as a universal hammer.