Emily Watson

Academic Integrity Specialist at Codequiry

Emily works with CS departments and teaching teams on assignment design and fair, defensible plagiarism workflows.

Articles by Emily Watson

Teaching Code Attribution Before Students Write a Single Line
Academic Integrity | 11 min | Emily Watson | 1 day ago

Too many CS students treat code from Stack Overflow, GitHub, or AI tools as free for the taking. Teaching attribution as a core skill from the first assignment reduces plagiarism and builds professional habits. This article walks through concrete strategies, assignment patterns, and detection workflows that make attribution part of the learning process.

What 1200 Python CS1 Submissions Reveal About AI-Written Code Signatures
Case Studies | 9 min | Emily Watson | 5 days ago

We analyzed 1200 introductory Python submissions from three semesters, applying perplexity, burstiness, and token-frequency analysis to separate human-written code from AI-generated samples. The results reveal a consistent set of statistical signatures that can catch GPT-generated and Copilot-assisted assignments, with measured false-positive rates at each threshold.

Automating Code Plagiarism Detection in Your Grading Workflow
Tutorials | 8 min | Emily Watson | 1 week ago

A practical walkthrough for CS instructors who want to wire code similarity checks directly into their grading workflow. Covers tooling choices, LMS integration, and how to layer in web-source and AI-generated code detection for a complete academic integrity pipeline.

K-gram Fingerprinting for Source Code Similarity Analysis
General | 9 min | Emily Watson | 1 week ago

K-gram fingerprinting is the backbone of modern code plagiarism detection. This step-by-step guide walks through tokenization, k-gram generation, hashing, winnowing, and comparison — the exact pipeline used by MOSS and Codequiry. Includes Python code examples, algorithmic tradeoffs, and real-world scaling numbers.
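As a rough illustration of the five stages named in that summary, here is a minimal Python sketch. It is not the implementation used by MOSS or Codequiry. The function names, the values `k=5` and `window=4`, the use of SHA-1 as the hash, and the Jaccard score at the end are all assumptions chosen for readability. It also keeps only hash values, whereas a full winnowing implementation would record positions as well.

```python
import hashlib
import io
import keyword
import tokenize

# Token types that carry no structural information for this sketch.
_SKIPPED = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def token_stream(source):
    """Tokenize Python source, replacing identifiers and literals with placeholders."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME:
            # Keywords are kept; every other name collapses to "ID".
            tokens.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        else:
            tokens.append(tok.string)
    return tokens


def kgram_hashes(tokens, k):
    """Hash every run of k consecutive tokens to a 64-bit integer."""
    hashes = []
    for i in range(len(tokens) - k + 1):
        gram = " ".join(tokens[i:i + k])
        digest = hashlib.sha1(gram.encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:8], "big"))
    return hashes


def winnow(hashes, window):
    """Keep the minimum hash of each sliding window (hash values only)."""
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def similarity(source_a, source_b, k=5, window=4):
    """Jaccard overlap of the two fingerprint sets, between 0.0 and 1.0."""
    fa = winnow(kgram_hashes(token_stream(source_a), k), window)
    fb = winnow(kgram_hashes(token_stream(source_b), k), window)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Because identifiers collapse to a single placeholder before hashing, two submissions that differ only in variable and function names produce identical fingerprint sets under this sketch.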

How Abstract Syntax Tree Comparison Detects Restructured Code
General | 1 min | Emily Watson | 2 weeks ago

Abstract syntax tree (AST) comparison is a powerful technique for detecting plagiarized code that has been disguised through variable renaming, method reordering, and whitespace changes. This article explains how AST comparison works, its strengths and limitations, and when to combine it with token-based methods for best results.
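To make the idea concrete, here is a small sketch using Python's standard `ast` module. It is an illustration under stated assumptions, not the method any particular tool uses. The names `_Normalizer` and `structural_signature` are hypothetical, and treating a module as an unordered collection of top-level statements is a deliberate oversimplification that only suits files made up of function definitions.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Erase the names a student could change without changing structure."""

    def visit_Name(self, node):
        # Every variable or function reference becomes the same placeholder.
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        # Parameter names and annotations are dropped.
        node.arg = "_"
        node.annotation = None
        return node

    def visit_FunctionDef(self, node):
        # Normalize the body first, then blank out the function's own name.
        self.generic_visit(node)
        node.name = "_"
        return node


def structural_signature(source):
    """Return a sorted list of normalized dumps, one per top-level statement."""
    tree = _Normalizer().visit(ast.parse(source))
    # ast.dump omits line and column numbers by default, so whitespace and
    # layout changes disappear. Sorting makes the result order-independent.
    return sorted(ast.dump(stmt) for stmt in tree.body)
```

Two files with the same signature have the same tree shape after renaming and reordering. Attribute names such as `obj.method` are left intact here, so this sketch is stricter about method calls than about plain variables.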

How One Bootcamp Built a Code Originality Pipeline
Case Studies | 9 min | Emily Watson | 2 weeks ago

When CareerDevs Academy scaled from 30 to 200 students per cohort, their manual code review process couldn't keep up with plagiarism and improper code reuse. Here's how they built a tiered originality pipeline combining static analysis, similarity detection, and educational intervention — and what other programs can learn from their approach.

How Static Analysis Catches Plagiarized Code Before It Ships
General | 11 min | Emily Watson | 1 month ago

Plagiarism isn't just a classroom problem. When code from Stack Overflow, GitHub repos, or contractor deliverables enters your production codebase without proper attribution, you risk license violations, IP disputes, and technical debt. This guide shows how static analysis tools detect copied code before it ships, using token matching, AST comparison, and dependency scanning.

A Checklist for Evaluating AI Code Detection Tools
AI Detection | 9 min | Emily Watson | 2 months ago

Not all AI detection tools are created equal, and a single "accuracy" number is dangerously misleading. This article provides a practical, seven-point checklist for evaluating AI-generated code detectors, covering everything from cross-language support and prompt sensitivity to campus-specific deployment constraints.

Your Codebase Is Full of Stolen Web Snippets
General | 8 min | Emily Watson | 2 months ago

A developer copies a slick animation from CodePen. Another integrates a jQuery plugin from a blog. These everyday acts are quietly filling your codebase with unlicensed, potentially toxic code. This guide shows you how to find it, assess the risk, and clean it up before it triggers a legal notice.

The Assignment That Taught Students How to Cheat
Academic Integrity | 6 min | Emily Watson | 2 months ago

A well-intentioned "cheat-proof" programming project at a top-tier university inadvertently became a masterclass in sophisticated plagiarism. The fallout revealed a critical gap in how we teach and assess code integrity, forcing a department-wide reckoning on what originality really means in software.

AI Detection Is a Distraction From Real Code Integrity
Academic Integrity | 5 min | Emily Watson | 3 months ago

The industry's panic over ChatGPT is a shiny object distracting us from the foundational rot in how we assess code quality and originality. We're chasing ghosts while ignoring the rampant, mundane plagiarism and technical debt that's been crippling software projects and student learning for decades. True integrity requires looking beyond the AI hype.

Your AI Detection Tool Is Missing These 8 Code Patterns
AI Detection | 7 min | Emily Watson | 3 months ago

AI-generated code is evolving past simple pattern matching. The latest models produce code that passes basic similarity checks but reveals its origin through deeper, more subtle signatures. We dissect eight specific, often-overlooked patterns that separate human logic from machine-generated output.