Alex Petrov

Detection Systems Engineer at Codequiry

Alex focuses on refactoring-resistant similarity detection and benchmarking Codequiry against tools like MOSS, JPlag and Dolos.

Articles by Alex Petrov

How Much Copied Stack Overflow Code Do Plagiarism Tools Actually Catch
General | 10 min | Alex Petrov | 1 day ago

Traditional similarity tools like MOSS and JPlag compare student submissions against each other but leave a massive blind spot: code copied directly from Stack Overflow, GitHub repositories, and online tutorials. This article examines how web source detection works, what it catches that peer comparison misses, and why both approaches together give you the real picture of code originality.

How Code Similarity Checks Catch Open Source License Violations
General | 9 min | Alex Petrov | 6 days ago

Code similarity analysis isn't just for catching student plagiarism. Organizations use the same techniques to identify GPL and other open source license violations in their proprietary codebases. This article walks through the algorithms, real-world cases, and practical workflows for automated license compliance auditing.

Can AST Comparison Survive Student Code Obfuscation
General | 3 min | Alex Petrov | 1 week ago

Students often try to hide copied code by renaming variables, restructuring loops, or inserting dead code. AST-based comparison resists many of these tricks, but some deliberate obfuscation, such as flattening control flow or converting recursion to iteration, can still produce a false negative. This article examines where AST engines excel, where they fall short, and how combining structural matching with token signatures catches even the cleverest attempts.

How to Design Assignments That Resist Code Plagiarism
Academic Integrity | 9 min | Alex Petrov | 1 week ago

Simple changes to assignment design—unique interfaces, randomized test harnesses, and automated similarity checks—drastically reduce code plagiarism. This guide walks through six concrete tactics with real code examples and grading workflows.

What 4,200 Python Submissions Tell Us About Code Reuse
Case Studies | 7 min | Alex Petrov | 1 week ago

By aggregating similarity scores across 4,200 student Python submissions over three semesters, we uncovered distinct copy-paste behaviors tied to assignment type, submission deadline, and language features. This practical guide walks through the exact process of running a large-scale code reuse audit using Codequiry’s output and Python data analysis, then shows how to turn those numbers into actionable course design decisions.

Automated Code Similarity Checks in a CI Lab Pipeline
Tutorials | 7 min | Alex Petrov | 2 weeks ago

Setting up automated code plagiarism and similarity checks inside a CI pipeline cuts manual grading time and catches copying that individual reviewers miss. This practical guide walks through the architecture, tooling choices, and honest tradeoffs of running MOSS, JPlag, or Codequiry’s API on every lab push.

How Automatic Grading Evolved From Scripts to Integrity Pipelines
Academic Integrity | 9 min | Alex Petrov | 1 month ago

A retrospective on automatic grading in computer science education—from shell scripts comparing output strings to modern platforms combining unit tests, static analysis, and code similarity detection. What we gained, what we lost, and why integrity pipelines matter more than ever.

Building a Source Code Provenance Pipeline for Contractor Deliverables
Tutorials | 10 min | Alex Petrov | 1 month ago

When contractors deliver source code, verifying originality and license compliance is critical. This guide walks through building an automated provenance pipeline that checks for code similarity, license violations, and proper attribution before accepting deliverables into your codebase.

How to Build a Source Code Similarity Pipeline for Detection
Tutorials | 12 min | Alex Petrov | 1 month ago

A step-by-step guide to building a source code similarity detection pipeline from scratch. Covers tokenization, AST comparison, Winnowing fingerprinting, and heuristic scoring. Includes working Python code and configuration strategies used by universities and enterprises.

Your Open Source License Is a Social Contract, Not a Rulebook
General | 6 min | Alex Petrov | 2 months ago

We treat open source licenses like a tax code to be audited, scanning for SPDX tags and copyright headers. This legalistic approach is creating compliant but ethically bankrupt software. True compliance isn't about checking boxes; it's about understanding and honoring the social intent behind the GPL, MIT, or Apache licenses. It's time to scan for the spirit, not just the letter.

Your Static Analysis Tool Is Missing the Real Code Smells
General | 8 min | Alex Petrov | 2 months ago

Most static analysis tools flag trivial style issues while missing the architectural rot that cripples productivity. This guide shows you how to detect the five structural code smells that genuinely predict development slowdowns and defect clusters. We'll walk through real code, build custom detection rules, and integrate findings into your CI/CD pipeline.

Your Static Analysis Tool Is Lying to You About Code Smells
General | 6 min | Alex Petrov | 2 months ago

A 2024 study of 12 million static analysis warnings found that the majority of flagged "code smells" have zero correlation with actual defects. We're drowning in false positives, wasting developer time, and missing the real architectural rot. It's time to audit your tool's configuration before it audits your team's productivity.