James Okafor

Developer Advocate at Codequiry

James writes about code integrity for practicing engineers and helps teams wire Codequiry into their CI and review pipelines.

Articles by James Okafor

How Code Similarity Detection Advanced From Strings to Semantics
General · 8 min read · James Okafor · 1 hour ago

From manual diff checks to AI-powered semantic analysis, code plagiarism detection has undergone a fundamental transformation. This article traces the key milestones—MOSS, JPlag, AST fingerprinting, and the new frontier of LLM-written code—and explains why a single method is no longer enough.

Cross-Language Code Plagiarism Detection Methods Tested
General · 8 min read · James Okafor · 1 week ago

A rigorous head-to-head comparison of three cross-language code plagiarism detection approaches—tokenization, AST matching, and semantic fingerprinting—tested on 100 student-style assignments translated between Java, Python, and C++. We reveal which method catches translated loops, renamed variables, and switched control flow, and which one drowns in false positives.

What 4,300 JavaScript Projects Reveal About Code Copying
Case Studies · 10 min read · James Okafor · 1 month ago

A large-scale study of 4,300 open source JavaScript repositories reveals the true nature of code copying in modern software development. The findings challenge assumptions about originality, attribution, and the tools we use to detect plagiarism.

What Open Source Licenses Actually Enforce in Court
General · 10 min read · James Okafor · 1 month ago

An analysis of 47 open source license enforcement cases from 2008 to 2023 reveals surprising patterns: most violations aren't willful, GPL enforcement rarely goes to trial, and MIT license cases are rising faster than cases under any other license. Here's what the data says about what licenses actually enforce in practice versus what developers assume.

When Is Peer Similarity Enough in a Plagiarism Checker
General · 13 min read · James Okafor · 2 months ago

Source code plagiarism detection relies on two fundamentally different reference sets: peer submissions and the open web. This article examines the trade-offs between each approach, when one method catches cheating the other misses, and how to build detection strategies that combine both for maximum coverage.

Can Dev Teams Trust Code Similarity for IP Theft Detection
General · 8 min read · James Okafor · 2 months ago

Code similarity analysis has long been a staple of academic integrity enforcement, but enterprises face a harder problem: detecting IP theft, insider leaks, and unlicensed reuse in complex, multi-repo codebases. This post examines the practical limitations and proper applications of similarity detection for proprietary software, from AST comparison to dependency graph analysis.

The Assignment That Broke a University's Honor Code
Academic Integrity · 7 min read · James Okafor · 2 months ago

A third-year data structures course at a prestigious university became ground zero for a cheating scandal that traditional tools missed. The fallout wasn't about catching individuals—it was about discovering a broken culture. This is the story of how they rebuilt their standards from the ground up.

Your Static Analysis Tool Is Lying to You About Code Smells
General · 6 min read · James Okafor · 2 months ago

The industry's obsession with counting "code smells" is a dangerous distraction. We're measuring the wrong things, creating false confidence, and missing the systemic rot that actually slows down development. It's time to stop trusting the simplistic metrics and start analyzing what really matters: semantic duplication and logical debt.

Your Codebase Is a Patchwork of Stolen Web Snippets
General · 9 min read · James Okafor · 2 months ago

Your developers aren't writing code. They're assembling it from a thousand forgotten browser tabs. The average codebase contains hundreds of unlicensed, unvetted, and potentially dangerous snippets copied directly from the web. This isn't just about plagiarism—it's about technical debt, security vulnerabilities, and legal liability woven directly into your application's DNA.

The Assignment That Broke Every Plagiarism Checker
General · 7 min read · James Okafor · 3 months ago

Professor Elena Vance thought her data structures assignment was cheat-proof. Then she discovered a student had submitted code that passed MOSS, JPlag, and even Codequiry's initial scan. The incident revealed a new, sophisticated form of code plagiarism that's spreading across computer science departments. This is the story of how one university adapted its entire integrity strategy.

Your Static Analysis Tool Is Lying to You About Security
General · 10 min read · James Okafor · 3 months ago

Static analysis tools promise a fortress of security but often deliver a Potemkin village. They generate thousands of warnings while missing the subtle, architectural vulnerabilities that lead to real breaches. This deep-dive exposes the fundamental gaps in token-based scanning and charts a path toward analysis that actually understands code intent and data flow.

The 37% Problem in Your Intro to Java Course
Academic Integrity · 2 min read · James Okafor · 3 months ago

A 2023 multi-university study found that 37% of introductory programming submissions showed signs of unauthorized collaboration that traditional string-matching tools had missed. The culprit isn't copy-paste; it's structural plagiarism, where students share solutions and rewrite them line by line. Here’s how algorithms that compare Abstract Syntax Trees are exposing this silent epidemic.