Internal Research Study · 2023-2024

Comparative Analysis of
Code Plagiarism Detection Tools

A controlled study evaluating 10+ plagiarism detection systems using 5,000 real code submissions from educational institutions worldwide.

5,000 Code Submissions
10+ Tools Tested
60% Plagiarized Cases
1 Clear Winner

Disclaimer: This is an internal research study conducted for academic purposes. Results are based on controlled testing environments and may not reflect real-world implementation across all educational settings. This study has not been peer-reviewed.

Study Overview

Real-world testing of commercial and free plagiarism detection tools using actual student submissions.

Abstract

This internal study evaluated the effectiveness of 10+ code plagiarism detection systems using 5,000 actual code submissions collected from multiple educators across universities worldwide. The dataset consisted of 3,000 confirmed plagiarized submissions (60%) and 2,000 original submissions (40%), providing ground truth for accurate performance measurement.

Our primary objective was to compare detection capabilities across a spectrum of solutions: 8 free plagiarism checkers, commercial tools including Turnitin and Copyleaks, local/self-hosted checkers, and Codequiry's multi-layered detection system. The study reveals significant performance disparities between detection approaches.

Research Questions

  • Which detection tools can effectively identify real-world plagiarism cases in code submissions?
  • How do free checkers compare to commercial solutions in terms of detection accuracy?
  • Can text-based plagiarism detectors (Turnitin, Copyleaks) effectively detect code plagiarism?
  • Do local/self-hosted checkers provide adequate protection for academic institutions?
  • What detection capabilities are necessary for effective code plagiarism identification?

Controlled Dataset

5,000 submissions with known ground truth: 3,000 plagiarized cases and 2,000 original works from real academic environments.

Comprehensive Testing

10+ tools tested including free checkers, text plagiarism detectors, local tools, and specialized code detection systems.

Global Sources

Submissions collected from educators at multiple universities worldwide, representing diverse programming assignments and skill levels.

Key Findings

Stark differences emerged between detection systems when tested against real plagiarism cases.

Codequiry

Successfully detected plagiarism across all 3,000 confirmed cases. The only tool in the study that identified sophisticated code similarity patterns.

Text-Based Detectors

Turnitin and Copyleaks caught some basic cases but failed on code-specific plagiarism. These tools are designed for prose, not programming logic.

Local Tools

MOSS and JPlag caught some cases through token-based comparison but missed sophisticated obfuscation and cross-language similarities.

Free Online Checkers

All 8 free online checkers failed on nearly every case (3-8% detection) due to limited databases and inability to access current online code repositories.

The Critical Difference

Only Codequiry's multi-layered detection system successfully identified plagiarism across all 3,000 confirmed cases. This demonstrates that specialized code plagiarism detection—combining structural analysis, semantic understanding, cross-language detection, and AI-powered pattern recognition—is essential for protecting academic integrity in programming courses.

Why Detection Rates Vary

Free Online Checkers (3-8% detection): These tools fail on nearly all cases because they cannot access current online repositories and coding solution sites. Their databases are small, outdated, and lack comprehensive coverage of academic code submissions.

Text-Based Detectors (Turnitin 11%, Copyleaks 9%): These tools caught some basic cases through text similarity but failed on code-specific plagiarism. They don't understand programming logic, variable relationships, or algorithmic structures. A simple variable rename or code reformatting defeats them.
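
To make that failure mode concrete, here is a minimal sketch (ours, not part of the study or of any tested tool) that contrasts word-level text similarity with similarity over an identifier-normalized token stream for two functionally identical snippets. The snippet contents, helper names, and the difflib-based measure are illustrative assumptions.

```python
import difflib
import io
import keyword
import tokenize

ORIGINAL = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

# The same logic with every identifier renamed and nothing else changed.
RENAMED = """
def accumulate(numbers):
    acc = 0
    for n in numbers:
        acc += n
    return acc
"""

def text_similarity(a: str, b: str) -> float:
    """Word-level overlap, roughly what a prose plagiarism checker measures."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

def normalized_tokens(source: str) -> list[str]:
    """Tokenize the code and collapse every non-keyword identifier to 'ID'."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        else:
            out.append(tok.string)
    return out

def token_similarity(a: str, b: str) -> float:
    """Overlap once renaming is neutralized, so renames no longer hide copying."""
    return difflib.SequenceMatcher(
        None, normalized_tokens(a), normalized_tokens(b)).ratio()

print(f"text similarity:  {text_similarity(ORIGINAL, RENAMED):.2f}")   # about 0.50
print(f"token similarity: {token_similarity(ORIGINAL, RENAMED):.2f}")  # 1.00
```

On this pair the word-level score sits near 0.5 while the normalized token streams match exactly, which is precisely the gap a prose-oriented detector cannot see.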

Local/Self-Hosted Tools (23% detection): MOSS and JPlag performed better through token-based comparison, which can detect obvious copying. However, they fail against sophisticated techniques like semantic restructuring, obfuscation, cross-language translation, and AI-generated code variations.
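
As a rough illustration of the token-fingerprinting idea behind such tools, the sketch below hashes overlapping k-grams of a normalized token stream and measures Jaccard overlap. It is a simplified stand-in (MOSS's actual winnowing algorithm selects fingerprints differently), and the token streams, window size, and example programs are our assumptions.

```python
import hashlib

def kgram_fingerprints(tokens: list[str], k: int = 5) -> set[int]:
    """Hash every overlapping window of k tokens into a fingerprint set."""
    prints = set()
    for i in range(len(tokens) - k + 1):
        window = " ".join(tokens[i:i + k])
        prints.add(int(hashlib.sha1(window.encode()).hexdigest()[:8], 16))
    return prints

def jaccard(a: set[int], b: set[int]) -> float:
    """Share of fingerprints two submissions have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Normalized token streams (identifiers already collapsed to 'ID').
# A verbatim copy would share almost every window, but a semantically
# restructured version, here the loop rewritten as sum() over a generator,
# shares only a couple of windows and slips under a typical threshold.
loop_version = "def ID ( ID ) : ID = 0 for ID in ID : ID += ID return ID".split()
sum_version  = "def ID ( ID ) : return sum ( ID for ID in ID )".split()

overlap = jaccard(kgram_fingerprints(loop_version), kgram_fingerprints(sum_version))
print(f"fingerprint overlap: {overlap:.2f}")   # low, despite identical behavior
```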

Codequiry's Multi-Layered Approach (100% detection): Combines AST analysis, semantic fingerprinting, cross-language detection, deep web scanning, AI-generated code detection, and a continuously updated database of millions of code samples. This comprehensive approach catches plagiarism that single-method tools miss.
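
To show what structural analysis adds over text or token matching, here is a minimal sketch that compares two submissions by their AST node-type sequences using Python's standard ast module. This is our own illustration, not Codequiry's implementation; the snippets and the difflib-based score are assumptions.

```python
import ast
import difflib

def structure(source: str) -> list[str]:
    """Flatten a program into its sequence of AST node types; names vanish."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

SUBMISSION_A = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

# Renamed and reformatted copy of SUBMISSION_A.
SUBMISSION_B = """
def accumulate(numbers):
    acc = 0
    for n in numbers: acc += n
    return acc
"""

score = difflib.SequenceMatcher(
    None, structure(SUBMISSION_A), structure(SUBMISSION_B)).ratio()
print(f"structural similarity: {score:.2f}")   # 1.00 despite renaming and reformatting
```

Because identifiers and formatting never enter the comparison, a renamed and reformatted copy still registers as a near-exact structural match; multi-layered systems combine this kind of analysis with additional signals such as semantic fingerprints and external databases.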

Note: Results based on 5,000 real submissions with confirmed ground truth (3,000 plagiarized, 2,000 original).

Detection Performance Results

How each category of tools performed against 3,000 confirmed plagiarism cases.

Tool Category | Detection Success | Key Limitations | Cost
Codequiry | 3,000 / 3,000 (100% detection) | Processed 5,000 submissions in under 60 minutes; all 3,000 matches confirmed as plagiarism at enterprise scale | Paid
Free Online Checkers (8 tools tested) | 3-8% (limited databases) | No internet access to check current repositories, small outdated databases, basic text matching | Free
Turnitin | 11% (basic cases only) | Text-based similarity, no code structure understanding, defeated by variable renaming | Paid
Copyleaks | 9% (basic cases only) | Text similarity detection, cannot analyze programming logic or structure | Paid
Local/Self-Hosted Tools (MOSS, JPlag, etc.) | 23% (token-based only) | Token comparison works for obvious copying but fails on semantic changes, obfuscation, cross-language | Free

Detection success rates are based on identifying true plagiarism cases from the 3,000 confirmed plagiarized submissions in our dataset.

Study Limitations

Important constraints that should be considered when interpreting these results.

Methodological Constraints

  • Dataset consisted of submissions already identified as plagiarized by educators, which may represent more obvious cases
  • Free checker testing may have been limited by rate limits, API restrictions, or feature limitations in non-premium versions
  • Some tools may have been tested outside their intended use case (e.g., Turnitin is designed for prose, not code)
  • Plagiarism techniques in the dataset may not represent all possible obfuscation methods
  • Programming language distribution was not controlled or evenly balanced across submissions
  • Study was conducted by Codequiry team; independent third-party validation would strengthen findings

Important Clarifications

Ground Truth: The 3,000 "plagiarized" cases were identified by educators as suspicious and confirmed through their review process. However, we acknowledge that some of these determinations may be subjective or incorrect.

Tool Configuration: Tools were tested using default or free-tier settings. Premium configurations or expert tuning might improve performance for some systems.

Comparative Context: This study demonstrates that specialized code detection tools significantly outperform general-purpose or free alternatives. The results should not be interpreted as "other tools are useless" but rather "specialized tools are necessary for code plagiarism detection."

Internal Study

This research was conducted internally and has not undergone external peer review. Independent validation of these findings is encouraged.

Real Data

Despite limitations, this study uses 5,000 actual submissions from real academic environments, providing practical insights into detection effectiveness.

Clear Conclusion

The performance gap between specialized code detection and general-purpose tools is substantial and consistent across the dataset.

Detailed Methodology

How we collected and tested 5,000 submissions across 10+ detection systems.

Data Collection Process

Submissions were collected from multiple educators across universities worldwide who volunteered to participate. Educators provided anonymized code submissions that had been flagged for potential plagiarism during the 2023-2024 academic year.

Dataset Composition:

  • 3,000 submissions confirmed as plagiarized by educators through their investigation processes
  • 2,000 original submissions included as control group to test for false positives
  • All submissions anonymized (student identifiers, institution names, and comments removed)
  • Diverse programming languages, assignment types, and complexity levels represented

Tools Tested

We evaluated 10+ detection systems across four categories:

1. Specialized Code Detection: Codequiry (multi-layered detection system)

2. Text-Based Plagiarism Detectors: Turnitin, Copyleaks

3. Free Online Checkers: 8 different free code plagiarism checkers available online

4. Local/Self-Hosted Tools: MOSS, JPlag, and similar token-based comparison tools

Testing Phase | Submissions | Process
Phase 1: Codequiry | 5,000 submissions | All 5,000 submissions processed in under 60 minutes using bulk checking feature
Phase 2: Text Detectors | 5,000 submissions | Same submissions tested with Turnitin and Copyleaks
Phase 3: Free Checkers | 5,000 submissions | Tested across 8 free online code plagiarism checkers
Phase 4: Local Tools | 5,000 submissions | Processed through local/self-hosted detection tools

Success Criteria

A tool was considered to have "successfully detected" plagiarism if it flagged the submission as suspicious with a similarity score above typical thresholds (generally 50%+ similarity or "high match" designation). For the 2,000 original submissions, tools should ideally show low similarity scores to avoid false positives.
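
For clarity, the sketch below shows how such a criterion could be scored against the ground-truth labels. The Result record, field names, threshold constant, and sample values are hypothetical and do not reproduce the study's actual evaluation code.

```python
from dataclasses import dataclass

SIMILARITY_THRESHOLD = 50.0   # "high match" cut-off used as the success criterion

@dataclass
class Result:
    submission_id: str
    similarity: float      # percentage reported by the tool
    is_plagiarized: bool   # ground-truth label from the educator review

def score(results: list[Result]) -> dict[str, float]:
    """Detection rate on confirmed cases and false-positive rate on originals."""
    flagged = lambda r: r.similarity >= SIMILARITY_THRESHOLD
    plag = [r for r in results if r.is_plagiarized]
    orig = [r for r in results if not r.is_plagiarized]
    return {
        "detection_rate": sum(map(flagged, plag)) / len(plag),
        "false_positive_rate": sum(map(flagged, orig)) / len(orig),
    }

# Hypothetical tool output for four submissions:
sample = [
    Result("s1", 92.0, True),   # caught
    Result("s2", 34.0, True),   # missed
    Result("s3", 12.0, False),  # correctly ignored
    Result("s4", 71.0, False),  # false positive
]
print(score(sample))   # {'detection_rate': 0.5, 'false_positive_rate': 0.5}
```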

Key Observation: Only Codequiry successfully identified plagiarism patterns across all 3,000 confirmed cases. Other tools either failed to detect the majority of cases or (in the case of text-based detectors) were fundamentally unable to analyze code structure and semantics.

Conclusions & Recommendations

What this study means for educators protecting academic integrity in programming courses.

Key Conclusions

1. Specialized tools are essential: The study clearly demonstrates that general-purpose plagiarism detectors (Turnitin, Copyleaks) and free checkers cannot effectively detect code plagiarism. Only specialized code detection systems with multi-layered analysis can identify the sophisticated similarity patterns present in plagiarized programming assignments.

2. "Free" comes with significant risk: With less than 15% detection success, free checkers provide a false sense of security. Institutions relying on these tools are likely missing the majority of plagiarism cases, undermining academic integrity and fairness for honest students.

3. Local tools have fundamental limitations: Token-based systems like MOSS performed better than text detectors but still failed on 77% of cases. Without semantic analysis, external databases, and AI-powered detection, these tools cannot keep pace with modern plagiarism techniques.

4. Multi-layered detection works: Codequiry's combination of structural analysis, semantic understanding, cross-language detection, deep web scanning, and AI detection successfully identified all 3,000 confirmed plagiarism cases—the only tool to achieve this in our study.

Why Institutions Choose Codequiry

  • Proven Effectiveness: 100% detection success in this study with 3,000 real plagiarism cases
  • Multi-Layered Detection: AST analysis, semantic fingerprinting, cross-language detection, AI-powered pattern recognition, bulk processing at enterprise scale
  • Deep Web Scanning: Checks against millions of code samples, GitHub repositories, online resources, and coding solution sites
  • AI-Generated Code Detection: Identifies code written by ChatGPT, GitHub Copilot, and other AI tools
  • Enterprise Speed & Scale: Processes thousands of submissions in under an hour with bulk checking capabilities
  • Free Account Available: Start protecting academic integrity today—no credit card required

About This Study: This internal research was conducted by the Codequiry team using 5,000 real code submissions from educational institutions worldwide. While the study was not independently peer-reviewed, it used actual plagiarism cases with known ground truth. The results demonstrate clear performance differences between specialized code detection and general-purpose tools.