Study Overview
Real-world testing of commercial and free plagiarism detection tools using actual student submissions.
Abstract
This internal study evaluated the effectiveness of 10+ code plagiarism detection systems using 5,000 real code submissions collected from educators at multiple universities worldwide. The dataset consisted of 3,000 confirmed plagiarized submissions (60%) and 2,000 original submissions (40%), providing ground truth for accurate performance measurement.
Our primary objective was to compare detection capabilities across a spectrum of solutions: 8 free plagiarism checkers, commercial tools including Turnitin and Copyleaks, local/self-hosted checkers, and Codequiry's multi-layered detection system. The study reveals significant performance disparities among these detection approaches.
Research Questions
- Which detection tools can effectively identify real-world plagiarism cases in code submissions?
- How do free checkers compare to commercial solutions in terms of detection accuracy?
- Can text-based plagiarism detectors (Turnitin, Copyleaks) effectively detect code plagiarism?
- Do local/self-hosted checkers provide adequate protection for academic institutions?
- What detection capabilities are necessary for effective code plagiarism identification?
Controlled Dataset
5,000 submissions with known ground truth: 3,000 plagiarized cases and 2,000 original works from real academic environments.
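The ground-truth labels are what make tool-to-tool comparison possible: each tool's flags can be scored against the known answers. Below is a minimal sketch of that kind of scoring, assuming standard precision/recall/F1 metrics; the field names, data layout, and toy batch are illustrative assumptions, not the study's actual scoring code or data.

```python
from dataclasses import dataclass

# Illustrative record only: the study does not publish its scoring code,
# so these field names are assumptions.
@dataclass
class Submission:
    is_plagiarized: bool   # ground-truth label from the educator
    flagged: bool          # whether the tool under test flagged it

def detection_metrics(results: list[Submission]) -> dict[str, float]:
    """Score a tool's flags against ground truth with precision/recall/F1."""
    tp = sum(r.flagged and r.is_plagiarized for r in results)
    fp = sum(r.flagged and not r.is_plagiarized for r in results)
    fn = sum(not r.flagged and r.is_plagiarized for r in results)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy batch of 5 labeled submissions (not the study's real data).
batch = [
    Submission(is_plagiarized=True, flagged=True),
    Submission(is_plagiarized=True, flagged=False),
    Submission(is_plagiarized=False, flagged=False),
    Submission(is_plagiarized=True, flagged=True),
    Submission(is_plagiarized=False, flagged=True),
]
print(detection_metrics(batch))
```

In a setup like this, each of the 10+ tools would be run over the full 5,000-submission set and scored the same way, making the results directly comparable.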
Comprehensive Testing
10+ tools tested, including free checkers, text-based plagiarism detectors, local/self-hosted tools, and specialized code detection systems.
Global Sources
Submissions collected from educators at multiple universities worldwide, representing diverse programming assignments and skill levels.