Your Website's JavaScript Was Stolen Last Month

You launch a slick, interactive data visualization component. Six weeks later, a direct competitor rolls out an update. Their new dashboard module feels… familiar. Too familiar. The animation easing is identical. The edge case it fails on is the same bizarre one you spent a day fixing. The payload structure it expects matches your internal API spec.

Your JavaScript has been stolen.

This isn't about copying a few lines from Stack Overflow. This is the wholesale theft of proprietary client-side logic, the unique IP that differentiates your web application. The code will be minified, obfuscated, and possibly transpiled. A simple `diff` or text search is useless. You need a forensic method.

Web code plagiarism is a silent epidemic. It's easier to steal 50KB of minified JS than to hire a senior front-end developer for six months. Most victims never know.

This guide is for engineering leads, CTOs, and senior developers who need to move from suspicion to evidence. We’ll walk through a seven-step tactical workflow to dissect stolen web code. You’ll need Chrome DevTools, Node.js, a basic understanding of abstract syntax trees (ASTs), and about an hour.

Step 1: Gather Your Evidence — The Source Corpus

First, isolate the potentially stolen asset. Be specific. Is it the React hook managing real-time form validation? The WebGL shader for the product configurator? The vanilla JS module handling your custom virtualized list?

Locate your original source. Pull the exact version you suspect was copied from your repo. If you've updated since, use git to check out the historical version. You need the source before minification.

Capture the suspect's code. Open their site in Chrome. In DevTools (F12), navigate to the Sources tab. You'll likely see a single, minified `.js` file. Right-click it and select Save as... to download it. Name it `suspect.min.js`. Also, use the Network tab to note any other relevant script files.

If the code is behind authentication, you'll need to script this. Use Puppeteer:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  
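  // Navigate and log in if needed (selectors and credentials are placeholders)
  await page.goto('https://competitor.com/login');
  await page.type('#email', 'you@example.com');
  await page.type('#password', 'password');
  // Start waiting for the navigation before clicking, to avoid a race
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);

  // Go to the page with the suspect code
  await page.goto('https://competitor.com/feature');

  // Extract all script content: fetch external files, read inline scripts directly
  const scripts = await page.evaluate(() =>
    Promise.all(
      Array.from(document.scripts).map(async (script) => ({
        src: script.src || null,
        // A cross-origin fetch can be blocked by CORS; record null in that case
        code: script.src
          ? await fetch(script.src).then((res) => res.text()).catch(() => null)
          : script.textContent,
      }))
    )
  );

  console.log(scripts);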
  await browser.close();
})();

Step 2: Normalize the Battlefield — De-minify and De-obfuscate

You can't compare a novel to a pile of alphabet soup. Minification strips whitespace, shortens variables, and destroys formatting. Obfuscation actively tries to mislead. We must reverse this.

Use a reliable de-minifier. For JavaScript, Prettier is your first tool. It won't recover original names, but it will reformat the code into a readable structure.

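# Using prettier via CLI; without --write, the formatted code goes to stdout
npx prettier --parser babel suspect.min.js > suspect_normalized.js

This creates a `suspect_normalized.js` file with proper indentation and leaves the downloaded `suspect.min.js` untouched as evidence. Now, look for patterns of obfuscation: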

Hex-encoded strings: `\x68\x65\x6c\x6c\x6f` instead of `"hello"`.
Array-based string lookups: `_0x1a2b3c[0x45]` where `_0x1a2b3c` is an array of strings.
Excessive function wrapping: `(function(_0x12ab, _0x34cd){ ... })(0xfa, 0xde);`

For simple obfuscation, you can write a small script to decode hex strings or evaluate simple array lookups. For heavy obfuscation, tools like de4js or browser-based deobfuscators can help, but be cautious running unknown code.
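For the two simplest patterns, the small script mentioned above might look like the following. This is a minimal sketch, not a general deobfuscator: the function names are made up for illustration, and it assumes you have already copied the string array out of the suspect file by hand. It works on text only and never executes the suspect code.

```javascript
// Decode \xNN escapes, but leave quotes, backslashes, and control characters
// escaped so the surrounding string literals stay valid.
function decodeHexEscapes(code) {
  return code.replace(/\\x([0-9a-fA-F]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    return /[ -~]/.test(ch) && !/["'\\`]/.test(ch) ? ch : match;
  });
}

// Replace lookups such as _0x1a2b3c[0x1] with the quoted string they point to.
// Assumes arrayName contains no regex metacharacters (for example `$`).
function inlineArrayLookups(code, arrayName, strings) {
  const lookup = new RegExp(arrayName + '\\[(0x[0-9a-fA-F]+|\\d+)\\]', 'g');
  return code.replace(lookup, (whole, index) => {
    const value = strings[Number(index)];
    return value === undefined ? whole : JSON.stringify(value);
  });
}

const sample = 'log("\\x68\\x65\\x6c\\x6c\\x6f"); el[_0x1a2b3c[0x1]]();';
const decoded = inlineArrayLookups(
  decodeHexEscapes(sample),
  '_0x1a2b3c',
  ['click', 'focus']
);
console.log(decoded); // log("hello"); el["focus"]();
```

Obfuscators that rotate or shift the string array at runtime will defeat the lookup step, so treat a clean result as a bonus rather than a guarantee.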

The goal isn't perfect reconstruction. It's to get the logic into a comparable state. Your output should be a file named `suspect_normalized.js`.

Step 3: Abstract the Logic — Generate Syntax Trees

This is the core of the forensic method. We stop looking at text and start looking at structure. A thief can change every variable name, but the logical skeleton—the Abstract Syntax Tree (AST)—often remains intact.

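We'll use `esprima` to parse JavaScript, both the normalized suspect code and your original source. If esprima rejects newer syntax, a parser such as `acorn` or `@babel/parser` can be substituted. Python's `ast` module parses only Python, so it is relevant only if the logic in question began life as a Python prototype.

For a Python prototype of your original (if applicable):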

import ast

with open('your_original_code.py', 'r') as f:
    source = f.read()

tree = ast.parse(source)
# Now you can walk this tree, extracting function definitions, control flow, etc.

For the normalized suspect JavaScript:

const esprima = require('esprima');
const fs = require('fs');

const jsCode = fs.readFileSync('suspect_normalized.js', 'utf8');
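// parseScript handles classic scripts; use esprima.parseModule if the file has import/export statements
const ast = esprima.parseScript(jsCode, { tolerant: true });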

fs.writeFileSync('suspect_ast.json', JSON.stringify(ast, null, 2));

Do the same for your original JavaScript source, generating `original_ast.json`. You now have two JSON files representing the code's bones.

Step 4: Identify the Fingerprint — Key Structural Markers

Even when the order of the logic has been changed, certain complex structures act like fingerprints. They are distinctive and unlikely to be recreated independently. Search both ASTs for these markers:

Specific Algorithmic Flaws: Did you implement a slightly non-standard version of A* search with a particular heuristic? That's a fingerprint. Look for the function structure and the unusual condition inside the main loop.
Bizarre Workarounds: That hack you added for iOS Safari 14.2? The one involving `setTimeout` with a 17ms delay and a `transform: translateZ(0)`? That's a smoking gun. Find the sequence of operations.
Custom Data Structure Shapes: An object with properties `{ hash: string, meta: { rev: number, src: string }, nodes: Array }` is generic. An object with `{ _c: string, _m: { _r: number, _s: string }, _n: Array }` where the shorthand names match your internal convention? Highly suspect.
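Comment Artifacts: Sometimes, comments survive minification, most often license banners written as `/*! ... */`. Search `suspect_normalized.js` for leftover comments such as `// TODO`, `// FIXME`, or `// HACK for Chrome 91`, and for distinctive string literals (error messages, log text) copied from your source.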

Create a checklist. For each potential fingerprint, note its location in your AST and search the suspect's AST for a structurally isomorphic pattern.
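As one concrete checklist item, here is a rough sketch of searching an AST for the `setTimeout` workaround with the 17ms delay. It assumes an ESTree-style AST as plain JSON (the shape esprima writes out); `walk` and `findTimerCalls` are illustrative names, and the tiny hand-built tree stands in for `suspect_ast.json`.

```javascript
// Visit every typed node in an ESTree-style AST (plain objects and arrays).
function walk(node, visit) {
  if (node === null || typeof node !== 'object') return;
  if (typeof node.type === 'string') visit(node);
  for (const child of Object.values(node)) walk(child, visit);
}

// Find calls such as setTimeout(fn, 17) or window.setTimeout(fn, 17).
// Global names and literal values are not renamed by minifiers.
function findTimerCalls(ast, delayMs) {
  const hits = [];
  walk(ast, (node) => {
    if (node.type !== 'CallExpression') return;
    const callee = node.callee;
    let name = null;
    if (callee.type === 'Identifier') name = callee.name;
    else if (callee.type === 'MemberExpression' && !callee.computed) name = callee.property.name;
    const delay = node.arguments[1];
    if (name === 'setTimeout' && delay && delay.type === 'Literal' && delay.value === delayMs) {
      hits.push(node);
    }
  });
  return hits;
}

// Hand-built stand-in for a parsed suspect file: two timers, one with 17ms.
const timerCall = (fnName, delay) => ({
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: { type: 'Identifier', name: 'setTimeout' },
    arguments: [{ type: 'Identifier', name: fnName }, { type: 'Literal', value: delay }],
  },
});
const sampleAst = { type: 'Program', body: [timerCall('a', 17), timerCall('b', 100)] };

console.log(findTimerCalls(sampleAst, 17).length); // 1
```

A single hit proves little on its own. The value comes from finding the same unusual literal next to the same surrounding operations that appear in your original.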

Step 5: Isolate the Clone — Function and Block Matching

Now we perform a targeted comparison. Let's say you suspect the `calculateRiskScore` function was stolen.

Extract its core logic block from your AST. Strip out variable names. Represent it as a sequence of node types and their relationships. For example, your function might have the pattern: [FunctionDeclaration] -> [IfStatement] -> [BinaryExpression] -> [CallExpression] -> [MemberExpression].
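One way to strip names and keep only node types is to reduce each subtree to a "skeleton" string. The sketch below assumes ESTree-style nodes; `skeleton` and the hand-built `buildFn` helper are illustrative, not part of any library. Two functions with different identifiers but the same shape produce the same string.

```javascript
// Reduce a subtree to its node types, discarding names, values, and positions.
// Anything without a string `type` (loc, range, primitives) contributes nothing.
function skeleton(node) {
  if (Array.isArray(node)) return node.map(skeleton).filter(Boolean).join(',');
  if (node === null || typeof node !== 'object' || typeof node.type !== 'string') return '';
  const children = Object.keys(node)
    .filter((key) => key !== 'type')
    .map((key) => skeleton(node[key]))
    .filter(Boolean);
  return children.length ? `${node.type}(${children.join(',')})` : node.type;
}

// Hand-built AST for: function NAME(P) { if (P > 1) return CALLEE(P); }
const buildFn = (name, param, callee) => ({
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name },
  params: [{ type: 'Identifier', name: param }],
  body: {
    type: 'BlockStatement',
    body: [{
      type: 'IfStatement',
      test: {
        type: 'BinaryExpression',
        operator: '>',
        left: { type: 'Identifier', name: param },
        right: { type: 'Literal', value: 1 },
      },
      consequent: {
        type: 'ReturnStatement',
        argument: {
          type: 'CallExpression',
          callee: { type: 'Identifier', name: callee },
          arguments: [{ type: 'Identifier', name: param }],
        },
      },
      alternate: null,
    }],
  },
});

const mine = skeleton(buildFn('calculateRiskScore', 'input', 'lookupWeight'));
const theirs = skeleton(buildFn('a', 'b', 'c'));
console.log(mine === theirs); // true
```

Operators are dropped along with names here, because a minifier may rewrite `a > 1` as `1 < a`. Keeping them would make the match stricter at the cost of more false negatives.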

Write a small script to traverse the suspect's AST looking for a subtree that matches this pattern. A tool like ast-grep or a hand-written recursive traversal works here.

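// Sketch of AST subtree matching on ESTree-style nodes.
// A pattern is a partial node, e.g. { type: 'IfStatement', test: { type: 'BinaryExpression' } }.
// Only the keys present in the pattern are compared, so names are ignored unless you list them.
function findPattern(node, pattern) {
  if (pattern === null || typeof pattern !== 'object') return node === pattern;
  if (node === null || typeof node !== 'object') return false;
  if (Array.isArray(pattern)) {
    // Each pattern element must match the node at the same position
    return Array.isArray(node) && node.length >= pattern.length &&
           pattern.every((p, i) => findPattern(node[i], p));
  }
  return Object.keys(pattern).every((key) => findPattern(node[key], pattern[key]));
}

// Collect every subtree of the AST that matches the pattern
function search(node, pattern, hits = []) {
  if (node === null || typeof node !== 'object') return hits;
  if (!Array.isArray(node) && findPattern(node, pattern)) hits.push(node);
  for (const child of Object.values(node)) search(child, pattern, hits);
  return hits;
}

// Search the suspect AST (both inputs come from the earlier steps)
const matches = search(suspectAST, myFunctionPattern);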

This is where platforms like Codequiry perform at scale, using multiple algorithms (token-based, tree-based, fingerprinting) to find these non-textual similarities automatically across vast codebases.
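To make "fingerprinting" less abstract, here is a minimal sketch of one common approach: compare the sets of k-grams (runs of k consecutive node types) from two functions using Jaccard similarity. This illustrates the general idea only and is not a description of how any particular product works. The type sequences are invented for the example.

```javascript
// Build the set of k-grams from a sequence of node types.
function kgrams(types, k) {
  const grams = new Set();
  for (let i = 0; i + k <= types.length; i++) {
    grams.add(types.slice(i, i + k).join('>'));
  }
  return grams;
}

// Jaccard similarity: shared k-grams divided by all distinct k-grams.
function jaccard(a, b) {
  const shared = [...a].filter((gram) => b.has(gram)).length;
  const total = new Set([...a, ...b]).size;
  return total === 0 ? 0 : shared / total;
}

const originalTypes = [
  'FunctionDeclaration', 'IfStatement', 'BinaryExpression',
  'CallExpression', 'MemberExpression',
];
// Same sequence with one extra statement appended by the copier
const suspectTypes = [...originalTypes, 'ReturnStatement'];

const score = jaccard(kgrams(originalTypes, 2), kgrams(suspectTypes, 2));
console.log(score); // 0.8
```

A high score on a short, generic function means little. The measure is more persuasive on long functions whose type sequences are unlikely to arise by chance.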

Step 6: Construct the Timeline — Prove Access and Opportunity

Technical similarity is one pillar. You must also establish the possibility of copying. This is critical for any legal or official action.

Repository Dates: Your commit history shows the function was created on 2023-11-05.
Deployment Dates: Your feature went live on your public-facing site on 2023-11-20.
Their Deployment: Their feature launched on 2024-01-15.
Access Proof: Can you prove they visited your public feature? Server logs might show requests from IP addresses in their company's range. More commonly, the public availability of your source (via "View Source") is sufficient.

Create a simple timeline diagram. The gap between your public release and their launch is the "window of theft."
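The arithmetic behind the timeline is simple enough to script, which keeps the numbers in your report reproducible. A small sketch using the example dates above (`daysBetween` is an illustrative helper name):

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between two ISO dates (YYYY-MM-DD).
// Date-only ISO strings are parsed as UTC, so daylight saving time does not skew the result.
function daysBetween(fromIso, toIso) {
  return Math.round((Date.parse(toIso) - Date.parse(fromIso)) / MS_PER_DAY);
}

const timeline = [
  { event: 'Function first committed', date: '2023-11-05' },
  { event: 'Your feature goes public', date: '2023-11-20' },
  { event: 'Their feature launches', date: '2024-01-15' },
];

for (const { event, date } of timeline) {
  console.log(`${date}  ${event}`);
}

const windowDays = daysBetween(timeline[1].date, timeline[2].date);
console.log(`Window of theft: ${windowDays} days`); // Window of theft: 56 days
```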

Step 7: Document and Act — The Forensic Report

Compile your findings into a clear, technical report. This isn't a rant. It's evidence.

Structure your report:

Executive Summary: One paragraph stating the claim.
Methodology: Briefly describe the AST normalization and comparison process.
Evidence Section A (Structural): Side-by-side code snippets (after normalization) with matching structures highlighted. Include the AST subtree diagrams. Number each matching fingerprint (1-5).
Evidence Section B (Timeline): The deployment timeline chart.
Conclusion: A statement that the combination of unique structural fingerprints and temporal opportunity indicates derivation.

What to do with the report?

Internal: Use it to start a conversation about code protection: stricter license headers, more aggressive obfuscation for critical bundles, or implementing code watermarking techniques.
Legal: Provide it to your legal counsel. They may use it to draft a cease-and-desist letter. Your clear technical documentation strengthens their position immeasurably.
Public: Almost never. Public shaming can backfire and lead to protracted, damaging conflicts.

Moving Forward: Protection Over Detection

Catching theft is satisfying. Preventing it is smarter.

Implement build-time watermarking: Inject unique, recoverable identifiers into your bundled code that are hard to strip. This can be as simple as a unique constant in a closure or a complex steganographic technique within string literals.

// At build time, inject a fingerprint
const BUILD_FINGERPRINT = {
  v: '1.0',
  id: 'a1b2c3d4-e5f6-7890-abcd-ef1234567890',
  ts: 1700000000000
};
Object.freeze(BUILD_FINGERPRINT);
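// Then, somewhere in your init, reference it from code that has an effect.
// An unused binding such as `const _ = BUILD_FINGERPRINT.id;` is likely to be
// removed by the minifier's dead-code elimination, taking the fingerprint with it.
console.debug('build', BUILD_FINGERPRINT.id);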
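A watermark is only useful if you can look for it later. The sketch below checks a downloaded bundle for the fingerprint id from the snippet above, including the case where the copier's obfuscator has hex-escaped the string. `containsFingerprint` is an illustrative name, and the sample bundles are invented.

```javascript
// Return true if the id appears in the code, either as plain text or written as \xNN escapes.
function containsFingerprint(code, id) {
  const plain = code.replace(/\\x([0-9a-fA-F]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
  return plain.includes(id);
}

const id = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890';

// The same id as an obfuscator might emit it: every character as \xNN
const escapedId = [...id]
  .map((ch) => '\\x' + ch.charCodeAt(0).toString(16).padStart(2, '0'))
  .join('');

const plainBundle = `var a={v:"1.0",id:"${id}",ts:17e11};`;
const obfuscatedBundle = `var a={id:"${escapedId}"};`;
const unrelatedBundle = 'var a={id:"00000000-0000-0000-0000-000000000000"};';

console.log(containsFingerprint(plainBundle, id));      // true
console.log(containsFingerprint(obfuscatedBundle, id)); // true
console.log(containsFingerprint(unrelatedBundle, id));  // false
```

This catches only verbatim or hex-escaped copies. An obfuscator that splits the string into pieces would hide it, which is one argument for the harder-to-strip techniques mentioned above.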

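License your client-side code: Add a prominent software license comment at the top of your main bundle, even if minified. Minifiers such as Terser keep comments that start with `/*!` or contain `@license` by default, so the notice can survive the build. It puts anyone reading the bundle on notice that the code is not free to reuse.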

Monitor automatically: Set up a periodic scan. Once a month, use your own forensic pipeline (or a dedicated service) to check competitor sites for structural matches to your core modules. Make it a dashboard metric: "Code Integrity Risk."

Web code plagiarism isn't a theoretical issue. It's a direct transfer of value from your engineering team to a competitor. By adopting a forensic, evidence-based approach, you shift from feeling violated to being in control. You get the proof. Then you decide what to do with it.