Content Analysis Summary

Content Optimized
15 Comparisons
2025-07-10T23:21:01.520Z

Processing Performance

260 Pages Processed
2s Avg Page Time
531ms Processing Speed

Content Analysis Algorithms

Jaccard Similarity

Measures the similarity between two sets as the size of their intersection divided by the size of their union. Well suited to comparing shared versus unique content elements. Higher scores indicate more overlap.
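As a rough illustration of the set-based measure described above, here is a minimal sketch that assumes each page is reduced to its set of lowercase word tokens. The report does not say which elements its tool actually compares, so the tokenizer and the helper names `word_set` and `jaccard_similarity` are hypothetical.

```python
import re


def word_set(text: str) -> set[str]:
    """Return the distinct lowercase word tokens in the text (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    set_a, set_b = word_set(a), word_set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


page_a = "search public records by name"
page_b = "search public records by phone number"
# 4 shared words out of 7 distinct words overall.
print(f"{jaccard_similarity(page_a, page_b):.2%}")
```

Because the measure works on sets, repeating a word on one page does not change the score; only the presence or absence of each element matters.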

Cosine Similarity

Calculates the cosine of the angle between two document vectors. Excellent for comparing text content regardless of document length. Values closer to 1 indicate very similar content themes.
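The length-independence claim above can be checked with a small sketch. It assumes plain term-frequency vectors; the report does not state how its tool builds document vectors (it could use TF-IDF or another weighting), and the names `term_counts` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Raw term-frequency vector for the text (assumed weighting)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    vec_a, vec_b = term_counts(a), term_counts(b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


short_doc = "people search"
long_doc = "people search people search"  # same proportions, twice the length
print(cosine_similarity(short_doc, long_doc))
print(cosine_similarity(short_doc, "weather forecast"))
```

Doubling a document scales its vector without changing its direction, so the first comparison scores essentially 1 while the two unrelated texts score 0.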

Fingerprint Matching

Uses content hashing to detect exact or near-exact duplicate content blocks. Highly sensitive to copy-paste scenarios. Even small nonzero scores indicate some verbatim overlap and warrant review for possible copying.
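One common way to implement hash-based duplicate detection is to hash overlapping word windows ("shingles") and count the hashes two pages share. The sketch below assumes that approach with 5-word shingles and SHA-1; the report does not describe its tool's actual hashing scheme, and `shingle_hashes` and `fingerprint_match` are hypothetical names.

```python
import hashlib
import re


def shingle_hashes(text: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive words (assumed shingle size)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    shingles = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {hashlib.sha1(s.encode("utf-8")).hexdigest() for s in shingles}


def fingerprint_match(a: str, b: str, k: int = 5) -> float:
    """Share of the smaller page's shingle hashes that also occur in the other page."""
    hashes_a, hashes_b = shingle_hashes(a, k), shingle_hashes(b, k)
    if not hashes_a or not hashes_b:
        return 0.0
    return len(hashes_a & hashes_b) / min(len(hashes_a), len(hashes_b))


copied = "find anyone's contact details and public records in seconds"
page_a = "Welcome. " + copied
page_b = copied + " Try it today."
paraphrase_a = "look up public records by name quickly"
paraphrase_b = "quickly look up records that are public by name"

print(fingerprint_match(page_a, page_b))              # verbatim block is shared
print(fingerprint_match(paraphrase_a, paraphrase_b))  # reworded, no shared shingle
```

Under these assumptions a pasted block produces many identical hashes, while a paraphrase that reorders the words produces none, which is consistent with fingerprint scores staying near 0% even where the other measures are high.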

Semantic Analysis

Uses AI models to understand meaning and context beyond keywords. Detects paraphrased or rewritten content that maintains similar meaning. High scores indicate conceptual similarity.

Topic Modeling

Identifies underlying topics and themes across content. Groups content by subject matter similarity. Higher scores indicate sites covering similar topic areas or categories.

Content Similarity Analysis

Comparison                                                        Overall  Jaccard  Cosine  Fingerprint  Semantic  Topic
instantcheckmate_20250710_1539_vs_truthfinder_20250710_1544       67%      76%      92%     1%           84%       78%
intelius_20250710_1541_vs_truthfinder_20250710_1544               66%      67%      92%     2%           93%       76%
instantcheckmate_20250710_1539_vs_intelius_20250710_1541          63%      66%      91%     1%           81%       73%
intelius_20250710_1541_vs_whitepages_20250710_1546                34%      20%      75%     0%           29%       48%
instantcheckmate_20250710_1539_vs_whitepages_20250710_1546        34%      20%      74%     0%           31%       43%
beenverified_20250710_1538_vs_intelius_20250710_1541              32%      32%      69%     0%           10%       50%
truthfinder_20250710_1544_vs_whitepages_20250710_1546             32%      20%      70%     0%           29%       40%
beenverified_20250710_1538_vs_truthfinder_20250710_1544           32%      33%      68%     0%           9%        48%
beenverified_20250710_1538_vs_instantcheckmate_20250710_1539      31%      32%      67%     0%           10%       44%
beenverified_20250710_1538_vs_whitepages_20250710_1546            23%      17%      53%     0%           10%       37%
instantcheckmate_20250710_1539_vs_truepeoplesearch_20250710_1543  9%       1%       30%     0%           0%        10%
intelius_20250710_1541_vs_truepeoplesearch_20250710_1543          8%       1%       28%     0%           0%        9%
truepeoplesearch_20250710_1543_vs_truthfinder_20250710_1544       8%       2%       27%     0%           0%        10%
beenverified_20250710_1538_vs_truepeoplesearch_20250710_1543      7%       2%       21%     0%           0%        9%
truepeoplesearch_20250710_1543_vs_whitepages_20250710_1546        3%       1%       10%     0%           0%        5%
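
The report does not state how Overall Similarity is derived from the five algorithm scores. The figures above are consistent, to within one percentage point, with an unweighted mean of the five displayed scores; that is an inference from the numbers, not documented behavior. The sketch below checks it on four of the comparisons, and `estimate_overall` is a hypothetical helper.

```python
from statistics import mean

# Displayed percentages copied from four comparisons in this report, in the
# order Jaccard, Cosine, Fingerprint, Semantic, Topic.
comparisons = {
    "instantcheckmate_20250710_1539_vs_truthfinder_20250710_1544": (67, [76, 92, 1, 84, 78]),
    "intelius_20250710_1541_vs_truthfinder_20250710_1544": (66, [67, 92, 2, 93, 76]),
    "beenverified_20250710_1538_vs_whitepages_20250710_1546": (23, [17, 53, 0, 10, 37]),
    "truepeoplesearch_20250710_1543_vs_whitepages_20250710_1546": (3, [1, 10, 0, 0, 5]),
}


def estimate_overall(scores: list[int]) -> float:
    """Unweighted mean of the algorithm scores (assumed formula)."""
    return mean(scores)


for name, (reported, scores) in comparisons.items():
    estimate = estimate_overall(scores)
    # The inputs are already rounded for display, so small gaps are expected.
    print(f"{name}: reported {reported}%, mean of scores {estimate:.1f}%")
```

If the assumption holds, the near-zero Fingerprint scores pull every overall figure down by a fixed share, which would explain why the top pair reaches only 67% overall despite 92% cosine similarity.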

Content Analysis Insights

🚨 Risk Assessment

🔴 -- Critical Similarity
🟠 -- High Concern
🟡 -- Moderate Risk
🟢 -- Acceptable Difference

📊 Algorithm Performance

Most Sensitive: Cosine
Least Sensitive: Fingerprint
Best Detector: Semantic

🔍 Content Patterns

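InstantCheckmate vs TruthFinder shows the highest overall similarity (67%)
TruePeopleSearch is the most distinct site (3% to 9% overall similarity against every other site)
3 sites form a similarity cluster: InstantCheckmate, TruthFinder, and Intelius (63% to 67% overall similarity with each other)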

💡 Recommendations

CRITICAL: Investigate high-similarity pairs for potential copying
HIGH: Review semantic matches for paraphrased content
MEDIUM: Monitor cosine scores for content theme overlap