🔍 MrAnalysis Engine

Site Comparison Engine

Analyze and compare directory pages across multiple domains. Ensure unique content footprints while avoiding search engine penalties for duplicate content.

5 Analysis Dimensions
99% Accuracy Rate
7 Comparisons Analyzed
Latest: Aug 20, 2025, 04:43 PM - Analyze 20250820164057

Visual Analysis

Overall Average: 78% (High, HIGH RISK)

  • Color Similarity: 77% (overall color scheme matching across pages)
  • Typography: 97% (font families and text styling consistency)
  • Layout Structure: 99% (page layout and element positioning)

Key Insights

  • 🚨 Extremely high layout similarity - potential design copying
  • 📝 Typography patterns show high similarity
  • 🎨 Color palette analysis completed

Content Analysis

Overall Average: 64% (Moderate, MODERATE RISK)

  • Jaccard Similarity: 65% (content overlap and duplicate text analysis)
  • Cosine Similarity: 88% (semantic content meaning and context matching)
  • Topic Modeling: 65% (thematic content analysis and subject matter)

Key Insights

  • 📄 Semantic content analysis completed
  • 📄 Moderate content overlap detected

Technical Analysis

Overall Average: 83% (High, HIGH RISK)

  • HTML Structure: 100% (page markup structure and DOM hierarchy)
  • Meta Tags: 64% (meta information and head tag matching)
  • Frameworks: 100% (JavaScript frameworks and library usage)

Key Insights

  • 🚨 Extremely high HTML structure similarity - potential template copying
  • ⚡ Perfect framework match - identical technical stack
  • 🏷️ Meta tag analysis completed

SEO Analysis

Overall Average: 64% (Moderate, MODERATE RISK)

  • Duplicate Content Risk: 64% (risk of search engine duplicate content penalties)
  • Keyword Cannibalization: 13% (pages competing for the same search terms)
  • Technical SEO Issues: 27% (missing meta tags and technical SEO problems)

Key Insights

  • ⚠️ MODERATE duplicate content risk detected (64%)
  • 🔑 6 high-priority SEO issues identified

Why Choose Our Analysis Engine?

🎯 Unique Content Footprint

Ensure each domain maintains distinct content while avoiding search engine penalties for duplication.

Multi-Dimensional Analysis

Five-layer comparison across visual, content, technical, structural, and SEO dimensions for complete insights.

Scalable Performance

Analyze thousands of pages efficiently with memory-optimized processing and batch operations.

📊 Actionable Reports

Content audit-ready insights with clear recommendations for improving uniqueness.

🔍 SEO Optimization

Comprehensive SEO analysis with duplicate content detection, keyword cannibalization prevention, and actionable improvement roadmaps.

📊 Priority-Based Actions

Get critical, high, medium, and low priority action items with specific timelines and implementation roadmaps.

How The Analysis Works

Our engine analyzes websites across five key dimensions to provide comprehensive similarity detection.

1. 🌐 Scraping

Capture website data

Visits each URL and captures content, visual elements, and technical metadata across desktop, tablet, and mobile viewports with Cloudflare bypass.

Multi-viewport, Anti-detection, Auto-retry

2. 🎨 Visual Analysis

Compare design & layout

Analyzes colors, typography, layouts, responsive design patterns, and visual hierarchy to detect template usage and design system similarities.

Color palettes, Typography, Layout detection

3. 📝 Content Analysis

Examine text similarity

Uses advanced NLP algorithms including Jaccard similarity, cosine similarity, content fingerprinting, and semantic analysis to compare meaning and structure.

NLP algorithms, Semantic analysis, Topic modeling

4. ⚙️ Technical Analysis

Inspect code structure

Examines HTML structure, frameworks, meta tags, schema markup, and technical implementation patterns to identify shared technologies.

Framework detection, Meta analysis, Schema markup

5. 🔍 SEO Analysis

Optimize for search engines

Produces SEO recommendations covering duplicate content detection, keyword cannibalization analysis, and improvement plans organized into priority-based roadmaps.

Duplicate detection, SEO scoring, Action plans

6. 📝 Final Reporting

Comprehensive, consolidated reports presented in a clear and actionable format

Our final reports aggregate all analysis results, providing a unified overview of content, visual, technical, and SEO findings. Each report is structured for readability, highlighting key issues, recommended actions, and prioritized next steps to support efficient decision-making and implementation.

Analysis of domains and pages Prioritized Recommendations Overview Plans

Understanding the Metrics

📝 Content Metrics

Jaccard Similarity

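Measures word overlap between two pages to identify copied or templated content: the number of unique words the pages share, divided by the number of unique words across both. High scores (above 0.8, i.e. 80%) indicate very similar vocabulary.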
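
As a rough illustration of the metric, here is a minimal sketch in JavaScript. The function names and the simple lowercase word tokenizer are assumptions made for this example; the engine's actual tokenization is not documented here.

```javascript
// Split text into a set of unique lowercase words.
// Assumption: a plain alphanumeric tokenizer, chosen only for illustration.
function tokenSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, a value between 0 and 1.
function jaccardSimilarity(textA, textB) {
  const a = tokenSet(textA);
  const b = tokenSet(textB);
  if (a.size === 0 && b.size === 0) return 1; // treat two empty pages as identical
  let shared = 0;
  for (const word of a) {
    if (b.has(word)) shared += 1;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return shared / (a.size + b.size - shared);
}
```

For example, "best plumbers in Austin" and "best plumbers in Dallas" share 3 of 5 unique words, so the sketch returns 0.6.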

Cosine Similarity

Measures document similarity as the cosine of the angle between word-frequency vectors. Unlike Jaccard, it accounts for how often each word occurs, not just whether it is present.
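
The sketch below shows one way to compute this over raw word counts. It is a simplified illustration, not the engine's implementation: the function names are hypothetical, and a production system might well weight terms (for example with TF-IDF) rather than use plain counts.

```javascript
// Count how often each lowercase word occurs in the text.
// Assumption: plain word counts, with no stop-word removal or TF-IDF weighting.
function termFrequencies(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity: dot(A, B) / (|A| * |B|) over the word-count vectors.
function cosineSimilarity(textA, textB) {
  const a = termFrequencies(textA);
  const b = termFrequencies(textB);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [word, count] of a) {
    dot += count * (b.get(word) ?? 0);
    normA += count * count;
  }
  for (const count of b.values()) {
    normB += count * count;
  }
  if (normA === 0 || normB === 0) return 0; // an empty document matches nothing
  return dot / Math.sqrt(normA * normB);
}
```

The texts "a a b" and "a b b" use exactly the same vocabulary, so a Jaccard comparison would rate them identical, while this sketch returns 0.8 because the word frequencies differ.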

Semantic Analysis

Meaning and context similarity across headings, paragraphs, and navigation structure.

🎨 Visual Metrics

Color Similarity

RGB color distance analysis across backgrounds, text, and accent colors. High scores suggest a shared design system.
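
To make "RGB color distance" concrete, the following sketch converts the Euclidean distance between two colors into a 0 to 1 similarity score. The normalization by the black-to-white distance and the function names are assumptions for illustration; how the engine aggregates scores across a full palette is not described here.

```javascript
// Parse a "#rrggbb" string into [r, g, b] components (0-255 each).
function hexToRgb(hex) {
  const value = parseInt(hex.replace("#", ""), 16);
  return [(value >> 16) & 255, (value >> 8) & 255, value & 255];
}

// Largest possible RGB distance: black (#000000) to white (#ffffff).
const MAX_RGB_DISTANCE = Math.sqrt(3 * 255 * 255);

// Similarity = 1 - (Euclidean RGB distance / maximum distance).
// Assumption: linear normalization; perceptual color spaces are not considered.
function colorSimilarity(hexA, hexB) {
  const [r1, g1, b1] = hexToRgb(hexA);
  const [r2, g2, b2] = hexToRgb(hexB);
  const distance = Math.hypot(r1 - r2, g1 - g2, b1 - b2);
  return 1 - distance / MAX_RGB_DISTANCE;
}
```

Identical colors score 1, and black against white scores approximately 0. Plain RGB distance does not track human color perception closely, so a score like this is best read as a coarse signal.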

Typography Matching

Font families, sizes, and weights comparison to detect shared typographic systems.

Responsive Design

Consistency analysis across viewports to identify shared responsive frameworks.

⚙️ Technical Metrics

Framework Detection

Identifies React, Vue, Angular, jQuery, Bootstrap usage with weighted scoring for major frameworks.
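
One plausible reading of "weighted scoring" is a weighted overlap between the frameworks detected on each site, sketched below. The weight values and function name are purely hypothetical; the engine's real weights and detection logic are not documented here.

```javascript
// Hypothetical weights: major application frameworks count more than utilities.
// These numbers are illustrative assumptions, not the engine's actual values.
const FRAMEWORK_WEIGHTS = new Map([
  ["react", 3],
  ["vue", 3],
  ["angular", 3],
  ["jquery", 1],
  ["bootstrap", 1],
]);

// Weighted overlap: weight of shared frameworks / weight of all detected frameworks.
function frameworkSimilarity(stackA, stackB) {
  const a = new Set(stackA.map((name) => name.toLowerCase()));
  const b = new Set(stackB.map((name) => name.toLowerCase()));
  let sharedWeight = 0;
  let totalWeight = 0;
  for (const name of new Set([...a, ...b])) {
    const weight = FRAMEWORK_WEIGHTS.get(name) ?? 1; // unknown libraries default to 1
    totalWeight += weight;
    if (a.has(name) && b.has(name)) sharedWeight += weight;
  }
  // Assumption: two sites with no detected frameworks count as a full match.
  return totalWeight === 0 ? 1 : sharedWeight / totalWeight;
}
```

With these example weights, a React plus jQuery site compared against a React plus Bootstrap site scores 3 / 5 = 0.6, since the shared major framework dominates the result.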

Meta Tag Analysis

SEO and metadata comparison including viewport, robots, og:tags, and twitter:cards.

Schema Markup

Structured data implementation analysis for SEO and rich snippet strategies.

🔍 SEO Metrics

Duplicate Content Risk

Critical, high, medium, and low risk scoring for content similarity across domains with actionable recommendations.

Keyword Cannibalization

Detection of conflicting keyword strategies and recommendations for unique content positioning.

SEO Optimization

Title tags, meta descriptions, heading structure, and content quality analysis with specific improvement actions.

SEO Analysis Features

✅ Individual Page Analysis

  • Title Tag Optimization: Length, keyword presence, uniqueness scoring
  • Meta Description Analysis: Call-to-action and keyword optimization
  • Heading Structure: H1-H3 hierarchy and keyword distribution
  • Content Quality: Word count, density, readability metrics
  • Technical SEO: Meta tags, structured data, performance indicators

⚖️ Cross-Site Comparison

  • Duplicate Content Detection: Risk scoring and factor identification
  • Keyword Cannibalization: Conflicting strategy detection
  • Uniqueness Gap Analysis: Content differentiation opportunities
  • Structural Similarity: Layout and element comparison
  • Priority Action Plans: Critical to low priority roadmaps

SEO Scoring System

  • 90-100% (Excellent): Optimal SEO implementation
  • 70-89% (Good): Minor improvements needed
  • 50-69% (Moderate): Significant optimization required
  • 0-49% (Poor): Major improvements needed

Similarity Classifications

  • 90-100% 🚨 IDENTICAL: Likely duplicate sites or same template. Immediate action required to avoid SEO penalties.
  • 70-89% ⚠️ VERY SIMILAR: Strong template/content overlap. Review and differentiate key elements and content.
  • 50-69% 📊 SIMILAR: Moderate similarities with possible shared elements. Monitor and optimize for uniqueness.
  • <50% ✅ DIFFERENT: Sufficiently unique sites. Good differentiation achieved across analyzed dimensions.
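
The bands above can be expressed as a small lookup. This is a sketch with a hypothetical function name; it assumes scores are percentages from 0 to 100 and that fractional values (for example 89.5) fall into the band below the next threshold, which the listed ranges do not specify.

```javascript
// Map a similarity percentage (0-100) to the classification bands listed above.
function classifySimilarity(percent) {
  if (percent >= 90) return "IDENTICAL";
  if (percent >= 70) return "VERY SIMILAR";
  if (percent >= 50) return "SIMILAR";
  return "DIFFERENT";
}
```

Under this mapping, a 99% score is classified as IDENTICAL, 78% as VERY SIMILAR, 64% as SIMILAR, and 13% as DIFFERENT.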

Technical Specifications

📋 System Requirements

  • Node.js: v16.0.0 or higher
  • Memory: 8GB RAM minimum (16GB recommended)
  • Storage: 2GB free space for cache
  • Browser: Chrome/Chromium (auto-installed)
  • Network: Stable internet connection

⚡ Performance Features

  • Batch Processing: Configurable batch sizes
  • Memory Optimization: Light versions available
  • Caching: Intelligent analysis caching
  • Retry Logic: Exponential backoff for failures
  • Multi-viewport: Desktop, tablet, mobile analysis
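
As an illustration of the retry logic mentioned in the list above, here is a minimal exponential backoff sketch. The function names, the 500 ms base delay, the 30 second cap, and the attempt count are all assumptions chosen for the example; the engine's actual settings are not documented here.

```javascript
// Delay before retrying after failed attempt number `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs. Values are illustrative assumptions.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run an async task, waiting progressively longer after each failure.
async function withRetry(task, maxAttempts = 4, baseMs = 500) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // No need to wait after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

With the defaults, the waits between attempts are 500 ms, 1000 ms, and 2000 ms. A scraping call could be wrapped as `withRetry(() => fetchPage(url))`, where `fetchPage` is a placeholder for whatever function loads a page. Adding random jitter to each delay is a common refinement when many workers retry at once.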

🚀 Quick Start

# Install dependencies
npm install

# Run complete analysis pipeline
npm run SAR

# Scrape and Analyze from custom folders
npm run SAR -- ./sites_urls --dest=scraped_20250708

# Only scrape, into custom folder
npm run scrape-all ./urls -- --folder=scraped_20250708

Ready to Analyze Your Sites?

Start identifying content similarities and ensuring unique content footprints across your domains.
