
TEXT-AUTH: System Architecture Documentation

TEXT-AUTH is an evidence-first, domain-aware AI text detection system designed around independent signals, calibrated aggregation, and explainability rather than black-box classification.


Table of Contents

  1. System Overview
  2. High-Level Architecture
  3. Layer-by-Layer Architecture
  4. Data Flow
  5. Technology Stack
  6. Deployment Architecture
  7. Performance Characteristics
  8. Security & Privacy

System Overview

TEXT-AUTH is an AI text detection system that combines multiple machine-learning metrics with ensemble aggregation to determine whether text is synthetically generated, authentically written, or hybrid content.

Key Capabilities

  • Multi-Metric Analysis: 6 independent detection metrics (Structural, Perplexity, Entropy, Semantic, Linguistic, Multi-Perturbation Stability)
  • Domain-Aware Calibration: Adaptive thresholds for 16 text domains (Academic, Creative, Technical, etc.)
  • Ensemble Aggregation: Confidence-weighted combination with uncertainty quantification
  • Sentence-Level Highlighting: Visual feedback with probability scores
  • Comprehensive Reporting: JSON and PDF reports with detailed analysis

Design Principles

  • Modular Architecture: Clean separation of concerns across layers
  • Fail-Safe Design: Graceful degradation with fallback strategies
  • Parallel Processing: Multi-threaded metric execution for performance
  • Domain Expertise: Specialized thresholds calibrated per content type

Why Multi-Metric Instead of a Single Classifier?

  • Single classifiers overfit stylistic artifacts
  • LLMs rapidly adapt to detectors
  • Independent statistical signals decay more slowly
  • Ensemble disagreement is itself evidence
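
That last point can be made concrete. A minimal, hypothetical sketch (not the project's actual code) of how the spread of per-metric scores becomes an uncertainty signal instead of being averaged away:

```python
# Hedged sketch: metric names and scores below are illustrative only.
from statistics import mean, pstdev

def aggregate_with_disagreement(scores: dict[str, float]) -> dict[str, float]:
    """scores maps metric name -> P(synthetic) in [0, 1]."""
    values = list(scores.values())
    spread = pstdev(values)                         # disagreement across metrics
    return {
        "p_synthetic": mean(values),                # naive unweighted aggregate
        "uncertainty": spread,                      # high spread => low consensus
        "consensus": 1.0 - min(spread / 0.5, 1.0),  # pstdev of [0,1] scores caps at 0.5
    }

# Five signals agree, one dissents: the dissent surfaces as uncertainty.
print(aggregate_with_disagreement({
    "structural": 0.72, "perplexity": 0.68, "entropy": 0.75,
    "semantic": 0.70, "linguistic": 0.66, "mps": 0.21,
}))
```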

High-Level Architecture

```mermaid
graph TB
    subgraph "Presentation Layer"
        UI[Web Interface/API]
    end

    subgraph "Application Layer"
        ORCH[Detection Orchestrator]
        ORCH --> |coordinates| PIPE[Processing Pipeline]
    end

    subgraph "Service Layer"
        ENSEMBLE[Ensemble Classifier]
        HIGHLIGHT[Text Highlighter]
        REASON[Reasoning Generator]
        REPORT[Report Generator]
    end

    subgraph "Processing Layer"
        EXTRACT[Document Extractor]
        TEXTPROC[Text Processor]
        DOMAIN[Domain Classifier]
        LANG[Language Detector]
    end

    subgraph "Metrics Layer"
        STRUCT[Structural Metric]
        PERP[Perplexity Metric]
        ENT[Entropy Metric]
        SEM[Semantic Metric]
        LING[Linguistic Metric]
        MPS[Multi-Perturbation Stability]
    end

    subgraph "Model Layer"
        MANAGER[Model Manager]
        REGISTRY[Model Registry]
        CACHE[(Model Cache)]
    end

    subgraph "Configuration Layer"
        CONFIG[Settings]
        ENUMS[Enums]
        SCHEMAS[Data Schemas]
        CONSTANTS[Constants]
        THRESHOLDS[Domain Thresholds]
    end

    UI --> ORCH
    
    ORCH --> EXTRACT
    ORCH --> TEXTPROC
    ORCH --> DOMAIN
    ORCH --> LANG
    
    ORCH --> STRUCT
    ORCH --> PERP
    ORCH --> ENT
    ORCH --> SEM
    ORCH --> LING
    ORCH --> MPS
    
    ORCH --> ENSEMBLE
    ENSEMBLE --> HIGHLIGHT
    ENSEMBLE --> REASON
    ENSEMBLE --> REPORT
    
    STRUCT --> MANAGER
    PERP --> MANAGER
    ENT --> MANAGER
    SEM --> MANAGER
    LING --> MANAGER
    MPS --> MANAGER
    DOMAIN --> MANAGER
    LANG --> MANAGER
    
    MANAGER --> REGISTRY
    MANAGER --> CACHE
    
    ORCH --> CONFIG
    ENSEMBLE --> THRESHOLDS

    style UI fill:#e1f5ff
    style ORCH fill:#fff3e0
    style ENSEMBLE fill:#f3e5f5
    style MANAGER fill:#e8f5e9
    style CONFIG fill:#fce4ec
```

Layer-by-Layer Architecture

1. Configuration Layer (config/)

The foundation layer providing enums, schemas, constants, and domain-specific thresholds.

```mermaid
graph LR
    subgraph "Configuration Layer"
        direction TB
        
        ENUMS["enums.py
        Domain, Language, Script, 
        ModelType ConfidenceLevel"]
        
        SCHEMAS["schemas.py
        ModelConfig, ProcessedText, MetricResult, EnsembleResult,
        DetectionResult"]
        
        CONSTANTS["constants.py
        TextProcessingParams, MetricParams,
        EnsembleParams"]
        
        THRESHOLDS["threshold_config.py
        DomainThresholds 16, 
        Domain Configs MetricThresholds"]
        
        MODELCFG["model_config.py
        Model Registry, Model Groups, Default Weights"]
        
        SETTINGS["settings.py
        App Settings, Paths, Feature Flags"]
    end
    
    ENUMS -.->|used by| SCHEMAS
    ENUMS -.->|used by| THRESHOLDS
    SCHEMAS -.->|used by| CONSTANTS
    THRESHOLDS -.->|imports| ENUMS
    MODELCFG -.->|imports| ENUMS
    
    style ENUMS fill:#ffebee
    style SCHEMAS fill:#fff3e0
    style CONSTANTS fill:#e8f5e9
    style THRESHOLDS fill:#e1f5ff
    style MODELCFG fill:#f3e5f5
    style SETTINGS fill:#fce4ec
```

Key Components:

  • enums.py: Core enumerations (Domain, Language, Script, ModelType, ConfidenceLevel)
  • schemas.py: Data classes for structured data exchange
  • constants.py: Frozen dataclasses with hyperparameters for each metric
  • threshold_config.py: Domain-specific thresholds for 16 domains
  • model_config.py: Model registry with download priorities and configurations
  • settings.py: Application settings with Pydantic validation
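
A minimal sketch of the pattern these modules share (enums, frozen dataclasses, and threshold maps keyed by domain). All names and values below are illustrative assumptions, not the actual config/ contents:

```python
# Sketch only: the real enums.py defines 16 domains; values here are invented.
from dataclasses import dataclass
from enum import Enum

class Domain(Enum):
    ACADEMIC = "academic"
    CREATIVE = "creative"
    TECHNICAL = "technical"
    # ... remaining domains elided

@dataclass(frozen=True)              # frozen => hyperparameters are immutable
class MetricParams:
    min_sentences: int = 3
    chunk_size: int = 512

# Domain-specific decision thresholds keyed by the enum, in the spirit of
# threshold_config.py (numbers are placeholders).
DOMAIN_THRESHOLDS: dict[Domain, float] = {
    Domain.ACADEMIC: 0.65,
    Domain.CREATIVE: 0.55,
    Domain.TECHNICAL: 0.60,
}
```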

2. Model Abstraction Layer (models/)

A centralized model abstraction used by the metrics for loading, caching, and unified access, so each model is loaded once and shared.

```mermaid
graph TB
    subgraph "Model Layer"
        direction TB
        
        MANAGER["Model Manager
        Singleton Pattern Lazy Loading"]
        
        REGISTRY["Model Registry 
        10 Model Configs Priority Groups"]
        
        subgraph "Model Cache"
            direction LR
            GPT2[GPT-2548MBPerplexity/MPS]
            MINILM[MiniLM-L6-v280MBSemantic]
            SPACY[spaCy sm13MBLinguistic]
            ROBERTA[RoBERTa500MBDomain Classifier]
            DISTIL[DistilRoBERTa330MBMPS Mask]
            XLM[XLM-RoBERTa1100MBLanguage Detection]
        end
        
        STATS[Usage StatisticsTracking Performance Metrics]
    end
    
    MANAGER -->|loads from| REGISTRY
    MANAGER -->|manages| GPT2
    MANAGER -->|manages| MINILM
    MANAGER -->|manages| SPACY
    MANAGER -->|manages| ROBERTA
    MANAGER -->|manages| DISTIL
    MANAGER -->|manages| XLM
    MANAGER -->|tracks| STATS
    
    REGISTRY -.->|defines| GPT2
    REGISTRY -.->|defines| MINILM
    REGISTRY -.->|defines| SPACY
    
    style MANAGER fill:#e3f2fd
    style REGISTRY fill:#f3e5f5
    style STATS fill:#fff3e0
```

Key Features:

  • Lazy Loading: Models loaded on-demand
  • Caching Strategy: LRU cache capped at 5 models (sketched below)
  • Usage Tracking: Statistics for optimization
  • Priority Groups: Essential, Extended, Optional
  • Total Size: ~2.8GB for all models
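
The caching strategy can be sketched as a singleton with lazy loading and LRU eviction. This is an assumption-level illustration; the real Model Manager also handles downloads, device placement, and the usage statistics mentioned above:

```python
# Sketch: loader callables stand in for actual HuggingFace/spaCy loading code.
from collections import OrderedDict
from typing import Any, Callable

class ModelManager:
    _instance = None                          # singleton pattern

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._cache = OrderedDict()
        return cls._instance

    def get(self, name: str, loader: Callable[[], Any], max_models: int = 5) -> Any:
        cache = self._cache
        if name in cache:
            cache.move_to_end(name)           # mark as recently used
            return cache[name]
        if len(cache) >= max_models:
            cache.popitem(last=False)         # evict least recently used
        cache[name] = loader()                # lazy load on first request
        return cache[name]

# Usage: nothing is loaded until a metric first asks for the model.
manager = ModelManager()
gpt2 = manager.get("gpt2", lambda: "loaded-gpt2-placeholder")
```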

3. Processing Layer (processors/)

Handles document extraction, text preprocessing, domain classification, and language detection.

```mermaid
graph TB
    subgraph "Processing Layer"
        direction TB
        
        subgraph "Document Extraction"
            EXTRACT[Document Extractor]
            EXTRACT -->|PDF| PYPDF[PyMuPDF Primary]
            EXTRACT -->|PDF| PDFPLUMB[pdfplumber Fallback]
            EXTRACT -->|PDF| PYPDF2[PyPDF2 Fallback]
            EXTRACT -->|DOCX| DOCX[python-docx]
            EXTRACT -->|HTML| BS4[BeautifulSoup4]
            EXTRACT -->|RTF| RTF[Basic Parser]
            EXTRACT -->|TXT| TXT[Chardet Encoding]
        end
        
        subgraph "Text Processing"
            TEXTPROC[Text Processor]
            TEXTPROC --> CLEAN[Unicode Normalization<br/>URL/Email Removal<br/>Whitespace Cleaning]
            TEXTPROC --> SPLIT[Smart Sentence Splitting<br/>Abbreviation Handling<br/>Word Tokenization]
            TEXTPROC --> VALIDATE[Length Validation<br/>Quality Checks<br/>Statistics]
        end
        
        subgraph "Domain Classification"
            DOMAIN[Domain Classifier]
            DOMAIN --> ZERO[Heuristic + optional model-assisted<br/>domain inference<br/>RoBERTa/DeBERTa]
            DOMAIN --> LABELS[16 Domain Labels<br/>Multi-Label Candidates]
            DOMAIN --> THRESH[Domain-Specific<br/>Threshold Selection]
        end
        
        subgraph "Language Detection"
            LANG[Language Detector]
            LANG --> MODEL[XLM-RoBERTa<br/>Chunk-Based Analysis]
            LANG --> FALLBACK[langdetect Library]
            LANG --> HEURISTIC[Script Detection<br/>Character Analysis]
        end
    end
    
    EXTRACT -->|ProcessedText| TEXTPROC
    TEXTPROC -->|Cleaned Text| DOMAIN
    TEXTPROC -->|Cleaned Text| LANG
    
    style EXTRACT fill:#e8f5e9
    style TEXTPROC fill:#fff3e0
    style DOMAIN fill:#e1f5ff
    style LANG fill:#f3e5f5
```

Processing Pipeline:

  1. Document Extraction: Multi-format support with fallback strategies (sketched below)
  2. Text Cleaning: Unicode normalization, noise removal, validation
  3. Domain Classification: Heuristic plus optional zero-shot classification with confidence scores
  4. Language Detection: Multi-strategy approach with script analysis
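
The fallback strategy in step 1 might look like the sketch below, which tries PyMuPDF first and falls back to pdfplumber. The function shape is an assumption, and the PyPDF2 fallback is omitted for brevity:

```python
# Sketch of a priority-ordered extractor chain with graceful degradation.
def extract_pdf_text(path: str) -> str:
    extractors = []
    try:
        import fitz                            # PyMuPDF (primary)
        extractors.append(lambda p: "".join(page.get_text() for page in fitz.open(p)))
    except ImportError:
        pass
    try:
        import pdfplumber                      # fallback
        def _plumber(p):
            with pdfplumber.open(p) as pdf:
                return "".join(page.extract_text() or "" for page in pdf.pages)
        extractors.append(_plumber)
    except ImportError:
        pass

    last_error = None
    for extract in extractors:                 # try each strategy in order
        try:
            text = extract(path)
            if text.strip():                   # basic quality check
                return text
        except Exception as exc:               # degrade gracefully
            last_error = exc
    raise RuntimeError(f"All PDF extractors failed for {path}") from last_error
```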

4. Metrics Layer (metrics/)

Six independent detection metrics analyzing different text characteristics.

```mermaid
graph TB
    subgraph "Metrics Layer"
        direction TB
        
        BASE[Base Metric<br/>Abstract Class<br/>Common Interface]
        
        subgraph "Statistical Metrics"
            STRUCT[Structural Metric<br/>No ML Model<br/>Statistical Features]
            STRUCT --> SF1[Sentence Length Distribution<br/>Burstiness Score<br/>Readability]
            STRUCT --> SF2[N-gram Diversity<br/>Type-Token Ratio<br/>Repetition Patterns]
        end
        
        subgraph "ML-Based Metrics"
            PERP[Perplexity Metric<br/>GPT-2 Model<br/>Text Predictability]
            PERP --> PF1[Overall Perplexity<br/>Sentence-Level Perplexity<br/>Cross-Entropy]
            PERP --> PF2[Chunk Analysis<br/>Variance Scoring<br/>Normalization]
            
            ENT[Entropy Metric<br/>GPT-2 Tokenizer<br/>Randomness Analysis]
            ENT --> EF1[Character Entropy<br/>Word Entropy<br/>Token Entropy]
            ENT --> EF2[Token Diversity<br/>Sequence Unpredictability<br/>Pattern Detection]
            
            SEM[Semantic Metric<br/>MiniLM Embeddings<br/>Coherence Analysis]
            SEM --> SF3[Sentence Similarity<br/>Topic Consistency<br/>Coherence Score]
            SEM --> SF4[Repetition Detection<br/>Topic Drift<br/>Contextual Consistency]
            
            LING[Linguistic Metric<br/>spaCy NLP<br/>Grammar Analysis]
            LING --> LF1[POS Diversity<br/>POS Entropy<br/>Syntactic Complexity]
            LING --> LF2[Grammatical Patterns<br/>Writing Style<br/>Pattern Detection]
            
            MPS[Multi-Perturbation<br/>GPT-2 + DistilRoBERTa<br/>Stability Analysis]
            MPS --> MF1[Text Perturbation<br/>Likelihood Calculation<br/>Stability Score]
            MPS --> MF2[Curvature Analysis<br/>Chunk Stability<br/>Variance Scoring]
        end
    end
    
    BASE -.->|inherited by| STRUCT
    BASE -.->|inherited by| PERP
    BASE -.->|inherited by| ENT
    BASE -.->|inherited by| SEM
    BASE -.->|inherited by| LING
    BASE -.->|inherited by| MPS
    
    style BASE fill:#ffebee
    style STRUCT fill:#e8f5e9
    style PERP fill:#fff3e0
    style ENT fill:#e1f5ff
    style SEM fill:#f3e5f5
    style LING fill:#fce4ec
    style MPS fill:#fff9c4
```

Metric Characteristics:

| Metric | Model Required | Complexity | Typical Influence Range (Indicative) |
|---|---|---|---|
| Structural | None | Low | 15-20% |
| Perplexity | GPT-2 | Medium | 20-27% |
| Entropy | GPT-2 Tokenizer | Medium | 13-17% |
| Semantic | MiniLM | Medium | 18-20% |
| Linguistic | spaCy | Medium | 12-16% |
| MPS | GPT-2 + DistilRoBERTa | High | 8-10% |

Actual weights are dynamically calibrated per domain and configuration.
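
All six metrics implement the common interface of the Base Metric abstract class shown in the diagram. A sketch of that contract, with assumed field and method names:

```python
# Sketch: the real base class and result schema are richer than this.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class MetricResult:
    name: str
    p_synthetic: float                    # probability the text is AI-generated
    confidence: float                     # metric's self-reported confidence
    details: dict = field(default_factory=dict)

class BaseMetric(ABC):
    @abstractmethod
    def compute(self, text: str) -> MetricResult:
        """Analyze text and return a calibrated score."""

class StructuralMetric(BaseMetric):       # the one metric needing no ML model
    def compute(self, text: str) -> MetricResult:
        sentences = [s for s in text.split(".") if s.strip()]
        lengths = [len(s.split()) for s in sentences]
        # Uniform sentence lengths are one (weak) synthetic signal.
        spread = (max(lengths) - min(lengths)) if lengths else 0
        return MetricResult("structural",
                            p_synthetic=0.7 if spread < 3 else 0.4,
                            confidence=0.5,
                            details={"length_spread": spread})
```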


5. Service Layer (services/)

Coordinates ensemble aggregation, highlighting, reasoning generation, and orchestration.

```mermaid
graph TB
    subgraph "Service Layer"
        direction TB
        
        subgraph "Orchestrator"
            ORCH[Detection Orchestrator<br/>Pipeline Coordinator]
            ORCH --> PIPE[Processing Pipeline<br/>6-Step Execution]
            PIPE --> STEP1[1. Text Preprocessing]
            PIPE --> STEP2[2. Language Detection]
            PIPE --> STEP3[3. Domain Classification]
            PIPE --> STEP4[4. Metric Execution<br/>Parallel/Sequential]
            PIPE --> STEP5[5. Ensemble Aggregation]
            PIPE --> STEP6[6. Result Compilation]
        end
        
        subgraph "Ensemble Classifier"
            ENSEMBLE[Ensemble Classifier<br/>Multi-Strategy Aggregation]
            ENSEMBLE --> METHOD1[Confidence Calibrated<br/>Sigmoid Weighting]
            ENSEMBLE --> METHOD2[Consensus Based<br/>Agreement Rewards]
            ENSEMBLE --> METHOD3[Domain Weighted<br/>Static Weights]
            ENSEMBLE --> METHOD4[Simple Average<br/>Fallback]
            ENSEMBLE --> CALC[Uncertainty Quantification<br/>Consensus Analysis<br/>Confidence Scoring]
        end
        
        subgraph "Highlighter"
            HIGHLIGHT[Text Highlighter<br/>Sentence-Level Analysis]
            HIGHLIGHT --> COLORS[4-Color System<br/>Authentic/Uncertain<br/>Hybrid/Synthetic]
            HIGHLIGHT --> SENTENCE[Sentence Ensemble<br/>Domain Adjustments<br/>Tooltip Generation]
        end
        
        subgraph "Reasoning"
            REASON[Reasoning Generator<br/>Explainable AI]
            REASON --> SUMMARY[Executive Summary<br/>Verdict Explanation]
            REASON --> INDICATORS[Key Indicators<br/>Metric Breakdown]
            REASON --> EVIDENCE[Supporting Evidence<br/>Contradicting Evidence]
            REASON --> RECOM[Recommendations<br/>Uncertainty Analysis]
        end
    end
    
    ORCH -->|coordinates| ENSEMBLE
    ORCH -->|uses| HIGHLIGHT
    ORCH -->|uses| REASON
    ENSEMBLE -->|provides| HIGHLIGHT
    ENSEMBLE -->|provides| REASON
    
    style ORCH fill:#fff3e0
    style ENSEMBLE fill:#e3f2fd
    style HIGHLIGHT fill:#f3e5f5
    style REASON fill:#e8f5e9
```

Service Features:

  • Parallel Execution: ThreadPoolExecutor for metric computation (sketched below)
  • Ensemble Methods: 4 aggregation strategies with fallbacks
  • Sentence Highlighting: 4-category color system (Authentic/Uncertain/Hybrid/Synthetic)
  • Explainable AI: Detailed reasoning with metric contributions
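
A minimal sketch of the parallel execution strategy, assuming each metric exposes a compute-style callable. Threads are effective here because most of the time is spent inside model inference, which releases the GIL:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

MetricFn = Callable[[str], float]

def run_metrics_parallel(text: str, metrics: dict[str, MetricFn],
                         max_workers: int = 6) -> dict[str, float]:
    results: dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, text): name for name, fn in metrics.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:                  # fail-safe: one broken metric
                results[name] = float("nan")   # must not sink the ensemble
    return results
```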

6. Reporter Layer (reporter/)

Generates comprehensive reports in multiple formats.

```mermaid
graph TB
    subgraph "Reporter Layer"
        direction TB
        
        REPORT[Report Generator]
        
        subgraph "JSON Report"
            JSON[Structured JSON]
            JSON --> META[Report Metadata<br/>Timestamp<br/>Version]
            JSON --> RESULTS[Overall Results<br/>Probabilities<br/>Confidence]
            JSON --> METRICS[Detailed Metrics<br/>Sub-metrics<br/>Weights]
            JSON --> REASONING[Detection Reasoning<br/>Evidence<br/>Recommendations]
            JSON --> HIGHLIGHT[Highlighted Sentences<br/>Color Classes<br/>Probabilities]
            JSON --> PERF[Performance Metrics<br/>Execution Times<br/>Warnings/Errors]
        end
        
        subgraph "PDF Report"
            PDF[Professional PDF]
            PDF --> PAGE1[Page 1: Executive Summary<br/>Verdict, Stats, Reasoning]
            PDF --> PAGE2[Page 2: Content Analysis<br/>Domain, Metrics, Weights]
            PDF --> PAGE3[Page 3: Structural & Entropy]
            PDF --> PAGE4[Page 4: Perplexity & Semantic]
            PDF --> PAGE5[Page 5: Linguistic & MPS]
            PDF --> PAGE6[Page 6: Recommendations]
            
            STYLE[Premium Styling]
            STYLE --> COLORS[Color Scheme<br/>Blue/Green/Red/Purple]
            STYLE --> TABLES[Professional Tables<br/>Charts, Metrics]
            STYLE --> LAYOUT[Multi-Page Layout<br/>Headers, Footers]
        end
    end
    
    REPORT -->|generates| JSON
    REPORT -->|generates| PDF
    PDF -->|uses| STYLE
    
    style REPORT fill:#fff3e0
    style JSON fill:#e8f5e9
    style PDF fill:#e3f2fd
    style STYLE fill:#f3e5f5
```

Report Formats:

  • JSON: Machine-readable with complete data
  • PDF: Human-readable with professional formatting
  • Charts: Pie charts for probability distribution
  • Tables: Metric contributions, detailed sub-metrics
  • Styling: Color-coded, multi-page layout with branding
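
An assumed shape for the JSON report, reconstructed from the diagram above; field names in the generated report may differ:

```python
import json
from datetime import datetime, timezone

report = {
    "metadata": {"generated_at": datetime.now(timezone.utc).isoformat(),
                 "version": "1.0"},
    "overall": {"verdict": "synthetic", "p_synthetic": 0.71, "confidence": 0.64},
    "metrics": {"perplexity": {"score": 0.68, "weight": 0.24}},   # one of six
    "reasoning": {"summary": "...", "evidence": [], "recommendations": []},
    "highlights": [{"sentence": "...", "class": "synthetic", "p": 0.83}],
    "performance": {"total_seconds": 9.4, "warnings": []},
}
print(json.dumps(report, indent=2))
```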

Data Flow

Complete Detection Pipeline

```mermaid
sequenceDiagram
    participant User
    participant Orchestrator
    participant Processors
    participant Metrics
    participant Ensemble
    participant Services
    participant Reporter

    User->>Orchestrator: analyze(text)
    
    Note over Orchestrator: Step 1: Preprocessing
    Orchestrator->>Processors: TextProcessor.process()
    Processors-->>Orchestrator: ProcessedText
    
    Note over Orchestrator: Step 2: Language Detection
    Orchestrator->>Processors: LanguageDetector.detect()
    Processors-->>Orchestrator: LanguageResult
    
    Note over Orchestrator: Step 3: Domain Classification
    Orchestrator->>Processors: DomainClassifier.classify()
    Processors-->>Orchestrator: DomainPrediction
    
    Note over Orchestrator: Step 4: Parallel Metric Execution
    par Structural
        Orchestrator->>Metrics: Structural.compute()
        Metrics-->>Orchestrator: MetricResult
    and Perplexity
        Orchestrator->>Metrics: Perplexity.compute()
        Metrics-->>Orchestrator: MetricResult
    and Entropy
        Orchestrator->>Metrics: Entropy.compute()
        Metrics-->>Orchestrator: MetricResult
    and Semantic
        Orchestrator->>Metrics: Semantic.compute()
        Metrics-->>Orchestrator: MetricResult
    and Linguistic
        Orchestrator->>Metrics: Linguistic.compute()
        Metrics-->>Orchestrator: MetricResult
    and MPS
        Orchestrator->>Metrics: MPS.compute()
        Metrics-->>Orchestrator: MetricResult
    end
    
    Note over Orchestrator: Step 5: Ensemble Aggregation
    Orchestrator->>Ensemble: predict(metric_results, domain)
    Ensemble-->>Orchestrator: EnsembleResult
    
    Note over Orchestrator: Step 6: Services
    Orchestrator->>Services: generate_highlights()
    Services-->>Orchestrator: HighlightedSentences
    
    Orchestrator->>Services: generate_reasoning()
    Services-->>Orchestrator: DetailedReasoning
    
    Orchestrator->>Reporter: generate_report()
    Reporter-->>Orchestrator: Report Files
    
    Orchestrator-->>User: DetectionResult
```
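
The same six steps, condensed into a runnable sketch in which every component is a trivial stand-in for the real processor or service:

```python
def process(text: str) -> str:                     # 1. preprocessing (stub)
    return " ".join(text.split())

def detect_language(text: str) -> str:             # 2. language detection (stub)
    return "en"

def classify_domain(text: str) -> str:             # 3. domain classification (stub)
    return "general"

def run_metrics(text: str) -> dict[str, float]:    # 4. metric execution (stub)
    return {"structural": 0.6, "perplexity": 0.7}

def ensemble(scores: dict[str, float], domain: str) -> float:  # 5. aggregation
    return sum(scores.values()) / len(scores)

def analyze(raw: str) -> dict:                     # 6. result compilation
    text = process(raw)
    lang, domain = detect_language(text), classify_domain(text)
    p = ensemble(run_metrics(text), domain)
    return {"language": lang, "domain": domain, "p_synthetic": p}

print(analyze("Some  input   text."))
```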

Ensemble Aggregation Flow

```mermaid
graph TD
    START[Metric Results] --> FILTER[Filter Valid Metrics<br/>Remove Errors]
    FILTER --> WEIGHTS[Get Domain Weights<br/>Base Weights]
    
    WEIGHTS --> METHOD{Primary Method?}
    
    METHOD -->|Confidence Calibrated| CONF[Sigmoid Confidence<br/>Adjustment]
    METHOD -->|Consensus Based| CONS[Agreement<br/>Calculation]
    METHOD -->|Domain Weighted| DOMAIN[Static Domain<br/>Weights]
    
    CONF --> AGGREGATE[Weighted Aggregation]
    CONS --> AGGREGATE
    DOMAIN --> AGGREGATE
    
    AGGREGATE --> NORMALIZE[Normalize to 1.0]
    
    NORMALIZE --> CALC[Calculate Metrics]
    CALC --> CONFIDENCE[Overall Confidence<br/>Base + Agreement<br/>+ Certainty + Quality]
    CALC --> UNCERTAINTY[Uncertainty Score<br/>Variance + Confidence<br/>+ Decision]
    CALC --> CONSENSUS[Consensus Level<br/>Std Dev Analysis]
    
    CONFIDENCE --> THRESHOLD[Apply Adaptive<br/>Threshold]
    UNCERTAINTY --> THRESHOLD
    
    THRESHOLD --> VERDICT{Verdict}
    VERDICT -->|Synthetic >= 0.6| SYNTH[Synthetically-Generated]
    VERDICT -->|Authentic >= 0.6| AUTH[Authentically-Written]
    VERDICT -->|Hybrid > 0.25| HYBRID[Hybrid]
    VERDICT -->|Uncertain| UNC[Uncertain]
    
    SYNTH --> REASON[Generate Reasoning]
    AUTH --> REASON
    HYBRID --> REASON
    UNC --> REASON
    
    REASON --> RESULT[EnsembleResult]
    
    style START fill:#e8f5e9
    style RESULT fill:#e3f2fd
    style SYNTH fill:#ffebee
    style AUTH fill:#e8f5e9
    style HYBRID fill:#fff3e0
    style UNC fill:#f5f5f5
```
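
A sketch of the confidence-calibrated path plus the verdict thresholds from the diagram. The sigmoid steepness and midpoint are assumptions; the thresholds (0.6, 0.6, 0.25) come from the diagram:

```python
import math

def sigmoid(x: float, k: float = 8.0, mid: float = 0.5) -> float:
    return 1.0 / (1.0 + math.exp(-k * (x - mid)))

def aggregate(results: list[tuple[float, float, float]]) -> float:
    """Each tuple is (p_synthetic, confidence, base_weight)."""
    weights = [w * sigmoid(c) for _, c, w in results]  # confident metrics count more
    total = sum(weights) or 1.0                        # guard against all-zero weights
    return sum(p * w for (p, _, _), w in zip(results, weights)) / total

def verdict(p_synthetic: float, p_hybrid: float = 0.0) -> str:
    if p_synthetic >= 0.6:
        return "Synthetically-Generated"
    if (1.0 - p_synthetic) >= 0.6:
        return "Authentically-Written"
    if p_hybrid > 0.25:
        return "Hybrid"
    return "Uncertain"

p = aggregate([(0.72, 0.8, 0.2), (0.68, 0.6, 0.25), (0.21, 0.3, 0.1)])
print(p, verdict(p))
```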

Technology Stack

Core Technologies

```mermaid
graph LR
    subgraph "Language & Runtime"
        PYTHON[Python 3.10+]
        CONDA[Conda Environment]
    end
    
    subgraph "ML Frameworks"
        TORCH[PyTorch]
        HF[HuggingFace Transformers]
        SPACY[spaCy]
        SKLEARN[scikit-learn]
    end
    
    subgraph "NLP Models"
        GPT2[GPT-2<br/>Perplexity/MPS]
        MINILM[MiniLM-L6-v2<br/>Semantic]
        ROBERTA[RoBERTa<br/>Domain Classify]
        DISTIL[DistilRoBERTa<br/>MPS Mask]
        XLM[XLM-RoBERTa<br/>Language Detect]
        SPACYMODEL[en_core_web_sm<br/>Linguistic]
    end
    
    subgraph "Document Processing"
        PYMUPDF[PyMuPDF]
        PDFPLUMBER[pdfplumber]
        PYPDF2[PyPDF2]
        DOCX[python-docx]
        BS4[BeautifulSoup4]
    end
    
    subgraph "Utilities"
        NUMPY[NumPy]
        PYDANTIC[Pydantic]
        LOGURU[Loguru]
        REPORTLAB[ReportLab]
    end
    
    PYTHON --> TORCH
    TORCH --> HF
    HF --> GPT2
    HF --> MINILM
    HF --> ROBERTA
    HF --> DISTIL
    HF --> XLM
    PYTHON --> SPACY
    SPACY --> SPACYMODEL
    
    style PYTHON fill:#306998
    style TORCH fill:#ee4c2c
    style HF fill:#ff6f00
    style SPACY fill:#09a3d5
```

Dependencies Summary

| Category | Libraries | Purpose |
|---|---|---|
| ML Core | PyTorch, Transformers, spaCy | Model execution, NLP |
| Document | PyMuPDF, pdfplumber, python-docx | Multi-format extraction |
| Analysis | NumPy, scikit-learn | Numerical computation |
| Validation | Pydantic | Data validation |
| Logging | Loguru | Structured logging |
| Reporting | ReportLab | PDF generation |

Deployment Architecture

```mermaid
graph TB
    subgraph "Deployment Options"
        direction TB
        
        subgraph "Standalone Application"
            SCRIPT[Python Scripts]
        end
        
        subgraph "Web Application"
            FASTAPI[FastAPI Server]
        end
        
        subgraph "API Service"
            REST[REST API Endpoints]
            BATCH[Batch Processing]
            ASYNC[Async Workers]
        end
        
        subgraph "Infrastructure"
            DOCKER[Docker Container]
            GPU[GPU Support<br/>Optional]
            STORAGE[Model Cache<br/>2.8GB]
        end
    end
    
    FASTAPI --> DOCKER
    REST --> DOCKER
    
    DOCKER --> GPU
    DOCKER --> STORAGE
    
    style FASTAPI fill:#e3f2fd
    style DOCKER fill:#2496ed
    style GPU fill:#76b900
```

System Requirements

  • Python: 3.10+
  • RAM: 8GB minimum, 16GB recommended
  • Storage: 5GB (models + data)
  • GPU: Optional (CUDA/MPS for faster inference)
  • CPU: 4+ cores for parallel execution

Performance Characteristics

Execution Modes

```mermaid
graph LR
    subgraph "Sequential Mode"
        S1[Metric 1] --> S2[Metric 2]
        S2 --> S3[Metric 3]
        S3 --> S4[Metric 4]
        S4 --> S5[Metric 5]
        S5 --> S6[Metric 6]
        S6 --> SRESULT[~15-25s]
    end
    
    subgraph "Parallel Mode"
        P1[Metric 1]
        P2[Metric 2]
        P3[Metric 3]
        P4[Metric 4]
        P5[Metric 5]
        P6[Metric 6]
        
        P1 --> PRESULT[~8-12s]
        P2 --> PRESULT
        P3 --> PRESULT
        P4 --> PRESULT
        P5 --> PRESULT
        P6 --> PRESULT
    end
    
    style SRESULT fill:#ffebee
    style PRESULT fill:#e8f5e9
```

Metric Execution Times

| Metric | Avg Time | Complexity | Model Size |
|---|---|---|---|
| Structural | 0.5-1s | Low | 0MB |
| Perplexity | 2-4s | Medium | 548MB |
| Entropy | 1-2s | Medium | ~50MB (shared) |
| Semantic | 3-5s | Medium | 80MB |
| Linguistic | 2-3s | Medium | 13MB |
| MPS | 5-10s | High | 878MB (GPT-2 + DistilRoBERTa) |

Total Sequential: ~15-25 seconds
Total Parallel: ~8-12 seconds (limited by slowest metric)
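
A quick check of why parallel time tracks the slowest metric, using midpoints of the average times from the table above:

```python
# Midpoints (seconds) of the per-metric averages listed above.
times = {"structural": 0.75, "perplexity": 3.0, "entropy": 1.5,
         "semantic": 4.0, "linguistic": 2.5, "mps": 7.5}
print(f"sequential ~{sum(times.values()):.1f}s")   # ~19.2s, within ~15-25s
print(f"parallel floor ~{max(times.values()):.1f}s (slowest metric: MPS)")
```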


Security & Privacy

Data Handling

```mermaid
graph TD
    INPUT[Text Input] --> PROCESS[Processing]
    PROCESS --> MEMORY[In-Memory Only]
    MEMORY --> ANALYSIS[Analysis]
    ANALYSIS --> CLEANUP[Auto Cleanup]
    
    MODELS[Model Cache] -.->|Read Only| ANALYSIS
    
    REPORTS[Optional Reports] --> STORAGE[Local Storage Only]
    
    CLEANUP --> DISCARD[Data Discarded]
    
    style INPUT fill:#e3f2fd
    style MEMORY fill:#fff3e0
    style CLEANUP fill:#e8f5e9
    style DISCARD fill:#ffebee
```

Security Features

  • No External Data Transmission: All processing local
  • No Data Persistence: Text data not stored by default
  • Model Integrity: Checksums for downloaded models
  • Input Validation: Pydantic schemas for all inputs (sketched below)
  • Error Isolation: Graceful degradation, no information leakage
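
The input-validation point can be sketched with a Pydantic model; the actual schemas live in the configuration layer and will differ in detail:

```python
from pydantic import BaseModel, Field, ValidationError

class AnalysisRequest(BaseModel):
    text: str = Field(min_length=50, max_length=100_000)  # assumed bounds
    domain_hint: str | None = None       # optional caller-supplied domain
    generate_pdf: bool = False

try:
    AnalysisRequest(text="too short")    # rejected before reaching the pipeline
except ValidationError as err:
    print(err)                           # structured, field-level error report
```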

This system does not claim ground truth authorship. It estimates probabilistic authenticity signals based on measurable text properties.