# TEXT-AUTH: System Architecture Documentation

> TEXT-AUTH is an evidence-first, domain-aware AI text detection system
> designed around independent signals, calibrated aggregation, and
> explainability rather than black-box classification.

---

## Table of Contents

1. [System Overview](#system-overview)
2. [High-Level Architecture](#high-level-architecture)
3. [Layer-by-Layer Architecture](#layer-by-layer-architecture)
4. [Data Flow](#data-flow)
5. [Technology Stack](#technology-stack)

---

## System Overview

**TEXT-AUTH** is a sophisticated AI text detection system that employs multiple machine learning metrics and ensemble methods to determine whether text is synthetically generated, authentically written, or hybrid content.

### Key Capabilities

- **Multi-Metric Analysis**: 6 independent detection metrics (Structural, Perplexity, Entropy, Semantic, Linguistic, Multi-Perturbation Stability)
- **Domain-Aware Calibration**: Adaptive thresholds for 16 text domains (Academic, Creative, Technical, etc.)
- **Ensemble Aggregation**: Confidence-weighted combination with uncertainty quantification
- **Sentence-Level Highlighting**: Visual feedback with probability scores
- **Comprehensive Reporting**: JSON and PDF reports with detailed analysis

### Design Principles

- **Modular Architecture**: Clean separation of concerns across layers
- **Fail-Safe Design**: Graceful degradation with fallback strategies
- **Parallel Processing**: Multi-threaded metric execution for performance
- **Domain Expertise**: Specialized thresholds calibrated per content type

## Why Multi-Metric Instead of a Single Classifier?

- Single classifiers overfit stylistic artifacts
- LLMs rapidly adapt to detectors
- Independent statistical signals decay more slowly
- Ensemble disagreement is itself evidence

---

## High-Level Architecture

```mermaid
graph TB
    subgraph "Presentation Layer"
        UI[Web Interface/API]
    end

    subgraph "Application Layer"
        ORCH[Detection Orchestrator]
        ORCH -->|coordinates| PIPE[Processing Pipeline]
    end

    subgraph "Service Layer"
        ENSEMBLE[Ensemble Classifier]
        HIGHLIGHT[Text Highlighter]
        REASON[Reasoning Generator]
        REPORT[Report Generator]
    end

    subgraph "Processing Layer"
        EXTRACT[Document Extractor]
        TEXTPROC[Text Processor]
        DOMAIN[Domain Classifier]
        LANG[Language Detector]
    end

    subgraph "Metrics Layer"
        STRUCT[Structural Metric]
        PERP[Perplexity Metric]
        ENT[Entropy Metric]
        SEM[Semantic Metric]
        LING[Linguistic Metric]
        MPS[Multi-Perturbation Stability]
    end

    subgraph "Model Layer"
        MANAGER[Model Manager]
        REGISTRY[Model Registry]
        CACHE[(Model Cache)]
    end

    subgraph "Configuration Layer"
        CONFIG[Settings]
        ENUMS[Enums]
        SCHEMAS[Data Schemas]
        CONSTANTS[Constants]
        THRESHOLDS[Domain Thresholds]
    end

    UI --> ORCH
    ORCH --> EXTRACT
    ORCH --> TEXTPROC
    ORCH --> DOMAIN
    ORCH --> LANG
    ORCH --> STRUCT
    ORCH --> PERP
    ORCH --> ENT
    ORCH --> SEM
    ORCH --> LING
    ORCH --> MPS
    ORCH --> ENSEMBLE
    ENSEMBLE --> HIGHLIGHT
    ENSEMBLE --> REASON
    ENSEMBLE --> REPORT
    STRUCT --> MANAGER
    PERP --> MANAGER
    ENT --> MANAGER
    SEM --> MANAGER
    LING --> MANAGER
    MPS --> MANAGER
    DOMAIN --> MANAGER
    LANG --> MANAGER
    MANAGER --> REGISTRY
    MANAGER --> CACHE
    ORCH --> CONFIG
    ENSEMBLE --> THRESHOLDS

    style UI fill:#e1f5ff
    style ORCH fill:#fff3e0
    style ENSEMBLE fill:#f3e5f5
    style MANAGER fill:#e8f5e9
    style CONFIG fill:#fce4ec
```

---

## Layer-by-Layer Architecture

### 1. Configuration Layer (`config/`)

The foundation layer providing enums, schemas, constants, and domain-specific thresholds.

```mermaid
graph LR
    subgraph "Configuration Layer"
        direction TB
        ENUMS["enums.py<br/>Domain, Language, Script,<br/>ModelType, ConfidenceLevel"]
        SCHEMAS["schemas.py<br/>ModelConfig, ProcessedText,<br/>MetricResult, EnsembleResult,<br/>DetectionResult"]
        CONSTANTS["constants.py<br/>TextProcessingParams,<br/>MetricParams, EnsembleParams"]
        THRESHOLDS["threshold_config.py<br/>DomainThresholds,<br/>16 Domain Configs,<br/>MetricThresholds"]
        MODELCFG["model_config.py<br/>Model Registry, Model Groups,<br/>Default Weights"]
        SETTINGS["settings.py<br/>App Settings, Paths,<br/>Feature Flags"]
    end

    ENUMS -.->|used by| SCHEMAS
    ENUMS -.->|used by| THRESHOLDS
    SCHEMAS -.->|used by| CONSTANTS
    THRESHOLDS -.->|imports| ENUMS
    MODELCFG -.->|imports| ENUMS

    style ENUMS fill:#ffebee
    style SCHEMAS fill:#fff3e0
    style CONSTANTS fill:#e8f5e9
    style THRESHOLDS fill:#e1f5ff
    style MODELCFG fill:#f3e5f5
    style SETTINGS fill:#fce4ec
```

**Key Components:**

- **enums.py**: Core enumerations (Domain, Language, Script, ModelType, ConfidenceLevel)
- **schemas.py**: Data classes for structured data exchange
- **constants.py**: Frozen dataclasses with hyperparameters for each metric
- **threshold_config.py**: Domain-specific thresholds for 16 domains
- **model_config.py**: Model registry with download priorities and configurations
- **settings.py**: Application settings with Pydantic validation
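As a concrete illustration of how this layer fits together, the sketch below shows one plausible shape for the domain-threshold configuration: frozen dataclasses keyed by the `Domain` enum, plus a lookup helper. The field names, default values, and `get_thresholds` function are illustrative assumptions, not the actual `threshold_config.py` API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Domain(str, Enum):
    """Subset of the 16 domains defined in enums.py (illustrative)."""
    ACADEMIC = "academic"
    CREATIVE = "creative"
    TECHNICAL = "technical"
    GENERAL = "general"


@dataclass(frozen=True)
class DomainThresholds:
    """Immutable decision thresholds and metric weights for one domain."""
    synthetic_threshold: float = 0.6
    authentic_threshold: float = 0.6
    hybrid_threshold: float = 0.25
    metric_weights: dict[str, float] = field(
        default_factory=lambda: {
            "structural": 0.18, "perplexity": 0.24, "entropy": 0.15,
            "semantic": 0.19, "linguistic": 0.14, "mps": 0.10,
        }
    )


# Domain-specific overrides; unlisted domains fall back to the defaults.
DOMAIN_THRESHOLDS: dict[Domain, DomainThresholds] = {
    Domain.ACADEMIC: DomainThresholds(synthetic_threshold=0.65),
    Domain.CREATIVE: DomainThresholds(authentic_threshold=0.55),
}


def get_thresholds(domain: Domain) -> DomainThresholds:
    """Return calibrated thresholds for a domain, or the defaults."""
    return DOMAIN_THRESHOLDS.get(domain, DomainThresholds())
```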
---

### 2. Model Abstraction Layer (`models/`)

Centralizes model loading, caching, and access so that all metrics share a single set of model instances instead of loading their own copies.

```mermaid
graph TB
    subgraph "Model Layer"
        direction TB
        MANAGER["Model Manager<br/>Singleton Pattern<br/>Lazy Loading"]
        REGISTRY["Model Registry<br/>10 Model Configs<br/>Priority Groups"]

        subgraph "Model Cache"
            direction LR
            GPT2[GPT-2<br/>548MB<br/>Perplexity/MPS]
            MINILM[MiniLM-L6-v2<br/>80MB<br/>Semantic]
            SPACY[spaCy sm<br/>13MB<br/>Linguistic]
            ROBERTA[RoBERTa<br/>500MB<br/>Domain Classifier]
            DISTIL[DistilRoBERTa<br/>330MB<br/>MPS Mask]
            XLM[XLM-RoBERTa<br/>1100MB<br/>Language Detection]
        end

        STATS[Usage Statistics<br/>Tracking<br/>Performance Metrics]
    end

    MANAGER -->|loads from| REGISTRY
    MANAGER -->|manages| GPT2
    MANAGER -->|manages| MINILM
    MANAGER -->|manages| SPACY
    MANAGER -->|manages| ROBERTA
    MANAGER -->|manages| DISTIL
    MANAGER -->|manages| XLM
    MANAGER -->|tracks| STATS
    REGISTRY -.->|defines| GPT2
    REGISTRY -.->|defines| MINILM
    REGISTRY -.->|defines| SPACY

    style MANAGER fill:#e3f2fd
    style REGISTRY fill:#f3e5f5
    style STATS fill:#fff3e0
```

**Key Features:**

- **Lazy Loading**: Models loaded on-demand
- **Caching Strategy**: LRU cache with max 5 models
- **Usage Tracking**: Statistics for optimization
- **Priority Groups**: Essential, Extended, Optional
- **Total Size**: ~2.8GB for all models
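The sketch below illustrates the singleton-plus-lazy-loading pattern described above, with an `OrderedDict` standing in for the LRU cache (max 5 models) and a simple usage counter. The `ModelManager` interface and `MODEL_REGISTRY` shape are assumptions for illustration; the real registry would wrap Hugging Face and spaCy loaders.

```python
from collections import OrderedDict
from threading import Lock
from typing import Any, Callable

# Loader callables per model name; real entries would wrap e.g.
# transformers.AutoModelForCausalLM.from_pretrained("gpt2").
MODEL_REGISTRY: dict[str, Callable[[], Any]] = {}


class ModelManager:
    """Process-wide singleton that lazily loads and LRU-caches models."""

    _instance = None
    _lock = Lock()

    def __new__(cls, max_cached: int = 5):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance._cache = OrderedDict()
                cls._instance._max_cached = max_cached
                cls._instance._usage = {}
            return cls._instance

    def get(self, name: str) -> Any:
        """Return a cached model, loading it on first use."""
        if name in self._cache:
            self._cache.move_to_end(name)                     # mark as recently used
        else:
            if len(self._cache) >= self._max_cached:
                self._cache.popitem(last=False)               # evict least recently used
            self._cache[name] = MODEL_REGISTRY[name]()        # lazy load on demand
        self._usage[name] = self._usage.get(name, 0) + 1      # usage statistics
        return self._cache[name]
```

Using an `OrderedDict` keeps the example dependency-free: `move_to_end` plus `popitem(last=False)` is enough to get LRU eviction semantics.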
---

### 3. Processing Layer (`processors/`)

Handles document extraction, text preprocessing, domain classification, and language detection.

```mermaid
graph TB
    subgraph "Processing Layer"
        direction TB

        subgraph "Document Extraction"
            EXTRACT[Document Extractor]
            EXTRACT -->|PDF| PYPDF[PyMuPDF Primary]
            EXTRACT -->|PDF| PDFPLUMB[pdfplumber Fallback]
            EXTRACT -->|PDF| PYPDF2[PyPDF2 Fallback]
            EXTRACT -->|DOCX| DOCX[python-docx]
            EXTRACT -->|HTML| BS4[BeautifulSoup4]
            EXTRACT -->|RTF| RTF[Basic Parser]
            EXTRACT -->|TXT| TXT[Chardet Encoding]
        end

        subgraph "Text Processing"
            TEXTPROC[Text Processor]
            TEXTPROC --> CLEAN[Unicode Normalization<br/>URL/Email Removal<br/>Whitespace Cleaning]
            TEXTPROC --> SPLIT[Smart Sentence Splitting<br/>Abbreviation Handling<br/>Word Tokenization]
            TEXTPROC --> VALIDATE[Length Validation<br/>Quality Checks<br/>Statistics]
        end

        subgraph "Domain Classification"
            DOMAIN[Domain Classifier]
            DOMAIN --> ZERO[Heuristic + Optional Model-Assisted<br/>Domain Inference<br/>RoBERTa/DeBERTa]
            DOMAIN --> LABELS[16 Domain Labels<br/>Multi-Label Candidates]
            DOMAIN --> THRESH[Domain-Specific<br/>Threshold Selection]
        end

        subgraph "Language Detection"
            LANG[Language Detector]
            LANG --> MODEL[XLM-RoBERTa<br/>Chunk-Based Analysis]
            LANG --> FALLBACK[langdetect Library]
            LANG --> HEURISTIC[Script Detection<br/>Character Analysis]
        end
    end

    EXTRACT -->|ProcessedText| TEXTPROC
    TEXTPROC -->|Cleaned Text| DOMAIN
    TEXTPROC -->|Cleaned Text| LANG

    style EXTRACT fill:#e8f5e9
    style TEXTPROC fill:#fff3e0
    style DOMAIN fill:#e1f5ff
    style LANG fill:#f3e5f5
```

**Processing Pipeline:**

1. **Document Extraction**: Multi-format support with fallback strategies
2. **Text Cleaning**: Unicode normalization, noise removal, validation
3. **Domain Classification**: Heuristic and optional model-assisted (zero-shot) classification with confidence scores
4. **Language Detection**: Multi-strategy approach with script analysis
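To make the fallback strategy concrete, here is a minimal sketch of PDF extraction that tries PyMuPDF first and degrades to pdfplumber and then PyPDF2, mirroring the diagram above. The function names and error-collection behaviour are assumptions; only the third-party calls (`fitz.open`, `pdfplumber.open`, `PyPDF2.PdfReader`) reflect the actual libraries.

```python
from pathlib import Path


def extract_pdf_text(path: str | Path) -> str:
    """Extract PDF text, falling back through extractors until one succeeds."""
    extractors = (_with_pymupdf, _with_pdfplumber, _with_pypdf2)
    errors = []
    for extract in extractors:
        try:
            text = extract(Path(path))
            if text and text.strip():
                return text
        except Exception as exc:  # degrade gracefully, try the next extractor
            errors.append(f"{extract.__name__}: {exc}")
    raise RuntimeError(f"All PDF extractors failed: {errors}")


def _with_pymupdf(path: Path) -> str:
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def _with_pdfplumber(path: Path) -> str:
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _with_pypdf2(path: Path) -> str:
    from PyPDF2 import PdfReader

    reader = PdfReader(str(path))
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```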
---

### 4. Metrics Layer (`metrics/`)

Six independent detection metrics analyzing different text characteristics.

```mermaid
graph TB
    subgraph "Metrics Layer"
        direction TB
        BASE[Base Metric<br/>Abstract Class<br/>Common Interface]

        subgraph "Statistical Metrics"
            STRUCT[Structural Metric<br/>No ML Model<br/>Statistical Features]
            STRUCT --> SF1[Sentence Length Distribution<br/>Burstiness Score<br/>Readability]
            STRUCT --> SF2[N-gram Diversity<br/>Type-Token Ratio<br/>Repetition Patterns]
        end

        subgraph "ML-Based Metrics"
            PERP[Perplexity Metric<br/>GPT-2 Model<br/>Text Predictability]
            PERP --> PF1[Overall Perplexity<br/>Sentence-Level Perplexity<br/>Cross-Entropy]
            PERP --> PF2[Chunk Analysis<br/>Variance Scoring<br/>Normalization]

            ENT[Entropy Metric<br/>GPT-2 Tokenizer<br/>Randomness Analysis]
            ENT --> EF1[Character Entropy<br/>Word Entropy<br/>Token Entropy]
            ENT --> EF2[Token Diversity<br/>Sequence Unpredictability<br/>Pattern Detection]

            SEM[Semantic Metric<br/>MiniLM Embeddings<br/>Coherence Analysis]
            SEM --> SF3[Sentence Similarity<br/>Topic Consistency<br/>Coherence Score]
            SEM --> SF4[Repetition Detection<br/>Topic Drift<br/>Contextual Consistency]

            LING[Linguistic Metric<br/>spaCy NLP<br/>Grammar Analysis]
            LING --> LF1[POS Diversity<br/>POS Entropy<br/>Syntactic Complexity]
            LING --> LF2[Grammatical Patterns<br/>Writing Style<br/>Pattern Detection]

            MPS[Multi-Perturbation<br/>GPT-2 + DistilRoBERTa<br/>Stability Analysis]
            MPS --> MF1[Text Perturbation<br/>Likelihood Calculation<br/>Stability Score]
            MPS --> MF2[Curvature Analysis<br/>Chunk Stability<br/>Variance Scoring]
        end
    end

    BASE -.->|inherited by| STRUCT
    BASE -.->|inherited by| PERP
    BASE -.->|inherited by| ENT
    BASE -.->|inherited by| SEM
    BASE -.->|inherited by| LING
    BASE -.->|inherited by| MPS

    style BASE fill:#ffebee
    style STRUCT fill:#e8f5e9
    style PERP fill:#fff3e0
    style ENT fill:#e1f5ff
    style SEM fill:#f3e5f5
    style LING fill:#fce4ec
    style MPS fill:#fff9c4
```

**Metric Characteristics:**

| Metric | Model Required | Complexity | Typical Influence Range (Indicative) |
|--------|----------------|------------|--------------------------------------|
| Structural | ❌ | Low | 15-20% |
| Perplexity | GPT-2 | Medium | 20-27% |
| Entropy | GPT-2 Tokenizer | Medium | 13-17% |
| Semantic | MiniLM | Medium | 18-20% |
| Linguistic | spaCy | Medium | 12-16% |
| MPS | GPT-2 + DistilRoBERTa | High | 8-10% |

> *Actual weights are dynamically calibrated per domain and configuration.*
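The following sketch shows how the shared interface and a model-free metric might look: an abstract `BaseMetric` with a `compute()` method returning a `MetricResult`, and a structural metric reduced to a single burstiness feature. Class and field names are assumptions inferred from the diagrams, not the project's actual code.

```python
import statistics
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class MetricResult:
    """Shared result shape returned by every metric (fields are illustrative)."""
    name: str
    synthetic_probability: float  # 0.0 = authentic-leaning, 1.0 = synthetic-leaning
    confidence: float
    details: dict = field(default_factory=dict)


class BaseMetric(ABC):
    """Common interface that all six detection metrics inherit."""

    name: str = "base"

    @abstractmethod
    def compute(self, sentences: list[str]) -> MetricResult:
        ...


class StructuralMetric(BaseMetric):
    """Model-free metric, reduced here to a single burstiness feature."""

    name = "structural"

    def compute(self, sentences: list[str]) -> MetricResult:
        lengths = [len(s.split()) for s in sentences if s.strip()]
        if not lengths:
            return MetricResult(self.name, 0.5, confidence=0.0)
        mean = statistics.mean(lengths)
        burstiness = statistics.pstdev(lengths) / mean if mean else 0.0
        # Human writing tends to vary sentence length more; mapping low
        # burstiness toward "synthetic" with a linear ramp is an assumption.
        score = max(0.0, min(1.0, 1.0 - burstiness))
        return MetricResult(self.name, score, confidence=0.6,
                            details={"burstiness": round(burstiness, 3)})
```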
---

### 5. Service Layer (`services/`)

Coordinates ensemble aggregation, highlighting, reasoning generation, and orchestration.

```mermaid
graph TB
    subgraph "Service Layer"
        direction TB

        subgraph "Orchestrator"
            ORCH[Detection Orchestrator<br/>Pipeline Coordinator]
            ORCH --> PIPE[Processing Pipeline<br/>6-Step Execution]
            PIPE --> STEP1[1. Text Preprocessing]
            PIPE --> STEP2[2. Language Detection]
            PIPE --> STEP3[3. Domain Classification]
            PIPE --> STEP4[4. Metric Execution<br/>Parallel/Sequential]
            PIPE --> STEP5[5. Ensemble Aggregation]
            PIPE --> STEP6[6. Result Compilation]
        end

        subgraph "Ensemble Classifier"
            ENSEMBLE[Ensemble Classifier<br/>Multi-Strategy Aggregation]
            ENSEMBLE --> METHOD1[Confidence Calibrated<br/>Sigmoid Weighting]
            ENSEMBLE --> METHOD2[Consensus Based<br/>Agreement Rewards]
            ENSEMBLE --> METHOD3[Domain Weighted<br/>Static Weights]
            ENSEMBLE --> METHOD4[Simple Average<br/>Fallback]
            ENSEMBLE --> CALC[Uncertainty Quantification<br/>Consensus Analysis<br/>Confidence Scoring]
        end

        subgraph "Highlighter"
            HIGHLIGHT[Text Highlighter<br/>Sentence-Level Analysis]
            HIGHLIGHT --> COLORS[4-Color System<br/>Authentic/Uncertain<br/>Hybrid/Synthetic]
            HIGHLIGHT --> SENTENCE[Sentence Ensemble<br/>Domain Adjustments<br/>Tooltip Generation]
        end

        subgraph "Reasoning"
            REASON[Reasoning Generator<br/>Explainable AI]
            REASON --> SUMMARY[Executive Summary<br/>Verdict Explanation]
            REASON --> INDICATORS[Key Indicators<br/>Metric Breakdown]
            REASON --> EVIDENCE[Supporting Evidence<br/>Contradicting Evidence]
            REASON --> RECOM[Recommendations<br/>Uncertainty Analysis]
        end
    end

    ORCH -->|coordinates| ENSEMBLE
    ORCH -->|uses| HIGHLIGHT
    ORCH -->|uses| REASON
    ENSEMBLE -->|provides| HIGHLIGHT
    ENSEMBLE -->|provides| REASON

    style ORCH fill:#fff3e0
    style ENSEMBLE fill:#e3f2fd
    style HIGHLIGHT fill:#f3e5f5
    style REASON fill:#e8f5e9
```

**Service Features:**

- **Parallel Execution**: ThreadPoolExecutor for metric computation
- **Ensemble Methods**: 4 aggregation strategies with fallbacks
- **Sentence Highlighting**: 4-category color system (Authentic/Uncertain/Hybrid/Synthetic)
- **Explainable AI**: Detailed reasoning with metric contributions
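A minimal sketch of the parallel execution path, assuming each metric exposes a `compute(sentences)`-style callable that returns a result dict with a `confidence` field: failures are converted into degraded results rather than aborting the run, in line with the fail-safe design. The `run_metrics_parallel` helper is illustrative, not the orchestrator's actual API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

MetricFn = Callable[[list[str]], dict]  # compute(sentences) -> result dict


def run_metrics_parallel(metric_fns: dict[str, MetricFn],
                         sentences: list[str],
                         max_workers: int = 6) -> dict[str, dict]:
    """Run every metric concurrently; a failing metric degrades instead of aborting."""
    results: dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, sentences): name
                   for name, fn in metric_fns.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # fail-safe design: record the error, keep going
                results[name] = {"error": str(exc), "confidence": 0.0}
    return results
```

Threads rather than processes are a reasonable fit here because most of the work happens inside PyTorch and spaCy calls, which typically release the GIL.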
---

### 6. Reporter Layer (`reporter/`)

Generates comprehensive reports in multiple formats.

```mermaid
graph TB
    subgraph "Reporter Layer"
        direction TB
        REPORT[Report Generator]

        subgraph "JSON Report"
            JSON[Structured JSON]
            JSON --> META[Report Metadata<br/>Timestamp<br/>Version]
            JSON --> RESULTS[Overall Results<br/>Probabilities<br/>Confidence]
            JSON --> METRICS[Detailed Metrics<br/>Sub-metrics<br/>Weights]
            JSON --> REASONING[Detection Reasoning<br/>Evidence<br/>Recommendations]
            JSON --> HIGHLIGHT[Highlighted Sentences<br/>Color Classes<br/>Probabilities]
            JSON --> PERF[Performance Metrics<br/>Execution Times<br/>Warnings/Errors]
        end

        subgraph "PDF Report"
            PDF[Professional PDF]
            PDF --> PAGE1[Page 1: Executive Summary<br/>Verdict, Stats, Reasoning]
            PDF --> PAGE2[Page 2: Content Analysis<br/>Domain, Metrics, Weights]
            PDF --> PAGE3[Page 3: Structural & Entropy]
            PDF --> PAGE4[Page 4: Perplexity & Semantic]
            PDF --> PAGE5[Page 5: Linguistic & MPS]
            PDF --> PAGE6[Page 6: Recommendations]

            STYLE[Premium Styling]
            STYLE --> COLORS[Color Scheme<br/>Blue/Green/Red/Purple]
            STYLE --> TABLES[Professional Tables<br/>Charts, Metrics]
            STYLE --> LAYOUT[Multi-Page Layout<br/>Headers, Footers]
        end
    end

    REPORT -->|generates| JSON
    REPORT -->|generates| PDF
    PDF -->|uses| STYLE

    style REPORT fill:#fff3e0
    style JSON fill:#e8f5e9
    style PDF fill:#e3f2fd
    style STYLE fill:#f3e5f5
```

**Report Formats:**

- **JSON**: Machine-readable with complete data
- **PDF**: Human-readable with professional formatting
- **Charts**: Pie charts for probability distribution
- **Tables**: Metric contributions, detailed sub-metrics
- **Styling**: Color-coded, multi-page layout with branding

---

## Data Flow

### Complete Detection Pipeline

```mermaid
sequenceDiagram
    participant User
    participant Orchestrator
    participant Processors
    participant Metrics
    participant Ensemble
    participant Services
    participant Reporter

    User->>Orchestrator: analyze(text)

    Note over Orchestrator: Step 1: Preprocessing
    Orchestrator->>Processors: TextProcessor.process()
    Processors-->>Orchestrator: ProcessedText

    Note over Orchestrator: Step 2: Language Detection
    Orchestrator->>Processors: LanguageDetector.detect()
    Processors-->>Orchestrator: LanguageResult

    Note over Orchestrator: Step 3: Domain Classification
    Orchestrator->>Processors: DomainClassifier.classify()
    Processors-->>Orchestrator: DomainPrediction

    Note over Orchestrator: Step 4: Parallel Metric Execution
    par Structural
        Orchestrator->>Metrics: Structural.compute()
        Metrics-->>Orchestrator: MetricResult
    and Perplexity
        Orchestrator->>Metrics: Perplexity.compute()
        Metrics-->>Orchestrator: MetricResult
    and Entropy
        Orchestrator->>Metrics: Entropy.compute()
        Metrics-->>Orchestrator: MetricResult
    and Semantic
        Orchestrator->>Metrics: Semantic.compute()
        Metrics-->>Orchestrator: MetricResult
    and Linguistic
        Orchestrator->>Metrics: Linguistic.compute()
        Metrics-->>Orchestrator: MetricResult
    and MPS
        Orchestrator->>Metrics: MPS.compute()
        Metrics-->>Orchestrator: MetricResult
    end

    Note over Orchestrator: Step 5: Ensemble Aggregation
    Orchestrator->>Ensemble: predict(metric_results, domain)
    Ensemble-->>Orchestrator: EnsembleResult

    Note over Orchestrator: Step 6: Services
    Orchestrator->>Services: generate_highlights()
    Services-->>Orchestrator: HighlightedSentences
    Orchestrator->>Services: generate_reasoning()
    Services-->>Orchestrator: DetailedReasoning

    Orchestrator->>Reporter: generate_report()
    Reporter-->>Orchestrator: Report Files

    Orchestrator-->>User: DetectionResult
```

### Ensemble Aggregation Flow

```mermaid
graph TD
    START[Metric Results] --> FILTER[Filter Valid Metrics<br/>Remove Errors]
    FILTER --> WEIGHTS[Get Domain Weights<br/>Base Weights]
    WEIGHTS --> METHOD{Primary Method?}

    METHOD -->|Confidence Calibrated| CONF[Sigmoid Confidence<br/>Adjustment]
    METHOD -->|Consensus Based| CONS[Agreement<br/>Calculation]
    METHOD -->|Domain Weighted| DOMAIN[Static Domain<br/>Weights]

    CONF --> AGGREGATE[Weighted Aggregation]
    CONS --> AGGREGATE
    DOMAIN --> AGGREGATE

    AGGREGATE --> NORMALIZE[Normalize to 1.0]
    NORMALIZE --> CALC[Calculate Metrics]

    CALC --> CONFIDENCE[Overall Confidence<br/>Base + Agreement<br/>+ Certainty + Quality]
    CALC --> UNCERTAINTY[Uncertainty Score<br/>Variance + Confidence<br/>+ Decision]
    CALC --> CONSENSUS[Consensus Level<br/>Std Dev Analysis]

    CONFIDENCE --> THRESHOLD[Apply Adaptive<br/>Threshold]
    UNCERTAINTY --> THRESHOLD

    THRESHOLD --> VERDICT{Verdict}
    VERDICT -->|Synthetic >= 0.6| SYNTH[Synthetically-Generated]
    VERDICT -->|Authentic >= 0.6| AUTH[Authentically-Written]
    VERDICT -->|Hybrid > 0.25| HYBRID[Hybrid]
    VERDICT -->|Uncertain| UNC[Uncertain]

    SYNTH --> REASON[Generate Reasoning]
    AUTH --> REASON
    HYBRID --> REASON
    UNC --> REASON
    REASON --> RESULT[EnsembleResult]

    style START fill:#e8f5e9
    style RESULT fill:#e3f2fd
    style SYNTH fill:#ffebee
    style AUTH fill:#e8f5e9
    style HYBRID fill:#fff3e0
    style UNC fill:#f5f5f5
```
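The sketch below walks through the same flow in code, under stated assumptions: each metric result carries `synthetic_probability` and `confidence` fields, confidence is turned into a sigmoid weight multiplier, weights are normalized to 1.0, and the verdict thresholds from the diagram (synthetic >= 0.6, authentic >= 0.6, hybrid > 0.25) are applied. The sigmoid slope and the use of score variance as the hybrid signal are illustrative choices, not the project's calibrated values.

```python
import math


def aggregate(metric_results: dict[str, dict],
              base_weights: dict[str, float]) -> dict:
    """Confidence-calibrated aggregation with the verdict rules from the flow above."""
    valid = {name: r for name, r in metric_results.items() if "error" not in r}
    if not valid:
        return {"verdict": "Uncertain", "synthetic_probability": 0.5, "weights": {}}

    def calibrated_weight(name: str, result: dict) -> float:
        # Sigmoid confidence adjustment: low-confidence metrics are down-weighted.
        confidence = result.get("confidence", 0.5)
        return base_weights.get(name, 1.0) / (1.0 + math.exp(-8.0 * (confidence - 0.5)))

    weights = {name: calibrated_weight(name, r) for name, r in valid.items()}
    total = sum(weights.values()) or 1.0
    weights = {name: w / total for name, w in weights.items()}  # normalize to 1.0

    p_synthetic = sum(weights[n] * valid[n]["synthetic_probability"] for n in valid)
    scores = [r["synthetic_probability"] for r in valid.values()]
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    hybrid_signal = min(1.0, 4.0 * variance)  # metric disagreement treated as hybrid evidence

    if p_synthetic >= 0.6:
        verdict = "Synthetically-Generated"
    elif (1.0 - p_synthetic) >= 0.6:
        verdict = "Authentically-Written"
    elif hybrid_signal > 0.25:
        verdict = "Hybrid"
    else:
        verdict = "Uncertain"

    return {"verdict": verdict, "synthetic_probability": p_synthetic,
            "weights": weights, "uncertainty": round(variance, 4)}
```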
---

## Technology Stack

### Core Technologies

```mermaid
graph LR
    subgraph "Language & Runtime"
        PYTHON[Python 3.10+]
        CONDA[Conda Environment]
    end

    subgraph "ML Frameworks"
        TORCH[PyTorch]
        HF[HuggingFace Transformers]
        SPACY[spaCy]
        SKLEARN[scikit-learn]
    end

    subgraph "NLP Models"
        GPT2[GPT-2<br/>Perplexity/MPS]
        MINILM[MiniLM-L6-v2<br/>Semantic]
        ROBERTA[RoBERTa<br/>Domain Classify]
        DISTIL[DistilRoBERTa<br/>MPS Mask]
        XLM[XLM-RoBERTa<br/>Language Detect]
        SPACYMODEL[en_core_web_sm<br/>Linguistic]
    end

    subgraph "Document Processing"
        PYMUPDF[PyMuPDF]
        PDFPLUMBER[pdfplumber]
        PYPDF2[PyPDF2]
        DOCX[python-docx]
        BS4[BeautifulSoup4]
    end

    subgraph "Utilities"
        NUMPY[NumPy]
        PYDANTIC[Pydantic]
        LOGURU[Loguru]
        REPORTLAB[ReportLab]
    end

    PYTHON --> TORCH
    TORCH --> HF
    HF --> GPT2
    HF --> MINILM
    HF --> ROBERTA
    HF --> DISTIL
    HF --> XLM
    PYTHON --> SPACY
    SPACY --> SPACYMODEL

    style PYTHON fill:#306998
    style TORCH fill:#ee4c2c
    style HF fill:#ff6f00
    style SPACY fill:#09a3d5
```

### Dependencies Summary

| Category | Libraries | Purpose |
|----------|-----------|---------|
| **ML Core** | PyTorch, Transformers, spaCy | Model execution, NLP |
| **Document** | PyMuPDF, pdfplumber, python-docx | Multi-format extraction |
| **Analysis** | NumPy, scikit-learn | Numerical computation |
| **Validation** | Pydantic | Data validation |
| **Logging** | Loguru | Structured logging |
| **Reporting** | ReportLab | PDF generation |

---

## Deployment Architecture

```mermaid
graph TB
    subgraph "Deployment Options"
        direction TB

        subgraph "Standalone Application"
            SCRIPT[Python Scripts]
        end

        subgraph "Web Application"
            FASTAPI[FastAPI Server]
        end

        subgraph "API Service"
            REST[REST API Endpoints]
            BATCH[Batch Processing]
            ASYNC[Async Workers]
        end

        subgraph "Infrastructure"
            DOCKER[Docker Container]
            GPU[GPU Support<br/>Optional]
            STORAGE[Model Cache<br/>~2.8GB]
        end
    end

    FASTAPI --> DOCKER
    REST --> DOCKER
    DOCKER --> GPU
    DOCKER --> STORAGE

    style FASTAPI fill:#e3f2fd
    style DOCKER fill:#2496ed
    style GPU fill:#76b900
```

### System Requirements

- **Python**: 3.10+
- **RAM**: 8GB minimum, 16GB recommended
- **Storage**: 5GB (models + data)
- **GPU**: Optional (CUDA or Apple MPS for faster inference)
- **CPU**: 4+ cores for parallel execution

---

## Performance Characteristics

### Execution Modes

```mermaid
graph LR
    subgraph "Sequential Mode"
        S1[Metric 1] --> S2[Metric 2]
        S2 --> S3[Metric 3]
        S3 --> S4[Metric 4]
        S4 --> S5[Metric 5]
        S5 --> S6[Metric 6]
        S6 --> SRESULT[~15-30s]
    end

    subgraph "Parallel Mode"
        P1[Metric 1]
        P2[Metric 2]
        P3[Metric 3]
        P4[Metric 4]
        P5[Metric 5]
        P6[Metric 6]
        P1 --> PRESULT[~8-12s]
        P2 --> PRESULT
        P3 --> PRESULT
        P4 --> PRESULT
        P5 --> PRESULT
        P6 --> PRESULT
    end

    style SRESULT fill:#ffebee
    style PRESULT fill:#e8f5e9
```

### Metric Execution Times

| Metric | Avg Time | Complexity | Model Size |
|--------|----------|------------|------------|
| Structural | 0.5-1s | Low | 0MB |
| Perplexity | 2-4s | Medium | 548MB |
| Entropy | 1-2s | Medium | ~50MB (shared) |
| Semantic | 3-5s | Medium | 80MB |
| Linguistic | 2-3s | Medium | 13MB |
| MPS | 5-10s | High | 878MB (GPT-2 + DistilRoBERTa) |

**Total Sequential**: ~15-25 seconds
**Total Parallel**: ~8-12 seconds (limited by slowest metric)

---

## Security & Privacy

### Data Handling

```mermaid
graph TD
    INPUT[Text Input] --> PROCESS[Processing]
    PROCESS --> MEMORY[In-Memory Only]
    MEMORY --> ANALYSIS[Analysis]
    ANALYSIS --> CLEANUP[Auto Cleanup]

    MODELS[Model Cache] -.->|Read Only| ANALYSIS
    REPORTS[Optional Reports] --> STORAGE[Local Storage Only]
    CLEANUP --> DISCARD[Data Discarded]

    style INPUT fill:#e3f2fd
    style MEMORY fill:#fff3e0
    style CLEANUP fill:#e8f5e9
    style DISCARD fill:#ffebee
```

### Security Features

- ✅ **No External Data Transmission**: All processing local
- ✅ **No Data Persistence**: Text data not stored by default
- ✅ **Model Integrity**: Checksums for downloaded models
- ✅ **Input Validation**: Pydantic schemas for all inputs
- ✅ **Error Isolation**: Graceful degradation, no information leakage

---

> This system does not claim ground truth authorship. It estimates probabilistic authenticity signals based on measurable text properties.