TEXT-AUTH: System Architecture Documentation
TEXT-AUTH is an evidence-first, domain-aware AI text detection system designed around independent signals, calibrated aggregation, and explainability rather than black-box classification.
System Overview
TEXT-AUTH combines six detection metrics with ensemble aggregation to estimate whether a text is synthetically generated, authentically written, or a hybrid of both.
Key Capabilities
- Multi-Metric Analysis: 6 independent detection metrics (Structural, Perplexity, Entropy, Semantic, Linguistic, Multi-Perturbation Stability)
- Domain-Aware Calibration: Adaptive thresholds for 16 text domains (Academic, Creative, Technical, etc.)
- Ensemble Aggregation: Confidence-weighted combination with uncertainty quantification
- Sentence-Level Highlighting: Visual feedback with probability scores
- Comprehensive Reporting: JSON and PDF reports with detailed analysis
Design Principles
- Modular Architecture: Clean separation of concerns across layers
- Fail-Safe Design: Graceful degradation with fallback strategies
- Parallel Processing: Multi-threaded metric execution for performance
- Domain Expertise: Specialized thresholds calibrated per content type
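The fail-safe principle above can be illustrated with an ordered fallback chain. This is a minimal sketch, not the project's code: `extract_with_fallback` and the extractor callables are hypothetical names, and each callable stands in for a real backend such as a PDF library.

```python
from typing import Callable, Sequence


def extract_with_fallback(
    path: str,
    extractors: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each extractor in order and return (backend_name, text) for the first success."""
    failures = []
    for name, extract in extractors:
        try:
            text = extract(path)
            if text.strip():
                return name, text
            # Empty output counts as a failure so the next backend gets a chance.
            failures.append(f"{name}: empty output")
        except Exception as exc:
            # Record the failure and degrade to the next backend.
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all extractors failed: " + "; ".join(failures))
```

Collecting the failure messages keeps the final error informative when every backend fails, without aborting on the first one.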
Why Multi-Metric Instead of a Single Classifier?
- Single classifiers overfit stylistic artifacts
- LLMs rapidly adapt to detectors
- Independent statistical signals decay more slowly
- Ensemble disagreement is itself evidence
High-Level Architecture
graph TB
subgraph "Presentation Layer"
UI[Web Interface/API]
end
subgraph "Application Layer"
ORCH[Detection Orchestrator]
ORCH --> |coordinates| PIPE[Processing Pipeline]
end
subgraph "Service Layer"
ENSEMBLE[Ensemble Classifier]
HIGHLIGHT[Text Highlighter]
REASON[Reasoning Generator]
REPORT[Report Generator]
end
subgraph "Processing Layer"
EXTRACT[Document Extractor]
TEXTPROC[Text Processor]
DOMAIN[Domain Classifier]
LANG[Language Detector]
end
subgraph "Metrics Layer"
STRUCT[Structural Metric]
PERP[Perplexity Metric]
ENT[Entropy Metric]
SEM[Semantic Metric]
LING[Linguistic Metric]
MPS[Multi-Perturbation Stability]
end
subgraph "Model Layer"
MANAGER[Model Manager]
REGISTRY[Model Registry]
CACHE[(Model Cache)]
end
subgraph "Configuration Layer"
CONFIG[Settings]
ENUMS[Enums]
SCHEMAS[Data Schemas]
CONSTANTS[Constants]
THRESHOLDS[Domain Thresholds]
end
UI --> ORCH
ORCH --> EXTRACT
ORCH --> TEXTPROC
ORCH --> DOMAIN
ORCH --> LANG
ORCH --> STRUCT
ORCH --> PERP
ORCH --> ENT
ORCH --> SEM
ORCH --> LING
ORCH --> MPS
ORCH --> ENSEMBLE
ENSEMBLE --> HIGHLIGHT
ENSEMBLE --> REASON
ENSEMBLE --> REPORT
STRUCT --> MANAGER
PERP --> MANAGER
ENT --> MANAGER
SEM --> MANAGER
LING --> MANAGER
MPS --> MANAGER
DOMAIN --> MANAGER
LANG --> MANAGER
MANAGER --> REGISTRY
MANAGER --> CACHE
ORCH --> CONFIG
ENSEMBLE --> THRESHOLDS
style UI fill:#e1f5ff
style ORCH fill:#fff3e0
style ENSEMBLE fill:#f3e5f5
style MANAGER fill:#e8f5e9
style CONFIG fill:#fce4ec
Layer-by-Layer Architecture
1. Configuration Layer (config/)
The foundation layer providing enums, schemas, constants, and domain-specific thresholds.
graph LR
subgraph "Configuration Layer"
direction TB
ENUMS["enums.py
Domain, Language, Script,
ModelType, ConfidenceLevel"]
SCHEMAS["schemas.py
ModelConfig, ProcessedText, MetricResult, EnsembleResult,
DetectionResult"]
CONSTANTS["constants.py
TextProcessingParams, MetricParams,
EnsembleParams"]
THRESHOLDS["threshold_config.py
DomainThresholds,
16 Domain Configs, MetricThresholds"]
MODELCFG["model_config.py
Model Registry, Model Groups, Default Weights"]
SETTINGS["settings.py
App Settings, Paths, Feature Flags"]
end
ENUMS -.->|used by| SCHEMAS
ENUMS -.->|used by| THRESHOLDS
SCHEMAS -.->|used by| CONSTANTS
THRESHOLDS -.->|imports| ENUMS
MODELCFG -.->|imports| ENUMS
style ENUMS fill:#ffebee
style SCHEMAS fill:#fff3e0
style CONSTANTS fill:#e8f5e9
style THRESHOLDS fill:#e1f5ff
style MODELCFG fill:#f3e5f5
style SETTINGS fill:#fce4ec
Key Components:
- enums.py: Core enumerations (Domain, Language, Script, ModelType, ConfidenceLevel)
- schemas.py: Data classes for structured data exchange
- constants.py: Frozen dataclasses with hyperparameters for each metric
- threshold_config.py: Domain-specific thresholds for 16 domains
- model_config.py: Model registry with download priorities and configurations
- settings.py: Application settings with Pydantic validation
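To make the configuration layer concrete, the following sketch shows how an enum, a frozen dataclass, and a per-domain lookup might fit together. The field names, the three listed domains, the numeric values, and `DEFAULT_DOMAIN` are all illustrative assumptions, not the contents of `enums.py` or `threshold_config.py`.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    # Only three of the 16 domains are shown; member names are illustrative.
    ACADEMIC = "academic"
    CREATIVE = "creative"
    TECHNICAL = "technical"


@dataclass(frozen=True)
class MetricThresholds:
    # Hypothetical fields: a decision threshold and an ensemble weight.
    synthetic_threshold: float
    weight: float


# Assumed fallback domain for domains without their own entry.
DEFAULT_DOMAIN = Domain.ACADEMIC

# Placeholder numbers, not the project's calibrated values.
DOMAIN_THRESHOLDS = {
    Domain.ACADEMIC: {"perplexity": MetricThresholds(0.62, 0.25)},
    Domain.CREATIVE: {"perplexity": MetricThresholds(0.55, 0.20)},
}


def get_thresholds(domain: Domain, metric: str) -> MetricThresholds:
    """Return the thresholds for a metric, using DEFAULT_DOMAIN when unconfigured."""
    per_domain = DOMAIN_THRESHOLDS.get(domain, DOMAIN_THRESHOLDS[DEFAULT_DOMAIN])
    return per_domain[metric]
```

Freezing the dataclass means a threshold cannot be mutated at runtime by accident; any attempt to assign to a field raises an error.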
2. Model Abstraction Layer (models/)
A model abstraction layer that the metrics use for centralized model loading, caching, and unified access.
graph TB
subgraph "Model Layer"
direction TB
MANAGER["Model Manager
Singleton Pattern, Lazy Loading"]
REGISTRY["Model Registry
10 Model Configs, Priority Groups"]
subgraph "Model Cache"
direction LR
GPT2[GPT-2<br/>548MB<br/>Perplexity/MPS]
MINILM[MiniLM-L6-v2<br/>80MB<br/>Semantic]
SPACY[spaCy sm<br/>13MB<br/>Linguistic]
ROBERTA[RoBERTa<br/>500MB<br/>Domain Classifier]
DISTIL[DistilRoBERTa<br/>330MB<br/>MPS Mask]
XLM[XLM-RoBERTa<br/>1100MB<br/>Language Detection]
end
STATS[Usage Statistics<br/>Tracking Performance Metrics]
end
MANAGER -->|loads from| REGISTRY
MANAGER -->|manages| GPT2
MANAGER -->|manages| MINILM
MANAGER -->|manages| SPACY
MANAGER -->|manages| ROBERTA
MANAGER -->|manages| DISTIL
MANAGER -->|manages| XLM
MANAGER -->|tracks| STATS
REGISTRY -.->|defines| GPT2
REGISTRY -.->|defines| MINILM
REGISTRY -.->|defines| SPACY
style MANAGER fill:#e3f2fd
style REGISTRY fill:#f3e5f5
style STATS fill:#fff3e0
Key Features:
- Lazy Loading: Models loaded on-demand
- Caching Strategy: LRU cache with max 5 models
- Usage Tracking: Statistics for optimization
- Priority Groups: Essential, Extended, Optional
- Total Size: ~2.8GB for all models
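The lazy-loading and LRU behaviour listed above can be sketched with an `OrderedDict`. The class name `LazyModelCache`, its methods, and the loader callables are assumptions made for illustration; the actual Model Manager may be structured differently.

```python
from collections import OrderedDict
from typing import Any, Callable


class LazyModelCache:
    """Load models on first use and evict the least recently used one."""

    def __init__(self, loaders: dict[str, Callable[[], Any]], max_models: int = 5):
        self._loaders = loaders                # name -> zero-argument loader
        self._max_models = max_models
        self._cache: OrderedDict[str, Any] = OrderedDict()
        self.load_counts: dict[str, int] = {}  # simple usage statistics

    def get(self, name: str) -> Any:
        if name in self._cache:
            self._cache.move_to_end(name)      # mark as most recently used
            return self._cache[name]
        model = self._loaders[name]()          # lazy load on first request
        self.load_counts[name] = self.load_counts.get(name, 0) + 1
        self._cache[name] = model
        if len(self._cache) > self._max_models:
            self._cache.popitem(last=False)    # evict the least recently used
        return model

    def cached_names(self) -> list[str]:
        """Names currently held in the cache, least recently used first."""
        return list(self._cache)
```

With a limit of five, a sixth model request evicts whichever model has gone unused the longest, and a later request for the evicted model simply reloads it.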
3. Processing Layer (processors/)
Handles document extraction, text preprocessing, domain classification, and language detection.
graph TB
subgraph "Processing Layer"
direction TB
subgraph "Document Extraction"
EXTRACT[Document Extractor]
EXTRACT -->|PDF| PYPDF[PyMuPDF Primary]
EXTRACT -->|PDF| PDFPLUMB[pdfplumber Fallback]
EXTRACT -->|PDF| PYPDF2[PyPDF2 Fallback]
EXTRACT -->|DOCX| DOCX[python-docx]
EXTRACT -->|HTML| BS4[BeautifulSoup4]
EXTRACT -->|RTF| RTF[Basic Parser]
EXTRACT -->|TXT| TXT[Chardet Encoding]
end
subgraph "Text Processing"
TEXTPROC[Text Processor]
TEXTPROC --> CLEAN[Unicode Normalization<br/>URL/Email Removal<br/>Whitespace Cleaning]
TEXTPROC --> SPLIT[Smart Sentence Splitting<br/>Abbreviation Handling<br/>Word Tokenization]
TEXTPROC --> VALIDATE[Length Validation<br/>Quality Checks<br/>Statistics]
end
subgraph "Domain Classification"
DOMAIN[Domain Classifier]
DOMAIN --> ZERO[Heuristic + optional model-assisted domain inference RoBERTa/DeBERTa]
DOMAIN --> LABELS[16 Domain Labels<br/>Multi-Label Candidates]
DOMAIN --> THRESH[Domain-Specific<br/>Threshold Selection]
end
subgraph "Language Detection"
LANG[Language Detector]
LANG --> MODEL[XLM-RoBERTa<br/>Chunk-Based Analysis]
LANG --> FALLBACK[langdetect Library]
LANG --> HEURISTIC[Script Detection<br/>Character Analysis]
end
end
EXTRACT -->|ProcessedText| TEXTPROC
TEXTPROC -->|Cleaned Text| DOMAIN
TEXTPROC -->|Cleaned Text| LANG
style EXTRACT fill:#e8f5e9
style TEXTPROC fill:#fff3e0
style DOMAIN fill:#e1f5ff
style LANG fill:#f3e5f5
Processing Pipeline:
- Document Extraction: Multi-format support with fallback strategies
- Text Cleaning: Unicode normalization, noise removal, validation
- Domain Classification: Heuristics plus optional model-assisted zero-shot classification, with confidence scores
- Language Detection: Multi-strategy approach with script analysis
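The text-cleaning step can be sketched with the standard library alone. The regular expressions and function names below are assumptions chosen for illustration. The sentence splitter is deliberately naive: it does not implement the abbreviation handling mentioned above, so a string like "e.g. Smith" would be split incorrectly.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Split after ., ! or ? when followed by whitespace and an uppercase letter.
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop URLs and emails, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter without abbreviation handling."""
    return [s for s in SENTENCE_RE.split(text) if s]
```

NFKC normalization folds compatibility characters such as ligatures into their plain equivalents, which keeps later token statistics from being skewed by encoding variants.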
4. Metrics Layer (metrics/)
Six independent detection metrics analyzing different text characteristics.
graph TB
subgraph "Metrics Layer"
direction TB
BASE[Base Metric<br/>Abstract Class<br/>Common Interface]
subgraph "Statistical Metrics"
STRUCT[Structural Metric<br/>No ML Model<br/>Statistical Features]
STRUCT --> SF1[Sentence Length Distribution<br/>Burstiness Score<br/>Readability]
STRUCT --> SF2[N-gram Diversity<br/>Type-Token Ratio<br/>Repetition Patterns]
end
subgraph "ML-Based Metrics"
PERP[Perplexity Metric<br/>GPT-2 Model<br/>Text Predictability]
PERP --> PF1[Overall Perplexity<br/>Sentence-Level Perplexity<br/>Cross-Entropy]
PERP --> PF2[Chunk Analysis<br/>Variance Scoring<br/>Normalization]
ENT[Entropy Metric<br/>GPT-2 Tokenizer<br/>Randomness Analysis]
ENT --> EF1[Character Entropy<br/>Word Entropy<br/>Token Entropy]
ENT --> EF2[Token Diversity<br/>Sequence Unpredictability<br/>Pattern Detection]
SEM[Semantic Metric<br/>MiniLM Embeddings<br/>Coherence Analysis]
SEM --> SF3[Sentence Similarity<br/>Topic Consistency<br/>Coherence Score]
SEM --> SF4[Repetition Detection<br/>Topic Drift<br/>Contextual Consistency]
LING[Linguistic Metric<br/>spaCy NLP<br/>Grammar Analysis]
LING --> LF1[POS Diversity<br/>POS Entropy<br/>Syntactic Complexity]
LING --> LF2[Grammatical Patterns<br/>Writing Style<br/>Pattern Detection]
MPS[Multi-Perturbation<br/>GPT-2 + DistilRoBERTa<br/>Stability Analysis]
MPS --> MF1[Text Perturbation<br/>Likelihood Calculation<br/>Stability Score]
MPS --> MF2[Curvature Analysis<br/>Chunk Stability<br/>Variance Scoring]
end
end
BASE -.->|inherited by| STRUCT
BASE -.->|inherited by| PERP
BASE -.->|inherited by| ENT
BASE -.->|inherited by| SEM
BASE -.->|inherited by| LING
BASE -.->|inherited by| MPS
style BASE fill:#ffebee
style STRUCT fill:#e8f5e9
style PERP fill:#fff3e0
style ENT fill:#e1f5ff
style SEM fill:#f3e5f5
style LING fill:#fce4ec
style MPS fill:#fff9c4
Metric Characteristics:
| Metric | Model Required | Complexity | Typical Influence Range (Indicative) |
|---|---|---|---|
| Structural | ❌ | Low | 15-20% |
| Perplexity | GPT-2 | Medium | 20-27% |
| Entropy | GPT-2 Tokenizer | Medium | 13-17% |
| Semantic | MiniLM | Medium | 18-20% |
| Linguistic | spaCy | Medium | 12-16% |
| MPS | GPT-2 + DistilRoBERTa | High | 8-10% |
Actual weights are dynamically calibrated per domain and configuration.
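As an example of a model-free signal, the sketch below computes two of the structural features named in the diagram: type-token ratio and a burstiness score. The burstiness formula `(sigma - mu) / (sigma + mu)` is a commonly used definition and is an assumption here; the Structural metric may compute it differently.

```python
import statistics


def type_token_ratio(words: list[str]) -> float:
    """Share of distinct (case-folded) words among all words."""
    return len({w.lower() for w in words}) / len(words) if words else 0.0


def burstiness(sentence_lengths: list[int]) -> float:
    """(sigma - mu) / (sigma + mu); -1.0 when all sentences have equal length."""
    if len(sentence_lengths) < 2:
        return 0.0
    mu = statistics.mean(sentence_lengths)
    sigma = statistics.pstdev(sentence_lengths)
    return (sigma - mu) / (sigma + mu) if (sigma + mu) else 0.0
```

Under this definition, uniform sentence lengths push the score toward -1, while highly varied lengths push it toward positive values.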
5. Service Layer (services/)
Coordinates ensemble aggregation, highlighting, reasoning generation, and orchestration.
graph TB
subgraph "Service Layer"
direction TB
subgraph "Orchestrator"
ORCH[Detection Orchestrator<br/>Pipeline Coordinator]
ORCH --> PIPE[Processing Pipeline<br/>6-Step Execution]
PIPE --> STEP1[1. Text Preprocessing]
PIPE --> STEP2[2. Language Detection]
PIPE --> STEP3[3. Domain Classification]
PIPE --> STEP4[4. Metric Execution<br/>Parallel/Sequential]
PIPE --> STEP5[5. Ensemble Aggregation]
PIPE --> STEP6[6. Result Compilation]
end
subgraph "Ensemble Classifier"
ENSEMBLE[Ensemble Classifier<br/>Multi-Strategy Aggregation]
ENSEMBLE --> METHOD1[Confidence Calibrated<br/>Sigmoid Weighting]
ENSEMBLE --> METHOD2[Consensus Based<br/>Agreement Rewards]
ENSEMBLE --> METHOD3[Domain Weighted<br/>Static Weights]
ENSEMBLE --> METHOD4[Simple Average<br/>Fallback]
ENSEMBLE --> CALC[Uncertainty Quantification<br/>Consensus Analysis<br/>Confidence Scoring]
end
subgraph "Highlighter"
HIGHLIGHT[Text Highlighter<br/>Sentence-Level Analysis]
HIGHLIGHT --> COLORS[4-Color System<br/>Authentic/Uncertain<br/>Hybrid/Synthetic]
HIGHLIGHT --> SENTENCE[Sentence Ensemble<br/>Domain Adjustments<br/>Tooltip Generation]
end
subgraph "Reasoning"
REASON[Reasoning Generator<br/>Explainable AI]
REASON --> SUMMARY[Executive Summary<br/>Verdict Explanation]
REASON --> INDICATORS[Key Indicators<br/>Metric Breakdown]
REASON --> EVIDENCE[Supporting Evidence<br/>Contradicting Evidence]
REASON --> RECOM[Recommendations<br/>Uncertainty Analysis]
end
end
ORCH -->|coordinates| ENSEMBLE
ORCH -->|uses| HIGHLIGHT
ORCH -->|uses| REASON
ENSEMBLE -->|provides| HIGHLIGHT
ENSEMBLE -->|provides| REASON
style ORCH fill:#fff3e0
style ENSEMBLE fill:#e3f2fd
style HIGHLIGHT fill:#f3e5f5
style REASON fill:#e8f5e9
Service Features:
- Parallel Execution: ThreadPoolExecutor for metric computation
- Ensemble Methods: 4 aggregation strategies with fallbacks
- Sentence Highlighting: 4-category color system (Authentic/Uncertain/Hybrid/Synthetic)
- Explainable AI: Detailed reasoning with metric contributions
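Parallel metric execution with graceful degradation might look like the following sketch. `run_metrics` and the shape of the metric callables (a function from text to a float score) are simplifying assumptions; the actual metrics return richer `MetricResult` objects.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_metrics(
    text: str,
    metrics: dict[str, Callable[[str], float]],
    max_workers: int = 6,
) -> tuple[dict[str, float], dict[str, str]]:
    """Run every metric in a thread pool; a failing metric does not stop the rest."""
    results: dict[str, float] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, text): name for name, fn in metrics.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Record the failure so the ensemble can skip this metric.
                errors[name] = str(exc)
    return results, errors
```

Returning errors alongside results lets the aggregation step filter out failed metrics and still produce a verdict from the remaining ones.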
6. Reporter Layer (reporter/)
Generates comprehensive reports in multiple formats.
graph TB
subgraph "Reporter Layer"
direction TB
REPORT[Report Generator]
subgraph "JSON Report"
JSON[Structured JSON]
JSON --> META[Report Metadata<br/>Timestamp<br/>Version]
JSON --> RESULTS[Overall Results<br/>Probabilities<br/>Confidence]
JSON --> METRICS[Detailed Metrics<br/>Sub-metrics<br/>Weights]
JSON --> REASONING[Detection Reasoning<br/>Evidence<br/>Recommendations]
JSON --> HIGHLIGHT[Highlighted Sentences<br/>Color Classes<br/>Probabilities]
JSON --> PERF[Performance Metrics<br/>Execution Times<br/>Warnings/Errors]
end
subgraph "PDF Report"
PDF[Professional PDF]
PDF --> PAGE1[Page 1: Executive Summary<br/>Verdict, Stats, Reasoning]
PDF --> PAGE2[Page 2: Content Analysis<br/>Domain, Metrics, Weights]
PDF --> PAGE3[Page 3: Structural & Entropy]
PDF --> PAGE4[Page 4: Perplexity & Semantic]
PDF --> PAGE5[Page 5: Linguistic & MPS]
PDF --> PAGE6[Page 6: Recommendations]
STYLE[Premium Styling]
STYLE --> COLORS[Color Scheme<br/>Blue/Green/Red/Purple]
STYLE --> TABLES[Professional Tables<br/>Charts, Metrics]
STYLE --> LAYOUT[Multi-Page Layout<br/>Headers, Footers]
end
end
REPORT -->|generates| JSON
REPORT -->|generates| PDF
PDF -->|uses| STYLE
style REPORT fill:#fff3e0
style JSON fill:#e8f5e9
style PDF fill:#e3f2fd
style STYLE fill:#f3e5f5
Report Formats:
- JSON: Machine-readable with complete data
- PDF: Human-readable with professional formatting
- Charts: Pie charts for probability distribution
- Tables: Metric contributions, detailed sub-metrics
- Styling: Color-coded, multi-page layout with branding
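A machine-readable report along the lines of the JSON branch above could be assembled as follows. The key names (`metadata`, `overall`, `metrics`) and the default version string are hypothetical; the real report schema is not specified in this document.

```python
import json
from datetime import datetime, timezone


def build_json_report(
    verdict: str,
    probabilities: dict[str, float],
    metric_results: dict[str, dict],
    version: str = "1.0.0",
) -> str:
    """Serialize a detection result into a JSON report string."""
    report = {
        "metadata": {
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "version": version,
        },
        "overall": {"verdict": verdict, "probabilities": probabilities},
        "metrics": metric_results,
    }
    # sort_keys keeps the output stable across runs, which helps when diffing reports.
    return json.dumps(report, indent=2, sort_keys=True)
```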
Data Flow
Complete Detection Pipeline
sequenceDiagram
participant User
participant Orchestrator
participant Processors
participant Metrics
participant Ensemble
participant Services
participant Reporter
User->>Orchestrator: analyze(text)
Note over Orchestrator: Step 1: Preprocessing
Orchestrator->>Processors: TextProcessor.process()
Processors-->>Orchestrator: ProcessedText
Note over Orchestrator: Step 2: Language Detection
Orchestrator->>Processors: LanguageDetector.detect()
Processors-->>Orchestrator: LanguageResult
Note over Orchestrator: Step 3: Domain Classification
Orchestrator->>Processors: DomainClassifier.classify()
Processors-->>Orchestrator: DomainPrediction
Note over Orchestrator: Step 4: Parallel Metric Execution
par Structural
Orchestrator->>Metrics: Structural.compute()
Metrics-->>Orchestrator: MetricResult
and Perplexity
Orchestrator->>Metrics: Perplexity.compute()
Metrics-->>Orchestrator: MetricResult
and Entropy
Orchestrator->>Metrics: Entropy.compute()
Metrics-->>Orchestrator: MetricResult
and Semantic
Orchestrator->>Metrics: Semantic.compute()
Metrics-->>Orchestrator: MetricResult
and Linguistic
Orchestrator->>Metrics: Linguistic.compute()
Metrics-->>Orchestrator: MetricResult
and MPS
Orchestrator->>Metrics: MPS.compute()
Metrics-->>Orchestrator: MetricResult
end
Note over Orchestrator: Step 5: Ensemble Aggregation
Orchestrator->>Ensemble: predict(metric_results, domain)
Ensemble-->>Orchestrator: EnsembleResult
Note over Orchestrator: Step 6: Result Compilation
Orchestrator->>Services: generate_highlights()
Services-->>Orchestrator: HighlightedSentences
Orchestrator->>Services: generate_reasoning()
Services-->>Orchestrator: DetailedReasoning
Orchestrator->>Reporter: generate_report()
Reporter-->>Orchestrator: Report Files
Orchestrator-->>User: DetectionResult
Ensemble Aggregation Flow
graph TD
START[Metric Results] --> FILTER[Filter Valid Metrics<br/>Remove Errors]
FILTER --> WEIGHTS[Get Domain Weights<br/>Base Weights]
WEIGHTS --> METHOD{Primary Method?}
METHOD -->|Confidence Calibrated| CONF[Sigmoid Confidence<br/>Adjustment]
METHOD -->|Consensus Based| CONS[Agreement<br/>Calculation]
METHOD -->|Domain Weighted| DOMAIN[Static Domain<br/>Weights]
CONF --> AGGREGATE[Weighted Aggregation]
CONS --> AGGREGATE
DOMAIN --> AGGREGATE
AGGREGATE --> NORMALIZE[Normalize to 1.0]
NORMALIZE --> CALC[Calculate Metrics]
CALC --> CONFIDENCE[Overall Confidence<br/>Base + Agreement<br/>+ Certainty + Quality]
CALC --> UNCERTAINTY[Uncertainty Score<br/>Variance + Confidence<br/>+ Decision]
CALC --> CONSENSUS[Consensus Level<br/>Std Dev Analysis]
CONFIDENCE --> THRESHOLD[Apply Adaptive<br/>Threshold]
UNCERTAINTY --> THRESHOLD
THRESHOLD --> VERDICT{Verdict}
VERDICT -->|Synthetic >= 0.6| SYNTH[Synthetically-Generated]
VERDICT -->|Authentic >= 0.6| AUTH[Authentically-Written]
VERDICT -->|Hybrid > 0.25| HYBRID[Hybrid]
VERDICT -->|Uncertain| UNC[Uncertain]
SYNTH --> REASON[Generate Reasoning]
AUTH --> REASON
HYBRID --> REASON
UNC --> REASON
REASON --> RESULT[EnsembleResult]
style START fill:#e8f5e9
style RESULT fill:#e3f2fd
style SYNTH fill:#ffebee
style AUTH fill:#e8f5e9
style HYBRID fill:#fff3e0
style UNC fill:#f5f5f5
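The confidence-calibrated branch of this flow can be sketched as below. The verdict thresholds (0.6 and 0.25) are taken from the diagram, checked in the order the diagram lists them. Everything else is an assumption: the sigmoid form and its steepness, the `MetricScore` fields, and the use of the standard deviation of synthetic scores as the consensus measure. Adaptive thresholding and the confidence and uncertainty scores are omitted.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class MetricScore:
    synthetic: float   # probability-like score in [0, 1]
    authentic: float
    hybrid: float
    confidence: float  # metric's self-reported confidence in [0, 1]


def calibrated_weight(base_weight: float, confidence: float, steepness: float = 10.0) -> float:
    """Scale a base weight by a sigmoid centred on confidence 0.5 (assumed form)."""
    return base_weight / (1.0 + math.exp(-steepness * (confidence - 0.5)))


def aggregate(scores: dict[str, MetricScore], base_weights: dict[str, float]) -> dict:
    """Combine valid metric scores into normalized probabilities and a verdict."""
    raw = {n: calibrated_weight(base_weights[n], s.confidence) for n, s in scores.items()}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("no valid metric results to aggregate")
    weights = {n: w / total for n, w in raw.items()}  # normalize to 1.0

    def combine(field: str) -> float:
        return sum(weights[n] * getattr(s, field) for n, s in scores.items())

    synthetic, authentic, hybrid = combine("synthetic"), combine("authentic"), combine("hybrid")
    # Low spread means the metrics agree; high spread signals disagreement.
    spread = statistics.pstdev([s.synthetic for s in scores.values()])

    if synthetic >= 0.6:
        verdict = "Synthetically-Generated"
    elif authentic >= 0.6:
        verdict = "Authentically-Written"
    elif hybrid > 0.25:
        verdict = "Hybrid"
    else:
        verdict = "Uncertain"
    return {"verdict": verdict, "synthetic": synthetic, "authentic": authentic,
            "hybrid": hybrid, "weights": weights, "spread": spread}
```

Because the sigmoid shrinks the weight of low-confidence metrics before normalization, a metric that is unsure of its own result contributes less to the final probabilities than its base weight alone would suggest.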
Technology Stack
Core Technologies
graph LR
subgraph "Language & Runtime"
PYTHON[Python 3.10+]
CONDA[Conda Environment]
end
subgraph "ML Frameworks"
TORCH[PyTorch]
HF[HuggingFace Transformers]
SPACY[spaCy]
SKLEARN[scikit-learn]
end
subgraph "NLP Models"
GPT2[GPT-2<br/>Perplexity/MPS]
MINILM[MiniLM-L6-v2<br/>Semantic]
ROBERTA[RoBERTa<br/>Domain Classify]
DISTIL[DistilRoBERTa<br/>MPS Mask]
XLM[XLM-RoBERTa<br/>Language Detect]
SPACYMODEL[en_core_web_sm<br/>Linguistic]
end
subgraph "Document Processing"
PYMUPDF[PyMuPDF]
PDFPLUMBER[pdfplumber]
PYPDF2[PyPDF2]
DOCX[python-docx]
BS4[BeautifulSoup4]
end
subgraph "Utilities"
NUMPY[NumPy]
PYDANTIC[Pydantic]
LOGURU[Loguru]
REPORTLAB[ReportLab]
end
PYTHON --> TORCH
TORCH --> HF
HF --> GPT2
HF --> MINILM
HF --> ROBERTA
HF --> DISTIL
HF --> XLM
PYTHON --> SPACY
SPACY --> SPACYMODEL
style PYTHON fill:#306998
style TORCH fill:#ee4c2c
style HF fill:#ff6f00
style SPACY fill:#09a3d5
Dependencies Summary
| Category | Libraries | Purpose |
|---|---|---|
| ML Core | PyTorch, Transformers, spaCy | Model execution, NLP |
| Document | PyMuPDF, pdfplumber, python-docx | Multi-format extraction |
| Analysis | NumPy, scikit-learn | Numerical computation |
| Validation | Pydantic | Data validation |
| Logging | Loguru | Structured logging |
| Reporting | ReportLab | PDF generation |
Deployment Architecture
graph TB
subgraph "Deployment Options"
direction TB
subgraph "Standalone Application"
SCRIPT[Python Scripts]
end
subgraph "Web Application"
FASTAPI[FastAPI Server]
end
subgraph "API Service"
REST[REST API Endpoints]
BATCH[Batch Processing]
ASYNC[Async Workers]
end
subgraph "Infrastructure"
DOCKER[Docker Container]
GPU[GPU Support<br/>Optional]
STORAGE[Model Cache<br/>2.8GB]
end
end
FASTAPI --> DOCKER
REST --> DOCKER
DOCKER --> GPU
DOCKER --> STORAGE
style FASTAPI fill:#e3f2fd
style DOCKER fill:#2496ed
style GPU fill:#76b900
System Requirements
- Python: 3.10+
- RAM: 8GB minimum, 16GB recommended
- Storage: 5GB (models + data)
- GPU: Optional (CUDA/MPS for faster inference)
- CPU: 4+ cores for parallel execution
Performance Characteristics
Execution Modes
graph LR
subgraph "Sequential Mode"
S1[Metric 1] --> S2[Metric 2]
S2 --> S3[Metric 3]
S3 --> S4[Metric 4]
S4 --> S5[Metric 5]
S5 --> S6[Metric 6]
S6 --> SRESULT[~15-25s]
end
subgraph "Parallel Mode"
P1[Metric 1]
P2[Metric 2]
P3[Metric 3]
P4[Metric 4]
P5[Metric 5]
P6[Metric 6]
P1 --> PRESULT[~8-12s]
P2 --> PRESULT
P3 --> PRESULT
P4 --> PRESULT
P5 --> PRESULT
P6 --> PRESULT
end
style SRESULT fill:#ffebee
style PRESULT fill:#e8f5e9
Metric Execution Times
| Metric | Avg Time | Complexity | Model Size |
|---|---|---|---|
| Structural | 0.5-1s | Low | 0MB |
| Perplexity | 2-4s | Medium | 548MB |
| Entropy | 1-2s | Medium | ~50MB (shared) |
| Semantic | 3-5s | Medium | 80MB |
| Linguistic | 2-3s | Medium | 13MB |
| MPS | 5-10s | High | 878MB (GPT-2 + DistilRoBERTa) |
Total Sequential: ~15-25 seconds
Total Parallel: ~8-12 seconds (limited by slowest metric)
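The two totals follow from the table: sequential time is the sum of the per-metric times, while parallel time cannot drop below the slowest metric. The short calculation below reproduces that arithmetic from the table's ranges. Attributing the gap between the 5-10 s floor and the quoted ~8-12 s to scheduling and shared-resource overhead is an assumption, not a measured result.

```python
# Per-metric (low, high) times in seconds, copied from the table above.
METRIC_SECONDS = {
    "Structural": (0.5, 1.0),
    "Perplexity": (2.0, 4.0),
    "Entropy": (1.0, 2.0),
    "Semantic": (3.0, 5.0),
    "Linguistic": (2.0, 3.0),
    "MPS": (5.0, 10.0),
}

lows = [low for low, _ in METRIC_SECONDS.values()]
highs = [high for _, high in METRIC_SECONDS.values()]

# Sequential: metrics run one after another, so the times add up.
sequential = (sum(lows), sum(highs))        # (13.5, 25.0)

# Parallel: a lower bound set by the slowest metric (MPS).
parallel_floor = (max(lows), max(highs))    # (5.0, 10.0)
```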
Security & Privacy
Data Handling
graph TD
INPUT[Text Input] --> PROCESS[Processing]
PROCESS --> MEMORY[In-Memory Only]
MEMORY --> ANALYSIS[Analysis]
ANALYSIS --> CLEANUP[Auto Cleanup]
MODELS[Model Cache] -.->|Read Only| ANALYSIS
REPORTS[Optional Reports] --> STORAGE[Local Storage Only]
CLEANUP --> DISCARD[Data Discarded]
style INPUT fill:#e3f2fd
style MEMORY fill:#fff3e0
style CLEANUP fill:#e8f5e9
style DISCARD fill:#ffebee
Security Features
- ✅ No External Data Transmission: All processing local
- ✅ No Data Persistence: Text data not stored by default
- ✅ Model Integrity: Checksums for downloaded models
- ✅ Input Validation: Pydantic schemas for all inputs
- ✅ Error Isolation: Graceful degradation, no information leakage
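A model integrity check of the kind listed above can be done by comparing a SHA-256 digest against an expected value. The choice of SHA-256 and the function names are assumptions; the document does not state which checksum algorithm is used.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_of(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(stream: BinaryIO, expected_hex: str) -> bool:
    """Return True when the stream's SHA-256 digest matches the expected hex string."""
    return hmac.compare_digest(sha256_of(stream), expected_hex.lower())


# Usage with an in-memory stream; a real check would open the model file in "rb" mode.
demo_ok = verify_checksum(
    io.BytesIO(b"abc"),
    "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
)
```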
This system does not claim ground truth authorship. It estimates probabilistic authenticity signals based on measurable text properties.