TEXT-AUTH API Documentation
Overview
The TEXT-AUTH API provides evidence-based text forensics and statistical consistency assessment through a RESTful interface. This document covers all endpoints, request/response formats, authentication, rate limiting, and integration examples.
API Version: 1.0.0
Table of Contents
- Authentication & Security
- Rate Limiting
- Common Response Format
- Error Handling
- Core Endpoints
- Report Endpoints
- Utility Endpoints
- Best Practices
Authentication & Security
API Key Authentication
Authentication is not enforced in the current deployment. API key authentication may be added in future versions.
Rate Limiting
Rate limiting is not enforced at the application level. Deployments should use an external gateway (NGINX, API Gateway, Cloudflare) to enforce rate limits if required.
Common Response Format
All successful responses follow this structure:
{
"status": "success",
"analysis_id": "...",
"detection_result": {...},
"highlighted_html": "...",
"reasoning": {...},
"processing_time": 2.34,
"timestamp": "..."
}
HTTP Status Codes
| Code | Meaning | Description |
|---|---|---|
| 200 | OK | Request succeeded |
| 201 | Created | Resource created successfully |
| 400 | Bad Request | Invalid request parameters |
| 404 | Not Found | Resource not found |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | Service temporarily unavailable |
Error Handling
Error Response Format
{
"status": "error",
"error": "Invalid domain...",
"timestamp": "..."
}
Common Error Codes
| Code | Description | Resolution |
|---|---|---|
| `TEXT_TOO_LONG` | Text exceeds maximum length (50,000 chars) | Split into multiple requests |
| `FILE_TOO_LARGE` | File exceeds size limit | Compress or split file |
| `UNSUPPORTED_FORMAT` | File format not supported | Use .txt, .pdf, .docx, .doc, or .md |
| `EXTRACTION_FAILED` | Document text extraction failed | Ensure file is not corrupted or password-protected |
| `MODEL_UNAVAILABLE` | Required model temporarily unavailable | Retry after a few minutes |
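A client can branch on these codes to decide whether a retry makes sense. The sketch below is a minimal, hypothetical example; it assumes the machine-readable code appears in the `error` string, since the error response format above only guarantees `status`, `error`, and `timestamp` fields.

```python
# Minimal sketch of client-side handling for TEXT-AUTH error responses.
# Assumption: the error code (e.g. MODEL_UNAVAILABLE) is present in the
# "error" string; the exact wording of that field is not specified above.

RETRYABLE = {"MODEL_UNAVAILABLE"}

def classify_error(response: dict) -> str:
    """Return 'ok', 'retry', or 'fail' for a parsed JSON response."""
    if response.get("status") == "success":
        return "ok"
    error = response.get("error", "")
    # Retry transient errors; everything else needs a changed request.
    if any(code in error for code in RETRYABLE):
        return "retry"
    return "fail"
```

`TEXT_TOO_LONG`, `FILE_TOO_LARGE`, and `UNSUPPORTED_FORMAT` indicate the request itself must change, so they land in the `fail` branch rather than being retried.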
Core Endpoints
Text Analysis
Endpoint: POST /api/analyze
Analyze raw text for statistical consistency patterns and forensic signals.
Request
Headers:
Content-Type: application/json
Body:
{
"text": "Your text content here...",
"domain": "academic",
"enable_highlighting": true,
"skip_expensive_metrics": false,
"use_sentence_level": true,
"include_metrics_summary": true,
"generate_report": false
}
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `text` | string | Yes | - | Text to analyze (50-50,000 chars) |
| `domain` | string | No | null (auto-detect) | Content domain (see Domains) |
| `enable_highlighting` | boolean | No | true | Generate sentence-level highlights |
| `skip_expensive_metrics` | boolean | No | false | Skip computationally expensive metrics for faster results |
| `use_sentence_level` | boolean | No | true | Use sentence-level granularity for highlighting |
| `include_metrics_summary` | boolean | No | true | Include metric summaries in highlights |
| `generate_report` | boolean | No | false | Generate downloadable PDF/JSON report |
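A minimal Python client for this endpoint, using only the standard library. The base URL is a placeholder for your deployment, and the helper names (`build_analyze_payload`, `analyze`) are illustrative, not part of the API.

```python
import json
import urllib.request

BASE_URL = "https://your-domain.com"  # placeholder; substitute your deployment

def build_analyze_payload(text, domain=None, **options):
    """Assemble the /api/analyze request body, omitting unset fields so
    the server-side defaults from the parameter table apply."""
    payload = {"text": text, **options}
    if domain is not None:
        payload["domain"] = domain
    return payload

def analyze(text, domain=None, **options):
    """POST to /api/analyze and return the parsed JSON response."""
    body = json.dumps(build_analyze_payload(text, domain, **options)).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/api/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Omitting optional fields (rather than sending explicit nulls) lets the server apply the documented defaults, including domain auto-detection.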
Response
{
"status": "success",
"analysis_id": "analysis_1735555800000",
"detection_result": {
"ensemble_result": {
"final_verdict": "Synthetic",
"overall_confidence": 0.89,
"synthetic_probability": 0.92,
"authentic_probability": 0.08,
"uncertainty_score": 0.23,
"decision_boundary_distance": 0.42
},
"metric_results": {
"perplexity": {
"synthetic_probability": 0.94,
"confidence": 0.91,
"raw_score": 15.23,
"evidence_strength": "strong"
},
"entropy": {
"synthetic_probability": 0.88,
"confidence": 0.85,
"raw_score": 4.67,
"evidence_strength": "moderate"
},
"structural": {
"synthetic_probability": 0.91,
"confidence": 0.87,
"burstiness": -0.12,
"uniformity": 0.85,
"evidence_strength": "strong"
},
"linguistic": {
"synthetic_probability": 0.86,
"confidence": 0.82,
"pos_diversity": 0.42,
"mean_tree_depth": 4.2,
"evidence_strength": "moderate"
},
"semantic": {
"synthetic_probability": 0.93,
"confidence": 0.88,
"coherence_mean": 0.91,
"coherence_variance": 0.03,
"evidence_strength": "strong"
},
"multi_perturbation_stability": {
"synthetic_probability": 0.89,
"confidence": 0.84,
"stability_score": 0.12,
"evidence_strength": "moderate"
}
},
"domain_prediction": {
"primary_domain": "academic",
"confidence": 0.94,
"alternative_domains": [
{"domain": "technical_doc", "probability": 0.23},
{"domain": "science", "probability": 0.18}
]
},
"processed_text": {
"word_count": 487,
"sentence_count": 23,
"paragraph_count": 5,
"avg_sentence_length": 21.2,
"language": "en"
}
},
"highlighted_html": "<div class=\"text-forensics-highlight\">...</div>",
"reasoning": {
"summary": "The text exhibits strong statistical consistency patterns typical of language model generation...",
"key_indicators": [
"Unusually uniform sentence structure (burstiness: -0.12)",
"High semantic coherence across all sentences (mean: 0.91)",
"Low perplexity variance indicating predictable token sequences"
],
"confidence_factors": {
"supporting_evidence": [
"6/6 metrics indicate synthetic patterns",
"Strong cross-metric agreement (correlation: 0.87)"
],
"uncertainty_sources": [
"Domain-specific terminology may affect baseline expectations"
]
},
"metric_contributions": {
"perplexity": 0.28,
"entropy": 0.19,
"structural": 0.16,
"semantic": 0.17,
"linguistic": 0.12,
"multi_perturbation_stability": 0.08
}
},
"report_files": null,
"processing_time": 2.34,
"timestamp": "2025-12-30T10:30:00Z"
}
Verdict Interpretation
| Verdict | Probability Range | Interpretation |
|---|---|---|
| Synthetic | > 0.70 | High consistency with language model generation patterns |
| Likely Synthetic | 0.55 - 0.70 | Moderate consistency with synthetic patterns |
| Inconclusive | 0.45 - 0.55 | Insufficient evidence for confident assessment |
| Likely Authentic | 0.30 - 0.45 | Moderate consistency with human authorship patterns |
| Authentic | < 0.30 | High consistency with human authorship patterns |
Important: These verdicts represent statistical consistency assessments, not definitive authorship claims.
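The verdict bands above can be reproduced client-side from `synthetic_probability`. This is a sketch of the table's thresholds; which band owns an exact boundary value (e.g. 0.55) is an assumption here, and the API's own `final_verdict` field should be treated as authoritative.

```python
def verdict_from_probability(p: float) -> str:
    """Map synthetic_probability to the verdict bands in the table above.
    Boundary ownership at exact threshold values is an assumption."""
    if p > 0.70:
        return "Synthetic"
    if p > 0.55:
        return "Likely Synthetic"
    if p >= 0.45:
        return "Inconclusive"
    if p >= 0.30:
        return "Likely Authentic"
    return "Authentic"
```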
Highlighting Color Key
| Color | Meaning | Probability Range |
|---|---|---|
| 🔴 Red | Strong synthetic signals | > 0.80 |
| 🟠 Orange | Moderate synthetic signals | 0.60 - 0.80 |
| 🟡 Yellow | Weak signals | 0.40 - 0.60 |
| 🟢 Green | Authentic signals | < 0.40 |
File Analysis
Endpoint: POST /api/analyze/file
Analyze uploaded documents (PDF, DOCX, DOC, TXT, MD).
Request
Headers:
Content-Type: multipart/form-data
Body (form-data):
file: [binary file data]
domain: "academic"
skip_expensive_metrics: false
use_sentence_level: true
include_metrics_summary: true
generate_report: false
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | file | Yes | - | Document file (max 25MB) |
| `domain` | string | No | null | Content domain override |
| `skip_expensive_metrics` | boolean | No | false | Skip expensive metrics |
| `use_sentence_level` | boolean | No | true | Sentence-level highlighting |
| `include_metrics_summary` | boolean | No | true | Include metric summaries |
| `generate_report` | boolean | No | false | Generate report |
Supported File Formats
| Format | Extensions | Max Size | Notes |
|---|---|---|---|
| Plain Text | .txt, .md | 25MB | UTF-8 encoding recommended |
| PDF | .pdf | 25MB | Text-based PDFs; OCR not supported |
| Word | .docx, .doc | 25MB | Modern and legacy formats |
Response
Same structure as Text Analysis with additional file_info:
{
"status": "success",
"analysis_id": "file_1735555800000",
"file_info": {
"filename": "research_paper.pdf",
"file_type": ".pdf",
"pages": 12,
"extraction_method": "pdfplumber",
"highlighted_html": true
},
"detection_result": { /* same as text analysis */ },
"highlighted_html": "...",
"reasoning": { /* same as text analysis */ },
"processing_time": 4.12,
"timestamp": "2025-12-30T10:30:00Z"
}
cURL Example
curl -X POST https://your-domain.com/api/analyze/file \
-F "file=@/path/to/document.pdf" \
-F "domain=academic" \
-F "generate_report=true"
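The same upload can be made from Python without third-party libraries by building the `multipart/form-data` body by hand. This is a hypothetical helper, not part of the API; `requests` or `httpx` would do the same with less code.

```python
import mimetypes
import uuid

def encode_multipart(fields: dict, file_field: str, filename: str, file_bytes: bytes):
    """Build a multipart/form-data body and Content-Type header value
    for /api/analyze/file, using only the standard library."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    )
    body = "".join(parts).encode() + file_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"
```

POST the returned body to `/api/analyze/file` with the returned value as the `Content-Type` header.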
Batch Analysis
Endpoint: POST /api/analyze/batch
Analyze multiple texts in a single request for efficiency.
Request
{
"texts": [
"First text to analyze...",
"Second text to analyze...",
"Third text to analyze..."
],
"domain": "academic",
"skip_expensive_metrics": true,
"generate_reports": false
}
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `texts` | array[string] | Yes | - | 1-100 texts to analyze |
| `domain` | string | No | null | Apply same domain to all texts |
| `skip_expensive_metrics` | boolean | No | true | Skip expensive metrics (recommended for batch) |
| `generate_reports` | boolean | No | false | Generate reports for each text |
Response
{
"status": "success",
"batch_id": "batch_1735555800000",
"total": 3,
"successful": 3,
"failed": 0,
"results": [
{
"index": 0,
"status": "success",
"detection": {
"ensemble_result": { /* ... */ },
"metric_results": { /* ... */ }
},
"reasoning": { /* ... */ },
"report_files": null
},
{
"index": 1,
"status": "success",
"detection": { /* ... */ }
},
{
"index": 2,
"status": "error",
"error": "Text too short (minimum 50 characters)"
}
],
"processing_time": 8.92,
"timestamp": "2025-12-30T10:30:00Z"
}
Performance Tips
- Set `skip_expensive_metrics: true` for faster batch processing
- Keep batch size under 50 texts for optimal performance
- Consider parallel API calls for batches > 100 texts
- Monitor `processing_time` to adjust batch sizes
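The batch-size tip above can be applied mechanically before calling `/api/analyze/batch`. A trivial sketch (the helper name is illustrative):

```python
def chunk_texts(texts, max_batch=50):
    """Split a list of texts into batches of at most max_batch items,
    per the tip above to keep batch sizes under 50."""
    return [texts[i:i + max_batch] for i in range(0, len(texts), max_batch)]
```

Each resulting chunk becomes the `texts` array of one batch request; chunks can then be submitted in parallel for very large workloads.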
Report Endpoints
Generate Report
Endpoint: POST /api/report/generate
Generate detailed PDF/JSON reports for cached analyses.
Request
Headers:
Content-Type: application/x-www-form-urlencoded
Body:
analysis_id=analysis_1735555800000
formats=json,pdf
include_highlights=true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `analysis_id` | string | Yes | - | Analysis ID from previous request |
| `formats` | string | No | "json,pdf" | Comma-separated formats |
| `include_highlights` | boolean | No | true | Include sentence highlights in report |
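Because this endpoint takes `application/x-www-form-urlencoded` rather than JSON, the body must be form-encoded. A minimal sketch (the helper name is illustrative):

```python
from urllib.parse import urlencode

def build_report_form(analysis_id, formats="json,pdf", include_highlights=True):
    """Encode the /api/report/generate body as
    application/x-www-form-urlencoded, matching the defaults above."""
    return urlencode({
        "analysis_id": analysis_id,
        "formats": formats,
        "include_highlights": "true" if include_highlights else "false",
    })
```

Note that booleans are sent as the strings `true`/`false`, and the comma in `formats` is percent-encoded as `%2C`, which the server decodes transparently.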
Response
{
"status": "success",
"analysis_id": "analysis_1735555800000",
"reports": {
"json": "analysis_1735555800000.json",
"pdf": "analysis_1735555800000.pdf"
},
"timestamp": "2025-12-30T10:30:00Z"
}
Download Report
Endpoint: GET /api/report/download/{filename}
Download a generated report file.
Request
GET /api/report/download/analysis_1735555800000.pdf
Response
Binary file download with appropriate Content-Type header.
Headers:
Content-Type: application/pdf
Content-Disposition: attachment; filename="analysis_1735555800000.pdf"
Content-Length: 524288
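A download can be streamed straight to disk so large PDFs are never held fully in memory. The helper names below are illustrative; `report_name` is one of the filenames returned in the `reports` mapping from `/api/report/generate`.

```python
import shutil
import urllib.request
from pathlib import Path

def report_url(base_url: str, report_name: str) -> str:
    """Build the download URL for a generated report file."""
    return f"{base_url.rstrip('/')}/api/report/download/{report_name}"

def download_report(base_url: str, report_name: str, dest_dir: str = ".") -> Path:
    """Stream a generated report to dest_dir and return its path."""
    dest = Path(dest_dir) / report_name
    with urllib.request.urlopen(report_url(base_url, report_name)) as resp, \
         open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```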
Utility Endpoints
Health Check
Endpoint: GET /health
Check API health and model availability.
Response
{
"status": "healthy",
"version": "1.0.0",
"uptime": 86400.5,
"models_loaded": {
"orchestrator": true,
"highlighter": true,
"reporter": true,
"reasoning_generator": true,
"document_extractor": true,
"analysis_cache": true,
"parallel_executor": true
}
}
List Domains
Endpoint: GET /api/domains
Get all supported content domains with descriptions.
Response
{
"domains": [
{
"value": "general",
"name": "General",
"description": "General-purpose text without domain-specific structure"
},
{
"value": "academic",
"name": "Academic",
"description": "Academic papers, essays, research"
},
{
"value": "creative",
"name": "Creative",
"description": "Creative writing, fiction, poetry"
},
{
"value": "technical_doc",
"name": "Technical Doc",
"description": "Technical documentation, manuals, specs"
}
// ... 12 more domains
]
}
Supported Domains
| Domain | Use Cases | Threshold Adjustments |
|---|---|---|
| `general` | Default fallback | Balanced weights |
| `academic` | Research papers, essays | Higher linguistic weight |
| `creative` | Fiction, poetry | Higher entropy/structural |
| `ai_ml` | ML papers, technical AI content | Semantic prioritized |
| `software_dev` | Code docs, READMEs | Structural relaxed |
| `technical_doc` | Manuals, specs | Higher semantic weight |
| `engineering` | Technical reports | Balanced technical focus |
| `science` | Scientific papers | Academic-like calibration |
| `business` | Reports, proposals | Formal structure emphasis |
| `legal` | Contracts, court filings | Strict structural patterns |
| `medical` | Clinical notes, research | Domain-specific terminology |
| `journalism` | News articles | Balanced, lower burstiness |
| `marketing` | Ad copy, campaigns | Creative elements |
| `social_media` | Posts, casual writing | Relaxed metrics, high perplexity weight |
| `blog_personal` | Personal blogs, diaries | Creative + casual mix |
| `tutorial` | How-to guides | Instructional patterns |
Cache Statistics
Endpoint: GET /api/cache/stats
Get analysis cache statistics (admin only).
Response
{
"cache_size": 42,
"max_size": 100,
"ttl_seconds": 3600
}
Clear Cache
Endpoint: POST /api/cache/clear
Clear analysis cache (admin only).
Response
{
"status": "success",
"message": "Cache cleared"
}
Best Practices
Optimization Tips
Domain Selection
- Always specify domain when known for better accuracy
- Use `/api/domains` to explore available options
- Let the system auto-detect only when the domain is truly unknown
Performance
- Set `skip_expensive_metrics: true` for faster results when speed matters
- Use the batch API for multiple texts instead of sequential single requests
- Cache `analysis_id` to regenerate reports without reanalysis
Accuracy
- Provide clean, well-formatted text (remove excessive whitespace)
- Minimum 100 words recommended for reliable results
- Avoid mixing languages in single analysis
Rate Limiting
Application-level rate limiting is not enforced (see Rate Limiting above), but if your gateway imposes limits:
- Implement exponential backoff on 429 responses
- Monitor any rate-limit headers your gateway provides (e.g., `X-RateLimit-Remaining`)
- Raise limits at the gateway if clients consistently hit them
Error Handling
- Always check the `status` field in the response
- Log the `request_id` for support requests
- Implement retry logic with jitter for transient errors
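The retry-with-jitter advice can be sketched as follows. This is a generic pattern, not part of the TEXT-AUTH client; full jitter (sleeping a random fraction of the exponential cap) avoids synchronized retry storms across clients.

```python
import random
import time

def retry_with_jitter(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a callable on transient failures with exponential backoff
    plus full jitter; re-raise after the final attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random amount up to the exponential cap.
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
```

The injectable `sleep` parameter makes the helper testable without real delays; production code should also restrict the `except` clause to transient error types.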
Security Recommendations
API Key Management
If API key authentication is enabled in a future version (see Authentication & Security above):
- Rotate keys every 90 days
- Use separate keys for dev/staging/production
- Revoke compromised keys immediately
Data Privacy
- Never send PII unless absolutely necessary
- Use client-side redaction before API calls
- Enable data retention policies in dashboard
Input Validation
- Sanitize user input before sending to API
- Validate file types client-side
- Implement size limits before upload
Version History:
- 1.0.0 (2025-12-30): Initial release
- 6 forensic metrics
- 16 domain support
- PDF/JSON reporting
- Batch processing
Appendix
Complete Domain List with Aliases
DOMAIN_ALIASES = {
'general': ['default', 'generic'],
'academic': ['education', 'research', 'scholarly', 'university'],
'creative': ['fiction', 'literature', 'story', 'narrative'],
'ai_ml': ['ai', 'ml', 'machinelearning', 'neural'],
'software_dev': ['software', 'code', 'programming', 'dev'],
'technical_doc': ['technical', 'tech', 'documentation', 'manual'],
'engineering': ['engineer'],
'science': ['scientific'],
'business': ['corporate', 'commercial', 'enterprise'],
'legal': ['law', 'contract', 'court'],
'medical': ['healthcare', 'clinical', 'medicine', 'health'],
'journalism': ['news', 'reporting', 'media', 'press'],
'marketing': ['advertising', 'promotional', 'brand', 'sales'],
'social_media': ['social', 'casual', 'informal', 'posts'],
'blog_personal': ['blog', 'personal', 'diary', 'lifestyle'],
'tutorial': ['guide', 'howto', 'instructional', 'walkthrough']
}
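A hypothetical helper built on the alias table above: map any alias to its canonical domain value, falling back to `general`. The truncated `DOMAIN_ALIASES` below is for illustration; the full table is listed above.

```python
# Illustrative subset of the alias table above; extend with the
# remaining entries for real use.
DOMAIN_ALIASES = {
    'academic': ['education', 'research', 'scholarly', 'university'],
    'creative': ['fiction', 'literature', 'story', 'narrative'],
    'technical_doc': ['technical', 'tech', 'documentation', 'manual'],
}

def resolve_domain(name: str) -> str:
    """Resolve a user-supplied domain name or alias to its canonical
    value; unknown names fall back to 'general'."""
    name = name.strip().lower()
    for canonical, aliases in DOMAIN_ALIASES.items():
        if name == canonical or name in aliases:
            return canonical
    return 'general'
```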
Metric Weight Defaults
DEFAULT_WEIGHTS = {
'perplexity': 0.25,
'entropy': 0.20,
'structural': 0.15,
'semantic': 0.15,
'linguistic': 0.15,
'multi_perturbation_stability': 0.10
}
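One plausible way these weights combine per-metric results into the ensemble's `synthetic_probability` is a weighted average, renormalized over whichever metrics actually ran (e.g. when `skip_expensive_metrics` is set). This is a sketch under that assumption; the production ensemble may additionally apply domain-specific adjustments.

```python
DEFAULT_WEIGHTS = {
    'perplexity': 0.25,
    'entropy': 0.20,
    'structural': 0.15,
    'semantic': 0.15,
    'linguistic': 0.15,
    'multi_perturbation_stability': 0.10,
}

def ensemble_probability(metric_probs: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-metric synthetic probabilities,
    renormalized over the metrics that are present."""
    total = sum(weights[m] for m in metric_probs if m in weights)
    if total == 0:
        raise ValueError("no known metrics supplied")
    return sum(weights[m] * p for m, p in metric_probs.items() if m in weights) / total
```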
Response Time Estimates
| Operation | Min | Avg | Max | P95 |
|---|---|---|---|---|
| Text Analysis (500 words) | 1.2s | 2.3s | 4.5s | 3.8s |
| File Analysis (PDF, 10 pages) | 2.5s | 4.1s | 8.2s | 6.9s |
| Batch (10 texts) | 5.8s | 9.2s | 15.3s | 13.1s |
| Report Generation | 0.3s | 0.8s | 2.1s | 1.5s |
Last Updated: December 30, 2025
API Version: 1.0.0
Documentation Version: 1.0.0