SKT-SURYA-H
SKT AI Labs
Developed in Sidhi, Madhya Pradesh, India
Model Overview
SKT-SURYA-H is an experimental heterogeneous Mixture-of-Experts (MoE) model created using an early-stage Weight Manifold Fusion (WMF) technique.
Important Disclaimer:
This upload is an experimental merge/collection of weights from multiple open-source base models.
It is NOT a unified 2.544-trillion-parameter model trained from scratch; the total parameter count summed across all experts is approximately 2.28T.
We are treating this as a learning project and have updated this card with clearer information after community feedback. Our focus is now on proper joint fine-tuning and on releasing smaller, fully verifiable versions.
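WMF has no public reference implementation, so its details are unknown. As a purely illustrative sketch, the snippet below shows a generic weighted state-dict merge in PyTorch; the function name `weighted_merge` and the 60/40 mix are hypothetical placeholders, not the actual fusion method used here.

```python
# Illustrative only: a generic weighted state-dict merge, a rough guess at
# what one fusion step could resemble. NOT the actual SKT-SURYA-H method.
import torch

def weighted_merge(state_dicts, weights):
    """Average tensors from several compatible checkpoints with per-model weights."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for key in state_dicts[0]:
        # Accumulate in FP32 for numerical stability, cast back at the end.
        merged[key] = sum(
            w * sd[key].to(torch.float32) for w, sd in zip(weights, state_dicts)
        ).to(state_dicts[0][key].dtype)
    return merged

# Hypothetical usage with two compatible checkpoints, mixed 60/40:
# fused = weighted_merge([torch.load("a.pt"), torch.load("b.pt")], [0.6, 0.4])
```

Note that a plain merge like this only applies to architecturally compatible checkpoints; fusing heterogeneous experts would require additional alignment steps not shown here.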
Base Models Used & Credits
We sincerely thank the following organizations and teams for their open-source contributions:
Meta Llama 3.1 405B
Link: https://huggingface.co/meta-llama/Llama-3.1-405B
License: Llama 3.1 Community License
DeepSeek-V3
Link: https://huggingface.co/deepseek-ai/DeepSeek-V3
License: MIT License
DeepSeek-R1
Link: https://huggingface.co/deepseek-ai/DeepSeek-R1
License: MIT License
Other supporting models from the DeepSeek and GLM families (used for specific expert clusters)
All base models have been used in accordance with their respective licenses. We deeply appreciate the open-source community for making high-quality models available.
Architecture
- Type: Causal Language Model with Heterogeneous MoE
- Expert Architecture: 5 expert clusters (inspired by the Paanch-Mukhi concept; see the routing sketch after this list)
- Fusion Method: Early Weight Manifold Fusion (WMF), an experimental topological merging approach
- Context Length: Up to 1M tokens (experimental, via YaRN extension)
- Precision: Primarily BF16 with mixed FP8/FP32 in some experts
- Size: ~2.5–3.76 TB (887 safetensors shards)
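No routing code has been published for SKT-SURYA-H. The sketch below assumes a standard top-k softmax gate over five expert clusters; the class name `ClusterRouter`, all dimensions, and the simple MLP experts are our own placeholders, not the model's actual implementation.

```python
# Minimal sketch of top-k routing over 5 heterogeneous expert clusters.
# All names and dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClusterRouter(nn.Module):
    """Routes each token to the top-k of num_clusters expert clusters."""
    def __init__(self, hidden_size: int = 1024, num_clusters: int = 5, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_clusters, bias=False)
        self.top_k = top_k
        # Heterogeneous in spirit: each cluster could wrap a different
        # architecture; plain MLPs are used here for brevity.
        self.clusters = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size),
                          nn.GELU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_clusters)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden_size)
        scores = self.gate(x)                            # (tokens, num_clusters)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k clusters per token
        weights = F.softmax(weights, dim=-1)             # renormalize kept scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for c in range(len(self.clusters)):
                mask = idx[:, slot] == c                 # tokens routed to cluster c
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.clusters[c](x[mask])
        return out

# Hypothetical usage:
# router = ClusterRouter()
# y = router(torch.randn(8, 1024))
```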
Training & Data
- Primary Data: 16T+ cleaned Hindi/Indic tokens, the 10.26 TB high-entropy Magnum Corpus, and a private Vedic/Sanskrit corpus
- Hardware: Trained/fine-tuned on 16× A100 nodes (part of a larger cluster with 8 EB storage)
- Current Stage: Experimental merging + continued fine-tuning
Benchmark Results
Important Note:
All scores below are from our internal Bharat-Eval Suite (focused on Indic languages, Sanskrit, Vedic knowledge, and domain-specific tasks). These results are experimental and not yet independently verified by the community.
A fully runnable version is under development. We will soon release:
- A smaller quantized version (100B–400B effective scale) for public testing (see the loading sketch after this list)
- Public standard benchmarks (MMLU, GPQA, LiveCodeBench, etc.) with side-by-side comparison against base models
- Evaluation scripts and logs for full transparency
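As a sketch of how such a quantized release might be loaded, assuming a standard Hugging Face transformers + bitsandbytes 4-bit setup: the repo id `skt-ai-labs/SKT-SURYA-H-quantized` below is a placeholder, not a published checkpoint.

```python
# Hypothetical loading example for a future quantized release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the card's BF16 precision
)

model_id = "skt-ai-labs/SKT-SURYA-H-quantized"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```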
Internal Bharat-Eval Scores (Experimental):
- Sanskrit Comprehension: 94.3
- Hinglish Understanding: 91.4
- Vedic Philosophy (Vedanta): 87.2
- Indian Constitutional Law: 89.7
- (Other domain-specific scores available in evaluation logs)
These numbers reflect performance on our custom Indic + Vedic-heavy data mix. Real-world performance may vary.
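Since Bharat-Eval is not public, here is a minimal sketch of the kind of loop such scores imply, assuming a generic multiple-choice format with a question, a list of choices, and a gold answer index; the task format and metric are assumptions, not the actual suite.

```python
# Generic multiple-choice accuracy loop; illustrative only.
def evaluate_accuracy(model_answer_fn, examples):
    """model_answer_fn(question, choices) -> predicted choice index."""
    correct = sum(
        int(model_answer_fn(ex["question"], ex["choices"]) == ex["answer_idx"])
        for ex in examples
    )
    return 100.0 * correct / len(examples)
```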
Intended Use & Limitations
- Intended Use: Research, Indic language understanding, Vedic/Sanskrit knowledge exploration, and experimentation with advanced merging techniques.
- Limitations: This is an early experimental model and may exhibit inconsistencies due to its heterogeneous experts. It is not recommended for production use yet.
- Ethical Considerations: We are committed to responsible AI. Please use the model ethically and report any concerning behavior.
Next Steps & Roadmap
We are actively working on:
- Proper joint training and improved WMF fusion
- Releasing a smaller, runnable, quantized version for community testing
- Transparent technical report with fusion details and ablation studies
- Stronger public benchmarks and reproducibility
We welcome constructive technical feedback and collaboration from researchers interested in sovereign Indic AI and weight merging techniques.
License
- This collection: CC-BY-2.0 (with additional requirements from base model licenses)
- Users must comply with all base model licenses (Llama 3.1 Community License, MIT, etc.)
Developed with ❤️ in Sidhi, Madhya Pradesh, India