SKT-SURYA-H
SKT AI Labs
Developed in Sidhi, Madhya Pradesh, India
Model Overview
SKT-SURYA-H is an experimental heterogeneous Mixture-of-Experts (MoE) model created using an early-stage Weight Manifold Fusion (WMF) technique.
Important Disclaimer:
This upload is an experimental merge/collection of weights from multiple open-source base models.
It is NOT a unified 2.544-trillion-parameter model trained from scratch; the total parameter count summed across all experts is approximately 2.28T.
We are treating this as a learning project and have updated this card with clearer information after community feedback. Our focus is now on proper joint fine-tuning and on releasing smaller, fully verifiable versions.
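WMF has no public reference implementation, so its details are unknown. As a purely illustrative sketch, the snippet below shows a generic weighted state-dict merge in PyTorch; the function name `weighted_merge` and the 60/40 mix are hypothetical placeholders, not the actual fusion method used here.

```python
# Illustrative only: a generic weighted state-dict merge, a rough guess at
# what one fusion step could resemble. NOT the actual SKT-SURYA-H method.
import torch

def weighted_merge(state_dicts, weights):
    """Average tensors from several compatible checkpoints with per-model weights."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights should sum to 1"
    merged = {}
    for key in state_dicts[0]:
        # Accumulate in FP32 for numerical stability, cast back at the end.
        merged[key] = sum(
            w * sd[key].to(torch.float32) for w, sd in zip(weights, state_dicts)
        ).to(state_dicts[0][key].dtype)
    return merged

# Hypothetical usage with two compatible checkpoints, mixed 60/40:
# fused = weighted_merge([torch.load("a.pt"), torch.load("b.pt")], [0.6, 0.4])
```

Note that a plain merge like this only applies to architecturally compatible checkpoints; fusing heterogeneous experts would require additional alignment steps not shown here.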
Base Models Used & Credits
We sincerely thank the following organizations and teams for their open-source contributions:
Meta Llama 3.1 405B
Link: https://huggingface.co/meta-llama/Llama-3.1-405B
License: Llama 3.1 Community License
DeepSeek-V3
Link: https://huggingface.co/deepseek-ai/DeepSeek-V3
License: MIT License
DeepSeek-R1
Link: https://huggingface.co/deepseek-ai/DeepSeek-R1
License: MIT License
Other supporting models from the DeepSeek and GLM families (used for specific expert clusters)
All base models have been used in accordance with their respective licenses. We deeply appreciate the open-source community for making high-quality models available.
Architecture
- Type: Causal Language Model with Heterogeneous MoE
- Expert Architecture: 5 expert clusters (inspired by the Paanch-Mukhi concept; see the routing sketch after this list)
- Fusion Method: Early Weight Manifold Fusion (WMF), an experimental topological merging approach
- Context Length: Up to 1M tokens (experimental, via YaRN extension)
- Precision: Primarily BF16 with mixed FP8/FP32 in some experts
- Size: ~2.5–3.76 TB (887 safetensors shards)
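No routing code has been published for SKT-SURYA-H. The sketch below assumes a standard top-k softmax gate over five expert clusters; the class name `ClusterRouter`, all dimensions, and the simple MLP experts are our own placeholders, not the model's actual implementation.

```python
# Minimal sketch of top-k routing over 5 heterogeneous expert clusters.
# All names and dimensions are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClusterRouter(nn.Module):
    """Routes each token to the top-k of num_clusters expert clusters."""
    def __init__(self, hidden_size: int = 1024, num_clusters: int = 5, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_clusters, bias=False)
        self.top_k = top_k
        # Heterogeneous in spirit: each cluster could wrap a different
        # architecture; plain MLPs are used here for brevity.
        self.clusters = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size),
                          nn.GELU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_clusters)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden_size)
        scores = self.gate(x)                            # (tokens, num_clusters)
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep top-k clusters per token
        weights = F.softmax(weights, dim=-1)             # renormalize kept scores
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for c in range(len(self.clusters)):
                mask = idx[:, slot] == c                 # tokens routed to cluster c
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.clusters[c](x[mask])
        return out

# Hypothetical usage:
# router = ClusterRouter()
# y = router(torch.randn(8, 1024))
```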
Training & Data
- Primary Data: 16T+ cleaned Hindi/Indic tokens, the 10.26 TB high-entropy Magnum Corpus, and a private Vedic/Sanskrit corpus
- Hardware: Trained/fine-tuned on 16× A100 nodes (part of a larger cluster with 8 EB storage)
- Current Stage: Experimental merging + continued fine-tuning
Benchmark Results
Important Note:
All scores below are from our internal Bharat-Eval Suite (focused on Indic languages, Sanskrit, Vedic knowledge, and domain-specific tasks). These results are experimental and not yet independently verified by the community.
A fully runnable version is under development. We will soon release:
- A smaller quantized version (100B–400B effective scale) for public testing (see the loading sketch after this list)
- Public standard benchmarks (MMLU, GPQA, LiveCodeBench, etc.) with side-by-side comparison against base models
- Evaluation scripts and logs for full transparency
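As a sketch of how such a quantized release might be loaded, assuming a standard Hugging Face transformers + bitsandbytes 4-bit setup: the repo id `skt-ai-labs/SKT-SURYA-H-quantized` below is a placeholder, not a published checkpoint.

```python
# Hypothetical loading example for a future quantized release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the card's BF16 precision
)

model_id = "skt-ai-labs/SKT-SURYA-H-quantized"  # placeholder repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```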
Internal Bharat-Eval Scores (Experimental):
- Sanskrit Comprehension: 94.3
- Hinglish Understanding: 91.4
- Vedic Philosophy (Vedanta): 87.2
- Indian Constitutional Law: 89.7
- (Other domain-specific scores available in evaluation logs)
These numbers reflect performance on our custom Indic + Vedic-heavy data mix. Real-world performance may vary.
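Since Bharat-Eval is not public, here is a minimal sketch of the kind of loop such scores imply, assuming a generic multiple-choice format with a question, a list of choices, and a gold answer index; the task format and metric are assumptions, not the actual suite.

```python
# Generic multiple-choice accuracy loop; illustrative only.
def evaluate_accuracy(model_answer_fn, examples):
    """model_answer_fn(question, choices) -> predicted choice index."""
    correct = sum(
        int(model_answer_fn(ex["question"], ex["choices"]) == ex["answer_idx"])
        for ex in examples
    )
    return 100.0 * correct / len(examples)
```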
Intended Use & Limitations
- Intended Use: Research, Indic language understanding, Vedic/Sanskrit knowledge exploration, and experimentation with advanced merging techniques.
- Limitations: This is an early experimental model and may exhibit inconsistencies due to its heterogeneous experts. It is not recommended for production use yet.
- Ethical Considerations: We are committed to responsible AI. Please use the model ethically and report any concerning behavior.
Next Steps & Roadmap
We are actively working on:
- Proper joint training and improved WMF fusion
- Releasing a smaller, runnable, quantized version for community testing
- Transparent technical report with fusion details and ablation studies
- Stronger public benchmarks and reproducibility
We welcome constructive technical feedback and collaboration from researchers interested in sovereign Indic AI and weight merging techniques.
License
- This collection: CC-BY-2.0 (with additional requirements from base model licenses)
- Users must comply with all base model licenses (Llama 3.1 Community License, MIT, etc.)
Developed with ❤️ in Sidhi, Madhya Pradesh, India