journal-menarik (a collection of interesting papers)
• Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arXiv:2502.11089)
• DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (arXiv:2402.03300)
• DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948)
• DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning (arXiv:2504.07128)
• BioBERT: a pre-trained biomedical language representation model for biomedical text mining (arXiv:1901.08746)
• TableFormer: Table Structure Understanding with Transformers (arXiv:2203.01017)
• Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution (arXiv:2408.10548)
• DTT: An Example-Driven Tabular Transformer for Joinability by Leveraging Large Language Models (arXiv:2303.06748)
• XTab: Cross-table Pretraining for Tabular Transformers (arXiv:2305.06090)
• UniPredict: Large Language Models are Universal Tabular Classifiers (arXiv:2310.03266)
• Language Models are Realistic Tabular Data Generators (arXiv:2210.06280)
• TabNet: Attentive Interpretable Tabular Learning (arXiv:1908.07442)
• Jamba: A Hybrid Transformer-Mamba Language Model (arXiv:2403.19887)
• Jamba-1.5: Hybrid Transformer-Mamba Models at Scale (arXiv:2408.12570)
• Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)