Instructions to use Lamapi/next-4b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Lamapi/next-4b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Lamapi/next-4b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load the model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Lamapi/next-4b")
model = AutoModelForImageTextToText.from_pretrained("Lamapi/next-4b")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- llama-cpp-python
How to use Lamapi/next-4b with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Lamapi/next-4b",
    filename="next-4b-f16.gguf",
)
```
```python
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ]
)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Lamapi/next-4b with llama.cpp:
Install with Homebrew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Lamapi/next-4b:F16

# Run inference directly in the terminal:
llama-cli -hf Lamapi/next-4b:F16
```
Install with WinGet (Windows)
```bash
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Lamapi/next-4b:F16

# Run inference directly in the terminal:
llama-cli -hf Lamapi/next-4b:F16
```
Use a pre-built binary
```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Lamapi/next-4b:F16

# Run inference directly in the terminal:
./llama-cli -hf Lamapi/next-4b:F16
```
Build from source
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Lamapi/next-4b:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Lamapi/next-4b:F16
```
Use Docker
```bash
docker model run hf.co/Lamapi/next-4b:F16
```
- LM Studio
- Jan
- vLLM
How to use Lamapi/next-4b with vLLM:
Install from pip and serve the model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Lamapi/next-4b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Lamapi/next-4b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker
```bash
docker model run hf.co/Lamapi/next-4b:F16
```
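Since the server exposes an OpenAI-compatible API, you can also call it from Python with the official `openai` client. This is a minimal sketch: the `base_url`, port, and placeholder API key assume the default local `vllm serve` setup above, and the same pattern works against the other OpenAI-compatible servers on this page (llama-server, SGLang) by changing the port.

```python
# pip install openai
from openai import OpenAI

# Point the client at the local vLLM server started above; the API key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Lamapi/next-4b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```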
- SGLang
How to use Lamapi/next-4b with SGLang:
Install from pip and serve the model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Lamapi/next-4b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Lamapi/next-4b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images
```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Lamapi/next-4b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Lamapi/next-4b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Ollama
How to use Lamapi/next-4b with Ollama:
```bash
ollama run hf.co/Lamapi/next-4b:F16
```
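Ollama also ships an official Python client, so the same model can be driven from code once it has been pulled. A minimal sketch follows; the `hf.co/...` model tag assumes the `ollama run` pull above succeeded.

```python
# pip install ollama
import ollama

# Chat with the GGUF model pulled from Hugging Face above.
response = ollama.chat(
    model="hf.co/Lamapi/next-4b:F16",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response["message"]["content"])
```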
- Unsloth Studio
How to use Lamapi/next-4b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Lamapi/next-4b to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Lamapi/next-4b to start chatting
```
Use Hugging Face Spaces for Unsloth
```bash
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Lamapi/next-4b to start chatting
```
- Docker Model Runner
How to use Lamapi/next-4b with Docker Model Runner:
```bash
docker model run hf.co/Lamapi/next-4b:F16
```
- Lemonade
How to use Lamapi/next-4b with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Lamapi/next-4b:F16
```
Run and chat with the model
```bash
lemonade run user.next-4b-F16
```
List all available models
```bash
lemonade list
```
🚀 Next 4B (s330)
Türkiye’s First Vision-Language Model — Efficient, Multimodal, and Reasoning-Focused
📖 Overview
Next 4B is a 4-billion parameter multimodal Vision-Language Model (VLM) based on Gemma 3, fine-tuned to handle both text and images efficiently. It is Türkiye’s first open-source vision-language model, designed for:
- Understanding and generating text and image descriptions.
- Efficient reasoning and context-aware multimodal outputs.
- Turkish support with multilingual capabilities.
- Low-resource deployment using 8-bit quantization for consumer-grade GPUs (see the loading sketch below).
This model is ideal for researchers, developers, and organizations who need a high-performance multimodal AI capable of visual understanding, reasoning, and creative generation.
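To make the low-resource point concrete, here is a minimal sketch of loading the model with 8-bit weights via bitsandbytes. The `BitsAndBytesConfig` API is standard Transformers, but treat the memory savings and the `device_map` choice as assumptions that depend on your hardware.

```python
# pip install transformers accelerate bitsandbytes
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

model_id = "Lamapi/next-4b"

# Load weights in 8-bit to roughly halve VRAM versus F16 (exact savings vary by setup).
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place layers on the available GPU(s)
)
processor = AutoProcessor.from_pretrained(model_id)
```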
Our Next 1B and Next 4B models outperform all comparable tiny models on these benchmarks.
| Model | MMLU (5-shot) % | MMLU-Pro % | GSM8K % | MATH % |
|---|---|---|---|---|
| Next 4B preview | 84.6 | 66.9 | 82.7 | 70.5 |
| Next 1B | 87.3 | 69.2 | 90.5 | 70.1 |
| Qwen 3 0.6B | 52.81 | 37.6 | 60.7 | 20.5 |
| Llama 3.2 1B | 49.3 | 44.4 | 11.9 | 30.6 |
Our Next 14B model also leads state-of-the-art models on some of these benchmarks.
| Model | MMLU (5-shot) % | MMLU-Pro % | GSM8K % | MATH % |
|---|---|---|---|---|
| Next 14B (Thinking) | 94.6 | 93.2 | 98.8 | 92.7 |
| Next 12B | 92.7 | 84.4 | 95.3 | 87.2 |
| GPT-5 | 92.5 | 87.0 | 98.4 | 96.0 |
| Claude Opus 4.1 (Thinking) | ~92.0 | 87.8 | 84.7 | 95.4 |
🚀 Installation & Usage
Use with vision:
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model_id = "Lamapi/next-4b"

# Use the image-text-to-text class so the vision encoder is loaded as well;
# AutoModelForCausalLM would load the text-only model and reject image inputs.
model = AutoModelForImageTextToText.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)  # Handles both text and images.

# Read the image
image = Image.open("image.jpg")

# Create a message in chat format
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."}]},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Who is in this image?"},
        ],
    },
]

# Prepare the inputs with the processor's chat template
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate and decode the output
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```
Use without vision:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Lamapi/next-4b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat messages
messages = [
    {"role": "system", "content": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."},
    {"role": "user", "content": "Hello, how are you?"},
]

# Prepare the input with the tokenizer's chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Generate and decode the output
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
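Both snippets above use greedy decoding. For more varied output you can enable sampling with standard `generate` parameters; the values below are illustrative assumptions, not tuned recommendations for this model.

```python
# Sampling instead of greedy decoding; values are illustrative, not tuned for Next 4B.
output = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,    # sample from the distribution rather than taking the argmax
    temperature=0.7,   # lower = more deterministic, higher = more diverse
    top_p=0.9,         # nucleus sampling cutoff
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```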
🎯 Goals
- Multimodal Intelligence: Understand and reason over images and text.
- Efficiency: Run on modest GPUs using 8-bit quantization.
- Accessibility: Open-source availability for research and applications.
- Cultural Relevance: Optimized for Turkish language and context while remaining multilingual.
✨ Key Features
| Feature | Description |
|---|---|
| 🔋 Efficient Architecture | Optimized for low VRAM; supports 8-bit quantization for consumer GPUs. |
| 🖼️ Vision-Language Capable | Understands images, captions them, and performs visual reasoning tasks. |
| 🇹🇷 Multilingual & Turkish-Ready | Handles complex Turkish text with high accuracy. |
| 🧠 Advanced Reasoning | Supports logical and analytical reasoning for both text and images. |
| 📊 Consistent & Reliable Outputs | Reproducible responses across multiple runs. |
| 🌍 Open Source | Transparent, community-driven, and research-friendly. |
📐 Model Specifications
| Specification | Details |
|---|---|
| Base Model | Gemma 3 |
| Parameter Count | 4 Billion |
| Architecture | Transformer, causal LLM + Vision Encoder |
| Fine-Tuning Method | Instruction & multimodal fine-tuning (SFT) on Turkish and multilingual datasets |
| Optimizations | Q8_0, F16, and F32 quantization variants for both low- and high-VRAM setups (see the selection sketch below) |
| Modalities | Text & Image |
| Use Cases | Image captioning, multimodal QA, text generation, reasoning, creative storytelling |
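If the repository ships the Q8_0 GGUF variant listed above alongside the F16 file used earlier, you can select it by filename with llama-cpp-python. The exact filename here is an assumption modeled on the F16 file's naming; check the repo's file list before relying on it.

```python
from llama_cpp import Llama

# Filename is an assumption based on the "next-4b-f16.gguf" naming pattern.
llm = Llama.from_pretrained(
    repo_id="Lamapi/next-4b",
    filename="next-4b-q8_0.gguf",
)
print(llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in one sentence."}]
))
```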
📄 License
This project is licensed under the MIT License — free to use, modify, and distribute. Attribution is appreciated.
📞 Contact & Support
- 📧 Email: lamapicontact@gmail.com
- 🤗 Hugging Face: Lamapi
Next 4B — Türkiye’s first vision-language AI, combining multimodal understanding, reasoning, and efficiency.