Beynele

Beynele is a text-to-image model based on Lumina-Image 2.0 and adapted for Kazakh cultural image generation. It is trained with a data-centric pipeline that combines curated cultural data, synthetic supervision, revive-before-reject curation, and a base-model anchor dataset, and it is evaluated against reference images with Beynele-Bench.

[Figure: Beynele qualitative comparison]

Use With Diffusers

```python
import torch
from diffusers import Lumina2Pipeline

# Load the full pipeline in bfloat16, the recommended dtype.
pipe = Lumina2Pipeline.from_pretrained(
    "issai/Beynele",
    torch_dtype=torch.bfloat16,
)
# Offload idle components to the CPU to reduce peak GPU memory use.
pipe.enable_model_cpu_offload()

prompt = "A Kazakh dombra resting on a patterned felt carpet."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=40,
    cfg_trunc_ratio=0.25,
    cfg_normalization=True,
    # Fixed seed so the same prompt reproduces the same image.
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]
image.save("beynele_dombra.png")
```

Model Details

| Field | Value |
| --- | --- |
| Architecture | Lumina-Image 2.0 / flow-based diffusion transformer |
| Base pipeline | Alpha-VLLM/Lumina-Image-2.0 |
| Text encoder | google/gemma-2-2b |
| Diffusers class | Lumina2Pipeline |
| Resolution | 1024 x 1024 |
| Recommended dtype | torch.bfloat16 |
| Recommended steps | 40 |
| Recommended guidance | 4.0 |

Only the diffusion transformer is adapted. The tokenizer, text encoder, scheduler, and VAE are carried over from the Lumina-Image 2.0 Diffusers release to provide a direct from_pretrained loading path.
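
The recommended settings from the table and the usage snippet above can be kept in one place so they are not retyped on every call. The following is a minimal sketch, not part of the release: `BeyneleSettings` and `to_call_kwargs` are hypothetical names introduced here for illustration, and only the default values come from this card.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BeyneleSettings:
    """Hypothetical container for the sampling settings listed in this card."""

    height: int = 1024
    width: int = 1024
    guidance_scale: float = 4.0
    num_inference_steps: int = 40
    cfg_trunc_ratio: float = 0.25
    cfg_normalization: bool = True

    def to_call_kwargs(self, **overrides):
        """Return keyword arguments for a pipeline call, with optional overrides."""
        kwargs = asdict(self)
        unknown = set(overrides) - set(kwargs)
        if unknown:
            # Reject misspelled names instead of silently passing them on.
            raise TypeError(f"unknown setting(s): {sorted(unknown)}")
        kwargs.update(overrides)
        return kwargs


settings = BeyneleSettings()
# Example: trade some quality for speed by lowering the step count.
call_kwargs = settings.to_call_kwargs(num_inference_steps=30)
print(call_kwargs)
```

With the pipeline from the usage section loaded as `pipe`, the dictionary would be passed as `pipe(prompt, **call_kwargs)`, adding a `generator` argument when a fixed seed is wanted.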

Training Data Summary

The final training pool contains three branches:

| Branch | Examples |
| --- | --- |
| Core cultural dataset | 196k image-text pairs, about 73k unique images |
| Text-image dataset | 128k examples |
| Base-model anchor dataset | 109k examples |

The cultural dataset covers Kazakh people, material culture, buildings, landmarks, food, national symbols, natural scenes, activities, and text-bearing images. The full fine-tuning corpus is not released because of privacy, licensing, and cultural-data governance constraints.
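
As a rough reading of the branch table above, the sketch below works out each branch's share of the pool and the average number of captions per unique cultural image. It assumes the three branches are disjoint and that the rounded counts in the table are close enough for this arithmetic; the card does not state how the branches are weighted during training, so the shares describe pool size only.

```python
# Branch sizes in thousands of examples, copied from the table in this card.
branches_k = {
    "Core cultural dataset": 196,
    "Text-image dataset": 128,
    "Base-model anchor dataset": 109,
}

# Assumption: the branches do not overlap, so the pool size is their sum.
total_k = sum(branches_k.values())
shares = {name: count / total_k for name, count in branches_k.items()}

# About 73k unique images back the 196k cultural image-text pairs.
pairs_per_unique_image = branches_k["Core cultural dataset"] / 73

print(f"Total pool: about {total_k}k examples")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the pool")
print(f"Cultural pairs per unique image: about {pairs_per_unique_image:.2f}")
```

Under these assumptions the cultural branch makes up a little under half of the pool, and each unique cultural image appears with between two and three captions on average.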

Evaluation

| Model | Beynele-Bench | GenEval | WISE | UniGenBench++ |
| --- | --- | --- | --- | --- |
| Lumina-Image 2.0 | 4.85 | 0.73 | 0.54 | 64.98 |
| Qwen-Image | 6.51 | 0.87 | 0.62 | 78.36 |
| Beynele | 7.29 | 0.74 | 0.51 | 65.53 |
| Beynele + prompt mediation | 7.01 | 0.78 | 0.73 | 68.89 |

Beynele-Bench uses 750 prompt-reference pairs and reports the arithmetic mean of Qwen3-VL 32B and Gemini 2.5 Pro similarity scores on a 1-10 scale.
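
The averaging step described above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's actual code: the function name `beynele_bench_score` is hypothetical, the toy scores are made up, and the card does not say whether the two judges are averaged per pair or per benchmark. When every pair has both scores, the two orders give the same result.

```python
from statistics import fmean


def beynele_bench_score(judge_a_scores, judge_b_scores):
    """Average two judges' 1-10 similarity scores over all prompt-reference pairs."""
    if len(judge_a_scores) != len(judge_b_scores):
        raise ValueError("both judges must score the same pairs")
    if not judge_a_scores:
        raise ValueError("at least one scored pair is required")
    for score in (*judge_a_scores, *judge_b_scores):
        if not 1 <= score <= 10:
            raise ValueError(f"score {score} is outside the 1-10 scale")
    # Arithmetic mean of the two judges for each pair, then mean over pairs.
    per_pair = [(a + b) / 2 for a, b in zip(judge_a_scores, judge_b_scores)]
    return fmean(per_pair)


# Made-up scores for three pairs; the real benchmark uses 750 pairs.
qwen_scores = [7, 8, 6]
gemini_scores = [8, 6, 7]
score = beynele_bench_score(qwen_scores, gemini_scores)
print(f"Toy Beynele-Bench score: {score:.2f}")
```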

Intended Use

Beynele is intended for research on cultural text-to-image generation, low-resource visual adaptation, Kazakh cultural representation, benchmarked T2I evaluation, and data-centric model adaptation.

Limitations and Safety

The model may hallucinate cultural details, produce imperfect Kazakh text, blur faces under difficult compositions, or shift prompt details under strong cultural specialization. It should not be used for identity verification, historical authentication, or high-stakes cultural documentation. Human review and local cultural expertise remain important for sensitive uses.

Provenance

The Hub package contains the converted Diffusers transformer weights, stored as safetensors under transformer/, which Lumina2Pipeline.from_pretrained loads. The source EMA checkpoint is retained in the local release backup and cache for internal traceability.

Licensing

Beynele is released under the Apache License 2.0. The model is adapted from Alpha-VLLM/Lumina-Image-2.0; users should also follow the licenses and terms of any bundled or upstream components used by the Diffusers pipeline.

Citation

@article{aikyn2026beynele,
  title = {A Data-Centric Framework for Adapting Text-to-Image Models to Low-Resource Cultural Domains},
  author = {Aikyn, Nartay and Aryngazin, Anuar and Maxutov, Akylbek and Varol, Huseyin Atakan},
  year = {2026},
  note = {Pre-release manuscript}
}