lca-qwen3-embedding

Domain embedding model for lifecycle assessment (LCA) retrieval. It encodes sentences and short passages into 1024-dimensional L2-normalized embeddings for semantic search, similarity scoring, and clustering.

Background

Generic embedding models work well in open domains, but professional LCA retrieval often involves long, structured records (e.g., geography/technology/time fields) and domain-specific terminology. This model is fine-tuned from Qwen3-Embedding-0.6B to better align embeddings with LCA retrieval queries and documents.

Results (our evaluation setup)

On an internal evaluation derived from TianGong LCA records (converted from the Tidas structured format into retrieval-friendly text), this model improved over the base Qwen3-Embedding-0.6B on both ranking quality and tail coverage:

  • vs base Qwen3-Embedding-0.6B (relative gains): NDCG@10 +31.2%, Recall@10 +25.7%, MRR@10 +33.5%, Recall@100 +11.5%

Evaluation scale (this experiment):

  • Train: 17,037 query-doc pairs
  • Eval: 1,893 queries / 3,786 corpus docs / 1,893 qrels
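
For reference, a minimal sketch of how ranking metrics of this kind can be computed from qrels (query -> relevant doc ids) and per-query rankings. This is an illustration, not the internal evaluation harness; the helper names, the toy qrels/runs data, and the binary-relevance assumption are ours:

import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    # Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc_id in enumerate(ranked_ids[:k])
              if doc_id in relevant_ids)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant_ids))))
    return dcg / ideal if ideal > 0 else 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

# qrels: query id -> set of relevant doc ids; runs: query id -> ranked doc ids
qrels = {"q1": {"d42"}}
runs = {"q1": ["d7", "d42", "d13"]}
print(sum(ndcg_at_k(runs[q], qrels[q]) for q in qrels) / len(qrels))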

Model comparisons

Key metrics (@10):

Model                              NDCG@10  Recall@10  MRR@10  MAP@10
Qwen3-Embedding-0.6B (base)        0.5808   0.7200     0.5367  0.5367
lca-qwen3-embedding (this model)   0.7623   0.9049     0.7163  0.7163
codestral-embed-2505               0.6628   0.8045     0.6180  0.6180
qwen3-embedding-8b                 0.5905   0.7369     0.5442  0.5442
qwen3-embedding-4b                 0.5836   0.7290     0.5377  0.5377
bge-m3                             0.5839   0.7264     0.5388  0.5388

Tail coverage (@100):

Model                              NDCG@100  Recall@100
Qwen3-Embedding-0.6B (base)        0.6171    0.8922
lca-qwen3-embedding (this model)   0.7826    0.9947
codestral-embed-2505               0.6872    0.9171
qwen3-embedding-8b                 0.6258    0.9033
qwen3-embedding-4b                 0.6164    0.8822
bge-m3                             0.6156    0.8743

Protocol note: embeddings are L2-normalized; retrieval uses inner product (equivalent to cosine similarity) with top-100 candidates.
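
Because the embeddings are unit-length, an inner-product index reproduces cosine ranking exactly. Below is a minimal sketch of the top-100 retrieval step using FAISS; FAISS is our choice for illustration (any inner-product index works), and the random arrays stand in for real model.encode(...) output:

import faiss
import numpy as np

dim = 1024  # embedding dimension of this model
index = faiss.IndexFlatIP(dim)  # exact inner-product search

# Stand-ins for encoded documents; FAISS expects float32.
doc_emb = np.random.rand(10_000, dim).astype("float32")
faiss.normalize_L2(doc_emb)  # redundant if embeddings are already unit-length
index.add(doc_emb)

query_emb = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query_emb)
scores, ids = index.search(query_emb, 100)  # top-100 candidates, as in the protocol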

Model details (from the exported config)

  • Backbone: Qwen3 (model_type=qwen3; config architecture Qwen3ForCausalLM), hidden_size=1024, num_hidden_layers=28
  • Max sequence length: 1024
  • Embedding dimension: 1024
  • Pooling: last-token pooling (pooling_mode_lasttoken=true, include_prompt=true)
  • Normalization: L2 normalize
  • Similarity: cosine
  • Prompts: a query prompt is defined; the document prompt is empty

Module stack:

Transformer -> Pooling(last_token, include_prompt=true) -> Normalize
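
For readers who want to see what this stack does, here is a rough equivalent using transformers directly. This is a sketch, not the SentenceTransformers pipeline itself: it skips the query prompt, and the index arithmetic for the last non-padding token is our own.

import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("BIaoo/lca-qwen3-embedding")
enc = AutoModel.from_pretrained("BIaoo/lca-qwen3-embedding")

batch = tok(["wood residue gasification"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = enc(**batch).last_hidden_state  # (batch, seq_len, 1024)

# Last-token pooling: index of the last non-padding token per sequence
# (cumsum + argmax handles both left- and right-padded batches).
last_idx = batch["attention_mask"].cumsum(dim=1).argmax(dim=1)
pooled = hidden[torch.arange(hidden.size(0)), last_idx]

emb = torch.nn.functional.normalize(pooled, p=2, dim=1)  # the Normalize module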

Usage (SentenceTransformers)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")  # replace with your HF repo id if forked/renamed

Retrieval example (encode queries and documents separately; apply the built-in query prompt):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")  # replace with your HF repo id if forked/renamed

queries = ["wood residue gasification heat recovery"]
docs = ["Report describing small-scale biomass CHP units used for district heating."]

q = model.encode(queries, prompt_name="query", normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)
scores = q @ d.T  # cosine similarity (because normalized)
print(scores)

Notes:

  • Use prompt_name="query" to apply the query instruction prefix from config_sentence_transformers.json.
  • The document-side prompt is empty; encoding documents with encode(docs, ...) is typically sufficient.
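
For larger corpora, SentenceTransformers also ships a convenience helper that performs the normalized dot-product ranking for you. A short example (the corpus strings here are illustrative):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")

corpus = [
    "Electricity, medium voltage, production mix, DE, 2020.",
    "Heat from wood chips furnace 300 kW, CH.",
    "Transport, freight, lorry 16-32 metric ton, EURO6.",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True, convert_to_tensor=True)

query_emb = model.encode(["biomass district heating"], prompt_name="query",
                         normalize_embeddings=True, convert_to_tensor=True)

hits = util.semantic_search(query_emb, corpus_emb, top_k=3)[0]
for hit in hits:
    print(round(hit["score"], 3), corpus[hit["corpus_id"]])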

Intended use

  • Semantic search and reranking for LCA process/flow descriptions and metadata-rich technical text
  • Similarity scoring for deduplication / clustering of LCA-related passages (see the sketch below)
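
As a sketch of the deduplication use case: pairwise cosine scores above a threshold can flag near-duplicate passages. The passages and the 0.9 threshold are arbitrary illustrations; tune the threshold on your own data.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")

passages = [
    "Heat production from wood chips, 300 kW furnace, Switzerland.",
    "Heat from a 300 kW wood-chip furnace located in Switzerland.",
    "Freight transport by rail, electricity-powered, Europe.",
]
emb = model.encode(passages, normalize_embeddings=True)
sims = emb @ emb.T  # cosine similarity matrix (embeddings are unit-length)

THRESHOLD = 0.9  # arbitrary; tune per corpus
for i in range(len(passages)):
    for j in range(i + 1, len(passages)):
        if sims[i, j] >= THRESHOLD:
            print(f"possible duplicate ({sims[i, j]:.3f}): {passages[i]!r} ~ {passages[j]!r}")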

Limitations

  • Trained and evaluated primarily on English technical/LCA text; performance may degrade in other languages or domains.
  • Evaluation numbers are from a specific internal setup; validate on your own data before production use.

Files

  • config.json: Qwen3 model config
  • config_sentence_transformers.json, modules.json, sentence_bert_config.json: SentenceTransformers configs (prompts, modules, max length)
  • model.safetensors: weights
  • tokenizer.*, vocab.json, merges.txt: tokenizer assets
  • 1_Pooling/, 2_Normalize/: pooling / normalization modules