# lca-qwen3-embedding
Domain embedding model for lifecycle assessment (LCA) retrieval. It encodes sentences and short passages into 1024-dimensional L2-normalized embeddings for semantic search, similarity scoring, and clustering.
## Background
Generic embedding models work well in open domains, but professional LCA retrieval often involves long, structured records (e.g., geography/technology/time fields) and domain-specific terminology. This model is trained to better align embeddings with LCA retrieval queries and documents.
## Results (our evaluation setup)
On an internal evaluation derived from TianGong LCA records (converted from the Tidas structured format into retrieval-friendly text), this model improved over the base Qwen3-Embedding-0.6B on both ranking quality and tail coverage:
- vs base Qwen3-Embedding-0.6B: NDCG@10 +31.2%, Recall@10 +25.7%, MRR@10 +33.5%, Recall@100 +11.5% (see the metric sketch below)
Evaluation scale (this experiment):
- Train: 17,037 query-doc pairs
- Eval: 1,893 queries / 3,786 corpus docs / 1,893 qrels
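For reference, a minimal sketch of how the two headline metrics can be computed per query, assuming binary relevance (illustrative only, not the exact evaluation harness used above):

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    # Binary-gain DCG over the top-k ranking, normalized by the ideal DCG.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]) if doc in relevant_ids)
    idcg = sum(1.0 / math.log2(i + 2)
               for i in range(min(len(relevant_ids), k)))
    return dcg / idcg if idcg else 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    # Fraction of the relevant documents found in the top-k ranking.
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)
```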
## Model comparisons
Key metrics (@10):
| Model | NDCG@10 | Recall@10 | MRR@10 | MAP@10 |
|---|---|---|---|---|
| Qwen3-Embedding-0.6B (base) | 0.5808 | 0.7200 | 0.5367 | 0.5367 |
| lca-qwen3-embedding (this model) | 0.7623 | 0.9049 | 0.7163 | 0.7163 |
| codestral-embed-2505 | 0.6628 | 0.8045 | 0.6180 | 0.6180 |
| qwen3-embedding-8b | 0.5905 | 0.7369 | 0.5442 | 0.5442 |
| qwen3-embedding-4b | 0.5836 | 0.7290 | 0.5377 | 0.5377 |
| bge-m3 | 0.5839 | 0.7264 | 0.5388 | 0.5388 |
Tail coverage (@100):
| Model | NDCG@100 | Recall@100 |
|---|---|---|
| Qwen3-Embedding-0.6B (base) | 0.6171 | 0.8922 |
| lca-qwen3-embedding (this model) | 0.7826 | 0.9947 |
| codestral-embed-2505 | 0.6872 | 0.9171 |
| qwen3-embedding-8b | 0.6258 | 0.9033 |
| qwen3-embedding-4b | 0.6164 | 0.8822 |
| bge-m3 | 0.6156 | 0.8743 |
Protocol note: embeddings are L2-normalized; retrieval uses inner product (equivalent to cosine similarity) with top-100 candidates.
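A minimal sketch of this protocol with SentenceTransformers (the corpus and query strings below are placeholders):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")

corpus = [
    "Electricity, high voltage, production mix | DE | 2020",
    "Heat, district or industrial, from biomass CHP | SE | 2018",
]  # placeholder documents
doc_emb = model.encode(corpus, normalize_embeddings=True)  # shape: (n_docs, 1024)

query_emb = model.encode(["biomass district heating"],
                         prompt_name="query", normalize_embeddings=True)

# Inner product on unit-norm vectors equals cosine similarity.
scores = (query_emb @ doc_emb.T)[0]
top_k = np.argsort(-scores)[: min(100, len(corpus))]  # top-100 candidates
for rank, idx in enumerate(top_k, start=1):
    print(f"{rank:3d}  {scores[idx]:.4f}  {corpus[idx]}")
```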
## Model details (from the exported config)

- Backbone: Qwen3 (`model_type=qwen3`; config architecture `Qwen3ForCausalLM`), `hidden_size=1024`, `num_hidden_layers=28`
- Max sequence length: 1024
- Embedding dimension: 1024
- Pooling: last-token pooling (`pooling_mode_lasttoken=true`, `include_prompt=true`)
- Normalization: L2 normalize
- Similarity: cosine
- Prompts: a `query` prompt is defined; the `document` prompt is empty

Module stack: `Transformer -> Pooling(last_token, include_prompt=true) -> Normalize`
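These values can be sanity-checked without loading the weights, e.g. via the `transformers` config loader (repo id as in the usage section below):

```python
from transformers import AutoConfig

# Load only the exported config to confirm the backbone values listed above.
cfg = AutoConfig.from_pretrained("BIaoo/lca-qwen3-embedding")
print(cfg.model_type)         # expected: qwen3
print(cfg.hidden_size)        # expected: 1024
print(cfg.num_hidden_layers)  # expected: 28
```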
## Usage (SentenceTransformers)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")  # replace with your HF repo id if forked/renamed
```
Retrieval example (encode queries and documents separately; apply the built-in query prompt):
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")  # replace with your HF repo id if forked/renamed

queries = ["wood residue gasification heat recovery"]
docs = ["Report describing small-scale biomass CHP units used for district heating."]

q = model.encode(queries, prompt_name="query", normalize_embeddings=True)
d = model.encode(docs, normalize_embeddings=True)

scores = q @ d.T  # inner product == cosine similarity, because embeddings are normalized
print(scores)
```
Notes:
- Use `prompt_name="query"` to apply the query instruction prefix from `config_sentence_transformers.json`.
- The document-side prompt is empty; encoding documents with a plain `encode(docs, ...)` call is sufficient.
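To inspect the exact prompt strings without opening `config_sentence_transformers.json`, the loaded model exposes its configured prompts (`prompts` is a standard SentenceTransformers attribute):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")
# Maps prompt names to instruction prefixes; the "document" entry should be empty.
print(model.prompts)
```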
## Intended use
- Semantic search and reranking for LCA process/flow descriptions and metadata-rich technical text
- Similarity scoring for deduplication / clustering of LCA-related passages (see the sketch below)
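For the deduplication use case, a minimal sketch that groups near-duplicate passages by a cosine-similarity threshold (the passages and the 0.9 threshold are assumptions; tune the threshold on your corpus):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BIaoo/lca-qwen3-embedding")

passages = [
    "Electricity production, hard coal, at power plant, DE",
    "Hard coal electricity production at a power plant in Germany",
    "Transport, freight, lorry 16-32 metric ton, EURO5",
]  # placeholder passages
emb = model.encode(passages, normalize_embeddings=True)
sim = emb @ emb.T  # cosine-similarity matrix (embeddings are unit-norm)

# Greedy grouping: each unassigned passage seeds a group of its near-duplicates.
threshold = 0.9  # assumed value; tune per corpus
groups, assigned = [], set()
for i in range(len(passages)):
    if i in assigned:
        continue
    group = [j for j in range(i, len(passages))
             if j not in assigned and sim[i, j] >= threshold]
    assigned.update(group)
    groups.append(group)
print(groups)  # indices of passages grouped as near-duplicates
```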
## Limitations
- Trained and evaluated primarily on English technical/LCA text; performance may degrade in other languages or domains.
- Evaluation numbers are from a specific internal setup; validate on your own data before production use.
## Files

- `config.json`: Qwen3 model config
- `config_sentence_transformers.json`, `modules.json`, `sentence_bert_config.json`: SentenceTransformers configs (prompts, modules, max length)
- `model.safetensors`: weights
- `tokenizer.*`, `vocab.json`, `merges.txt`: tokenizer assets
- `1_Pooling/`, `2_Normalize/`: pooling / normalization modules