FinQA Overview
FinQA is a financial question-answering agent fine-tuned from Qwen3-4B-Instruct-2507 using reinforcement learning (RL). The model answers questions about SEC 10-K financial statements using specialized tools (SQL queries, table lookup, calculators), achieving 59.70% accuracy on the Snorkel Finance Benchmark and 26.60% on Snorkel Finance Reasoning.
Data
Our training dataset is built from SEC 10-K filings and consists of 5,110 question-answer pairs across:
- 207 companies spanning multiple sectors
- 6,923 financial tables extracted from 10-K filings
- Single-table questions: Direct lookups and calculations from individual tables
- Multi-table questions: Cross-table reasoning requiring data from multiple sources
The dataset is available on HuggingFace.
Tools
The agent uses 4 specialized tools for financial analysis:
| Tool | Description |
|---|---|
| `get_table_names` | List available tables for a given company |
| `get_table_info` | Get table metadata, columns, dtypes, and sample values |
| `sql_query` | Execute SQL queries on financial tables (SQLite) |
| `calculator` | Evaluate mathematical expressions |
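
To make the four tool roles concrete, here is a minimal sketch of what such tools could look like over a SQLite database. It is not the project's implementation: the function signatures, the `income_statement` table and its values, and the choice of one database per company are all assumptions made for illustration.

```python
import ast
import operator
import sqlite3

# Hypothetical in-memory database standing in for one company's 10-K tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE income_statement (year INTEGER, revenue REAL, net_income REAL)"
)
conn.executemany(
    "INSERT INTO income_statement VALUES (?, ?, ?)",
    [(2022, 1000.0, 120.0), (2023, 1150.0, 161.0)],  # made-up figures
)


def get_table_names(conn):
    """List the tables available in the (assumed per-company) database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def get_table_info(conn, table):
    """Return (column, declared type) pairs plus one sample row."""
    if table not in get_table_names(conn):
        raise ValueError(f"unknown table: {table}")
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = [(row[1], row[2]) for row in conn.execute(f'PRAGMA table_info("{table}")')]
    sample = conn.execute(f'SELECT * FROM "{table}" LIMIT 1').fetchone()
    return {"columns": columns, "sample": sample}


def sql_query(conn, query):
    """Run a SQL query and return all rows."""
    return conn.execute(query).fetchall()


# Arithmetic operators the calculator sketch accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def calculator(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


# A typical chain: discover tables, inspect one, query it, then compute.
tables = get_table_names(conn)
info = get_table_info(conn, tables[0])
rows = sql_query(conn, "SELECT year, revenue FROM income_statement ORDER BY year")
growth_pct = calculator("(1150 - 1000) * 100 / 1000")  # revenue growth in percent
```

Walking the syntax tree in `calculator` keeps model-generated expressions limited to arithmetic, which is one reasonable way to avoid executing arbitrary code from a tool call.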
Training
We fine-tune Qwen3-4B-Instruct-2507 using GRPO with LLM-as-judge rewards for correctness evaluation. A more detailed description of the training recipe can be found in our documentation.
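
GRPO (Group Relative Policy Optimization) samples several responses per prompt and scores each one relative to the others in its group, rather than against a learned value function. The sketch below shows only that normalization step. The binary judge rewards, the group size of four, the population standard deviation, and the `eps` term are assumptions for illustration and are not taken from the training recipe.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge verdicts for four sampled answers to one question:
# 1.0 means the judge marked the answer correct, 0.0 means incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers receive positive advantages and incorrect ones negative. When all answers in a group receive the same reward, every advantage is zero, so that group contributes no learning signal.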
Evaluation
| Model | FinQA | FinQA Reasoning |
|---|---|---|
| Qwen3-4B-Instruct-2507 (Base) | 27.90% | 13.90% |
| gpt-5-nano-2025-08-07 | 50.00% | 26.60% |
| Qwen3-235B-A22B | 51.37% | 18.90% |
| rLLM-FinQA-4B (Ours) | 59.70% | 26.60% |
| Gemini-2.5-Pro-Preview | 60.60% | 34.60% |
| GPT-4.1-2025-04-14 | 62.70% | 37.90% |
| o3-mini-2025-01-31 | 63.79% | 30.37% |
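
As a quick reading aid, the improvement over the base model can be computed directly from the table. The snippet below uses only the two rows for Qwen3-4B-Instruct-2507 (Base) and rLLM-FinQA-4B; the helper name `gains` is illustrative.

```python
def gains(base, tuned):
    """Return (absolute gain in percentage points, ratio tuned/base)."""
    return round(tuned - base, 2), round(tuned / base, 2)


# Scores copied from the evaluation table (percent accuracy).
finqa_gain, finqa_ratio = gains(27.90, 59.70)
reasoning_gain, reasoning_ratio = gains(13.90, 26.60)

print(f"FinQA: +{finqa_gain:.2f} points ({finqa_ratio:.2f}x the base score)")
print(f"FinQA Reasoning: +{reasoning_gain:.2f} points ({reasoning_ratio:.2f}x the base score)")
# FinQA: +31.80 points (2.14x the base score)
# FinQA Reasoning: +12.70 points (1.91x the base score)
```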
Serving FinQA
Start a vLLM server:

    python -m vllm.entrypoints.openai.api_server \
        --model rLLM/rLLM-FinQA-4B \
        --host 0.0.0.0 \
        --port 30000 \
        --dtype bfloat16

Then run the agent:

    python -m projects.finqa.run_finqa
For detailed setup instructions, see the project README.
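
To check that the server is reachable before running the agent, a plain chat request against vLLM's OpenAI-compatible endpoint is enough. The sketch below uses only the Python standard library and assumes the server is listening on `localhost:30000`, matching the `--port 30000` flag in the serving command. The helper names, the `max_tokens` value, and the sample question are illustrative. A request like this does not give the model access to the financial tools; that wiring is handled by the agent script.

```python
import json
import urllib.request

BASE_URL = "http://localhost:30000/v1"  # assumes the server runs locally on port 30000
MODEL = "rLLM/rLLM-FinQA-4B"


def build_chat_request(question, base_url=BASE_URL, model=MODEL, max_tokens=512):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=60):
    """Send one question to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(question), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example call (requires the server to be running):
# print(ask("How is gross margin computed from an income statement?"))
```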
Acknowledgement
- This is a joint collaboration between the rLLM team at UC Berkeley and Snorkel AI.
- Our model is trained on top of Qwen3-4B-Instruct-2507.
- Our work is done as part of Berkeley Sky Computing Lab.
Citation
@misc{rllm2026finqa,
title={FinQA: Training Financial Agents with Reinforcement Learning},
author={Manan Roongta and Sijun Tan and Bhavishya Pohani and Charles Dickens and Christopher Glaze},
year={2026},
howpublished={\url{https://rllm-project.com/post.html?post=finqa.md}},
note={Blog Post}
}