# GPT4All Service - Project Context

## Project Overview
This is a **Polish Car Description Enhancement Service**: a FastAPI microservice that uses a Hugging Face large language model to generate enhanced marketing descriptions for cars in Polish.

## Core Functionality
The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using `speakleash/Bielik-1.5B-v3.0-Instruct`, a Polish instruction-tuned model from the Bielik series.

## Project Structure

```
gpt4all-service/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ main.py                    # FastAPI application with endpoints
β”‚   β”œβ”€β”€ models/
β”‚   β”‚   └── huggingface_service.py # Core LLM service wrapper
β”‚   └── schemas/
β”‚       └── schemas.py             # Pydantic data models
β”œβ”€β”€ Dockerfile                    # Multi-stage Docker build
β”œβ”€β”€ download_model.py             # Model download script for Docker
β”œβ”€β”€ requirements.txt              # Python dependencies
β”œβ”€β”€ start_container.ps1           # PowerShell startup script
β”œβ”€β”€ start_container.sh            # Bash startup script
└── README.md                     # Comprehensive documentation
```

## Technical Architecture

### 1. FastAPI Application (`app/main.py`)
- **Framework**: FastAPI with CORS middleware
- **Main Endpoint**: `POST /enhance-description` - takes car data, returns enhanced description
- **Health Check**: `GET /health` - service status and model initialization check
- **CORS**: Configured for a frontend on `http://localhost:5173`, Vite's default dev-server port (suggesting a React or Vue frontend)
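
The source of `app/main.py` is not reproduced in this document, but the description above suggests wiring along these lines. This is a sketch only: the class and member names `HuggingFaceService`, `is_initialized`, and `generate_description` are assumptions, not the actual source.

```python
# Hypothetical sketch of app/main.py; service class and member names are
# assumptions based on the description, not the actual source.
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

from app.models.huggingface_service import HuggingFaceService  # assumed name
from app.schemas.schemas import CarData, EnhancedDescriptionResponse

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # frontend dev server
    allow_methods=["*"],
    allow_headers=["*"],
)

llm_service = HuggingFaceService(model_path="/app/pretrain_model", device="cpu")

@app.on_event("startup")
async def startup() -> None:
    # Load the model asynchronously when the server starts.
    await llm_service.initialize()

@app.get("/health")
async def health():
    # Service status plus whether the model finished initializing.
    return {"status": "ok", "model_initialized": llm_service.is_initialized}

@app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
async def enhance_description(car: CarData):
    if not llm_service.is_initialized:
        raise HTTPException(status_code=503, detail="Model not initialized")
    description = await llm_service.generate_description(car)
    return EnhancedDescriptionResponse(description=description)
```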

### 2. LLM Service (`app/models/huggingface_service.py`)
- **Purpose**: Wrapper around Hugging Face Transformers pipeline
- **Model**: `speakleash/Bielik-1.5B-v3.0-Instruct` (Polish language model)
- **Features**:
  - Async initialization and text generation
  - Support for both GPU (CUDA) and CPU inference
  - Chat template support for conversation-style prompts
  - Configurable generation parameters (temperature, top_p, max_tokens)
  - Smart response parsing to extract only the assistant's response
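
A condensed sketch of such a wrapper follows; the method names mirror the feature list above but are assumptions, not the actual `huggingface_service.py`:

```python
# Hypothetical condensed version of app/models/huggingface_service.py.
import asyncio
from typing import Optional

import torch
from transformers import AutoTokenizer, pipeline

class HuggingFaceService:
    def __init__(self, model_path: str, device: Optional[str] = None):
        self.model_path = model_path
        # Honor an explicit device; otherwise prefer CUDA when available.
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = None
        self.pipeline = None

    @property
    def is_initialized(self) -> bool:
        return self.pipeline is not None

    async def initialize(self) -> None:
        # Model loading blocks for a long time, so keep it off the event loop.
        def _load() -> None:
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_path)
            self.pipeline = pipeline(
                "text-generation",
                model=self.model_path,
                tokenizer=self.tokenizer,
                device=self.device,
            )

        await asyncio.to_thread(_load)
```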

### 3. Data Models (`app/schemas/schemas.py`)
- **CarData**: Input model with make, model, year, mileage, features[], condition
- **EnhancedDescriptionResponse**: Output model with generated description
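
The field list above implies Pydantic models roughly like the following; field names and types are inferred from the description, not copied from `schemas.py`:

```python
# Plausible shape of app/schemas/schemas.py (inferred, not the actual source).
from typing import List

from pydantic import BaseModel

class CarData(BaseModel):
    make: str
    model: str
    year: int
    mileage: int
    features: List[str]
    condition: str

class EnhancedDescriptionResponse(BaseModel):
    description: str
```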

### 4. Containerization
- **Docker**: Self-contained image with pre-downloaded model (~3.2GB)
- **Security**: Uses Docker BuildKit secrets for Hugging Face token handling
- **Model Storage**: Downloaded to `/app/pretrain_model` during build
- **Runtime**: Python 3.9-slim base image

## Key Technical Details

### Model Configuration
- **Model Path**: `/app/pretrain_model` (in container) or configurable for local dev
- **Device**: Currently set to CPU in `main.py`, but the service also supports GPU (CUDA)
- **Generation Params**: 150 max tokens, temperature 0.75, top_p 0.9 (mapped onto a generation call in the sketch below)
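
Continuing the hypothetical `HuggingFaceService` sketch from earlier, the documented defaults would map onto a `transformers` pipeline call roughly like this:

```python
# Continuation of the hypothetical HuggingFaceService sketch: the generation
# method, with the documented defaults as keyword parameters.
import asyncio

class HuggingFaceService:
    # ... __init__ / initialize as sketched earlier ...

    async def generate(
        self,
        prompt: str,
        max_tokens: int = 150,
        temperature: float = 0.75,
        top_p: float = 0.9,
    ) -> str:
        def _run() -> str:
            outputs = self.pipeline(
                prompt,
                max_new_tokens=max_tokens,
                temperature=temperature,
                top_p=top_p,
                do_sample=True,          # temperature/top_p only apply when sampling
                return_full_text=False,  # drop the prompt, keep the completion
            )
            return outputs[0]["generated_text"]

        # Inference is compute-bound; run it in a worker thread to keep the
        # event loop (and the /health endpoint) responsive.
        return await asyncio.to_thread(_run)
```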

### Prompt Engineering
The service uses a carefully crafted Polish system prompt:
- Instructs the model to create marketing descriptions in Polish
- Limits output to 500 characters maximum
- Tells the model to ignore off-topic content
- Uses chat template format with system/user roles
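
The actual prompt text is not reproduced in this document; the sketch below shows the chat-template mechanics with a hypothetical paraphrase of the system message:

```python
# Hypothetical prompt construction; the Polish system message below is a
# paraphrase of the described instructions, not the service's actual prompt.
def build_prompt(tokenizer, car_summary: str) -> str:
    messages = [
        {
            "role": "system",
            # Roughly: "You are an automotive copywriter. Write a compelling
            # Polish marketing description, max 500 characters. Ignore
            # off-topic content."
            "content": (
                "JesteΕ› copywriterem motoryzacyjnym. Napisz atrakcyjny opis "
                "marketingowy samochodu po polsku (maksymalnie 500 znakΓ³w). "
                "Ignoruj treΕ›ci niezwiΔ…zane z tematem."
            ),
        },
        {"role": "user", "content": car_summary},
    ]
    # apply_chat_template renders the system/user turns in the model's expected
    # format; add_generation_prompt appends the assistant marker so generation
    # starts at the model's reply.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```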

### Dependencies
- **fastapi**: Web framework
- **uvicorn[standard]**: ASGI server
- **transformers[torch]**: Hugging Face transformers with PyTorch
- **accelerate**: Hugging Face optimization library

## Current State & Issues

### Git Status
- Modified: `app/main.py` (uncommitted local changes)
- Deleted: `app/models/gpt4all.py` (indicating the migration from GPT4All to Hugging Face Transformers)

### Linter Issues in `huggingface_service.py`
1. Import resolution: the linter cannot resolve `pipeline` and `AutoTokenizer` from the top-level `transformers` package and wants more specific import paths
2. Type annotations: `device: str = None` should be `device: Optional[str] = None`, since `None` is not a valid `str`
3. Method parameters: other parameters that default to `None` have the same `Optional` typing issue
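
The typing fix is standard and would look like this (the import-path fix depends on the linter configuration and is left aside):

```python
from typing import Optional

class HuggingFaceService:
    # Before: def __init__(self, model_path: str, device: str = None)
    # `None` is not a valid `str`, so the annotation must allow it explicitly.
    # On Python 3.10+, `device: str | None = None` is equivalent.
    def __init__(self, model_path: str, device: Optional[str] = None) -> None:
        self.model_path = model_path
        self.device = device
```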

## Usage Scenarios
1. **Car Dealership Websites**: Auto-generate compelling descriptions from basic car specs
2. **Marketplace Applications**: Enhance user-provided car listings
3. **Inventory Management**: Bulk description generation for car databases

## Deployment Options
1. **Local Development**: Direct Python/uvicorn execution
2. **Docker Container**: Self-contained deployment with pre-downloaded model
3. **Production**: Containerized deployment with proper authentication
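
For a quick smoke test against a local instance, a client call might look like the following; the port (uvicorn's default 8000) and the field names (from the schema sketch above) are assumptions:

```python
# Smoke-test client for a locally running instance; port and field names
# are assumptions (uvicorn default, inferred CarData fields).
import requests

payload = {
    "make": "Skoda",
    "model": "Octavia",
    "year": 2019,
    "mileage": 85_000,
    "features": ["klimatyzacja", "nawigacja", "czujniki parkowania"],
    "condition": "bardzo dobry",
}

resp = requests.post(
    "http://localhost:8000/enhance-description",
    json=payload,
    timeout=120,  # CPU inference can be slow
)
resp.raise_for_status()
print(resp.json()["description"])
```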

## Authentication Requirements
- Hugging Face Hub token required for model download (gated model)
- Token stored in `my_hf_token.txt` during Docker build
- Securely handled via Docker BuildKit secrets
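
A plausible shape for `download_model.py` under this setup is sketched below; the secret id `hf_token`, the mount path, and the use of `huggingface_hub.snapshot_download` are assumptions:

```python
# Hypothetical sketch of download_model.py. During `docker build`, a BuildKit
# secret mounted with --mount=type=secret,id=hf_token appears read-only at
# /run/secrets/hf_token for the duration of that RUN step.
from pathlib import Path

from huggingface_hub import snapshot_download

TOKEN_PATH = Path("/run/secrets/hf_token")  # secret id is an assumption
token = TOKEN_PATH.read_text().strip() if TOKEN_PATH.exists() else None

snapshot_download(
    repo_id="speakleash/Bielik-1.5B-v3.0-Instruct",
    local_dir="/app/pretrain_model",
    token=token,  # required: the model is gated on the Hugging Face Hub
)
```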

## Performance Considerations
- Model size: ~3.2GB (significant memory footprint)
- CPU inference: Slower but more accessible
- GPU inference: Faster but requires CUDA setup
- Async design: Non-blocking text generation

This service represents a specialized AI application for the Polish automotive market, focusing on generating marketing content using state-of-the-art Polish language models.