# GPT4All Service - Project Context
## Project Overview
This is a **Polish Car Description Enhancement Service** built as a FastAPI microservice that uses a Hugging Face large language model to generate enhanced, marketing-oriented car descriptions in Polish.
## Core Functionality
The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using `speakleash/Bielik-1.5B-v3.0-Instruct`, a Polish-language instruct model from the Bielik series.
## Project Structure
```
gpt4all-service/
├── app/
│   ├── main.py                    # FastAPI application with endpoints
│   ├── models/
│   │   └── huggingface_service.py # Core LLM service wrapper
│   └── schemas/
│       └── schemas.py             # Pydantic data models
├── Dockerfile                     # Multi-stage Docker build
├── download_model.py              # Model download script for Docker
├── requirements.txt               # Python dependencies
├── start_container.ps1            # PowerShell startup script
├── start_container.sh             # Bash startup script
└── README.md                      # Comprehensive documentation
```
## Technical Architecture
### 1. FastAPI Application (`app/main.py`)
- **Framework**: FastAPI with CORS middleware
- **Main Endpoint**: `POST /enhance-description` - takes car data, returns enhanced description
- **Health Check**: `GET /health` - service status and model initialization check
- **CORS**: Configured for a frontend on `http://localhost:5173` (the Vite dev server default, so likely a React or Vue frontend)
### 2. LLM Service (`app/models/huggingface_service.py`)
- **Purpose**: Wrapper around Hugging Face Transformers pipeline
- **Model**: `speakleash/Bielik-1.5B-v3.0-Instruct` (Polish language model)
- **Features**:
- Async initialization and text generation
- Support for both GPU (CUDA) and CPU inference
- Chat template support for conversation-style prompts
- Configurable generation parameters (temperature, top_p, max_tokens)
- Smart response parsing to extract only the assistant's response
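The wrapper described above might look roughly like this (class and method names are illustrative sketches, not the repo's actual identifiers; only the model name, CPU/GPU support, async design, and response parsing come from this document):

```python
import asyncio
from typing import Optional

MODEL_NAME = "speakleash/Bielik-1.5B-v3.0-Instruct"

class HuggingFaceService:
    def __init__(self, model_path: str = MODEL_NAME, device: Optional[str] = None):
        self.model_path = model_path
        self.device = device or "cpu"  # "cuda" when a GPU is available
        self.pipe = None
        self.tokenizer = None

    async def initialize(self) -> None:
        # Imported lazily so the module loads even without transformers present.
        from transformers import AutoTokenizer, pipeline
        loop = asyncio.get_running_loop()

        def _load():
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_path)
            self.pipe = pipeline("text-generation", model=self.model_path,
                                 tokenizer=self.tokenizer, device=self.device)

        # Run the blocking model load off the event loop.
        await loop.run_in_executor(None, _load)

    @staticmethod
    def extract_assistant_reply(generated: str, marker: str = "<|assistant|>") -> str:
        # "Smart response parsing": keep only the text after the last
        # assistant marker. The actual marker depends on the model's
        # chat template, so this string is an assumption.
        return generated.rsplit(marker, 1)[-1].strip()
```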
### 3. Data Models (`app/schemas/schemas.py`)
- **CarData**: Input model with make, model, year, mileage, features[], condition
- **EnhancedDescriptionResponse**: Output model with generated description
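These models could be sketched as follows (field names follow the bullets above; the exact names, types, and optionality in the repo may differ):

```python
from pydantic import BaseModel

class CarData(BaseModel):
    make: str
    model: str
    year: int
    mileage: int
    features: list[str] = []
    condition: str

class EnhancedDescriptionResponse(BaseModel):
    enhanced_description: str
```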
### 4. Containerization
- **Docker**: Self-contained image with pre-downloaded model (~3.2GB)
- **Security**: Uses Docker BuildKit secrets for Hugging Face token handling
- **Model Storage**: Downloaded to `/app/pretrain_model` during build
- **Runtime**: Python 3.9-slim base image
## Key Technical Details
### Model Configuration
- **Model Path**: `/app/pretrain_model` (in container) or configurable for local dev
- **Device**: Currently set to CPU in main.py, but service supports GPU
- **Generation Params**: 150 max tokens, temperature 0.75, top_p 0.9
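For reference, these parameters would map onto a `transformers` generation call roughly as below (`do_sample=True` is an assumption implied by the sampling parameters; `pipe` stands for a loaded text-generation pipeline):

```python
# The documented generation parameters as keyword arguments.
gen_kwargs = {
    "max_new_tokens": 150,  # "150 max tokens"
    "temperature": 0.75,
    "top_p": 0.9,
    "do_sample": True,      # assumed: temperature/top_p only apply when sampling
}
# outputs = pipe(prompt, **gen_kwargs)
```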
### Prompt Engineering
The service uses a carefully crafted Polish system prompt:
- Instructs the model to create marketing descriptions in Polish
- Limits output to 500 characters maximum
- Tells the model to ignore off-topic content
- Uses chat template format with system/user roles
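The chat-template prompt likely has this shape (the Polish system prompt below is a paraphrase of the behavior described above, not the repo's exact wording, and the car summary is an invented example):

```python
car_summary = "Toyota Corolla, 2019, 45 000 km, klimatyzacja, stan bardzo dobry"

messages = [
    {"role": "system",
     "content": ("Jesteś copywriterem motoryzacyjnym. Napisz atrakcyjny opis "
                 "marketingowy samochodu po polsku, maksymalnie 500 znaków. "
                 "Ignoruj treści niezwiązane z samochodem.")},
    {"role": "user", "content": car_summary},
]

# With a loaded tokenizer, the model-specific prompt string would be rendered via:
# prompt = tokenizer.apply_chat_template(messages, tokenize=False,
#                                        add_generation_prompt=True)
```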
### Dependencies
- **fastapi**: Web framework
- **uvicorn[standard]**: ASGI server
- **transformers[torch]**: Hugging Face transformers with PyTorch
- **accelerate**: Hugging Face optimization library
## Current State & Issues
### Git Status
- Modified `app/main.py` (likely recent changes)
- Deleted `app/models/gpt4all.py` (indicates migration from GPT4All to Hugging Face)
### Linter Issues in `huggingface_service.py`
1. Import issues: `pipeline` and `AutoTokenizer` imports need specific paths
2. Type annotations: `device: str = None` should be `Optional[str] = None`
3. Method parameters: Similar optional parameter typing issues
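The typing fixes are mechanical; for a parameter like `device` the change looks like this (the function below is an illustrative stand-in, not a signature from the repo):

```python
from typing import Optional

# Before (flagged by the linter): def configure(device: str = None) -> str
# After: the None default is made explicit in the annotation.
def configure(device: Optional[str] = None, max_tokens: Optional[int] = None) -> str:
    return device or "cpu"
```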
## Usage Scenarios
1. **Car Dealership Websites**: Auto-generate compelling descriptions from basic car specs
2. **Marketplace Applications**: Enhance user-provided car listings
3. **Inventory Management**: Bulk description generation for car databases
## Deployment Options
1. **Local Development**: Direct Python/uvicorn execution
2. **Docker Container**: Self-contained deployment with pre-downloaded model
3. **Production**: Containerized deployment with proper authentication
## Authentication Requirements
- Hugging Face Hub token required for model download (gated model)
- Token stored in `my_hf_token.txt` during Docker build
- Securely handled via Docker BuildKit secrets
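In a BuildKit Dockerfile, the secret handling described above typically looks like this (the stage layout and the secret id are assumptions; only the paths `/app/pretrain_model` and `my_hf_token.txt` come from this document):

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY download_model.py .
# The token is mounted only for this RUN step and is never stored in a layer:
RUN --mount=type=secret,id=hf_token \
    HF_TOKEN=$(cat /run/secrets/hf_token) python download_model.py
# download_model.py is expected to place the model under /app/pretrain_model
```

Built with, e.g., `DOCKER_BUILDKIT=1 docker build --secret id=hf_token,src=my_hf_token.txt -t gpt4all-service .` (the image tag is illustrative).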
## Performance Considerations
- Model size: ~3.2GB (significant memory footprint)
- CPU inference: Slower but more accessible
- GPU inference: Faster but requires CUDA setup
- Async design: Non-blocking text generation
This service is a specialized AI application for the Polish automotive market, focused on generating marketing content with state-of-the-art Polish language models.