# GPT4All Service - Project Context

## Project Overview
This is a **Polish Car Description Enhancement Service**: a FastAPI microservice that uses a Hugging Face large language model to generate enhanced marketing descriptions for cars in Polish.

## Core Functionality
The service takes basic car information (make, model, year, mileage, features, condition) and generates compelling, marketing-friendly descriptions in Polish using `speakleash/Bielik-1.5B-v3.0-Instruct`, a Polish instruction-tuned model from the Bielik series.

## Project Structure

```
gpt4all-service/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ main.py                    # FastAPI application with endpoints
β”‚   β”œβ”€β”€ models/
β”‚   β”‚   └── huggingface_service.py # Core LLM service wrapper
β”‚   └── schemas/
β”‚       └── schemas.py             # Pydantic data models
β”œβ”€β”€ Dockerfile                    # Multi-stage Docker build
β”œβ”€β”€ download_model.py             # Model download script for Docker
β”œβ”€β”€ requirements.txt              # Python dependencies
β”œβ”€β”€ start_container.ps1           # PowerShell startup script
β”œβ”€β”€ start_container.sh            # Bash startup script
└── README.md                     # Comprehensive documentation
```

## Technical Architecture

### 1. FastAPI Application (`app/main.py`)
- **Framework**: FastAPI with CORS middleware
- **Main Endpoint**: `POST /enhance-description` - takes car data, returns enhanced description
- **Health Check**: `GET /health` - service status and model initialization check
- **CORS**: Configured for a frontend on `http://localhost:5173`, Vite's default dev-server port (suggesting a React or Vue frontend)
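
The source of `app/main.py` is not reproduced in this document, but the description above suggests wiring along these lines. This is a sketch only: the class and member names `HuggingFaceService`, `is_initialized`, and `generate_description` are assumptions, not the actual source.

```python
# Hypothetical sketch of app/main.py; service class and member names are
# assumptions based on the description, not the actual source.
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

from app.models.huggingface_service import HuggingFaceService  # assumed name
from app.schemas.schemas import CarData, EnhancedDescriptionResponse

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # frontend dev server
    allow_methods=["*"],
    allow_headers=["*"],
)

llm_service = HuggingFaceService(model_path="/app/pretrain_model", device="cpu")

@app.on_event("startup")
async def startup() -> None:
    # Load the model asynchronously when the server starts.
    await llm_service.initialize()

@app.get("/health")
async def health():
    # Service status plus whether the model finished initializing.
    return {"status": "ok", "model_initialized": llm_service.is_initialized}

@app.post("/enhance-description", response_model=EnhancedDescriptionResponse)
async def enhance_description(car: CarData):
    if not llm_service.is_initialized:
        raise HTTPException(status_code=503, detail="Model not initialized")
    description = await llm_service.generate_description(car)
    return EnhancedDescriptionResponse(description=description)
```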

### 2. LLM Service (`app/models/huggingface_service.py`)
- **Purpose**: Wrapper around Hugging Face Transformers pipeline
- **Model**: `speakleash/Bielik-1.5B-v3.0-Instruct` (Polish language model)
- **Features**:
  - Async initialization and text generation
  - Support for both GPU (CUDA) and CPU inference
  - Chat template support for conversation-style prompts
  - Configurable generation parameters (temperature, top_p, max_tokens)
  - Smart response parsing to extract only the assistant's response
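
A condensed sketch of such a wrapper follows; the method names mirror the feature list above but are assumptions, not the actual `huggingface_service.py`:

```python
# Hypothetical condensed version of app/models/huggingface_service.py.
import asyncio
from typing import Optional

import torch
from transformers import AutoTokenizer, pipeline

class HuggingFaceService:
    def __init__(self, model_path: str, device: Optional[str] = None):
        self.model_path = model_path
        # Honor an explicit device; otherwise prefer CUDA when available.
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = None
        self.pipeline = None

    @property
    def is_initialized(self) -> bool:
        return self.pipeline is not None

    async def initialize(self) -> None:
        # Model loading blocks for a long time, so keep it off the event loop.
        def _load() -> None:
            self.tokenizer = AutoTokenizer.from_pretrained(self.model_path)
            self.pipeline = pipeline(
                "text-generation",
                model=self.model_path,
                tokenizer=self.tokenizer,
                device=self.device,
            )

        await asyncio.to_thread(_load)
```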

### 3. Data Models (`app/schemas/schemas.py`)
- **CarData**: Input model with make, model, year, mileage, features[], condition
- **EnhancedDescriptionResponse**: Output model with generated description
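
The field list above implies Pydantic models roughly like the following; field names and types are inferred from the description, not copied from `schemas.py`:

```python
# Plausible shape of app/schemas/schemas.py (inferred, not the actual source).
from typing import List

from pydantic import BaseModel

class CarData(BaseModel):
    make: str
    model: str
    year: int
    mileage: int
    features: List[str]
    condition: str

class EnhancedDescriptionResponse(BaseModel):
    description: str
```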

### 4. Containerization
- **Docker**: Self-contained image with pre-downloaded model (~3.2GB)
- **Security**: Uses Docker BuildKit secrets for Hugging Face token handling
- **Model Storage**: Downloaded to `/app/pretrain_model` during build
- **Runtime**: Python 3.9-slim base image

## Key Technical Details

### Model Configuration
- **Model Path**: `/app/pretrain_model` (in container) or configurable for local dev
- **Device**: Currently set to CPU in `main.py`, but the service also supports GPU (CUDA)
- **Generation Params**: 150 max tokens, temperature 0.75, top_p 0.9 (mapped onto a generation call in the sketch below)
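
Continuing the hypothetical `HuggingFaceService` sketch from earlier, the documented defaults would map onto a `transformers` pipeline call roughly like this:

```python
# Continuation of the hypothetical HuggingFaceService sketch: the generation
# method, with the documented defaults as keyword parameters.
import asyncio

class HuggingFaceService:
    # ... __init__ / initialize as sketched earlier ...

    async def generate(
        self,
        prompt: str,
        max_tokens: int = 150,
        temperature: float = 0.75,
        top_p: float = 0.9,
    ) -> str:
        def _run() -> str:
            outputs = self.pipeline(
                prompt,
                max_new_tokens=max_tokens,
                temperature=temperature,
                top_p=top_p,
                do_sample=True,          # temperature/top_p only apply when sampling
                return_full_text=False,  # drop the prompt, keep the completion
            )
            return outputs[0]["generated_text"]

        # Inference is compute-bound; run it in a worker thread to keep the
        # event loop (and the /health endpoint) responsive.
        return await asyncio.to_thread(_run)
```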

### Prompt Engineering
The service uses a carefully crafted Polish system prompt:
- Instructs the model to create marketing descriptions in Polish
- Limits output to 500 characters maximum
- Tells the model to ignore off-topic content
- Uses chat template format with system/user roles
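
The actual prompt text is not reproduced in this document; the sketch below shows the chat-template mechanics with a hypothetical paraphrase of the system message:

```python
# Hypothetical prompt construction; the Polish system message below is a
# paraphrase of the described instructions, not the service's actual prompt.
def build_prompt(tokenizer, car_summary: str) -> str:
    messages = [
        {
            "role": "system",
            # Roughly: "You are an automotive copywriter. Write a compelling
            # Polish marketing description, max 500 characters. Ignore
            # off-topic content."
            "content": (
                "JesteΕ› copywriterem motoryzacyjnym. Napisz atrakcyjny opis "
                "marketingowy samochodu po polsku (maksymalnie 500 znakΓ³w). "
                "Ignoruj treΕ›ci niezwiΔ…zane z tematem."
            ),
        },
        {"role": "user", "content": car_summary},
    ]
    # apply_chat_template renders the system/user turns in the model's expected
    # format; add_generation_prompt appends the assistant marker so generation
    # starts at the model's reply.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```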

### Dependencies
- **fastapi**: Web framework
- **uvicorn[standard]**: ASGI server
- **transformers[torch]**: Hugging Face transformers with PyTorch
- **accelerate**: Hugging Face optimization library

## Current State & Issues

### Git Status
- Modified: `app/main.py` (uncommitted local changes)
- Deleted: `app/models/gpt4all.py` (indicating the migration from GPT4All to Hugging Face Transformers)

### Linter Issues in `huggingface_service.py`
1. Import resolution: the linter cannot resolve `pipeline` and `AutoTokenizer` from the top-level `transformers` package and wants more specific import paths
2. Type annotations: `device: str = None` should be `device: Optional[str] = None`, since `None` is not a valid `str`
3. Method parameters: other parameters that default to `None` have the same `Optional` typing issue
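
The typing fix is standard and would look like this (the import-path fix depends on the linter configuration and is left aside):

```python
from typing import Optional

class HuggingFaceService:
    # Before: def __init__(self, model_path: str, device: str = None)
    # `None` is not a valid `str`, so the annotation must allow it explicitly.
    # On Python 3.10+, `device: str | None = None` is equivalent.
    def __init__(self, model_path: str, device: Optional[str] = None) -> None:
        self.model_path = model_path
        self.device = device
```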

## Usage Scenarios
1. **Car Dealership Websites**: Auto-generate compelling descriptions from basic car specs
2. **Marketplace Applications**: Enhance user-provided car listings
3. **Inventory Management**: Bulk description generation for car databases

## Deployment Options
1. **Local Development**: Direct Python/uvicorn execution
2. **Docker Container**: Self-contained deployment with pre-downloaded model
3. **Production**: Containerized deployment with proper authentication
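
For a quick smoke test against a local instance, a client call might look like the following; the port (uvicorn's default 8000) and the field names (from the schema sketch above) are assumptions:

```python
# Smoke-test client for a locally running instance; port and field names
# are assumptions (uvicorn default, inferred CarData fields).
import requests

payload = {
    "make": "Skoda",
    "model": "Octavia",
    "year": 2019,
    "mileage": 85_000,
    "features": ["klimatyzacja", "nawigacja", "czujniki parkowania"],
    "condition": "bardzo dobry",
}

resp = requests.post(
    "http://localhost:8000/enhance-description",
    json=payload,
    timeout=120,  # CPU inference can be slow
)
resp.raise_for_status()
print(resp.json()["description"])
```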

## Authentication Requirements
- Hugging Face Hub token required for model download (gated model)
- Token stored in `my_hf_token.txt` during Docker build
- Securely handled via Docker BuildKit secrets
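
A plausible shape for `download_model.py` under this setup is sketched below; the secret id `hf_token`, the mount path, and the use of `huggingface_hub.snapshot_download` are assumptions:

```python
# Hypothetical sketch of download_model.py. During `docker build`, a BuildKit
# secret mounted with --mount=type=secret,id=hf_token appears read-only at
# /run/secrets/hf_token for the duration of that RUN step.
from pathlib import Path

from huggingface_hub import snapshot_download

TOKEN_PATH = Path("/run/secrets/hf_token")  # secret id is an assumption
token = TOKEN_PATH.read_text().strip() if TOKEN_PATH.exists() else None

snapshot_download(
    repo_id="speakleash/Bielik-1.5B-v3.0-Instruct",
    local_dir="/app/pretrain_model",
    token=token,  # required: the model is gated on the Hugging Face Hub
)
```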

## Performance Considerations
- Model size: ~3.2GB (significant memory footprint)
- CPU inference: Slower but more accessible
- GPU inference: Faster but requires CUDA setup
- Async design: Non-blocking text generation

This service represents a specialized AI application for the Polish automotive market, focusing on generating marketing content using state-of-the-art Polish language models.