BMS AI Assistant - Technical Specification

📋 Executive Summary

The BMS AI Assistant is an enterprise-grade, zero-cost chatbot system designed for demand forecasting, inventory management, and ERP integration. Built with open-source technologies, it combines rule-based intent parsing with local AI capabilities to provide intelligent, context-aware responses without external API dependencies.

Zero Cost | 100% Open Source | Local AI | Enterprise Ready

🏗️ System Architecture

Presentation Layer

HTML5 + CSS3
JavaScript (Vanilla)
SVG Icons
Hugging Face Spaces

API Layer

FastAPI
Uvicorn (ASGI)
REST Endpoints

Business Logic Layer

Intent Parser
ARIMA Forecasting
LLM Engine
PDF Generator

Data Layer

CSV Files
Pandas DataFrames
In-Memory Cache

AI/ML Layer

TinyLlama 1.1B
ctransformers
Statsmodels (ARIMA)

💻 Technology Stack

Backend Framework

FastAPI 0.104+
Uvicorn (ASGI Server)
Python 3.10+
Async/Await Support

AI/ML Stack

TinyLlama 1.1B (GGUF)
ctransformers 0.2.27+
Hugging Face Hub
Statsmodels (ARIMA)

Data Processing

Pandas 2.0+
NumPy 1.24+
CSV-based Storage
In-Memory Caching

Frontend

HTML5 + CSS3
Vanilla JavaScript
Inter Font Family
SVG Graphics

Cloud Deployment

Hugging Face Spaces
Docker Container
Git Version Control

Testing & QA

Python Requests
Custom Test Suite
Load Testing Scripts
Concurrent Testing

📊 Technical Specifications

Component	Specification	Purpose
LLM Model	TinyLlama 1.1B (Q4_K_M quantized)	General chat, context-aware responses
Model Size	~670 MB (GGUF format, Q4_K_M)	Optimized for CPU inference
Forecasting Algorithm	ARIMA (AutoRegressive Integrated Moving Average)	Time-series demand prediction
Intent Recognition	Rule-based pattern matching + Regex	Fast, deterministic intent parsing
API Protocol	REST (JSON)	Standard HTTP communication
Data Format	CSV (UTF-8)	Simple, portable data storage
PDF Generation	FPDF Library	Report generation
Response Time	<2s (rule-based), <15s (LLM)	User experience optimization
Concurrent Users	5-10 (free tier), 50+ (paid)	Scalability
Memory Requirement	~2 GB RAM (minimum)	Model + application overhead

🔄 Complete Process Flow

User Input

Component: index.html (Frontend)

Action: User types query in chat interface

Technology: JavaScript event listener (keypress/click)

    userInput.addEventListener('keypress', function (e) {
        if (e.key === 'Enter') sendMessage();
    });

API Request

Component: JavaScript Fetch API

Action: POST request to /api/chat endpoint

Payload: JSON with user message

    fetch('/api/chat', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({message: text})
    });
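For test scripts, the same request can be built from Python. The following is a minimal sketch, not the project's test suite: it uses the standard-library urllib instead of Python Requests so that it runs without extra packages, and the helper names and the base URL http://localhost:7860 are assumptions.

```python
import json
import urllib.request


def build_chat_request(message: str,
                       base_url: str = "http://localhost:7860") -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/chat."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, base_url: str = "http://localhost:7860") -> dict:
    """Send the request and decode the JSON reply (requires a running server)."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending lets the payload be checked without a live server.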

Intent Parsing

Component: intent_parser.py

Action: Analyze query using regex patterns

Output: Intent type + extracted entities (item_code, quantity, etc.)

    parsed = parser.parse(message)
    # Returns: {
    #     "intent": "demand_forecast",
    #     "item_code": "BMS0015",
    #     "horizon_days": 30
    # }

Intent Routing

Component: main.py (FastAPI)

Action: Route to appropriate handler based on intent

Intents: demand_forecast, item_details, check_inventory, supplier_info, create_requisition, system_status, generate_report, general_chat
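One common way to implement this kind of routing is a dispatch table keyed by intent name. The sketch below is illustrative only: the handler functions and the `route` helper are hypothetical, and the fallback to general_chat for unrecognized intents is an assumption rather than documented behavior of main.py.

```python
from typing import Callable, Dict


# Hypothetical handlers; the real ones would call the modules named in this spec.
def handle_demand_forecast(parsed: dict) -> dict:
    return {"intent": "demand_forecast",
            "answer": f"Forecast for {parsed.get('item_code')}"}


def handle_general_chat(parsed: dict) -> dict:
    return {"intent": "general_chat", "answer": "(LLM reply would go here)"}


# One entry per intent; only two are shown to keep the sketch short.
HANDLERS: Dict[str, Callable[[dict], dict]] = {
    "demand_forecast": handle_demand_forecast,
    "general_chat": handle_general_chat,
}


def route(parsed: dict) -> dict:
    """Call the handler for parsed['intent'], falling back to general_chat."""
    handler = HANDLERS.get(parsed.get("intent"), handle_general_chat)
    return handler(parsed)
```

A table like this keeps the endpoint free of long if/elif chains, and adding an intent means adding one entry.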

Business Logic Execution

Components: Multiple modules based on intent

Actions:

Forecasting: forecasting.py → ARIMA model → Predict demand
Data Retrieval: data_loader.py → Load CSV → Query data
LLM Response: llm_engine.py → TinyLlama → Generate text
PDF Generation: pdf_generator.py → FPDF → Create report

Response Formatting

Component: main.py

Action: Format response as JSON

Structure: {intent, answer, forecast (optional), pdf_link (optional)}

    return {
        "intent": "demand_forecast",
        "answer": "Forecast for BMS0015...",
        "forecast": [{"date": "2025-01-01", "qty": 150}, ...]
    }
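Because forecast and pdf_link are optional, the response can omit them when they do not apply. A minimal sketch of such a builder follows; `build_response` is a hypothetical helper name, and omitting unused keys (rather than sending null) is an assumption.

```python
from typing import Any, Dict, List, Optional


def build_response(intent: str,
                   answer: str,
                   forecast: Optional[List[Dict[str, Any]]] = None,
                   pdf_link: Optional[str] = None) -> Dict[str, Any]:
    """Assemble the JSON-serializable reply, leaving out unused optional keys."""
    response: Dict[str, Any] = {"intent": intent, "answer": answer}
    if forecast is not None:
        response["forecast"] = forecast
    if pdf_link is not None:
        response["pdf_link"] = pdf_link
    return response
```

FastAPI serializes a returned dict to JSON, so a structure like this can be returned from the endpoint directly.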

Frontend Rendering

Component: index.html (JavaScript)

Action: Parse JSON, render message bubble, display tables/charts

Features: HTML sanitization, table rendering, download buttons

    if (data.forecast) {
        botHtml += renderForecastTable(data.forecast);
    }
    // Render only after the forecast table has been appended.
    addMessage(botHtml, 'bot', true);

User Interaction

Component: Frontend UI

Action: Display response with icons, animations, and interactive elements

Features: Smooth scrolling, copy text, download PDFs, click links

🔧 Component Breakdown

1. Intent Parser (intent_parser.py)

    # Pattern Matching Examples:
    - "forecast" → demand_forecast
    - "inventory|stock" → check_inventory
    - "supplier|vendor" → supplier_info
    - "order|buy|purchase" → create_requisition
    - Item Code Extraction: r'\b([a-z]{3}\d{4})\b' (case-insensitive; matches codes such as BMS0015)
    - Quantity Extraction: r'\b(\d+)\s*(?:units|pcs)?'
    - Horizon Parsing: "next month" → 30 days
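The rules above can be sketched as a small regex-based parser. This is an illustration under stated assumptions, not the contents of intent_parser.py: the item-code pattern assumes three letters plus four digits (as in BMS0015), the quantity pattern requires a unit so that "30 days" is not read as a quantity, the "next week" horizon is an invented extra, and unmatched queries fall back to general_chat.

```python
import re
from typing import Any, Dict

# Keyword patterns, checked in order; the first match wins.
INTENT_PATTERNS = [
    ("demand_forecast", re.compile(r"\bforecast\b", re.IGNORECASE)),
    ("check_inventory", re.compile(r"\b(?:inventory|stock)\b", re.IGNORECASE)),
    ("supplier_info", re.compile(r"\b(?:supplier|vendor)\b", re.IGNORECASE)),
    ("create_requisition", re.compile(r"\b(?:order|buy|purchase)\b", re.IGNORECASE)),
]
# Assumption: three letters followed by four digits, e.g. BMS0015.
ITEM_CODE = re.compile(r"\b([a-z]{3}\d{4})\b", re.IGNORECASE)
# Assumption: the unit is required here, unlike the optional unit listed above.
QUANTITY = re.compile(r"\b(\d+)\s*(?:units|pcs)\b", re.IGNORECASE)
# "next week" is a hypothetical addition for illustration.
HORIZONS = {"next week": 7, "next month": 30}


def parse(message: str) -> Dict[str, Any]:
    """Return the detected intent plus any extracted entities."""
    result: Dict[str, Any] = {"intent": "general_chat"}
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(message):
            result["intent"] = intent
            break
    if (match := ITEM_CODE.search(message)):
        result["item_code"] = match.group(1).upper()
    if (match := QUANTITY.search(message)):
        result["quantity"] = int(match.group(1))
    lowered = message.lower()
    for phrase, days in HORIZONS.items():
        if phrase in lowered:
            result["horizon_days"] = days
            break
    return result
```

For example, `parse("Forecast BMS0015 for next month")` yields the demand_forecast intent with item_code "BMS0015" and horizon_days 30.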

2. Data Loader (data_loader.py)

    # Data Sources:
    - items.csv: Product catalog (item_code, description, price)
    - demand_history.csv: Historical sales data
    - inventory.csv: Stock levels by location
    - suppliers.csv: Supplier information
    - requisitions.csv: Order tracking

    # Methods:
    - load_data(): Initialize all DataFrames
    - get_item_details(item_code): Retrieve product info
    - get_inventory(item_code): Check stock levels
    - create_requisition(item_code, qty): Generate order
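A rough sketch of the lookup methods is shown below. To keep it runnable without extra packages it uses the standard-library csv module in place of Pandas, and inline strings in place of the CSV files. The sample rows are made up, and the inventory column names (location, qty_on_hand) are assumptions; only item_code, description, and price come from the list above.

```python
import csv
import io
from typing import Dict, List, Optional

# Made-up sample rows standing in for items.csv and inventory.csv.
ITEMS_CSV = """item_code,description,price
BMS0015,Sample item A,249.90
BMS0016,Sample item B,19.50
"""
INVENTORY_CSV = """item_code,location,qty_on_hand
BMS0015,WH-A,40
BMS0015,WH-B,15
"""


class DataLoader:
    def __init__(self) -> None:
        self.items: List[Dict[str, str]] = []
        self.inventory: List[Dict[str, str]] = []

    def load_data(self) -> None:
        """Read every source once and keep the rows in memory."""
        self.items = list(csv.DictReader(io.StringIO(ITEMS_CSV)))
        self.inventory = list(csv.DictReader(io.StringIO(INVENTORY_CSV)))

    def get_item_details(self, item_code: str) -> Optional[Dict[str, str]]:
        """Return the catalog row for item_code, or None if it is unknown."""
        code = item_code.upper()
        return next((row for row in self.items if row["item_code"] == code), None)

    def get_inventory(self, item_code: str) -> Dict[str, int]:
        """Return stock on hand per location for item_code."""
        code = item_code.upper()
        return {row["location"]: int(row["qty_on_hand"])
                for row in self.inventory if row["item_code"] == code}
```

With Pandas, the same lookups would typically be boolean filters on DataFrames loaded via `pd.read_csv`.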

3. Forecasting Engine (forecasting.py)

    # ARIMA Process:
    1. Load historical demand data
    2. Aggregate by date
    3. Fit ARIMA model (order=(1,1,1))
    4. Generate forecast for N days
    5. Return predictions with dates

    # Output Format:
    [
        {"date": "2025-01-01", "qty": 150},
        {"date": "2025-01-02", "qty": 148},
        ...
    ]
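Steps 2 and 5 of the process can be sketched with the standard library alone. The helper names below are hypothetical, and rounding quantities to whole units is an assumption based on the integer values in the output format. The model fit itself (steps 3 and 4) is left out; with Statsmodels it would typically be `ARIMA(series, order=(1, 1, 1)).fit()` followed by `.forecast(steps=N)`.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple


def aggregate_by_date(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Step 2: sum the quantities that share the same ISO date."""
    totals: Dict[str, float] = defaultdict(float)
    for day, qty in rows:
        totals[day] += qty
    # ISO dates sort chronologically as plain strings.
    return dict(sorted(totals.items()))


def to_forecast_records(last_observed: date, predictions: List[float]) -> List[dict]:
    """Step 5: pair each predicted value with the next calendar day."""
    return [
        {"date": (last_observed + timedelta(days=offset)).isoformat(),
         "qty": round(value)}
        for offset, value in enumerate(predictions, start=1)
    ]
```

Here `predictions` stands for whatever the fitted model returns for the requested horizon, starting the day after the last observation.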

4. LLM Engine (llm_engine.py)

    # Model Configuration:
    - Model: TinyLlama-1.1B-Chat-v1.0 (GGUF)
    - Context Length: 2048 tokens
    - Temperature: 0.1 (near-deterministic)
    - Max Tokens: 150
    - Stop Tokens: ["</s>", "User:"]
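To illustrate what the stop tokens do, the sketch below collects the settings in a dictionary and shows how generated text is cut at the first stop sequence. It is a simplified stand-in, not the code in llm_engine.py: `truncate_at_stop` is a hypothetical helper, the dictionary keys follow ctransformers option names, and the first stop string is assumed to be the model's end-of-sequence marker `</s>`.

```python
from typing import Sequence

# Settings from the list above. The "</s>" entry is an assumption
# (the end-of-sequence marker used by TinyLlama chat models).
GENERATION_CONFIG = {
    "context_length": 2048,
    "temperature": 0.1,
    "max_new_tokens": 150,
    "stop": ["</s>", "User:"],
}


def truncate_at_stop(text: str, stops: Sequence[str]) -> str:
    """Cut text at the earliest occurrence of any non-empty stop string."""
    cut = len(text)
    for stop in stops:
        if not stop:
            continue  # An empty stop string would match at position 0.
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()
```

Stopping on "User:" keeps a small chat model from continuing the dialogue on the user's behalf.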
