🔧 Machine Maintenance Predictor

Predicts machine failures before they happen using sensor data. Built with LightGBM on the AI4I 2020 Predictive Maintenance Dataset.

Performance

Metric	Score
Macro F1	0.892
AUC-ROC	0.960
Accuracy	0.986
Precision	0.775
Recall	0.809
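
With only 339 failures in 10,000 rows, accuracy alone says little, which is presumably why Macro F1 is reported first. The sketch below shows how these metrics follow from confusion-matrix counts. It assumes that precision and recall refer to the failure class, which the card does not state, and the function name `binary_metrics` is illustrative rather than part of the repository.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and macro F1 from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0

    def f1(p, r):
        return 2 * p * r / (p + r) if (p + r) else 0.0

    # Positive class = failure
    prec_pos = ratio(tp, tp + fp)
    rec_pos = ratio(tp, tp + fn)
    # Negative class = no failure (the roles of the counts are swapped)
    prec_neg = ratio(tn, tn + fn)
    rec_neg = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": prec_pos,
        "recall": rec_pos,
        "macro_f1": (f1(prec_pos, rec_pos) + f1(prec_neg, rec_neg)) / 2,
    }

# Baseline that never predicts a failure on the full dataset
# (339 failures among 10,000 rows, as stated in the Dataset section).
baseline = binary_metrics(tp=0, fp=0, fn=339, tn=9661)
print(baseline)
```

A classifier that never predicts a failure reaches about 0.966 accuracy on this data but only about 0.49 Macro F1, so Macro F1 is the more informative figure in the table above.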

Model Comparison (5-Fold Stratified CV with SMOTE-in-Fold)

Model	Macro F1	AUC-ROC
LightGBM ✓	0.886 ± 0.007	0.968 ± 0.006
RandomForest	0.780 ± 0.024	0.971 ± 0.006
XGBoost	0.732 ± 0.012	0.956 ± 0.010

Visualizations

Figures (images not reproduced here): Confusion Matrix, ROC Curves, Feature Importance, Model Comparison.

Features

Base Features (from sensors)

Feature	Description
Air temperature [K]	Ambient air temperature
Process temperature [K]	Process temperature
Rotational speed [rpm]	Machine rotational speed
Torque [Nm]	Machine torque
Tool wear [min]	Tool wear time
Type_encoded	Product quality variant (L=0, M=1, H=2)

Engineered Features (SHAP-validated, +5% F1 improvement)

Feature	Formula	Physical Meaning
temp_diff	Air temp - Process temp	Temperature differential
power_proxy	Torque / (Speed + 1)	Power consumption indicator (torque-to-speed ratio; the +1 avoids division by zero)
torque_wear	Torque × Tool wear	Stress accumulation
speed_wear	Speed × Tool wear	Rotational stress over time
temp_torque	Process temp × Torque	Thermal-mechanical load
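
The two tables above can be combined into a single helper that builds one model input row. This is a minimal sketch: `build_feature_row` and `TYPE_CODES` are hypothetical names, and the column order is assumed to match the one shown in the Usage section.

```python
# Hypothetical helper; the names are illustrative and not part of the model repo.
TYPE_CODES = {"L": 0, "M": 1, "H": 2}  # encoding as documented in the Base Features table


def build_feature_row(air_temp, proc_temp, speed, torque, tool_wear, product_type):
    """Return the 11 model inputs: 6 base features followed by 5 engineered ones."""
    base = [air_temp, proc_temp, speed, torque, tool_wear, TYPE_CODES[product_type]]
    engineered = [
        air_temp - proc_temp,   # temp_diff
        torque / (speed + 1),   # power_proxy (+1 avoids division by zero)
        torque * tool_wear,     # torque_wear
        speed * tool_wear,      # speed_wear
        proc_temp * torque,     # temp_torque
    ]
    return base + engineered


row = build_feature_row(298.1, 308.6, 1551, 42.8, 0, "M")
print(len(row))
```

Wrapping the row in a 2D array, for example `np.array([row])`, gives the shape that `predict` and `predict_proba` expect.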

Usage

import pickle
import numpy as np
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="kushal23/machine-maintenance-predictor",
    filename="model.pkl"
)

# Load
with open(model_path, "rb") as f:
    pipeline = pickle.load(f)

# Prepare input: [Air temp, Process temp, Speed, Torque, Tool wear, 
#                  Type_encoded, temp_diff, power_proxy, torque_wear, speed_wear, temp_torque]
air_temp = 298.1
proc_temp = 308.6
speed = 1551
torque = 42.8
tool_wear = 0
type_enc = 1  # L=0, M=1, H=2

sample = np.array([[
    air_temp, proc_temp, speed, torque, tool_wear, type_enc,
    air_temp - proc_temp,           # temp_diff
    torque / (speed + 1),           # power_proxy
    torque * tool_wear,             # torque_wear
    speed * tool_wear,              # speed_wear
    proc_temp * torque              # temp_torque
]])

prediction = pipeline.predict(sample)
probability = pipeline.predict_proba(sample)[:, 1]

print(f"Failure predicted: {'YES ⚠️' if prediction[0] == 1 else 'No ✓'}")
print(f"Failure probability: {probability[0]:.1%}")
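
The `predict` call returns hard labels, which for a binary classifier typically corresponds to a 0.5 cut-off on the failure probability. When missed failures cost more than false alarms, the output of `predict_proba` can be thresholded lower instead. The sketch below assumes this trade-off is wanted; `flag_failures`, the 0.3 threshold, and the probabilities are all made up for illustration.

```python
def flag_failures(probabilities, threshold=0.5):
    """Return True for every machine whose failure probability meets the threshold."""
    return [p >= threshold for p in probabilities]


# Made-up probabilities, standing in for pipeline.predict_proba(batch)[:, 1]
example_probs = [0.10, 0.35, 0.80]

default_flags = flag_failures(example_probs)          # 0.5 cut-off
cautious_flags = flag_failures(example_probs, 0.3)    # lower cut-off, more alerts
print(default_flags, cautious_flags)
```

Lowering the threshold generally raises recall at the expense of precision, so any chosen value should be checked on held-out data.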

Methodology

Algorithm: LightGBM (300 estimators, lr=0.05, 31 leaves, balanced class weights)
Class Imbalance Handling: SMOTE applied inside CV folds only (prevents data leakage)
Validation: 5-fold stratified cross-validation
Preprocessing: StandardScaler normalization
Reference: Based on the methodology of arXiv:2603.13343 (2026)
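
To illustrate why oversampling belongs inside the folds, here is a dependency-free sketch of the idea. It is not the author's training code: the interpolation routine is a simplified stand-in for SMOTE (in practice a library implementation such as imbalanced-learn's `SMOTE` inside its `Pipeline` would be the usual choice), the 1:1 balancing target is an assumption, and all function names and the toy data are hypothetical.

```python
import random


def stratified_folds(labels, n_splits=5, seed=0):
    """Assign sample indices to folds so each fold keeps roughly the same class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_splits].append(i)
    return folds


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority point
    and one of its k nearest minority neighbours (simplified SMOTE-style step)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic


def oversampled_training_sets(X, y, n_splits=5, seed=0):
    """Yield (X_train, y_train, test_idx) per fold; only the training part is oversampled."""
    folds = stratified_folds(y, n_splits, seed)
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        minority = [x for x, lab in zip(X_train, y_train) if lab == 1]
        n_new = len(y_train) - 2 * len(minority)  # majority count minus minority count
        if n_new > 0 and len(minority) >= 2:
            X_train += smote_like(minority, n_new, seed=seed + f)
            y_train += [1] * n_new
        yield X_train, y_train, test_idx


# Toy data: 100 two-dimensional points, 10 of them labelled as failures.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
y = [1 if i < 10 else 0 for i in range(100)]
results = list(oversampled_training_sets(X, y))
print([(len(y_tr), len(test)) for _, y_tr, test in results])
```

Each test fold contains only original rows, so synthetic points never influence the scores; oversampling before splitting would let interpolated copies of test-fold failures leak into training.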

Dataset

The AI4I 2020 Predictive Maintenance Dataset contains 10,000 data points with:

3.4% failure rate (339 failures out of 10,000)
5 failure modes: Tool Wear (TWF), Heat Dissipation (HDF), Power (PWF), Overstrain (OSF), Random (RNF)
3 product types: Low (60%), Medium (30%), High (10%) quality

Files

File	Description
`model.pkl`	Full sklearn Pipeline (StandardScaler + LightGBM)
`metadata.json`	Model metadata, features, and all metrics
`label_encoder.pkl`	Product type encoder (L/M/H → 0/1/2)
`confusion_matrix.png`	Confusion matrix visualization
`feature_importance.png`	Feature importance chart
`model_comparison.png`	All models comparison
`roc_curves.png`	ROC curves for all models

License

MIT

Paper for kushal23/machine-maintenance-predictor

AI-Driven Predictive Maintenance with Real-Time Contextual Data Fusion for Connected Vehicles: A Multi-Dataset Evaluation (arXiv:2603.13343, published Mar 7)