# 🔧 Machine Maintenance Predictor
Predicts machine failures before they happen using sensor data. Built with LightGBM on the AI4I 2020 Predictive Maintenance Dataset.
## Performance

| Metric | Score |
|---|---|
| Macro F1 | 0.892 |
| AUC-ROC | 0.960 |
| Accuracy | 0.986 |
| Precision (failure class) | 0.775 |
| Recall (failure class) | 0.809 |
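The headline numbers can be cross-checked with a little arithmetic. This sketch assumes that precision and recall refer to the failure class and that the evaluation set has the dataset's overall failure rate (339 of 10,000); under those assumptions, the implied confusion-matrix rates reproduce the reported accuracy and macro F1.

```python
# Cross-check of the Performance table. ASSUMES precision/recall are for the
# failure class and the evaluation set's failure rate equals the dataset's (339/10,000).
prevalence = 339 / 10_000
precision, recall = 0.775, 0.809

# Confusion-matrix cells as fractions of the evaluation set
tp = recall * prevalence                  # failures caught
fn = prevalence - tp                      # failures missed
fp = tp * (1 - precision) / precision     # false alarms
tn = 1 - tp - fn - fp                     # correct "no failure" calls

accuracy = tp + tn
f1_failure = 2 * tp / (2 * tp + fp + fn)
f1_normal = 2 * tn / (2 * tn + fp + fn)   # F1 of the "no failure" class
macro_f1 = (f1_failure + f1_normal) / 2   # unweighted mean over both classes

print(f"accuracy ≈ {accuracy:.3f}")  # accuracy ≈ 0.986
print(f"F1 (failure) ≈ {f1_failure:.3f}, macro F1 ≈ {macro_f1:.3f}")
# F1 (failure) ≈ 0.792, macro F1 ≈ 0.892
```

Because macro F1 weights both classes equally, the failure-class F1 (about 0.79) is the demanding half; the majority class contributes roughly 0.99.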
## Model Comparison (5-Fold Stratified CV, SMOTE Applied Within Each Fold)

| Model | Macro F1 | AUC-ROC |
|---|---|---|
| LightGBM ✓ | 0.886 ± 0.007 | 0.968 ± 0.006 |
| RandomForest | 0.780 ± 0.024 | 0.971 ± 0.006 |
| XGBoost | 0.732 ± 0.012 | 0.956 ± 0.010 |
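## Visualizations

Four figures accompany the model: Confusion Matrix (`confusion_matrix.png`), ROC Curves (`roc_curves.png`), Feature Importance (`feature_importance.png`), and Model Comparison (`model_comparison.png`).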
## Features

### Base Features (sensor readings and product type)

| Feature | Description |
|---|---|
| Air temperature [K] | Ambient air temperature |
| Process temperature [K] | Process temperature |
| Rotational speed [rpm] | Machine rotational speed |
| Torque [Nm] | Machine torque |
| Tool wear [min] | Cumulative tool usage time |
| Type_encoded | Product quality variant (L=0, M=1, H=2) |
### Engineered Features (SHAP-validated, +5% F1 improvement)

| Feature | Formula | Physical Meaning |
|---|---|---|
| temp_diff | Air temp - Process temp | Temperature differential (negative when the process runs hotter than ambient air) |
| power_proxy | Torque / (Speed + 1) | Torque-to-speed ratio; the +1 avoids division by zero. Not mechanical power, which is Torque × angular speed |
| torque_wear | Torque × Tool wear | Stress accumulation |
| speed_wear | Speed × Tool wear | Rotational stress over time |
| temp_torque | Process temp × Torque | Thermal-mechanical load |
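For contrast, mechanical power is torque times angular velocity, P = τ·ω with ω = 2π·rpm/60 (this is also the quantity AI4I's power-failure mode, PWF, is defined on). `power_proxy` divides torque by speed instead, so it tracks torque per unit of speed rather than power. A minimal comparison, using the sample readings from the Usage example:

```python
import math

# Sample readings (same values as the Usage example)
torque = 42.8   # Torque [Nm]
speed = 1551    # Rotational speed [rpm]

# Mechanical power: torque times angular velocity in rad/s
omega = 2 * math.pi * speed / 60
power_w = torque * omega

# The model's engineered feature, as defined in the table above
power_proxy = torque / (speed + 1)

print(f"mechanical power ≈ {power_w:.0f} W")  # mechanical power ≈ 6952 W
print(f"power_proxy ≈ {power_proxy:.4f}")     # power_proxy ≈ 0.0276
```

Either quantity can be a useful input to a tree model; the point is only that `power_proxy` values should not be read as watts.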
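## Usage

```python
import pickle

import numpy as np
from huggingface_hub import hf_hub_download

# Download the serialized sklearn Pipeline (StandardScaler + LightGBM)
model_path = hf_hub_download(
    repo_id="kushal23/machine-maintenance-predictor",
    filename="model.pkl",
)

# Unpickling can execute arbitrary code, so only load files from a source you trust.
# lightgbm must be installed for the pickled model to load.
with open(model_path, "rb") as f:
    pipeline = pickle.load(f)

# Raw readings for one machine
air_temp = 298.1   # Air temperature [K]
proc_temp = 308.6  # Process temperature [K]
speed = 1551       # Rotational speed [rpm]
torque = 42.8      # Torque [Nm]
tool_wear = 0      # Tool wear [min]
type_enc = 1       # Product type M (L=0, M=1, H=2)

# Column order: the 6 base features, then the 5 engineered features
sample = np.array([[
    air_temp, proc_temp, speed, torque, tool_wear, type_enc,
    air_temp - proc_temp,   # temp_diff
    torque / (speed + 1),   # power_proxy
    torque * tool_wear,     # torque_wear
    speed * tool_wear,      # speed_wear
    proc_temp * torque,     # temp_torque
]])

prediction = pipeline.predict(sample)
probability = pipeline.predict_proba(sample)[:, 1]  # P(failure)

print(f"Failure predicted: {'YES ⚠️' if prediction[0] == 1 else 'No ✓'}")
print(f"Failure probability: {probability[0]:.1%}")
```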
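To score many machines at once, the same 11 features can be built column-wise. This sketch assumes a pandas DataFrame that uses the raw AI4I column names from the Base Features table plus a `Type` column holding L/M/H, and it applies the card's stated L=0, M=1, H=2 mapping. The helper name `build_features` and the two input rows are illustrative only.

```python
import numpy as np
import pandas as pd

TYPE_MAP = {"L": 0, "M": 1, "H": 2}  # mapping as stated on this card


def build_features(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: raw AI4I columns -> 11-column matrix in the Usage order."""
    air = df["Air temperature [K]"].to_numpy(dtype=float)
    proc = df["Process temperature [K]"].to_numpy(dtype=float)
    speed = df["Rotational speed [rpm]"].to_numpy(dtype=float)
    torque = df["Torque [Nm]"].to_numpy(dtype=float)
    wear = df["Tool wear [min]"].to_numpy(dtype=float)
    type_enc = df["Type"].map(TYPE_MAP).to_numpy(dtype=float)
    return np.column_stack([
        air, proc, speed, torque, wear, type_enc,
        air - proc,            # temp_diff
        torque / (speed + 1),  # power_proxy
        torque * wear,         # torque_wear
        speed * wear,          # speed_wear
        proc * torque,         # temp_torque
    ])


# Illustrative input rows
df = pd.DataFrame({
    "Type": ["M", "L"],
    "Air temperature [K]": [298.1, 298.2],
    "Process temperature [K]": [308.6, 308.7],
    "Rotational speed [rpm]": [1551, 1408],
    "Torque [Nm]": [42.8, 46.3],
    "Tool wear [min]": [0, 3],
})
X = build_features(df)
print(X.shape)  # (2, 11)

# With `pipeline` loaded as in the Usage example:
# probabilities = pipeline.predict_proba(X)[:, 1]
```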
## Methodology

- Algorithm: LightGBM (300 estimators, learning rate 0.05, 31 leaves, balanced class weights)
- Class imbalance handling: SMOTE applied only to the training split of each CV fold, so no synthetic samples leak into the validation data
- Validation: 5-fold stratified cross-validation
- Preprocessing: StandardScaler standardization (zero mean, unit variance per feature)
- Reference: based on the methodology of arXiv:2603.13343 (2025)
## Dataset
The AI4I 2020 Predictive Maintenance Dataset contains 10,000 data points with:
- 3.4% failure rate (339 failures out of 10,000)
- 5 failure modes: Tool Wear (TWF), Heat Dissipation (HDF), Power (PWF), Overstrain (OSF), Random (RNF)
- 3 product types: Low (60%), Medium (30%), High (10%) quality
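The "balanced class weights" listed under Methodology correspond to LightGBM's `class_weight="balanced"`, which weights each class by n_samples / (n_classes × class_count). The sketch below applies that formula to the full-dataset counts above as a rough illustration. Inside CV, each training fold has slightly different counts.

```python
import numpy as np

# Class counts from the dataset description: 339 failures out of 10,000
counts = np.array([10_000 - 339, 339])  # [no failure, failure]
n_samples, n_classes = counts.sum(), len(counts)

# "balanced" heuristic: n_samples / (n_classes * count per class)
weights = n_samples / (n_classes * counts)
print(f"no failure: {weights[0]:.3f}, failure: {weights[1]:.3f}")
# no failure: 0.518, failure: 14.749
```

Each failure therefore counts roughly 28.5 times as much as a normal sample. LightGBM computes these weights from the labels passed to `fit`, so if SMOTE resamples a fold to a 1:1 ratio (the card does not state the ratio), the weights on that fold become equal.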
## Files

| File | Description |
|---|---|
| model.pkl | Full sklearn Pipeline (StandardScaler + LightGBM) |
| metadata.json | Model metadata, features, and all metrics |
| label_encoder.pkl | Product type encoder (L/M/H → 0/1/2) |
| confusion_matrix.png | Confusion matrix visualization |
| feature_importance.png | Feature importance chart |
| model_comparison.png | Comparison of all models |
| roc_curves.png | ROC curves for all models |
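The card does not say what kind of object `label_encoder.pkl` holds. If it is a scikit-learn `LabelEncoder` (an assumption), note that `LabelEncoder` numbers classes in sorted order, so fitting it on L/M/H yields H=0, L=1, M=2 rather than the L=0, M=1, H=2 stated on this card. Inspecting the shipped encoder's `classes_` settles which mapping the model was trained with. The sketch below shows the sorted-order behaviour on a fresh encoder.

```python
from sklearn.preprocessing import LabelEncoder

# A fresh encoder fitted on the three product types
# (illustration only; the shipped label_encoder.pkl may differ)
le = LabelEncoder().fit(["L", "M", "H"])
mapping = {str(cls): code for code, cls in enumerate(le.classes_)}
print(mapping)  # {'H': 0, 'L': 1, 'M': 2}

# To inspect the shipped encoder instead (requires network access):
# import pickle
# from huggingface_hub import hf_hub_download
# path = hf_hub_download(repo_id="kushal23/machine-maintenance-predictor",
#                        filename="label_encoder.pkl")
# with open(path, "rb") as f:
#     print(pickle.load(f).classes_)
```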
## License
MIT