🔧 Machine Maintenance Predictor

Predicts machine failures before they happen using sensor data. Built with LightGBM on the AI4I 2020 Predictive Maintenance Dataset.

Performance

| Metric | Score |
|---|---|
| Macro F1 | 0.892 |
| AUC-ROC | 0.960 |
| Accuracy | 0.986 |
| Precision | 0.775 |
| Recall | 0.809 |
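
Accuracy sits well above Macro F1 here, which is expected on imbalanced data: macro F1 is the unweighted mean of the per-class F1 scores, so the rare failure class counts as much as the majority class. The sketch below illustrates the arithmetic on a hypothetical confusion matrix. The counts and the function names are made up for illustration and are not this model's results.

```python
def f1_from_counts(tp, fp, fn):
    """F1 for one class from its true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(tn, fp, fn, tp):
    """Unweighted mean of the F1 scores of the negative and positive classes."""
    f1_pos = f1_from_counts(tp, fp, fn)
    # For the negative class the roles swap: its "hits" are the true negatives,
    # its false positives are the missed failures, and vice versa.
    f1_neg = f1_from_counts(tn, fn, fp)
    return (f1_pos + f1_neg) / 2


# Hypothetical counts for 100 samples with 8 real failures (illustrative only).
tn, fp, fn, tp = 90, 2, 3, 5
accuracy = (tn + tp) / (tn + fp + fn + tp)
print(f"accuracy = {accuracy:.3f}")
print(f"macro F1 = {macro_f1(tn, fp, fn, tp):.3f}")
```

In this toy case the accuracy is 0.95 while the macro F1 is noticeably lower, because the minority-class F1 pulls the average down.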

Model Comparison (5-Fold Stratified CV with SMOTE-in-Fold)

| Model | Macro F1 | AUC-ROC |
|---|---|---|
| LightGBM | 0.886 ± 0.007 | 0.968 ± 0.006 |
| RandomForest | 0.780 ± 0.024 | 0.971 ± 0.006 |
| XGBoost | 0.732 ± 0.012 | 0.956 ± 0.010 |

Visualizations

  • Confusion Matrix (confusion_matrix.png)
  • ROC Curves (roc_curves.png)
  • Feature Importance (feature_importance.png)
  • Model Comparison (model_comparison.png)

Features

Base Features (from sensors)

| Feature | Description |
|---|---|
| Air temperature [K] | Ambient air temperature |
| Process temperature [K] | Process temperature |
| Rotational speed [rpm] | Machine rotational speed |
| Torque [Nm] | Machine torque |
| Tool wear [min] | Tool wear time |
| Type_encoded | Product quality variant (L=0, M=1, H=2) |

Engineered Features (SHAP-validated, +5% F1 improvement)

| Feature | Formula | Physical Meaning |
|---|---|---|
| temp_diff | Air temp - Process temp | Temperature differential |
| power_proxy | Torque / (Speed + 1) | Power consumption indicator |
| torque_wear | Torque × Tool wear | Stress accumulation |
| speed_wear | Speed × Tool wear | Rotational stress over time |
| temp_torque | Process temp × Torque | Thermal-mechanical load |
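
The formulas above can be wrapped in a small helper so that every input row is built the same way. This is a minimal sketch: the names `build_feature_row` and `TYPE_CODES` are hypothetical and not part of the published files, and the column order follows the one listed in the Usage section. The `+ 1` in the `power_proxy` denominator presumably guards against division by zero at zero speed.

```python
# Hypothetical helper: maps the product type letter to the documented code.
TYPE_CODES = {"L": 0, "M": 1, "H": 2}


def build_feature_row(air_temp, proc_temp, speed, torque, tool_wear, product_type):
    """Return the 11 model inputs: 6 base features followed by 5 engineered ones."""
    return [
        air_temp,                  # Air temperature [K]
        proc_temp,                 # Process temperature [K]
        speed,                     # Rotational speed [rpm]
        torque,                    # Torque [Nm]
        tool_wear,                 # Tool wear [min]
        TYPE_CODES[product_type],  # Type_encoded
        air_temp - proc_temp,      # temp_diff
        torque / (speed + 1),      # power_proxy
        torque * tool_wear,        # torque_wear
        speed * tool_wear,         # speed_wear
        proc_temp * torque,        # temp_torque
    ]


row = build_feature_row(298.1, 308.6, 1551, 42.8, 0, "M")
print(len(row), row)
```

Wrapping the row in another list (`[row]`) gives the two-dimensional shape that scikit-learn style `predict` methods expect.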

Usage

import pickle
import numpy as np
from huggingface_hub import hf_hub_download

# Download model
model_path = hf_hub_download(
    repo_id="kushal23/machine-maintenance-predictor",
    filename="model.pkl"
)

# Load
with open(model_path, "rb") as f:
    pipeline = pickle.load(f)

# Prepare input: [Air temp, Process temp, Speed, Torque, Tool wear, 
#                  Type_encoded, temp_diff, power_proxy, torque_wear, speed_wear, temp_torque]
air_temp = 298.1
proc_temp = 308.6
speed = 1551
torque = 42.8
tool_wear = 0
type_enc = 1  # L=0, M=1, H=2

sample = np.array([[
    air_temp, proc_temp, speed, torque, tool_wear, type_enc,
    air_temp - proc_temp,           # temp_diff
    torque / (speed + 1),           # power_proxy
    torque * tool_wear,             # torque_wear
    speed * tool_wear,              # speed_wear
    proc_temp * torque              # temp_torque
]])

prediction = pipeline.predict(sample)
probability = pipeline.predict_proba(sample)[:, 1]

print(f"Failure predicted: {'YES ⚠️' if prediction[0] == 1 else 'No ✓'}")
print(f"Failure probability: {probability[0]:.1%}")

Methodology

  • Algorithm: LightGBM (300 estimators, lr=0.05, 31 leaves, balanced class weights)
  • Class Imbalance Handling: SMOTE applied inside CV folds only (prevents data leakage)
  • Validation: 5-fold stratified cross-validation
  • Preprocessing: StandardScaler normalization
  • Reference: Based on methodology from arxiv:2603.13343 (2025)

Dataset

The AI4I 2020 Predictive Maintenance Dataset contains 10,000 data points with:

  • 3.4% failure rate (339 failures out of 10,000)
  • 5 failure modes: Tool Wear (TWF), Heat Dissipation (HDF), Power (PWF), Overstrain (OSF), Random (RNF)
  • 3 product types: Low (60%), Medium (30%), High (10%) quality

Files

| File | Description |
|---|---|
| model.pkl | Full sklearn Pipeline (StandardScaler + LightGBM) |
| metadata.json | Model metadata, features, and all metrics |
| label_encoder.pkl | Product type encoder (L/M/H → 0/1/2) |
| confusion_matrix.png | Confusion matrix visualization |
| feature_importance.png | Feature importance chart |
| model_comparison.png | All models comparison |
| roc_curves.png | ROC curves for all models |

License

MIT
