☀️ Suncast: Hourly Solar PV Generation Forecasting Model (China Region)

A machine learning model that predicts hourly solar PV power generation (kWh) for any location across mainland China, given latitude, longitude, and a date range.


📌 Model Overview

| Item | Detail |
| --- | --- |
| Task | Tabular Regression (Solar Irradiance → PV Power) |
| Algorithm | Random Forest Regressor (via PyCaret AutoML) |
| Target Region | Mainland China (UTC+8) |
| Temporal Resolution | 1-hour intervals |
| Output Unit | kWh (1 kW standard PV plant) |
| Training Period | 2024 full year |
| Training Samples | 4,861,296 |

📊 Performance

| Metric | Value |
| --- | --- |
| MAE | 76.19 W/m² |
| RMSE | 126.96 W/m² |
| R² | 0.748 |
| MAPE | 1.49% |
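
For readers who want to reproduce these metrics on their own predictions, the sketch below shows the standard definitions of MAE, RMSE, R² and MAPE. The function name `regression_metrics` and the sample values are hypothetical, and skipping zero targets in MAPE is an assumption; the card does not say how night-time zeros were handled in the reported figure.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R^2 and MAPE (%) for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # Assumption: zero targets (e.g. night hours) are skipped, because the
    # percentage error is undefined when the true value is 0.
    pct = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    mape = 100.0 * sum(pct) / len(pct)

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}

# Made-up irradiance values [W/m²], for illustration only.
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 420.0])
print(metrics)
```

MAE and RMSE come out in the unit of the target (W/m² here), while R² and MAPE are unitless.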

Notable observations:

  • ✅ High accuracy during summer months (abundant solar irradiance)
  • ⚠️ Increased error in winter (low irradiance, high meteorological variability)
  • The seasonal structure of the model allows for long-term extensibility

🗂️ Data Sources

Input: GFS (Global Forecast System, NOAA)

  • Spatial resolution: 1° × 1°
  • Temporal resolution: 1 hour
  • Coverage: Lat 19°–53° (2° step), Lon 74°–134° (2° step) → 558 grid points

| Variable | Unit |
| --- | --- |
| Surface Pressure | Pa |
| Surface Temperature | K |
| Relative Humidity (2m) | % |
| U-Component of Wind (10m) | m/s |
| V-Component of Wind (10m) | m/s |
| Sunshine Duration | s |
| Low / Mid / High Cloud Cover | % |
| Downward Short-Wave Radiation Flux | W/m² |

GFS DSWRF is a model-simulated value computed via the RRTMG radiative transfer scheme, not a direct satellite measurement.
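
The 558 grid points quoted in the coverage line follow directly from the latitude and longitude ranges. A minimal sketch that enumerates the grid, assuming integer-degree nodes at the stated bounds (the names `LATS`, `LONS` and `grid_points` are illustrative):

```python
from itertools import product

# Coverage stated in this card: 19-53°N and 74-134°E, sampled every 2°.
LATS = list(range(19, 54, 2))   # 19, 21, ..., 53
LONS = list(range(74, 135, 2))  # 74, 76, ..., 134

# Every (lat, lon) combination is one grid point.
grid_points = list(product(LATS, LONS))

print(len(LATS), len(LONS), len(grid_points))  # → 18 31 558
```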

Target: NASA POWER / CERES SYN1deg

  • Source: CERES SYN1deg (Ed4.x), cross-calibrated with Terra/Aqua CERES, MODIS, and GEO satellites
  • Spatial resolution: 1° × 1° (downsampled to 2° × 2°)
  • Temporal resolution: 1 hour (linearly interpolated from 3-hour data)
  • Time zone: UTC+8 fixed (unified across all of China)
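
The 3-hour to 1-hour step can be illustrated with plain linear interpolation. This is a sketch using NumPy's `np.interp`, not the authors' preprocessing code; the helper name `to_hourly` and the sample values are made up.

```python
import numpy as np

def to_hourly(hours_3h, values_3h):
    """Linearly interpolate 3-hourly samples onto every whole hour in range."""
    hours_1h = np.arange(hours_3h[0], hours_3h[-1] + 1)
    return hours_1h, np.interp(hours_1h, hours_3h, values_3h)

# Made-up 3-hourly irradiance values [W/m²] at hours 0, 3, 6 and 9.
hours, ghi = to_hourly([0, 3, 6, 9], [0.0, 0.0, 300.0, 600.0])
print(ghi.tolist())
```

Linear interpolation keeps the original 3-hourly values unchanged and fills the two hours in between on a straight line, which smooths out short-lived cloud effects.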

🧠 Model Training Details

Feature Engineering

  • Spatiotemporal alignment and standardization of GFS input variables
  • Added temporal features: hour_local, month_local, day_of_year, season
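
A minimal sketch of how the temporal features could be derived from a UTC timestamp under the fixed UTC+8 convention. The helper `temporal_features` is hypothetical, and the season encoding (meteorological seasons as strings) is an assumption, since the card does not specify how `season` is encoded.

```python
from datetime import datetime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # fixed UTC+8, no daylight saving

# Assumption: meteorological seasons for the Northern Hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def temporal_features(ts_utc):
    """Derive the temporal features from a timezone-aware UTC timestamp."""
    local = ts_utc.astimezone(CHINA_TZ)
    return {
        "hour_local": local.hour,
        "month_local": local.month,
        "day_of_year": local.timetuple().tm_yday,
        "season": SEASON_BY_MONTH[local.month],
    }

print(temporal_features(datetime(2024, 7, 8, 4, 0, tzinfo=timezone.utc)))
# → {'hour_local': 12, 'month_local': 7, 'day_of_year': 190, 'season': 'summer'}
```

04:00 UTC on 8 July 2024 is local noon, which gives hour 12, month 7 and day 190.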

Candidate Models Compared

  • Extra Trees Regressor
  • Random Forest Regressor ✅ (selected)
  • LightGBM
  • Gradient Boosting Regressor

Random Forest was selected for its strong resistance to overfitting and balanced performance across all evaluation metrics.

Training Configuration

| Setting | Value |
| --- | --- |
| Train / Test Split | 80% / 20% |
| Cross-Validation | k-fold (k=10) |
| Hyperparameter Tuning | Grid Search |
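
The model itself was built with PyCaret. To make the configuration above concrete, the sketch below reproduces the same three settings (80/20 split, 10-fold cross-validation, grid search) with scikit-learn directly, which is a swapped-in stand-in rather than the authors' pipeline. The data is synthetic and the parameter grid is hypothetical; the card does not list the values that were searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data: 200 rows, 5 features (not the Suncast dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# 80% / 20% train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid, kept tiny so the example runs quickly.
param_grid = {"n_estimators": [10, 20], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=10,                               # 10-fold cross-validation
    scoring="neg_mean_absolute_error",   # pick the grid point with lowest MAE
)
search.fit(X_train, y_train)

print(search.best_params_)
# R² of the refitted best model on the held-out 20%.
print(search.best_estimator_.score(X_test, y_test))
```

The held-out 20% is only touched after the grid search finishes, so the cross-validated score and the test score stay independent.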

⚡ PV Power Conversion

Predicted solar irradiance (W/m²) is converted to power generation (kWh) using pvlib.

| Parameter | Value |
| --- | --- |
| Panel Tilt | 25° |
| Panel Azimuth | 180° (south-facing) |
| Temperature Coefficient | −0.004 /°C |
| Capacity | 1 kW (standard) |

Power generation is set to 0 kWh before 06:00 and after 19:00 (local time).
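
The exact pvlib call chain is not documented in this card. As a rough illustration of what the conversion does, the sketch below applies a simplified PVWatts-style DC formula together with the night-time rule. It is a stand-in, not the authors' implementation: it ignores the tilt and azimuth transposition that pvlib would perform, the function name `pv_energy_kwh` is hypothetical, and the cell temperature is assumed to be supplied by the caller.

```python
def pv_energy_kwh(irradiance_wm2, cell_temp_c, hour_local,
                  capacity_kw=1.0, temp_coeff=-0.004):
    """Estimate the energy [kWh] produced over one hour by a PV plant."""
    # Night rule from this card. Treating the 06:00 and 19:00 timestamps
    # themselves as daytime is an assumption about the boundary handling.
    if hour_local < 6 or hour_local > 19:
        return 0.0

    # Output scales with irradiance relative to 1000 W/m² (standard test
    # conditions) and is derated linearly above a 25 °C cell temperature.
    power_kw = capacity_kw * (irradiance_wm2 / 1000.0) * (
        1.0 + temp_coeff * (cell_temp_c - 25.0)
    )
    # Constant power over a 1-hour interval: kWh is numerically equal to kW.
    return max(power_kw, 0.0)

print(pv_energy_kwh(650.0, 45.0, hour_local=12))  # midday
print(pv_energy_kwh(650.0, 45.0, hour_local=3))   # night, forced to 0
```

With 650 W/m² and a 45 °C cell, the 1 kW plant yields about 0.598 kWh for the hour: 0.65 kW of irradiance-scaled output, reduced by 8% for the 20 °C above the reference temperature.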


🚀 How to Use

from pycaret.regression import load_model, predict_model
import pandas as pd

# Load model (PyCaret appends ".pkl", so this reads Suncast_v1.pkl)
model = load_model("Suncast_v1")

# Prepare input features
input_data = pd.DataFrame([{
    "sp": 101325,       # Surface Pressure [Pa]
    "t": 300.15,        # Surface Temperature [K]
    "r2": 60.0,         # Relative Humidity [%]
    "u10": 2.0,         # U-Wind [m/s]
    "v10": -1.5,        # V-Wind [m/s]
    "SUNSD": 3200,      # Sunshine Duration [s]
    "lcc": 10.0,        # Low Cloud Cover [%]
    "mcc": 5.0,         # Mid Cloud Cover [%]
    "hcc": 20.0,        # High Cloud Cover [%]
    "sdswrf": 650.0,    # DSWRF [W/m²]
    "hour_local": 12,
    "month_local": 7,
    "day_of_year": 190
}])

# Predict irradiance
prediction = predict_model(model, data=input_data)
print(prediction["prediction_label"])

📁 Repository Files

| File | Description |
| --- | --- |
| Suncast_v1.pkl | Trained PyCaret Random Forest pipeline |

⚠️ Limitations

  • Training data is limited to 2024 only (originally planned for 2020–2024; reduced due to GFS server instability and storage constraints)
  • Grid resolution is 2° × 2°; predictions use the nearest grid point to the input coordinates
  • Not applicable outside mainland China grid coverage
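
The nearest-grid-point lookup and the coverage limit can be sketched as follows. The helper `nearest_grid_point` is hypothetical, and rejecting everything outside the grid's bounding box is an assumption about how out-of-coverage inputs should be treated.

```python
LATS = range(19, 54, 2)   # 19, 21, ..., 53
LONS = range(74, 135, 2)  # 74, 76, ..., 134

def nearest_grid_point(lat, lon):
    """Return the (lat, lon) grid node closest to the requested coordinates."""
    # Assumption: anything outside the grid's bounding box is rejected.
    if not (LATS[0] <= lat <= LATS[-1] and LONS[0] <= lon <= LONS[-1]):
        raise ValueError(f"({lat}, {lon}) is outside the model's grid coverage")
    grid_lat = min(LATS, key=lambda g: abs(g - lat))
    grid_lon = min(LONS, key=lambda g: abs(g - lon))
    return grid_lat, grid_lon

print(nearest_grid_point(39.9, 116.4))  # Beijing → (39, 116)
```

Because nodes are 2° apart, a query can sit up to 1° in latitude and 1° in longitude away from the node whose forecast it receives.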

📄 License

This model is released under the Apache 2.0 License.
