Suncast: Hourly Solar PV Generation Forecasting Model (China Region)
A machine learning model that predicts hourly solar PV power generation (kWh) for any location across mainland China, given latitude, longitude, and a date range.
Model Overview
| Item | Detail |
|---|---|
| Task | Tabular Regression (Solar Irradiance → PV Power) |
| Algorithm | Random Forest Regressor (via PyCaret AutoML) |
| Target Region | Mainland China (UTC+8) |
| Temporal Resolution | 1-hour intervals |
| Output Unit | kWh (1 kW standard PV plant) |
| Training Period | 2024 full year |
| Training Samples | 4,861,296 |
Performance
| Metric | Value |
|---|---|
| MAE | 76.19 W/m² |
| RMSE | 126.96 W/m² |
| R² | 0.748 |
| MAPE | 1.49% |
Notable observations:
- High accuracy during summer months (abundant solar irradiance)
- Increased error in winter (low irradiance, high meteorological variability)
- Because seasonality is encoded in the temporal features, the model can be extended by retraining on additional years of data
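For reference, the four metrics in the table can be computed as in the sketch below. It uses made-up irradiance values, not the model's test data. Skipping zero-valued observations when computing MAPE is an assumption of this sketch; the source does not say how night-time zeros were handled.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R^2 and MAPE (%) for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # MAPE is undefined where the observed value is 0 (night-time irradiance),
    # so this sketch skips those samples. That filtering is an assumption.
    nonzero = [(t, e) for t, e in zip(y_true, errors) if t != 0]
    mape = 100.0 * sum(abs(e / t) for t, e in nonzero) / len(nonzero)
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}

observed = [0.0, 200.0, 400.0, 600.0]    # made-up irradiance values [W/m^2]
predicted = [0.0, 220.0, 380.0, 630.0]   # made-up predictions [W/m^2]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE carry the unit of the target (W/m²), while R² and MAPE are dimensionless.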
Data Sources
Input β GFS (Global Forecast System, NOAA)
- Spatial resolution: 1° × 1°
- Temporal resolution: 1 hour
- Coverage: Lat 19°–53° (2° step), Lon 74°–134° (2° step) → 558 grid points
| Variable | Unit |
|---|---|
| Surface Pressure | Pa |
| Surface Temperature | K |
| Relative Humidity (2m) | % |
| U-Component of Wind (10m) | m/s |
| V-Component of Wind (10m) | m/s |
| Sunshine Duration | s |
| Low / Mid / High Cloud Cover | % |
| Downward Short-Wave Radiation Flux | W/m² |
GFS DSWRF is a model-simulated value computed via the RRTMG radiative transfer scheme, not a direct satellite measurement.
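The coverage grid described above can be enumerated in a few lines, which also confirms the stated count of 558 points. The sketch additionally divides the reported number of training samples by the number of grid points. That step is plain arithmetic on figures from this document, not a description of the actual data pipeline.

```python
# Latitude and longitude nodes at a 2-degree step, endpoints included.
lats = list(range(19, 54, 2))    # 19, 21, ..., 53
lons = list(range(74, 135, 2))   # 74, 76, ..., 134

grid = [(lat, lon) for lat in lats for lon in lons]
n_points = len(grid)

# Reported training sample count from the Model Overview table.
TRAINING_SAMPLES = 4_861_296
hours_per_point, remainder = divmod(TRAINING_SAMPLES, n_points)

print(len(lats), len(lons), n_points)
print(hours_per_point, remainder, hours_per_point / 24)
```

The division comes out to a whole number of hourly steps per grid point, equivalent to 363 days. That is slightly less than the 366 days of 2024 and would be consistent with a few days missing from the archive, although the source does not state this.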
Target β NASA POWER / CERES SYN1deg
- Source: CERES SYN1deg (Ed4.x), cross-calibrated with Terra/Aqua CERES, MODIS, and GEO satellites
- Spatial resolution: 1° × 1° (downsampled to 2° × 2°)
- Temporal resolution: 1 hour (linearly interpolated from 3-hour data)
- Time zone: UTC+8 fixed (unified across all of China)
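The two temporal steps listed above, linear interpolation from 3-hour to 1-hour data and conversion to a fixed UTC+8 clock, can be sketched with the standard library alone. The function name and sample values are hypothetical, and the real preprocessing may well have used pandas or xarray instead.

```python
from datetime import datetime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # fixed UTC+8, no daylight saving

def interpolate_to_hourly(times_utc, values):
    """Linearly interpolate a time-ordered series onto 1-hour steps."""
    hourly = []
    pairs = zip(zip(times_utc, values), zip(times_utc[1:], values[1:]))
    for (t0, v0), (t1, v1) in pairs:
        steps = int((t1 - t0) / timedelta(hours=1))
        for k in range(steps):
            frac = k / steps
            hourly.append((t0 + timedelta(hours=k), v0 + frac * (v1 - v0)))
    hourly.append((times_utc[-1], values[-1]))  # keep the final sample
    return hourly

start = datetime(2024, 7, 1, 0, tzinfo=timezone.utc)
times_3h = [start + timedelta(hours=3 * i) for i in range(3)]  # 00, 03, 06 UTC
values_3h = [300.0, 600.0, 450.0]                              # made-up W/m^2

hourly = interpolate_to_hourly(times_3h, values_3h)
hourly_local = [(t.astimezone(CHINA_TZ), v) for t, v in hourly]
for t, v in hourly_local:
    print(t.isoformat(), round(v, 1))
```

Linear interpolation cannot recover variation within a 3-hour window (for example, a passing cloud), so the hourly target is smoother than the true irradiance.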
Model Training Details
Feature Engineering
- Spatiotemporal alignment and standardization of GFS input variables
- Added temporal features: hour_local, month_local, day_of_year, season
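A minimal sketch of how the temporal features listed above could be derived from a UTC timestamp. The encoding of season is not documented here, so the month-to-season mapping below (meteorological seasons, as strings) is an assumption, as is the helper name.

```python
from datetime import datetime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # fixed UTC+8 used throughout

# Assumed encoding: meteorological seasons keyed by local month.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def temporal_features(timestamp_utc):
    """Derive the local-time features from a timezone-aware UTC timestamp."""
    local = timestamp_utc.astimezone(CHINA_TZ)
    return {
        "hour_local": local.hour,
        "month_local": local.month,
        "day_of_year": local.timetuple().tm_yday,
        "season": SEASON_BY_MONTH[local.month],
    }

features = temporal_features(datetime(2024, 7, 8, 4, 0, tzinfo=timezone.utc))
print(features)
```

The example timestamp, 04:00 UTC on 8 July 2024, corresponds to local noon on day 190 of the year.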
Candidate Models Compared
- Extra Trees Regressor
- Random Forest Regressor (selected)
- LightGBM
- Gradient Boosting Regressor
Random Forest was selected for its strong resistance to overfitting and balanced performance across all evaluation metrics.
Training Configuration
| Setting | Value |
|---|---|
| Train / Test Split | 80% / 20% |
| Cross-Validation | k-fold (k=10) |
| Hyperparameter Tuning | Grid Search |
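The configuration above maps onto PyCaret's regression API roughly as follows. This is a sketch under assumptions, not the training script: the DataFrame, the target column name `dswrf_observed`, and the `session_id` are hypothetical, and the grid that was actually searched is not documented.

```python
# PyCaret model IDs for the four candidates listed above.
CANDIDATES = ["et", "rf", "lightgbm", "gbr"]

def train_irradiance_model(train_df, target="dswrf_observed"):
    """Compare the candidates, tune the Random Forest, return the final pipeline.

    `train_df` and the `target` column name are hypothetical placeholders.
    """
    # Imported lazily so that defining this function does not require PyCaret.
    from pycaret.regression import (
        setup, compare_models, create_model, tune_model, finalize_model,
    )

    # 80% / 20% split and 10-fold cross-validation, as in the table above.
    setup(data=train_df, target=target, train_size=0.8, fold=10, session_id=42)

    # Cross-validated leaderboard of all four candidates.
    leaderboard = compare_models(include=CANDIDATES, n_select=len(CANDIDATES))

    # Grid search over PyCaret's default Random Forest search space.
    rf = create_model("rf")
    tuned_rf = tune_model(rf, search_library="scikit-learn", search_algorithm="grid")

    # Refit on the full dataset, including the hold-out split.
    return leaderboard, finalize_model(tuned_rf)
```

With several million rows, an exhaustive grid search under 10-fold cross-validation is expensive, so a practical run would likely restrict the grid or subsample the data.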
PV Power Conversion
Predicted solar irradiance (W/m²) is converted to power generation (kWh) using pvlib.
| Parameter | Value |
|---|---|
| Panel Tilt | 25° |
| Panel Azimuth | 180° (south-facing) |
| Temperature Coefficient | −0.004 /°C |
| Capacity | 1 kW (standard) |
Power generation is set to 0 kWh before 06:00 and after 19:00 (local time).
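To illustrate the scale of the conversion, the sketch below applies a simplified PVWatts-style DC formula with the capacity and temperature coefficient from the table. It is not the pvlib pipeline used by the model. It treats the predicted irradiance as plane-of-array irradiance, ignoring the tilt and azimuth transposition, and takes the cell temperature as a given input. Reading the cutoff as "hours 6 to 18 produce power" is also an assumption about how hourly intervals are labelled.

```python
CAPACITY_KW = 1.0               # rated DC capacity of the standard plant
GAMMA_PER_DEGC = -0.004         # temperature coefficient from the table above
REFERENCE_IRRADIANCE = 1000.0   # W/m^2 at standard test conditions
REFERENCE_CELL_TEMP = 25.0      # degC at standard test conditions

def hourly_energy_kwh(irradiance_wm2, cell_temp_c, hour_local):
    """Energy for one hourly interval under the simplified DC model."""
    # Assumed reading of the cutoff: intervals starting before 06:00 or at or
    # after 19:00 local time produce nothing.
    if hour_local < 6 or hour_local >= 19:
        return 0.0
    derating = 1.0 + GAMMA_PER_DEGC * (cell_temp_c - REFERENCE_CELL_TEMP)
    power_kw = CAPACITY_KW * irradiance_wm2 / REFERENCE_IRRADIANCE * derating
    # Constant power over a one-hour interval: kW * 1 h = kWh.
    return max(power_kw, 0.0)

noon = hourly_energy_kwh(650.0, 45.0, 12)    # hypothetical 45 degC cell
night = hourly_energy_kwh(650.0, 45.0, 22)
print(round(noon, 3), night)
```

At 650 W/m² and a 45 °C cell, the 1 kW plant yields about 0.598 kWh for the hour: 0.65 kW reduced by 8% for the 20 °C rise above the reference temperature.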
How to Use
from pycaret.regression import load_model, predict_model
import pandas as pd

# Load the trained pipeline. PyCaret appends ".pkl" itself,
# so pass the file name without the extension.
model = load_model("Suncast_v1")

# Prepare input features (one row per grid point and hour)
input_data = pd.DataFrame([{
    "sp": 101325,       # Surface Pressure [Pa]
    "t": 300.15,        # Surface Temperature [K]
    "r2": 60.0,         # Relative Humidity (2m) [%]
    "u10": 2.0,         # U-Wind (10m) [m/s]
    "v10": -1.5,        # V-Wind (10m) [m/s]
    "SUNSD": 3200,      # Sunshine Duration [s]
    "lcc": 10.0,        # Low Cloud Cover [%]
    "mcc": 5.0,         # Mid Cloud Cover [%]
    "hcc": 20.0,        # High Cloud Cover [%]
    "sdswrf": 650.0,    # DSWRF [W/m²]
    "hour_local": 12,   # Local hour (UTC+8)
    "month_local": 7,   # Local month
    "day_of_year": 190,
    # "season" is listed among the engineered features, but its encoding is
    # not documented here; add it if the pipeline expects that column.
}])

# Predict irradiance [W/m²]
prediction = predict_model(model, data=input_data)
print(prediction["prediction_label"])
Repository Files
| File | Description |
|---|---|
| Suncast_v1.pkl | Trained PyCaret Random Forest pipeline |
Limitations
- Training data is limited to 2024 only (originally planned for 2020–2024; reduced due to GFS server instability and storage constraints)
- Grid resolution is 2° × 2°; predictions use the nearest grid point to the input coordinates
- Not applicable outside mainland China grid coverage
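The nearest-grid-point lookup and the coverage limit described above can be sketched as follows. The function names are hypothetical, and rejecting coordinates outside the bounding box of the grid is an assumption about how "outside coverage" is handled.

```python
LAT_MIN, LAT_MAX = 19, 53    # grid latitude range [deg]
LON_MIN, LON_MAX = 74, 134   # grid longitude range [deg]
STEP = 2                     # grid spacing [deg]

def snap(value, lower):
    """Snap a coordinate to the nearest grid node at or above `lower`."""
    index = round((value - lower) / STEP)
    return lower + STEP * index

def nearest_grid_point(lat, lon):
    """Return the (lat, lon) grid node closest to the input coordinates."""
    inside = LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
    if not inside:
        # Assumed behaviour: refuse rather than extrapolate beyond the grid.
        raise ValueError(f"({lat}, {lon}) is outside the model's grid coverage")
    return snap(lat, LAT_MIN), snap(lon, LON_MIN)

beijing = nearest_grid_point(39.9, 116.4)
print(beijing)
```

A point can be up to 1° away from its grid node in each direction, so local effects such as terrain or coastal cloud are not resolved. A coordinate exactly halfway between two nodes is equidistant from both, and Python's `round` settles such ties by rounding to the even index.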
License
This model is released under the Apache 2.0 License.