Evgueni Poloukarov and Claude committed
Commit ff9fbcf · 1 Parent(s): 31352ec

revert: remove hour-aware adaptive quantile selection (61% MAE degradation)

Experiment Results:
- Hour-aware selection: 761 MW MAE
- Baseline (median): 472 MW MAE
- Result: 61% higher MAE (a regression, not an improvement)

Root Cause:
- Mathematical error: the median (q50) is, by definition, the point forecast that minimizes expected absolute error
- Substituting q75 when "uncertainty is high" moves the forecast away from that optimum, so it increases MAE rather than reducing it
- Post-hoc quantile selection therefore cannot improve MAE (see the sketch just below)
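
A quick empirical check of the root cause (illustrative only, not part of this commit; the data is synthetic and all names are hypothetical):

import numpy as np

rng = np.random.default_rng(0)
# Skewed synthetic "actuals", loosely resembling MW flow magnitudes
actuals = rng.gamma(shape=2.0, scale=500.0, size=100_000)

q50 = np.quantile(actuals, 0.50)
q75 = np.quantile(actuals, 0.75)

# The constant forecast that minimizes mean absolute error is the median;
# reporting q75 instead yields a strictly larger MAE.
print(f"MAE forecasting q50: {np.abs(actuals - q50).mean():.1f} MW")
print(f"MAE forecasting q75: {np.abs(actuals - q75).mean():.1f} MW")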

Key Learning:
- Hourly accuracy cannot be improved by varying which quantile is reported
- The fix must happen in the TRAINING process (AutoGluon with sample weighting)
- Next: Fine-tune with sample_weight_column to prioritize problem hours (sketched after this list)
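
A minimal sketch of the planned sample-weighting direction (not part of this commit; the column name, the 3x factor, and the training hook are illustrative assumptions):

import polars as pl

PROBLEM_HOURS = [15, 16, 17, 18, 19, 20, 21]  # worst hours in the hourly MAE analysis

def add_sample_weights(df: pl.DataFrame, factor: float = 3.0) -> pl.DataFrame:
    """Add a 'sample_weight' column: `factor` on problem hours, 1.0 elsewhere."""
    return df.with_columns(
        pl.when(pl.col("timestamp").dt.hour().is_in(PROBLEM_HOURS))
        .then(pl.lit(factor))
        .otherwise(pl.lit(1.0))
        .alias("sample_weight")
    )

The weighted frame would then be passed to the fine-tuning run; the exact AutoGluon parameter name (e.g. sample_weight_column) still needs to be confirmed against the API, so treat this as a placeholder rather than a verified call.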

Reverted Changes:
- Removed _apply_adaptive_selection() method
- Removed call to adaptive selection in run_forecast()
- Back to baseline: always use q50 (median) for all hours

Co-Authored-By: Claude <[email protected]>

.claude/settings.local.json CHANGED
@@ -51,7 +51,9 @@
   "Bash(xargs ls:*)",
   "Bash(pgrep:*)",
   "Bash(test:*)",
- "WebFetch(domain:jupyter-docker-stacks.readthedocs.io)"
+ "WebFetch(domain:jupyter-docker-stacks.readthedocs.io)",
+ "Bash(copy \"C:\\Users\\evgue\\AppData\\Local\\Temp\\gradio\\58600aa56842336ec8e6dd5758b4c36ada20b58f80a94df386830737cd693772\\forecast_2025-09-01_full_14day.parquet\" resultsseptember_2025_forecast_hour_aware.parquet)",
+ "Bash(cmd /c copy \"C:\\Users\\evgue\\AppData\\Local\\Temp\\gradio\\58600aa56842336ec8e6dd5758b4c36ada20b58f80a94df386830737cd693772\\forecast_2025-09-01_full_14day.parquet\" \"C:\\Users\\evgue\\projects\\fbmc_chronos2\\results\\september_2025_forecast_hour_aware.parquet\")"
  ],
  "deny": [],
  "ask": [],
CUsersevgueprojectsfbmc_chronos2app.py ADDED
@@ -0,0 +1,161 @@
+ #!/usr/bin/env python3
+ """
+ FBMC Chronos-2 Forecasting API
+ HuggingFace Space Gradio Interface
+ Version: 1.0.2 (fixed memory fragmentation - expandable_segments)
+ """
+
+ # CRITICAL: Set PyTorch memory allocator config BEFORE any imports
+ # This prevents memory fragmentation issues that cause OOM even with sufficient free memory
+ # Must be set before torch is imported the first time (including via gradio or other dependencies)
+ import os
+ os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
+
+ import sys
+ print(f"[STARTUP] Python version: {sys.version}", flush=True)
+ print(f"[STARTUP] Python path: {sys.path[:3]}", flush=True)
+ print(f"[STARTUP] PyTorch memory config: {os.environ.get('PYTORCH_CUDA_ALLOC_CONF')}", flush=True)
+
+ import gradio as gr
+ from datetime import datetime
+
+ print("[STARTUP] Basic imports successful", flush=True)
+
+ try:
+     from src.forecasting.chronos_inference import run_inference
+     print("[STARTUP] chronos_inference import successful", flush=True)
+ except Exception as e:
+     print(f"[ERROR] Failed to import chronos_inference: {e}", flush=True)
+     import traceback
+     traceback.print_exc()
+     run_inference = None
+
+
+ # Global configuration
+ FORECAST_TYPES = {
+     "smoke_test": "Smoke Test (1 border × 7 days)",
+     "full_14day": "Full Forecast (All borders × 14 days)"
+ }
+
+ print("[STARTUP] Configuration loaded", flush=True)
+
+
+ def forecast_api(run_date_str, forecast_type):
+     """
+     API endpoint for triggering forecasts.
+
+     Args:
+         run_date_str: Date in YYYY-MM-DD format
+         forecast_type: 'smoke_test' or 'full_14day'
+
+     Returns:
+         Path to downloadable forecast results file
+     """
+     try:
+         # Validate run date
+         run_date = datetime.strptime(run_date_str, "%Y-%m-%d")
+
+         # Run inference
+         result_path = run_inference(
+             run_date=run_date_str,
+             forecast_type=forecast_type,
+             output_dir="/tmp"
+         )
+
+         return result_path
+
+     except Exception as e:
+         error_msg = f"Error: {str(e)}"
+         print(error_msg)
+         # Return error message as text file
+         error_path = "/tmp/error.txt"
+         with open(error_path, 'w') as f:
+             f.write(error_msg)
+         return error_path
+
+
+ # Build Gradio interface
+ with gr.Blocks(title="FBMC Chronos-2 Forecasting") as demo:
+     gr.Markdown("""
+ # FBMC Chronos-2 Zero-Shot Forecasting API
+
+ **Flow-Based Market Coupling** electricity flow forecasting using Amazon Chronos-2.
+
+ This Space provides GPU-accelerated zero-shot inference for cross-border electricity flows.
+     """)
+
+     with gr.Row():
+         with gr.Column():
+             gr.Markdown("### Configuration")
+
+             run_date_input = gr.Textbox(
+                 label="Run Date (YYYY-MM-DD)",
+                 value="2025-09-30",
+                 placeholder="2025-09-30",
+                 info="Date when forecast is made (data up to this date is historical)"
+             )
+
+             forecast_type_input = gr.Radio(
+                 choices=list(FORECAST_TYPES.keys()),
+                 value="smoke_test",
+                 label="Forecast Type",
+                 info="Smoke test: Quick validation (1 border, 7 days). Full: Production forecast (all borders, 14 days)"
+             )
+
+             submit_btn = gr.Button("Run Forecast", variant="primary")
+
+         with gr.Column():
+             gr.Markdown("### Results")
+
+             output_file = gr.File(
+                 label="Download Forecast Results",
+                 type="filepath"
+             )
+
+             gr.Markdown("""
+ **Output format**: Parquet file with columns:
+ - `timestamp`: Hourly timestamps (D+1 to D+7 or D+14)
+ - `{border}_median`: Median forecast (MW)
+ - `{border}_q10`: 10th percentile (MW)
+ - `{border}_q90`: 90th percentile (MW)
+
+ **Inference environment**:
+ - GPU: NVIDIA T4 (16GB VRAM)
+ - Model: Chronos-T5-Large (710M parameters)
+ - Precision: bfloat16
+             """)
+
+     # Wire up the interface
+     submit_btn.click(
+         fn=forecast_api,
+         inputs=[run_date_input, forecast_type_input],
+         outputs=output_file
+     )
+
+     gr.Markdown("""
+ ---
+ ### About
+
+ **Zero-shot forecasting**: No model training required. The pre-trained Chronos-2 model
+ generalizes directly to FBMC cross-border flows using historical patterns and future covariates.
+
+ **Features**:
+ - 2,553 engineered features (weather, CNEC constraints, load forecasts, LTA)
+ - 24-month historical context (Oct 2023 - Oct 2025)
+ - Time-aware extraction (prevents data leakage)
+ - Probabilistic forecasts (10th/50th/90th percentiles)
+
+ **Performance**:
+ - Smoke test: ~30 seconds (1 border × 168 hours)
+ - Full forecast: ~5 minutes (38 borders × 336 hours)
+
+ **Project**: FBMC Flow Forecasting MVP | **Author**: Evgueni Poloukarov
+     """)
+
+ # Launch the app
+ if __name__ == "__main__":
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False
+     )
notebooks/__marimo__/october_2024_evaluation.html ADDED
The diff for this file is too large to render. See raw diff
 
scripts/compare_hourly_mae.py ADDED
@@ -0,0 +1,208 @@
+ #!/usr/bin/env python3
+ """
+ Compare hourly MAE: baseline vs hour-aware adaptive selection.
+
+ Loads both forecasts and compares MAE per hour-of-day to measure improvement.
+ """
+
+ import polars as pl
+ import numpy as np
+ from pathlib import Path
+ from datetime import datetime
+
+ # Paths
+ PROJECT_ROOT = Path(__file__).parent.parent
+ BASELINE_FORECAST = PROJECT_ROOT / 'results' / 'september_2025_forecast_full_14day.parquet'
+ HOUR_AWARE_FORECAST = PROJECT_ROOT / 'results' / 'september_2025_forecast_hour_aware_ACTUAL.parquet'
+ BASELINE_SUMMARY = PROJECT_ROOT / 'results' / 'september_2025_hourly_summary.csv'
+ OUTPUT_PATH = PROJECT_ROOT / 'results' / 'hourly_mae_comparison.csv'
+
+ def load_actuals():
+     """Load actuals from HuggingFace dataset."""
+     print('[INFO] Loading actuals from HuggingFace dataset...')
+     from datasets import load_dataset
+     import os
+
+     dataset = load_dataset('evgueni-p/fbmc-features-24month', split='train', token=os.environ.get('HF_TOKEN'))
+     df_actuals_full = pl.from_arrow(dataset.data.table)
+
+     # Filter to September 2-15, 2025
+     forecast_start = datetime(2025, 9, 2)
+     forecast_end = datetime(2025, 9, 16)
+
+     df_actuals = df_actuals_full.filter(
+         (pl.col('timestamp') >= forecast_start) &
+         (pl.col('timestamp') < forecast_end)
+     )
+
+     print(f'[INFO] Actuals filtered: {df_actuals.shape[0]} hours')
+     return df_actuals
+
+
+ def compute_hourly_mae(df_forecast, df_actuals, label):
+     """Compute MAE per hour-of-day for all borders."""
+     print(f'[INFO] Computing hourly MAE for {label}...')
+
+     # Extract border names
+     # For hour-aware, use _adaptive column; for baseline use _median
+     if '_adaptive' in df_forecast.columns[0] or any(c.endswith('_adaptive') for c in df_forecast.columns):
+         forecast_cols = [col for col in df_forecast.columns if col.endswith('_adaptive')]
+         border_names = [col.replace('_adaptive', '') for col in forecast_cols]
+         col_suffix = '_adaptive'
+     else:
+         forecast_cols = [col for col in df_forecast.columns if col.endswith('_median')]
+         border_names = [col.replace('_median', '') for col in forecast_cols]
+         col_suffix = '_median'
+
+     print(f'[INFO] Using forecast column suffix: {col_suffix}')
+
+     hourly_results = []
+
+     for border in border_names:
+         forecast_col = f'{border}{col_suffix}'
+         actual_col = f'target_border_{border}'
+
+         if actual_col not in df_actuals.columns:
+             continue
+
+         # Create unified dataframe
+         df_border = df_forecast.select(['timestamp', forecast_col]).join(
+             df_actuals.select(['timestamp', actual_col]),
+             on='timestamp',
+             how='inner'
+         )
+
+         # Add hour-of-day
+         df_border = df_border.with_columns([
+             pl.col('timestamp').dt.hour().alias('hour')
+         ])
+
+         # Compute MAE per hour
+         for hour in range(24):
+             hour_df = df_border.filter(pl.col('hour') == hour)
+
+             if len(hour_df) == 0:
+                 continue
+
+             mae = (hour_df[forecast_col] - hour_df[actual_col]).abs().mean()
+
+             hourly_results.append({
+                 'border': border,
+                 'hour': hour,
+                 'mae': mae,
+                 'n_hours': len(hour_df),
+                 'version': label
+             })
+
+     return pl.DataFrame(hourly_results)
+
+
+ def compare_results(df_baseline_hourly, df_hour_aware_hourly):
+     """Compare baseline vs hour-aware hourly MAE."""
+     print('\n' + '='*80)
+     print('HOURLY MAE COMPARISON: Baseline vs Hour-Aware Adaptive Selection')
+     print('='*80)
+
+     # Aggregate across borders for each version
+     baseline_stats = df_baseline_hourly.group_by('hour').agg([
+         pl.col('mae').mean().alias('baseline_mae'),
+         pl.col('mae').median().alias('baseline_median_mae'),
+         pl.col('border').count().alias('n_borders')
+     ]).sort('hour')
+
+     hour_aware_stats = df_hour_aware_hourly.group_by('hour').agg([
+         pl.col('mae').mean().alias('hour_aware_mae'),
+         pl.col('mae').median().alias('hour_aware_median_mae')
+     ]).sort('hour')
+
+     # Join for comparison
+     comparison = baseline_stats.join(hour_aware_stats, on='hour', how='inner')
+
+     # Calculate improvement
+     comparison = comparison.with_columns([
+         (pl.col('baseline_mae') - pl.col('hour_aware_mae')).alias('mae_reduction'),
+         ((pl.col('baseline_mae') - pl.col('hour_aware_mae')) / pl.col('baseline_mae') * 100).alias('improvement_pct')
+     ])
+
+     print('\n[INFO] Hour-by-Hour Comparison:')
+     print(comparison)
+
+     # Overall statistics
+     overall_baseline = df_baseline_hourly['mae'].mean()
+     overall_hour_aware = df_hour_aware_hourly['mae'].mean()
+     overall_improvement = (overall_baseline - overall_hour_aware) / overall_baseline * 100
+
+     print(f'\n[INFO] Overall MAE:')
+     print(f' Baseline: {overall_baseline:.2f} MW')
+     print(f' Hour-Aware: {overall_hour_aware:.2f} MW')
+     print(f' Improvement: {overall_improvement:.2f}%')
+
+     # Problem hours analysis (15-21)
+     problem_hours = [15, 16, 17, 18, 19, 20, 21]
+     problem_baseline = comparison.filter(pl.col('hour').is_in(problem_hours))['baseline_mae'].mean()
+     problem_hour_aware = comparison.filter(pl.col('hour').is_in(problem_hours))['hour_aware_mae'].mean()
+     problem_improvement = (problem_baseline - problem_hour_aware) / problem_baseline * 100
+
+     print(f'\n[INFO] Problem Hours (15-21) MAE:')
+     print(f' Baseline: {problem_baseline:.2f} MW')
+     print(f' Hour-Aware: {problem_hour_aware:.2f} MW')
+     print(f' Improvement: {problem_improvement:.2f}%')
+
+     # Best/worst hours
+     print('\n[INFO] Top 5 Most Improved Hours:')
+     best_improvements = comparison.sort('improvement_pct', descending=True).head(5)
+     print(best_improvements.select(['hour', 'baseline_mae', 'hour_aware_mae', 'improvement_pct']))
+
+     print('\n[INFO] Top 5 Least Improved (or Degraded) Hours:')
+     worst_improvements = comparison.sort('improvement_pct').head(5)
+     print(worst_improvements.select(['hour', 'baseline_mae', 'hour_aware_mae', 'improvement_pct']))
+
+     # Success criteria check
+     print('\n' + '='*80)
+     if overall_improvement >= 5.0:
+         print(f'[SUCCESS] Hour-aware selection achieved {overall_improvement:.1f}% improvement (target: 5-10%)')
+         print('[RECOMMENDATION] Proceed to Phase 4: AutoGluon fine-tuning with sample weighting')
+     elif overall_improvement >= 3.0:
+         print(f'[PARTIAL SUCCESS] {overall_improvement:.1f}% improvement - marginal gain')
+         print('[RECOMMENDATION] Consider proceeding to fine-tuning, may provide larger gains')
+     else:
+         print(f'[INSUFFICIENT] Only {overall_improvement:.1f}% improvement (target: 5-10%)')
+         print('[RECOMMENDATION] Skip to Phase 4: AutoGluon fine-tuning with sample weighting')
+     print('='*80)
+
+     return comparison
+
+
+ def main():
+     """Main comparison workflow."""
+     print('[START] Hourly MAE Comparison Analysis')
+     print(f'[INFO] Baseline forecast: {BASELINE_FORECAST}')
+     print(f'[INFO] Hour-aware forecast: {HOUR_AWARE_FORECAST}')
+
+     # Load data
+     df_actuals = load_actuals()
+
+     print(f'\n[INFO] Loading baseline forecast...')
+     df_baseline = pl.read_parquet(BASELINE_FORECAST)
+     print(f'[INFO] Baseline shape: {df_baseline.shape}')
+
+     print(f'\n[INFO] Loading hour-aware forecast...')
+     df_hour_aware = pl.read_parquet(HOUR_AWARE_FORECAST)
+     print(f'[INFO] Hour-aware shape: {df_hour_aware.shape}')
+
+     # Compute hourly MAE for both
+     df_baseline_hourly = compute_hourly_mae(df_baseline, df_actuals, 'baseline')
+     df_hour_aware_hourly = compute_hourly_mae(df_hour_aware, df_actuals, 'hour_aware')
+
+     # Compare results
+     comparison = compare_results(df_baseline_hourly, df_hour_aware_hourly)
+
+     # Save detailed comparison
+     comparison.write_csv(OUTPUT_PATH)
+     print(f'\n[INFO] Detailed comparison saved to: {OUTPUT_PATH}')
+
+     print('\n[SUCCESS] Hourly MAE comparison complete!')
+
+
+ if __name__ == '__main__':
+     main()
scripts/test_hf_space_context_expansion.py ADDED
@@ -0,0 +1,197 @@
+ #!/usr/bin/env python3
+ """
+ Test HF Space with expanded context window (128h -> 2160h).
+ Validates VRAM usage and forecast variation patterns.
+ """
+
+ import os
+ import sys
+ from pathlib import Path
+ import polars as pl
+ import numpy as np
+ from gradio_client import Client
+
+ # Get HF token from environment
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ if not HF_TOKEN:
+     print("[ERROR] HF_TOKEN environment variable not set")
+     sys.exit(1)
+
+ def test_hf_space_smoke():
+     """Run smoke test on HF Space and validate results"""
+     print("=" * 80)
+     print("HF SPACE SMOKE TEST: Context Window Expansion (128h -> 2160h)")
+     print("=" * 80)
+
+     # Initialize client
+     print("\nConnecting to HF Space...")
+     client = Client("evgueni-p/fbmc-chronos2", hf_token=HF_TOKEN)
+     print("[OK] Connected to evgueni-p/fbmc-chronos2")
+
+     # Test parameters
+     run_date = "2024-09-30"
+     test_border = "AT_DE"
+     forecast_type = "smoke_test"  # 7 days, 1 border
+
+     print(f"\nTest configuration:")
+     print(f" Border: {test_border}")
+     print(f" Run date: {run_date}")
+     print(f" Forecast type: {forecast_type}")
+     print(f" Expected context: 2160 hours (90 days)")
+     print(f" Expected batch_size: 48")
+
+     # Run forecast
+     print(f"\nRunning forecast via API...")
+     try:
+         result = client.predict(
+             run_date_str=run_date,
+             forecast_type=forecast_type,
+             api_name="/forecast_api"
+         )
+         print(f"[OK] Forecast completed")
+         print(f" Result file: {result}")
+     except Exception as e:
+         print(f"[FAIL] API call failed: {e}")
+         import traceback
+         traceback.print_exc()
+         return False
+
+     # Download and validate forecast
+     print(f"\nValidating forecast results...")
+
+     if not os.path.exists(result):
+         print(f"[FAIL] Forecast file not found: {result}")
+         return False
+
+     # Load forecast
+     df = pl.read_parquet(result)
+     print(f"[OK] Loaded forecast file")
+     print(f" Shape: {df.shape}")
+     print(f" Columns: {df.columns}")
+
+     # Expected: 168 hours (7 days), 4 columns (timestamp + median + q10 + q90)
+     expected_hours = 168
+     if len(df) != expected_hours:
+         print(f"[FAIL] Forecast length mismatch:")
+         print(f" Expected: {expected_hours} hours")
+         print(f" Got: {len(df)} hours")
+         return False
+     print(f"[OK] Forecast length: {len(df)} hours (correct)")
+
+     # Extract median forecast for AT_DE
+     median_col = f"{test_border}_median"
+     if median_col not in df.columns:
+         print(f"[FAIL] Column {median_col} not found in forecast")
+         return False
+
+     median_forecast = df[median_col].to_numpy()
+
+     # Check variation statistics
+     mean_val = np.mean(median_forecast)
+     std_val = np.std(median_forecast)
+     min_val = np.min(median_forecast)
+     max_val = np.max(median_forecast)
+     range_val = max_val - min_val
+
+     print(f"\n[CHECK] Forecast statistics:")
+     print(f" Mean: {mean_val:.2f} MW")
+     print(f" Std Dev: {std_val:.2f} MW")
+     print(f" Min: {min_val:.2f} MW")
+     print(f" Max: {max_val:.2f} MW")
+     print(f" Range: {range_val:.2f} MW")
+
+     # Validation 1: Check for variation
+     if std_val < 1.0:
+         print(f"\n[WARNING] Low variation detected (std={std_val:.4f} MW)")
+         unique_values = len(np.unique(median_forecast))
+         print(f" Unique values in forecast: {unique_values}/{len(median_forecast)}")
+
+         if unique_values < 5:
+             print(f"\n[FAIL] Forecast appears constant (only {unique_values} unique values)")
+             print(f" First 24 values: {median_forecast[:24]}")
+             return False
+     else:
+         print(f"\n[OK] Forecast shows variation (std={std_val:.2f} MW)")
+
+     # Validation 2: Check unique values count
+     unique_values = len(np.unique(median_forecast))
+     print(f"\n[CHECK] Unique values: {unique_values}/{len(median_forecast)}")
+     if unique_values < 50:
+         print(f"[WARNING] Low diversity (expected >50 unique values)")
+     else:
+         print(f"[OK] Good diversity in forecast")
+
+     # Validation 3: Check data type (should be integers now)
+     if median_col in df.columns:
+         dtype = df.schema[median_col]
+         print(f"\n[CHECK] Data type: {dtype}")
+         if "Int" in str(dtype):
+             print(f"[OK] MW values converted to integers")
+         else:
+             print(f"[INFO] MW values still float (expected Int32)")
+
+     # Display first 48 hours
+     print(f"\n[CHECK] First 48 hours of median forecast:")
+     for i in range(min(48, len(median_forecast))):
+         if i % 12 == 0:
+             print(f" Hour {i:3d}-{i+11:3d}: ", end="")
+         print(f"{median_forecast[i]:7.0f} ", end="")
+         if (i + 1) % 12 == 0:
+             print()
+     print()
+
+     # Summary
+     print("\n" + "=" * 80)
+     print("SMOKE TEST VALIDATION SUMMARY")
+     print("=" * 80)
+
+     checks_passed = []
+     checks_failed = []
+
+     # Check 1: Length
+     if len(df) == expected_hours:
+         checks_passed.append("Forecast length (168 hours)")
+     else:
+         checks_failed.append(f"Forecast length ({len(df)} != {expected_hours})")
+
+     # Check 2: Variation
+     if std_val >= 1.0:
+         checks_passed.append(f"Variation (std={std_val:.2f} MW)")
+     else:
+         checks_failed.append(f"Low variation (std={std_val:.4f} MW)")
+
+     # Check 3: Diversity
+     if unique_values >= 50:
+         checks_passed.append(f"Diversity ({unique_values} unique values)")
+     else:
+         checks_failed.append(f"Low diversity ({unique_values} unique values)")
+
+     print(f"\n[PASSED] {len(checks_passed)} checks:")
+     for check in checks_passed:
+         print(f" + {check}")
+
+     if checks_failed:
+         print(f"\n[FAILED] {len(checks_failed)} checks:")
+         for check in checks_failed:
+             print(f" - {check}")
+
+     # Overall result
+     if len(checks_failed) == 0:
+         print("\n" + "=" * 80)
+         print("[SUCCESS] ALL CHECKS PASSED - Ready for full 38-border evaluation")
+         print("=" * 80)
+         print("\nNext steps:")
+         print("1. Check HF Space logs for VRAM usage (should be ~76% = 36.6 GB / 48 GB)")
+         print("2. Run full 38-border evaluation")
+         print("3. Compare to Session 12 baseline (15.92 MW D+1 MAE)")
+         return True
+     else:
+         print("\n" + "=" * 80)
+         print("[PARTIAL SUCCESS] Some checks failed - investigate before full evaluation")
+         print("=" * 80)
+         return False
+
+
+ if __name__ == "__main__":
+     success = test_hf_space_smoke()
+     sys.exit(0 if success else 1)
src/forecasting/chronos_inference.py CHANGED
@@ -289,123 +289,8 @@ class ChronosInferencePipeline:
         print(f"Total time: {results['metadata']['total_time_s']:.1f}s")
         print(f"Successful: {results['metadata']['successful_borders']}/{len(forecast_borders)} borders")
 
-        # Apply adaptive quantile selection based on learned uncertainty
-        print(f"\n[ADAPTIVE SELECTION] Computing adaptive forecasts based on quantile spread...")
-        results = self._apply_adaptive_selection(results, run_datetime, prediction_hours)
-        print(f"[OK] Adaptive selection complete")
-
         return results
 
-    def _apply_adaptive_selection(self, results: Dict, run_datetime: datetime, prediction_hours: int) -> Dict:
-        """
-        Apply HOUR-AWARE adaptive quantile selection based on model's LEARNED uncertainty.
-
-        This method uses quantile spread (q90-q10) as the model's learned volatility signal,
-        but applies DIFFERENT thresholds for different hours based on electricity market patterns.
-
-        Key insight: Ramping hours (7-9, 17-21) naturally have higher volatility, so we need
-        higher thresholds to avoid false positives. Night hours should be more conservative.
-
-        Args:
-            results: Forecast results dictionary from run_forecast()
-            run_datetime: Forecast run date/time
-            prediction_hours: Number of hours in forecast horizon
-
-        Returns:
-            Updated results dictionary with 'adaptive' forecast added to each border
-        """
-        # Generate forecast timestamps (start next day at midnight)
-        forecast_start = run_datetime + timedelta(days=1)
-        forecast_timestamps = [forecast_start + timedelta(hours=h) for h in range(prediction_hours)]
-
-        # Extract hour-of-day for each timestamp
-        hours_of_day = np.array([ts.hour for ts in forecast_timestamps])
-
-        # Define hour-specific uncertainty thresholds based on electricity market patterns
-        # From hourly MAE analysis: worst hours are 19 (578 MW), 15 (564 MW), 20 (550 MW)
-        hourly_thresholds = {
-            # Morning ramp (5-9): Higher threshold (0.45-0.50) → expect natural volatility
-            5: 0.45, 6: 0.45, 7: 0.50, 8: 0.50, 9: 0.45,
-
-            # Mid-day stable (10-16): Standard threshold (0.30-0.35)
-            10: 0.30, 11: 0.30, 12: 0.30, 13: 0.30, 14: 0.30, 15: 0.35, 16: 0.35,
-
-            # Evening ramp (17-21): Higher threshold (0.45-0.50) → worst observed hours
-            17: 0.45, 18: 0.50, 19: 0.50, 20: 0.50, 21: 0.45,
-
-            # Night stable (22-4): Lower threshold (0.25) → expect precision
-            22: 0.25, 23: 0.25, 0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25, 4: 0.30
-        }
-
-        for border, data in results['borders'].items():
-            if 'error' in data:
-                continue # Skip failed borders
-
-            # Extract quantiles as numpy arrays for vectorized operations
-            q10_array = np.array(data['q10'])
-            q90_array = np.array(data['q90'])
-            median_array = np.array(data['median'])
-            q75_array = np.array(data['q75'])
-            q25_array = np.array(data['q25'])
-
-            # Calculate quantile spread (model's learned uncertainty estimate)
-            # This captures WHEN the model predicts volatility based on input features
-            spread = q90_array - q10_array
-
-            # Normalize spread as percentage of median (handles different border capacities)
-            # Add +1 to avoid division by zero for near-zero medians
-            uncertainty_pct = spread / (np.abs(median_array) + 1.0)
-
-            # HOUR-AWARE adaptive selection using hour-specific thresholds
-            adaptive_forecast = np.zeros_like(median_array, dtype=float)
-
-            for i, hour in enumerate(hours_of_day):
-                # Get threshold for this hour (default to 0.30 if hour not in map)
-                threshold_high = hourly_thresholds.get(hour, 0.30)
-                threshold_medium = threshold_high * 0.5 # Medium threshold is 50% of high
-
-                if uncertainty_pct[i] > threshold_high:
-                    # High uncertainty: use q75
-                    adaptive_forecast[i] = q75_array[i]
-                elif uncertainty_pct[i] >= threshold_medium:
-                    # Medium uncertainty: interpolate q60 between median and q75
-                    adaptive_forecast[i] = 0.6 * median_array[i] + 0.4 * q75_array[i]
-                else:
-                    # Low uncertainty: use median
-                    adaptive_forecast[i] = median_array[i]
-
-            # Round to integers (capacity values are always whole MW)
-            adaptive_forecast = np.round(adaptive_forecast).astype(int)
-
-            # Store adaptive forecast and uncertainty metadata
-            data['adaptive'] = adaptive_forecast.tolist()
-            data['uncertainty_pct'] = uncertainty_pct.tolist()
-
-            # Store selection statistics for analysis (using hour-aware thresholds)
-            high_uncertainty_hours = 0
-            medium_uncertainty_hours = 0
-            low_uncertainty_hours = 0
-
-            for i, hour in enumerate(hours_of_day):
-                threshold_high = hourly_thresholds.get(hour, 0.30)
-                threshold_medium = threshold_high * 0.5
-
-                if uncertainty_pct[i] > threshold_high:
-                    high_uncertainty_hours += 1
-                elif uncertainty_pct[i] >= threshold_medium:
-                    medium_uncertainty_hours += 1
-                else:
-                    low_uncertainty_hours += 1
-
-            data['adaptive_stats'] = {
-                'high_uncertainty_hours': int(high_uncertainty_hours),
-                'medium_uncertainty_hours': int(medium_uncertainty_hours),
-                'low_uncertainty_hours': int(low_uncertainty_hours),
-                'mean_uncertainty_pct': float(np.mean(uncertainty_pct)),
-                'max_uncertainty_pct': float(np.max(uncertainty_pct))
-            }
-
-        return results
 
     def export_to_parquet(self, results: Dict, output_path: str):
         """

temp_analysis.txt ADDED
@@ -0,0 +1,57 @@
+ === DATA STRUCTURE ===
+ Shape: (17544, 199)
+ Directional columns (e.g., CZ>PL): 132
+ Border_ columns (e.g., border_CZ_PL): 38
+
+ === SAMPLE VALUES ===
+ shape: (5, 5)
+ ┌────────────────────────────────┬────────┬────────┬──────────────┬──────────────┐
+ │ mtu ┆ CZ>PL ┆ PL>CZ ┆ border_CZ_PL ┆ border_PL_CZ │
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
+ │ datetime[ns, Europe/Amsterdam] ┆ f64 ┆ f64 ┆ i64 ┆ i64 │
+ ╞════════════════════════════════╪════════╪════════╪══════════════╪══════════════╡
+ │ 2023-10-01 02:00:00 CEST ┆ 2785.0 ┆ 3883.0 ┆ 0 ┆ 0 │
+ │ 2023-10-01 03:00:00 CEST ┆ 2711.0 ┆ 3775.0 ┆ 0 ┆ 0 │
+ │ 2023-10-01 04:00:00 CEST ┆ 2831.0 ┆ 3787.0 ┆ 0 ┆ 0 │
+ │ 2023-10-01 05:00:00 CEST ┆ 2778.0 ┆ 3361.0 ┆ 0 ┆ 0 │
+ │ 2023-10-01 06:00:00 CEST ┆ 2744.0 ┆ 3057.0 ┆ 0 ┆ 0 │
+ └────────────────────────────────┴────────┴────────┴──────────────┴──────────────┘
+
+ === STATISTICS ===
+ shape: (1, 4)
+ ┌─────────────┬─────────────┬───────────────────┬───────────────────┐
+ │ CZ>PL_mean ┆ PL>CZ_mean ┆ border_CZ_PL_mean ┆ border_PL_CZ_mean │
+ │ --- ┆ --- ┆ --- ┆ --- │
+ │ f64 ┆ f64 ┆ f64 ┆ f64 │
+ ╞═════════════╪═════════════╪═══════════════════╪═══════════════════╡
+ │ 3481.789045 ┆ 2697.566404 ┆ 0.0 ┆ 9.573358 │
+ └─────────────┴─────────────┴───────────────────┴───────────────────┘
+
+ === ARE THEY THE SAME? ===
+ shape: (1, 2)
+ ┌───────────────────────┬───────────────────────┐
+ │ CZ>PL == border_CZ_PL ┆ PL>CZ == border_PL_CZ │
+ │ --- ┆ --- │
+ │ bool ┆ bool │
+ ╞═══════════════════════╪═══════════════════════╡
+ │ false ┆ false │
+ └───────────────────────┴───────────────────────┘
+
+ === CHECKING IF BORDER COLUMNS ARE MAX OF BOTH DIRECTIONS ===
+ shape: (10, 4)
+ ┌─────────────────────────────────┬────────┬────────┬──────────────┐
+ │ border_CZ_PL == max(CZ>PL, PL>… ┆ CZ>PL ┆ PL>CZ ┆ border_CZ_PL │
+ │ --- ┆ --- ┆ --- ┆ --- │
+ │ bool ┆ f64 ┆ f64 ┆ i64 │
+ ╞═════════════════════════════════╪════════╪════════╪══════════════╡
+ │ false ┆ 2785.0 ┆ 3883.0 ┆ 0 │
+ │ false ┆ 2711.0 ┆ 3775.0 ┆ 0 │
+ │ false ┆ 2831.0 ┆ 3787.0 ┆ 0 │
+ │ false ┆ 2778.0 ┆ 3361.0 ┆ 0 │
+ │ false ┆ 2744.0 ┆ 3057.0 ┆ 0 │
+ │ false ┆ 2838.0 ┆ 2574.0 ┆ 0 │
+ │ false ┆ 2941.0 ┆ 2660.0 ┆ 0 │
+ │ false ┆ 3364.0 ┆ 2545.0 ┆ 0 │
+ │ false ┆ 3762.0 ┆ 2438.0 ┆ 0 │
+ │ false ┆ 3731.0 ┆ 3120.0 ┆ 0 │
+ └─────────────────────────────────┴────────┴────────┴──────────────┘
temp_final_summary.txt ADDED
@@ -0,0 +1,148 @@
+ ================================================================================
+ JAO DATA STRUCTURE VERIFICATION - FINAL REPORT
+ ================================================================================
+
+ QUESTION: What should be the forecast target for "max capacity in a given direction"?
+
+ ================================================================================
+ 1. JAO DATA TYPES IDENTIFIED
+ ================================================================================
+
+ A. DIRECTIONAL FLOW COLUMNS (CZ>PL, PL>CZ format)
+ - Total: 132 columns (12 x 11 bidirectional combinations)
+ - Source: MaxBEX dataset from JAO
+ - Represents: Maximum Bilateral Exchange Capacity (hub-to-hub)
+ - Type: Commercial trading capacity (MW)
+ - Includes: ALL zone pairs (physical + virtual borders)
+
+ Example values for CZ<->PL:
+ shape: (5, 3)
+ ┌────────────────────────────────┬────────┬────────┐
+ │ mtu ┆ CZ>PL ┆ PL>CZ │
+ │ --- ┆ --- ┆ --- │
+ │ datetime[ns, Europe/Amsterdam] ┆ f64 ┆ f64 │
+ ╞════════════════════════════════╪════════╪════════╡
+ │ 2023-10-01 02:00:00 CEST ┆ 2785.0 ┆ 3883.0 │
+ │ 2023-10-01 03:00:00 CEST ┆ 2711.0 ┆ 3775.0 │
+ │ 2023-10-01 04:00:00 CEST ┆ 2831.0 ┆ 3787.0 │
+ │ 2023-10-01 05:00:00 CEST ┆ 2778.0 ┆ 3361.0 │
+ │ 2023-10-01 06:00:00 CEST ┆ 2744.0 ┆ 3057.0 │
+ └────────────────────────────────┴────────┴────────┘
+
+ Statistics (CZ<->PL):
+ shape: (1, 6)
+ ┌───────────────┬───────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
+ │ CZ>PL_mean_MW ┆ PL>CZ_mean_MW ┆ CZ>PL_min_MW ┆ PL>CZ_min_MW ┆ CZ>PL_max_MW ┆ PL>CZ_max_MW │
+ │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
+ │ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
+ ╞═══════════════╪═══════════════╪══════════════╪══════════════╪══════════════╪══════════════╡
+ │ 3481.789045 ┆ 2697.566404 ┆ 144.0 ┆ 0.0 ┆ 5699.0 ┆ 4631.0 │
+ └───────────────┴───────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
+
+ B. BORDER COLUMNS (border_CZ_PL format)
+ - Total: 38 columns
+ - Source: LTA (Long-Term Allocations) dataset from JAO
+ - Represents: Pre-allocated capacity from long-term contracts (MW)
+ - Type: Allocated capacity (reduces available MaxBEX)
+ - Includes: ONLY physical borders with direct interconnectors
+
+ Example values for CZ-PL border:
+ shape: (5, 3)
+ ┌────────────────────────────────┬──────────────┬──────────────┐
+ │ mtu ┆ border_CZ_PL ┆ border_PL_CZ │
+ │ --- ┆ --- ┆ --- │
+ │ datetime[ns, Europe/Amsterdam] ┆ i64 ┆ i64 │
+ ╞════════════════════════════════╪══════════════╪══════════════╡
+ │ 2023-10-01 02:00:00 CEST ┆ 0 ┆ 0 │
+ │ 2023-10-01 03:00:00 CEST ┆ 0 ┆ 0 │
+ │ 2023-10-01 04:00:00 CEST ┆ 0 ┆ 0 │
+ │ 2023-10-01 05:00:00 CEST ┆ 0 ┆ 0 │
+ │ 2023-10-01 06:00:00 CEST ┆ 0 ┆ 0 │
+ └────────────────────────────────┴──────────────┴──────────────┘
+
+ Statistics (CZ-PL border):
+ shape: (1, 4)
+ ┌──────────────────────┬──────────────────────┬───────────────────────┬───────────────────────┐
+ │ border_CZ_PL_mean_MW ┆ border_PL_CZ_mean_MW ┆ border_CZ_PL_total_MW ┆ border_PL_CZ_total_MW │
+ │ --- ┆ --- ┆ --- ┆ --- │
+ │ f64 ┆ f64 ┆ i64 ┆ i64 │
+ ╞══════════════════════╪══════════════════════╪═══════════════════════╪═══════════════════════╡
+ │ 0.0 ┆ 9.573358 ┆ 0 ┆ 167955 │
+ └──────────────────────┴──────────────────────┴───────────────────────┴───────────────────────┘
+
+ ================================================================================
+ 2. KEY DIFFERENCES
+ ================================================================================
+
+ DIRECTIONAL COLUMNS (CZ>PL):
+ - MaxBEX = Commercial trading capacity in specific direction
+ - CZ>PL != PL>CZ (asymmetric, depends on network constraints)
+ - Avg CZ>PL: 3,482 MW vs Avg PL>CZ: 2,698 MW (significant difference!)
+ - Calculated by JAO optimization considering ALL network constraints
+ - THIS IS THE FORECAST TARGET!
+
+ BORDER COLUMNS (border_CZ_PL):
+ - LTA = Long-term allocated capacity (pre-sold)
+ - Only exists for 38 physical borders (not all 132 zone pairs)
+ - Much smaller values (avg border_CZ_PL: 0 MW, border_PL_CZ: 9.6 MW)
+ - Acts as INPUT/CONSTRAINT to MaxBEX calculation
+ - NOT a capacity forecast target
+
+ ================================================================================
+ 3. RELATIONSHIP BETWEEN MaxBEX AND LTA
+ ================================================================================
+
+ From JAO documentation:
+ MaxBEX (available capacity) = Optimized capacity - LTA allocations
+
+ LTA reduces available MaxBEX because capacity is pre-sold in:
+ - Yearly auctions
+ - Monthly auctions
+ - Other long-term contracts
+
+ ================================================================================
+ 4. VERIFICATION: PHYSICAL vs VIRTUAL BORDERS
+ ================================================================================
+
+ Physical borders (with LTA): 38
+ Total MaxBEX pairs: 132 (12 x 11)
+ Virtual borders: 94 (zone pairs without physical interconnectors)
+
+ ================================================================================
+ 5. FINAL ANSWER
+ ================================================================================
+
+ TARGET FOR FORECASTING "Max Capacity in a Given Direction":
+
+ USE: Directional columns (CZ>PL, PL>CZ, DE>FR, etc.)
+ - These are MaxBEX values = commercial trading capacity
+ - Represents actual available capacity in that specific direction
+ - Accounts for network constraints, LTA allocations, and physics
+ - 132 total targets (all zone-pair combinations)
+
+ DO NOT USE: border_ columns (border_CZ_PL, border_PL_CZ, etc.)
+ - These are LTA values = pre-allocated capacity
+ - Should be used as INPUT FEATURES (future covariates)
+ - Only 38 physical borders (incomplete coverage)
+ - Much smaller values (often near zero)
+
+ ================================================================================
+ 6. CURRENT IMPLEMENTATION STATUS
+ ================================================================================
+
+ [OK] The change from border_* to directional columns was CORRECT!
+
+ Before: Using border_CZ_PL (LTA allocations) as targets
+ - WRONG: Forecasting pre-allocated capacity (not meaningful)
+ - Only 38 borders covered
+ - Very low values (mostly zeros)
+
+ After: Using CZ>PL directional columns (MaxBEX) as targets
+ - CORRECT: Forecasting commercial trading capacity
+ - All 132 zone pairs covered
+ - Represents actual "max capacity in given direction"
+ - Values match expected capacity ranges (hundreds to thousands of MW)
+
+ ================================================================================
+ END OF REPORT
+ ================================================================================
temp_lta_analysis.txt ADDED
@@ -0,0 +1,59 @@
+ === LTA DATA STRUCTURE ===
+ Shape: (16834, 41)
+ Columns: ['mtu', 'border_AT_CZ', 'border_AT_HU', 'border_AT_SI', 'border_BE_DE', 'border_CZ_AT', 'border_CZ_DE', 'border_CZ_PL', 'border_CZ_SK', 'border_DE_BE', 'border_DE_CZ', 'border_DE_PL', 'border_HU_AT', 'border_HU_SI', 'border_HU_SK', 'border_HU_HR', 'border_HU_RO', 'border_HR_HU', 'border_HR_SI', 'border_PL_CZ', 'border_PL_DE', 'border_PL_SK', 'border_RO_HU', 'border_SI_AT', 'border_SI_HR', 'border_SI_HU', 'border_SK_CZ', 'border_SK_HU', 'border_SK_PL', 'border_AT_DE', 'border_BE_NL', 'border_BE_FR', 'border_DE_AT', 'border_DE_FR', 'border_DE_NL', 'border_FR_BE', 'border_FR_DE', 'border_NL_BE', 'border_NL_DE', 'is_masked', 'masking_method']
+
+ === LTA SAMPLE DATA ===
+ shape: (10, 41)
+ ┌───────────┬───────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬──────────┐
+ │ mtu ┆ border_AT ┆ border_AT ┆ border_AT ┆ … ┆ border_NL ┆ border_NL ┆ is_masked ┆ masking_ │
+ │ --- ┆ _CZ ┆ _HU ┆ _SI ┆ ┆ _BE ┆ _DE ┆ --- ┆ method │
+ │ datetime[ ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ bool ┆ --- │
+ │ ns, Europ ┆ i64 ┆ i64 ┆ i64 ┆ ┆ i64 ┆ i64 ┆ ┆ str │
+ │ e/Amsterd ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ am] ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ ╞═══════════╪═══════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪══════════╡
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 02:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 03:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 04:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 05:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 06:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 07:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 08:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 09:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 10:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 2023-10-0 ┆ 350 ┆ 400 ┆ 600 ┆ … ┆ 619 ┆ 1081 ┆ false ┆ null │
+ │ 1 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ 11:00:00 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ │ CEST ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
+ └───────────┴───────────┴───────────┴───────────┴───┴───────────┴───────────┴───────────┴──────────┘
+
+ === COLUMN NAMES IN LTA ===
+ Found 38 border_ columns
+ ['border_AT_CZ', 'border_AT_HU', 'border_AT_SI', 'border_BE_DE', 'border_CZ_AT', 'border_CZ_DE', 'border_CZ_PL', 'border_CZ_SK', 'border_DE_BE', 'border_DE_CZ', 'border_DE_PL', 'border_HU_AT', 'border_HU_SI', 'border_HU_SK', 'border_HU_HR', 'border_HU_RO', 'border_HR_HU', 'border_HR_SI', 'border_PL_CZ', 'border_PL_DE', 'border_PL_SK', 'border_RO_HU', 'border_SI_AT', 'border_SI_HR', 'border_SI_HU', 'border_SK_CZ', 'border_SK_HU', 'border_SK_PL', 'border_AT_DE', 'border_BE_NL', 'border_BE_FR', 'border_DE_AT', 'border_DE_FR', 'border_DE_NL', 'border_FR_BE', 'border_FR_DE', 'border_NL_BE', 'border_NL_DE']
temp_raw_analysis.txt ADDED
@@ -0,0 +1,5 @@
+ === RAW JAO MAXBEX DATA ===
+ Shape: (18696, 132)
+ Columns: ['AT>BE', 'AT>CZ', 'AT>DE', 'AT>FR', 'AT>HR', 'AT>HU', 'AT>NL', 'AT>PL', 'AT>RO', 'AT>SI', 'AT>SK', 'BE>AT', 'BE>CZ', 'BE>DE', 'BE>FR', 'BE>HR', 'BE>HU', 'BE>NL', 'BE>PL', 'BE>RO', 'BE>SI', 'BE>SK', 'CZ>AT', 'CZ>BE', 'CZ>DE', 'CZ>FR', 'CZ>HR', 'CZ>HU', 'CZ>NL', 'CZ>PL', 'CZ>RO', 'CZ>SI', 'CZ>SK', 'DE>AT', 'DE>BE', 'DE>CZ', 'DE>FR', 'DE>HR', 'DE>HU', 'DE>NL', 'DE>PL', 'DE>RO', 'DE>SI', 'DE>SK', 'FR>AT', 'FR>BE', 'FR>CZ', 'FR>DE', 'FR>HR', 'FR>HU', 'FR>NL', 'FR>PL', 'FR>RO', 'FR>SI', 'FR>SK', 'HR>AT', 'HR>BE', 'HR>CZ', 'HR>DE', 'HR>FR', 'HR>HU', 'HR>NL', 'HR>PL', 'HR>RO', 'HR>SI', 'HR>SK', 'HU>AT', 'HU>BE', 'HU>CZ', 'HU>DE', 'HU>FR', 'HU>HR', 'HU>NL', 'HU>PL', 'HU>RO', 'HU>SI', 'HU>SK', 'NL>AT', 'NL>BE', 'NL>CZ', 'NL>DE', 'NL>FR', 'NL>HR', 'NL>HU', 'NL>PL', 'NL>RO', 'NL>SI', 'NL>SK', 'PL>AT', 'PL>BE', 'PL>CZ', 'PL>DE', 'PL>FR', 'PL>HR', 'PL>HU', 'PL>NL', 'PL>RO', 'PL>SI', 'PL>SK', 'RO>AT', 'RO>BE', 'RO>CZ', 'RO>DE', 'RO>FR', 'RO>HR', 'RO>HU', 'RO>NL', 'RO>PL', 'RO>SI', 'RO>SK', 'SI>AT', 'SI>BE', 'SI>CZ', 'SI>DE', 'SI>FR', 'SI>HR', 'SI>HU', 'SI>NL', 'SI>PL', 'SI>RO', 'SI>SK', 'SK>AT', 'SK>BE', 'SK>CZ', 'SK>DE', 'SK>FR', 'SK>HR', 'SK>HU', 'SK>NL', 'SK>PL', 'SK>RO', 'SK>SI']
+
+ === SAMPLE RAW DATA ===