# FBMC Flow Forecasting MVP - Day 0 Quick Start Guide

## Environment Setup (45 Minutes)

**Target**: From zero to a working local + HF Space environment with all dependencies verified

---

## Prerequisites Check (5 minutes)

Before starting, verify you have:

```bash
# Check Git
git --version
# Need: 2.x+

# Check Python
python3 --version
# Need: 3.10+
```

**API Keys & Accounts Ready:**
- [ ] ENTSO-E Transparency Platform API key
- [ ] Hugging Face account with payment method for Spaces
- [ ] Hugging Face write token (for uploading datasets)

**Important Data Storage Philosophy:**
- **Code** → Git repository (small, version controlled)
- **Data** → HuggingFace Datasets (separate, not in Git)
- **NO Git LFS** needed (following data science best practices)

---

## Step 1: Create Hugging Face Space (10 minutes)

1. **Navigate to**: https://huggingface.co/new-space
2. **Configure Space:**
   - **Owner**: Your username/organization
   - **Space name**: `fbmc-forecasting` (or your preference)
   - **License**: Apache 2.0
   - **Select SDK**: `JupyterLab`
   - **Select Hardware**: `A10G GPU ($30/month)` ← **CRITICAL**
   - **Visibility**: Private (recommended for MVP)
3. Click the **Create Space** button
4. **Wait 2-3 minutes** for Space initialization
5. **Verify Space Access:**
   - Visit: `https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting`
   - Confirm the JupyterLab interface loads
   - Check hardware: should show "A10G GPU" in the bottom-right

---

## Step 2: Local Environment Setup (25 minutes)

### 2.1 Clone HF Space Locally (2 minutes)

```bash
# Clone your HF Space
git clone https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
cd fbmc-forecasting

# Verify remote
git remote -v
# Should show: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
```

### 2.2 Create Directory Structure (1 minute)

```bash
# Create project directories
mkdir -p notebooks \
         notebooks_exported \
         src/{data_collection,feature_engineering,model,utils} \
         config \
         results/{forecasts,evaluation,visualizations} \
         docs \
         tools \
         tests

# Note: data/ directory will be created by download scripts
# It is NOT tracked in Git (following best practices)

# Verify structure
tree -L 2
```

### 2.3 Install uv Package Manager (2 minutes)

```bash
# Install uv (ultra-fast pip replacement)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Add to PATH (if not automatic)
export PATH="$HOME/.cargo/bin:$PATH"

# Verify installation
uv --version
# Should show: uv 0.x.x
```

### 2.4 Create Virtual Environment (1 minute)

```bash
# Create .venv with uv
uv venv

# Activate (Linux/Mac)
source .venv/bin/activate

# Activate (Windows)
# .venv\Scripts\activate

# Verify activation
which python
# Should point to: /path/to/fbmc-forecasting/.venv/bin/python
```

### 2.5 Install Dependencies (2 minutes)

```bash
# Create requirements.txt
cat > requirements.txt << 'EOF'
# Core Data & ML
polars>=0.20.0
pyarrow>=13.0.0
numpy>=1.24.0
scikit-learn>=1.3.0

# Time Series Forecasting
chronos-forecasting>=1.0.0
transformers>=4.35.0
torch>=2.0.0

# Data Collection
entsoe-py>=0.5.0
jao-py>=0.6.0
requests>=2.31.0

# HuggingFace Integration (for Datasets, NOT Git LFS)
datasets>=2.14.0
huggingface-hub>=0.17.0

# Visualization & Notebooks
altair>=5.0.0
marimo>=0.9.0
jupyter>=1.0.0
ipykernel>=6.25.0

# Utilities
pyyaml>=6.0.0
python-dotenv>=1.0.0
tqdm>=4.66.0

# HF Space Integration
gradio>=4.0.0
EOF

# Install with uv (ultra-fast)
uv pip install -r requirements.txt

# Create lockfile for reproducibility
uv pip compile requirements.txt -o requirements.lock
```

**Verify installations:**

```bash
python -c "import polars; print(f'polars {polars.__version__}')"
python -c "import marimo; print(f'marimo {marimo.__version__}')"
python -c "import torch; print(f'torch {torch.__version__}')"
python -c "from chronos import ChronosPipeline; print('chronos-forecasting ✓')"
python -c "from datasets import Dataset; print('datasets ✓')"
python -c "from huggingface_hub import HfApi; print('huggingface-hub ✓')"
python -c "import jao; print(f'jao-py {jao.__version__}')"
```
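**Optional: Chronos smoke test.** If you want to confirm the forecasting stack works end to end before Day 1, the sketch below loads a Chronos pipeline and produces a zero-shot forecast on dummy data. It assumes the public `amazon/chronos-t5-small` checkpoint (a small download); the checkpoint actually used for the MVP is chosen later.

```python
# Optional smoke test of the zero-shot forecasting stack.
# Assumes the public amazon/chronos-t5-small checkpoint; swap in a larger
# Chronos checkpoint later if the project settles on one.
import torch
from chronos import ChronosPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map=device,
    torch_dtype=torch.bfloat16 if device == "cuda" else torch.float32,
)

# One week of hourly dummy data, forecast the next 24 hours
context = torch.sin(torch.arange(168, dtype=torch.float32) * 2 * torch.pi / 24)
samples = pipeline.predict(context, prediction_length=24)
print(samples.shape)  # (1, num_samples, 24)
```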
"import polars; print(f'polars {polars.__version__}')" python -c "import marimo; print(f'marimo {marimo.__version__}')" python -c "import torch; print(f'torch {torch.__version__}')" python -c "from chronos import ChronosPipeline; print('chronos-forecasting ✓')" python -c "from datasets import Dataset; print('datasets ✓')" python -c "from huggingface_hub import HfApi; print('huggingface-hub ✓')" python -c "import jao; print(f'jao-py {jao.__version__}')" ``` ### 2.6 Configure .gitignore (Data Exclusion) (2 minutes) ```bash # Create .gitignore - CRITICAL for keeping data out of Git cat > .gitignore << 'EOF' # ============================================ # Data Files - NEVER commit to Git # ============================================ # Following data science best practices: # - Code goes in Git # - Data goes in HuggingFace Datasets data/ *.parquet *.pkl *.csv *.h5 *.hdf5 *.feather # ============================================ # Model Artifacts # ============================================ models/checkpoints/ *.pth *.safetensors *.ckpt # ============================================ # Credentials & Secrets # ============================================ .env config/api_keys.yaml *.key *.pem # ============================================ # Python # ============================================ __pycache__/ *.pyc *.pyo *.egg-info/ .pytest_cache/ .venv/ venv/ # ============================================ # IDE & OS # ============================================ .vscode/ .idea/ *.swp .DS_Store Thumbs.db # ============================================ # Jupyter # ============================================ .ipynb_checkpoints/ # ============================================ # Temporary Files # ============================================ *.tmp *.log .cache/ EOF # Stage .gitignore git add .gitignore # Verify data/ will be ignored echo "data/" >> .gitignore git check-ignore data/test.parquet # Should output: data/test.parquet (confirming it's ignored) ``` **Why NO Git LFS?** Following data science best practices: - ✓ **Code** → Git (fast, version controlled) - ✓ **Data** → HuggingFace Datasets (separate, scalable) - ✗ **NOT** Git LFS (expensive, non-standard for ML projects) **Data will be:** - Downloaded via scripts (Day 1) - Uploaded to HF Datasets (Day 1) - Loaded programmatically (Days 2-5) - NEVER committed to Git repository ### 2.7 Configure API Keys & HuggingFace Access (3 minutes) ```bash # Create config directory structure mkdir -p config # Create API keys configuration cat > config/api_keys.yaml << 'EOF' # ENTSO-E Transparency Platform entsoe_api_key: "YOUR_ENTSOE_API_KEY_HERE" # OpenMeteo (free tier - no key required) openmeteo_base_url: "https://api.open-meteo.com/v1/forecast" # Hugging Face (for uploading datasets) hf_token: "YOUR_HF_WRITE_TOKEN_HERE" hf_username: "YOUR_HF_USERNAME" EOF # Create .env file for environment variables cat > .env << 'EOF' ENTSOE_API_KEY=YOUR_ENTSOE_API_KEY_HERE OPENMETEO_BASE_URL=https://api.open-meteo.com/v1/forecast HF_TOKEN=YOUR_HF_WRITE_TOKEN_HERE HF_USERNAME=YOUR_HF_USERNAME EOF ``` **Get your HuggingFace Write Token:** 1. Visit: https://huggingface.co/settings/tokens 2. Click "New token" 3. Name: "FBMC Dataset Upload" 4. Type: **Write** (required for uploading datasets) 5. 
**Now edit the files with your actual credentials:**

```bash
# Option 1: Use a text editor
nano config/api_keys.yaml   # Update all YOUR_*_HERE placeholders
nano .env                   # Update all YOUR_*_HERE placeholders

# Option 2: Use sed (replace with your actual values)
# Note: on macOS/BSD sed, use `sed -i ''` instead of `sed -i`
sed -i 's/YOUR_ENTSOE_API_KEY_HERE/your-actual-entsoe-key/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_WRITE_TOKEN_HERE/hf_your-actual-token/' config/api_keys.yaml .env
sed -i 's/YOUR_HF_USERNAME/your-username/' config/api_keys.yaml .env
```

**Verify credentials are set:**

```bash
# Should NOT see any "YOUR_*_HERE" placeholders
grep "YOUR_" config/api_keys.yaml .env
# Empty output = good!
```

### 2.8 Create Data Management Utilities (5 minutes)

```bash
# Create data collection module with HF Datasets integration
cat > src/data_collection/hf_datasets_manager.py << 'EOF'
"""HuggingFace Datasets manager for FBMC data storage."""
import polars as pl
from datasets import Dataset, DatasetDict
from huggingface_hub import HfApi
from pathlib import Path
import yaml


class FBMCDatasetManager:
    """Manage FBMC data uploads/downloads via HuggingFace Datasets."""

    def __init__(self, config_path: str = "config/api_keys.yaml"):
        """Initialize with HF credentials."""
        with open(config_path) as f:
            config = yaml.safe_load(f)

        self.hf_token = config['hf_token']
        self.hf_username = config['hf_username']
        self.api = HfApi(token=self.hf_token)

    def upload_dataset(self, parquet_path: Path, dataset_name: str, description: str = ""):
        """Upload Parquet file to HuggingFace Datasets."""
        print(f"Uploading {parquet_path.name} to HF Datasets...")

        # Load Parquet as polars, convert to HF Dataset
        df = pl.read_parquet(parquet_path)
        dataset = Dataset.from_pandas(df.to_pandas())

        # Create full dataset name
        full_name = f"{self.hf_username}/{dataset_name}"

        # Upload to HF
        dataset.push_to_hub(
            full_name,
            token=self.hf_token,
            private=False  # Public datasets (free storage)
        )

        print(f"✓ Uploaded to: https://huggingface.co/datasets/{full_name}")
        return full_name

    def download_dataset(self, dataset_name: str, output_path: Path):
        """Download dataset from HF to local Parquet."""
        from datasets import load_dataset

        print(f"Downloading {dataset_name} from HF Datasets...")

        # Download from HF
        dataset = load_dataset(
            f"{self.hf_username}/{dataset_name}",
            split="train"
        )

        # Convert to polars and save
        df = pl.from_pandas(dataset.to_pandas())
        output_path.parent.mkdir(parents=True, exist_ok=True)
        df.write_parquet(output_path)

        print(f"✓ Downloaded to: {output_path}")
        return df

    def list_datasets(self):
        """List all FBMC datasets for this user."""
        datasets = self.api.list_datasets(author=self.hf_username)
        fbmc_datasets = [d for d in datasets if 'fbmc' in d.id.lower()]

        print(f"\nFBMC Datasets for {self.hf_username}:")
        for ds in fbmc_datasets:
            print(f"  - {ds.id}")

        return fbmc_datasets


# Example usage (will be used in Day 1)
if __name__ == "__main__":
    manager = FBMCDatasetManager()

    # Upload example (Day 1 will use this)
    # manager.upload_dataset(
    #     parquet_path=Path("data/raw/cnecs_2023_2025.parquet"),
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     description="FBMC CNECs data: Oct 2023 - Sept 2025"
    # )

    # Download example (HF Space will use this)
    # manager.download_dataset(
    #     dataset_name="fbmc-cnecs-2023-2025",
    #     output_path=Path("data/raw/cnecs_2023_2025.parquet")
    # )
EOF

# Create data download orchestrator
cat > src/data_collection/download_all.py << 'EOF'
"""Download all FBMC data from HuggingFace Datasets."""
from pathlib import Path

try:
    from .hf_datasets_manager import FBMCDatasetManager   # imported as a package (e.g., src.data_collection)
except ImportError:
    from hf_datasets_manager import FBMCDatasetManager    # run directly from this directory

def setup_data(data_dir: Path = Path("data/raw")):
    """Download all datasets if not present locally."""
    manager = FBMCDatasetManager()

    datasets_to_download = {
        "fbmc-cnecs-2023-2025": "cnecs_2023_2025.parquet",
        "fbmc-weather-2023-2025": "weather_2023_2025.parquet",
        "fbmc-entsoe-2023-2025": "entsoe_2023_2025.parquet",
    }

    data_dir.mkdir(parents=True, exist_ok=True)

    for dataset_name, filename in datasets_to_download.items():
        output_path = data_dir / filename

        if output_path.exists():
            print(f"✓ {filename} already exists, skipping")
        else:
            try:
                manager.download_dataset(dataset_name, output_path)
            except Exception as e:
                print(f"✗ Failed to download {dataset_name}: {e}")
                print("  You may need to run Day 1 data collection first")

    print("\n✓ Data setup complete")


if __name__ == "__main__":
    setup_data()
EOF

# Make scripts executable
chmod +x src/data_collection/hf_datasets_manager.py
chmod +x src/data_collection/download_all.py

echo "✓ Data management utilities created"
```

**What This Does:**
- `hf_datasets_manager.py`: Upload/download Parquet files to/from HF Datasets
- `download_all.py`: One-command data setup for HF Space or analysts

**Day 1 Workflow:**
1. Download data from JAO/ENTSO-E/OpenMeteo to `data/raw/`
2. Upload each Parquet to HF Datasets (separate from Git; see the sketch below)
3. Git repo stays small (only code)

**HF Space Workflow:**
```python
# In your Space's app.py startup:
from src.data_collection.download_all import setup_data
setup_data()  # Downloads from HF Datasets, not Git
```
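Step 2 of the Day 1 workflow will look roughly like the sketch below, reusing the `FBMCDatasetManager` defined above. The file names match the ones `download_all.py` expects, but they are only produced by the Day 1 collection scripts, so treat this as a preview rather than something to run today.

```python
# Preview of the Day 1 upload step (these files do not exist yet on Day 0).
from pathlib import Path
from src.data_collection.hf_datasets_manager import FBMCDatasetManager

manager = FBMCDatasetManager()

uploads = {
    "data/raw/cnecs_2023_2025.parquet": "fbmc-cnecs-2023-2025",
    "data/raw/weather_2023_2025.parquet": "fbmc-weather-2023-2025",
    "data/raw/entsoe_2023_2025.parquet": "fbmc-entsoe-2023-2025",
}

# Upload each local Parquet to its HF Datasets repo
for parquet_path, dataset_name in uploads.items():
    manager.upload_dataset(Path(parquet_path), dataset_name)
```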
### 2.9 Create First Marimo Notebook (5 minutes)

```bash
# Create initial exploration notebook
cat > notebooks/01_data_exploration.py << 'EOF'
import marimo

__generated_with = "0.9.0"
app = marimo.App(width="medium")


@app.cell
def __():
    import marimo as mo
    import polars as pl
    import altair as alt
    from pathlib import Path
    return mo, pl, alt, Path


@app.cell
def __(mo):
    mo.md(
        """
        # FBMC Flow Forecasting - Data Exploration

        **Day 1 Objective**: Explore JAO FBMC data structure

        ## Steps:
        1. Load downloaded Parquet files
        2. Inspect CNECs, PTDFs, RAMs
        3. Identify top 200 binding CNECs (50 Tier-1 + 150 Tier-2)
        4. Visualize temporal patterns
        """
    )
    return


@app.cell
def __(Path):
    # Data paths
    DATA_DIR = Path("../data/raw")
    CNECS_FILE = DATA_DIR / "cnecs_2023_2025.parquet"
    return DATA_DIR, CNECS_FILE


@app.cell
def __(mo, CNECS_FILE):
    # Check if data exists (the message is the cell's last expression so marimo displays it)
    if CNECS_FILE.exists():
        status = mo.md("✓ CNECs data found - ready for Day 1 analysis")
    else:
        status = mo.md("⚠ CNECs data not yet downloaded - run Day 1 collection script")
    status
    return


if __name__ == "__main__":
    app.run()
EOF

# Test Marimo installation
marimo edit notebooks/01_data_exploration.py &
# This will open a browser tab with the interactive notebook
# Close after verifying it loads correctly (Ctrl+C in terminal)
```

### 2.10 Create Utility Modules (2 minutes)

```bash
# Create data loading utilities
cat > src/utils/data_loader.py << 'EOF'
"""Data loading utilities for FBMC forecasting project."""
import polars as pl
from pathlib import Path
from typing import Optional


def load_cnecs(data_dir: Path,
               start_date: Optional[str] = None,
               end_date: Optional[str] = None) -> pl.DataFrame:
    """Load CNEC data with optional date filtering."""
    cnecs = pl.read_parquet(data_dir / "cnecs_2023_2025.parquet")

    if start_date:
        cnecs = cnecs.filter(pl.col("timestamp") >= start_date)
    if end_date:
        cnecs = cnecs.filter(pl.col("timestamp") <= end_date)

    return cnecs


def load_weather(data_dir: Path,
                 grid_points: Optional[list] = None) -> pl.DataFrame:
    """Load weather data with optional grid point filtering."""
    weather = pl.read_parquet(data_dir / "weather_2023_2025.parquet")

    if grid_points:
        weather = weather.filter(pl.col("grid_point").is_in(grid_points))

    return weather
EOF

# Create __init__.py files
touch src/__init__.py
touch src/utils/__init__.py
touch src/data_collection/__init__.py
touch src/feature_engineering/__init__.py
touch src/model/__init__.py
```
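Once the Day 1 Parquet files are in `data/raw/`, the loaders are used as in the sketch below. Nothing in `data/raw/` exists yet on Day 0, and the column names (`timestamp`, `grid_point`) plus the example grid-point label are assumptions about the Day 1 schema.

```python
# Assumed usage of src/utils/data_loader.py once Day 1 data is in place.
from pathlib import Path
from src.utils.data_loader import load_cnecs, load_weather

DATA_DIR = Path("data/raw")

# Filter CNECs to January 2024 (string dates, matching the loader's signature)
cnecs_jan = load_cnecs(DATA_DIR, start_date="2024-01-01", end_date="2024-01-31")
print(cnecs_jan.shape)

# Load weather for a single grid point (the label is a hypothetical example)
weather_one = load_weather(DATA_DIR, grid_points=["grid_52.52_13.41"])
print(weather_one.head())
```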
### 2.11 Initial Commit (2 minutes)

```bash
# Stage all changes (note: data/ is excluded by .gitignore)
git add .

# Create initial commit
git commit -m "Day 0: Initialize FBMC forecasting MVP environment

- Add project structure (notebooks, src, config, tools)
- Configure uv + polars + Marimo + Chronos + HF Datasets stack
- Create .gitignore (excludes data/ following best practices)
- Install jao-py Python library for JAO data access
- Configure ENTSO-E, OpenMeteo, and HuggingFace API access
- Add HF Datasets manager for data storage (separate from Git)
- Create data download utilities (download_all.py)
- Create initial exploration notebook

Data Strategy:
- Code → Git (this repo)
- Data → HuggingFace Datasets (separate, not in Git)
- NO Git LFS (following data science best practices)

Infrastructure: HF Space (A10G GPU, \$30/month)"

# Push to HF Space
git push origin main

# Verify push succeeded
git status
# Should show: "Your branch is up to date with 'origin/main'"

# Verify no data files were committed
git ls-files | grep "\.parquet"
# Should be empty (no .parquet files in Git)
```

---

## Step 3: Verify Complete Setup (5 minutes)

### 3.1 Python Environment Verification

```bash
# Activate environment if not already
source .venv/bin/activate

# Run comprehensive checks
python << 'EOF'
import sys
print(f"Python: {sys.version}")

packages = [
    "polars", "pyarrow", "numpy", "sklearn", "torch", "transformers",
    "marimo", "altair", "entsoe", "jao", "requests", "yaml",
    "gradio", "datasets", "huggingface_hub"
]

print("\nPackage Versions:")
for pkg in packages:
    try:
        if pkg == "entsoe":
            import entsoe
            print(f"✓ entsoe-py: {entsoe.__version__}")
        elif pkg == "jao":
            import jao
            print(f"✓ jao-py: {jao.__version__}")
        elif pkg == "yaml":
            import yaml
            print(f"✓ pyyaml: {yaml.__version__}")
        elif pkg == "sklearn":
            import sklearn
            print(f"✓ scikit-learn: {sklearn.__version__}")
        elif pkg == "huggingface_hub":
            from huggingface_hub import HfApi
            print("✓ huggingface-hub: Ready")
        else:
            mod = __import__(pkg)
            print(f"✓ {pkg}: {mod.__version__}")
    except Exception as e:
        print(f"✗ {pkg}: {e}")

# Test Chronos specifically
try:
    from chronos import ChronosPipeline
    print("\n✓ Chronos forecasting: Ready")
except Exception as e:
    print(f"\n✗ Chronos forecasting: {e}")

# Test HF Datasets
try:
    from datasets import Dataset
    print("✓ HuggingFace Datasets: Ready")
except Exception as e:
    print(f"✗ HuggingFace Datasets: {e}")

print("\nAll checks complete!")
EOF
```

### 3.2 API Access Verification

```bash
# Test ENTSO-E API
python << 'EOF'
from entsoe import EntsoePandasClient
import yaml

# Load API key
with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

api_key = config['entsoe_api_key']

if 'YOUR_ENTSOE_API_KEY_HERE' in api_key:
    print("⚠ ENTSO-E API key not configured - update config/api_keys.yaml")
else:
    try:
        client = EntsoePandasClient(api_key=api_key)
        print("✓ ENTSO-E API client initialized successfully")
    except Exception as e:
        print(f"✗ ENTSO-E API error: {e}")
EOF
```
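If you want to go one step further than initializing the client, a small live query confirms the key is actually activated for the Web API. A sketch, assuming `DE_LU` as an example bidding zone and a one-day window:

```python
# Optional live ENTSO-E query: one day of day-ahead prices for one zone.
import pandas as pd
import yaml
from entsoe import EntsoePandasClient

with open("config/api_keys.yaml") as f:
    config = yaml.safe_load(f)

client = EntsoePandasClient(api_key=config["entsoe_api_key"])

start = pd.Timestamp("2025-01-01 00:00", tz="Europe/Brussels")
end = pd.Timestamp("2025-01-02 00:00", tz="Europe/Brussels")

prices = client.query_day_ahead_prices("DE_LU", start=start, end=end)
print(prices.head())  # pandas Series indexed by hourly timestamps
```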
```bash
# Test OpenMeteo API
python << 'EOF'
import requests

response = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 52.52,
        "longitude": 13.41,
        "hourly": "temperature_2m",
        "start_date": "2025-01-01",
        "end_date": "2025-01-02"
    }
)

if response.status_code == 200:
    print("✓ OpenMeteo API accessible")
else:
    print(f"✗ OpenMeteo API error: {response.status_code}")
EOF

# Test HuggingFace authentication
python << 'EOF'
from huggingface_hub import HfApi
import yaml

with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

hf_token = config['hf_token']
hf_username = config['hf_username']

if 'YOUR_HF' in hf_token or 'YOUR_HF' in hf_username:
    print("⚠ HuggingFace credentials not configured - update config/api_keys.yaml")
else:
    try:
        api = HfApi(token=hf_token)
        user_info = api.whoami()
        print(f"✓ HuggingFace authenticated as: {user_info['name']}")
        role = user_info.get('auth', {}).get('accessToken', {}).get('role', 'unknown')
        print(f"  Token role: {role} (must be 'write' to upload datasets)")
    except Exception as e:
        print(f"✗ HuggingFace authentication error: {e}")
        print("  Verify token has WRITE permissions")
EOF
```

### 3.3 HF Space Verification

```bash
# Check HF Space status
echo "Visit your HF Space: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting"
echo ""
echo "Verify:"
echo "  1. JupyterLab interface loads"
echo "  2. Hardware shows 'A10G GPU' in bottom-right"
echo "  3. Files from git push are visible"
echo "  4. Can create new notebook"
```

### 3.4 Final Checklist

```bash
# Print final status
cat << 'EOF'
╔═══════════════════════════════════════════════════════════╗
║            DAY 0 SETUP VERIFICATION CHECKLIST              ║
╚═══════════════════════════════════════════════════════════╝

Environment:
[ ] Python 3.10+ installed
[ ] Git installed (NO Git LFS needed)
[ ] uv package manager installed

Local Setup:
[ ] Virtual environment created and activated
[ ] All Python dependencies installed (20 packages including jao-py)
[ ] API keys configured (ENTSO-E + OpenMeteo + HuggingFace)
[ ] HuggingFace write token obtained
[ ] Project structure created (8 directories)
[ ] .gitignore configured (data/ excluded)
[ ] Initial Marimo notebook created
[ ] Data management utilities created (hf_datasets_manager.py)

Git & HF Space:
[ ] HF Space created (A10G GPU, $30/month)
[ ] Repository cloned locally
[ ] .gitignore excludes all data files (*.parquet, data/)
[ ] Initial commit pushed to HF Space (code only, NO data)
[ ] HF Space JupyterLab accessible
[ ] Git repo size < 50 MB (no data committed)

Verification Tests:
[ ] Python imports successful (polars, chronos, jao-py, datasets, etc.)
[ ] ENTSO-E API client initializes
[ ] OpenMeteo API responds (status 200)
[ ] HuggingFace authentication successful (write access)
[ ] Marimo notebook opens in browser

Data Strategy Confirmed:
[ ] Code goes in Git (version controlled)
[ ] Data goes in HuggingFace Datasets (separate storage)
[ ] NO Git LFS setup (following data science best practices)
[ ] data/ directory in .gitignore

Ready for Day 1:
[ ] Next Step: Run Day 1 data collection (8 hours)
    - Download data locally via jao-py/APIs
    - Upload to HuggingFace Datasets (separate from Git)
    - Total data: ~12 GB (stored in HF Datasets, NOT Git)
EOF
```

---

## Troubleshooting

### Issue: uv installation fails

```bash
# Alternative: Use pip directly
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Issue: Git LFS files not syncing

**Not applicable** - We're using HuggingFace Datasets, not Git LFS. If you see Git LFS references, you may have an old version of this guide. Data files should NEVER be in Git.

### Issue: HuggingFace authentication fails

```bash
# Verify token is correct
python << 'EOF'
from huggingface_hub import HfApi
import yaml

with open('config/api_keys.yaml') as f:
    config = yaml.safe_load(f)

try:
    api = HfApi(token=config['hf_token'])
    print(api.whoami())
except Exception as e:
    print(f"Error: {e}")
    print("\nTroubleshooting:")
    print("1. Visit: https://huggingface.co/settings/tokens")
    print("2. Verify token has WRITE permission")
    print("3. Copy token exactly (starts with 'hf_')")
    print("4. Update config/api_keys.yaml and .env")
EOF
```

### Issue: Cannot upload to HuggingFace Datasets

```bash
# Common causes:
# 1. Token doesn't have write permissions
#    Fix: Create new token with "write" scope
#
# 2. Dataset name already exists
#    Fix: Use different name or add version suffix
#    Example: fbmc-cnecs-2023-2025-v2
#
# 3. File too large (>5GB single file limit)
#    Fix: Split into multiple datasets or use sharding (see the sketch below)

# Test upload with small sample:
python << 'EOF'
from datasets import Dataset
import pandas as pd

# Create tiny test dataset
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
dataset = Dataset.from_pandas(df)

# Try uploading
try:
    dataset.push_to_hub("YOUR_USERNAME/test-dataset", token="YOUR_TOKEN")
    print("✓ Upload successful - authentication works")
except Exception as e:
    print(f"✗ Upload failed: {e}")
EOF
```
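For cause 3 above (tables near the 5 GB single-file limit), the `datasets` library can shard the upload for you. A minimal sketch, assuming the Day 1 CNECs Parquet and an illustrative shard size:

```python
# Sharded upload sketch for large tables (file and repo names are examples).
import polars as pl
from datasets import Dataset

df = pl.read_parquet("data/raw/cnecs_2023_2025.parquet")
dataset = Dataset.from_pandas(df.to_pandas())

# max_shard_size splits the pushed data into multiple Parquet shards
dataset.push_to_hub(
    "YOUR_USERNAME/fbmc-cnecs-2023-2025",
    token="YOUR_TOKEN",
    max_shard_size="500MB",
)
```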
### Issue: Marimo notebook won't open

```bash
# Check marimo installation
marimo --version

# Try running without opening browser
marimo run notebooks/01_data_exploration.py

# Check for port conflicts
lsof -i :2718  # Default Marimo port
```

### Issue: ENTSO-E API key invalid

```bash
# Verify key in ENTSO-E Transparency Platform:
# 1. Login: https://transparency.entsoe.eu/
# 2. Navigate: Account Settings → Web API Security Token
# 3. Copy key exactly (no spaces)
# 4. Update: config/api_keys.yaml and .env
```

### Issue: HF Space shows "Building..." forever

```bash
# Check HF Space logs:
# Visit: https://huggingface.co/spaces/YOUR_USERNAME/fbmc-forecasting
# Click: "Settings" → "Logs"

# Common fix: Ensure requirements.txt is valid
# Test locally:
pip install -r requirements.txt --dry-run
```

### Issue: jao-py import fails

```bash
# Verify jao-py installation
python -c "import jao; print(jao.__version__)"

# If missing, reinstall (quote the spec so the shell doesn't treat '>' as a redirect)
uv pip install "jao-py>=0.6.0"

# Check package is in environment
uv pip list | grep jao
```

---

## What's Next: Day 1 Preview

**Day 1 Objective**: Download 24 months of historical data (Oct 2023 - Sept 2025)

**Data Collection Tasks:**

1. **JAO FBMC Data** (4-5 hours)
   - CNECs: ~900 MB (24 months)
   - PTDFs: ~1.5 GB (24 months)
   - RAMs: ~800 MB (24 months)
   - Shadow prices: ~600 MB (24 months)
   - LTN nominations: ~400 MB (24 months)
   - Net positions: ~300 MB (24 months)

2. **ENTSO-E Data** (2-3 hours)
   - Generation forecasts: 13 zones × 24 months
   - Actual generation: 13 zones × 24 months
   - Cross-border flows: ~20 borders × 24 months

3. **OpenMeteo Weather** (1-2 hours)
   - 52 grid points × 24 months
   - 8 variables per point
   - Parallel download optimization

**Total Data Size**: ~12 GB (compressed Parquet)

**Day 1 Script**: Will use the jao-py Python library with rate limiting and parallel download logic (sketched below for the OpenMeteo leg).
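To make the parallel download optimization concrete, here is a sketch of how the OpenMeteo leg could be parallelized with a crude per-worker delay as a rate limit. The grid points, variables, and limits are placeholders; the real Day 1 script defines all 52 points and 8 variables.

```python
# Sketch: rate-limited parallel OpenMeteo downloads (placeholder points/limits).
import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "https://api.open-meteo.com/v1/forecast"
GRID_POINTS = [(52.52, 13.41), (48.85, 2.35)]  # Day 1 defines all 52 points
MAX_WORKERS = 4
PER_REQUEST_DELAY_S = 0.2  # crude politeness delay applied by each worker


def fetch_point(point):
    """Fetch hourly temperature for one grid point."""
    lat, lon = point
    time.sleep(PER_REQUEST_DELAY_S)
    resp = requests.get(
        BASE_URL,
        params={
            "latitude": lat,
            "longitude": lon,
            "hourly": "temperature_2m",
            "start_date": "2025-01-01",
            "end_date": "2025-01-02",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return point, resp.json()


with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    for point, payload in pool.map(fetch_point, GRID_POINTS):
        print(point, len(payload["hourly"]["time"]), "hourly values")
```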
---

## Summary

**Time Investment**: 45 minutes
**Result**: Production-ready local + cloud development environment

**You Now Have:**
- ✓ HF Space with A10G GPU ($30/month)
- ✓ Local Python environment (20 packages including jao-py and HF Datasets)
- ✓ jao-py Python library for JAO data access
- ✓ ENTSO-E + OpenMeteo + HuggingFace API access configured
- ✓ HuggingFace Datasets manager for data storage (separate from Git)
- ✓ Data download/upload utilities (hf_datasets_manager.py)
- ✓ Marimo reactive notebook environment
- ✓ .gitignore configured (data/ excluded, following best practices)
- ✓ Complete project structure (8 directories)

**Data Strategy Implemented:**
```
Code (version controlled)   → Git Repository (~50 MB)
Data (storage & versioning) → HuggingFace Datasets (~12 GB)
NO Git LFS (following data science best practices)
```

**Ready For**: Day 1 data collection (8 hours)
- Download 24 months of data locally (jao-py + APIs)
- Upload to HuggingFace Datasets (not Git)
- Git repo stays clean (code only)

---

**Document Version**: 2.0
**Last Updated**: 2025-10-29
**Project**: FBMC Flow Forecasting MVP (Zero-Shot)