# FBMC Flow Forecasting MVP - Claude Execution Rules

# Global Development Rules

1. **Always update `activity.md`** after significant changes with timestamp, description, files modified, and status. It is CRITICAL to always document where we are in the workflow.
2. When starting a new session, always reference `activity.md` first.
3. **MANDATORY: Activate superpowers plugin at conversation start**
    - IMMEDIATELY invoke `Skill(superpowers:using-superpowers)` at the start of EVERY conversation
    - Before responding to ANY task, check available skills for relevance (even a 1% match = must use)
    - If a skill exists for the task, it is MANDATORY to use it - no exceptions, no rationalizations
    - Skills with checklists require TodoWrite todos for EACH item
    - Announce which skill you are using before executing it
    - This is not optional - failing to use available skills = automatic task failure
4. Always look for existing code to iterate on instead of creating new code.
5. Do not drastically change established patterns before trying to iterate on them.
6. Always kill all related servers left over from previous testing before trying to start a new server.
7. Always prefer simple solutions.
8. Avoid duplicating code whenever possible: check whether other areas of the codebase already have similar code and functionality.
9. Write code that takes into account the different environments: dev, test, and prod.
10. Only make changes that are requested, or that you are confident are well understood and related to the requested change.
11. When fixing an issue or bug, do not introduce a new pattern or technology without first exhausting all options within the existing implementation. If you do switch in the end, remove the old implementation afterwards so there is no duplicate logic.
12. Keep the codebase very clean and organized.
13. Avoid writing scripts in files if possible, especially if the script is likely to be run only once.
14. When you are not sure about something, ask for clarification.
15. Avoid files over 200-300 lines of code; refactor at that point.
16. Mock data is only needed for tests; never mock data for dev or prod.
17. Never add stubbing or fake-data patterns to code that affects the dev or prod environments.
18. Never overwrite my `.env` file without first asking and confirming.
19. Focus on the areas of code relevant to the task.
20. Do not touch code that is unrelated to the task.
21. Write thorough tests for all major functionality.
22. Avoid major changes to how a feature works once it has been shown to work well, unless explicitly instructed.
23. Always think about which methods and areas of code might be affected by a change.
24. Keep commits small and focused on a single change.
25. Write meaningful commit messages.
26. Review your own code before asking others to review it.
27. Be mindful of performance implications.
28. Always consider the security implications of your code.
29. After significant code changes (new features, major fixes, completing implementation phases), proactively offer to commit and push to GitHub with descriptive commit messages. Always ask for approval before executing git commands. Ensure no sensitive information (`.env` files, API keys) is committed.
30. **CRITICAL: HuggingFace Space Deployment - ALWAYS Push to BOTH Remotes**
    - This project deploys to BOTH GitHub AND HuggingFace Space
    - Git remotes: `origin` (GitHub) and `hf-new` (HF Space)
    - **BRANCH MAPPING**: Local uses `master`, HF Space uses `main` - MUST map branches!
    - **MANDATORY**: After ANY commit affecting HF Space functionality, push to BOTH:
      ```bash
      git push origin master        # Push to GitHub (master branch)
      git push hf-new master:main   # Push to HF Space (main branch) - NOTE: master:main mapping!
      ```
    - **Why both?** HF Spaces are SEPARATE git repositories - they do NOT auto-sync with GitHub
    - **Failure mode**: pushing only to GitHub means the HF Space keeps running old code indefinitely
    - **Common mistake**: pushing `master` to `master` on the HF Space - it uses the `main` branch!
    - **Verification**: after pushing to `hf-new`, wait 3-5 minutes for the Space rebuild, then test
    - **NEVER** push to `hf-new` without also pushing to `origin` first (origin is the source of truth)
31. ALWAYS use virtual environments for Python projects. NEVER install packages globally. Create virtual environments with clear, project-specific names following the pattern `{project_name}_env` (e.g., `news_intel_env`). Always verify the virtual environment is activated before installing packages.
32. **ALWAYS use uv for package management in this project**
    - NEVER use pip directly for installing/uninstalling packages
    - NEVER suggest pip commands to the user - ALWAYS use uv instead
    - Use: `.venv/Scripts/uv.exe pip install <package>` (Windows)
    - Use: `/c/Users/evgue/.local/bin/uv.exe pip install <package>` (Git Bash)
    - Use: `.venv/Scripts/uv.exe pip uninstall <package>`
    - uv is 10-100x faster than pip and provides better dependency resolution
    - This project uses the uv package manager exclusively
    - Example: instead of `pip install marimo[mcp]`, use `.venv/Scripts/uv.exe pip install marimo[mcp]`
33. **NEVER pollute directories with multiple file versions**
    - Do NOT leave test files, backup files, or old versions in main directories
    - If testing: move test files to an archive immediately after use
    - If updating: either replace the file or archive the old version
    - Keep only ONE working version of each file in main directories
    - Use descriptive names in archive folders, with dates
34. When creating temporary scripts or files, make sure they do not pollute the project: execute them in a temporary script directory, and delete them once you are done with them. Do not let a buildup of unnecessary files pollute the project.
35. **WINDOWS ENVIRONMENT - NO UNICODE IN BACKEND/SCRIPTS**
    - NEVER use Unicode symbols (✓, ✗, ✅, →, etc.) in Python backend scripts, CLI tools, or data processing code
    - The Windows console (cmd.exe) uses cp1252 encoding, which cannot render these symbols
    - Use ASCII alternatives instead:
      * ✓ → `[OK]` or `+`
      * ✗ → `[ERROR]` or `x`
      * ✅ → `[SUCCESS]`
      * → → `->`
    - Unicode IS acceptable in:
      * Marimo notebooks (rendered in the browser)
      * Documentation files (README.md, etc.)
      * Comments in code (not print statements)
    - This is a Windows-specific constraint - the local setup runs on Windows
36. **MARIMO NOTEBOOK VARIABLE DEFINITIONS**
    - Marimo requires each variable to be defined in ONLY ONE cell (single-definition constraint)
    - Variables defined in multiple cells cause "This cell redefines variables from other cells" errors
    - Solution: use UNIQUE, DESCRIPTIVE variable names that clearly identify their purpose
    - WRONG: using `_variable_name` or `variable_name` in multiple cells (confusing, not descriptive)
    - RIGHT: descriptive names like `stats_key_borders`, `timeseries_borders`, `impact_ptdf_cols`
    - Examples:
      * BAD: `key_borders` used in 3 cells, or `_key_borders` everywhere
      * GOOD: `stats_key_borders` (for the statistics table), `timeseries_borders` (for the chart), `heatmap_borders` (for the heatmap)
      * BAD: `ptdf_cols` used in 2 cells
      * GOOD: `impact_ptdf_cols` (for impact analysis) vs. `ptdf_cols` (for the main PTDF analysis cell that returns the variable)
    - Variable names must be self-documenting: a reader should understand a variable's purpose without reading the code
    - When adding new cells to existing notebooks, check for variable name conflicts BEFORE writing code
    - Only use shared variable names (returned in the cell) if the variable needs to be accessed by other cells
    - This enables Marimo's reactive execution and prevents redefinition errors
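Rule 35 in practice can be sketched as a small helper. The function name and messages below are illustrative, not existing project code:

```python
def status_line(ok: bool, message: str) -> str:
    """Build a console-safe status line using ASCII markers (rule 35) instead of Unicode."""
    marker = "[OK]" if ok else "[ERROR]"
    return f"{marker} {message}"

# Safe on a cp1252 Windows console - no Unicode symbols reach print output
print(status_line(True, "PTDF matrix loaded"))
print(status_line(False, "date gap detected"))
```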
37. **MARIMO NOTEBOOK DATA PROCESSING - POLARS STRONGLY PREFERRED**
    - **STRONG PREFERENCE**: use Polars for all data processing in Marimo notebooks
    - **Pandas/NumPy allowed when absolutely necessary**: e.g., when a library like jao-py requires pandas Timestamps
    - Polars is faster, more memory efficient, and better for large datasets
    - Examples:
      * PREFERRED: `import polars as pl`, `df.unpivot()`, Polars-native operations
      * AVOID when possible: `import pandas as pd`, `pd.melt()`, pandas operations
      * ACCEPTABLE: using pandas when required by external libraries (jao-py, entsoe-py)
    - Only convert to pandas at the very last step, for Altair visualization: `chart = alt.Chart(df.to_pandas())`
    - Use Polars methods whenever possible:
      * Reshaping: `df.unpivot()` instead of pandas `melt()`
      * Aggregation: `df.mean()`, `df.group_by().agg()`
      * Selection: `df.select()`, `df.filter()`
      * Column operations: `df[col].mean()`, `df.with_columns()`
    - When iterating through columns: `for col in df.columns`, computing with `df[col].operation()`
    - Pattern: use pandas only where unavoidable, and convert to Polars immediately for processing
    - This ensures consistent, fast, memory-efficient data processing throughout the notebooks
38. **MARIMO NOTEBOOK WORKFLOW & MCP INTEGRATION**
    - When editing Marimo notebooks, ALWAYS run `.venv/Scripts/marimo.exe check <notebook.py>` after making changes
    - Fix ALL issues reported by `marimo check` before considering the edit complete
    - Use the check command's feedback for self-correction
    - Never skip validation - `marimo check` catches variable redefinitions, syntax errors, and cell issues
    - Pattern: Edit → Check → Fix → Verify
    - Start notebooks with `--mcp --no-token --watch` for AI-enhanced development:
      * `--mcp`: exposes notebook inspection tools via the Model Context Protocol
      * `--no-token`: disables authentication for local development
      * `--watch`: auto-reloads the notebook when the file changes on disk
    - MCP integration enables real-time error detection, variable inspection, and cell state monitoring
    - Example workflow: Edit in Claude → Save → Auto-reload → Check → Fix errors → Verify
    - The MCP server exposes these capabilities to Claude Code:
      * `get_active_notebooks` - list running notebooks
      * `get_errors` - detect cell errors in real time
      * `get_variables` - inspect variable definitions
      * `get_cell_code` - read specific cell contents
    - Use `marimo check` for pre-commit validation, to catch issues before deployment
    - Always verify the notebook runs error-free before marking work as complete

## Project Identity

**Zero-shot electricity cross-border capacity forecasting using Chronos 2**

- 5-day MVP timeline (FIRM - no extensions)
- Target: 134 MW MAE on D+1 forecasts
- Approach: zero-shot inference only (NO fine-tuning)
- Handover: complete working system to a quantitative analyst

---

## Tech Stack

### Core ML/Data

- **Model**: Amazon Chronos 2 Large (710M params, pre-trained)
- **Data Processing**: Polars (primary), PyArrow
- **Scientific**: NumPy, scikit-learn
- **Framework**: PyTorch 2.0+, Transformers 4.35+

### Development Environment

- **Local Notebooks**: Marimo 0.9+ (reactive, .py format)
- **Handover Format**: JupyterLab (standard .ipynb)
- **Infrastructure**: HuggingFace Space (JupyterLab
SDK, A10G GPU)
- **Package Manager**: uv (10-100x faster than pip)

### Data Collection

- **JAO Data**: jao-py Python library (no Java required)
- **Power Data**: entsoe-py (ENTSO-E Transparency API)
- **Weather Data**: OpenMeteo API (free tier)
- **Data Storage**: HuggingFace Datasets (NOT Git/Git-LFS)

### Visualization & Analysis

- **Primary**: Altair 5.0+
- **Notebooks**: Marimo reactive interface
- **Export**: standard matplotlib/seaborn for static reports

### Testing & Quality

- **Testing**: pytest (unit, integration, smoke tests)
- **Validation**: custom assertions for data quality
- **CI/CD**: GitHub Actions (optional, for automated testing)

---

## Critical Execution Rules

### 1. Scope Discipline

- **ONLY** zero-shot inference - no model training/fine-tuning
- **ONLY** Core FBMC (13 countries, ~20 borders)
- **ONLY** 24 months of historical data (Oct 2023 - Sept 2025)
- **ONLY** 5 days of development time
- If asked to add features, reference the Phase 2 handover

### 2. Data Management Philosophy

```
Code → Git repository (~50 MB, version controlled)
Data → HuggingFace Datasets (~12 GB, separate storage)
NO Git LFS (never, following data science best practices)
```

- **NEVER** commit data files (.parquet, .csv, .pkl) to Git
- All data goes through the HuggingFace Datasets API
- `.gitignore` must exclude the `data/` directory
- The Git repo must stay under 100 MB total

### 3. Chronos 2 Zero-Shot Pattern

```python
# CORRECT - Zero-shot inference
pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-large")
forecast = pipeline.predict(context=features[-512:], prediction_length=336)

# INCORRECT - Do NOT train/fine-tune
model.fit(training_data)  # ❌ OUT OF SCOPE
```

- Load the pre-trained model only
- Use the 24-month data for feature baselines and context windows
- NO gradient updates, NO epoch training, NO `.fit()` calls
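The data-management philosophy in section 2 implies a `.gitignore` along these lines - a minimal sketch only, since the exact entries depend on the actual repo layout:

```gitignore
# Data lives in HuggingFace Datasets, never in Git (no Git LFS either)
data/
*.parquet
*.csv
*.pkl
```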
### 4. Marimo Development Workflow

- **Use Marimo locally** for reactive development
- **Export to Jupyter** for the quant analyst handover
- Structure: DAG cells, no variable redefinition
- Pattern for expensive ops: `mo.ui.run_button()` + `@mo.cache()`
- Configure: `auto_instantiate = false`, `on_cell_change = "lazy"`

### 5. Feature Engineering Constraints

- **~1,735 features** across 11 categories (production-grade architecture)
- **52 weather grid points** (simplified spatial model)
- **200 CNECs** (50 Tier-1 + 150 Tier-2) with weighted scoring
- Focus on high-signal features only
- Validate >95% feature completeness

### 6. Performance Targets

- **Inference**: <5 minutes for a complete 14-day forecast
- **Accuracy**: the D+1 MAE target is 134 MW (must be <150 MW)
- **Cost**: $30/month (A10G GPU, no upgrades in the MVP)
- Document performance gaps for Phase 2 fine-tuning

### 7. Code Quality Standards

- Polars-first for data operations (faster, more memory efficient)
- Type hints for all function signatures
- Docstrings for all non-trivial functions
- Validation checks at every pipeline stage
- Error handling with informative messages

### 8. Daily Development Structure

```
Day 0: Environment setup (45 min)       → git commit + push
Day 1: Data collection (8 hrs)          → validate data → git commit + push
Day 2: Feature engineering (8 hrs)      → test features → git commit + push
Day 3: Zero-shot inference (8 hrs)      → smoke test → git commit + push
Day 4: Performance evaluation (8 hrs)   → validate metrics → git commit + push
Day 5: Documentation + handover (8 hrs) → integration test → final commit + push
```

- Each day ends with validation tests + git commit + push to GitHub
- Intermediate commits for major milestones within the day
- NO day can bleed into the next
- If running behind, scope down (never extend the timeline)
- Tests must pass before committing
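The accuracy gate in section 6 reduces to a plain MAE check. A minimal sketch with toy numbers - the arrays below are made up for illustration, not real project data:

```python
import numpy as np

def d1_mae(actual_mw: np.ndarray, forecast_mw: np.ndarray) -> float:
    """Mean absolute error in MW over the D+1 horizon."""
    return float(np.mean(np.abs(actual_mw - forecast_mw)))

# Toy arrays only - illustrating the 150 MW hard bound, not a real evaluation
actual = np.array([1200.0, 1150.0, 1300.0])
forecast = np.array([1100.0, 1200.0, 1250.0])
mae = d1_mae(actual, forecast)
assert mae < 150, f"MAE {mae:.0f} MW exceeds the 150 MW hard bound"
```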
### 9. Git Workflow & Version Control

- **Commit frequency**: end of each major milestone + end of each day
- **Commit style**: Conventional Commits format
  - `feat: add weather data collection pipeline`
  - `fix: correct CNEC binding frequency calculation`
  - `docs: update handover guide with evaluation metrics`
  - `refactor: optimize feature engineering for polars`
- **Push to GitHub**: after every commit (keep the remote in sync)
- **Branch strategy**: main branch only for the MVP (no feature branches)
- **Commit granularity**: logical units of work (not an "end of day dump")
- **Git hygiene**: review `git status` before commits, ensure `data/` is excluded

**Daily commit pattern**:

```bash
# End of Day 1
git add .
git commit -m "feat: complete data collection pipeline with HF Datasets integration"
git push origin main

# Mid-Day 2 milestone
git commit -m "feat: implement ~1,735-feature engineering pipeline"
git push origin main

# End of Day 2
git commit -m "test: add feature validation and CNEC identification"
git push origin main
```
### 10. Testing Strategy

- **Data validation**: assert data completeness, check for nulls, validate ranges
- **Feature engineering**: unit tests for each feature calculation
- **Model inference**: smoke test on a small sample before the full run
- **Integration**: end-to-end pipeline test on a 1-week subset
- **Performance**: assert inference time <5 min and MAE within bounds

**Testing patterns**:

```python
# Data validation checks (null_count() returns a one-row DataFrame, hence row(0))
assert sum(df.null_count().row(0)) < 0.05 * len(df), "Too many missing values"
assert date_range_complete(df["timestamp"]), "Date gaps detected"

# Feature validation
features = engineer.transform(data)
assert features.shape[1] == 1735, f"Expected ~1,735 features, got {features.shape[1]}"
assert features.select(pl.all().is_null().sum()).row(0) == (0,) * 1735, "Null features detected"

# Inference validation
forecast = pipeline.predict(context, prediction_length=336)
assert forecast.shape == (336, n_borders), "Forecast shape mismatch"
assert not np.isnan(forecast).any(), "NaN in predictions"
```

**Testing schedule**:

- Day 1: validate downloaded data completeness
- Day 2: test each feature calculation independently
- Day 3: smoke test inference on a 7-day window
- Day 4: validate the evaluation metric calculations
- Day 5: full integration test before handover

**Test organization** (`tests/` directory):

```
tests/
├── test_data_collection.py      # Data completeness, API responses
├── test_feature_engineering.py  # Each feature calculation
├── test_model_inference.py      # Inference smoke tests
└── test_integration.py          # End-to-end pipeline
```

**Running tests**:

```bash
# Install pytest
uv pip install pytest

# Run all tests
pytest tests/ -v

# Run a specific test file
pytest tests/test_feature_engineering.py -v

# Before each commit
pytest tests/ && git commit -m "feat: ..."
```
### 11. Documentation Requirements

- README.md with a quick start guide
- HANDOVER_GUIDE.md for the quant analyst
- Inline code comments for complex logic
- Results visualization + interpretation
- Fine-tuning roadmap (Phase 2 guidance)

### 12. Handover Package Must Include

- Working zero-shot forecast system
- All Marimo notebooks (.py) + exported Jupyter (.ipynb)
- HuggingFace Space with the complete environment
- Performance analysis showing the 134 MW MAE achieved
- Error analysis identifying fine-tuning opportunities
- Clear Phase 2 roadmap

---

## Geographic Scope (Reference)

**Core FBMC Countries** (13 total): AT, BE, HR, CZ, FR, DE-LU, HU, NL, PL, RO, SK, SI

**Borders**: ~20 interconnections (multivariate forecasting)

**OUT OF SCOPE**: Nordic FBMC (NO, SE, DK, FI) - Phase 2 only

---

## API Access Confirmed

- ✓ jao-py library (24 months of FBMC data accessible)
- ✓ ENTSO-E API key (generation, flows)
- ✓ OpenMeteo API (free tier, 52 grid points)
- ✓ HuggingFace write token (Datasets upload)

---

## Decision-Making Framework

When uncertain, apply this hierarchy:

1. **Does it extend the timeline?** → Reject immediately
2. **Does it require fine-tuning?** → Phase 2 only
3. **Does it compromise data management?** → Never commit data to Git
4. **Does it add features beyond 1,735?** → Reject (scope creep)
5. **Does it skip testing/validation?** → Add checks immediately
6. **Does it help the quant analyst?** → Include in handover docs
7. **Does it improve zero-shot accuracy?** → Consider if time permits
8. **Does it add complexity?** → Default to simplicity
9. **Can you commit and push?** → Do it now (frequent commits)

---

## Anti-Patterns to Avoid

❌ Training/fine-tuning the model (Phase 2)
❌ Committing data files to the Git repository
❌ Using Git LFS for data storage
❌ Extending beyond the 5-day timeline
❌ Adding features beyond the 1,735 count
❌ Including Nordic FBMC borders
❌ Building production automation (out of scope)
❌ Creating real-time dashboards (out of scope)
❌ Over-engineering infrastructure
❌ Forgetting to document for handover
❌ Skipping data validation checks
❌ Running the full pipeline without smoke tests
❌ Committing without pushing to GitHub

---

## Success Criteria Checklist

At Day 5 completion:

- [ ] Zero-shot forecasts working for all ~20 FBMC borders
- [ ] Inference time <5 minutes per 14-day forecast
- [ ] D+1 MAE at the 134 MW target (hard bound <150 MW)
- [ ] HuggingFace Space operational at $30/month
- [ ] Complete handover documentation written
- [ ] All Marimo notebooks exported to Jupyter format
- [ ] Git repo <100 MB (code only, no data)
- [ ] Data stored in HuggingFace Datasets (separate)
- [ ] Quant analyst can fork the HF Space and continue
- [ ] All tests passing (data validation, feature checks, inference)
- [ ] Git history shows daily commits with descriptive messages
- [ ] GitHub repo synchronized, with all commits pushed

---

## Communication Style

When providing updates or recommendations:

- Lead with the impact on the 5-day timeline
- Be direct about scope constraints
- Suggest alternatives within MVP boundaries
- Reference Phase 2 for out-of-scope items
- Document assumptions and limitations
- Always include the next concrete action

---

**Version**: 2.0.0
**Created**: 2025-10-27
**Updated**: 2025-10-29 (unified with production-grade scope)
**Project**: FBMC Flow Forecasting MVP (Zero-Shot)
**Purpose**: Execution rules for Claude during the 5-day development