FBMC Flow Forecasting MVP - Claude Execution Rules

Global Development Rules

  1. Always update activity.md after significant changes with timestamp, description, files modified, and status. It's CRITICAL to always document where we are in the workflow.
  2. When starting a new session, always reference activity.md first.
  3. MANDATORY: Activate superpowers plugin at conversation start
    • IMMEDIATELY invoke Skill(superpowers:using-superpowers) at the start of EVERY conversation
    • Before responding to ANY task, check available skills for relevance (even 1% match = must use)
    • If a skill exists for the task, it is MANDATORY to use it - no exceptions, no rationalizations
    • Skills with checklists require TodoWrite todos for EACH item
    • Announce which skill you're using before executing it
    • This is not optional - failing to use available skills = automatic task failure
  4. Always look for existing code to iterate on instead of creating new code
  5. Do not drastically change the patterns before trying to iterate on existing patterns.
  6. Always kill all existing related servers that may have been created in previous testing before trying to start a new server.
  7. Always prefer simple solutions
  8. Avoid duplication of code whenever possible, which means checking for other areas of the codebase that might already have similar code and functionality
  9. Write code that takes into account the different environments: dev, test, and prod
  10. Only make changes that are requested, or that you are confident are well understood and related to the change being requested
  11. When fixing an issue or bug, do not introduce a new pattern or technology without first exhausting all options within the existing implementation. If you do introduce one, remove the old implementation afterwards so we don't have duplicate logic.
  12. Keep the codebase very clean and organized
  13. Avoid writing scripts in files if possible, especially if the script is likely to be run only once
  14. When you're not sure about something, ask for clarification
  15. Avoid having files over 200-300 lines of code. Refactor at that point.
  16. Mocking data is only needed for tests, never mock data for dev or prod
  17. Never add stubbing or fake data patterns to code that affects the dev or prod environments
  18. Never overwrite my .env file without first asking and confirming
  19. Focus on the areas of code relevant to the task
  20. Do not touch code that is unrelated to the task
  21. Write thorough tests for all major functionality
  22. Avoid making major changes to the patterns of how a feature works, after it has been shown to work well, unless explicitly instructed
  23. Always think about what method and areas of code might be affected by code changes
  24. Keep commits small and focused on a single change
  25. Write meaningful commit messages
  26. Review your own code before asking others to review it
  27. Be mindful of performance implications
  28. Always consider security implications of your code
  29. After making significant code changes (new features, major fixes, completing implementation phases), proactively offer to commit and push changes to GitHub with descriptive commit messages. Always ask for approval before executing git commands. Ensure no sensitive information (.env files, API keys) is committed.
  30. ALWAYS use virtual environments for Python projects. NEVER install packages globally. Create virtual environments with clear, project-specific names following the pattern: {project_name}_env (e.g., news_intel_env). Always verify virtual environment is activated before installing packages.
  31. ALWAYS use uv for package management in this project
    • NEVER use pip directly for installing/uninstalling packages
    • NEVER suggest pip commands to the user - ALWAYS use uv instead
    • Use: .venv/Scripts/uv.exe pip install <package> (Windows)
    • Use: /c/Users/evgue/.local/bin/uv.exe pip install <package> (Git Bash)
    • Use: .venv/Scripts/uv.exe pip uninstall <package>
    • uv is 10-100x faster than pip and provides better dependency resolution
    • This project uses uv package manager exclusively
    • Example: Instead of pip install marimo[mcp], use .venv/Scripts/uv.exe pip install marimo[mcp]
  32. NEVER pollute directories with multiple file versions
    • Do NOT leave test files, backup files, or old versions in main directories
    • If testing: move test files to archive immediately after use
    • If updating: either replace the file or archive the old version
    • Keep only ONE working version of each file in main directories
    • Use descriptive names in archive folders with dates
  33. Temporary scripts or files must not pollute the project: execute them in a temporary script directory, and delete them once you are done with them. Do not let unnecessary files build up in the project.
  33a. WINDOWS ENVIRONMENT - NO UNICODE IN BACKEND/SCRIPTS
    • NEVER use Unicode symbols (✓, ✗, ✅, →, etc.) in Python backend scripts, CLI tools, or data processing code
    • Windows console (cmd.exe) uses cp1252 encoding, which cannot encode these symbols
    • Use ASCII alternatives instead:
      • ✓ → [OK] or +
      • ✗ → [ERROR] or x
      • ✅ → [SUCCESS]
      • → → ->
    • Unicode IS acceptable in:
      • Marimo notebooks (rendered in browser)
      • Documentation files (README.md, etc.)
      • Comments in code (not print statements)
    • This is a Windows-specific constraint - the local setup runs on Windows
  34. MARIMO NOTEBOOK VARIABLE DEFINITIONS
    • Marimo requires each variable to be defined in ONLY ONE cell (single-definition constraint)
    • Variables defined in multiple cells cause "This cell redefines variables from other cells" errors
    • Solution: Use UNIQUE, DESCRIPTIVE variable names that clearly identify their purpose
    • WRONG: Using _variable_name or variable_name in multiple cells (confusing, not descriptive)
    • RIGHT: Use descriptive names like stats_key_borders, timeseries_borders, impact_ptdf_cols
    • Examples:
      • BAD: key_borders used in 3 cells, or _key_borders everywhere
      • GOOD: stats_key_borders (for statistics table), timeseries_borders (for chart), heatmap_borders (for heatmap)
      • BAD: ptdf_cols used in 2 cells
      • GOOD: impact_ptdf_cols (for impact analysis), ptdf_cols (for main PTDF analysis that returns the variable)
    • Variable names must be self-documenting: reader should understand the variable's purpose without looking at code
    • When adding new cells to existing notebooks, check for variable name conflicts BEFORE writing code
    • Only use shared variable names (returned in the cell) if the variable needs to be accessed by other cells
    • This enables Marimo's reactive execution and prevents redefinition errors (a minimal cell sketch follows this rules list)
  35. MARIMO NOTEBOOK DATA PROCESSING - POLARS STRONGLY PREFERRED
    • STRONG PREFERENCE: Use Polars for all data processing in Marimo notebooks
    • Pandas/NumPy allowed when absolutely necessary: e.g., when using libraries like jao-py that require pandas Timestamps
    • Polars is faster, more memory efficient, and better for large datasets
    • Examples:
      • PREFERRED: import polars as pl, df.unpivot(), Polars-native operations
      • AVOID when possible: import pandas as pd, pd.melt(), pandas operations
      • ACCEPTABLE: Using pandas when required by external libraries (jao-py, entsoe-py)
    • Only convert to pandas at the very last step for Altair visualization: chart = alt.Chart(df.to_pandas())
    • Use Polars methods whenever possible:
      • Reshaping: df.unpivot() instead of pandas melt()
      • Aggregation: df.mean(), df.group_by().agg()
      • Selection: df.select(), df.filter()
      • Column operations: df[col].mean(), df.with_columns()
    • When iterating through columns: for col in df.columns and compute with df[col].operation()
    • Pattern: Use pandas only where unavoidable, immediately convert to Polars for processing
    • This ensures consistent, fast, memory-efficient data processing throughout notebooks
  36. MARIMO NOTEBOOK WORKFLOW & MCP INTEGRATION
    • When editing Marimo notebooks, ALWAYS run .venv/Scripts/marimo.exe check <notebook.py> after making changes
    • Fix ALL issues reported by marimo check before considering the edit complete
    • Use the check command's feedback for self-correction
    • Never skip validation - marimo check catches variable redefinitions, syntax errors, and cell issues
    • Pattern: Edit → Check → Fix → Verify
    • Start notebooks with --mcp --no-token --watch for AI-enhanced development:
      • --mcp: Exposes notebook inspection tools via Model Context Protocol
      • --no-token: Disables authentication for local development
      • --watch: Auto-reloads notebook when file changes on disk
    • MCP integration enables real-time error detection, variable inspection, and cell state monitoring
    • Example workflow: Edit in Claude → Save → Auto-reload → Check → Fix errors → Verify
    • The MCP server exposes these capabilities to Claude Code:
      • get_active_notebooks - List running notebooks
      • get_errors - Detect cell errors in real-time
      • get_variables - Inspect variable definitions
      • get_cell_code - Read specific cell contents
    • Use marimo check for pre-commit validation to catch issues before deployment
    • Always verify notebook runs error-free before marking work as complete
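
Minimal sketch illustrating rule 34's single-definition constraint, written as a Marimo .py notebook (the file path and column names are illustrative, not the project's actual ones):

import marimo

app = marimo.App()

@app.cell
def _():
    import polars as pl
    return (pl,)

@app.cell
def _(pl):
    flows_df = pl.read_parquet("data/border_flows.parquet")  # hypothetical input
    return (flows_df,)

@app.cell
def _(flows_df, pl):
    # Statistics cell: name is unique and describes this cell's purpose
    stats_key_borders = flows_df.group_by("border").agg(pl.col("flow_mw").mean())
    return (stats_key_borders,)

@app.cell
def _(flows_df):
    # Chart cell: a different, self-documenting name - NOT stats_key_borders again
    timeseries_borders = flows_df.select("timestamp", "border", "flow_mw")
    return (timeseries_borders,)

if __name__ == "__main__":
    app.run()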

Project Identity

Zero-shot electricity cross-border capacity forecasting using Chronos 2

  • 5-day MVP timeline (FIRM - no extensions)
  • Target: 134 MW MAE on D+1 forecasts
  • Approach: Zero-shot inference only (NO fine-tuning)
  • Handover: Complete working system to quantitative analyst

Tech Stack

Core ML/Data

  • Model: Amazon Chronos 2 Large (710M params, pre-trained)
  • Data Processing: Polars (primary), PyArrow
  • Scientific: NumPy, scikit-learn
  • Framework: PyTorch 2.0+, Transformers 4.35+

Development Environment

  • Local Notebooks: Marimo 0.9+ (reactive, .py format)
  • Handover Format: JupyterLab (standard .ipynb)
  • Infrastructure: HuggingFace Space (JupyterLab SDK, A10G GPU)
  • Package Manager: uv (10-100x faster than pip)

Data Collection

  • JAO Data: jao-py Python library (no Java required)
  • Power Data: entsoe-py (ENTSO-E Transparency API)
  • Weather Data: OpenMeteo API (free tier)
  • Data Storage: HuggingFace Datasets (NOT Git/Git-LFS)

Visualization & Analysis

  • Primary: Altair 5.0+
  • Notebooks: Marimo reactive interface
  • Export: Standard matplotlib/seaborn for static reports

Testing & Quality

  • Testing: pytest (unit, integration, smoke tests)
  • Validation: Custom assertions for data quality
  • CI/CD: GitHub Actions (optional, for automated testing)

Critical Execution Rules

1. Scope Discipline

  • ONLY zero-shot inference - no model training/fine-tuning
  • ONLY Core FBMC (13 countries, ~20 borders)
  • ONLY 24 months historical data (Oct 2023 - Sept 2025)
  • ONLY 5 days development time
  • If asked to add features, reference Phase 2 handover

2. Data Management Philosophy

Code      → Git repository (~50 MB, version controlled)
Data      → HuggingFace Datasets (~12 GB, separate storage)
NO Git LFS (never, following data science best practices)
  • NEVER commit data files (.parquet, .csv, .pkl) to Git
  • All data goes through the HuggingFace Datasets API (upload sketch after this list)
  • .gitignore must exclude data/ directory
  • Git repo must stay under 100 MB total
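
One way to satisfy these rules with the huggingface_hub client (the repo id and file names below are placeholders, not the project's actual ones):

from huggingface_hub import HfApi

api = HfApi()  # authenticates with the locally stored HF write token
api.upload_file(
    path_or_fileobj="data/cnec_features.parquet",  # lives in data/, which .gitignore excludes
    path_in_repo="cnec_features.parquet",
    repo_id="your-username/fbmc-chronos2-data",    # placeholder dataset repo
    repo_type="dataset",
)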

3. Chronos 2 Zero-Shot Pattern

# CORRECT - Zero-shot inference
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-large")
# context: a 1-D torch tensor (or list of tensors) of recent history
forecast = pipeline.predict(context=features[-512:], prediction_length=336)

# INCORRECT - Do NOT train/fine-tune
model.fit(training_data)  # ❌ OUT OF SCOPE
  • Load pre-trained model only
  • Use 24-month data for feature baselines and context windows
  • NO gradient updates, NO epoch training, NO .fit() calls

4. Marimo Development Workflow

  • Use Marimo locally for reactive development
  • Export to Jupyter for quant analyst handover
  • Structure: DAG cells, no variable redefinition
  • Pattern for expensive ops: mo.ui.run_button() + @mo.cache() (sketched after this list)
  • Configure: auto_instantiate = false, on_cell_change = "lazy"
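
Minimal sketch of the expensive-op pattern above, assuming pipeline and features are defined in earlier cells (the button label and function name are illustrative):

import marimo as mo

# Cell A: gate expensive work behind an explicit button click
run_forecast_button = mo.ui.run_button(label="Run 14-day forecast")
run_forecast_button

# Cell B: cached computation that only fires after the click
@mo.cache
def run_forecast(context_window):
    return pipeline.predict(context=context_window, prediction_length=336)

mo.stop(not run_forecast_button.value, mo.md("Click the button to run inference"))
forecast = run_forecast(features[-512:])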

5. Feature Engineering Constraints

  • ~1,735 features across 11 categories (production-grade architecture)
  • 52 weather grid points (simplified spatial model)
  • 200 CNECs (50 Tier-1 + 150 Tier-2) with weighted scoring
  • Focus on high-signal features only
  • Validate >95% feature completeness (check sketched below)
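
A minimal Polars sketch of the completeness check (features is the engineered feature frame; the 95% threshold comes from the rule above):

import polars as pl

def feature_completeness(df: pl.DataFrame) -> float:
    """Fraction of non-null cells across all columns."""
    null_cells = df.null_count().sum_horizontal().item()
    return 1.0 - null_cells / (df.height * df.width)

completeness = feature_completeness(features)
assert completeness > 0.95, f"Feature completeness {completeness:.1%} below 95%"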

6. Performance Targets

  • Inference: <5 minutes for complete 14-day forecast (timing check sketched after this list)
  • Accuracy: D+1 MAE target is 134 MW (must be <150 MW)
  • Cost: $30/month (A10G GPU, no upgrades in MVP)
  • Document performance gaps for Phase 2 fine-tuning
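
A sketch of asserting the 5-minute budget around the zero-shot call (pipeline and features as in the inference pattern above):

import time

start = time.perf_counter()
forecast = pipeline.predict(context=features[-512:], prediction_length=336)
elapsed = time.perf_counter() - start
assert elapsed < 300, f"Inference took {elapsed:.0f}s, over the 5-minute budget"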

7. Code Quality Standards

  • Polars-first for data operations (faster, more memory efficient)
  • Type hints for all function signatures
  • Docstrings for all non-trivial functions
  • Validation checks at every pipeline stage
  • Error handling with informative messages (a small helper combining these standards follows)
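
A small hypothetical helper illustrating these standards together (the ramp feature is illustrative, not part of the actual pipeline):

import polars as pl

def add_ramp_features(df: pl.DataFrame, cols: list[str]) -> pl.DataFrame:
    """Add hour-over-hour ramp (first difference) columns for the given features."""
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Cannot compute ramps, missing input columns: {missing}")
    return df.with_columns([pl.col(c).diff().alias(f"{c}_ramp_1h") for c in cols])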

8. Daily Development Structure

Day 0: Environment setup (45 min) → git commit + push
Day 1: Data collection (8 hrs) → validate data → git commit + push
Day 2: Feature engineering (8 hrs) → test features → git commit + push
Day 3: Zero-shot inference (8 hrs) → smoke test → git commit + push
Day 4: Performance evaluation (8 hrs) → validate metrics → git commit + push
Day 5: Documentation + handover (8 hrs) → integration test → final commit + push
  • Each day ends with validation tests + git commit + push to GitHub
  • Intermediate commits for major milestones within the day
  • NO day can bleed into the next
  • If running behind, scope down (never extend timeline)
  • Tests must pass before committing

9. Git Workflow & Version Control

  • Commit frequency: End of each major milestone + end of each day
  • Commit style: Conventional commits format
    • feat: add weather data collection pipeline
    • fix: correct CNEC binding frequency calculation
    • docs: update handover guide with evaluation metrics
    • refactor: optimize feature engineering for polars
  • Push to GitHub: After every commit (keep remote in sync)
  • Branch strategy: Main branch only for MVP (no feature branches)
  • Commit granularity: Logical units of work (not "end of day dump")
  • Git hygiene: Review git status before commits, ensure data/ excluded

Daily commit pattern:

# End of Day 1
git add .
git commit -m "feat: complete data collection pipeline with HF Datasets integration"
git push origin main

# Mid-Day 2 milestone
git commit -m "feat: implement ~1,735-feature engineering pipeline"
git push origin main

# End of Day 2
git commit -m "test: add feature validation and CNEC identification"
git push origin main

10. Testing Strategy

  • Data validation: Assert data completeness, check for nulls, validate ranges
  • Feature engineering: Unit tests for each feature calculation
  • Model inference: Smoke test on small sample before full run
  • Integration: End-to-end pipeline test with 1-week subset
  • Performance: Assert inference time <5 min, MAE within bounds

Testing patterns:

import numpy as np
import polars as pl

# Data validation checks (null_count() returns a one-row frame; reduce it to a scalar)
assert df.null_count().sum_horizontal().item() < 0.05 * df.height * df.width, "Too many missing values"
assert date_range_complete(df['timestamp']), "Date gaps detected"  # project helper, sketched below

# Feature validation
features = engineer.transform(data)
assert features.shape[1] == 1735, f"Expected 1,735 features, got {features.shape[1]}"
assert features.null_count().sum_horizontal().item() == 0, "Null features detected"

# Inference validation
forecast = pipeline.predict(context, prediction_length=336)  # convert to NumPy if a tensor is returned
assert forecast.shape == (336, n_borders), "Forecast shape mismatch"
assert not np.isnan(forecast).any(), "NaN in predictions"
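
One possible Polars implementation of the date_range_complete helper assumed above (hourly interval and one row per timestamp assumed; not the project's actual code):

import polars as pl

def date_range_complete(ts: pl.Series, interval: str = "1h") -> bool:
    """Check that the timestamp series has no gaps at the expected interval."""
    expected = pl.datetime_range(ts.min(), ts.max(), interval=interval, eager=True)
    return ts.sort().equals(expected)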

Testing schedule:

  • Day 1: Validate downloaded data completeness
  • Day 2: Test each feature calculation independently
  • Day 3: Smoke test inference on 7-day window
  • Day 4: Validate evaluation metrics calculations (MAE sketch after this list)
  • Day 5: Full integration test before handover
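
For the Day 4 metric validation, a minimal sketch of the D+1 MAE computation (hourly resolution assumed, so D+1 covers the first 24 forecast steps; the helper name is illustrative):

import numpy as np

def d1_mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """MAE in MW over the first 24 hourly steps (D+1) of a forecast."""
    return float(np.mean(np.abs(y_true[:24] - y_pred[:24])))

# Target: d1_mae(actuals, forecast) <= 134 for each border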

Test organization (tests/ directory):

tests/
├── test_data_collection.py     # Data completeness, API responses
├── test_feature_engineering.py # Each feature calculation
├── test_model_inference.py     # Inference smoke tests
└── test_integration.py         # End-to-end pipeline

Running tests:

# Install pytest
uv pip install pytest

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_feature_engineering.py -v

# Before each commit
pytest tests/ && git commit -m "feat: ..."

11. Documentation Requirements

  • README.md with quick start guide
  • HANDOVER_GUIDE.md for quant analyst
  • Inline code comments for complex logic
  • Results visualization + interpretation
  • Fine-tuning roadmap (Phase 2 guidance)

12. Handover Package Must Include

  • Working zero-shot forecast system
  • All Marimo notebooks (.py) + exported Jupyter (.ipynb)
  • HuggingFace Space with complete environment
  • Performance analysis showing 134 MW MAE achieved
  • Error analysis identifying fine-tuning opportunities
  • Clear Phase 2 roadmap

Geographic Scope (Reference)

Core FBMC Countries (13 countries across 12 bidding zones; DE-LU is a joint zone): AT, BE, HR, CZ, FR, DE-LU, HU, NL, PL, RO, SK, SI

Borders: ~20 interconnections (multivariate forecasting)

OUT OF SCOPE: Nordic FBMC (NO, SE, DK, FI) - Phase 2 only


API Access Confirmed

  • ✓ jao-py library (24 months FBMC data accessible)
  • ✓ ENTSO-E API key (generation, flows)
  • ✓ OpenMeteo API (free tier, 52 grid points)
  • ✓ HuggingFace write token (Datasets upload)

Decision-Making Framework

When uncertain, apply this hierarchy:

  1. Does it extend timeline? → Reject immediately
  2. Does it require fine-tuning? → Phase 2 only
  3. Does it compromise data management? → Never commit data to Git
  4. Does it add features beyond 1,735? → Reject (scope creep)
  5. Does it skip testing/validation? → Add checks immediately
  6. Does it help quant analyst? → Include in handover docs
  7. Does it improve zero-shot accuracy? → Consider if time permits
  8. Does it add complexity? → Default to simplicity
  9. Can you commit and push? → Do it now (frequent commits)

Anti-Patterns to Avoid

❌ Training/fine-tuning the model (Phase 2)
❌ Committing data files to Git repository
❌ Using Git LFS for data storage
❌ Extending beyond 5-day timeline
❌ Adding features beyond 1,735 count
❌ Including Nordic FBMC borders
❌ Building production automation (out of scope)
❌ Creating real-time dashboards (out of scope)
❌ Over-engineering infrastructure
❌ Forgetting to document for handover
❌ Skipping data validation checks
❌ Running full pipeline without smoke tests
❌ Committing without pushing to GitHub


Success Criteria Checklist

At Day 5 completion:

  • Zero-shot forecasts for all ~20 FBMC borders working
  • Inference time <5 minutes per 14-day forecast
  • D+1 MAE ≤ 134 MW (target <150 MW)
  • HuggingFace Space operational at $30/month
  • Complete handover documentation written
  • All Marimo notebooks exported to Jupyter format
  • Git repo <100 MB (code only, no data)
  • Data stored in HuggingFace Datasets (separate)
  • Quant analyst can fork HF Space and continue
  • All tests passing (data validation, feature checks, inference)
  • Git history shows daily commits with descriptive messages
  • GitHub repo synchronized with all commits pushed

Communication Style

When providing updates or recommendations:

  • Lead with impact on 5-day timeline
  • Be direct about scope constraints
  • Suggest alternatives within MVP boundaries
  • Reference Phase 2 for out-of-scope items
  • Document assumptions and limitations
  • Always include next concrete action

Version: 2.0.0
Created: 2025-10-27
Updated: 2025-10-29 (unified with production-grade scope)
Project: FBMC Flow Forecasting MVP (Zero-Shot)
Purpose: Execution rules for Claude during 5-day development