FBMC Flow Forecasting MVP - Claude Execution Rules

Global Development Rules

  1. Always update activity.md after significant changes with timestamp, description, files modified, and status. It's CRITICAL to always document where we are in the workflow.
  2. When starting a new session, always reference activity.md first.
  3. MANDATORY: Activate superpowers plugin at conversation start
    • IMMEDIATELY invoke Skill(superpowers:using-superpowers) at the start of EVERY conversation
    • Before responding to ANY task, check available skills for relevance (even 1% match = must use)
    • If a skill exists for the task, it is MANDATORY to use it - no exceptions, no rationalizations
    • Skills with checklists require TodoWrite todos for EACH item
    • Announce which skill you're using before executing it
    • This is not optional - failing to use available skills = automatic task failure
  4. Always look for existing code to iterate on instead of creating new code
  5. Do not drastically change the patterns before trying to iterate on existing patterns.
  6. Always kill all existing related servers that may have been created in previous testing before trying to start a new server.
  7. Always prefer simple solutions
  8. Avoid duplication of code whenever possible, which means checking for other areas of the codebase that might already have similar code and functionality
  9. Write code that takes into account the different environments: dev, test, and prod
  10. Only make changes that are requested, or that you are confident are well understood and related to the change being requested
  11. When fixing an issue or bug, do not introduce a new pattern or technology without first exhausting all options within the existing implementation. If you do introduce one, remove the old implementation afterwards so we don't have duplicate logic.
  12. Keep the codebase very clean and organized
  13. Avoid writing scripts in files if possible, especially if the script is likely to be run only once
  14. When you're not sure about something, ask for clarification
  15. Avoid having files over 200-300 lines of code. Refactor at that point.
  16. Mocking data is only needed for tests, never mock data for dev or prod
  17. Never add stubbing or fake data patterns to code that affects the dev or prod environments
  18. Never overwrite my .env file without first asking and confirming
  19. Focus on the areas of code relevant to the task
  20. Do not touch code that is unrelated to the task
  21. Write thorough tests for all major functionality
  22. Avoid making major changes to the patterns of how a feature works, after it has been shown to work well, unless explicitly instructed
  23. Always think about what method and areas of code might be affected by code changes
  24. Keep commits small and focused on a single change
  25. Write meaningful commit messages
  26. Review your own code before asking others to review it
  27. Be mindful of performance implications
  28. Always consider security implications of your code
  29. After making significant code changes (new features, major fixes, completing implementation phases), proactively offer to commit and push changes to GitHub with descriptive commit messages. Always ask for approval before executing git commands. Ensure no sensitive information (.env files, API keys) is committed.
  30. ALWAYS use virtual environments for Python projects. NEVER install packages globally. Create virtual environments with clear, project-specific names following the pattern: {project_name}_env (e.g., news_intel_env). Always verify virtual environment is activated before installing packages.
  31. ALWAYS use uv for package management in this project
    • NEVER use pip directly for installing/uninstalling packages
    • NEVER suggest pip commands to the user - ALWAYS use uv instead
    • Use: .venv/Scripts/uv.exe pip install <package> (Windows)
    • Use: /c/Users/evgue/.local/bin/uv.exe pip install <package> (Git Bash)
    • Use: .venv/Scripts/uv.exe pip uninstall <package>
    • uv is 10-100x faster than pip and provides better dependency resolution
    • This project uses uv package manager exclusively
    • Example: Instead of pip install marimo[mcp], use .venv/Scripts/uv.exe pip install marimo[mcp]
  32. NEVER pollute directories with multiple file versions
    • Do NOT leave test files, backup files, or old versions in main directories
    • If testing: move test files to archive immediately after use
    • If updating: either replace the file or archive the old version
    • Keep only ONE working version of each file in main directories
    • Use descriptive names in archive folders with dates
  33. Temporary scripts or files must not pollute the project: execute them in a temporary script directory, and delete them once you are done with them. Do not let unnecessary files build up in the project.
  33a. WINDOWS ENVIRONMENT - NO UNICODE IN BACKEND/SCRIPTS
    • NEVER use Unicode symbols (✓, ✗, ✅, →, etc.) in Python backend scripts, CLI tools, or data processing code
    • Windows console (cmd.exe) uses cp1252 encoding, which cannot encode these symbols
    • Use ASCII alternatives instead:
      • ✓ → [OK] or +
      • ✗ → [ERROR] or x
      • ✅ → [SUCCESS]
      • → → ->
    • Unicode IS acceptable in:
      • Marimo notebooks (rendered in browser)
      • Documentation files (README.md, etc.)
      • Comments in code (not print statements)
    • This is a Windows-specific constraint - the local setup runs on Windows
  34. MARIMO NOTEBOOK VARIABLE DEFINITIONS
    • Marimo requires each variable to be defined in ONLY ONE cell (single-definition constraint)
    • Variables defined in multiple cells cause "This cell redefines variables from other cells" errors
    • Solution: Use UNIQUE, DESCRIPTIVE variable names that clearly identify their purpose
    • WRONG: Using _variable_name or variable_name in multiple cells (confusing, not descriptive)
    • RIGHT: Use descriptive names like stats_key_borders, timeseries_borders, impact_ptdf_cols
    • Examples:
      • BAD: key_borders used in 3 cells, or _key_borders everywhere
      • GOOD: stats_key_borders (for statistics table), timeseries_borders (for chart), heatmap_borders (for heatmap)
      • BAD: ptdf_cols used in 2 cells
      • GOOD: impact_ptdf_cols (for impact analysis), ptdf_cols (for main PTDF analysis that returns the variable)
    • Variable names must be self-documenting: reader should understand the variable's purpose without looking at code
    • When adding new cells to existing notebooks, check for variable name conflicts BEFORE writing code
    • Only use shared variable names (returned in the cell) if the variable needs to be accessed by other cells
    • This enables Marimo's reactive execution and prevents redefinition errors (a minimal cell sketch follows this rules list)
  35. MARIMO NOTEBOOK DATA PROCESSING - POLARS STRONGLY PREFERRED
    • STRONG PREFERENCE: Use Polars for all data processing in Marimo notebooks
    • Pandas/NumPy allowed when absolutely necessary: e.g., when using libraries like jao-py that require pandas Timestamps
    • Polars is faster, more memory efficient, and better for large datasets
    • Examples:
      • PREFERRED: import polars as pl, df.unpivot(), Polars-native operations
      • AVOID when possible: import pandas as pd, pd.melt(), pandas operations
      • ACCEPTABLE: Using pandas when required by external libraries (jao-py, entsoe-py)
    • Only convert to pandas at the very last step for Altair visualization: chart = alt.Chart(df.to_pandas())
    • Use Polars methods whenever possible:
      • Reshaping: df.unpivot() instead of pandas melt()
      • Aggregation: df.mean(), df.group_by().agg()
      • Selection: df.select(), df.filter()
      • Column operations: df[col].mean(), df.with_columns()
    • When iterating through columns: for col in df.columns and compute with df[col].operation()
    • Pattern: Use pandas only where unavoidable, immediately convert to Polars for processing
    • This ensures consistent, fast, memory-efficient data processing throughout notebooks
  36. MARIMO NOTEBOOK WORKFLOW & MCP INTEGRATION
    • When editing Marimo notebooks, ALWAYS run .venv/Scripts/marimo.exe check <notebook.py> after making changes
    • Fix ALL issues reported by marimo check before considering the edit complete
    • Use the check command's feedback for self-correction
    • Never skip validation - marimo check catches variable redefinitions, syntax errors, and cell issues
    • Pattern: Edit → Check → Fix → Verify
    • Start notebooks with --mcp --no-token --watch for AI-enhanced development:
      • --mcp: Exposes notebook inspection tools via Model Context Protocol
      • --no-token: Disables authentication for local development
      • --watch: Auto-reloads notebook when file changes on disk
    • MCP integration enables real-time error detection, variable inspection, and cell state monitoring
    • Example workflow: Edit in Claude → Save → Auto-reload → Check → Fix errors → Verify
    • The MCP server exposes these capabilities to Claude Code:
      • get_active_notebooks - List running notebooks
      • get_errors - Detect cell errors in real-time
      • get_variables - Inspect variable definitions
      • get_cell_code - Read specific cell contents
    • Use marimo check for pre-commit validation to catch issues before deployment
    • Always verify notebook runs error-free before marking work as complete
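
Minimal sketch illustrating rule 34's single-definition constraint, written as a Marimo .py notebook (the file path and column names are illustrative, not the project's actual ones):

import marimo

app = marimo.App()

@app.cell
def _():
    import polars as pl
    return (pl,)

@app.cell
def _(pl):
    flows_df = pl.read_parquet("data/border_flows.parquet")  # hypothetical input
    return (flows_df,)

@app.cell
def _(flows_df, pl):
    # Statistics cell: name is unique and describes this cell's purpose
    stats_key_borders = flows_df.group_by("border").agg(pl.col("flow_mw").mean())
    return (stats_key_borders,)

@app.cell
def _(flows_df):
    # Chart cell: a different, self-documenting name - NOT stats_key_borders again
    timeseries_borders = flows_df.select("timestamp", "border", "flow_mw")
    return (timeseries_borders,)

if __name__ == "__main__":
    app.run()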

Project Identity

Zero-shot electricity cross-border capacity forecasting using Chronos 2

  • 5-day MVP timeline (FIRM - no extensions)
  • Target: 134 MW MAE on D+1 forecasts
  • Approach: Zero-shot inference only (NO fine-tuning)
  • Handover: Complete working system to quantitative analyst

Tech Stack

Core ML/Data

  • Model: Amazon Chronos 2 Large (710M params, pre-trained)
  • Data Processing: Polars (primary), PyArrow
  • Scientific: NumPy, scikit-learn
  • Framework: PyTorch 2.0+, Transformers 4.35+

Development Environment

  • Local Notebooks: Marimo 0.9+ (reactive, .py format)
  • Handover Format: JupyterLab (standard .ipynb)
  • Infrastructure: HuggingFace Space (JupyterLab SDK, A10G GPU)
  • Package Manager: uv (10-100x faster than pip)

Data Collection

  • JAO Data: jao-py Python library (no Java required)
  • Power Data: entsoe-py (ENTSO-E Transparency API)
  • Weather Data: OpenMeteo API (free tier)
  • Data Storage: HuggingFace Datasets (NOT Git/Git-LFS)

Visualization & Analysis

  • Primary: Altair 5.0+
  • Notebooks: Marimo reactive interface
  • Export: Standard matplotlib/seaborn for static reports

Testing & Quality

  • Testing: pytest (unit, integration, smoke tests)
  • Validation: Custom assertions for data quality
  • CI/CD: GitHub Actions (optional, for automated testing)

Critical Execution Rules

1. Scope Discipline

  • ONLY zero-shot inference - no model training/fine-tuning
  • ONLY Core FBMC (13 countries, ~20 borders)
  • ONLY 24 months historical data (Oct 2023 - Sept 2025)
  • ONLY 5 days development time
  • If asked to add features, reference Phase 2 handover

2. Data Management Philosophy

Code      → Git repository (~50 MB, version controlled)
Data      → HuggingFace Datasets (~12 GB, separate storage)
NO Git LFS (never, following data science best practices)
  • NEVER commit data files (.parquet, .csv, .pkl) to Git
  • All data goes through the HuggingFace Datasets API (upload sketch after this list)
  • .gitignore must exclude data/ directory
  • Git repo must stay under 100 MB total
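
One way to satisfy these rules with the huggingface_hub client (the repo id and file names below are placeholders, not the project's actual ones):

from huggingface_hub import HfApi

api = HfApi()  # authenticates with the locally stored HF write token
api.upload_file(
    path_or_fileobj="data/cnec_features.parquet",  # lives in data/, which .gitignore excludes
    path_in_repo="cnec_features.parquet",
    repo_id="your-username/fbmc-chronos2-data",    # placeholder dataset repo
    repo_type="dataset",
)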

3. Chronos 2 Zero-Shot Pattern

# CORRECT - Zero-shot inference
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained("amazon/chronos-t5-large")
# context: a 1-D torch tensor (or list of tensors) of recent history
forecast = pipeline.predict(context=features[-512:], prediction_length=336)

# INCORRECT - Do NOT train/fine-tune
model.fit(training_data)  # ❌ OUT OF SCOPE
  • Load pre-trained model only
  • Use 24-month data for feature baselines and context windows
  • NO gradient updates, NO epoch training, NO .fit() calls

4. Marimo Development Workflow

  • Use Marimo locally for reactive development
  • Export to Jupyter for quant analyst handover
  • Structure: DAG cells, no variable redefinition
  • Pattern for expensive ops: mo.ui.run_button() + @mo.cache() (sketched after this list)
  • Configure: auto_instantiate = false, on_cell_change = "lazy"
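
Minimal sketch of the expensive-op pattern above, assuming pipeline and features are defined in earlier cells (the button label and function name are illustrative):

import marimo as mo

# Cell A: gate expensive work behind an explicit button click
run_forecast_button = mo.ui.run_button(label="Run 14-day forecast")
run_forecast_button

# Cell B: cached computation that only fires after the click
@mo.cache
def run_forecast(context_window):
    return pipeline.predict(context=context_window, prediction_length=336)

mo.stop(not run_forecast_button.value, mo.md("Click the button to run inference"))
forecast = run_forecast(features[-512:])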

5. Feature Engineering Constraints

  • ~1,735 features across 11 categories (production-grade architecture)
  • 52 weather grid points (simplified spatial model)
  • 200 CNECs (50 Tier-1 + 150 Tier-2) with weighted scoring
  • Focus on high-signal features only
  • Validate >95% feature completeness (check sketched below)
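
A minimal Polars sketch of the completeness check (features is the engineered feature frame; the 95% threshold comes from the rule above):

import polars as pl

def feature_completeness(df: pl.DataFrame) -> float:
    """Fraction of non-null cells across all columns."""
    null_cells = df.null_count().sum_horizontal().item()
    return 1.0 - null_cells / (df.height * df.width)

completeness = feature_completeness(features)
assert completeness > 0.95, f"Feature completeness {completeness:.1%} below 95%"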

6. Performance Targets

  • Inference: <5 minutes for complete 14-day forecast (timing check sketched after this list)
  • Accuracy: D+1 MAE target is 134 MW (must be <150 MW)
  • Cost: $30/month (A10G GPU, no upgrades in MVP)
  • Document performance gaps for Phase 2 fine-tuning
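
A sketch of asserting the 5-minute budget around the zero-shot call (pipeline and features as in the inference pattern above):

import time

start = time.perf_counter()
forecast = pipeline.predict(context=features[-512:], prediction_length=336)
elapsed = time.perf_counter() - start
assert elapsed < 300, f"Inference took {elapsed:.0f}s, over the 5-minute budget"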

7. Code Quality Standards

  • Polars-first for data operations (faster, more memory efficient)
  • Type hints for all function signatures
  • Docstrings for all non-trivial functions
  • Validation checks at every pipeline stage
  • Error handling with informative messages (a small helper combining these standards follows)
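
A small hypothetical helper illustrating these standards together (the ramp feature is illustrative, not part of the actual pipeline):

import polars as pl

def add_ramp_features(df: pl.DataFrame, cols: list[str]) -> pl.DataFrame:
    """Add hour-over-hour ramp (first difference) columns for the given features."""
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise ValueError(f"Cannot compute ramps, missing input columns: {missing}")
    return df.with_columns([pl.col(c).diff().alias(f"{c}_ramp_1h") for c in cols])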

8. Daily Development Structure

Day 0: Environment setup (45 min) → git commit + push
Day 1: Data collection (8 hrs) → validate data → git commit + push
Day 2: Feature engineering (8 hrs) → test features → git commit + push
Day 3: Zero-shot inference (8 hrs) → smoke test → git commit + push
Day 4: Performance evaluation (8 hrs) → validate metrics → git commit + push
Day 5: Documentation + handover (8 hrs) → integration test → final commit + push
  • Each day ends with validation tests + git commit + push to GitHub
  • Intermediate commits for major milestones within the day
  • NO day can bleed into the next
  • If running behind, scope down (never extend timeline)
  • Tests must pass before committing

9. Git Workflow & Version Control

  • Commit frequency: End of each major milestone + end of each day
  • Commit style: Conventional commits format
    • feat: add weather data collection pipeline
    • fix: correct CNEC binding frequency calculation
    • docs: update handover guide with evaluation metrics
    • refactor: optimize feature engineering for polars
  • Push to GitHub: After every commit (keep remote in sync)
  • Branch strategy: Main branch only for MVP (no feature branches)
  • Commit granularity: Logical units of work (not "end of day dump")
  • Git hygiene: Review git status before commits, ensure data/ excluded

Daily commit pattern:

# End of Day 1
git add .
git commit -m "feat: complete data collection pipeline with HF Datasets integration"
git push origin main

# Mid-Day 2 milestone
git commit -m "feat: implement ~1,735-feature engineering pipeline"
git push origin main

# End of Day 2
git commit -m "test: add feature validation and CNEC identification"
git push origin main

10. Testing Strategy

  • Data validation: Assert data completeness, check for nulls, validate ranges
  • Feature engineering: Unit tests for each feature calculation
  • Model inference: Smoke test on small sample before full run
  • Integration: End-to-end pipeline test with 1-week subset
  • Performance: Assert inference time <5 min, MAE within bounds

Testing patterns:

import numpy as np
import polars as pl

# Data validation checks (null_count() returns a one-row frame; reduce it to a scalar)
assert df.null_count().sum_horizontal().item() < 0.05 * df.height * df.width, "Too many missing values"
assert date_range_complete(df['timestamp']), "Date gaps detected"  # project helper, sketched below

# Feature validation
features = engineer.transform(data)
assert features.shape[1] == 1735, f"Expected 1,735 features, got {features.shape[1]}"
assert features.null_count().sum_horizontal().item() == 0, "Null features detected"

# Inference validation
forecast = pipeline.predict(context, prediction_length=336)  # convert to NumPy if a tensor is returned
assert forecast.shape == (336, n_borders), "Forecast shape mismatch"
assert not np.isnan(forecast).any(), "NaN in predictions"
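
One possible Polars implementation of the date_range_complete helper assumed above (hourly interval and one row per timestamp assumed; not the project's actual code):

import polars as pl

def date_range_complete(ts: pl.Series, interval: str = "1h") -> bool:
    """Check that the timestamp series has no gaps at the expected interval."""
    expected = pl.datetime_range(ts.min(), ts.max(), interval=interval, eager=True)
    return ts.sort().equals(expected)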

Testing schedule:

  • Day 1: Validate downloaded data completeness
  • Day 2: Test each feature calculation independently
  • Day 3: Smoke test inference on 7-day window
  • Day 4: Validate evaluation metrics calculations (MAE sketch after this list)
  • Day 5: Full integration test before handover
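
For the Day 4 metric validation, a minimal sketch of the D+1 MAE computation (hourly resolution assumed, so D+1 covers the first 24 forecast steps; the helper name is illustrative):

import numpy as np

def d1_mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """MAE in MW over the first 24 hourly steps (D+1) of a forecast."""
    return float(np.mean(np.abs(y_true[:24] - y_pred[:24])))

# Target: d1_mae(actuals, forecast) <= 134 for each border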

Test organization (tests/ directory):

tests/
├── test_data_collection.py     # Data completeness, API responses
├── test_feature_engineering.py # Each feature calculation
├── test_model_inference.py     # Inference smoke tests
└── test_integration.py         # End-to-end pipeline

Running tests:

# Install pytest
uv pip install pytest

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_feature_engineering.py -v

# Before each commit
pytest tests/ && git commit -m "feat: ..."

11. Documentation Requirements

  • README.md with quick start guide
  • HANDOVER_GUIDE.md for quant analyst
  • Inline code comments for complex logic
  • Results visualization + interpretation
  • Fine-tuning roadmap (Phase 2 guidance)

12. Handover Package Must Include

  • Working zero-shot forecast system
  • All Marimo notebooks (.py) + exported Jupyter (.ipynb)
  • HuggingFace Space with complete environment
  • Performance analysis showing 134 MW MAE achieved
  • Error analysis identifying fine-tuning opportunities
  • Clear Phase 2 roadmap

Geographic Scope (Reference)

Core FBMC Countries (13 countries across 12 bidding zones; DE-LU is a joint zone): AT, BE, HR, CZ, FR, DE-LU, HU, NL, PL, RO, SK, SI

Borders: ~20 interconnections (multivariate forecasting)

OUT OF SCOPE: Nordic FBMC (NO, SE, DK, FI) - Phase 2 only


API Access Confirmed

  • ✓ jao-py library (24 months FBMC data accessible)
  • ✓ ENTSO-E API key (generation, flows)
  • ✓ OpenMeteo API (free tier, 52 grid points)
  • ✓ HuggingFace write token (Datasets upload)

Decision-Making Framework

When uncertain, apply this hierarchy:

  1. Does it extend timeline? → Reject immediately
  2. Does it require fine-tuning? → Phase 2 only
  3. Does it compromise data management? → Never commit data to Git
  4. Does it add features beyond 1,735? → Reject (scope creep)
  5. Does it skip testing/validation? → Add checks immediately
  6. Does it help quant analyst? → Include in handover docs
  7. Does it improve zero-shot accuracy? → Consider if time permits
  8. Does it add complexity? → Default to simplicity
  9. Can you commit and push? → Do it now (frequent commits)

Anti-Patterns to Avoid

❌ Training/fine-tuning the model (Phase 2)
❌ Committing data files to Git repository
❌ Using Git LFS for data storage
❌ Extending beyond 5-day timeline
❌ Adding features beyond 1,735 count
❌ Including Nordic FBMC borders
❌ Building production automation (out of scope)
❌ Creating real-time dashboards (out of scope)
❌ Over-engineering infrastructure
❌ Forgetting to document for handover
❌ Skipping data validation checks
❌ Running full pipeline without smoke tests
❌ Committing without pushing to GitHub


Success Criteria Checklist

At Day 5 completion:

  • Zero-shot forecasts for all ~20 FBMC borders working
  • Inference time <5 minutes per 14-day forecast
  • D+1 MAE ≤ 134 MW (target <150 MW)
  • HuggingFace Space operational at $30/month
  • Complete handover documentation written
  • All Marimo notebooks exported to Jupyter format
  • Git repo <100 MB (code only, no data)
  • Data stored in HuggingFace Datasets (separate)
  • Quant analyst can fork HF Space and continue
  • All tests passing (data validation, feature checks, inference)
  • Git history shows daily commits with descriptive messages
  • GitHub repo synchronized with all commits pushed

Communication Style

When providing updates or recommendations:

  • Lead with impact on 5-day timeline
  • Be direct about scope constraints
  • Suggest alternatives within MVP boundaries
  • Reference Phase 2 for out-of-scope items
  • Document assumptions and limitations
  • Always include next concrete action

Version: 2.0.0
Created: 2025-10-27
Updated: 2025-10-29 (unified with production-grade scope)
Project: FBMC Flow Forecasting MVP (Zero-Shot)
Purpose: Execution rules for Claude during 5-day development