---
tags:
- deep-reinforcement-learning
- reinforcement-learning
- TD3
- continuous-control
library_name: stable-baselines3
model-index:
- name: td3_lunar
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLanderContinuous-v2
      type: LunarLanderContinuous-v2
    metrics:
    - type: mean_reward
      value: 250.00 +/- 50.00
      name: mean_reward
      verified: false
---

# TD3 Model: td3_lunar

## Model Description

This is a trained TD3 (Twin Delayed Deep Deterministic Policy Gradient) agent for the LunarLanderContinuous-v2 environment.

## Environment

- **Environment ID**: `LunarLanderContinuous-v2`
- **Action Space**: Box(2,) - continuous throttle values for the main engine and the side engines
- **Observation Space**: Box(8,) - position, velocity, angle, angular velocity, and leg contact flags

## Training Details

- **Total Timesteps**: 1,000,000
- **Training Time**: 2 hours
- **Framework**: PyTorch
- **Library**: stable-baselines3 (or your custom implementation)

## Hyperparameters

- **Learning Rate (Actor)**: 3e-4
- **Learning Rate (Critic)**: 3e-4
- **Discount Factor (gamma)**: 0.99
- **Tau**: 0.005
- **Policy Noise**: 0.2
- **Noise Clip**: 0.5
- **Policy Delay**: 2
- **Buffer Size**: 1,000,000
- **Batch Size**: 256

## Results

- **Mean Reward**: 250.00 ± 50.00 (over 100 evaluation episodes)

## Usage

```python
import torch
import gymnasium as gym

# Load the actor model
actor = YourActorClass()  # define the same architecture used during training
actor.load_state_dict(torch.load('actor.pth', map_location='cpu'))
actor.eval()

# Run one evaluation episode with the deterministic policy
env = gym.make('LunarLanderContinuous-v2')
state, info = env.reset()
done = False
while not done:
    action = actor(torch.FloatTensor(state)).detach().numpy()
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```

## Files

- `actor.pth`: Actor network weights
- `critic_1.pth`: First critic network weights
- `critic_2.pth`: Second critic network weights
- `config.json`: Model configuration
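The usage snippet above leaves `YourActorClass` as a placeholder. A minimal sketch of a compatible actor is given below: a two-hidden-layer MLP with a tanh output, a common TD3 architecture. The hidden sizes (256, 256) are an assumption, not read from the checkpoint; the class must match the architecture used during training or `load_state_dict` will fail with a shape mismatch.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Illustrative TD3-style actor: maps an 8-dim LunarLander observation
    to a 2-dim action in [-1, 1] via a tanh output layer.

    Hidden sizes (256, 256) are an assumed default, not taken from actor.pth.
    """

    def __init__(self, state_dim=8, action_dim=2, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
            nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        # Scale tanh output to the environment's action bounds
        return self.max_action * self.net(state)
```

If `config.json` records the layer sizes used in training, prefer reading them from there rather than hard-coding the values above.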
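For context on the Tau, Policy Noise, and Noise Clip hyperparameters listed above: they drive TD3's target-policy smoothing and soft (Polyak) target updates. The sketch below shows how these values are typically applied; the function names are illustrative, not part of this repository's code.

```python
import torch


def smoothed_target_action(target_actor, next_state,
                           policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """TD3 target-policy smoothing: add clipped Gaussian noise to the
    target actor's action, then clip the result to the action bounds."""
    action = target_actor(next_state)
    noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
    return (action + noise).clamp(-max_action, max_action)


def soft_update(target, source, tau=0.005):
    """Polyak averaging of target network parameters:
    target <- tau * source + (1 - tau) * target."""
    with torch.no_grad():
        for tp, sp in zip(target.parameters(), source.parameters()):
            tp.mul_(1.0 - tau).add_(tau * sp)
```

With Policy Delay = 2, the actor and the target networks are updated once for every two critic updates, which is why the critics typically converge faster than the policy early in training.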