---
tags:
- deep-reinforcement-learning
- reinforcement-learning
- TD3
- continuous-control
library_name: stable-baselines3
model-index:
- name: td3_lunar
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLanderContinuous-v2
      type: LunarLanderContinuous-v2
    metrics:
    - type: mean_reward
      value: 250.00 +/- 50.00
      name: mean_reward
      verified: false
---

# TD3 Model: td3_lunar

## Model Description

This is a trained TD3 (Twin Delayed Deep Deterministic Policy Gradient) agent for the LunarLanderContinuous-v2 environment.

## Environment

- **Environment ID**: `LunarLanderContinuous-v2`
- **Action Space**: Box(2,) - continuous throttle values for the main engine and the side engines
- **Observation Space**: Box(8,) - position, velocity, angle, angular velocity, and leg contact flags

## Training Details

- **Total Timesteps**: 1,000,000
- **Training Time**: 2 hours
- **Framework**: PyTorch
- **Library**: stable-baselines3 (or your custom implementation)

## Hyperparameters

- **Learning Rate (Actor)**: 3e-4
- **Learning Rate (Critic)**: 3e-4
- **Discount Factor (gamma)**: 0.99
- **Tau**: 0.005
- **Policy Noise**: 0.2
- **Noise Clip**: 0.5
- **Policy Delay**: 2
- **Buffer Size**: 1,000,000
- **Batch Size**: 256

## Results

- **Mean Reward**: 250.00 ± 50.00 (over 100 evaluation episodes)

## Usage

```python
import torch
import gymnasium as gym

# Load the actor model
actor = YourActorClass()  # define the same architecture used during training
actor.load_state_dict(torch.load('actor.pth', map_location='cpu'))
actor.eval()

# Run one evaluation episode with the deterministic policy
env = gym.make('LunarLanderContinuous-v2')
state, info = env.reset()
done = False
while not done:
    action = actor(torch.FloatTensor(state)).detach().numpy()
    state, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```

## Files

- `actor.pth`: Actor network weights
- `critic_1.pth`: First critic network weights
- `critic_2.pth`: Second critic network weights
- `config.json`: Model configuration
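The usage snippet above leaves `YourActorClass` as a placeholder. A minimal sketch of a compatible actor is given below: a two-hidden-layer MLP with a tanh output, a common TD3 architecture. The hidden sizes (256, 256) are an assumption, not read from the checkpoint; the class must match the architecture used during training or `load_state_dict` will fail with a shape mismatch.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Illustrative TD3-style actor: maps an 8-dim LunarLander observation
    to a 2-dim action in [-1, 1] via a tanh output layer.

    Hidden sizes (256, 256) are an assumed default, not taken from actor.pth.
    """

    def __init__(self, state_dim=8, action_dim=2, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
            nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        # Scale tanh output to the environment's action bounds
        return self.max_action * self.net(state)
```

If `config.json` records the layer sizes used in training, prefer reading them from there rather than hard-coding the values above.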
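For context on the Tau, Policy Noise, and Noise Clip hyperparameters listed above: they drive TD3's target-policy smoothing and soft (Polyak) target updates. The sketch below shows how these values are typically applied; the function names are illustrative, not part of this repository's code.

```python
import torch


def smoothed_target_action(target_actor, next_state,
                           policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    """TD3 target-policy smoothing: add clipped Gaussian noise to the
    target actor's action, then clip the result to the action bounds."""
    action = target_actor(next_state)
    noise = (torch.randn_like(action) * policy_noise).clamp(-noise_clip, noise_clip)
    return (action + noise).clamp(-max_action, max_action)


def soft_update(target, source, tau=0.005):
    """Polyak averaging of target network parameters:
    target <- tau * source + (1 - tau) * target."""
    with torch.no_grad():
        for tp, sp in zip(target.parameters(), source.parameters()):
            tp.mul_(1.0 - tau).add_(tau * sp)
```

With Policy Delay = 2, the actor and the target networks are updated once for every two critic updates, which is why the critics typically converge faster than the policy early in training.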