cagataydev committed
Commit a86276c · verified · 1 Parent(s): 1827da5

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +65 -116
  2. trainer_state.json +0 -9
  3. training_args.bin +2 -2
README.md CHANGED
@@ -1,144 +1,93 @@
- # GR00T Fruit-6K: Robotics Vision-Action Model
-
- This repository contains a fine-tuned GR00T (N1.5-3B) model trained on fruit manipulation tasks, optimized at 6,000 training steps.
-
- ## Model Overview
-
- - **Base Model**: NVIDIA GR00T N1.5-3B (Generalist Robot 00 Technology)
- - **Training Steps**: 6,000 (optimal checkpoint based on evaluation analysis)
- - **Task**: Fruit manipulation and handling using a robotic arm
- - **Dataset**: Wholettheducksout dataset with single-arm configuration
- - **Model Size**: ~7.58 GB total model weights
-
- ## Key Training Metrics
-
- - **Final Training Loss**: 0.036353
- - **Training Configuration**: Single-arm embodiment with front and wrist camera setup
- - **Action Dimension**: 32 (padded maximum; the effective single-arm + gripper action is 6D)
- - **Vision Input**: Dual-camera setup (640x480 resolution, 30 FPS)
- - **Action Horizon**: 16 timesteps
- - **Diffusion Steps**: 4 inference timesteps
-
- ## Architecture Details
-
- ### Action Head Configuration
- - **Diffusion Model**: 16 layers, 32 attention heads
- - **Hidden Size**: 1024
- - **Cross-attention Dimension**: 2048
- - **Backbone Embedding**: 2048 dimensions
- - **State/Action Encoding**: Multi-layer projections
-
- ### Vision Backbone
- - **Eagle Model**: Qwen3-1.7B + SigLIP-400M hybrid architecture
- - **Visual Layers**: 27 encoder layers with self-attention
- - **Language Model**: 12 layers with RMSNorm and SwiGLU activation
- - **Flash Attention**: Enabled for efficient processing
-
- ## File Structure
-
  ```
- checkpoint-6000/
- ├── config.json                        # Model configuration
- ├── model-00001-of-00002.safetensors   # Model weights (part 1)
- ├── model-00002-of-00002.safetensors   # Model weights (part 2)
- ├── model.safetensors.index.json       # Weight mapping index
- ├── trainer_state.json                 # Training state and metrics
- ├── optimizer.pt                       # Optimizer state
- ├── scheduler.pt                       # Learning rate scheduler
- ├── rng_state.pth                      # Random number generator state
- └── experiment_cfg/
-     └── metadata.json                  # Embodiment and modality configuration
-
- tensorboard_logs/
- └── Oct17_23-05-33_ip-172-31-3-77/
-     └── events.out.tfevents.*          # TensorBoard training metrics
  ```
-
- ## Training Analysis
-
- Based on comprehensive evaluation, the 6K checkpoint demonstrates superior generalization compared to longer-trained models (200K steps). Key findings:
-
- ### Why 6K Steps Is Optimal
-
- 1. **Balanced Learning**: The model learned general patterns without memorizing specific examples
- 2. **Generalization**: Better performance on unseen test scenarios
- 3. **Training Efficiency**: Optimal compute-to-performance ratio
- 4. **Overfitting Avoidance**: Stopped before the model began fitting training noise
-
- ### Training Loss Evolution
- - **Initial Rapid Learning** (0-2K steps): Loss dropped from 0.778 to ~0.2
- - **Steady Refinement** (2K-5K steps): Gradual improvement to ~0.05
- - **Fine-tuning Phase** (5K-6K steps): Final optimization to 0.036
-
  ## Usage

- ### Loading the Model
-
  ```python
- from transformers import AutoModelForCausalLM
-
- # Load the fine-tuned 6K checkpoint
- model = AutoModelForCausalLM.from_pretrained("cagataydev/gr00t-fruit-6k", subfolder="checkpoint-6000")
  ```
-
- ### TensorBoard Visualization
-
- To view the training metrics:
-
- ```bash
- # Clone the repository
- git clone https://huggingface.co/cagataydev/gr00t-fruit-6k
- cd gr00t-fruit-6k
-
- # Launch TensorBoard
- tensorboard --logdir=tensorboard_logs/
- ```
-
- Then navigate to `http://localhost:6006` to view:
- - Training loss curves
- - Learning rate schedules
- - Gradient norms
- - Step-by-step training progress
-
- ## Evaluation Metrics
-
- The model was evaluated against longer-trained variants and showed:
- - **Superior test performance** vs. the 200K-step model
- - **Efficient training curve** with clear convergence
- - **Stable gradient norms** throughout training
- - **Optimal stopping point** identified at 6K steps
-
- ## Embodiment Configuration
-
- - **State Space**: 6D (5D single arm + 1D gripper)
- - **Action Space**: 6D (5D single arm + 1D gripper)
- - **Vision Modalities**: Front camera + wrist camera
- - **Control Frequency**: 30 Hz
- - **Planning Horizon**: 16 timesteps
-
- ## Technical Specifications
-
- - **Model Type**: `gr00t_n1_5`
- - **Compute Type**: `bfloat16`
- - **Model Precision**: `float32`
- - **Flash Attention**: Enabled
- - **Vision Tuning**: Enabled (backbone frozen, visual layers tuned)
- - **Diffusion Inference**: 4 timesteps with noise scheduling
-
- ## Performance Characteristics
-
- This 6K checkpoint represents the optimal balance of:
- - **Learning Capability**: Sufficient training to master fruit manipulation
- - **Generalization**: Avoids overfitting to specific training examples
- - **Computational Efficiency**: Minimal training time for maximum performance
- - **Deployment Readiness**: Stable, production-ready model weights
-
- ## Citation
-
- If you use this model in your research, please consider citing the original GR00T paper and this fine-tuned variant.
-
- ---
-
- **Created by**: cagataydev
- **Training Date**: October 2024
- **Optimization**: 6K steps identified as optimal through systematic evaluation
 
+ ---
+ license: apache-2.0
+ tags:
+ - robotics
+ - embodied-ai
+ - fruit-manipulation
+ - gr00t
+ - nvidia
+ - pytorch
+ - fine-tuned
+ datasets:
+ - aaronsu11/so101_fruit
+ library_name: transformers
+ pipeline_tag: robotics
+ base_model: nvidia/GR00T-N1.5-3B
+ model_type: gr00t
+ language:
+ - en
+ ---

+ # GR00T Fruit Manipulation Model

+ ## Model Description

+ This is a GR00T model fine-tuned for fruit manipulation tasks. It was trained for 6,000 steps on fruit handling and manipulation scenarios.

+ ## Training Details

+ - **Model Architecture**: GR00T-N1.5-3B
+ - **Training Steps**: 6,000
+ - **Training Duration**: ~2 hours
+ - **Batch Size**: 32
+ - **Data Configuration**: `so100_dualcam` (see the sketch after this list)
+ - **Embodiment**: `new_embodiment` (custom embodiment configuration)
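
+ The `so100_dualcam` entry above is a named data config. A minimal sketch of looking it up, assuming the registry layout of the public Isaac-GR00T repository (module and method names may differ in your installed version):

+ ```python
+ # Assumed import path from the public Isaac-GR00T repo; verify locally.
+ from gr00t.experiment.data_config import DATA_CONFIG_MAP
+
+ data_config = DATA_CONFIG_MAP["so100_dualcam"]   # dual-camera SO-100/SO-101 setup
+ modality_config = data_config.modality_config()  # camera/state keys the model expects
+ transforms = data_config.transform()             # preprocessing for raw observations
+ ```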
 
+ ## Dataset

+ This model was trained using the **so101_fruit** dataset, which contains fruit manipulation demonstrations.

+ **Original Dataset Source**: [https://huggingface.co/datasets/aaronsu11/so101_fruit](https://huggingface.co/datasets/aaronsu11/so101_fruit)

+ Please cite the original dataset when using this model:

  ```
+ @dataset{aaronsu11_so101_fruit,
+   title={SO101 Fruit Dataset},
+   author={aaronsu11},
+   url={https://huggingface.co/datasets/aaronsu11/so101_fruit},
+   year={2024}
+ }
  ```
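
+ The dataset appears to be published in LeRobot format; a sketch for inspecting it locally, assuming the `lerobot` package's dataset loader (the import path varies across lerobot versions):

+ ```python
+ # Assumed import path; check your installed lerobot version.
+ from lerobot.common.datasets.lerobot_dataset import LeRobotDataset
+
+ ds = LeRobotDataset("aaronsu11/so101_fruit")  # pulls episodes from the Hugging Face Hub
+ print(ds.num_episodes, len(ds))               # episode count and total frame count
+ sample = ds[0]                                # dict of camera frames, state, and action
+ ```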
 
+ ## Capabilities

+ This model is designed for:
+ - Fruit handling and manipulation tasks
+ - Object grasping and placement
+ - Robotic manipulation in kitchen/food preparation scenarios

  ## Usage

+ Load the model using the standard GR00T inference pipeline:

  ```python
+ # Example usage with GR00T inference
+ from gr00t_inference import GR00TModel
+
+ model = GR00TModel.from_pretrained("cagataydev/gr00t-fruit-6k")
+ # Use for fruit manipulation tasks
  ```
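
+ The `gr00t_inference` import above is illustrative. A sketch of the same load via the open-source Isaac-GR00T policy wrapper, where the class name, constructor arguments, and observation keys are assumptions based on the public Isaac-GR00T repository:

+ ```python
+ # Assumed names from the public Isaac-GR00T repo; verify locally.
+ from gr00t.experiment.data_config import DATA_CONFIG_MAP
+ from gr00t.model.policy import Gr00tPolicy
+
+ data_config = DATA_CONFIG_MAP["so100_dualcam"]
+ policy = Gr00tPolicy(
+     model_path="cagataydev/gr00t-fruit-6k",
+     embodiment_tag="new_embodiment",
+     modality_config=data_config.modality_config(),
+     modality_transform=data_config.transform(),
+ )
+
+ # `obs` must be a dict of camera frames and robot state whose keys match
+ # the modality config (front/wrist cameras + arm and gripper state):
+ # action_chunk = policy.get_action(obs)
+ ```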
 
+ ## Model Files

+ The repository contains:
+ - `model-00001-of-00002.safetensors` & `model-00002-of-00002.safetensors`: Model weights
+ - `config.json`: Model configuration
+ - `model.safetensors.index.json`: Weight-mapping index
+ - `trainer_state.json`: Training state information
+ - `training_args.bin`: Training arguments
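
+ To fetch all of these files locally, `huggingface_hub` can mirror the repository:

+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Downloads every file listed above and returns the local snapshot path.
+ local_dir = snapshot_download(repo_id="cagataydev/gr00t-fruit-6k")
+ print(local_dir)
+ ```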
 
 
+ ## Training Infrastructure

+ - **Platform**: Ubuntu
+ - **Compute**: Single GPU
+ - **Framework**: GR00T training pipeline
+ - **Checkpoints**: Saved every 2,000 steps
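
+ The exact run hyperparameters live in `training_args.bin`, which Hugging Face `Trainer` runs typically save as a pickled `TrainingArguments` object. A sketch for inspecting it, assuming this file follows that convention:

+ ```python
+ import torch
+
+ # weights_only=False because this is a pickled Python object, not tensors;
+ # only unpickle files from sources you trust.
+ args = torch.load("training_args.bin", weights_only=False)
+ print(args.per_device_train_batch_size, args.max_steps, args.save_steps)
+ ```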
 
+ ## License

+ Please refer to the original dataset license and GR00T model license for usage terms.

+ ## Acknowledgments

+ Special thanks to the creators of the original SO101 Fruit dataset for providing high-quality training data for robotic manipulation research.
 
 
trainer_state.json CHANGED
@@ -4208,15 +4208,6 @@
  "learning_rate": 7.594339912486703e-12,
  "loss": 0.0155,
  "step": 6000
- },
- {
- "epoch": 4.87012987012987,
- "step": 6000,
- "total_flos": 0.0,
- "train_loss": 0.036353461609532435,
- "train_runtime": 6781.7376,
- "train_samples_per_second": 28.311,
- "train_steps_per_second": 0.885
  }
  ],
  "logging_steps": 10,
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:53b2f21ba255d343422c9703b7275bf05abe3dd421dd558ee4637405fe6f0c22
- size 5304
+ oid sha256:8e1039c945ace050bc2c045345c6c9addb640349becd4812d2abb5daf8f02feb
+ size 129