bubbliiiing committed
Commit b3798d8
Parent(s): f754113

Update Readme

- README.md +90 -23
- README_en.md +90 -23

README.md CHANGED

@@ -33,6 +33,8 @@ tasks:

😊 Welcome!

[English](./README_en.md) | Simplified Chinese

# Table of Contents

@@ -52,6 +54,7 @@ CogVideoX-Fun is a pipeline modified from the CogVideoX architecture; it is a
We will gradually support quick starts from different platforms; see [Quick Start](#快速启动).

What's New:
- Code created! Now supports Windows and Linux. Supports video generation with the 2b and 5b models at any resolution from 256x256x49 up to 1024x1024x49. [ 2024.09.18 ]

Feature overview:

@@ -95,10 +98,10 @@ cd CogVideoX-Fun
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

-wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz

cd models/Diffusion_Transformer/
-tar -xvf CogVideoX-Fun-2b-InP.tar.gz
cd ../../
```

@@ -130,8 +133,8 @@ Linux details:
```
📦 models/
├── 📂 Diffusion_Transformer/
-│ ├── 📂 CogVideoX-Fun-2b-InP/
-│ └── 📂 CogVideoX-Fun-5b-InP/
├── 📂 Personalized_Model/
│ └── your trained transformer model / your trained lora model (for UI load)
```

@@ -139,42 +142,43 @@ Linux details:
# Video Results
All results shown were obtained with image-to-video generation.

-### CogVideoX-Fun-5B

Resolution-1024

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
</tr>
</table>

Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
</tr>
</table>

@@ -184,41 +188,92 @@ Resolution-512
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
</tr>
</table>

-### CogVideoX-Fun-

Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
</tr>
</table>


# How to Use


@@ -318,6 +373,18 @@ sh scripts/train.sh
For details on some of the parameter settings, see [Readme Train](scripts/README_TRAIN.md) and [Readme Lora](scripts/README_TRAIN_LORA.md).

# Model Zoo
| Name | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|
| CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |

@@ -335,4 +402,4 @@ sh scripts/train.sh

The CogVideoX-2B model (including its corresponding Transformers module and VAE module) is released under the [Apache 2.0 License](LICENSE).

-The CogVideoX-5B model (Transformer module) is released under the [CogVideoX License](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE).


😊 Welcome!

+[](https://huggingface.co/spaces/alibaba-pai/CogVideoX-Fun-5b)
+
[English](./README_en.md) | Simplified Chinese

# Table of Contents
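
We will gradually support quick starts from different platforms; see [Quick Start](#快速启动).

What's New:
+- Retrained the i2v model with added noise so that generated videos have a larger motion amplitude. Uploaded the control-model training code and the Control models. [ 2024.09.29 ]
- Code created! Now supports Windows and Linux. Supports video generation with the 2b and 5b models at any resolution from 256x256x49 up to 1024x1024x49. [ 2024.09.18 ]

Feature overview: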

mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

+wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz

cd models/Diffusion_Transformer/
+tar -xvf CogVideoX-Fun-V1.1-2b-InP.tar.gz
cd ../../
```
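
The `wget` / `cd` / `tar` / `cd ../../` sequence above can also be wrapped in two small bash functions. This is a minimal sketch, not part of the repository: it assumes bash, GNU `wget`, and a `tar` that detects gzip compression on its own. The script name `fetch_weights.sh` is hypothetical, and reusing the same OSS URL pattern for checkpoints other than `CogVideoX-Fun-V1.1-2b-InP` is an assumption, since the README only shows that one URL.

```shell
#!/usr/bin/env bash
# Sketch: the README's download/extract steps as two reusable functions.
# Assumption: checkpoints other than CogVideoX-Fun-V1.1-2b-InP follow the same
# OSS naming scheme; only that one URL appears in the README.
set -euo pipefail

OSS_BASE="https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer"
MODEL_DIR="${MODEL_DIR:-models/Diffusion_Transformer}"

# Save <name>.tar.gz into MODEL_DIR; -c resumes an interrupted download and
# -P keeps the file name from the URL.
download_weights() {
  local name="$1"
  mkdir -p "$MODEL_DIR"
  wget -c -P "$MODEL_DIR" "$OSS_BASE/$name.tar.gz"
}

# Unpack <name>.tar.gz next to itself; tar -C replaces the cd ... / cd ../../ pair.
extract_weights() {
  local name="$1"
  tar -xf "$MODEL_DIR/$name.tar.gz" -C "$MODEL_DIR"
}

# Hypothetical usage: bash fetch_weights.sh CogVideoX-Fun-V1.1-2b-InP
if [[ $# -gt 0 ]]; then
  download_weights "$1"
  extract_weights "$1"
fi
```

Extracting with `tar -C` leaves the working directory unchanged, so a failed step does not leave the shell inside `models/Diffusion_Transformer/`. Going by the model table (9.7 GB packed, 13.0 GB unpacked), the 2b archive needs roughly 22.7 GB of free space until the `.tar.gz` is deleted.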

```
📦 models/
├── 📂 Diffusion_Transformer/
+│ ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
+│ └── 📂 CogVideoX-Fun-V1.1-5b-InP/
├── 📂 Personalized_Model/
│ └── your trained transformer model / your trained lora model (for UI load)
```
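
To confirm that downloaded weights ended up where the tree above expects them, a small bash check can list any missing directories. This is a sketch: the function name `check_models_layout` is hypothetical, and its default checkpoint names are simply the two V1.1 InP folders shown in the tree.

```shell
#!/usr/bin/env bash
# Sketch: list directories from the models/ tree above that do not exist yet.
# Hypothetical helper; the default names are the two V1.1 InP folders in the tree.
set -euo pipefail

# Usage: check_models_layout [root] [checkpoint-name ...]
# Prints "missing: <dir>" for each absent directory; returns 1 if any are missing.
check_models_layout() {
  local root="${1:-models}"
  if [[ $# -gt 0 ]]; then shift; fi
  local -a names=("$@")
  if [[ ${#names[@]} -eq 0 ]]; then
    names=(CogVideoX-Fun-V1.1-2b-InP CogVideoX-Fun-V1.1-5b-InP)
  fi

  # Personalized_Model is always expected; one Diffusion_Transformer subfolder per name.
  local -a dirs=("$root/Personalized_Model")
  local name dir
  local missing=0
  for name in "${names[@]}"; do
    dirs+=("$root/Diffusion_Transformer/$name")
  done
  for dir in "${dirs[@]}"; do
    if [[ ! -d "$dir" ]]; then
      echo "missing: $dir"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_models_layout models CogVideoX-Fun-V1.1-2b-InP` prints nothing and returns 0 once `models/Personalized_Model/` and the 2b folder exist; otherwise it prints one `missing:` line per absent directory and returns 1.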

# Video Results
All results shown were obtained with image-to-video generation.

+### CogVideoX-Fun-V1.1-5B

Resolution-1024

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
+<video src="https://github.com/user-attachments/assets/34e7ec8f-293e-4655-bb14-5e1ee476f788" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/7809c64f-eb8c-48a9-8bdc-ca9261fd5434" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/8e76aaa4-c602-44ac-bcb4-8b24b72c386c" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/19dba894-7c35-4f25-b15c-384167ab3b03" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

+
Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
+<video src="https://github.com/user-attachments/assets/0bc339b9-455b-44fd-8917-80272d702737" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/70a043b9-6721-4bd9-be47-78b7ec5c27e9" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/d5dd6c09-14f3-40f8-8b6d-91e26519b8ac" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/9327e8bc-4f17-46b0-b50d-38c250a9483a" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
+<video src="https://github.com/user-attachments/assets/ef407030-8062-454d-aba3-131c21e6b58c" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/7610f49e-38b6-4214-aa48-723ae4d1b07e" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/1fff0567-1e15-415c-941e-53ee8ae2c841" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/bcec48da-b91b-43a0-9d50-cf026e00fa4f" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

+### CogVideoX-Fun-V1.1-5B-Pose
+
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+<tr>
+<td>
+Resolution-512
+</td>
+<td>
+Resolution-768
+</td>
+<td>
+Resolution-1024
+</td>
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/a746df51-9eb7-4446-bee5-2ee30285c143" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/db295245-e6aa-43be-8c81-32cb411f1473" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/ec9875b2-fde0-48e1-ab7e-490cee51ef40" width="100%" controls autoplay loop></video>
+</td>
+</tr>
+</table>
+
+### CogVideoX-Fun-V1.1-2B

Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
+<video src="https://github.com/user-attachments/assets/03235dea-980e-4fc5-9c41-e40a5bc1b6d0" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/f7302648-5017-47db-bdeb-4d893e620b37" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/cbadf411-28fa-4b87-813d-da63ff481904" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/87cc9d0b-b6fe-4d2d-b447-174513d169ab" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

+### CogVideoX-Fun-V1.1-2B-Pose
+
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+<tr>
+<td>
+Resolution-512
+</td>
+<td>
+Resolution-768
+</td>
+<td>
+Resolution-1024
+</td>
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/487bcd7b-1b7f-4bb4-95b5-96a6b6548b3e" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/2710fd18-8489-46e4-8086-c237309ae7f6" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/b79513db-7747-4512-b86c-94f9ca447fe2" width="100%" controls autoplay loop></video>
+</td>
+</tr>
+</table>

# How to Use


For details on some of the parameter settings, see [Readme Train](scripts/README_TRAIN.md) and [Readme Lora](scripts/README_TRAIN_LORA.md).

# Model Zoo
+
+V1.1:
+
+| Name | Storage Space | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|
+| CogVideoX-Fun-V1.1-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-InP) | Official image-to-video weights. Noise was added, so the motion amplitude is larger than in V1.0. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
+| CogVideoX-Fun-V1.1-5b-InP.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP) | Official image-to-video weights. Noise was added, so the motion amplitude is larger than in V1.0. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
+| CogVideoX-Fun-V1.1-2b-Pose.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-Pose) | Official pose-controlled video generation weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
+| CogVideoX-Fun-V1.1-5b-Pose.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-Pose) | Official pose-controlled video generation weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
+
+V1.0:
+
| Name | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|
| CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |


The CogVideoX-2B model (including its corresponding Transformers module and VAE module) is released under the [Apache 2.0 License](LICENSE).

+The CogVideoX-5B model (Transformer module) is released under the [CogVideoX License](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE).

README_en.md CHANGED

@@ -23,6 +23,7 @@ CogVideoX-Fun is a modified pipeline based on the CogVideoX structure, designed
We will gradually support quick starts from different platforms; refer to [Quick Start](#quick-start).

What's New:
- Code created! Now supports Windows and Linux. Supports the 2b and 5b models and video generation at any resolution from 256x256x49 to 1024x1024x49. [ 2024.09.18 ]

Features:

@@ -68,10 +69,10 @@ cd CogVideoX-Fun
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

-wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz

cd models/Diffusion_Transformer/
-tar -xvf CogVideoX-Fun-2b-InP.tar.gz
cd ../../
```

@@ -103,8 +104,8 @@ We recommend placing the [weights](#model-zoo) in the following directories:
```
📦 models/
├── 📂 Diffusion_Transformer/
-│ ├── 📂 CogVideoX-Fun-2b-InP/
-│ └── 📂 CogVideoX-Fun-5b-InP/
├── 📂 Personalized_Model/
│ └── your trained transformer model / your trained lora model (for UI load)
```

@@ -112,42 +113,43 @@ We recommend placing the [weights](#model-zoo) in the following directories:
# Video Results
All results shown were generated from images (image-to-video).

-### CogVideoX-Fun-5B

Resolution-1024

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
</tr>
</table>

Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
</tr>
</table>

@@ -157,35 +159,89 @@ Resolution-512
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
</tr>
</table>

-### CogVideoX-Fun-

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
</td>
</tr>
</table>

@@ -283,11 +339,22 @@ Then, we run scripts/train.sh.
sh scripts/train.sh
```

-For details on some of the parameter settings, please refer to [Readme Train](scripts/README_TRAIN.md)


# Model zoo

| Name | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|
| CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Our official image-to-video model. It can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. |

We will gradually support quick starts from different platforms; refer to [Quick Start](#quick-start).

What's New:
+- Retrained the i2v model with added noise to increase the motion amplitude of generated videos. Uploaded the control-model training code and the Control models. [ 2024.09.29 ]
- Code created! Now supports Windows and Linux. Supports the 2b and 5b models and video generation at any resolution from 256x256x49 to 1024x1024x49. [ 2024.09.18 ]
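
The 2024.09.18 entry quotes a supported range of 256x256x49 up to 1024x1024x49. A hypothetical bash helper can check a requested size against that range and report the clip length at the 8 fps training rate listed in the model zoo. The README does not say how height, width, and frame count may vary inside the range, so this sketch assumes each side may vary independently within [256, 1024] and treats 49 frames as an upper bound; it is not a check that CogVideoX-Fun itself performs.

```shell
#!/usr/bin/env bash
# Sketch: check a HEIGHTxWIDTHxFRAMES request against the changelog range
# (256x256x49 up to 1024x1024x49) and print the clip length at 8 fps.
# Assumptions: height and width vary independently within [256, 1024],
# frames are capped at 49, and 8 fps (the training rate) is the playback rate.
set -euo pipefail

MIN_SIDE=256
MAX_SIDE=1024
MAX_FRAMES=49
FPS=8

check_request() {
  local spec="$1" h w f
  IFS=x read -r h w f <<< "$spec"
  # Reject anything that is not three positive integers without leading zeros.
  if ! [[ "$h" =~ ^[1-9][0-9]*$ && "$w" =~ ^[1-9][0-9]*$ && "$f" =~ ^[1-9][0-9]*$ ]]; then
    echo "malformed: $spec" >&2
    return 2
  fi
  if (( h < MIN_SIDE || h > MAX_SIDE || w < MIN_SIDE || w > MAX_SIDE )); then
    echo "resolution out of range: $spec" >&2
    return 1
  fi
  if (( f > MAX_FRAMES )); then
    echo "too many frames: $spec" >&2
    return 1
  fi
  # Clip length in seconds = frames / fps, e.g. 49 / 8 = 6.125.
  awk -v n="$f" -v fps="$FPS" 'BEGIN { printf "%.3f\n", n / fps }'
}
```

For example, `check_request 768x1024x49` prints `6.125` (49 frames at 8 fps), while `check_request 2048x512x49` prints an error to stderr and returns 1.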

Features:

mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

+wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz

cd models/Diffusion_Transformer/
+tar -xvf CogVideoX-Fun-V1.1-2b-InP.tar.gz
cd ../../
```

```
📦 models/
├── 📂 Diffusion_Transformer/
+│ ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
+│ └── 📂 CogVideoX-Fun-V1.1-5b-InP/
├── 📂 Personalized_Model/
│ └── your trained transformer model / your trained lora model (for UI load)
```

# Video Results
All results shown were generated from images (image-to-video).

+### CogVideoX-Fun-V1.1-5B

Resolution-1024

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
+<video src="https://github.com/user-attachments/assets/34e7ec8f-293e-4655-bb14-5e1ee476f788" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/7809c64f-eb8c-48a9-8bdc-ca9261fd5434" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/8e76aaa4-c602-44ac-bcb4-8b24b72c386c" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/19dba894-7c35-4f25-b15c-384167ab3b03" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

+
Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
+<video src="https://github.com/user-attachments/assets/0bc339b9-455b-44fd-8917-80272d702737" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/70a043b9-6721-4bd9-be47-78b7ec5c27e9" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/d5dd6c09-14f3-40f8-8b6d-91e26519b8ac" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/9327e8bc-4f17-46b0-b50d-38c250a9483a" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
+<video src="https://github.com/user-attachments/assets/ef407030-8062-454d-aba3-131c21e6b58c" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/7610f49e-38b6-4214-aa48-723ae4d1b07e" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/1fff0567-1e15-415c-941e-53ee8ae2c841" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/bcec48da-b91b-43a0-9d50-cf026e00fa4f" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

+### CogVideoX-Fun-V1.1-5B-Pose

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
+Resolution-512
+</td>
+<td>
+Resolution-768
+</td>
+<td>
+Resolution-1024
+</td>
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/a746df51-9eb7-4446-bee5-2ee30285c143" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/db295245-e6aa-43be-8c81-32cb411f1473" width="100%" controls autoplay loop></video>
</td>
<td>
+<video src="https://github.com/user-attachments/assets/ec9875b2-fde0-48e1-ab7e-490cee51ef40" width="100%" controls autoplay loop></video>
</td>
+</tr>
+</table>
+
+### CogVideoX-Fun-V1.1-2B
+
+Resolution-768
+
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/03235dea-980e-4fc5-9c41-e40a5bc1b6d0" width="100%" controls autoplay loop></video>
+</td>
<td>
+<video src="https://github.com/user-attachments/assets/f7302648-5017-47db-bdeb-4d893e620b37" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/cbadf411-28fa-4b87-813d-da63ff481904" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/87cc9d0b-b6fe-4d2d-b447-174513d169ab" width="100%" controls autoplay loop></video>
+</td>
+</tr>
+</table>
+
+### CogVideoX-Fun-V1.1-2B-Pose
+
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+<tr>
+<td>
+Resolution-512
+</td>
+<td>
+Resolution-768
+</td>
+<td>
+Resolution-1024
+</td>
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/487bcd7b-1b7f-4bb4-95b5-96a6b6548b3e" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/2710fd18-8489-46e4-8086-c237309ae7f6" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/b79513db-7747-4512-b86c-94f9ca447fe2" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

sh scripts/train.sh
```

+For details on some of the parameter settings, please refer to [Readme Train](scripts/README_TRAIN.md), [Readme Lora](scripts/README_TRAIN_LORA.md) and [Readme Control](scripts/README_TRAIN_CONTROL.md).



# Model zoo

+V1.1:
+
+| Name | Storage Space | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|
+| CogVideoX-Fun-V1.1-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-InP) | Our official image-to-video model. It can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. Noise has been added to the reference image, so the motion amplitude is larger than in V1.0. |
+| CogVideoX-Fun-V1.1-5b-InP.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP) | Our official image-to-video model. It can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. Noise has been added to the reference image, so the motion amplitude is larger than in V1.0. |
+| CogVideoX-Fun-V1.1-2b-Pose.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-Pose) | Our official pose-control video model. It can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. |
+| CogVideoX-Fun-V1.1-5b-Pose.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-Pose) | Our official pose-control video model. It can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. |
+
+V1.0:
+
| Name | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|
| CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Our official image-to-video model. It can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. |
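
Each row in the model tables also links a Hugging Face repository under `alibaba-pai/`, so the weights can in principle be fetched without the OSS tarball. This is a minimal sketch that assumes the `huggingface_hub` command-line tool (`huggingface-cli download <repo> --local-dir <dir>`) is installed. Whether each repository's file layout matches the unpacked `.tar.gz` is not stated in the README and is an assumption, as is the hypothetical script name `fetch_from_hf.sh`.

```shell
#!/usr/bin/env bash
# Sketch: fetch a checkpoint from its Hugging Face link instead of the OSS tarball.
# Assumptions: huggingface-cli (from huggingface_hub) is installed, and each repo
# holds the same files as the unpacked .tar.gz; neither is stated in the README.
set -euo pipefail

MODEL_DIR="${MODEL_DIR:-models/Diffusion_Transformer}"

# "CogVideoX-Fun-V1.1-2b-Pose.tar.gz" -> "alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose",
# matching the Hugging Face column of the tables.
hf_repo_for() {
  local name="${1%.tar.gz}"
  echo "alibaba-pai/$name"
}

# Download into models/Diffusion_Transformer/<name>/, the folder the tree expects.
fetch_from_hf() {
  local name="${1%.tar.gz}"
  huggingface-cli download "$(hf_repo_for "$name")" --local-dir "$MODEL_DIR/$name"
}

# Hypothetical usage: bash fetch_from_hf.sh CogVideoX-Fun-V1.1-2b-Pose.tar.gz
if [[ $# -gt 0 ]]; then
  fetch_from_hf "$1"
fi
```

Accepting either the bare name or the `.tar.gz` file name from the table lets the same argument work for both this script and the OSS route.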