mpt-7b-instruct-sharded
What are the steps required to replicate this for mpt-7b-instruct?
Hey - if it's useful, I can take a look at replicating this for mpt-7b-instruct, but it might take me some time to get around to it.
The short version of how to DIY this is:
- load the model as it says on the original MosaicML model card
- if you want to have it on the hub, make a new model repo & clone your repo locally
- follow the transformers docs for saving a sharded model checkpoint & save it and the tokenizer to `my_model_dir`. For this, I used `model.save_pretrained(my_model_dir, max_shard_size="2GB")`, but you can change the shard size as needed (see the sketch after this list)
- to add basic support for `device_map="auto"`, gradient checkpointing, etc., update the relevant `.py` files as on this model - see the commit history - now you can use it like this one/push to hub/etc
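For anyone who wants a copy-pasteable starting point, here's a rough sketch of the load + shard steps above - treat it as an outline under my assumptions rather than the exact code I ran; the local directory name `my_model_dir` is just a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# load as described on the original MosaicML model card
# (trust_remote_code=True is needed because MPT ships custom modeling code)
model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
# the MPT card points to the EleutherAI/gpt-neox-20b tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# save a sharded checkpoint plus the tokenizer locally;
# adjust max_shard_size as needed (I used 2GB shards)
my_model_dir = "mpt-7b-instruct-sharded"  # placeholder path / local clone of your hub repo
model.save_pretrained(my_model_dir, max_shard_size="2GB")
tokenizer.save_pretrained(my_model_dir)
```

After that, updating the custom `.py` files and pushing the directory to your hub repo is as described in the last bullet.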
@pszemraj I was able to replicate this easily with the instructions you provided. For anyone interested, the resulting model weights are available at jprafael/mpt-7b-instruct-sharded.
awesome! great stuff. BTW, I am discussing with a user on this discussion post - there may be some additional updates needed to make sure everything works with `device_map="auto"` specifically in a multi-GPU setup. I have tested inference and fine-tuning on a single GPU and everything works fine, so don't worry about this if multi-GPU is irrelevant for you 👍 (a single-GPU loading sketch is below)
I'll reply here/ping you if/when that happens, but just FYI.
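For reference, a minimal sketch of how single-GPU loading looks, assuming the `jprafael/mpt-7b-instruct-sharded` repo id from above (or your own sharded copy) and that `accelerate` is installed for `device_map="auto"`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" (via accelerate) places the shards on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    "jprafael/mpt-7b-instruct-sharded",  # or your own sharded repo / local dir
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("jprafael/mpt-7b-instruct-sharded")
```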
Currently I'm just using a single GPU, but I'm happy to incorporate the changes on my side when they're done.
will keep you posted!