Denis Matveev
sodeniZz
Parameter-Efficient Fine-Tuning (LoRA & DoRA & QLoRA)
A collection of parameter-efficient fine-tuning experiments for sentiment classification using chat-based instruction tuning
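For context on the adapter technique behind this collection, here is a minimal sketch of a LoRA forward pass in plain NumPy. The dimensions, rank, and `alpha` below are illustrative placeholders, not values taken from these repos: LoRA freezes the pretrained weight `W` and learns only a low-rank update `B @ A`.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 8, 16, 2   # hypothetical layer sizes; rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # zero-init, so the adapter starts as a no-op

alpha = 16                                 # LoRA scaling hyperparameter
scale = alpha / r

def lora_forward(x):
    # Frozen base path plus the scaled low-rank adapter path.
    return x @ W.T + scale * (x @ A.T @ B.T)

x = rng.standard_normal((4, d_in))
# With B = 0 the adapted layer exactly matches the base layer.
assert np.allclose(lora_forward(x), x @ W.T)
```

DoRA additionally decomposes the weight into magnitude and direction before applying the low-rank update, and QLoRA applies the same adapters on top of a 4-bit-quantized base model; only `A`, `B` (and, for DoRA, the magnitude vector) are trained in all three variants.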
LLM Course Homework 2: RLHF (DPO & PPO)
The collection includes the DPO-trained model, the PPO-trained model, and the reward model used for PPO.
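As background on the DPO objective used for the first of these models, a minimal sketch of the per-pair DPO loss in plain Python (the log-probabilities and `beta` below are illustrative placeholders; a real run would compute sequence log-probs under the policy and a frozen reference model):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit rewards: beta-scaled log-ratio between policy and reference.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): pushed down as the chosen response
    # becomes more likely than the rejected one under the policy.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Unlike PPO, this needs no separate reward model or sampling loop, which is why the reward model in this collection is used only for the PPO run.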
models 9
sodeniZz/llm-course-hw3-tinyllama-qlora
Updated
sodeniZz/llm-course-hw3-dora
Text Generation • 0.3B • Updated • 1
sodeniZz/llm-course-hw3-lora
Text Generation • 0.3B • Updated • 1
sodeniZz/llm-course-hw3-tinyllamma-qlora
Updated
sodeniZz/llm-course-hw2-dpo
Text Generation • 0.1B • Updated • 1
sodeniZz/llm-course-hw2-ppo
Text Generation • 0.1B • Updated • 2
sodeniZz/llm-course-hw2-reward-model
Text Classification • 0.1B • Updated
sodeniZz/llm-course-hw1
Updated
sodeniZz/bert-ner-finetuned
33.2M • Updated