Currently having a blast learning the transformers library.
I noticed that model cards usually have Transformers code as usage examples.
So I tried to figure out how to load a model using just the transformers library, without Ollama, LM Studio, or llama.cpp.
I learned how to install the dependencies required to make it work, like PyTorch and CUDA, and used Conda to manage the Python environment.
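For anyone curious, here's roughly what the loading and sample-inference step looks like. This is a minimal sketch, not my exact code: it assumes a recent transformers release with Qwen3-VL support (loaded via the Auto classes), accelerate installed for device_map="auto", and bf16 weights, which at roughly 16 GB fit in the 3090's 24 GB. The image URL is just a placeholder.

```python
# Minimal sketch: load Qwen3-VL with nothing but transformers + torch.
# Assumes a recent transformers release with Qwen3-VL support and
# accelerate installed (needed for device_map="auto").
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights, fits a 24 GB RTX 3090
    device_map="auto",
)

# One user turn with an image and a text prompt (placeholder URL).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/sample.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```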
Once I got the model loaded and sample inference working, I made an API to serve it.
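The serving part was a thin wrapper around that same code. I'm not posting my actual server, so treat this as a hypothetical FastAPI sketch: the /generate endpoint, request fields, and script name are all made up for illustration.

```python
# Hypothetical FastAPI wrapper around the model loaded as above.
# Endpoint name and request fields are illustrative, not a standard API.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    image_url: str | None = None  # optional image for the VL model
    max_new_tokens: int = 256

@app.post("/generate")
def generate(req: GenerateRequest):
    # Build a chat message; prepend the image if one was provided.
    content = [{"type": "text", "text": req.prompt}]
    if req.image_url:
        content.insert(0, {"type": "image", "url": req.image_url})
    messages = [{"role": "user", "content": content}]

    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=req.max_new_tokens)

    # Return only the newly generated text, not the echoed prompt.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    text = processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
    return {"response": text}

# Run with: uvicorn server:app --port 8000  (assuming this file is server.py)
```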
I know this is very basic stuff for the machine learning experts here on HF, but I'm completely new to this, so I'm happy I got it working!
Model used: Qwen/Qwen3-VL-8B-Instruct
GPU: NVIDIA GeForce RTX 3090
Here's the result of my experimentation: