Transformers documentation
Overview
Kernels in Transformers optimize model performance by swapping in custom, optimized layer implementations from the Hub, with very little effort on your part.
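The core idea can be sketched without any framework: a layer keeps a reference (pure-Python) forward pass, and an optimized "kernel" implementation can be swapped in through a registry without changing calling code. This is a minimal illustration of the pattern, not the real `kernels` API; the registry and helper names below are hypothetical.

```python
import math

# Hypothetical registry mapping a layer name to an optimized forward.
_KERNEL_REGISTRY = {}

def register_kernel(layer_name):
    """Register an optimized forward for a layer type (illustrative helper)."""
    def decorator(fn):
        _KERNEL_REGISTRY[layer_name] = fn
        return fn
    return decorator

class GeluLayer:
    """Layer with a reference forward; a registered kernel takes priority."""
    name = "gelu"

    def forward(self, xs):
        kernel = _KERNEL_REGISTRY.get(self.name)
        if kernel is not None:
            # Optimized path pulled from the registry (stands in for a
            # custom kernel downloaded from the Hub).
            return kernel(xs)
        # Reference path: tanh approximation of GELU.
        return [0.5 * x * (1.0 + math.tanh(0.7978845608 * (x + 0.044715 * x**3)))
                for x in xs]

@register_kernel("gelu")
def exact_gelu(xs):
    # Stand-in for a faster implementation: exact GELU via erf.
    return [0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))) for x in xs]

layer = GeluLayer()
out = layer.forward([0.0, 1.0])
```

Because dispatch happens inside `forward`, user code that calls the layer is untouched whether or not an optimized kernel is installed, which is what makes the swap low-effort.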