Machine Learning Model Optimization Environment

Kernl is an open source project that optimizes and accelerates your PyTorch models.

from transformers import AutoModel
from kernl.model_optimization import optimize_model

model_name = "bert-base-uncased"  # for example; any supported Hugging Face model
model = AutoModel.from_pretrained(model_name).eval().cuda()
# model optimization in one line of code 🙂
optimize_model(model)

What is kernl?

  • Drop-in solution

    Kernl's goal is to optimize the most common models in one line and simplify the way you work.

    Its philosophy is to remain simple and accessible.
    There is no need to rewrite your PyTorch model: you stay in the comfort of Python to train and infer.

  • Optimization tooling

    For advanced cases, Kernl provides resources such as a debugger and tutorials to let everyone tweak and optimize their own models with OpenAI Triton (see the sketch after this list).

    Triton is more accessible than CUDA: there is no need to relearn everything, as we remain in the world of PyTorch.

  • Performant & efficient solution

    Kernl is based on kernel fusion and relies on open source technologies such as CUDA Graphs, OpenAI Triton, and TorchDynamo.

    This combination drastically reduces memory accesses, eliminates CPU overhead, and ultimately makes models significantly faster.
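
To give a feel for what Triton code looks like, here is a minimal, self-contained sketch (an illustration, not a kernel from Kernl) that fuses an element-wise add and a ReLU into a single kernel, so the intermediate sum never round-trips through GPU memory:

import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # each program instance handles one contiguous block of elements
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # the add and the ReLU are fused: the sum stays in registers
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = (triton.cdiv(n_elements, 1024),)
    fused_add_relu_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out

Compared with eager torch.relu(x + y), which writes the intermediate sum to global memory and reads it back, the fused kernel makes two reads and one write in a single pass.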

Why kernl?

At Lefebvre Sarrut, we dedicate our innovation and R&D initiatives to empowering legal professionals with knowledge in law, tax, and regulation.

We already run several large language models to make law more accessible.

We need to explore and iterate quickly, at low cost, to train and run inference on our own models without depending on solutions we had previously used, such as TensorRT or ONNX.

Our goal with Kernl is to be able to optimize any model, simply and efficiently, while remaining autonomous and independent of complex CUDA code.

Open source and ethics

  • Share

    Providing educational materials to help you is one of our goals because sharing is part of our DNA.

    Kernl takes an open source approach because we firmly believe in the virtues of sharing and exchange.

  • Contribution

    We are working to make the project as accessible as possible and we encourage everyone to contribute in their own way.

    Please feel free to consult the contribution guide.

  • Ethics

    The purpose of Kernl is to make the latest models more accessible to a wider audience of developers, with time and cost efficiency at heart.

    By doing so, we not only democratize large language models but also contribute to a more resilient AI ecosystem.

How is it efficient?

Kernel fusion is based on a simple recipe:

  • Make a graph of the model with PyTorch FX and TorchDynamo
  • Identify the costly operations (e.g. attention, linear layers)
  • Dynamically replace them with an OpenAI Triton operation that fuses them
  • Keep the pre-existing optimizations

This simple recipe drastically reduces the GPU memory bandwidth bottleneck and accelerates both inference and training.
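
As an illustrative sketch of the first three steps (much simplified compared to Kernl's actual TorchDynamo-based rewriting), torch.fx can capture a graph and swap a matched pattern for a fused operation. Here, fused_add_relu is the Triton sketch from above, wrapped so FX treats it as an opaque call:

import torch
import torch.fx

# mark the fused kernel as a leaf so FX records a call to it instead of
# tracing into the Triton launch (fused_add_relu is the sketch above)
torch.fx.wrap("fused_add_relu")

def pattern(x, y):
    return torch.relu(x + y)

def replacement(x, y):
    return fused_add_relu(x, y)

class TinyModel(torch.nn.Module):
    def forward(self, x, y):
        return torch.relu(x + y)

gm = torch.fx.symbolic_trace(TinyModel())  # step 1: make a graph of the model
torch.fx.subgraph_rewriter.replace_pattern(gm, pattern, replacement)  # steps 2-3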

Pretty crazy performance gains

Kernel fusion is one part of our optimizations: fusing kernels significantly reduces GPU memory accesses, while CUDA Graphs eliminate CPU overhead. Together they reduce inference latency and increase training speed.
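
With CUDA Graphs, the whole forward pass is captured once and then replayed with a single call, instead of launching kernels one by one from Python. A minimal, standalone sketch of capture and replay in plain PyTorch (the tiny model and shapes are made up for illustration; inputs must keep a fixed shape):

import torch

model = torch.nn.Linear(1024, 1024).eval().cuda()
static_input = torch.randn(8, 1024, device="cuda")

# warm up on a side stream, as required before graph capture
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# capture the forward pass once...
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph), torch.no_grad():
    static_output = model(static_input)

# ...then replay it: one CPU call relaunches every captured kernel
static_input.copy_(torch.randn(8, 1024, device="cuda"))
graph.replay()  # static_output now holds the result for the new input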

For example, BERT is up to 12 times faster than the Hugging Face baseline.

T5 is also 6 times faster (and we are only halfway through the optimizations!).

Contribute in your own way!