ScaleLLM

ScaleLLM is an inference system for large language models (LLMs), designed to meet the demands of production environments. It supports a wide range of popular open-source models, including Llama3, Gemma, Bloom, GPT-NeoX, and more.

Note

ScaleLLM is currently in alpha. We are actively working on improving the system and adding new features. If you have any feedback or suggestions, please feel free to reach out to us.

ScaleLLM is available as a Python wheel package on PyPI. You can install it using pip:

# Install scalellm with CUDA 12.1 and PyTorch 2.4.0
$ pip install -U scalellm
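
After installing, you may want to confirm that the package is visible to the interpreter you plan to use. The snippet below is a minimal sketch that relies only on the Python standard library (`importlib.metadata`) and does not call any ScaleLLM API. The helper name `installed_version` is hypothetical and introduced here only for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "scalellm") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Reads the version from the installed package metadata.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("scalellm is not installed in this environment")
    else:
        print(f"scalellm {version} is installed")
```

If the script reports that the package is missing, a common cause is that pip installed into a different environment than the one running the script.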

Table of contents

Developer Guide

Reference