Quick Start

Installation

Install with pip

ScaleLLM is available as a Python Wheel package on PyPI. You can install it using pip:

# Install ScaleLLM with CUDA 12.4 and Pytorch 2.5.1
$ pip install scalellm

Install other versions

If you want to install ScaleLLM with different versions of CUDA and PyTorch, you can use pip by providing the index URL of the desired version.

$ pip install -U scalellm -i https://whl.vectorch.com/cu124/torch2.5.1/

Build from source

If no wheel package is available for your configuration, you can build ScaleLLM from source code. Clone the repository and install it locally using the following commands:

$ git clone --recursive https://github.com/vectorch-ai/ScaleLLM.git
$ cd ScaleLLM
$ python3 setup.py bdist_wheel
$ pip install dist/scalellm-*.whl

Inference

You can use ScaleLLM for offline batch inference or online distributed inference.

OpenAI-Compatible Server

To start a server that is compatible with the OpenAI API, run the following command:

$ python3 -m scalellm.serve.api_server --model=meta-llama/Meta-Llama-3.1-8B-Instruct