ScaleLLM

ScaleLLM is a high-performance inference system for large language models (LLMs), designed to meet the demands of production environments. It supports a wide range of popular open-source models, including Llama3, Gemma, Bloom, GPT-NeoX, and more.

Note

ScaleLLM is currently in alpha. We are actively improving the system and adding new features. If you have feedback or suggestions, please feel free to reach out to us.
