Supported Models

| Models | Tensor Parallel | Quantization | Chat API | HF model examples |
|---|---|---|---|---|
| Aquila | Yes | Yes | Yes | BAAI/Aquila-7B, BAAI/AquilaChat-7B |
| Bloom | Yes | Yes | No | bigscience/bloom |
| Baichuan | Yes | Yes | Yes | baichuan-inc/Baichuan2-7B-Chat |
| ChatGLM3 | Yes | Yes | Yes | THUDM/chatglm3-6b |
| Gemma | Yes | Yes | Yes | google/gemma-2b |
| GPT_j | Yes | Yes | No | EleutherAI/gpt-j-6b |
| GPT_NeoX | Yes | Yes | No | EleutherAI/gpt-neox-20b |
| GPT2 | Yes | Yes | No | gpt2 |
| InternLM | Yes | Yes | Yes | internlm/internlm-7b |
| Llama3/2 | Yes | Yes | Yes | meta-llama/Meta-Llama-3.1-8B-Instruct, meta-llama/Meta-Llama-3.1-8B, meta-llama/Llama-2-7b |
| Mistral | Yes | Yes | Yes | mistralai/Mistral-7B-v0.1 |
| MPT | Yes | Yes | Yes | mosaicml/mpt-30b |
| Phi2 | Yes | Yes | No | microsoft/phi-2 |
| Qwen | Yes | Yes | Yes | Qwen/Qwen-72B-Chat |
| Yi | Yes | Yes | Yes | 01-ai/Yi-6B, 01-ai/Yi-34B-Chat-4bits, 01-ai/Yi-6B-200K |
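When scripting against this list, it can help to have the support matrix in machine-readable form. The following is a minimal sketch that transcribes the three Yes/No columns into a Python dictionary and filters by feature. The names `Support`, `SUPPORTED`, and `families_without` are hypothetical and chosen for illustration; they are not part of any library API.

```python
from typing import Dict, List, NamedTuple


class Support(NamedTuple):
    """Feature flags for one model family (hypothetical helper type)."""
    tensor_parallel: bool
    quantization: bool
    chat_api: bool


# Transcribed by hand from the supported-models table; not read from a library.
SUPPORTED: Dict[str, Support] = {
    "Aquila": Support(True, True, True),
    "Bloom": Support(True, True, False),
    "Baichuan": Support(True, True, True),
    "ChatGLM3": Support(True, True, True),
    "Gemma": Support(True, True, True),
    "GPT_j": Support(True, True, False),
    "GPT_NeoX": Support(True, True, False),
    "GPT2": Support(True, True, False),
    "InternLM": Support(True, True, True),
    "Llama3/2": Support(True, True, True),
    "Mistral": Support(True, True, True),
    "MPT": Support(True, True, True),
    "Phi2": Support(True, True, False),
    "Qwen": Support(True, True, True),
    "Yi": Support(True, True, True),
}


def families_without(feature: str) -> List[str]:
    """Return the model families for which the named feature is marked 'No'."""
    return [name for name, flags in SUPPORTED.items() if not getattr(flags, feature)]


if __name__ == "__main__":
    # Families that cannot be served through the Chat API according to the table.
    print(families_without("chat_api"))
```

Keeping the flags in a `NamedTuple` means a misspelled feature name raises an `AttributeError` instead of silently returning an empty list.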

If your model is not in the list above, please open a request to add it on GitHub Issues, and we will be glad to help.