
Model Hosting

150+ models. One endpoint away.

Every model runs behind a hardened inference API. Deploy as a private endpoint or call the shared API directly — both are production-ready.

20 models shown (all live):

| Provider | Model | Params | P99 latency | Context | Tags |
|---|---|---|---|---|---|
| Meta | Llama 3.1 70B Instruct | 70B | 44ms | 128k | instruct, streaming |
| Meta | Llama 3.1 8B Instruct | 8B | 18ms | 128k | instruct, fast |
| Meta | Llama 3.2 3B Instruct | 3B | 11ms | 128k | instruct, fast |
| Mistral AI | Mistral 7B v0.3 | 7B | 22ms | 32k | instruct |
| Mistral AI | Mixtral 8x7B Instruct | 47B | 58ms | 32k | instruct, MoE |
| Google | Gemma 2 27B IT | 27B | 61ms | 8k | instruct |
| Google | Gemma 2 9B IT | 9B | 26ms | 8k | instruct, fast |
| Microsoft | Phi-3 Medium 128k | 14B | 33ms | 128k | instruct |
| Meta | Llama 3.2 11B Vision | 11B | 88ms | 128k | vision, multimodal |
| Meta | Llama 3.2 90B Vision | 90B | 142ms | 128k | vision, multimodal |
| OpenGVLab | InternVL2 8B | 8B | 76ms | 8k | vision, multimodal |
| Community | LLaVA 1.6 34B | 34B | 98ms | 4k | vision |
| Meta | Code Llama 70B Instruct | 70B | 52ms | 100k | code, instruct |
| DeepSeek | DeepSeek Coder V2 Lite | 16B | 39ms | 128k | code, MoE |
| BigCode | StarCoder2 15B | 15B | 35ms | 16k | code, fill-in-middle |
| Alibaba | Qwen2.5 Coder 7B | 7B | 24ms | 128k | code, fast |
| BAAI | BGE-M3 | 570M | 8ms | 8k | embedding, multilingual |
| Microsoft | E5-Mistral 7B Instruct | 7B | 19ms | 32k | embedding |
| Nomic | Nomic Embed Text v1.5 | 137M | 4ms | 8k | embedding, fast |
| Alibaba | GTE-Qwen2 7B Instruct | 7B | 17ms | 32k | embedding |
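As a rough illustration of calling the shared API with a model ID from the table above, here is a minimal sketch. The request schema and base URL are assumptions (an OpenAI-compatible chat-completions payload and a placeholder host), not documented Kybra specifics:

```python
# Hypothetical sketch: assumes an OpenAI-compatible chat-completions schema
# and an invented base URL -- neither is confirmed by the page above.
import json

API_BASE = "https://api.example.com/v1"  # placeholder, not a real Kybra URL


def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Assemble a chat-completions style payload for the shared API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


payload = build_chat_request(
    "meta/llama-3.1-70b-instruct",
    "Summarise this ticket in one sentence.",
)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the endpoint URL with your API key in an `Authorization` header; consult the actual API reference for the real schema.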

How it works

Deploy once.
Scale automatically.

Point Kybra at a Hugging Face model ID or your own weights. We handle containerisation, autoscaling, versioning, and routing. You get a stable endpoint URL and a p99 SLA.

  • One-command deploy from HF Hub or private storage
  • Automatic container build — no Dockerfile needed
  • Scales to zero when idle, instant warm-up on request
  • Version pinning so deployments never break unexpectedly
  • Private endpoints with mTLS for sensitive workloads
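Because an idle endpoint can scale to zero, a client's first request may land during warm-up. A generic client-side pattern for that case (an assumption about good practice, not documented platform behavior) is to retry with exponential backoff until the container is warm:

```python
# Generic cold-start retry pattern -- illustrative only; the platform's actual
# warm-up semantics and error codes are not specified on this page.
import time


def call_with_warmup_retry(send, retries: int = 4, base_delay: float = 0.5):
    """Call `send()`, retrying on transient failure with doubling delay."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return send()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # still cold after all retries; surface the error
            time.sleep(delay)
            delay *= 2


# Simulated endpoint: fails twice while "warming up", then responds.
state = {"calls": 0}


def fake_endpoint():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("cold start")
    return {"status": "ok"}


print(call_with_warmup_retry(fake_endpoint, base_delay=0.05))  # → {'status': 'ok'}
```

In a real client, `send` would be the HTTP call to the endpoint URL, and the exception type would match whatever the transport library raises on a cold or unreachable backend.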
Model endpoints (6 active):

| Model | Version | Status | Endpoints | P99 | Throughput |
|---|---|---|---|---|---|
| meta/llama-3.1-70b-instruct | v3 | Live | 3 | 44ms | 12.4k rpm |
| mistralai/mistral-7b-v0.3 | v1 | Live | 2 | 31ms | 8.2k rpm |
| google/gemma-2-27b-it | v2 | Live | 1 | 58ms | 3.1k rpm |

Your cluster is ready in 60 seconds.

No credit card required. Scale from a single model endpoint to thousands of GPUs — at your pace.

Questions? [email protected]