
Model Hosting

150+ models. One endpoint away.

Every model runs behind a hardened inference API. Deploy as a private endpoint or call the shared API directly — both are production-ready.

20 models shown (all live):

| Provider | Model | Params | P99 latency | Context | Tags |
|---|---|---|---|---|---|
| Meta | Llama 3.1 70B Instruct | 70B | 44ms | 128k | instruct, streaming |
| Meta | Llama 3.1 8B Instruct | 8B | 18ms | 128k | instruct, fast |
| Meta | Llama 3.2 3B Instruct | 3B | 11ms | 128k | instruct, fast |
| Mistral AI | Mistral 7B v0.3 | 7B | 22ms | 32k | instruct |
| Mistral AI | Mixtral 8x7B Instruct | 47B | 58ms | 32k | instruct, MoE |
| Google | Gemma 2 27B IT | 27B | 61ms | 8k | instruct |
| Google | Gemma 2 9B IT | 9B | 26ms | 8k | instruct, fast |
| Microsoft | Phi-3 Medium 128k | 14B | 33ms | 128k | instruct |
| Meta | Llama 3.2 11B Vision | 11B | 88ms | 128k | vision, multimodal |
| Meta | Llama 3.2 90B Vision | 90B | 142ms | 128k | vision, multimodal |
| OpenGVLab | InternVL2 8B | 8B | 76ms | 8k | vision, multimodal |
| Community | LLaVA 1.6 34B | 34B | 98ms | 4k | vision |
| Meta | Code Llama 70B Instruct | 70B | 52ms | 100k | code, instruct |
| DeepSeek | DeepSeek Coder V2 Lite | 16B | 39ms | 128k | code, MoE |
| BigCode | StarCoder2 15B | 15B | 35ms | 16k | code, fill-in-middle |
| Alibaba | Qwen2.5 Coder 7B | 7B | 24ms | 128k | code, fast |
| BAAI | BGE-M3 | 570M | 8ms | 8k | embedding, multilingual |
| Microsoft | E5-Mistral 7B Instruct | 7B | 19ms | 32k | embedding |
| Nomic | Nomic Embed Text v1.5 | 137M | 4ms | 8k | embedding, fast |
| Alibaba | GTE-Qwen2 7B Instruct | 7B | 17ms | 32k | embedding |
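As a rough illustration of calling the shared API with a model ID from the table above, here is a minimal sketch. The request schema and base URL are assumptions (an OpenAI-compatible chat-completions payload and a placeholder host), not documented Kybra specifics:

```python
# Hypothetical sketch: assumes an OpenAI-compatible chat-completions schema
# and an invented base URL -- neither is confirmed by the page above.
import json

API_BASE = "https://api.example.com/v1"  # placeholder, not a real Kybra URL


def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Assemble a chat-completions style payload for the shared API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


payload = build_chat_request(
    "meta/llama-3.1-70b-instruct",
    "Summarise this ticket in one sentence.",
)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the endpoint URL with your API key in an `Authorization` header; consult the actual API reference for the real schema.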

How it works

Deploy once.
Scale automatically.

Point Kybra at a Hugging Face model ID or your own weights. We handle containerisation, autoscaling, versioning, and routing. You get a stable endpoint URL and a p99 SLA.

  • One-command deploy from HF Hub or private storage
  • Automatic container build — no Dockerfile needed
  • Scales to zero when idle, instant warm-up on request
  • Version pinning so deployments never break unexpectedly
  • Private endpoints with mTLS for sensitive workloads
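Because an idle endpoint can scale to zero, a client's first request may land during warm-up. A generic client-side pattern for that case (an assumption about good practice, not documented platform behavior) is to retry with exponential backoff until the container is warm:

```python
# Generic cold-start retry pattern -- illustrative only; the platform's actual
# warm-up semantics and error codes are not specified on this page.
import time


def call_with_warmup_retry(send, retries: int = 4, base_delay: float = 0.5):
    """Call `send()`, retrying on transient failure with doubling delay."""
    delay = base_delay
    for attempt in range(retries):
        try:
            return send()
        except ConnectionError:
            if attempt == retries - 1:
                raise  # still cold after all retries; surface the error
            time.sleep(delay)
            delay *= 2


# Simulated endpoint: fails twice while "warming up", then responds.
state = {"calls": 0}


def fake_endpoint():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("cold start")
    return {"status": "ok"}


print(call_with_warmup_retry(fake_endpoint, base_delay=0.05))  # → {'status': 'ok'}
```

In a real client, `send` would be the HTTP call to the endpoint URL, and the exception type would match whatever the transport library raises on a cold or unreachable backend.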
Model endpoints (6 active):

| Model | Version | Status | Endpoints | P99 | Throughput |
|---|---|---|---|---|---|
| meta/llama-3.1-70b-instruct | v3 | Live | 3 | 44ms | 12.4k rpm |
| mistralai/mistral-7b-v0.3 | v1 | Live | 2 | 31ms | 8.2k rpm |
| google/gemma-2-27b-it | v2 | Live | 1 | 58ms | 3.1k rpm |

Your cluster is ready in 60 seconds.

No credit card required. Scale from a single model endpoint to thousands of GPUs — at your pace.

Questions? [email protected]