Model Hosting
150+ models. One endpoint away.
Every model runs behind a hardened inference API. Deploy as a private endpoint or call the shared API directly — both are production-ready.
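For a sense of what calling the shared API directly could look like, here is a minimal sketch in Python. The base URL, route, auth scheme, and payload fields are illustrative assumptions, not taken from Kybra's documentation.

```python
# Hypothetical example: calling the shared inference API directly.
# The base URL, route, and payload schema are assumptions, not a documented Kybra API.
import os
import requests

API_BASE = "https://api.kybra.example/v1"   # assumed base URL
API_KEY = os.environ["KYBRA_API_KEY"]       # assumed bearer-token auth

resp = requests.post(
    f"{API_BASE}/models/meta/llama-3.1-70b-instruct/generate",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"prompt": "Summarise the benefits of autoscaling in one sentence.", "max_tokens": 64},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```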
How it works
Deploy once. Scale automatically.
Point Kybra at a Hugging Face model ID or your own weights. We handle containerisation, autoscaling, versioning, and routing. You get a stable endpoint URL and a p99 latency SLA.
- One-command deploy from HF Hub or private storage
- Automatic container build — no Dockerfile needed
- Scales to zero when idle, instant warm-up on request
- Version pinning so deployments never break unexpectedly
- Private endpoints with mTLS for sensitive workloads
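To make the flow concrete, here is a minimal sketch of deploying from the HF Hub and then calling the resulting endpoint. Every path, field name, and parameter in it is an assumption made for illustration, not Kybra's documented API.

```python
# Hypothetical sketch of the deploy-then-call flow described above.
# Endpoint paths, field names, and the polling loop are assumptions for illustration only.
import os
import time
import requests

API_BASE = "https://api.kybra.example/v1"                              # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['KYBRA_API_KEY']}"}   # assumed bearer-token auth

# 1. Create a deployment from a Hugging Face model ID, pinned to a version.
deploy = requests.post(
    f"{API_BASE}/deployments",
    headers=HEADERS,
    json={
        "model": "mistralai/mistral-7b-v0.3",   # HF Hub ID or a private storage URI
        "version": "v1",                        # version pinning
        "scale_to_zero": True,                  # idle instances scale to zero
        "private": True,                        # private endpoint with mTLS
    },
    timeout=30,
).json()

# 2. Poll until the endpoint reports ready, then keep its stable URL.
while deploy.get("status") != "ready":
    time.sleep(5)
    deploy = requests.get(
        f"{API_BASE}/deployments/{deploy['id']}", headers=HEADERS, timeout=30
    ).json()

endpoint_url = deploy["endpoint_url"]

# 3. Call the stable endpoint URL like any other HTTP service.
out = requests.post(
    endpoint_url,
    headers=HEADERS,
    json={"prompt": "Hello from a freshly deployed endpoint.", "max_tokens": 32},
    timeout=30,
)
print(out.json())
```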
Model endpoints (6 active)
meta/llama-3.1-70b-instruct
v3 · 3 endpoints · p99: 44ms · 12.4k rpm
mistralai/mistral-7b-v0.3
v1 · 2 endpoints · p99: 31ms · 8.2k rpm
google/gemma-2-27b-it
v2 · 1 endpoint · p99: 58ms · 3.1k rpm
Your cluster is ready in 60 seconds.
No credit card required. Scale from a single model endpoint to thousands of GPUs — at your pace.
Questions? [email protected]