✨ New Integration: Portkey AI × vLLM

Announcement

vLLM is a fast and easy-to-use library for LLM inference and serving, offering state-of-the-art serving throughput.

Portkey now integrates seamlessly with vLLM.

Run Llama, Mistral, Qwen, and other open-source LLMs with Portkey to:

  1. Save $$$s on your GPU costs with semantic caching
  2. Get full-stack observability (logs, tokens, latency)
  3. Add built-in reliability features: load balancing, fallbacks, etc.
  4. Manage and version your prompts
  5. Protect your data with guardrails

and much more...
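Here is a minimal sketch of how the connection can look, assuming a vLLM OpenAI-compatible server running locally (e.g. started with `vllm serve <model>`) and the `portkey-ai` Python SDK; the `provider`, `custom_host`, and environment-variable names shown are assumptions, so check the docs linked below for the exact parameters.

```python
# Minimal sketch (not the official snippet): route requests to a locally
# running vLLM server through Portkey.
# Assumptions: vLLM's OpenAI-compatible server is up at http://localhost:8000/v1
# and PORTKEY_API_KEY is set in the environment.
import os

from portkey_ai import Portkey

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],    # your Portkey API key
    provider="openai",                        # vLLM speaks the OpenAI API format
    custom_host="http://localhost:8000/v1",   # your self-hosted vLLM endpoint (assumed)
)

# Standard chat-completions call; Portkey layers caching, logging, and
# reliability features on top of the vLLM backend.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model vLLM is serving
    messages=[{"role": "user", "content": "Hello from vLLM via Portkey!"}],
)
print(response.choices[0].message.content)
```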

Link to docs