Benchmarks

Benchmarks for LiteLLM Gateway (Proxy Server) tested against a fake OpenAI endpoint.
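
The fake endpoint returns a canned response immediately, so the numbers below measure proxy overhead rather than model latency. A minimal sketch of such a mock (an assumed FastAPI setup, not the exact endpoint used for these runs):

```python
# Minimal sketch of a fake OpenAI-compatible endpoint (assumed setup).
# It returns a static chat completion so proxy overhead can be measured
# without any real model inference time.
# Run with: uvicorn fake_openai:app --port 8090
from fastapi import FastAPI

app = FastAPI()

@app.post("/chat/completions")
async def chat_completions():
    # Static response in the OpenAI chat-completion shape.
    return {
        "id": "chatcmpl-fake",
        "object": "chat.completion",
        "created": 0,
        "model": "fake-gpt",
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "ok"},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2},
    }
```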

1 Instance LiteLLM Proxy

| Metric              | LiteLLM Proxy (1 Instance) |
|---------------------|----------------------------|
| Median Latency (ms) | 110                        |
| RPS                 | 68.2                       |
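
Each benchmarked request is a standard OpenAI-style chat completion sent through the proxy. A hedged sketch of the call under test, where the proxy URL, virtual key, and model name are assumptions:

```python
# Hypothetical client call through the proxy, as exercised by the load test.
# Assumes the proxy listens on localhost:4000 with a virtual key "sk-1234"
# and routes the model name "fake-gpt" to the fake endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

resp = client.chat.completions.create(
    model="fake-gpt",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```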

Horizontal Scaling - 10K RPS

Key Findings

  • Single instance: 68.2 RPS @ 100ms latency
  • 10 instances: 4.3% efficiency loss (653 RPS vs expected 682 RPS), latency stable at 100ms
  • For 10,000 RPS: need ~154 instances @ 95.7% efficiency, 100ms latency (see the arithmetic check after this list)
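
The efficiency and instance-count figures follow from linear extrapolation off the single-instance number. A quick check of that arithmetic, using only the values from the tables on this page:

```python
import math

# Values from the benchmark tables on this page.
single_rps = 68.2        # 1 instance
observed_10_rps = 653.0  # 10 instances

# Efficiency relative to perfect linear scaling (10 x 68.2 = 682 RPS expected).
efficiency = observed_10_rps / (single_rps * 10)
print(f"scaling efficiency: {efficiency:.1%}")  # -> 95.7%

# Instances needed to reach 10,000 RPS at that efficiency.
target_rps = 10_000
instances = math.ceil(target_rps / (single_rps * efficiency))
print(f"instances for {target_rps} RPS: ~{instances}")  # -> ~154
```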

2 Instances

Adding a second instance roughly doubles the RPS while keeping median latency in the 100ms-110ms range.

| Metric              | LiteLLM Proxy (2 Instances) |
|---------------------|-----------------------------|
| Median Latency (ms) | 100                         |
| RPS                 | 142                         |

10 Instances

| Metric              | LiteLLM Proxy (10 Instances) |
|---------------------|------------------------------|
| Median Latency (ms) | 110                          |
| RPS                 | 653                          |

Logging Callbacks

GCS Bucket Logging

Using GCS Bucket logging has no measurable impact on latency or RPS compared to the basic LiteLLM Proxy.

| Metric              | Basic LiteLLM Proxy | LiteLLM Proxy with GCS Bucket Logging |
|---------------------|---------------------|---------------------------------------|
| RPS                 | 1133.2              | 1137.3                                |
| Median Latency (ms) | 140                 | 138                                   |

LangSmith Logging

Using LangSmith logging has no measurable impact on latency or RPS compared to the basic LiteLLM Proxy.

| Metric              | Basic LiteLLM Proxy | LiteLLM Proxy with LangSmith |
|---------------------|---------------------|------------------------------|
| RPS                 | 1133.2              | 1135                         |
| Median Latency (ms) | 140                 | 132                          |
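
Both callbacks are turned on through LiteLLM's callback settings. A minimal sketch using the Python SDK (an assumed configuration; the proxy enables the same callbacks via its YAML config, and credentials such as LANGSMITH_API_KEY or a GCS service account must be present in the environment):

```python
# Minimal sketch: enabling a logging callback in the LiteLLM Python SDK.
# Assumes LANGSMITH_API_KEY (or GCS credentials for "gcs_bucket") is exported;
# the proxy server configures the same callbacks in its YAML config instead.
import litellm

litellm.success_callback = ["langsmith"]  # or ["gcs_bucket"]

response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "hello"}],
)
```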

Locust Settings

  • 2,500 users
  • Ramp-up: 100 users (see the locustfile sketch below)
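
A minimal locustfile matching these settings might look like the sketch below. The proxy URL, virtual key, and model name are assumptions, and the ramp-up value is treated as Locust's spawn rate (users started per second):

```python
# Minimal locustfile sketch for the settings above (assumed harness).
# Run with: locust -f locustfile.py --host http://localhost:4000 -u 2500 -r 100
from locust import HttpUser, task

class ProxyUser(HttpUser):
    @task
    def chat_completion(self):
        # One OpenAI-style chat completion per task iteration.
        self.client.post(
            "/chat/completions",
            json={
                "model": "fake-gpt",
                "messages": [{"role": "user", "content": "hello"}],
            },
            headers={"Authorization": "Bearer sk-1234"},
        )
```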