Description
When an HTTP client sends a POST request to Docker Model Runner's OpenAI-compatible API (/engines/v1/chat/completions) using HTTP/2 cleartext (h2c), the vLLM backend returns HTTP 422 Unprocessable Entity because the request body is silently dropped during the protocol translation in DMR's reverse proxy layer.
The same request succeeds with HTTP/1.1. The llama.cpp backend is not affected — it handles HTTP/2 requests correctly.
Environment
- Docker Desktop: 4.41.2 (macOS, Apple Silicon)
- Docker Model Runner: enabled with vLLM backend
- Model: docker.io/ai/gemma3-vllm:latest
- OS: macOS 15.4 (Sequoia)
Steps to Reproduce
1. HTTP/1.1 works (baseline)
curl -s http://localhost:12434/engines/v1/chat/completions \
--http1.1 \
-H "Content-Type: application/json" \
-d '{"model":"ai/gemma3-vllm:latest","messages":[{"role":"user","content":"Say hello"}]}'
# Returns 200 with a valid chat completion response
2. HTTP/2 cleartext (h2c) fails
// Java HttpClient — defaults to HTTP/2
HttpClient client = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_2)
.build();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("http://localhost:12434/engines/v1/chat/completions"))
.header("Content-Type", "application/json")
.POST(HttpRequest.BodyPublishers.ofString(
"{\"model\":\"ai/gemma3-vllm:latest\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hello\"}]}"
))
.build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
// Returns 422: {"object":"error","message":"[{'type':'missing','loc':['body'],'msg':'Field required'...}]"}
3. Forcing HTTP/1.1 fixes it
HttpClient client = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_1_1) // Force HTTP/1.1
.build();
// Same request now returns 200
4. llama.cpp backend is NOT affected
// HTTP/2 against llama.cpp model works fine:
// POST to ai/gemma3:4B → 200 OK
Root Cause Analysis
Java's HttpClient (and other HTTP/2-capable clients) attempts an h2c upgrade (HTTP/2 over cleartext) when connecting to http:// endpoints. Docker Model Runner's reverse proxy accepts the h2c upgrade but appears to drop the POST body when translating the request to the vLLM backend. The vLLM backend then sees an empty body and returns 422 ("Field required").
The llama.cpp backend is unaffected, suggesting the issue is specific to how DMR proxies requests to the vLLM container.
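For context, the upgrade path described above starts life as a plain HTTP/1.1 request carrying `Upgrade: h2c` headers (RFC 7540 §3.2); the body travels with that same request, which is exactly what the proxy appears to lose. The sketch below is illustrative only (class name and the base64 SETTINGS value are my own examples, not what any particular client emits byte-for-byte):

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the shape of an h2c upgrade request as
// described in RFC 7540 §3.2. A reverse proxy handling this must
// forward the body along with the upgrade; dropping it produces
// exactly the "Field required" 422 seen from vLLM.
public class H2cUpgradeShape {
    public static String upgradeRequest(String body) {
        int len = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST /engines/v1/chat/completions HTTP/1.1\r\n"
             + "Host: localhost:12434\r\n"
             + "Connection: Upgrade, HTTP2-Settings\r\n"
             + "Upgrade: h2c\r\n"
             + "HTTP2-Settings: AAMAAABkAAQAoAAAAAIAAAAA\r\n" // base64url SETTINGS payload (example value)
             + "Content-Type: application/json\r\n"
             + "Content-Length: " + len + "\r\n"
             + "\r\n"
             + body;
    }
}
```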
Additional Issue: vLLM ignores stream=true
During testing, I also observed that the vLLM backend ignores "stream": true in the request body. Instead of returning text/event-stream SSE chunks, it returns a single application/json response with object: "chat.completion" (not "chat.completion.chunk"). This forces clients to detect and handle the non-streaming fallback.
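Until this is fixed, clients can at least detect the fallback cheaply by branching on the response Content-Type before choosing a body handler. A minimal sketch (the helper name is my own, not part of any DMR or vLLM API):

```java
import java.util.Locale;

// Hypothetical helper: decide from the response Content-Type whether
// the backend honored "stream": true. With the vLLM backend behind DMR,
// the fallback path returns application/json even when SSE was
// requested, so clients must branch on this before parsing.
public class StreamFallbackDetector {
    public static boolean isSse(String contentType) {
        if (contentType == null) return false;
        // text/event-stream may carry parameters, e.g. "; charset=utf-8"
        return contentType.toLowerCase(Locale.ROOT).startsWith("text/event-stream");
    }
}
```

A caller would read the whole body as a single chat.completion object when `isSse` returns false, and parse `data:` lines otherwise.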
Expected Behavior
- HTTP/2 cleartext (h2c) POST requests to DMR should forward the full request body to the vLLM backend, same as HTTP/1.1
- "stream": true should return SSE chunks (text/event-stream) from the vLLM backend
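For reference, an OpenAI-compatible streaming response looks roughly like this (abridged sketch, not captured output; real chunks carry more fields such as id and model):

```
HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
```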
Impact
Any HTTP client that attempts the h2c upgrade on cleartext endpoints will fail against the vLLM backend with a misleading 422. Java's HttpClient is the prominent example, since it defaults to HTTP/2 and attempts the upgrade even over plain http://; other HTTP/2-capable libraries configured for cleartext HTTP/2 are affected the same way. (Go's net/http, by contrast, enables HTTP/2 only over TLS by default, so it is not hit unless h2c is explicitly configured.) Only clients that stay on HTTP/1.1, or that never attempt the h2c upgrade (like curl without --http2), work correctly.
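As a workaround until the proxy forwards h2c bodies correctly, clients can pin cleartext DMR endpoints to HTTP/1.1 while keeping HTTP/2 for TLS endpoints, where ALPN negotiation is unaffected. A small sketch (the factory name is my own):

```java
import java.net.URI;
import java.net.http.HttpClient;

// Hypothetical workaround helper: choose the protocol version per
// endpoint scheme. Cleartext http:// endpoints are pinned to HTTP/1.1
// to avoid the h2c upgrade path that loses the request body; https://
// endpoints keep HTTP/2, which is negotiated via ALPN over TLS.
public class DmrClients {
    public static HttpClient forEndpoint(URI endpoint) {
        HttpClient.Version version = "http".equalsIgnoreCase(endpoint.getScheme())
                ? HttpClient.Version.HTTP_1_1   // sidestep the h2c upgrade
                : HttpClient.Version.HTTP_2;
        return HttpClient.newBuilder().version(version).build();
    }
}
```

Usage: `DmrClients.forEndpoint(URI.create("http://localhost:12434"))` yields an HTTP/1.1 client suitable for the requests shown above.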