
vLLM backend: HTTP/2 (h2c) POST requests return 422 — reverse proxy drops request body #716

@dougcain

Description


When an HTTP client sends a POST request to Docker Model Runner's OpenAI-compatible API (/engines/v1/chat/completions) using HTTP/2 cleartext (h2c), the vLLM backend returns HTTP 422 Unprocessable Entity because the request body is silently dropped during the protocol translation in DMR's reverse proxy layer.

The same request succeeds with HTTP/1.1. The llama.cpp backend is not affected — it handles HTTP/2 requests correctly.

Environment

  • Docker Desktop: 4.41.2 (macOS, Apple Silicon)
  • Docker Model Runner: enabled with vLLM backend
  • Model: docker.io/ai/gemma3-vllm:latest
  • OS: macOS 15.4 (Sequoia)

Steps to Reproduce

1. HTTP/1.1 works (baseline)

curl -s http://localhost:12434/engines/v1/chat/completions \
  --http1.1 \
  -H "Content-Type: application/json" \
  -d '{"model":"ai/gemma3-vllm:latest","messages":[{"role":"user","content":"Say hello"}]}'
# Returns 200 with valid chat completion response

2. HTTP/2 cleartext (h2c) fails

// Java HttpClient — defaults to HTTP/2
HttpClient client = HttpClient.newBuilder()
    .version(HttpClient.Version.HTTP_2)
    .build();

HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create("http://localhost:12434/engines/v1/chat/completions"))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(
        "{\"model\":\"ai/gemma3-vllm:latest\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hello\"}]}"
    ))
    .build();

HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
// Returns 422: {"object":"error","message":"[{'type':'missing','loc':['body'],'msg':'Field required'...}]"}

3. Forcing HTTP/1.1 fixes it

HttpClient client = HttpClient.newBuilder()
    .version(HttpClient.Version.HTTP_1_1)  // Force HTTP/1.1
    .build();
// Same request now returns 200

4. llama.cpp backend is NOT affected

// HTTP/2 against llama.cpp model works fine:
// POST to ai/gemma3:4B → 200 OK

Root Cause Analysis

When its version is set to HTTP_2 (the default), Java's HttpClient connects to http:// endpoints by sending an HTTP/1.1 request carrying Upgrade: h2c and HTTP2-Settings headers, then switches to HTTP/2 over cleartext if the server accepts. Docker Model Runner's reverse proxy accepts the h2c upgrade but appears to drop the POST body when translating the request to the vLLM backend. The vLLM backend then sees an empty body and returns 422 ("Field required").

The llama.cpp backend is unaffected, suggesting the issue is specific to how DMR proxies requests to the vLLM container.
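The upgrade path can also be exercised without Java. This is a sketch assuming a curl build with HTTP/2 support: `--http2` on an `http://` URL makes curl send an Upgrade: h2c request (whereas `--http2-prior-knowledge` would start HTTP/2 directly, skipping the upgrade):

```shell
# Attempt the h2c upgrade on the cleartext endpoint; with the bug present,
# this should reproduce the same 422 that Java's HttpClient hits.
curl -sv http://localhost:12434/engines/v1/chat/completions \
  --http2 \
  -H "Content-Type: application/json" \
  -d '{"model":"ai/gemma3-vllm:latest","messages":[{"role":"user","content":"Say hello"}]}'
# The -v trace shows whether the proxy accepted the upgrade
# (a 101 Switching Protocols response before the final status).
```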

Additional Issue: vLLM ignores stream=true

During testing, I also observed that the vLLM backend ignores "stream": true in the request body. Instead of returning text/event-stream SSE chunks, it returns a single application/json response with object: "chat.completion" (not "chat.completion.chunk"). This forces clients to detect and handle the non-streaming fallback.
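Until this is fixed, clients have to sniff which shape actually came back. A minimal sketch (hypothetical helper, not part of any DMR or vLLM API) that branches on the returned Content-Type:

```java
public class StreamFallback {
    // Returns true when the backend honored "stream": true and is sending
    // Server-Sent Events; false for the single-JSON fallback observed from
    // the vLLM backend. contentType is the raw Content-Type header value.
    static boolean isSse(String contentType) {
        return contentType != null
            && contentType.toLowerCase().startsWith("text/event-stream");
    }

    public static void main(String[] args) {
        // What a compliant streaming response advertises:
        System.out.println(isSse("text/event-stream; charset=utf-8"));
        // What the vLLM backend currently returns despite "stream": true:
        System.out.println(isSse("application/json"));
    }
}
```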

Expected Behavior

  1. HTTP/2 cleartext (h2c) POST requests to DMR should forward the full request body to the vLLM backend, same as HTTP/1.1
  2. "stream": true should return SSE chunks (text/event-stream) from the vLLM backend
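For reference, an OpenAI-compatible backend that honors "stream": true responds with Content-Type: text/event-stream and a sequence of chunk events shaped roughly like this (illustrative values):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: [DONE]
```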

Impact

Any HTTP client that attempts an h2c upgrade on cleartext endpoints will fail against the vLLM backend with a confusing 422. Java's HttpClient does this out of the box (its default version is HTTP/2); Go's net/http uses HTTP/2 only over TLS unless h2c is explicitly configured, and other libraries vary. Only clients that force HTTP/1.1, or that never attempt the upgrade (like curl without --http2), work correctly.
