Description
When an HTTP client sends a POST request to Docker Model Runner's OpenAI-compatible API (/engines/v1/chat/completions) using HTTP/2 cleartext (h2c), the vLLM backend returns HTTP 422 Unprocessable Entity because the request body is silently dropped during the protocol translation in DMR's reverse proxy layer.
The same request succeeds with HTTP/1.1. The llama.cpp backend is not affected — it handles HTTP/2 requests correctly.
Environment
- Docker Desktop: 4.41.2 (macOS, Apple Silicon)
- Docker Model Runner: enabled with vLLM backend
- Model: docker.io/ai/gemma3-vllm:latest
- OS: macOS 15.4 (Sequoia)
Steps to Reproduce
1. HTTP/1.1 works (baseline)
curl -s http://localhost:12434/engines/v1/chat/completions \
--http1.1 \
-H "Content-Type: application/json" \
-d '{"model":"ai/gemma3-vllm:latest","messages":[{"role":"user","content":"Say hello"}]}'
# Returns 200 with a valid chat completion response
2. HTTP/2 cleartext (h2c) fails
// Java HttpClient — defaults to HTTP/2
HttpClient client = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_2)
.build();
HttpRequest request = HttpRequest.newBuilder()
.uri(URI.create("http://localhost:12434/engines/v1/chat/completions"))
.header("Content-Type", "application/json")
.POST(HttpRequest.BodyPublishers.ofString(
"{\"model\":\"ai/gemma3-vllm:latest\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hello\"}]}"
))
.build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
// Returns 422: {"object":"error","message":"[{'type':'missing','loc':['body'],'msg':'Field required'...}]"}
3. Forcing HTTP/1.1 fixes it
HttpClient client = HttpClient.newBuilder()
.version(HttpClient.Version.HTTP_1_1) // Force HTTP/1.1
.build();
// Same request now returns 200
4. llama.cpp backend is NOT affected
// HTTP/2 against llama.cpp model works fine:
// POST to ai/gemma3:4B → 200 OK
Root Cause Analysis
Java's HttpClient (and other HTTP/2-capable clients) attempts an h2c upgrade (HTTP/2 over cleartext) when connecting to http:// endpoints. Docker Model Runner's reverse proxy accepts the h2c upgrade but appears to drop the POST body when translating the request to the vLLM backend. The vLLM backend then sees an empty body and returns 422 ("Field required").
The llama.cpp backend is unaffected, suggesting the issue is specific to how DMR proxies requests to the vLLM container.
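For context, the upgrade path described above starts life as a plain HTTP/1.1 request carrying `Upgrade: h2c` headers (RFC 7540 §3.2); the body travels with that same request, which is exactly what the proxy appears to lose. The sketch below is illustrative only (class name and the base64 SETTINGS value are my own examples, not what any particular client emits byte-for-byte):

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of the shape of an h2c upgrade request as
// described in RFC 7540 §3.2. A reverse proxy handling this must
// forward the body along with the upgrade; dropping it produces
// exactly the "Field required" 422 seen from vLLM.
public class H2cUpgradeShape {
    public static String upgradeRequest(String body) {
        int len = body.getBytes(StandardCharsets.UTF_8).length;
        return "POST /engines/v1/chat/completions HTTP/1.1\r\n"
             + "Host: localhost:12434\r\n"
             + "Connection: Upgrade, HTTP2-Settings\r\n"
             + "Upgrade: h2c\r\n"
             + "HTTP2-Settings: AAMAAABkAAQAoAAAAAIAAAAA\r\n" // base64url SETTINGS payload (example value)
             + "Content-Type: application/json\r\n"
             + "Content-Length: " + len + "\r\n"
             + "\r\n"
             + body;
    }
}
```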
Additional Issue: vLLM ignores stream=true
During testing, I also observed that the vLLM backend ignores "stream": true in the request body. Instead of returning text/event-stream SSE chunks, it returns a single application/json response with object: "chat.completion" (not "chat.completion.chunk"). This forces clients to detect and handle the non-streaming fallback.
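Until this is fixed, clients can at least detect the fallback cheaply by branching on the response Content-Type before choosing a body handler. A minimal sketch (the helper name is my own, not part of any DMR or vLLM API):

```java
import java.util.Locale;

// Hypothetical helper: decide from the response Content-Type whether
// the backend honored "stream": true. With the vLLM backend behind DMR,
// the fallback path returns application/json even when SSE was
// requested, so clients must branch on this before parsing.
public class StreamFallbackDetector {
    public static boolean isSse(String contentType) {
        if (contentType == null) return false;
        // text/event-stream may carry parameters, e.g. "; charset=utf-8"
        return contentType.toLowerCase(Locale.ROOT).startsWith("text/event-stream");
    }
}
```

A caller would read the whole body as a single chat.completion object when `isSse` returns false, and parse `data:` lines otherwise.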
Expected Behavior
- HTTP/2 cleartext (h2c) POST requests to DMR should forward the full request body to the vLLM backend, same as HTTP/1.1
- "stream": true should return SSE chunks (text/event-stream) from the vLLM backend
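For reference, an OpenAI-compatible streaming response looks roughly like this (abridged sketch, not captured output; real chunks carry more fields such as id and model):

```
HTTP/1.1 200 OK
Content-Type: text/event-stream

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"}}]}

data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
```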
Impact
Any HTTP client that attempts the h2c upgrade on cleartext endpoints will fail against the vLLM backend with a misleading 422. Java's HttpClient is the prominent example, since it defaults to HTTP/2 and attempts the upgrade even over plain http://; other HTTP/2-capable libraries configured for cleartext HTTP/2 are affected the same way. (Go's net/http, by contrast, enables HTTP/2 only over TLS by default, so it is not hit unless h2c is explicitly configured.) Only clients that stay on HTTP/1.1, or that never attempt the h2c upgrade (like curl without --http2), work correctly.
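As a workaround until the proxy forwards h2c bodies correctly, clients can pin cleartext DMR endpoints to HTTP/1.1 while keeping HTTP/2 for TLS endpoints, where ALPN negotiation is unaffected. A small sketch (the factory name is my own):

```java
import java.net.URI;
import java.net.http.HttpClient;

// Hypothetical workaround helper: choose the protocol version per
// endpoint scheme. Cleartext http:// endpoints are pinned to HTTP/1.1
// to avoid the h2c upgrade path that loses the request body; https://
// endpoints keep HTTP/2, which is negotiated via ALPN over TLS.
public class DmrClients {
    public static HttpClient forEndpoint(URI endpoint) {
        HttpClient.Version version = "http".equalsIgnoreCase(endpoint.getScheme())
                ? HttpClient.Version.HTTP_1_1   // sidestep the h2c upgrade
                : HttpClient.Version.HTTP_2;
        return HttpClient.newBuilder().version(version).build();
    }
}
```

Usage: `DmrClients.forEndpoint(URI.create("http://localhost:12434"))` yields an HTTP/1.1 client suitable for the requests shown above.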