multiprocessing.Pool workers hold on to response body

In our app, we use pull task queues, and we pull a large number of tasks with large bodies (e.g. ~5 MB payloads). I noticed what appeared to be a memory leak when running this in the `python-compat` runtime, with many megabytes of strings being retained. After a whole lot of hacking, I tracked it down to the following:
- Multiprocessing pool workers hold on to the result objects in their local variables until they get another work item.
- The pool has 100 threads in the `python-compat` environment. This means it can retain up to ~100 results.
- The result type that gets retained is `requests.Response`, which stores the body of the response.
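The retention mechanism described above can be sketched without the real pool. This is a simplified stand-in, not the actual `multiprocessing/pool.py` code: `worker`, `Payload`, and `result_survives_idle` are hypothetical names, and the check relies on CPython's reference counting freeing an object as soon as its last reference goes away.

```python
import threading
import weakref

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2.7


class Payload(object):
    """Hypothetical stand-in for a response object with a large body."""

    def __init__(self, size):
        self.body = b"x" * size


def worker(inqueue, outqueue, idle, clear_locals):
    """Simplified worker loop: `result` stays in the frame's locals while idle."""
    while True:
        idle.put(True)        # signal that the previous iteration is finished
        task = inqueue.get()  # an idle worker blocks here
        if task is None:
            return
        result = Payload(task)
        outqueue.put(weakref.ref(result))  # a weak reference does not keep it alive
        if clear_locals:
            # Dropping the locals before blocking again releases the payload.
            task = result = None


def result_survives_idle(clear_locals):
    """Return True if the last result is still alive while the worker is idle."""
    inq, outq, idle = queue.Queue(), queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inq, outq, idle, clear_locals))
    t.daemon = True
    t.start()
    idle.get()            # worker is about to wait for its first task
    inq.put(1024)
    ref = outq.get()
    idle.get()            # worker has finished the task and is idle again
    alive = ref() is not None
    inq.put(None)         # shut the worker down
    t.join()
    return alive
```

Under these assumptions, `result_survives_idle(False)` reports that the payload is still alive while the worker waits, and `result_survives_idle(True)` reports that it has been freed.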

My hack to fix it: in `google/appengine/ext/vmruntime/vmstub.py`, I set `response._content = None` right after the response protocol buffer message is parsed. With this change, our process uses ~100 MB of memory instead of ~400 MB.
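A minimal sketch of that idea, not the actual `vmstub.py` code: `FakeResponse` and `parse_then_release` are hypothetical names, and `len` stands in for the protocol buffer parsing step.

```python
class FakeResponse(object):
    """Hypothetical stand-in for requests.Response; only `_content` matters here."""

    def __init__(self, content):
        self._content = content


def parse_then_release(response, parse):
    """Parse the body, then drop it so a lingering reference holds no payload."""
    parsed = parse(response._content)
    # The response object may still be referenced by an idle worker,
    # but it no longer keeps the multi-megabyte body alive.
    response._content = None
    return parsed


response = FakeResponse(b"x" * (5 * 1024 * 1024))
size = parse_then_release(response, len)  # `len` stands in for the real parser
```

The trade-off is that anything reading the body after this point sees `None` in this sketch, so it only works if the parsed message is the last consumer of the raw bytes.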

Hacky output from my tool that finds large retained strings and shows what is holding on to each one:

```
found new string! len=4653338
referred by 139691536297608 <type 'dict'>
-referred by 139691536340816 <class 'requests_nologs.models.Response'>
--referred by 139691536378120 <type 'tuple'>
---referred by frame /usr/lib/python2.7/multiprocessing/pool.py 102 worker
---referred by frame /usr/lib/python2.7/multiprocessing/pool.py 380 _handle_results
```
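For anyone who wants to reproduce this kind of report, here is a rough sketch of how such a tool could work using the standard `gc` module. It is not the tool that produced the output above: `find_large_strings` and `referrer_summary` are hypothetical names, and only one level of referrers is printed.

```python
import gc


def find_large_strings(min_len):
    """Walk everything reachable from gc-tracked objects and collect big strings."""
    seen = set()
    found = []
    stack = gc.get_objects()  # strings are not gc-tracked, so start from containers
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if isinstance(obj, (bytes, str)):
            if len(obj) >= min_len:
                found.append(obj)
            continue
        stack.extend(gc.get_referents(obj))
    return found


def referrer_summary(obj):
    """Describe the gc-tracked objects that refer directly to `obj`."""
    lines = []
    for referrer in gc.get_referrers(obj):
        lines.append("referred by %d %r" % (id(referrer), type(referrer)))
    return lines
```

A possible usage is `for s in find_large_strings(1000000): print(len(s), referrer_summary(s))`. Note that `gc.get_referrers` only reports gc-tracked containers and will also list the tool's own lists and frames, so the output needs some filtering in practice.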

