This repository was archived by the owner on Jan 12, 2022. It is now read-only.

Description
In our app, we use pull task queues, and we pull large numbers of tasks with large bodies (e.g. ~5 MB payloads). I noticed what appeared to be a memory leak when running this in the python-compat runtime, with many megabytes of strings being retained. After a lot of hacking, I tracked it down to the following:
- Multiprocessing pool workers hold on to the result objects in their local variables until they get another work item.
- The pool has 100 threads in the python-compat environment, so it can retain up to ~100 results at once.
- The retained result type is requests.Response, which stores the body of the response.
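The retention pattern in the list above can be reproduced in isolation. The following is a minimal sketch, not the stdlib pool code: `Payload`, `worker`, and `is_result_retained` are hypothetical names, and the loop only imitates the shape of a pool worker. It is written for Python 3 and relies on CPython's reference counting to free the object as soon as the last reference is dropped.

```python
import queue
import threading
import time
import weakref


class Payload(object):
    """Stand-in for a response object that carries a large body."""

    def __init__(self, size):
        self.body = b"x" * size


def worker(inqueue, outqueue, clear_locals):
    """Simplified imitation of a pool worker loop (not the stdlib code)."""
    while True:
        task = inqueue.get()  # an idle worker blocks here
        if task is None:
            break
        result = Payload(task)
        # Only a weak reference leaves the worker, so the local variable
        # `result` is the sole strong reference to the payload.
        outqueue.put(weakref.ref(result))
        if clear_locals:
            task = result = None  # drop the reference before going idle


def is_result_retained(clear_locals, settle_seconds=0.5):
    """Run one task, then report whether the idle worker still holds it."""
    inqueue, outqueue = queue.Queue(), queue.Queue()
    thread = threading.Thread(
        target=worker, args=(inqueue, outqueue, clear_locals))
    thread.daemon = True
    thread.start()
    inqueue.put(1024)
    ref = outqueue.get()
    # Give the worker time to loop back to its blocking get().
    deadline = time.time() + settle_seconds
    while time.time() < deadline and ref() is not None:
        time.sleep(0.01)
    retained = ref() is not None
    inqueue.put(None)  # tell the worker to exit
    thread.join()
    return retained
```

Calling `is_result_retained(clear_locals=False)` shows the payload staying alive while the worker sits idle, and `is_result_retained(clear_locals=True)` shows it being released. With 100 idle workers and ~5 MB bodies, the first behavior would account for several hundred megabytes.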
My hack to fix it: in google/appengine/ext/vmruntime/vmstub.py, I set response._content = None right after the response protocol buffer message is parsed. With this change, our process uses ~100 MB of memory instead of ~400 MB.
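The shape of that workaround is roughly as follows. This is a sketch under assumptions, not the actual vmstub.py code: `FakeResponse` is a hypothetical stand-in for requests.Response that models only the `_content` attribute, and `parse` stands in for the protocol buffer parsing step.

```python
class FakeResponse(object):
    """Hypothetical stand-in for requests.Response; only `_content` is modelled."""

    def __init__(self, content):
        self._content = content


def parse_then_release(response, parse):
    """Parse the body, then drop it so a lingering response stays small."""
    parsed = parse(response._content)
    # The response object may stay referenced by an idle worker, but the
    # large body no longer has to stay alive with it.
    response._content = None
    return parsed
```

The response object itself is still retained by the worker until its next work item; only the large body is released early.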
Hacky output from my tool that finds large retained strings, showing what is holding on to one of them:
found new string! len=4653338
referred by 139691536297608 <type 'dict'>
-referred by 139691536340816 <class 'requests_nologs.models.Response'>
--referred by 139691536378120 <type 'tuple'>
---referred by frame /usr/lib/python2.7/multiprocessing/pool.py 102 worker
---referred by frame /usr/lib/python2.7/multiprocessing/pool.py 380 _handle_results
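For anyone wanting to run a similar search, one possible approach uses the standard `gc` module. This is only a sketch of the idea, not the tool that produced the output above: `find_large_strings` is a hypothetical name, it reports one level of referrers rather than a full chain, and it is written for Python 3. CPython may not track containers that hold only atomic values (see `gc.is_tracked`), so a scan like this can miss some holders.

```python
import gc


def find_large_strings(min_len):
    """Map id(string) -> (length, [referrer type names]) for large strings."""
    found = {}
    # Strings are not tracked by the collector themselves, so look for them
    # among the direct referents of every tracked container.
    for container in gc.get_objects():
        for referent in gc.get_referents(container):
            if isinstance(referent, (str, bytes)) and len(referent) >= min_len:
                entry = found.setdefault(id(referent), (len(referent), []))
                entry[1].append(type(container).__name__)
    return found
```

To walk further up the chain, `gc.get_referrers` can be applied to each holder in turn.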