Posted to commits@beam.apache.org by "Micah Wylde (JIRA)" <ji...@apache.org> on 2018/10/04 00:25:00 UTC
[jira] [Created] (BEAM-5640) Portable python sdk worker leaks memory when PyOpenSSL package is present
Micah Wylde created BEAM-5640:
---------------------------------
Summary: Portable python sdk worker leaks memory when PyOpenSSL package is present
Key: BEAM-5640
URL: https://issues.apache.org/jira/browse/BEAM-5640
Project: Beam
Issue Type: Bug
Components: sdk-py-harness
Reporter: Micah Wylde
Assignee: Robert Bradshaw
When the PyOpenSSL package is installed on a system (e.g., in a virtualenv), the Python sdk_worker process leaks memory. I've validated this with the Flink portable runner in streaming mode, but it may occur in other configurations as well. The leak is significant, amounting to tens of MB/s.
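Since the leak depends on whether PyOpenSSL is present, it can help to confirm that from inside the environment the worker runs in. A minimal sketch, assuming only that PyOpenSSL exposes its usual top-level module `OpenSSL`; the helper name `module_available` is made up for illustration and is not part of Beam:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found for import."""
    # find_spec looks the module up without importing it.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # PyOpenSSL installs the top-level module "OpenSSL".
    print("PyOpenSSL present:", module_available("OpenSSL"))
```

Running this with the same interpreter the sdk_worker uses (for example inside the virtualenv or container) should indicate whether the condition described above applies.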
I've put together a reproduction for the issue [here|https://github.com/mwylde/beam/tree/micah_memory_leak]. That branch includes a Flink streaming data source that generates data, as well as a Python pipeline that demonstrates the issue.
To reproduce:
{code:bash}
# check out the branch:
$ git clone git@github.com:mwylde/beam.git
$ cd beam
$ git checkout micah_memory_leak
# build the python docker container with pyopenssl installed:
$ ./gradlew :beam-sdks-python-container:docker
# start the job server with embedded flink cluster:
$ ./gradlew runShadow
# run the pipeline:
$ ./gradlew :beam-sdks-python:streamingLeak
{code}
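Once the pipeline is running, the growth rate can be estimated by sampling the worker's resident set size. The following is a rough sketch rather than anything from the reproduction branch: it assumes a Linux host (it reads `/proc/<pid>/status`), that you have already found the sdk_worker PID yourself (for example inside the container), and all function names are hypothetical.

```python
import time
from typing import List, Optional, Tuple


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     20480 kB"
            return int(line.split()[1])
    return None


def growth_mib_per_s(samples: List[Tuple[float, int]]) -> float:
    """Average growth between the first and last (seconds, KiB) samples, in MiB/s."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must increase")
    return (r1 - r0) / 1024.0 / (t1 - t0)


def sample_rss(pid: int, count: int = 5, interval_s: float = 1.0) -> List[Tuple[float, int]]:
    """Collect (monotonic time, RSS in KiB) samples for a process (Linux only)."""
    samples = []
    for _ in range(count):
        with open(f"/proc/{pid}/status") as f:
            rss = parse_vmrss_kib(f.read())
        if rss is not None:
            samples.append((time.monotonic(), rss))
        time.sleep(interval_s)
    return samples
```

For example, `growth_mib_per_s(sample_rss(pid))` gives an average rate over a few seconds; a steady value in the tens of MiB/s would be consistent with the leak described in this report.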
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)