Posted to commits@beam.apache.org by "Micah Wylde (JIRA)" <ji...@apache.org> on 2018/10/04 00:25:00 UTC

[jira] [Created] (BEAM-5640) Portable python sdk worker leaks memory when PyOpenSSL package is present

Micah Wylde created BEAM-5640:
---------------------------------

             Summary: Portable python sdk worker leaks memory when PyOpenSSL package is present
                 Key: BEAM-5640
                 URL: https://issues.apache.org/jira/browse/BEAM-5640
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-harness
            Reporter: Micah Wylde
            Assignee: Robert Bradshaw


When the PyOpenSSL package is installed on a system (e.g., in a virtualenv), the Python sdk_worker process leaks memory. I've validated this with the Flink portable runner in streaming mode, but it may occur in other configurations as well. The leak is significant, amounting to tens of MB/sec.
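
Since the leak depends on whether PyOpenSSL is present, it can help to confirm that first from the same interpreter (or virtualenv) the worker uses. The following is a minimal sketch, not part of the original report: `module_available` is a hypothetical helper name, and it assumes Python 3 and that PyOpenSSL is importable under its usual top-level module name `OpenSSL`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found.

    Hypothetical helper for illustration; it only locates the module
    and does not import or execute it.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # PyOpenSSL installs its code under the top-level module "OpenSSL".
    if module_available("OpenSSL"):
        print("PyOpenSSL is importable in this environment")
    else:
        print("PyOpenSSL was not found in this environment")
```

Running this inside and outside the affected virtualenv should show whether the two environments actually differ in the way the report describes.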

I've put together a reproduction for the issue [here|https://github.com/mwylde/beam/tree/micah_memory_leak]. That branch includes a Flink streaming data source that generates data, as well as a Python pipeline that demonstrates the issue.

To reproduce:
{code:java}
# check out the branch:
$ git clone git@github.com:mwylde/beam.git
$ cd beam
$ git checkout micah_memory_leak

# build the python docker container with pyopenssl installed:
$ ./gradlew :beam-sdks-python-container:docker

# start the job server with embedded flink cluster:
$ ./gradlew runShadow

# run the pipeline:
$ ./gradlew :beam-sdks-python:streamingLeak
{code}
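
Once the pipeline is running, one way to check the leak rate is to sample the resident memory of the sdk_worker process over time. The sketch below is an assumption-laden illustration, not something taken from the reproduction branch: the function names are hypothetical, it is Linux-only because it reads `/proc/<pid>/statm`, and the PID of the worker has to be supplied by whoever runs it.

```python
import os
import time


def rss_bytes(pid: int) -> int:
    """Resident set size of `pid` in bytes, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/statm") as f:
        # Second field of statm is the resident size, counted in pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def sample_rss(pid: int, count: int = 10, interval: float = 1.0):
    """Collect `count` (timestamp_seconds, rss_bytes) samples for `pid`."""
    samples = []
    for _ in range(count):
        samples.append((time.monotonic(), rss_bytes(pid)))
        time.sleep(interval)
    return samples


def growth_rate(samples) -> float:
    """Average RSS growth in bytes/sec between the first and last sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (r1 - r0) / (t1 - t0)
```

For example, `growth_rate(sample_rss(worker_pid))` with `worker_pid` set to the sdk_worker PID gives a rough bytes-per-second figure to compare against the tens of MB/sec described above. RSS is only a proxy for leaked memory, so a steady upward trend over a longer window is more telling than any single reading.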


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)