Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2019/09/04 19:05:00 UTC

[jira] [Commented] (BEAM-7848) Add possibility to manage quantity of instances (threads) per worker in Python SDK

    [ https://issues.apache.org/jira/browse/BEAM-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922762#comment-16922762 ] 

Luke Cwik commented on BEAM-7848:
---------------------------------

This is available via the worker_threads experiment; see more details here: https://github.com/apache/beam/blob/bb278f4a762eae767e9d052374cda98e90733b43/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L186
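
For reference, a minimal sketch of how such an experiment could be passed when building a Python pipeline. The "worker_threads=<N>" value format is an assumption based on the parsing in the linked sdk_worker_main.py; the other option values are placeholders only.

    from apache_beam.options.pipeline_options import PipelineOptions

    # Assumed usage: cap the SDK harness at 12 worker threads via the
    # worker_threads experiment (value format inferred from sdk_worker_main.py).
    options = PipelineOptions([
        '--runner=DataflowRunner',
        '--streaming',
        '--experiments=worker_threads=12',
    ])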

> Add possibility to manage quantity of instances (threads) per worker in Python SDK
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-7848
>                 URL: https://issues.apache.org/jira/browse/BEAM-7848
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>         Environment: Python SDK
> ApacheBeam version==2.13.0
> worker_type==n1-standard-4
>            Reporter: Severyn Parkhomenko
>            Priority: Major
>         Attachments: Selection_042.png
>
>
> I'm developing a streaming pipeline with high memory consumption in one of the PTransforms.
>  Some time after starting, the pipeline fails without any specific logs (see the attached file).
> It looks like this happens because of an OutOfMemory error.
> It would be great to be able to set a limit on the number of threads used in a single worker in order to control the memory load.
>  I found such an option in the Java SDK (--_numberOfWorkerHarnessThreads_), but it is absent in the Python SDK.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)