Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2019/09/04 19:05:00 UTC
[jira] [Commented] (BEAM-7848) Add possibility to manage quantity of instances (threads) per worker in Python SDK
[ https://issues.apache.org/jira/browse/BEAM-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922762#comment-16922762 ]
Luke Cwik commented on BEAM-7848:
---------------------------------
This is available via the worker_threads experiment; see more details here: https://github.com/apache/beam/blob/bb278f4a762eae767e9d052374cda98e90733b43/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L186
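As a rough illustration of how the linked sdk_worker_main.py picks the thread cap out of the --experiments flag, here is a minimal sketch (a simplified stand-in, not Beam's exact code; the experiment name worker_threads=<N> is taken from the comment above, and the helper name is hypothetical):

```python
def worker_threads_from_experiments(experiments):
    """Return the thread cap from a list of --experiments values, or None.

    Mirrors the idea in sdk_worker_main.py: scan the experiments for an
    entry of the form 'worker_threads=<N>' and parse the integer.
    """
    for experiment in experiments:
        if experiment.startswith('worker_threads='):
            return int(experiment.split('=', 1)[1])
    return None

# Example: the kind of experiments list a pipeline might pass, e.g. via
# --experiments=worker_threads=100 on the command line.
experiments = ['shuffle_mode=service', 'worker_threads=100']
print(worker_threads_from_experiments(experiments))  # prints 100
```

In practice this would mean launching the pipeline with something like --experiments=worker_threads=100, so each SDK worker caps its harness threads at that number; verify the exact behavior against your Beam version.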
> Add possibility to manage quantity of instances (threads) per worker in Python SDK
> ----------------------------------------------------------------------------------
>
> Key: BEAM-7848
> URL: https://issues.apache.org/jira/browse/BEAM-7848
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Environment: Python SDK
> ApacheBeam version==2.13.0
> worker_type==n1-standard-4
> Reporter: Severyn Parkhomenko
> Priority: Major
> Attachments: Selection_042.png
>
>
> I'm developing a streaming pipeline with high memory consumption in one of the PTransforms.
> Some time after starting, this pipeline fails without any specific logs (see the attached file).
> It looks like this happens because of an OutOfMemory error.
> It would be great to be able to limit the number of threads used in a single worker in order to control the memory load.
> I found such an option in the Java SDK (--_numberOfWorkerHarnessThreads_), but it is absent in the Python SDK.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)