Posted to issues@beam.apache.org by "Hannah Jiang (Jira)" <ji...@apache.org> on 2020/01/10 18:12:00 UTC
[jira] [Comment Edited] (BEAM-7861) Make it easy to change between
multi-process and multi-thread mode for Python Direct runners
[ https://issues.apache.org/jira/browse/BEAM-7861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013105#comment-17013105 ]
Hannah Jiang edited comment on BEAM-7861 at 1/10/20 6:11 PM:
-------------------------------------------------------------
Use --direct_running_mode to switch between multi-threading and multi-processing.
direct_running_mode accepts one of ['in_memory', 'multi_threading', 'multi_processing']. The default is in_memory.
*in_memory*: multi-threading mode; the runner and workers communicate in memory (not through gRPC).
*multi_threading*: multi-threading mode; the runner and workers communicate through gRPC.
*multi_processing*: multi-processing mode; the runner and workers communicate through gRPC.
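The three modes and the default can be summarized in a small standalone sketch. This is not Beam's actual option definition: the argparse parser, the RUNNING_MODES table, the describe helper, and the default of one worker are all assumptions made for illustration. Only the flag names, the three mode names, and the in_memory default come from the description above.

```python
import argparse

# Illustrative table only: mode names and the in_memory default come from the
# comment above; the dictionary layout itself is a hypothetical stand-in.
RUNNING_MODES = {
    "in_memory": {"workers": "threads", "grpc": False},
    "multi_threading": {"workers": "threads", "grpc": True},
    "multi_processing": {"workers": "processes", "grpc": True},
}


def build_parser():
    """Build a parser that mimics the two flags discussed in this comment."""
    parser = argparse.ArgumentParser()
    # Assumed default of 1 worker, for the sake of the example.
    parser.add_argument("--direct_num_workers", type=int, default=1)
    parser.add_argument(
        "--direct_running_mode",
        choices=sorted(RUNNING_MODES),
        default="in_memory",
    )
    return parser


def describe(argv):
    """Return a one-line summary of what the given flags select."""
    args, _ = build_parser().parse_known_args(argv)
    traits = RUNNING_MODES[args.direct_running_mode]
    transport = "gRPC" if traits["grpc"] else "in-process memory"
    return "{} worker(s) as {}, talking over {}".format(
        args.direct_num_workers, traits["workers"], transport)
```

For example, describe([]) reports the in_memory default, while passing the multi_processing mode reports processes that communicate over gRPC.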
Here is how to set the direct_running_mode.
*Option 1*: set it with pipeline options.
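{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.runners.portability import fn_api_runner

pipeline_options = PipelineOptions(direct_num_workers=2, direct_running_mode='multi_threading')
p = beam.Pipeline(
    runner=fn_api_runner.FnApiRunner(),
    options=pipeline_options)
{code}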
*Option 2*: pass it on the command line.
{code}
# Command line:
python xxx --direct_num_workers 2 --direct_running_mode 'multi_threading'
# In the script:
p = beam.Pipeline(runner=fn_api_runner.FnApiRunner())
{code}
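As a rough analogy for why a single mode flag is convenient, the sketch below picks a thread pool or a process pool from a mode string while the calling code stays the same. It uses only the Python standard library and does not reflect how the Beam runner is implemented. The EXECUTORS table, square, and run are hypothetical names introduced for this example.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical mapping from the mode names in this comment to stdlib
# executors. This is an analogy only, not Beam's internal design.
EXECUTORS = {
    "in_memory": ThreadPoolExecutor,
    "multi_threading": ThreadPoolExecutor,
    "multi_processing": ProcessPoolExecutor,
}


def square(x):
    """Toy map task; defined at module level so a process pool can pickle it."""
    return x * x


def run(mode, num_workers, items):
    """Apply square to every item using the executor selected by `mode`."""
    executor_cls = EXECUTORS[mode]
    with executor_cls(max_workers=num_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(square, items))
```

Calling run('multi_threading', 2, [1, 2, 3]) and run('multi_processing', 2, [1, 2, 3]) differs only in the mode string. On platforms that start worker processes by spawning, the multi_processing call should sit under an if __name__ == '__main__' guard.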
> Make it easy to change between multi-process and multi-thread mode for Python Direct runners
> --------------------------------------------------------------------------------------------
>
> Key: BEAM-7861
> URL: https://issues.apache.org/jira/browse/BEAM-7861
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Hannah Jiang
> Assignee: Hannah Jiang
> Priority: Major
> Fix For: 2.19.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> BEAM-3645 makes it possible to run a map task in parallel.
> However, users need to change the runner when switching between multithreading and multiprocessing modes.
> We want to add a flag (e.g., --use-multiprocess) to make the switch easy without changing the runner each time.