You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Eleanore Jin <el...@gmail.com> on 2020/04/29 17:31:59 UTC

Set parallelism for each operator

Hi all,

I just wonder can Beam allow to set parallelism for each operator
(PTransform) separately? Flink provides such feature.

The usecase I have is the source is kafka topics, which has less
partitions, while we have heavy PTransform and would like to scale it with
more parallelism.

Thanks a lot!
Eleanore

Re: Set parallelism for each operator

Posted by Maximilian Michels <mx...@apache.org>.
Beam and its Flink Runner do not allow setting the parallelism at the
operator level. The wish to configure per-operator came up numerous
times over the years. I'm not opposed to allowing  for special cases,
e.g. via a pipeline option.

It doesn't look like it is necessary for the use case discussed here.

-Max

On 06.05.20 18:27, Alexey Romanenko wrote:
> One of the option when reading from topics with a small number of
> partitions could be to do a Reshuffle right after read transform to
> parallelize better other pipeline steps.
> 
> We had a discussion in this Jira about that a while ago:
> https://issues.apache.org/jira/browse/BEAM-8121
> 
>> On 30 Apr 2020, at 03:56, Eleanore Jin <eleanore.jin@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>> Thanks all for the information! 
>>
>> Eleanore 
>>
>> On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka <goenka@google.com
>> <ma...@google.com>> wrote:
>>
>>     Beam does support parallelism for the job which applies to all the
>>     transforms in the job when executing on Flink using the
>>     "--parallelism" flag.
>>
>>     From the usecase you mentioned, Kafka read operations will be over
>>     parallelised but it should be ok as they will only have a small
>>     amount of memory impact in loading some state for kafka client etc.
>>     Also flink can run multiple operations for the same Job in a
>>     single task slot so having higher parallelism for lightweight
>>     operations should not be a problem.
>>
>>     On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <lcwik@google.com
>>     <ma...@google.com>> wrote:
>>
>>         Beam doesn't expose such a thing directly but the FlinkRunner
>>         may be able to take some pipeline options to configure this.
>>
>>         On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin
>>         <eleanore.jin@gmail.com <ma...@gmail.com>> wrote:
>>
>>             Hi Kyle, 
>>
>>             I am using Flink Runner (v1.8.2)
>>
>>             Thanks!
>>             Eleanore
>>
>>             On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver
>>             <kcweaver@google.com <ma...@google.com>> wrote:
>>
>>                 Which runner are you using?
>>
>>                 On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin
>>                 <eleanore.jin@gmail.com
>>                 <ma...@gmail.com>> wrote:
>>
>>                     Hi all, 
>>
>>                     I just wonder can Beam allow to set
>>                     parallelism for each operator (PTransform)
>>                     separately? Flink provides such feature. 
>>
>>                     The usecase I have is the source is kafka topics,
>>                     which has less partitions, while we have heavy
>>                     PTransform and would like to scale it with more
>>                     parallelism. 
>>
>>                     Thanks a lot!
>>                     Eleanore
>>
> 

Re: Set parallelism for each operator

Posted by Alexey Romanenko <ar...@gmail.com>.
One of the option when reading from topics with a small number of partitions could be to do a Reshuffle right after read transform to parallelize better other pipeline steps.

We had a discussion in this Jira about that a while ago:
https://issues.apache.org/jira/browse/BEAM-8121 <https://issues.apache.org/jira/browse/BEAM-8121>

> On 30 Apr 2020, at 03:56, Eleanore Jin <el...@gmail.com> wrote:
> 
> Thanks all for the information! 
> 
> Eleanore 
> 
> On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka <goenka@google.com <ma...@google.com>> wrote:
> Beam does support parallelism for the job which applies to all the transforms in the job when executing on Flink using the "--parallelism" flag.
> 
> From the usecase you mentioned, Kafka read operations will be over parallelised but it should be ok as they will only have a small amount of memory impact in loading some state for kafka client etc.
> Also flink can run multiple operations for the same Job in a single task slot so having higher parallelism for lightweight operations should not be a problem.
> 
> On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
> Beam doesn't expose such a thing directly but the FlinkRunner may be able to take some pipeline options to configure this.
> 
> On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin <eleanore.jin@gmail.com <ma...@gmail.com>> wrote:
> Hi Kyle, 
> 
> I am using Flink Runner (v1.8.2)
> 
> Thanks!
> Eleanore
> 
> On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <kcweaver@google.com <ma...@google.com>> wrote:
> Which runner are you using?
> 
> On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <eleanore.jin@gmail.com <ma...@gmail.com>> wrote:
> Hi all, 
> 
> I just wonder can Beam allow to set parallelism for each operator (PTransform) separately? Flink provides such feature. 
> 
> The usecase I have is the source is kafka topics, which has less partitions, while we have heavy PTransform and would like to scale it with more parallelism. 
> 
> Thanks a lot!
> Eleanore


Re: Set parallelism for each operator

Posted by Eleanore Jin <el...@gmail.com>.
Thanks all for the information!

Eleanore

On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka <go...@google.com> wrote:

> Beam does support parallelism for the job which applies to all the
> transforms in the job when executing on Flink using the "--parallelism"
> flag.
>
> From the usecase you mentioned, Kafka read operations will be over
> parallelised but it should be ok as they will only have a small amount of
> memory impact in loading some state for kafka client etc.
> Also flink can run multiple operations for the same Job in a single task
> slot so having higher parallelism for lightweight operations should not be
> a problem.
>
> On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <lc...@google.com> wrote:
>
>> Beam doesn't expose such a thing directly but the FlinkRunner may be able
>> to take some pipeline options to configure this.
>>
>> On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin <el...@gmail.com>
>> wrote:
>>
>>> Hi Kyle,
>>>
>>> I am using Flink Runner (v1.8.2)
>>>
>>> Thanks!
>>> Eleanore
>>>
>>> On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <kc...@google.com>
>>> wrote:
>>>
>>>> Which runner are you using?
>>>>
>>>> On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <el...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I just wonder can Beam allow to set parallelism for each operator
>>>>> (PTransform) separately? Flink provides such feature.
>>>>>
>>>>> The usecase I have is the source is kafka topics, which has less
>>>>> partitions, while we have heavy PTransform and would like to scale it with
>>>>> more parallelism.
>>>>>
>>>>> Thanks a lot!
>>>>> Eleanore
>>>>>
>>>>

Re: Set parallelism for each operator

Posted by Ankur Goenka <go...@google.com>.
Beam does support parallelism for the job which applies to all the
transforms in the job when executing on Flink using the "--parallelism"
flag.

From the usecase you mentioned, Kafka read operations will be over
parallelised but it should be ok as they will only have a small amount of
memory impact in loading some state for kafka client etc.
Also flink can run multiple operations for the same Job in a single task
slot so having higher parallelism for lightweight operations should not be
a problem.

On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <lc...@google.com> wrote:

> Beam doesn't expose such a thing directly but the FlinkRunner may be able
> to take some pipeline options to configure this.
>
> On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin <el...@gmail.com>
> wrote:
>
>> Hi Kyle,
>>
>> I am using Flink Runner (v1.8.2)
>>
>> Thanks!
>> Eleanore
>>
>> On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <kc...@google.com> wrote:
>>
>>> Which runner are you using?
>>>
>>> On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <el...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I just wonder can Beam allow to set parallelism for each operator
>>>> (PTransform) separately? Flink provides such feature.
>>>>
>>>> The usecase I have is the source is kafka topics, which has less
>>>> partitions, while we have heavy PTransform and would like to scale it with
>>>> more parallelism.
>>>>
>>>> Thanks a lot!
>>>> Eleanore
>>>>
>>>

Re: Set parallelism for each operator

Posted by Luke Cwik <lc...@google.com>.
Beam doesn't expose such a thing directly but the FlinkRunner may be able
to take some pipeline options to configure this.

On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin <el...@gmail.com> wrote:

> Hi Kyle,
>
> I am using Flink Runner (v1.8.2)
>
> Thanks!
> Eleanore
>
> On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <kc...@google.com> wrote:
>
>> Which runner are you using?
>>
>> On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <el...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I just wonder can Beam allow to set parallelism for each operator
>>> (PTransform) separately? Flink provides such feature.
>>>
>>> The usecase I have is the source is kafka topics, which has less
>>> partitions, while we have heavy PTransform and would like to scale it with
>>> more parallelism.
>>>
>>> Thanks a lot!
>>> Eleanore
>>>
>>

Re: Set parallelism for each operator

Posted by Eleanore Jin <el...@gmail.com>.
Hi Kyle,

I am using Flink Runner (v1.8.2)

Thanks!
Eleanore

On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <kc...@google.com> wrote:

> Which runner are you using?
>
> On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <el...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I just wonder can Beam allow to set parallelism for each operator
>> (PTransform) separately? Flink provides such feature.
>>
>> The usecase I have is the source is kafka topics, which has less
>> partitions, while we have heavy PTransform and would like to scale it with
>> more parallelism.
>>
>> Thanks a lot!
>> Eleanore
>>
>

Re: Set parallelism for each operator

Posted by Kyle Weaver <kc...@google.com>.
Which runner are you using?

On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <el...@gmail.com> wrote:

> Hi all,
>
> I just wonder can Beam allow to set parallelism for each operator
> (PTransform) separately? Flink provides such feature.
>
> The usecase I have is the source is kafka topics, which has less
> partitions, while we have heavy PTransform and would like to scale it with
> more parallelism.
>
> Thanks a lot!
> Eleanore
>