Posted to user@beam.apache.org by Eleanore Jin <el...@gmail.com> on 2020/04/29 17:31:59 UTC
Set parallelism for each operator
Hi all,
I am wondering whether Beam allows setting the parallelism for each operator
(PTransform) separately. Flink provides such a feature.
My use case is that the source is Kafka topics, which have few
partitions, while we have a heavy PTransform and would like to scale it with
more parallelism.
Thanks a lot!
Eleanore
Re: Set parallelism for each operator
Posted by Maximilian Michels <mx...@apache.org>.
Beam and its Flink Runner do not allow setting the parallelism at the
operator level. The wish to configure parallelism per operator has come up
numerous times over the years. I'm not opposed to allowing it for special
cases, e.g. via a pipeline option.
It doesn't look like it is necessary for the use case discussed here, though.
-Max
On 06.05.20 18:27, Alexey Romanenko wrote:
> One option when reading from topics with a small number of
> partitions is to apply a Reshuffle right after the read transform, so
> that the other pipeline steps are parallelized better.
>
> We had a discussion in this Jira about that a while ago:
> https://issues.apache.org/jira/browse/BEAM-8121
Re: Set parallelism for each operator
Posted by Alexey Romanenko <ar...@gmail.com>.
One option when reading from topics with a small number of partitions is to apply a Reshuffle right after the read transform, so that the other pipeline steps are parallelized better.
We had a discussion in this Jira about that a while ago:
https://issues.apache.org/jira/browse/BEAM-8121
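To illustrate why a redistribution step right after the read can help, here is a toy model in plain Java. It is not Beam code and does not reflect how the Flink Runner actually schedules work: the class name, the method names, and the numbers (4 partitions, parallelism 16, 1000 records) are assumptions made up for the illustration, and round-robin assignment stands in for a random-key redistribution. In the Beam Java SDK, the corresponding step would be a Reshuffle transform (for example `Reshuffle.viaRandomKey()`) applied to the output of the Kafka read.

```java
public class ReshuffleSketch {

    /**
     * Counts the workers that receive at least one record when every record
     * stays on the worker that reads its partition (no redistribution).
     */
    public static int busyWorkersWithoutReshuffle(int partitions, int parallelism, int records) {
        boolean[] busy = new boolean[parallelism];
        for (int i = 0; i < records; i++) {
            int partition = i % partitions;        // partition the record came from
            busy[partition % parallelism] = true;  // each partition is pinned to one worker
        }
        return countTrue(busy);
    }

    /**
     * Counts the workers that receive at least one record when records are
     * redistributed across all workers after the read. Round-robin is used
     * here as a stand-in for a random-key shuffle (an assumption of this model).
     */
    public static int busyWorkersWithReshuffle(int parallelism, int records) {
        boolean[] busy = new boolean[parallelism];
        for (int i = 0; i < records; i++) {
            busy[i % parallelism] = true;
        }
        return countTrue(busy);
    }

    private static int countTrue(boolean[] flags) {
        int n = 0;
        for (boolean f : flags) {
            if (f) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 4 Kafka partitions, job parallelism 16.
        System.out.println("without reshuffle: " + busyWorkersWithoutReshuffle(4, 16, 1000));
        System.out.println("with reshuffle:    " + busyWorkersWithReshuffle(16, 1000));
    }
}
```

In this model only 4 of the 16 workers see data without the redistribution step, while all 16 do with it. The trade-off in a real pipeline is that the shuffle itself costs network and serialization work, so it tends to pay off only when the downstream transform is heavy.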
> On 30 Apr 2020, at 03:56, Eleanore Jin <el...@gmail.com> wrote:
>
> Thanks all for the information!
>
> Eleanore
Re: Set parallelism for each operator
Posted by Eleanore Jin <el...@gmail.com>.
Thanks all for the information!
Eleanore
On Wed, Apr 29, 2020 at 6:36 PM Ankur Goenka <go...@google.com> wrote:
> Beam does support setting the parallelism at the job level: when executing
> on Flink, the "--parallelism" flag applies to all the transforms in the
> job.
>
> For the use case you mentioned, the Kafka read operations will be
> over-parallelised, but that should be OK, as they will only have a small
> memory impact from loading some state for the Kafka client etc.
> Also, Flink can run multiple operations of the same job in a single task
> slot, so having higher parallelism for lightweight operations should not be
> a problem.
Re: Set parallelism for each operator
Posted by Ankur Goenka <go...@google.com>.
Beam does support setting the parallelism at the job level: when executing
on Flink, the "--parallelism" flag applies to all the transforms in the
job.
For the use case you mentioned, the Kafka read operations will be
over-parallelised, but that should be OK, as they will only have a small
memory impact from loading some state for the Kafka client etc.
Also, Flink can run multiple operations of the same job in a single task
slot, so having higher parallelism for lightweight operations should not be
a problem.
On Wed, Apr 29, 2020 at 6:28 PM Luke Cwik <lc...@google.com> wrote:
> Beam doesn't expose such a thing directly but the FlinkRunner may be able
> to take some pipeline options to configure this.
Re: Set parallelism for each operator
Posted by Luke Cwik <lc...@google.com>.
Beam doesn't expose such a thing directly but the FlinkRunner may be able
to take some pipeline options to configure this.
On Wed, Apr 29, 2020 at 5:51 PM Eleanore Jin <el...@gmail.com> wrote:
> Hi Kyle,
>
> I am using Flink Runner (v1.8.2)
>
> Thanks!
> Eleanore
Re: Set parallelism for each operator
Posted by Eleanore Jin <el...@gmail.com>.
Hi Kyle,
I am using Flink Runner (v1.8.2)
Thanks!
Eleanore
On Wed, Apr 29, 2020 at 10:33 AM Kyle Weaver <kc...@google.com> wrote:
> Which runner are you using?
Re: Set parallelism for each operator
Posted by Kyle Weaver <kc...@google.com>.
Which runner are you using?
On Wed, Apr 29, 2020 at 1:32 PM Eleanore Jin <el...@gmail.com> wrote:
> Hi all,
>
> I am wondering whether Beam allows setting the parallelism for each operator
> (PTransform) separately. Flink provides such a feature.
>
> My use case is that the source is Kafka topics, which have few
> partitions, while we have a heavy PTransform and would like to scale it with
> more parallelism.
>
> Thanks a lot!
> Eleanore
>