Posted to user@beam.apache.org by Robert Bradshaw <ro...@google.com> on 2021/11/23 01:05:16 UTC

Re:

The source code can be found at
https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java

In a nutshell, the full set of (topic, partitionIndex) pairs gets shuffled
and assigned to the workers randomly; each pair is then processed by
a SplittableDoFn that emits the actual data on that topic/partition.
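
As a rough illustration of that idea, here is a toy sketch in plain Python. It is not Beam or KafkaIO code: the function name, the round-robin dealing, and the topic names are all assumptions made for the example, and a real runner decides placement its own way.

```python
import random
from collections import defaultdict


def assign_partitions(topic_partitions, num_workers, seed=None):
    """Toy model: shuffle (topic, partition) pairs, then deal them out to workers.

    Hypothetical helper for illustration only; not part of Beam.
    """
    rng = random.Random(seed)
    pairs = list(topic_partitions)
    rng.shuffle(pairs)  # random order stands in for the runner's shuffle
    assignment = defaultdict(list)
    for i, pair in enumerate(pairs):
        # Round-robin dealing is an assumption; real runners differ.
        assignment[i % num_workers].append(pair)
    return dict(assignment)


# Made-up topics: "orders" with 3 partitions, "clicks" with 2.
pairs = [("orders", p) for p in range(3)] + [("clicks", p) for p in range(2)]
assignment = assign_partitions(pairs, num_workers=2, seed=0)
for worker, owned in sorted(assignment.items()):
    print(worker, owned)
```

Each worker would then read only the topic/partition pairs it was handed.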

On Mon, Nov 22, 2021 at 12:07 PM Antonio Si <an...@gmail.com> wrote:
>
> Hello Beam community,
>
> I am experimenting with the new SDF KafkaIO in a bit more detail. I have a quick question. How do the topics and partitions get assigned to each Task Manager?
>
> Can someone point me to the code?
>
> Thanks in advance.
>
> Antonio.

Re:

Posted by Antonio Si <an...@gmail.com>.
Thank you. Really appreciate the information.

Antonio.

On Mon, Nov 22, 2021 at 5:35 PM Robert Bradshaw <ro...@google.com> wrote:

> This is inherent to the way that SDFs operate. Essentially,
>
>     ParDo(someSdfInstance)
>
> gets expanded into
>
>     Map(element -> (element, restriction)).shuffle().FlatMap(someSdfInstance.process)

Re:

Posted by Robert Bradshaw <ro...@google.com>.
This is inherent to the way that SDFs operate. Essentially,

    ParDo(someSdfInstance)

gets expanded into

    Map(element -> (element, restriction)).shuffle().FlatMap(someSdfInstance.process)
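
The three steps of that expansion can be mimicked in plain Python. This is a minimal sketch, not the Beam API: `ToySdf`, `expand_and_run`, and the (name, size) element shape are hypothetical, the shuffle is modeled as a random reordering, and dynamic splitting of restrictions is left out entirely.

```python
import random


class ToySdf:
    """Hypothetical stand-in for a splittable DoFn; not a Beam class."""

    def initial_restriction(self, element):
        # Assumption: an element is (name, size) and its restriction is
        # the offset range [0, size).
        _, size = element
        return (0, size)

    def process(self, element, restriction):
        # Emit one output per offset inside the restriction.
        name, _ = element
        start, stop = restriction
        for offset in range(start, stop):
            yield (name, offset)


def expand_and_run(sdf, elements, seed=None):
    # Step 1: Map(element -> (element, restriction))
    paired = [(e, sdf.initial_restriction(e)) for e in elements]
    # Step 2: shuffle(), modeled here as a random reordering
    random.Random(seed).shuffle(paired)
    # Step 3: FlatMap(sdf.process)
    return [out for e, r in paired for out in sdf.process(e, r)]


outputs = expand_and_run(ToySdf(), [("a", 2), ("b", 3)], seed=1)
print(sorted(outputs))
```

The point of the sketch is that the shuffle sits between pairing and processing, so no separate "assign to workers" step appears in the IO's own code.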

On Mon, Nov 22, 2021 at 5:28 PM Antonio Si <an...@gmail.com> wrote:
>
> Thanks Robert. Do you mind pointing me to the code that does the shuffling and assignment to the workers?
> I traced through the code where it loops through the topic partitions and, for each, creates a KafkaSourceDescriptor
> and calls ReadFromKafkaDoFn.initialRestriction(). But I couldn't find out where they get shuffled and assigned to the workers.
>
> Thank you very much.
>
> Antonio.

Re:

Posted by Antonio Si <an...@gmail.com>.
Thanks Robert. Do you mind pointing me to the code that does the shuffling
and assignment to the workers?
I traced through the code where it loops through the topic partitions and,
for each, creates a KafkaSourceDescriptor
and calls ReadFromKafkaDoFn.initialRestriction(). But I couldn't find out
where they get shuffled and assigned to the workers.

Thank you very much.

Antonio.

On Mon, Nov 22, 2021 at 5:05 PM Robert Bradshaw <ro...@google.com> wrote:

> The source code can be found at
>
> https://github.com/apache/beam/blob/master/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java
>
> In a nutshell, the full set of (topic, partitionIndex) pairs gets shuffled
> and assigned to the workers randomly; each pair is then processed by
> a SplittableDoFn that emits the actual data on that topic/partition.