Posted to user@beam.apache.org by Evan Galpin <eg...@apache.org> on 2022/11/10 23:30:25 UTC

[KafkaIO] Use of sinkGroupId with Exactly Once Semantics

Hey folks,

I can see in the docs for "withEOS"[1] in the KafkaIO#Write section that
the sinkGroupId is recommended to be unique per job.  I'm wondering about a
case where a single job outputs to multiple topics.  Would it be advisable
to have a unique sinkGroupId per instance of KafkaIO#Write transform, or
still only per job even if the job has multiple KafkaIO#Write?

Thanks in advance!

[1]
https://beam.apache.org/releases/javadoc/2.41.0/org/apache/beam/sdk/io/kafka/KafkaIO.WriteRecords.html#withEOS-int-java.lang.String-

Re: [KafkaIO] Use of sinkGroupId with Exactly Once Semantics

Posted by Evan Galpin <eg...@apache.org>.
Thanks for the input, folks! It sounds like a single sinkGroupId across my
application (even though that application writes to multiple topics) is what
I want.

On Fri, Nov 11, 2022 at 1:34 PM Byron Ellis via user <us...@beam.apache.org>
wrote:

> The Kafka consumer offset key
> <https://github.com/apache/kafka/blob/trunk/core/src/main/resources/common/message/OffsetCommitKey.json> is
> (group, topic, partition)

Re: [KafkaIO] Use of sinkGroupId with Exactly Once Semantics

Posted by Byron Ellis via user <us...@beam.apache.org>.
The Kafka consumer offset key is (group, topic, partition):
<https://github.com/apache/kafka/blob/trunk/core/src/main/resources/common/message/OffsetCommitKey.json>
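
To make the point about the key concrete, here is a minimal sketch in plain Java. It is only a toy model: OffsetKeySketch, OffsetKey, OffsetStore and the group and topic names are all hypothetical, and none of this is Kafka's or Beam's actual code. It only illustrates that when entries are keyed by (group, topic, partition), one group id shared across two topics yields two separate entries rather than a collision.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class OffsetKeySketch {

    // Hypothetical stand-in for the (group, topic, partition) key; not a Kafka class.
    public static final class OffsetKey {
        final String group;
        final String topic;
        final int partition;

        public OffsetKey(String group, String topic, int partition) {
            this.group = group;
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof OffsetKey)) {
                return false;
            }
            OffsetKey other = (OffsetKey) o;
            return group.equals(other.group)
                    && topic.equals(other.topic)
                    && partition == other.partition;
        }

        @Override
        public int hashCode() {
            return Objects.hash(group, topic, partition);
        }
    }

    // Toy offset store: each commit is saved under its full three-part key.
    public static final class OffsetStore {
        private final Map<OffsetKey, Long> offsets = new HashMap<>();

        public void commit(String group, String topic, int partition, long offset) {
            offsets.put(new OffsetKey(group, topic, partition), offset);
        }

        // Returns null when nothing has been committed under that key.
        public Long committed(String group, String topic, int partition) {
            return offsets.get(new OffsetKey(group, topic, partition));
        }

        public int size() {
            return offsets.size();
        }
    }

    public static void main(String[] args) {
        OffsetStore store = new OffsetStore();
        String group = "my-job-sink-group"; // assumed: one id shared by the whole job

        // Same group and partition number, different topics.
        store.commit(group, "topic-a", 0, 42L);
        store.commit(group, "topic-b", 0, 7L);

        System.out.println(store.committed(group, "topic-a", 0));
        System.out.println(store.committed(group, "topic-b", 0));
        System.out.println(store.size());
    }
}
```

Because the topic is part of the key, the second commit does not overwrite the first. Whether KafkaIO's EOS sink relies on exactly this behaviour is not confirmed anywhere in this thread, so treat the sketch as an illustration of the key structure only.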

On Fri, Nov 11, 2022 at 7:58 AM John Casey via user <us...@beam.apache.org>
wrote:

> I haven't done this experimentally before, so take this with a grain of
> salt, but...
>
> Kafka group IDs are essentially used to track the position of a logical
> (i.e. application-level, not thread- or machine-level) producer or
> consumer. As such, I think it would be fine to use just one group ID, even
> when writing to multiple topics.

Re: [KafkaIO] Use of sinkGroupId with Exactly Once Semantics

Posted by John Casey via user <us...@beam.apache.org>.
I haven't done this experimentally before, so take this with a grain of
salt, but...

Kafka group IDs are essentially used to track the position of a logical
(i.e. application-level, not thread- or machine-level) producer or
consumer. As such, I think it would be fine to use just one group ID, even
when writing to multiple topics.
