You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/07 09:24:24 UTC

[GitHub] [beam] jeanwisser opened a new issue, #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

jeanwisser opened a new issue, #21730:
URL: https://github.com/apache/beam/issues/21730

   ### What happened?
   
   Using `KafkaIO` with [ReadFromKafkaDoFn.java](https://github.com/apache/beam/blob/v2.39.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java) and `commitOffsetsFinalize` should commit offsets of processed messages and if the pipeline is restarted, should resume from the last committed offset.
   
   While committing the offset works correctly, resuming from the latest committed offset does not work because the groupId used are not the same (using 2.39.0).
   
   - Committing the offsets happens in [KafkaCommitOffset.java](https://github.com/apache/beam/blob/v2.39.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaCommitOffset.java)`with a consumer defined as:
    `consumerFactoryFn.apply(updatedConsumerConfig)`
   
   - Reading the startOffset is done in `initialRestriction()` in [ReadFromKafkaDoFn.java](https://github.com/apache/beam/blob/v2.39.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/ReadFromKafkaDoFn.java) with a consumer defined as:
   `consumerFactoryFn.apply(
               KafkaIOUtils.getOffsetConsumerConfig(
                   "initialOffset", offsetConsumerConfig, updatedConsumerConfig)))`
   
   Since the group name of the consumer is not the same as the group name of the consumer fetching the latest offset, the pipeline will always start from the beginning of the (topic,partition) again.
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-java-kafka


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on issue #21730:
URL: https://github.com/apache/beam/issues/21730#issuecomment-1246899238

   It was


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on issue #21730:
URL: https://github.com/apache/beam/issues/21730#issuecomment-1151435454

   I am, you can assign this to me


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj closed issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

Posted by GitBox <gi...@apache.org>.
chamikaramj closed issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset
URL: https://github.com/apache/beam/issues/21730


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on issue #21730:
URL: https://github.com/apache/beam/issues/21730#issuecomment-1246045098

   I think this was fixed by https://github.com/apache/beam/pull/22450


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on issue #21730:
URL: https://github.com/apache/beam/issues/21730#issuecomment-1151422909

   @johnjcasey I believe you are looking into a similar issue ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on issue #21730:
URL: https://github.com/apache/beam/issues/21730#issuecomment-1151468490

   Done. Feel free mark as a duplicate if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #21730:
URL: https://github.com/apache/beam/issues/21730#issuecomment-1363222858

   There is still a reference to this issue in our code: Update: the reason is here: https://github.com/apache/beam/blob/eb23b0a123ea94429f34385194e8eca7299d66cd/sdks/python/apache_beam/io/kafka.py#L191
   
   Since it is fixed by #22450 is it safe to remove `append_args=['--experiments=use_unbounded_sdf_wrapper']` below that line?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #21730: [Bug]: KafkaIO SplittableDoFn not resuming from last committed offset

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #21730:
URL: https://github.com/apache/beam/issues/21730#issuecomment-1246040912

   Doesn't this mean that some stated functionality just isn't working? Is there a chance of data corruption if the user doesn't notice? That would make this P1 and release blocking, really.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org