You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/08 04:03:09 UTC

[GitHub] [beam] deepix opened a new issue, #21742: [Task]: Warn about data loss with Beam Kafka I/O and Flink when checkpoints are enabled

deepix opened a new issue, #21742:
URL: https://github.com/apache/beam/issues/21742

   ### What needs to happen?
   
   Under the following conditions, Beam can lose data on Kafka, i.e. it will mark offsets as processed even when the pipeline has not processed them.
   
     * Checkpointing is enabled: checkpointing_interval is set for Flink runner,
     * Kafka consumer config: enable.auto.commit set to true (default),
   auto.offset.reset set to latest (default)
     * ReadFromKafka(): commit_offset_in_finalize set to false (default)
     * No successful checkpoints (i.e. every checkpoint times out)
   
   This can be fixed with the following config:
     * commit_offset_in_finalize is true in ReadFromKafka(),
     * enable.auto.commit is false in Kafka consumer config
     * auto.offset.reset set to none in Kafka consumer config
   
   Therefore, when checkpointing_interval is set, and user calls ReadFromKafka(), we should check and warn about data loss if any of the above 3 config pieces is incorrect. All of them are required to be set correctly.
   
   More context in this mailing list thread: https://lists.apache.org/thread/r12sqodn9xojr3fhlkgjyq6sx0phyh7n
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: io-java-kafka


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey commented on issue #21742: [Task]: Warn about data loss with Beam Kafka I/O and Flink when checkpoints are enabled

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on issue #21742:
URL: https://github.com/apache/beam/issues/21742#issuecomment-1150270589

   I can, can you assign this to me?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on issue #21742: [Task]: Warn about data loss with Beam Kafka I/O and Flink when checkpoints are enabled

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on issue #21742:
URL: https://github.com/apache/beam/issues/21742#issuecomment-1150221284

   @johnjcasey will you be able to look into this and add a warning for potentially unsafe cases ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] pabloem closed issue #21742: [Task]: Warn about data loss with Beam Kafka I/O and Flink when checkpoints are enabled

Posted by GitBox <gi...@apache.org>.
pabloem closed issue #21742: [Task]: Warn about data loss with Beam Kafka I/O and Flink when checkpoints are enabled
URL: https://github.com/apache/beam/issues/21742


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org