Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 23:34:45 UTC

[GitHub] [beam] damccorm opened a new issue, #21465: Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle

damccorm opened a new issue, #21465:
URL: https://github.com/apache/beam/issues/21465

   A user noticed that we commit Kafka offsets without any obvious checkpointing. We use `Reshuffle.byRandomKey()` to force Dataflow and the SparkRunner to checkpoint, but on runners with a non-checkpointing shuffle, this risks data loss.
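
The failure mode comes down to ordering: are the consumed offsets committed before or after the records read are durable somewhere the pipeline can recover from, such as a checkpointing shuffle or the final sink? The following is a toy, single-process Java model of that ordering. It does not use the Beam or Kafka API, and `OffsetCommitSimulation`, `PARTITION`, `runAttempt` and `simulate` are hypothetical names. Each job attempt reads everything after the committed offset. The first attempt crashes between its two steps, and the second restarts from whatever offset was committed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, not the Beam or Kafka API. A "partition" is a fixed list of
// records, the committed offset is an int, and "durable" stands for whatever the
// pipeline can recover after a crash (a checkpoint or the final sink).
class OffsetCommitSimulation {

    static final List<String> PARTITION = List.of("r0", "r1", "r2", "r3", "r4");

    /**
     * One job attempt: read everything after committedOffset, then either
     * commit-then-persist or persist-then-commit. If crash is true, the attempt
     * dies between the two steps. Returns the committed offset afterwards.
     */
    static int runAttempt(int committedOffset, List<String> durable,
                          boolean commitBeforeDurable, boolean crash) {
        List<String> batch = PARTITION.subList(committedOffset, PARTITION.size());
        int endOffset = PARTITION.size();
        if (commitBeforeDurable) {
            committedOffset = endOffset;   // offsets committed first
            if (crash) {
                return committedOffset;    // crash: batch was never made durable
            }
            durable.addAll(batch);
        } else {
            durable.addAll(batch);         // records made durable first
            if (crash) {
                return committedOffset;    // crash: offset not advanced, batch re-read later
            }
            committedOffset = endOffset;
        }
        return committedOffset;
    }

    /** First attempt crashes; the retry restarts from the committed offset. */
    static List<String> simulate(boolean commitBeforeDurable) {
        List<String> durable = new ArrayList<>();
        int offset = runAttempt(0, durable, commitBeforeDurable, true);
        runAttempt(offset, durable, commitBeforeDurable, false);
        return durable;
    }

    public static void main(String[] args) {
        System.out.println("commit, then persist: " + simulate(true));
        System.out.println("persist, then commit: " + simulate(false));
    }
}
```

Committing first loses r0 to r4 outright (the first line prints `[]`), while persisting first delivers them at least once, here twice because of the retry. This is why the commit has to sit behind a step whose output survives a failure; on a runner whose shuffle does not checkpoint, `Reshuffle` does not provide that guarantee.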
   
   The modern solution is to use `@RequiresStableInput`. However, it is not yet fully implemented by many runners, so we still need the explicit shuffle for now.
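
The guarantee `@RequiresStableInput` asks for can be sketched without Beam. A step with side effects, such as committing an offset, is retried, and what matters is whether every retry observes the same input. The model below is a hypothetical illustration: `StableInputSketch`, `inputsObserved` and the counter-based upstream are assumptions, not Beam API. In Beam Java the real mechanism is the `DoFn.RequiresStableInput` annotation on the `@ProcessElement` method, which asks the runner to keep the input stable across retries, for example by persisting it before processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

// Toy model of "stable input" semantics; not the Beam API.
class StableInputSketch {

    /**
     * Retries a side-effecting step and records the input it observed on each
     * attempt. The first failingAttempts attempts fail, the last one succeeds.
     * With stableInput, upstream is evaluated once and the value is persisted
     * before the first attempt, so every retry replays it. Without it,
     * upstream is re-evaluated on each attempt and may return something else.
     */
    static List<Integer> inputsObserved(IntSupplier upstream, int failingAttempts, boolean stableInput) {
        List<Integer> observed = new ArrayList<>();
        int persisted = stableInput ? upstream.getAsInt() : 0; // "checkpoint" of the input
        for (int attempt = 0; attempt <= failingAttempts; attempt++) {
            int input = stableInput ? persisted : upstream.getAsInt();
            observed.add(input);
        }
        return observed;
    }

    public static void main(String[] args) {
        // Hypothetical nondeterministic upstream: a new "latest offset" on every call.
        int[] latestOffset = {100};
        IntSupplier upstream = () -> latestOffset[0]++;
        System.out.println("unstable: " + inputsObserved(upstream, 2, false));
        System.out.println("stable:   " + inputsObserved(upstream, 2, true));
    }
}
```

Without stable input, a retried commit can act on a different value (101 or 102) than the attempt that failed; with it, all three attempts see 103, so repeating the commit is harmless.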
   
   https://stackoverflow.com/questions/70785672/apache-beam-kafkaio-commit-offset-behaviour
   
   Imported from Jira [BEAM-13715](https://issues.apache.org/jira/browse/BEAM-13715). Original Jira may contain additional context.
   Reported by: kenn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] johnjcasey closed issue #21465: Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle

Posted by GitBox <gi...@apache.org>.
johnjcasey closed issue #21465: Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle
URL: https://github.com/apache/beam/issues/21465


[GitHub] [beam] kennknowles commented on issue #21465: Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #21465:
URL: https://github.com/apache/beam/issues/21465#issuecomment-1246011674

   @johnjcasey, as an FYI: I believe this may still be an issue. Reshuffle is a non-checkpointing operation, but it is commonly used for checkpointing because of Dataflow's history.


[GitHub] [beam] johnjcasey commented on issue #21465: Kafka commit offset drop data on failure for runners that have non-checkpointing shuffle

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on issue #21465:
URL: https://github.com/apache/beam/issues/21465#issuecomment-1246861491

   .take-issue

