You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by GitBox <gi...@apache.org> on 2021/12/22 18:37:13 UTC

[GitHub] [samza] dxichen opened a new pull request #1571: Fix rollback stale checkpoints

dxichen opened a new pull request #1571:
URL: https://github.com/apache/samza/pull/1571


   Symptom: Checkpoint v2 preference could cause stale checkpoints to get picked up when newer v1 checkpoints are available
   
   Cause: Kafka checkpoint manager selects the checkpoints based on the `task.checkpoint.read.versions` priority list. This does not account for the fact that the prioritized checkpoint may be stale and a new checkpoint lower on the priority list is present in the checkpoint topic. This will cause the job to silently use the stale checkpoint and cause more reprocessing than the most recent commit. 
   
   An example case where this could happen is when the users upgrades to a v2 checkpoint commits a v2 checkpoint and rollbacks to a previous version using only v1 checkpoint version then upgrading to the newest version again, using that stale v2 checkpoint from the initial upgrade.
   
   Changes:
   Added a `task.live.checkpoint.max.age` default 10 mins (must be > task.commit.ms interval; internal config) which would indicate if a checkpoint has gone stale if:
   a) There exists a checkpoint of another type more recent than it
   b) The diff between new checkpoint's kafka log append time and the prioritized version's log append time > `task.live.checkpoint.max.age'
   
   Tests: Integration tests writing to the checkpoint topic with stale checkpoint v2 and consuming it with a job that prioritize v2 over v1 checkpoints with `task.live.checkpoint.max.age' = 0 asserting it prioritizes v1 checkpoints instead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@samza.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] dxichen merged pull request #1571: Fix rollback stale checkpoints

Posted by GitBox <gi...@apache.org>.
dxichen merged pull request #1571:
URL: https://github.com/apache/samza/pull/1571


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@samza.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [samza] prateekm commented on a change in pull request #1571: Fix rollback stale checkpoints

Posted by GitBox <gi...@apache.org>.
prateekm commented on a change in pull request #1571:
URL: https://github.com/apache/samza/pull/1571#discussion_r774096693



##########
File path: samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
##########
@@ -82,6 +82,8 @@ class KafkaCheckpointManager(checkpointSpec: KafkaStreamSpec,
   val stopConsumerAfterFirstRead: Boolean = new TaskConfig(config).getCheckpointManagerConsumerStopAfterFirstRead
 
   val checkpointReadVersions: util.List[lang.Short] = new TaskConfig(config).getCheckpointReadVersions
+  val LiveCheckpointMaxAgeMillis: Long = new TaskConfig(config).getLiveCheckpointMaxAgeMillis

Review comment:
       Minor: 'liveCheckpintMaxAgeMillis' (lowercase l)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@samza.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org