Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2018/12/14 23:19:39 UTC

[GitHub] zsxwing opened a new pull request #23324: [SPARK-26267][SS]Retry when detecting incorrect offsets from Kafka

URL: https://github.com/apache/spark/pull/23324
 
 
   ## What changes were proposed in this pull request?
   
   Due to [KAFKA-7703](https://issues.apache.org/jira/browse/KAFKA-7703), Kafka may return the earliest offset when we request the latest offset. This causes Spark to reprocess data.
   
   To reduce the impact of KAFKA-7703, this PR uses the offsets fetched previously to audit the result returned by Kafka. If any offset is found to be incorrect, we retry the fetch at most `maxOffsetFetchAttempts` times. For the first batch of a new query there are no previous offsets to compare against, so we simply fetch offsets twice. This should greatly reduce the chance of hitting KAFKA-7703.
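
The audit-and-retry idea described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the code in this PR: the names `find_incorrect` and `fetch_latest_with_retry`, the `"topic-partition" -> offset` map, and the default of 3 attempts are hypothetical. It assumes that an "incorrect" offset is one that is smaller than the previously seen offset for the same partition, that "fetch twice" on the first batch means keeping the second result, and that the last result is returned once the attempts are exhausted.

```python
from typing import Callable, Dict, Optional

# Hypothetical representation: "topic-partition" -> latest offset.
Offsets = Dict[str, int]


def find_incorrect(previous: Offsets, fetched: Offsets) -> Offsets:
    """Return the fetched entries whose offset is behind the previous one.

    Assumption: a latest offset should never move backwards, so a smaller
    value than before is treated as a symptom of KAFKA-7703.
    """
    return {
        tp: offset
        for tp, offset in fetched.items()
        if tp in previous and offset < previous[tp]
    }


def fetch_latest_with_retry(
    fetch: Callable[[], Offsets],
    previous: Optional[Offsets] = None,
    max_offset_fetch_attempts: int = 3,
) -> Offsets:
    """Fetch latest offsets, retrying while the audit finds bad offsets."""
    if previous is None:
        # First batch of a new query: nothing to audit against, so fetch
        # twice and keep the second result (assumed interpretation).
        fetch()
        return fetch()

    fetched = fetch()
    attempts = 1
    while find_incorrect(previous, fetched) and attempts < max_offset_fetch_attempts:
        fetched = fetch()
        attempts += 1
    # Assumption: after the last attempt the result is returned as-is.
    return fetched
```

In this sketch `fetch` is any zero-argument callable that asks Kafka for the latest offsets, so the audit logic can be exercised with a stub that returns a bad result first and a good one on the retry.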
   
   ## How was this patch tested?
   
   Jenkins

