Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/10/29 21:47:35 UTC

[GitHub] [kafka] gitlw opened a new pull request #9533: Show log end offset during truncation to help estimate data loss during ULE

gitlw opened a new pull request #9533:
URL: https://github.com/apache/kafka/pull/9533


   During an unclean leader election (ULE), data can be lost when the resigned leader truncates its log.
   This PR adds logging to help gauge the scale of message loss during a ULE.
   
   Suppose there are 3 brokers that have replicas for a given partition:
   Broker A (leader) with largest offset 9 (log end offset 10)
   Broker B (follower) with largest offset 4 (log end offset 5)
   Broker C (follower) with largest offset 1 (log end offset 2)
   
   Only the leader, A, is in the ISR; B and C are lagging behind.
   Now an unclean leader election transfers leadership to C, whose log end offset is 2. Broker A would need to truncate 8 messages (offsets 2 through 9), and Broker B 3 messages (offsets 2 through 4).
   
   Case 1: if these messages were produced with acks=0 or acks=1, clients would lose 8 messages.
   Case 2: if the client is using acks=all and the partition's minISR setting is 2, and we further assume that broker B dropped out of the ISR after receiving the message with offset 4, then only the messages with offset <= 4 have been acked to the client. The truncation effectively causes the client to lose 3 messages (offsets 2 through 4).
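
The arithmetic in the example above can be checked with a short sketch. It is illustrative only: the class and method names are hypothetical and not taken from the Kafka code base or from this PR. It also assumes offsets are contiguous (no compaction or gaps) and that each replica truncates exactly to the new leader's log end offset, which simplifies what a real broker does.

```java
// Illustrative sketch; all names here are hypothetical, not Kafka APIs.
public class TruncationEstimate {

    /**
     * Number of messages a replica drops when it truncates its log to the
     * new leader's log end offset (assumes contiguous offsets).
     */
    public static long truncatedMessages(long replicaLogEndOffset, long newLeaderLogEndOffset) {
        return Math.max(0L, replicaLogEndOffset - newLeaderLogEndOffset);
    }

    /**
     * Number of messages the client believed were delivered but that no longer
     * exist after truncation: offsets in [newLeaderLogEndOffset, lastDeliveredOffset].
     */
    public static long clientVisibleLoss(long lastDeliveredOffset, long newLeaderLogEndOffset) {
        return Math.max(0L, lastDeliveredOffset + 1 - newLeaderLogEndOffset);
    }

    public static void main(String[] args) {
        long leoA = 10; // Broker A, old leader
        long leoB = 5;  // Broker B, follower
        long leoC = 2;  // Broker C, new leader after the unclean election

        System.out.println("A truncates: " + truncatedMessages(leoA, leoC)); // 8
        System.out.println("B truncates: " + truncatedMessages(leoB, leoC)); // 3

        // Case 1: acks=0 or acks=1, so everything up to offset 9 counts as delivered.
        System.out.println("Case 1 loss: " + clientVisibleLoss(9, leoC)); // 8

        // Case 2: acks=all with minISR=2, so only offsets <= 4 were acked.
        System.out.println("Case 2 loss: " + clientVisibleLoss(4, leoC)); // 3
    }
}
```

The two helpers differ only in their inputs: the first uses what the broker can observe locally (its own log end offset), while the second needs the last offset the client considers delivered, which the broker does not know at truncation time.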
   
   Determining the exact amount of data loss requires knowing the client's acks setting at the time the messages were produced, and also whether the messages had been sufficiently replicated according to the minISR setting.
   Rather than attempt that, this PR relaxes the goal from exact data loss numbers to an ESTIMATE of the data loss.
   Specifically, this PR adds logs during truncation that show the log end offset, the number of messages truncated, and the number of bytes truncated.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] gitlw closed pull request #9533: KAFKA-10751: Generate logs to help estimate the amount of data loss during ULE

Posted by GitBox <gi...@apache.org>.
gitlw closed pull request #9533:
URL: https://github.com/apache/kafka/pull/9533


   

