Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2021/06/25 17:53:47 UTC

[GitHub] [kafka] rajinisivaram opened a new pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

rajinisivaram opened a new pull request #10930:
URL: https://github.com/apache/kafka/pull/10930


   If fetchOffset < startOffset, we currently throw OffsetOutOfRangeException when attempting to read from the log in the regular case. But for diverging epochs, we return Errors.NONE with the leader's new start offset, high watermark, etc. ReplicaFetcherThread throws OffsetOutOfRangeException when processing responses with Errors.NONE if the leader's offsets in the response are out of range, and this moves the partition to failed state. This PR adds a check for this case when processing fetch requests and throws OffsetOutOfRangeException regardless of epoch.
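   A minimal sketch of the ordering change described above, assuming a heavily simplified leader: the enum `FetchError`, the classes `PartitionFetchResult` and `LeaderFetchSketch`, and the precomputed `divergingEpochEndOffset` are all hypothetical stand-ins, not Kafka's actual API. The point it illustrates is only that the `fetchOffset < logStartOffset` check runs before the diverging-epoch early return.

```java
import java.util.OptionalLong;

// Hypothetical, simplified stand-in for the error codes a fetch response can carry.
enum FetchError { NONE, OFFSET_OUT_OF_RANGE }

// Hypothetical, simplified per-partition fetch result.
final class PartitionFetchResult {
    final FetchError error;
    // Present only when the follower's last fetched epoch diverges from the leader's history.
    final OptionalLong divergingEpochEndOffset;

    PartitionFetchResult(FetchError error, OptionalLong divergingEpochEndOffset) {
        this.error = error;
        this.divergingEpochEndOffset = divergingEpochEndOffset;
    }
}

final class LeaderFetchSketch {
    /**
     * fetchOffset: offset requested by the follower.
     * logStartOffset: the leader's current log start offset.
     * divergingEpochEndOffset: result of a (hypothetical) epoch-cache lookup.
     */
    static PartitionFetchResult handleFetch(long fetchOffset, long logStartOffset,
                                            OptionalLong divergingEpochEndOffset) {
        // Range check first: a follower asking for an offset the leader no longer
        // has always gets OFFSET_OUT_OF_RANGE, even if its epoch also diverged.
        if (fetchOffset < logStartOffset) {
            return new PartitionFetchResult(FetchError.OFFSET_OUT_OF_RANGE, OptionalLong.empty());
        }
        // Diverging epoch: NONE plus the metadata the follower needs to truncate.
        if (divergingEpochEndOffset.isPresent()) {
            return new PartitionFetchResult(FetchError.NONE, divergingEpochEndOffset);
        }
        // Regular path (reading records is omitted in this sketch).
        return new PartitionFetchResult(FetchError.NONE, OptionalLong.empty());
    }
}
```

   With fetchOffset 100, a leader start offset of 200 and a diverging epoch, `handleFetch(100L, 200L, OptionalLong.of(150L))` yields OFFSET_OUT_OF_RANGE in this sketch; if the diverging-epoch branch were checked first (the pre-fix ordering), the same call would yield NONE.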
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] rajinisivaram commented on pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
rajinisivaram commented on pull request #10930:
URL: https://github.com/apache/kafka/pull/10930#issuecomment-869929738


   @guozhangwang With Errors.NONE, we throw OffsetOutOfRangeException in the follower when attempting to update the follower's start offset based on the leader's start offset returned in the response: https://github.com/apache/kafka/blob/397fa1f894c176d71601183c36e5d498fc83fd1e/core/src/main/scala/kafka/log/Log.scala#L997. Since that safeguard existed before the new leader-side code that processes diverging epochs for IBP 2.7 and higher, it seems safer to retain it.
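   A rough sketch of the follower-side safeguard mentioned above, assuming (not verified against the linked Log.scala line) that the guard rejects a new log start offset larger than the follower's high watermark. `FollowerLogSketch` is hypothetical, and `OffsetOutOfRange` is a local stand-in for Kafka's OffsetOutOfRangeException so the example has no Kafka dependency.

```java
// Local stand-in for Kafka's OffsetOutOfRangeException (no Kafka dependency in this sketch).
final class OffsetOutOfRange extends RuntimeException {
    OffsetOutOfRange(String message) { super(message); }
}

// Hypothetical, simplified model of the follower's local log.
final class FollowerLogSketch {
    private long logStartOffset;
    private final long highWatermark;

    FollowerLogSketch(long logStartOffset, long highWatermark) {
        this.logStartOffset = logStartOffset;
        this.highWatermark = highWatermark;
    }

    long logStartOffset() { return logStartOffset; }

    // Called with the leader's log start offset taken from a fetch response.
    void maybeIncrementLogStartOffset(long newLogStartOffset) {
        // Assumed safeguard: refuse to move the start offset past the local high watermark.
        if (newLogStartOffset > highWatermark) {
            throw new OffsetOutOfRange("Cannot increment log start offset to " + newLogStartOffset
                + " since it is larger than the high watermark " + highWatermark);
        }
        // Only ever move the start offset forward.
        if (newLogStartOffset > logStartOffset) {
            logStartOffset = newLogStartOffset;
        }
    }
}
```

   In the PR's scenario (follower fetching offset 100, leader start offset 200), a follower whose high watermark is at most 100 would hit this exception when applying the leader's start offset from an Errors.NONE response, which is the failed-partition path the PR avoids.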





[GitHub] [kafka] rajinisivaram commented on pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
rajinisivaram commented on pull request #10930:
URL: https://github.com/apache/kafka/pull/10930#issuecomment-870475481


   Test failures are unrelated.





[GitHub] [kafka] rajinisivaram commented on pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
rajinisivaram commented on pull request #10930:
URL: https://github.com/apache/kafka/pull/10930#issuecomment-870376743


   @guozhangwang Yes, that's correct. Will rerun the build. Thanks.





[GitHub] [kafka] guozhangwang commented on pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
guozhangwang commented on pull request #10930:
URL: https://github.com/apache/kafka/pull/10930#issuecomment-870715774


   Merged to trunk. Thanks @rajinisivaram !
   
   Should we cherry-pick to older branches too?





[GitHub] [kafka] rajinisivaram commented on pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
rajinisivaram commented on pull request #10930:
URL: https://github.com/apache/kafka/pull/10930#issuecomment-869525608


   @rite2nikhil @guozhangwang @showuon Thanks for the reviews.
   
   @guozhangwang We throw `UnexpectedAppendOffsetException` only if we have data records with the wrong offset. In this case, ReplicaFetcherThread is requesting an offset that is lower than the leader's start offset because the leader's start offset has changed (e.g. the fetcher thread requests offset 100, but the leader's start offset is 200). We don't return any records since the leader doesn't have records for offset 100.
   
   Typically, the leader would go through Log.read (https://github.com/apache/kafka/blob/bd1ee02b87ea508c1372af2d3982a8919e375b2d/core/src/main/scala/kafka/log/Log.scala#L1213) and throw OffsetOutOfRangeException, which returns Errors.OFFSET_OUT_OF_RANGE. ReplicaFetcherThread already has special handling for this error (https://github.com/apache/kafka/blob/bd1ee02b87ea508c1372af2d3982a8919e375b2d/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L391). If there is also a diverging epoch at the time, the leader currently returns early with diverging epoch metadata and Errors.NONE. The change in this PR ensures that we throw OffsetOutOfRangeException in this case as well, so that ReplicaFetcherThread applies the logic for OFFSET_OUT_OF_RANGE. Hope that makes sense.
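   A simplified sketch of what "the logic for OFFSET_OUT_OF_RANGE" on the follower might look like, loosely modelled on my reading of AbstractFetcherThread; `FollowerOutOfRangeSketch` and `OutOfRangeResolution` are hypothetical names, and details such as leader-epoch checks are omitted. The idea: if the follower is behind the leader's start offset, it cannot catch up from where it is, so it discards its local log and resumes at the leader's start offset.

```java
// Hypothetical outcome of the follower's out-of-range handling.
final class OutOfRangeResolution {
    final long nextFetchOffset;
    final boolean truncatedFully;

    OutOfRangeResolution(long nextFetchOffset, boolean truncatedFully) {
        this.nextFetchOffset = nextFetchOffset;
        this.truncatedFully = truncatedFully;
    }
}

final class FollowerOutOfRangeSketch {
    static OutOfRangeResolution resolve(long replicaEndOffset, long leaderStartOffset,
                                        long leaderEndOffset) {
        // Follower is ahead of the leader's end offset: truncate back to it.
        if (leaderEndOffset < replicaEndOffset) {
            return new OutOfRangeResolution(leaderEndOffset, false);
        }
        // Follower is behind the leader's start offset: drop the local log and
        // restart fetching from the leader's start offset.
        if (leaderStartOffset > replicaEndOffset) {
            return new OutOfRangeResolution(leaderStartOffset, true);
        }
        // Otherwise keep fetching from the follower's own end offset.
        return new OutOfRangeResolution(replicaEndOffset, false);
    }
}
```

   For the example in the comment (follower at offset 100, leader start offset 200), `resolve(100L, 200L, 300L)` returns a next fetch offset of 200 with a full truncation, instead of the partition being marked as failed.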





[GitHub] [kafka] rajinisivaram commented on pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
rajinisivaram commented on pull request #10930:
URL: https://github.com/apache/kafka/pull/10930#issuecomment-870776438


   Thanks @guozhangwang ! Yes, we should cherry-pick to 2.8 and 2.7. I can do that.





[GitHub] [kafka] guozhangwang commented on pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
guozhangwang commented on pull request #10930:
URL: https://github.com/apache/kafka/pull/10930#issuecomment-870078185


   > @guozhangwang With Errors.NONE, we throw OffsetOutOfRangeException in the follower when attempting to update follower's start offset based on the leader's start offset returned in the response:
   
   I see. I thought you meant there were some conditions on the follower's side that could still protect us from missing this error. Now I realize this condition may or may not be hit, but either way it's bad:
   
   1. If it is not hit, we would end up not catching this error and proceed as if nothing went wrong.
   2. If it is hit, we throw OffsetOutOfRangeException on the follower's side to catch it, but we also move the partition to failed state, and we would not be able to recover from that state.
   
   If my understanding here is correct, I think I can go ahead and merge the PR. BTW, could you re-trigger the unit tests?





[GitHub] [kafka] guozhangwang commented on pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
guozhangwang commented on pull request #10930:
URL: https://github.com/apache/kafka/pull/10930#issuecomment-869764069


   @rajinisivaram I think my confusion comes from `ReplicaFetcherThread throws OffsetOutOfRangeException when processing responses with Errors.NONE if the leader's offsets in the response are out of range and this moves the partition to failed state.` Could you point me to the code where this is currently happening? Also, since we are now fixing the logic on the leader, if this logic does exist, do we still need it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [kafka] guozhangwang merged pull request #10930: KAFKA-12996; Return OFFSET_OUT_OF_RANGE for fetchOffset < startOffset even for diverging epochs

Posted by GitBox <gi...@apache.org>.
guozhangwang merged pull request #10930:
URL: https://github.com/apache/kafka/pull/10930


   

