You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "hudeqi (via GitHub)" <gi...@apache.org> on 2023/05/10 03:07:19 UTC

[GitHub] [kafka] hudeqi opened a new pull request, #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

hudeqi opened a new pull request, #13696:
URL: https://github.com/apache/kafka/pull/13696

   When the partitions of ReplicaFetcherThread finished truncating, the ReplicaAlterLogDirsThread to which these partitions belong needs to be marked truncate. The lag value in the newState (PartitionFetchState) obtained in this process is still the original value (state.lag). If the truncationOffset is smaller than the original state.fetchOffset, then the original lag value is incorrect and needs to be updated. It should be the original lag value plus the difference between the original state.fetchOffset and truncationOffset.
   
   If the original lag value is used incorrectly, then ReplicaAlterLogDirsThread may set the wrong lag value for fetcherLagStats when executing "processFetchRequest". So it might be more reasonable to recalculate a lag value based on new state.fetchOffset.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1551312847

   hi, do you have time to review this issue? @showuon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1596776203

   I am so sorry, I will update the issue and description later, I think the current description is a bit problematic and unclear. @viktorsomogyi @Hangleton 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] viktorsomogyi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "viktorsomogyi (via GitHub)" <gi...@apache.org>.
viktorsomogyi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1594603126

   @hudeqi I can get to this early next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on a diff in pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on code in PR #13696:
URL: https://github.com/apache/kafka/pull/13696#discussion_r1205574820


##########
core/src/main/scala/kafka/server/AbstractFetcherThread.scala:
##########
@@ -460,8 +460,16 @@ abstract class AbstractFetcherThread(name: String,
     partitionMapLock.lockInterruptibly()
     try {
       Option(partitionStates.stateValue(topicPartition)).foreach { state =>
-        val newState = PartitionFetchState(state.topicId, math.min(truncationOffset, state.fetchOffset),
-          state.lag, state.currentLeaderEpoch, state.delay, state = Truncating,
+        var lag = state.lag

Review Comment:
   > Isn't the lag recalculated on the next fetch iteration ([here](https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L362))?
   
   Hi Hangleton, thanks for your review! As you said, under normal circumstances, the lag will be recalculated on the next fetch. However, if an exception occurs before updating the lag (for example, processPartitionData throws an exception due to a disk error) or the return value of logAppendInfoOpt is None, then the lag value will always be or last for a period of time as the previous error value (obviously truncate, lag display is still very large, which seems to deviate greatly from our understanding). Therefore, the modification here is a defensive measure that can effectively avoid such situations. @Hangleton 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] viktorsomogyi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "viktorsomogyi (via GitHub)" <gi...@apache.org>.
viktorsomogyi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1596773263

   @Hangleton sorry, I just noticed I somehow removed you unintentionally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi closed pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi closed pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread
URL: https://github.com/apache/kafka/pull/13696


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1543392891

   Does anyone care about this issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1580752286

   Hello, please help to review this PR if you have time, thank you! @satishd 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1571226445

   > Hi, do you have time to review this? @divijvaidya
   
   Hi, is this free to review? @divijvaidya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on a diff in pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on code in PR #13696:
URL: https://github.com/apache/kafka/pull/13696#discussion_r1221534901


##########
core/src/main/scala/kafka/server/AbstractFetcherThread.scala:
##########
@@ -460,8 +460,16 @@ abstract class AbstractFetcherThread(name: String,
     partitionMapLock.lockInterruptibly()
     try {
       Option(partitionStates.stateValue(topicPartition)).foreach { state =>
-        val newState = PartitionFetchState(state.topicId, math.min(truncationOffset, state.fetchOffset),
-          state.lag, state.currentLeaderEpoch, state.delay, state = Truncating,
+        var lag = state.lag

Review Comment:
   @Hangleton hi! Are you still following this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1582997213

   This minor pr is also about replica fetcher thread, please help to review, thanks! @viktorsomogyi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1555415608

   Hi, do you have time to review this? @divijvaidya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] Hangleton commented on a diff in pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "Hangleton (via GitHub)" <gi...@apache.org>.
Hangleton commented on code in PR #13696:
URL: https://github.com/apache/kafka/pull/13696#discussion_r1205272856


##########
core/src/main/scala/kafka/server/AbstractFetcherThread.scala:
##########
@@ -460,8 +460,16 @@ abstract class AbstractFetcherThread(name: String,
     partitionMapLock.lockInterruptibly()
     try {
       Option(partitionStates.stateValue(topicPartition)).foreach { state =>
-        val newState = PartitionFetchState(state.topicId, math.min(truncationOffset, state.fetchOffset),
-          state.lag, state.currentLeaderEpoch, state.delay, state = Truncating,
+        var lag = state.lag

Review Comment:
   Isn't the lag recalculated on the next fetch iteration ([here](https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/AbstractFetcherThread.scala#L362))?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] dajac commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "dajac (via GitHub)" <gi...@apache.org>.
dajac commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1587563547

   @hudeqi Sure. I am trying to understand the issue. Have you been able to repro it with a unit test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] hudeqi commented on pull request #13696: KAFKA-14979:Incorrect lag was calculated when markPartitionsForTruncation in ReplicaAlterLogDirsThread

Posted by "hudeqi (via GitHub)" <gi...@apache.org>.
hudeqi commented on PR #13696:
URL: https://github.com/apache/kafka/pull/13696#issuecomment-1586899776

   Hi, could you review this pr? It seems to be similar to KAFKA-15080. @dajac 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org