You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2020/12/10 20:31:31 UTC

[GitHub] [kafka] kowshik commented on a change in pull request #9680: MINOR: a small refactor for LogManage#shutdown

kowshik commented on a change in pull request #9680:
URL: https://github.com/apache/kafka/pull/9680#discussion_r540476157



##########
File path: core/src/main/scala/kafka/log/LogManager.scala
##########
@@ -479,14 +479,14 @@ class LogManager(logDirs: Seq[File],
 
     try {
       for ((dir, dirJobs) <- jobs) {
-        val hasErrors = dirJobs.map { future =>
+        val hasErrors = dirJobs.exists  { future =>
           Try(future.get) match {
             case Success(_) => false
             case Failure(e) =>
               warn(s"There was an error in one of the threads during LogManager shutdown: ${e.getCause}")
               true
           }
-        }.contains(true)
+        }

Review comment:
       @chia7712 @ijuma I think you were referring to https://github.com/apache/kafka/pull/9596 ?
   
   The test added in #9596 to verify this was `LogManagerTest.testHandlingExceptionsDuringShutdown` ([link](https://github.com/apache/kafka/blob/dcbd28da53bfc6edb78fd1735993163fc6f4c7c7/core/src/test/scala/unit/kafka/log/LogManagerTest.scala#L94)). I agree that the test wasn't effective enough to catch the problem. The issue was that fundamentally this area of the code was a little bit difficult to test due to the asynchronous behavior.
   
   So for example, although @chia7712 has a fix in #9728, I still think it is not easy to write a test that reliably fails if the bug is introduced again.
   
   Below is my explanation on why it is difficult to improve the existing `testHandlingExceptionsDuringShutdown` to catch such issues. Let us imagine that the test setup introduced multiple logs for each log dir, and setup an expectation (in the test code) that if one of the logs failed during `close()`, then, the other logs under the same log directory should still be closed by the time `LogManager.shutdown()` returns. Such a kind of test expectatio would have caught the problem introduced in this PR. Now, writing that kind of a test was not easy, because, even if one of the logs failed to close, the other futures can still be completed by the time `LogManager.shutdown()` returns (there is no strong need to call `future.get` for the future to pass). Thus, the test can pass despite the bug.
   
   One potential solution (a bigger change) is to see if we can inject a different thread pool executor that only during test executes the threads lazily i.e. it executes the submitted threads only when `future.get` is called, otherwise it will never execute them. If we succeed in doing this, then, the test can be improved such that unless `future.get` is called on all logs, some of the log `close()` will never happen and will fail the test in the presence of the bug introduced by this PR.
   
   cc @junrao 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org