
Posted to jira@kafka.apache.org by "Luke Chen (Jira)" <ji...@apache.org> on 2022/09/19 04:48:00 UTC

[jira] [Commented] (KAFKA-14242) Hanging logManager in testReloadUpdatedFilesWithoutConfigChange test

    [ https://issues.apache.org/jira/browse/KAFKA-14242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606419#comment-17606419 ] 

Luke Chen commented on KAFKA-14242:
-----------------------------------

Proposed solution PR: https://github.com/apache/kafka/pull/12639

> Hanging logManager in testReloadUpdatedFilesWithoutConfigChange test
> --------------------------------------------------------------------
>
>                 Key: KAFKA-14242
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14242
>             Project: Kafka
>          Issue Type: Test
>            Reporter: Luke Chen
>            Assignee: Luke Chen
>            Priority: Major
>
> Recently, many builds have failed (and been terminated) with a core:unitTest failure. The failure messages look like this:
> {code:java}
> FAILURE: Build failed with an exception.
> [2022-09-14T09:51:52.190Z] 
> [2022-09-14T09:51:52.190Z] * What went wrong:
> [2022-09-14T09:51:52.190Z] Execution failed for task ':core:unitTest'.
> [2022-09-14T09:51:52.190Z] > Process 'Gradle Test Executor 128' finished with non-zero exit value 1 {code}
> After investigating, I found one cause (there may be others). In the {{BrokerMetadataPublisherTest#testReloadUpdatedFilesWithoutConfigChange}} test, we create a logManager twice, but during cleanup we close only one of them, so one log cleaner keeps running. Meanwhile, the temp log dirs are deleted, so the still-running cleaner calls {{Exit.halt(1)}} and produces the error we saw in Gradle, just as this code does when we encounter an IOException in all our log dirs:
> {code:java}
> fatal(s"Shutdown broker because all log dirs in ${logDirs.mkString(", ")} have failed")
> Exit.halt(1) {code}
> Why does it sometimes pass and sometimes fail? When the test cluster closes, we shut down the broker first and then the other components, and the log cleaner is triggered at an interval. So if the cluster closes fast enough and the test finishes before the leaked log cleaner is next triggered, the test passes. Otherwise, the process exits with 1.
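
A minimal Java sketch of the cleanup pattern the investigation points to. It assumes a hypothetical FakeLogManager whose background cleaner is a ScheduledExecutorService, and an AtomicBoolean named HALT_REQUESTED standing in for Exit.halt(1); neither is Kafka's LogManager API, and this is not the change in the PR. It only illustrates why every manager a test creates should be tracked and closed before the temp log dirs are deleted.

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LogManagerCleanupSketch {

    // Hypothetical stand-in for Exit.halt(1): records that a halt would
    // have happened instead of killing the JVM.
    static final AtomicBoolean HALT_REQUESTED = new AtomicBoolean(false);

    // Hypothetical stand-in for a LogManager: its background "cleaner"
    // checks the log dir on a fixed interval and "halts" if it is gone.
    static final class FakeLogManager implements AutoCloseable {
        private final ScheduledExecutorService cleaner =
                Executors.newSingleThreadScheduledExecutor();

        FakeLogManager(Path logDir, long intervalMs) {
            cleaner.scheduleAtFixedRate(() -> {
                if (!Files.isDirectory(logDir)) {
                    HALT_REQUESTED.set(true); // real code would call Exit.halt(1)
                }
            }, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        }

        boolean isRunning() {
            return !cleaner.isShutdown();
        }

        @Override
        public void close() throws InterruptedException {
            cleaner.shutdownNow();
            cleaner.awaitTermination(1, TimeUnit.SECONDS);
        }
    }

    // Every manager the test creates is registered here, so teardown
    // closes all of them rather than only the one held in a field.
    private final List<FakeLogManager> created = new ArrayList<>();

    FakeLogManager newLogManager(Path logDir) {
        FakeLogManager manager = new FakeLogManager(logDir, 50);
        created.add(manager);
        return manager;
    }

    void tearDown() throws InterruptedException {
        for (FakeLogManager manager : created) {
            manager.close();
        }
        created.clear();
    }

    public static void main(String[] args) throws Exception {
        LogManagerCleanupSketch test = new LogManagerCleanupSketch();
        Path logDir = Files.createTempDirectory("log-dir");
        FakeLogManager first = test.newLogManager(logDir);
        FakeLogManager second = test.newLogManager(logDir); // the "reload"
        test.tearDown();      // closes both managers
        Files.delete(logDir); // temp dir removed after cleanup
        Thread.sleep(200);    // give any leaked cleaner time to tick
        System.out.println("first running: " + first.isRunning());
        System.out.println("second running: " + second.isRunning());
        System.out.println("halt requested: " + HALT_REQUESTED.get());
    }
}
{code}

Running main prints false on all three lines. If second were left out of created (the situation described in the issue), its cleaner would keep ticking after the directory is deleted and would set the flag.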



--
This message was sent by Atlassian Jira
(v8.20.10#820010)