You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by GitBox <gi...@apache.org> on 2022/09/14 13:23:23 UTC

[GitHub] [kafka] showuon opened a new pull request, #12639: KAFKA-14233: do not init managers twice to avoid resource leak

showuon opened a new pull request, #12639:
URL: https://github.com/apache/kafka/pull/12639

   Recently, we got a lot of build failed (and terminated) with core:unitTest failure. The failed messages look like this:
   ```
   FAILURE: Build failed with an exception.
   [2022-09-14T09:51:52.190Z] 
   [2022-09-14T09:51:52.190Z] * What went wrong:
   [2022-09-14T09:51:52.190Z] Execution failed for task ':core:unitTest'.
   [2022-09-14T09:51:52.190Z] > Process 'Gradle Test Executor 128' finished with non-zero exit value 1
   ```
   
   After investigation, I found one reason of it (maybe there are other reasons). In  `BrokerMetadataPublisherTest#testReloadUpdatedFilesWithoutConfigChange` test, we created logManager twice, but when cleanup, we only close one of them. So, there will be a log cleaner keeping running. But during this time, the temp log dirs are deleted, so it will `Exit.halt(1)`, and got the error we saw in gradle, like this code did when we encounter IOException in all our log dirs:
   
   ```
   fatal(s"Shutdown broker because all log dirs in ${logDirs.mkString(", ")} have failed")
   Exit.halt(1)
   ```
   
   Fixed it by disable `_firstPublish` flag for mock publisher, to avoid resource leak.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] divijvaidya commented on pull request #12639: KAFKA-14233: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1247027848

   Wow! Great deep dive to find the root cause here @showuon 👏
   
   I am curious, how did you narrow down that `Gradle Test Executor 128` is related to `testReloadUpdatedFilesWithoutConfigChange`. Did you check the the logs and search for exception stack traces and managed to get the `IOException`? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on pull request #12639: KAFKA-14242: use mock managers to avoid duplicated resource allocation

Posted by GitBox <gi...@apache.org>.
showuon commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1296684059

   @hachikuji @mumrah , please take a look. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on pull request #12639: KAFKA-14233: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
showuon commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1250444038

   > @showuon Great find. I am not sure about the fix though, it seems a bit odd to update the mock in this way. To get things back into a stable state, we should perhaps disable this test while we figure out the right way to fix it.
   
   Sure, I've opened https://github.com/apache/kafka/pull/12658 to disable the test first. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on pull request #12639: KAFKA-14233: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
showuon commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1250415069

   @dajac @hachikuji , FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on pull request #12639: KAFKA-14233: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
showuon commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1248055476

   cc @ijuma , I think we should merge this fix soon to help our build turn red back to yellow/green light.
   
   <img width="1384" alt="image" src="https://user-images.githubusercontent.com/43372967/190407215-e2c95d76-0e81-436d-b209-91f82f800e53.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon merged pull request #12639: KAFKA-14242: use mock managers to avoid duplicated resource allocation

Posted by GitBox <gi...@apache.org>.
showuon merged PR #12639:
URL: https://github.com/apache/kafka/pull/12639


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on pull request #12639: KAFKA-14233: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
showuon commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1247685312

   @divijvaidya , thanks.
   
   > I am curious, how did you narrow down that Gradle Test Executor 128 is related to testReloadUpdatedFilesWithoutConfigChange. Did you check the the logs and search for exception stack traces and managed to get the IOException?
   
   Since the test got terminated, not closed gracefully, so there's no exception stack traces unfortunately, which is why it is not easy to identify the problem. For me, I don't have any trick here. I analyze some of the build logs like [here](https://ci-builds.apache.org/blue/rest/organizations/jenkins/pipelines/Kafka/pipelines/kafka/branches/trunk/runs/1228/nodes/14/steps/134/log/?start=0), and found the "core:unitTest" mostly got terminated at `ClusterTestExtensionsTest` suite. Starting from it, I check the test code, found nothing special, then, find the tests earlier, since I believe there could be some earlier test affect it or somewhat. So, in the end, I found the issue in this PR in the `BrokerMetadataPublisherTest` test suite. Hope this is the only issue we need to fix!
   
   ```
   2022-09-15T01:16:27.075Z] BrokerMetadataPublisherTest > testReloadUpdatedFilesWithoutConfigChange() STARTED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataPublisherTest > testReloadUpdatedFilesWithoutConfigChange() PASSED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataSnapshotterTest > testCreateAndClose() STARTED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataSnapshotterTest > testCreateAndClose() PASSED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataSnapshotterTest > testCreateSnapshot() STARTED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataSnapshotterTest > testCreateSnapshot() PASSED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataSnapshotterTest > testCreateSnapshotMultipleReasons() STARTED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataSnapshotterTest > testCreateSnapshotMultipleReasons() PASSED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataSnapshotterTest > testRecordListConsumer() STARTED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] BrokerMetadataSnapshotterTest > testRecordListConsumer() PASSED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] MockConfigRepositoryTest > testEmptyRepository() STARTED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] MockConfigRepositoryTest > testEmptyRepository() PASSED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] MockConfigRepositoryTest > testSetTopicConfig() STARTED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] MockConfigRepositoryTest > testSetTopicConfig() PASSED
   [2022-09-15T01:16:27.075Z] 
   [2022-09-15T01:16:27.075Z] ClusterTestExtensionsTest > testClusterTest(ClusterConfig, ClusterInstance) > kafka.test.ClusterTestExtensionsTest.testClusterTest(ClusterConfig, ClusterInstance)[1] STARTED
   [2022-09-15T01:16:28.050Z] 
   [2022-09-15T01:16:28.050Z] ClusterTestExtensionsTest > testClusterTest(ClusterConfig, ClusterInstance) > kafka.test.ClusterTestExtensionsTest.testClusterTest(ClusterConfig, ClusterInstance)[1] PASSED
   [2022-09-15T01:16:28.050Z] 
   [2022-09-15T01:16:28.050Z] ClusterTestExtensionsTest > testNoAutoStart() > kafka.test.ClusterTestExtensionsTest.testNoAutoStart()[1] STARTED
   [2022-09-15T01:16:32.108Z] 
   [2022-09-15T01:16:32.108Z] ClusterTestExtensionsTest > testNoAutoStart() > kafka.test.ClusterTestExtensionsTest.testNoAutoStart()[1] PASSED
   [2022-09-15T01:16:32.108Z] 
   [2022-09-15T01:16:32.108Z] ClusterTestExtensionsTest > testClusterTests() > kafka.test.ClusterTestExtensionsTest.testClusterTests()[1] STARTED
   [2022-09-15T01:16:34.943Z] 
   [2022-09-15T01:16:34.943Z] ClusterTestExtensionsTest > testClusterTests() > kafka.test.ClusterTestExtensionsTest.testClusterTests()[1] PASSED
   [2022-09-15T01:16:34.943Z] 
   [2022-09-15T01:16:34.943Z] ClusterTestExtensionsTest > testClusterTests() > kafka.test.ClusterTestExtensionsTest.testClusterTests()[2] STARTED
   [2022-09-15T01:16:38.147Z] 
   [2022-09-15T01:16:38.147Z] ClusterTestExtensionsTest > testClusterTests() > kafka.test.ClusterTestExtensionsTest.testClusterTests()[2] PASSED
   [2022-09-15T01:16:38.147Z] 
   [2022-09-15T01:16:38.147Z] ClusterTestExtensionsTest > testClusterTests() > kafka.test.ClusterTestExtensionsTest.testClusterTests()[3] STARTED
   [2022-09-15T01:16:40.319Z] 
   [2022-09-15T01:16:40.319Z] ClusterTestExtensionsTest > testClusterTests() > kafka.test.ClusterTestExtensionsTest.testClusterTests()[3] SKIPPED
   [2022-09-15T01:16:41.474Z] 
   [2022-09-15T01:16:41.474Z] > Task :core:unitTest FAILED
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on a diff in pull request #12639: KAFKA-14242: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
showuon commented on code in PR #12639:
URL: https://github.com/apache/kafka/pull/12639#discussion_r983504767


##########
core/src/test/scala/unit/kafka/server/metadata/BrokerMetadataPublisherTest.scala:
##########
@@ -178,15 +181,21 @@ class BrokerMetadataPublisherTest {
 
   private def newMockPublisher(
     broker: BrokerServer,
+    logManager: LogManager,
+    replicaManager: ReplicaManager,
+    groupCoordinator: GroupCoordinator,
+    txnCoordinator: TransactionCoordinator,
     errorHandler: FaultHandler = new MockFaultHandler("publisher")
   ): BrokerMetadataPublisher = {
+    val mockLogManager = Mockito.mock(classOf[LogManager])
+    Mockito.when(mockLogManager.allLogs).thenReturn(Iterable.empty[UnifiedLog])
     Mockito.spy(new BrokerMetadataPublisher(
       conf = broker.config,
       metadataCache = broker.metadataCache,
-      logManager = broker.logManager,
-      replicaManager = broker.replicaManager,
-      groupCoordinator = broker.groupCoordinator,
-      txnCoordinator = broker.transactionCoordinator,

Review Comment:
   Feed mock publisher with mock log manager and other managers to avoid duplicate resource allocation in initManager



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on pull request #12639: KAFKA-14242: use mock managers to avoid duplicated resource allocation

Posted by GitBox <gi...@apache.org>.
showuon commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1262235095

   @hachikuji , please take a look. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on pull request #12639: KAFKA-14233: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
showuon commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1247933895

   There's been 3 straight builds not get terminated in the `:core:unitTest`.
   
   <img width="1162" alt="image" src="https://user-images.githubusercontent.com/43372967/190386149-93694736-db70-4196-b626-b07ff275adc2.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] showuon commented on pull request #12639: KAFKA-14233: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
showuon commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1246762358

   @hachikuji @jsancio , please take a look. Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] ijuma commented on pull request #12639: KAFKA-14233: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
ijuma commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1250429683

   @showuon Great find. I am not sure about the fix though, it seems a bit odd to update the mock in this way. To get things back into a stable state, we should perhaps disable this test while we figure out the right way to fix it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [kafka] C0urante commented on pull request #12639: KAFKA-14242: do not init managers twice to avoid resource leak

Posted by GitBox <gi...@apache.org>.
C0urante commented on PR #12639:
URL: https://github.com/apache/kafka/pull/12639#issuecomment-1252547139

   @showuon @ijuma I've taken a stab at a safeguard to prevent tests from accidentally terminating the JVM, and to accurately report where those calls for termination come from. If you're interested, the PR can be found [here](https://github.com/apache/kafka/pull/12666).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscribe@kafka.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org