Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/04/09 00:30:47 UTC

[GitHub] [helix] narendly opened a new issue #939: Fix flaky tests

URL: https://github.com/apache/helix/issues/939
 
 
   [ERROR]   TestRebalancePipeline.testMsgTriggeredRebalance:195 expected:<1> but was:<0>
   [ERROR]   TestEnableCompression.testEnableCompressionResource:117 expected:<true> but was:<false>

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] pkuwm commented on issue #939: Fix flaky tests

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/helix/issues/939#issuecomment-613087471
 
 
   ```
   [INFO] Results:
   [INFO]
   [ERROR] Failures:
   [ERROR] org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack.test(org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack)
   [INFO]   Run 1: PASS
   [ERROR]   Run 2: TestCrushAutoRebalanceNonRack.test:165->validateIsolation:313 expected:<3> but was:<4>
   [INFO]
   [INFO]
   [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 3
   ```



[GitHub] [helix] pkuwm commented on issue #939: Fix flaky tests

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/helix/issues/939#issuecomment-613164085
 
 
   ```
   -------------------------------------------------------------------------------
   Test set: TestSuite
   -------------------------------------------------------------------------------
   Tests run: 1119, Failures: 3, Errors: 0, Skipped: 1, Time elapsed: 4,883.15 s <<< FAILURE! - in TestSuite
   testFixedTargetTaskAndDisabledRebalanceAndNodeAdded(org.apache.helix.integration.task.TestRebalanceRunningTask)  Time elapsed: 11.404 s  <<< FAILURE!
   java.lang.AssertionError: expected:<true> but was:<false>
           at org.apache.helix.integration.task.TestRebalanceRunningTask.testFixedTargetTaskAndDisabledRebalanceAndNodeAdded(TestRebalanceRunningTask.java:265)
   
   testDeletingRecurrentQueueWithHistory(org.apache.helix.integration.task.TestRecurringJobQueue)  Time elapsed: 121.864 s  <<< FAILURE!
   java.lang.AssertionError: expected:<true> but was:<false>
           at org.apache.helix.integration.task.TestRecurringJobQueue.testDeletingRecurrentQueueWithHistory(TestRecurringJobQueue.java:296)
   
   testJobQueueAutoCleanUp(org.apache.helix.integration.task.TestJobQueueCleanUp)  Time elapsed: 300.002 s  <<< FAILURE!
   org.testng.internal.thread.ThreadTimeoutException: Method org.testng.internal.TestNGMethod.testJobQueueAutoCleanUp() didn't finish within the time-out 300000
   ```



[GitHub] [helix] jiajunwang commented on issue #939: Fix flaky tests

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/helix/issues/939#issuecomment-612335352
 
 
   Adding 2 unstable tests that failed in other test runs.
   
   TestControllerLeadershipChange.testMissingTopStateDurationMonitoring:262 expected:<true> but was:<false>
   org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack.testLackEnoughInstances(org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack)
     Run 1: PASS
     Run 2: TestCrushAutoRebalanceNonRack.testLackEnoughInstances:250 » ZkClient Failed to...



[GitHub] [helix] jiajunwang commented on issue #939: Fix flaky tests

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/helix/issues/939#issuecomment-612268860
 
 
   Let me take a look at the testMsgTriggeredRebalance failure.



[GitHub] [helix] pkuwm commented on issue #939: Fix flaky tests

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/helix/issues/939#issuecomment-612373304
 
 
   ```
   Results :
   
   Failed tests:
     TestBasicSpectator.TestSpectator:58 expected:<true> but was:<false>
   org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceTopoplogyAwareDisabled.testLackEnoughInstances(org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceTopoplogyAwareDisabled)
     Run 1: PASS
     Run 2: TestCrushAutoRebalanceTopoplogyAwareDisabled.testLackEnoughInstances:86->TestCrushAutoRebalanceNonRack.testLackEnoughInstances:250 » ZkClient
   
     TestWorkflowTermination.testWorkflowJobFail:251->verifyWorkflowCleanup:257 expected:<true> but was:<false>
     TestWorkflowTermination.testWorkflowRunningTimeout:131->verifyWorkflowCleanup:257 expected:<true> but was:<false>
   
   Tests run: 1111, Failures: 4, Errors: 0, Skipped: 1
   ```



[GitHub] [helix] pkuwm commented on issue #939: Fix flaky tests

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/helix/issues/939#issuecomment-613789198
 
 
   TestZkMetadataStoreDirectory failure.
   
   ```
   [ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 18.857 s <<< FAILURE! - in org.apache.helix.rest.metadatastore.TestZkMetadataStoreDirectory
   [ERROR] testSetNamespaceRoutingData(org.apache.helix.rest.metadatastore.TestZkMetadataStoreDirectory)  Time elapsed: 0.027 s  <<< FAILURE!
   java.util.NoSuchElementException: Namespace namespace_0 does not exist!
   	at org.apache.helix.rest.metadatastore.TestZkMetadataStoreDirectory.testSetNamespaceRoutingData(TestZkMetadataStoreDirectory.java:175)
   
   [INFO]
   [INFO] Results:
   [INFO]
   [ERROR] Failures:
   [ERROR]   TestZkMetadataStoreDirectory.testSetNamespaceRoutingData:175 » NoSuchElement N...
   [INFO]
   [ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0
   [INFO]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   [INFO] ------------------------------------------------------------------------
   
   
   
   
   [ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 18.897 s <<< FAILURE! - in org.apache.helix.rest.metadatastore.TestZkMetadataStoreDirectory
   [ERROR] testDataDeletionCallback(org.apache.helix.rest.metadatastore.TestZkMetadataStoreDirectory)  Time elapsed: 0.029 s  <<< FAILURE!
   java.lang.IllegalArgumentException: Provided path is not a valid Zookeeper path: anyKey
   	at org.apache.helix.rest.metadatastore.TestZkMetadataStoreDirectory.lambda$testDataDeletionCallback$10(TestZkMetadataStoreDirectory.java:340)
   	at org.apache.helix.rest.metadatastore.TestZkMetadataStoreDirectory.testDataDeletionCallback(TestZkMetadataStoreDirectory.java:337)
   
   [INFO]
   [INFO] Results:
   [INFO]
   [ERROR] Failures:
   [ERROR]   TestZkMetadataStoreDirectory.testDataDeletionCallback:337->lambda$testDataDeletionCallback$10:340 » IllegalArgument
   [INFO]
   [ERROR] Tests run: 11, Failures: 1, Errors: 0, Skipped: 0
   ```



[GitHub] [helix] pkuwm commented on issue #939: Fix flaky tests

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/helix/issues/939#issuecomment-613140335
 
 
   > Results :
   > 
   > Failed tests:
   >   TestBasicSpectator.TestSpectator:58 expected:<true> but was:<false>
   > org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceTopoplogyAwareDisabled.testLackEnoughInstances(org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceTopoplogyAwareDisabled)
   >   Run 1: PASS
   >   Run 2: TestCrushAutoRebalanceTopoplogyAwareDisabled.testLackEnoughInstances:86->TestCrushAutoRebalanceNonRack.testLackEnoughInstances:250 » ZkClient
   > 
   > Tests run: 1111, Failures: 4, Errors: 0, Skipped: 1
   
   1. In ZkClient.deleteRecursively(), getChildren(path) for 
   ```
   /CLUSTER_TestCrushAutoRebalanceNonRack/INSTANCES/localhost_12921/MESSAGES is
   [c024282d-b461-47ae-994a-3e951b33bda8, 81c9d998-a4f7-463d-b7d2-444851ac3b94, 8b0aae78-4f4a-488a-8b57-60c1f5cba2a0, e51f3138-0d7b-44ff-96d9-9b199dedd945, 6803ce21-818f-404f-84d8-c39d60e28c2d, c8a22399-de1b-4d7e-8266-6712e16dd0bc, 7b6428ad-1da2-40b6-bc00-44091723df68, ed661695-0c32-42cf-bf76-08da4914c40a, 257f61e5-7238-4afa-867b-003275d63708, 3c66083b-48f6-48f5-b59d-33b94f246cc8, 35c0f0af-2bca-40bd-b23f-f0c417b43504, 3cb84f5f-d121-4492-a5a5-230c39c8b621, c5f9e7db-fc9f-4f33-9770-7fc86acfe0b2, 90cfa4a1-8632-49c7-abc0-2e4ac14d5870, baf6698f-e9a5-433e-a9ef-02af0218992e, 253f5bd6-d703-4a37-8aa3-1b2f688c5d4e, b97e8cca-a498-47b7-a331-21423a687c48, 0f43e755-fa3c-444f-b588-4a93ac10b05f, cd9b3160-9b40-422c-828d-c2890f6d6e5e]
   ```
   Later, after all the previously fetched children have been deleted and the path itself is being deleted, the path has children again:
   ```
   2020-04-13 01:01:32,393 [main] ERROR org.apache.helix.zookeeper.zkclient.ZkClient - children: [99db6e1a-26dd-4abb-bc67-420f2ce17333, c563c2d0-d20a-46ed-823a-69bd5d1b16ad, ac8b5651-03a1-4dbf-a659-16a45381e9a3, 2d93d9a7-72e5-4381-a432-f88cc28b73b3, 04b43472-d372-4841-a006-f0faa180f829, fae92659-4e3a-46e6-8ec0-349811ead778, 444f8824-5cc5-4e51-ab42-5ce0b917364a, ed77a721-f021-432a-ab4b-a5f2a7d34a5c, b8cd4995-8119-4675-b1e6-09b3e5ff7c0f, 27550b68-0af8-4cad-8fbc-704e90e39c3d, 8e576677-8b3d-459e-9e94-1dabc33f9a46, 5c033c68-ec61-41b0-8e5e-bb3a8dd750a4, 277c3c23-46f1-4683-ac58-2bf1ef045cea, dc263f96-ec5a-47a2-92e3-ebc1776e99c2, c37655ab-3ad2-4797-ac99-a4fbd1775e65, 3e236227-87c4-47db-9d2c-0f4340afd87c, 3e06b4bc-3485-4927-81a7-84f95ae93597, 8205b6a3-f7e5-4abf-8e30-404a4d24fe9a, 4ee60a6c-fd70-4240-9c27-5d801a263b14, e4a22424-3216-4e98-a9f0-fe25338c9260, c0d2ccf7-bed6-489d-856e-f4e85262d597, 2c72f45e-d4cb-4c24-adf1-fd8183912207, e01fb704-ffd4-41a8-95e1-9eed0590a770, a5e51478-ae7a-48c6-9e42-8ce1269cd288]
   ```
   From the log, messages are sent to the path during deleteRecursively().
   ```
   2020-04-13 01:01:31,732 [HelixController-pipeline-default-CLUSTER_TestCrushAutoRebalanceNonRack-(ff4d92c0_DEFAULT)] INFO  org.apache.helix.controller.stages.MessageDispatchStage - Event ff4d92c0_DEFAULT : Sending Message 99db6e1a-26dd-4abb-bc67-420f2ce17333 to localhost_12921 transit Test-DB-CrushRebalanceStrategy-0.Test-DB-CrushRebalanceStrategy-0_18|[] from:OFFLINE to:DROPPED, relayMessages: 0
   ```
   
   It seems messages are still being sent and have not been fully processed while the instance is being dropped from the cluster.
   
   2. One more thing to notice: ZkHelixAdmin.dropInstance() has a retry, but the log shows the retry is never executed. The root cause is that ZkClient.deleteRecursively() now throws ZkClientException instead of HelixException, while ZkHelixAdmin.dropInstance() still catches HelixException to trigger the retry. So the ZkClientException is not caught and the path deletion is never retried.
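
To make the two observations above concrete, here is a minimal, self-contained sketch of the pattern. It is not the Helix code: `FakeMessagesNode`, `SketchHelixException`, `SketchZkClientException`, and both `drop...` methods are hypothetical stand-ins, and the "late write" only simulates a message arriving between the child deletes and the parent delete. It assumes, as the analysis above implies, that the two exception types are unrelated.

```java
import java.util.HashSet;
import java.util.Set;

public class DropInstanceRetrySketch {
    // Hypothetical stand-ins for HelixException and ZkClientException.
    // Assumption: neither type extends the other.
    static class SketchHelixException extends RuntimeException {
        SketchHelixException(String msg) { super(msg); }
    }

    static class SketchZkClientException extends RuntimeException {
        SketchZkClientException(String msg) { super(msg); }
    }

    /** In-memory stand-in for the MESSAGES node; not a real ZooKeeper client. */
    static class FakeMessagesNode {
        final Set<String> children = new HashSet<>();
        int lateWrites;        // how many times a writer adds a child mid-delete
        boolean exists = true;

        FakeMessagesNode(int lateWrites) {
            this.lateWrites = lateWrites;
            children.add("msg-initial");
        }

        void deleteRecursively() {
            // Snapshot the children, then delete each one.
            for (String child : new HashSet<>(children)) {
                children.remove(child);
            }
            // Simulated race: a new message lands before the parent is deleted.
            if (lateWrites > 0) {
                lateWrites--;
                children.add("msg-late-" + lateWrites);
            }
            if (!children.isEmpty()) {
                throw new SketchZkClientException("node not empty: " + children);
            }
            exists = false;
        }
    }

    /** Retry loop that still catches the old type, so the new type escapes. */
    static boolean dropCatchingStaleType(FakeMessagesNode node, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                node.deleteRecursively();
                return true;
            } catch (SketchHelixException e) {
                // Never reached in this sketch: deleteRecursively() throws the other type.
            }
        }
        return false;
    }

    /** Retry loop that catches the type actually thrown, so the retry runs. */
    static boolean dropCatchingThrownType(FakeMessagesNode node, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                node.deleteRecursively();
                return true;
            } catch (SketchZkClientException e) {
                // Children reappeared; loop around and try again.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        try {
            dropCatchingStaleType(new FakeMessagesNode(1), 3);
            System.out.println("stale catch: deleted");
        } catch (SketchZkClientException e) {
            System.out.println("stale catch: failed without retry");
        }
        boolean deleted = dropCatchingThrownType(new FakeMessagesNode(1), 3);
        System.out.println("matching catch: deleted = " + deleted);
    }
}
```

In this sketch, matching the caught type to the thrown type is enough for the second attempt to succeed, but only because the simulated writer stops after one late write. If messages keep arriving during the drop, a retry alone may not be sufficient.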
