Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/09/17 20:32:01 UTC
[GitHub] [helix] kaisun2000 opened a new issue #1371: fix TestCrushAutoRebalanceNonRack.testLackEnoughInstances
kaisun2000 opened a new issue #1371:
URL: https://github.com/apache/helix/issues/1371
LOG 981 touch 9
>2020-09-17T11:16:32.0302688Z [ERROR] testLackEnoughInstances(org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack) Time elapsed: 0.2 s <<< FAILURE!
2020-09-17T11:16:32.0306145Z org.apache.helix.HelixException: Node localhost_12920 is still alive for cluster CLUSTER_TestCrushAutoRebalanceNonRack, can't drop.
2020-09-17T11:16:32.0311064Z at org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack.testLackEnoughInstances(TestCrushAutoRebalanceNonRack.java:282)
2020-09-17T11:16:32.0314239Z
2020-09-17T11:16:32.4240947Z [ERROR] Failures:
2020-09-17T11:16:32.4249888Z [ERROR] org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack.testLackEnoughInstances(org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack)
2020-09-17T11:16:32.4263407Z [ERROR] Run 1: TestCrushAutoRebalanceNonRack.testLackEnoughInstances:282 » Helix Node localho...
2020-09-17T11:16:32.4264750Z [ERROR] Tests run: 1195, Failures: 1, Errors: 0, Skipped: 1
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org
[GitHub] [helix] jiajunwang commented on issue #1371: fix TestCrushAutoRebalanceNonRack.testLackEnoughInstances
Posted by GitBox <gi...@apache.org>.
jiajunwang commented on issue #1371:
URL: https://github.com/apache/helix/issues/1371#issuecomment-849104215
Closing unstable-test tickets, since we now have an automatic tracking mechanism (https://github.com/apache/helix/pull/1757) for the most recent test issues.
[GitHub] [helix] kaisun2000 edited a comment on issue #1371: fix TestCrushAutoRebalanceNonRack.testLackEnoughInstances
Posted by GitBox <gi...@apache.org>.
kaisun2000 edited a comment on issue #1371:
URL: https://github.com/apache/helix/issues/1371#issuecomment-694535151
The root cause seems to be the participant restart in `testLackEnoughLiveInstances`:
```
for (int i = 2; i < _participants.size(); i++) {
  _participants.get(i).syncStart();
}
```
In fact, a participant that was previously stopped with syncStop can't be restarted. Until the ZkClient life cycle is fixed in the CB issue, we have to create a new participant instead.
See the right way to do it in TestWagedRebalance:
```
// restart the participants within the zone
for (int i = 2; i < _participants.size(); i++) {
  MockParticipantManager p = _participants.get(i);
  MockParticipantManager newNode =
      new MockParticipantManager(ZK_ADDR, CLUSTER_NAME, p.getInstanceName());
  _participants.set(i, newNode);
  newNode.syncStart();
}
```
[GitHub] [helix] kaisun2000 commented on issue #1371: fix TestCrushAutoRebalanceNonRack.testLackEnoughInstances
Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on issue #1371:
URL: https://github.com/apache/helix/issues/1371#issuecomment-698047813
A participant can't be restarted once it is stopped.
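To illustrate the point without depending on Helix, here is a minimal, self-contained sketch. All names in it (`RestartSketch`, `OneShotClient`, `Node`, `restartFrom`) are hypothetical stand-ins, not Helix classes. It assumes the participant owns a client that cannot be reused after it is closed, which is consistent with the `ZkClient already closed!` errors in the log posted in this thread, and it shows why swapping in a fresh instance under the same name avoids the problem.

```java
import java.util.ArrayList;
import java.util.List;

public class RestartSketch {
    /** Hypothetical stand-in for a client that cannot be reopened after close(). */
    static final class OneShotClient {
        private boolean closed = false;

        void subscribe() {
            if (closed) {
                throw new IllegalStateException("client already closed");
            }
        }

        void close() {
            closed = true;
        }
    }

    /** Hypothetical stand-in for a participant that keeps one client for its whole life. */
    static final class Node {
        final String name;
        private final OneShotClient client = new OneShotClient();

        Node(String name) {
            this.name = name;
        }

        void start() {
            client.subscribe();
        }

        void stop() {
            client.close();
        }
    }

    /** Replaces every node from index 'from' on with a fresh node of the same name, then starts it. */
    static void restartFrom(List<Node> nodes, int from) {
        for (int i = from; i < nodes.size(); i++) {
            Node fresh = new Node(nodes.get(i).name);
            nodes.set(i, fresh);
            fresh.start();
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Node n = new Node("node_" + i);
            n.start();
            nodes.add(n);
        }

        // Stop the nodes from index 2 on.
        for (int i = 2; i < nodes.size(); i++) {
            nodes.get(i).stop();
        }

        // Starting a stopped node again fails because its client is closed.
        try {
            nodes.get(2).start();
        } catch (IllegalStateException e) {
            System.out.println("restart of stopped node failed: " + e.getMessage());
        }

        // Replacing the stopped nodes with fresh instances works.
        restartFrom(nodes, 2);
        System.out.println("replaced and started: " + nodes.get(2).name + ", " + nodes.get(3).name);
    }
}
```

The list slot is overwritten with the new instance so that later cleanup code stops the live object rather than the dead one.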
[GitHub] [helix] kaisun2000 commented on issue #1371: fix TestCrushAutoRebalanceNonRack.testLackEnoughInstances
Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on issue #1371:
URL: https://github.com/apache/helix/issues/1371#issuecomment-694498009
LOG 981 touch 9
> 2020-09-17T11:16:32.0302688Z [ERROR] testLackEnoughInstances(org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack) Time elapsed: 0.2 s <<< FAILURE!
2020-09-17T11:16:32.0306145Z org.apache.helix.HelixException: Node localhost_12920 is still alive for cluster CLUSTER_TestCrushAutoRebalanceNonRack, can't drop.
2020-09-17T11:16:32.0311064Z at org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack.testLackEnoughInstances(TestCrushAutoRebalanceNonRack.java:282)
2020-09-17T11:16:32.0314239Z
2020-09-17T11:16:32.4240947Z [ERROR] Failures:
2020-09-17T11:16:32.4249888Z [ERROR] org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack.testLackEnoughInstances(org.apache.helix.integration.rebalancer.CrushRebalancers.TestCrushAutoRebalanceNonRack)
[GitHub] [helix] kaisun2000 commented on issue #1371: fix TestCrushAutoRebalanceNonRack.testLackEnoughInstances
Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on issue #1371:
URL: https://github.com/apache/helix/issues/1371#issuecomment-694535151
The root cause seems to be the participant restart in `testLackEnoughLiveInstances`:
```
for (int i = 2; i < _participants.size(); i++) {
  _participants.get(i).syncStart();
}
```
[GitHub] [helix] kaisun2000 commented on issue #1371: fix TestCrushAutoRebalanceNonRack.testLackEnoughInstances
Posted by GitBox <gi...@apache.org>.
kaisun2000 commented on issue #1371:
URL: https://github.com/apache/helix/issues/1371#issuecomment-694505483
2020-09-17T10:09:09.0947422Z START TestCrushAutoRebalanceNonRack testLackEnoughInstances at Thu Sep 17 10:09:09 UTC 2020
2020-09-17T10:09:09.1057616Z TestLackEnoughInstances CrushRebalanceStrategy
2020-09-17T10:09:09.3539409Z 406919 [ClusterManager_Watcher_CLUSTER_TestCrushAutoRebalanceNonRack_localhost_12922_PARTICIPANT_10159] ERROR org.apache.helix.manager.zk.ZKExceptionHandler - Exception while invoking init callback for listener:org.apache.helix.messaging.handling.HelixTaskExecutor@428b9021
2020-09-17T10:09:09.3542946Z java.lang.IllegalStateException: ZkClient already closed!
2020-09-17T10:09:09.3544367Z at org.apache.helix.zookeeper.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1664)
2020-09-17T10:09:09.3546267Z at org.apache.helix.zookeeper.zkclient.ZkClient.watchForChilds(ZkClient.java:2098)
2020-09-17T10:09:09.3548564Z at org.apache.helix.zookeeper.zkclient.ZkClient.subscribeChildChanges(ZkClient.java:259)
2020-09-17T10:09:09.3551731Z at org.apache.helix.manager.zk.CallbackHandler.subscribeChildChange(CallbackHandler.java:551)
2020-09-17T10:09:09.3554377Z at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:608)
2020-09-17T10:09:09.3556221Z at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:401)
2020-09-17T10:09:09.3558141Z at org.apache.helix.manager.zk.CallbackHandler.init(CallbackHandler.java:705)
2020-09-17T10:09:09.3559741Z at org.apache.helix.manager.zk.ZKHelixManager.initHandlers(ZKHelixManager.java:1122)
2020-09-17T10:09:09.3561591Z at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1344)
2020-09-17T10:09:09.3563435Z at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:752)
2020-09-17T10:09:09.3565237Z at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:789)
2020-09-17T10:09:09.3567163Z at org.apache.helix.integration.manager.MockParticipantManager.run(MockParticipantManager.java:98)
2020-09-17T10:09:09.3568578Z at java.lang.Thread.run(Thread.java:748)
2020-09-17T10:09:09.3610845Z 406929 [ClusterManager_Watcher_CLUSTER_TestCrushAutoRebalanceNonRack_localhost_12921_PARTICIPANT_10158] ERROR org.apache.helix.manager.zk.ZKExceptionHandler - Exception while invoking init callback for listener:org.apache.helix.messaging.handling.HelixTaskExecutor@3f5341de
2020-09-17T10:09:09.3613140Z java.lang.IllegalStateException: ZkClient already closed!
2020-09-17T10:09:09.3614649Z at org.apache.helix.zookeeper.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1664)
2020-09-17T10:09:09.3616612Z at org.apache.helix.zookeeper.zkclient.ZkClient.watchForChilds(ZkClient.java:2098)
2020-09-17T10:09:09.3624017Z at org.apache.helix.zookeeper.zkclient.ZkClient.subscribeChildChanges(ZkClient.java:259)
2020-09-17T10:09:09.3626898Z at org.apache.helix.manager.zk.CallbackHandler.subscribeChildChange(CallbackHandler.java:551)
2020-09-17T10:09:09.3629267Z at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:608)
2020-09-17T10:09:09.3631469Z at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:401)
2020-09-17T10:09:09.3633204Z at org.apache.helix.manager.zk.CallbackHandler.init(CallbackHandler.java:705)
2020-09-17T10:09:09.3648973Z at org.apache.helix.manager.zk.ZKHelixManager.initHandlers(ZKHelixManager.java:1122)
2020-09-17T10:09:09.3654146Z at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1344)
2020-09-17T10:09:09.3655960Z at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:752)
2020-09-17T10:09:09.3657539Z at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:789)
2020-09-17T10:09:09.3659983Z at org.apache.helix.integration.manager.MockParticipantManager.run(MockParticipantManager.java:98)
2020-09-17T10:09:09.3661448Z at java.lang.Thread.run(Thread.java:748)
2020-09-17T10:09:09.3766251Z 406936 [ClusterManager_Watcher_CLUSTER_TestCrushAutoRebalanceNonRack_localhost_12920_PARTICIPANT_10157] ERROR org.apache.helix.manager.zk.ZKExceptionHandler - Exception while invoking init callback for listener:org.apache.helix.messaging.handling.HelixTaskExecutor@d3c8e10
2020-09-17T10:09:09.3768903Z java.lang.IllegalStateException: ZkClient already closed!
2020-09-17T10:09:09.3770472Z at org.apache.helix.zookeeper.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1664)
2020-09-17T10:09:09.3772586Z at org.apache.helix.zookeeper.zkclient.ZkClient.watchForChilds(ZkClient.java:2098)
2020-09-17T10:09:09.3774749Z at org.apache.helix.zookeeper.zkclient.ZkClient.subscribeChildChanges(ZkClient.java:259)
2020-09-17T10:09:09.3777101Z at org.apache.helix.manager.zk.CallbackHandler.subscribeChildChange(CallbackHandler.java:551)
2020-09-17T10:09:09.3780113Z at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:608)
2020-09-17T10:09:09.3782768Z at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:401)
2020-09-17T10:09:09.3784325Z at org.apache.helix.manager.zk.CallbackHandler.init(CallbackHandler.java:705)
2020-09-17T10:09:09.3786221Z at org.apache.helix.manager.zk.ZKHelixManager.initHandlers(ZKHelixManager.java:1122)
2020-09-17T10:09:09.3788364Z at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1344)
2020-09-17T10:09:09.3790798Z at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:752)
2020-09-17T10:09:09.3792551Z at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:789)
2020-09-17T10:09:09.3794663Z at org.apache.helix.integration.manager.MockParticipantManager.run(MockParticipantManager.java:98)
2020-09-17T10:09:09.3796235Z at java.lang.Thread.run(Thread.java:748)
2020-09-17T10:09:09.3823444Z 406950 [ClusterManager_Watcher_CLUSTER_TestCrushAutoRebalanceNonRack_localhost_12923_PARTICIPANT_10160] ERROR org.apache.helix.manager.zk.ZKExceptionHandler - Exception while invoking init callback for listener:org.apache.helix.messaging.handling.HelixTaskExecutor@61516afd
2020-09-17T10:09:09.3827698Z java.lang.IllegalStateException: ZkClient already closed!
2020-09-17T10:09:09.3829474Z at org.apache.helix.zookeeper.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1664)
2020-09-17T10:09:09.3831670Z at org.apache.helix.zookeeper.zkclient.ZkClient.watchForChilds(ZkClient.java:2098)
2020-09-17T10:09:09.3834665Z at org.apache.helix.zookeeper.zkclient.ZkClient.subscribeChildChanges(ZkClient.java:259)
2020-09-17T10:09:09.3837645Z at org.apache.helix.manager.zk.CallbackHandler.subscribeChildChange(CallbackHandler.java:551)
2020-09-17T10:09:09.3840088Z at org.apache.helix.manager.zk.CallbackHandler.subscribeForChanges(CallbackHandler.java:608)
2020-09-17T10:09:09.3842261Z at org.apache.helix.manager.zk.CallbackHandler.invoke(CallbackHandler.java:401)
2020-09-17T10:09:09.3844057Z at org.apache.helix.manager.zk.CallbackHandler.init(CallbackHandler.java:705)
2020-09-17T10:09:09.3845805Z at org.apache.helix.manager.zk.ZKHelixManager.initHandlers(ZKHelixManager.java:1122)
2020-09-17T10:09:09.3847785Z at org.apache.helix.manager.zk.ZKHelixManager.handleNewSession(ZKHelixManager.java:1344)
2020-09-17T10:09:09.3850612Z at org.apache.helix.manager.zk.ZKHelixManager.createClient(ZKHelixManager.java:752)
2020-09-17T10:09:09.3852580Z at org.apache.helix.manager.zk.ZKHelixManager.connect(ZKHelixManager.java:789)
2020-09-17T10:09:09.3854650Z at org.apache.helix.integration.manager.MockParticipantManager.run(MockParticipantManager.java:98)
2020-09-17T10:09:09.3856155Z at java.lang.Thread.run(Thread.java:748)
2020-09-17T10:09:09.5025072Z END TestCrushAutoRebalanceNonRack testLackEnoughInstances at Thu Sep 17 10:09:09 UTC 2020, took: 403ms.
2020-09-17T10:09:09.5029878Z START TestCrushAutoRebalanceNonRack testLackEnoughInstances at Thu Sep 17 10:09:09 UTC 2020
2020-09-17T10:09:09.5030874Z TestLackEnoughInstances CrushEdRebalanceStrategy
2020-09-17T10:09:12.5796494Z END TestCrushAutoRebalanceNonRack testLackEnoughInstances at Thu Sep 17 10:09:12 UTC 2020, took: 3081ms.
2020-09-17T10:09:12.5798704Z AfterClass: TestCrushAutoRebalanceNonRack of TestCrushAutoRebalanceNonRack called.
2020-09-17T10:09:12.6201211Z 410185 [HelixController-pipeline-default-CLUSTER_TestCrushAutoRebalanceNonRack-(8c650be6_DEFAULT)] ERROR org.apache.helix.controller.GenericHelixController - Exception while executing DEFAULT pipeline: CLUSTER_TestCrushAutoRebalanceNonRack for cluster [org.apache.helix.zookeeper.zkclient.ZkClient.retryUntilConnected(ZkClient.java:1689), org.apache.helix.zookeeper.zkclient.ZkClient.getChildren(ZkClient.java:1046), org.apache.helix.zookeeper.zkclient.ZkClient.getChildren(ZkClient.java:1039), org.apache.helix.manager.zk.ZkBaseDataAccessor.getChildNames(ZkBaseDataAccessor.java:669), org.apache.helix.manager.zk.ZKHelixDataAccessor.getChildNames(ZKHelixDataAccessor.java:394), org.apache.helix.common.caches.InstanceMessagesCache.refresh(InstanceMessagesCache.java:113), org.apache.helix.controller.dataproviders.BaseControllerDataProvider.doRefresh(BaseControllerDataProvider.java:341), org.apache.helix.controller.dataproviders.ResourceControllerDataProvider.refresh(ResourceControllerDataProvider.java:143), org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:63), org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:68), org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:777), org.apache.helix.controller.GenericHelixController.access$500(GenericHelixController.java:128), org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:1407)]. Will not continue to next pipeline
2020-09-17T10:09:12.6221231Z 410189 [HelixController-pipeline-task-CLUSTER_TestCrushAutoRebalanceNonRack-(8c650be6_TASK)] ERROR org.apache.helix.zookeeper.zkclient.callback.ZkAsyncCallbacks - Interrupted waiting for success
2020-09-17T10:09:12.6223689Z java.lang.InterruptedException
2020-09-17T10:09:12.6224464Z at java.lang.Object.wait(Native Method)
2020-09-17T10:09:12.6225020Z at java.lang.Object.wait(Object.java:502)
2020-09-17T10:09:12.6226384Z at org.apache.helix.zookeeper.zkclient.callback.ZkAsyncCallbacks$DefaultCallback.waitForSuccess(ZkAsyncCallbacks.java:220)
2020-09-17T10:09:12.6228337Z at org.apache.helix.manager.zk.ZkBaseDataAccessor.getStats(ZkBaseDataAccessor.java:1197)
2020-09-17T10:09:12.6230966Z at org.apache.helix.manager.zk.ZKHelixDataAccessor.getPropertyStats(ZKHelixDataAccessor.java:365)
2020-09-17T10:09:12.6233258Z at org.apache.helix.common.caches.AbstractDataCache.refreshProperties(AbstractDataCache.java:66)
2020-09-17T10:09:12.6236181Z at org.apache.helix.common.caches.ParticipantStateCache.refreshParticipantStatesCacheFromZk(ParticipantStateCache.java:122)
2020-09-17T10:09:12.6239188Z at org.apache.helix.common.caches.ParticipantStateCache.refresh(ParticipantStateCache.java:62)
2020-09-17T10:09:12.6241762Z at org.apache.helix.controller.dataproviders.BaseControllerDataProvider.doRefresh(BaseControllerDataProvider.java:342)
2020-09-17T10:09:12.6246818Z at org.apache.helix.controller.dataproviders.WorkflowControllerDataProvider.refresh(WorkflowControllerDataProvider.java:88)
2020-09-17T10:09:12.6250040Z at org.apache.helix.controller.stages.ReadClusterDataStage.process(ReadClusterDataStage.java:63)
2020-09-17T10:09:12.6252562Z at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:68)
2020-09-17T10:09:12.6254787Z at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:777)
2020-09-17T10:09:12.6257152Z at org.apache.helix.controller.GenericHelixController.access$500(GenericHelixController.java:128)
2020-09-17T10:09:12.6260562Z at org.apache.helix.controller.GenericHelixController$ClusterEventProcessor.run(GenericHelixController.java:1407)
2020-09-17T10:09:13.7190598Z AfterClass: TestCrushAutoRebalanceNonRack of ZkStandAloneCMTestBase called
2020-09-17T10:09:13.7196116Z END TestCrushAutoRebalanceNonRack at Thu Sep 17 10:09:13 UTC 2020
2020-09-17T10:09:13.7221008Z AfterClass:TestCrushAutoRebalanceNonRack afterclass of ZkTestBase called!
[GitHub] [helix] jiajunwang closed issue #1371: fix TestCrushAutoRebalanceNonRack.testLackEnoughInstances
Posted by GitBox <gi...@apache.org>.
jiajunwang closed issue #1371:
URL: https://github.com/apache/helix/issues/1371