Posted to dev@hbase.apache.org by "Andrew Kyle Purtell (Jira)" <ji...@apache.org> on 2022/06/12 18:43:00 UTC

[jira] [Resolved] (HBASE-12852) Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail

     [ https://issues.apache.org/jira/browse/HBASE-12852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Kyle Purtell resolved HBASE-12852.
-----------------------------------------
    Resolution: Incomplete

> Tests from hbase-it that use ChaosMonkey don't fail if SSH commands fail
> ------------------------------------------------------------------------
>
>                 Key: HBASE-12852
>                 URL: https://issues.apache.org/jira/browse/HBASE-12852
>             Project: HBase
>          Issue Type: Bug
>          Components: integration tests
>    Affects Versions: 0.98.6
>            Reporter: Dima Spivak
>            Priority: Major
>
> I've just started rolling my sleeves up and playing with hbase-it (so far only on 0.98.6), and I want to start filing JIRAs for the issues I encounter so that I don't forget to get to them. First up: tests run with ChaosMonkey don't appear to fail when ChaosMonkey itself fails to perform its actions. For example, while running IntegrationTestIngest with the slowDeterministic ChaosMonkey, I forgot to set up SSH properly and saw the following:
> {code}
> 15/01/14 07:36:53 WARN hbase.ClusterManager: Remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal failed at attempt 4. Retrying until maxAttempts: 5. Exception: stderr: Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,password).
> , stdout: 
> 15/01/14 07:36:53 INFO util.RetryCounter: Sleeping 16000ms before retry #4...
> 15/01/14 07:36:53 INFO zookeeper.ZooKeeper: Session: 0x14ae74d7bac006b closed
> 15/01/14 07:36:53 INFO policies.Policy: Sleeping for: 59541
> 15/01/14 07:36:53 INFO zookeeper.ClientCnxn: EventThread shut down
> Failed to write keys: 0
> Key range: [150000..159999]
> Batch updates: false
> Percent of keys to update: 60
> Updater threads: 10
> Ignore nonce conflicts: true
> Regions per server: 5
> 15/01/14 07:36:56 INFO util.LoadTestTool: Starting to mutate data...
> Starting to mutate data...
> 15/01/14 07:36:57 INFO policies.Policy: Sleeping for: 88816
> 15/01/14 07:37:01 INFO util.MultiThreadedAction: [U:10] Keys=471, cols=5.7 K, time=00:00:05 Overall: [keys/s= 94, latency=102 ms] Current: [keys/s=94, latency=102 ms], wroteUpTo=149999
> 15/01/14 07:37:06 INFO util.MultiThreadedAction: [U:10] Keys=908, cols=11.0 K, time=00:00:10 Overall: [keys/s= 90, latency=90 ms] Current: [keys/s=87, latency=77 ms], wroteUpTo=149999
> 15/01/14 07:37:09 INFO hbase.ClusterManager: Executing remote command: ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:node-5.internal
> 15/01/14 07:37:09 INFO util.Shell: Executing full command [/usr/bin/ssh  node-5.internal "ps aux | grep proc_regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL"]
> 15/01/14 07:37:09 WARN policies.Policy: Exception occured during performing action: ExitCodeException exitCode=255: stderr: Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,password).
> , stdout: 
> 	at org.apache.hadoop.hbase.HBaseClusterManager.exec(HBaseClusterManager.java:208)
> 	at org.apache.hadoop.hbase.HBaseClusterManager.execWithRetries(HBaseClusterManager.java:223)
> 	at org.apache.hadoop.hbase.HBaseClusterManager.signal(HBaseClusterManager.java:268)
> 	at org.apache.hadoop.hbase.ClusterManager.kill(ClusterManager.java:97)
> 	at org.apache.hadoop.hbase.DistributedHBaseCluster.killRegionServer(DistributedHBaseCluster.java:110)
> 	at org.apache.hadoop.hbase.chaos.actions.Action.killRs(Action.java:84)
> 	at org.apache.hadoop.hbase.chaos.actions.RestartActionBaseAction.restartRs(RestartActionBaseAction.java:50)
> 	at org.apache.hadoop.hbase.chaos.actions.RestartRsHoldingMetaAction.perform(RestartRsHoldingMetaAction.java:38)
> 	at org.apache.hadoop.hbase.chaos.policies.DoActionsOncePolicy.runOneIteration(DoActionsOncePolicy.java:50)
> 	at org.apache.hadoop.hbase.chaos.policies.PeriodicPolicy.run(PeriodicPolicy.java:41)
> 	at org.apache.hadoop.hbase.chaos.policies.CompositeSequentialPolicy.run(CompositeSequentialPolicy.java:42)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> It seems to me that tests should fail in these cases rather than just log a warning. Was this just an oversight, [~enis] and [~ndimiduk], or is it by design?
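
The behavior requested above can be illustrated with a minimal sketch. The idea is that the chaos policy records each action failure instead of only logging it, and the test checks that record once the workload has finished. `ChaosAction`, `RecordingPolicy`, `runOnce` and `assertNoFailures` are hypothetical names for illustration only. They are not HBase APIs, and the sketch does not reproduce the real `Policy`, `PeriodicPolicy` or `Action` classes from the stack trace.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: none of these names are HBase APIs.
public class FailFastChaosSketch {

  // Hypothetical stand-in for a chaos action such as "restart the RS holding meta".
  interface ChaosAction {
    void perform() throws Exception;
  }

  // Hypothetical policy that keeps every action failure instead of only logging it.
  static final class RecordingPolicy {
    private final List<Exception> failures =
        Collections.synchronizedList(new ArrayList<>());

    void runOnce(ChaosAction action) {
      try {
        action.perform();
      } catch (Exception e) {
        // Keep a WARN-style log line, but also remember the failure.
        System.err.println("WARN exception while performing action: " + e.getMessage());
        failures.add(e);
      }
    }

    int failureCount() {
      return failures.size();
    }

    // Meant to be called by the test after the workload and the policy thread have finished.
    void assertNoFailures() {
      synchronized (failures) {
        if (!failures.isEmpty()) {
          AssertionError err =
              new AssertionError(failures.size() + " chaos action(s) failed");
          for (Exception e : failures) {
            err.addSuppressed(e); // attach each original exception to the test report
          }
          throw err;
        }
      }
    }
  }

  public static void main(String[] args) {
    RecordingPolicy policy = new RecordingPolicy();
    // Simulate the SSH failure from the log above: the remote kill exits with code 255.
    policy.runOnce(() -> {
      throw new IOException("exitCode=255: Permission denied (publickey,password).");
    });
    try {
      policy.assertNoFailures();
      System.out.println("test passed");
    } catch (AssertionError e) {
      System.out.println("test failed: " + e.getMessage());
    }
  }
}
```

Running `main` prints the WARN line to stderr and then `test failed: 1 chaos action(s) failed`. That is the outcome the reporter expects when SSH is misconfigured. The sketch does not settle whether every failed action should fail the run or only failures of particular actions.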



--
This message was sent by Atlassian Jira
(v8.20.7#820007)