You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ambari.apache.org by "Dmitry Lysnichenko (JIRA)" <ji...@apache.org> on 2015/12/01 18:00:12 UTC

[jira] [Updated] (AMBARI-14137) Rebalance HDFS after enabling NN HA failed

     [ https://issues.apache.org/jira/browse/AMBARI-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko updated AMBARI-14137:
----------------------------------------
    Attachment: AMBARI-14137.patch

> Rebalance HDFS after enabling NN HA failed
> ------------------------------------------
>
>                 Key: AMBARI-14137
>                 URL: https://issues.apache.org/jira/browse/AMBARI-14137
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>         Attachments: AMBARI-14137.patch
>
>
> STR:
> 1) Install and deploy cluster
> 2) Enable NameNode HA
> 3) Enable security
> 3) Start rebalance HDFS
> Actually result:
> {code}
> "stderr" : "Traceback (most recent call last):\n  File \"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py\", line 425, in <module>\n    NameNode().execute()\n  File \"/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py\", line 218, in execute\n    method(env)\n  File \"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py\", line 363, in rebalancehdfs\n    logoutput = False,\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/base.py\", line 154, in __init__\n    self.env.run()\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/environment.py\", line 156, in run\n    self.run_action(resource, action)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/environment.py\", line 119, in run_action\n    provider_action()\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py\", line 238, in action_run\n    tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/shell.py\", line 70, in inner\n    result = function(command, **kwargs)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/shell.py\", line 92, in checked_call\n    tries=tries, try_sleep=try_sleep)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/shell.py\", line 140, in _call_wrapper\n    result = _call(command, **kwargs_copy)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/shell.py\", line 291, in _call\n    raise Fail(err_msg)\nresource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='\"'\"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'\"'\"' KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10'' returned 252. ######## Hortonworks #############\nThis is MOTD message, added for testing in qe infra\n15/11/17 07:29:08 INFO balancer.Balancer: Using a threshold of 10.0\n15/11/17 07:29:08 INFO balancer.Balancer: namenodes  = [hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020, hdfs://nameservice]\n15/11/17 07:29:08 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]\n15/11/17 07:29:08 INFO balancer.Balancer: included nodes = []\n15/11/17 07:29:08 INFO balancer.Balancer: excluded nodes = []\n15/11/17 07:29:08 INFO balancer.Balancer: source nodes = []\nTime Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved\norg.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1927)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1625)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:659)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)\n.  Exiting ...\nNov 17, 2015 7:29:09 AM  Balancing took 1.932 seconds",
> {code}
> {code}
> "stdout" : "Starting balancer with threshold = 10\n2015-11-17 07:29:06,492 - call['/usr/bin/klist -s /tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8'] {'user': 'cstm-hdfs'}\n2015-11-17 07:29:06,514 - call returned (1, '######## Hortonworks #############\\nThis is MOTD message, added for testing in qe infra')\n2015-11-17 07:29:06,515 - Execute['/usr/bin/kinit -c /tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 -kt /etc/security/keytabs/hdfs.headless.keytab cstm-hdfs@EXAMPLE.COM'] {'user': 'cstm-hdfs'}\nExecuting command ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='\"'\"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'\"'\"' KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10'\n2015-11-17 07:29:06,550 - Execute['ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='\"'\"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'\"'\"' KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10''] {'logoutput': False, 'on_new_line': handle_new_line}\n[balancer] ######## Hortonworks #############\nThis is MOTD message, added for testing in qe infra\n[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: Using a threshold of 10.0\n[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: namenodes  = [hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020, hdfs://nameservice]\n[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]\n[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: included nodes = []\n[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: excluded nodes = []\n[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: source nodes = []\n[balancer] Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved[balancer] \n[balancer] org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1927)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1625)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:659)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Proto[balancer] bufRpcEngine.java:616)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)\n.  Exiting ...[balancer] \n[balancer] Nov 17, 2015 7:29:09 AM [balancer]  [balancer] Balancing took 1.932 seconds[balancer]",
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)