Posted to dev@ambari.apache.org by "Siddharth Wagle (JIRA)" <ji...@apache.org> on 2015/05/31 20:18:17 UTC
[jira] [Created] (AMBARI-11571) Move NameNode with AMS in distributed mode - Stop All fails for AMS

Siddharth Wagle created AMBARI-11571:
----------------------------------------

             Summary: Move NameNode with AMS in distributed mode - Stop All fails for AMS
                 Key: AMBARI-11571
                 URL: https://issues.apache.org/jira/browse/AMBARI-11571
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.1.0
            Reporter: Siddharth Wagle
            Assignee: Siddharth Wagle
            Priority: Critical
             Fix For: 2.1.0


With AMS running in distributed mode, we issued a Stop All in order to perform a NameNode move. Because AMS in distributed mode depends on HDFS, the AMS stop hung while HDFS was unreachable. We had to kill AMS manually and then retry the NameNode move wizard.
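
The stop hung because the HDFS client call it made had no upper bound on how long it could wait while the client kept retrying failover (see the RetryInvocationHandler line in the log below). Below is a minimal sketch of bounding such a call. It assumes a Python 3 agent, because the traceback shows Python 2.6, whose `subprocess` has no `timeout`. `run_with_timeout` and `hdfs_reachable` are hypothetical helpers, not part of Ambari's `resource_management` library. They shell out to the same `hadoop --config ... fs -ls` command that appears in the traceback.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and wait at most timeout_s seconds.

    Returns (returncode, combined stdout/stderr). Returns (None, "") when the
    command timed out or could not be launched at all.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the direct child before re-raising.
        return None, ""
    except OSError:
        # For example, the hadoop binary is not on PATH.
        return None, ""
    return proc.returncode, proc.stdout


def hdfs_reachable(hadoop_conf_dir, path, timeout_s=30):
    """Hypothetical pre-check: True only if `hadoop fs -ls path` succeeds in time."""
    rc, _ = run_with_timeout(
        ["hadoop", "--config", hadoop_conf_dir, "fs", "-ls", path], timeout_s
    )
    return rc == 0


if __name__ == "__main__":
    # Usage: python check_hdfs.py [HDFS_PATH]
    target = sys.argv[1] if len(sys.argv) > 1 else "/ams-hbase"
    print("HDFS reachable:", hdfs_reachable("/etc/hadoop/conf", target))
```

A stop script could call a check like this first and skip any HDFS work when it returns False. Note that `kill()` only signals the direct child process. Any grandchildren the `hadoop` wrapper spawns are not targeted.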

{code}
stderr:
2015-05-25 14:35:54,589 - Error while executing command 'stop':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 214, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 55, in stop
    self.configure(env) # for security
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 39, in configure
    hbase('master')
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/hbase.py", line 197, in hbase
    params.HdfsDirectory(None, action="create")
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_directory.py", line 107, in action_create
    not_if=as_user(format("hadoop --config {hdp_conf_dir} fs -ls {dir_list_str}"), hdp_hdfs_user)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 274, in action_run
    raise ex
Fail: Execution of 'hadoop --config /etc/hadoop/conf fs -mkdir -p hdfs://muonblue:8020/ams-hbase /apps/hbase/staging && hadoop --config /etc/hadoop/conf fs -chmod 775 hdfs://muonblue:8020/ams-hbase && hadoop --config /etc/hadoop/conf fs -chmod 711 /apps/hbase/staging && hadoop --config /etc/hadoop/conf fs -chown ams hdfs://muonblue:8020/ams-hbase /apps/hbase/staging' returned 1.
15/05/25 14:29:32 INFO retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over nn00034.blue.ygrid.kks.yahoo.co.jp/100.97.64.62:8020 after 1 fail over attempts. Trying to fail over after sleeping for 760ms.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
{code}
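
The traceback shows how a stop can depend on HDFS at all. `stop()` calls `self.configure(env)`, which calls `hbase('master')`, which issues `params.HdfsDirectory(None, action="create")`. One way to break that dependency is to tell `configure` which command is running and skip HDFS directory creation on stop. The model below is a hypothetical, heavily simplified stand-in, with class and step names invented for illustration. It is not Ambari's actual script and not necessarily the fix committed for this issue.

```python
class MetricsCollector:
    """Hypothetical, simplified model of the AMS collector script.

    It only records which steps would run; nothing touches a real cluster.
    """

    def __init__(self):
        self.steps = []

    def configure(self, action=None):
        # Local config files can be (re)written even when HDFS is down.
        self.steps.append("write-local-configs")
        # HDFS directory setup is only needed before the service starts.
        # Skipping it on stop avoids blocking on an unreachable NameNode.
        if action != "stop":
            self.steps.append("create-hdfs-dirs")

    def start(self):
        self.configure(action="start")
        self.steps.append("start-daemons")

    def stop(self):
        # Still refresh local configs ("for security"), but without HDFS calls.
        self.configure(action="stop")
        self.steps.append("stop-daemons")


if __name__ == "__main__":
    collector = MetricsCollector()
    collector.stop()
    print(collector.steps)
```

With this split, `stop()` records only the local config step and the daemon stop. `start()` still creates the HDFS directories, so first-time setup is unchanged.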



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)