Posted to dev@ambari.apache.org by "Andrew Onischuk (JIRA)" <ji...@apache.org> on 2014/06/24 18:34:25 UTC

[jira] [Commented] (AMBARI-6261) agent reg failed with timeout but didn't error out. installer stuck

    [ https://issues.apache.org/jira/browse/AMBARI-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042326#comment-14042326 ] 

Andrew Onischuk commented on AMBARI-6261:
-----------------------------------------

Committed to trunk

> agent reg failed with timeout but didn't error out. installer stuck
> -------------------------------------------------------------------
>
>                 Key: AMBARI-6261
>                 URL: https://issues.apache.org/jira/browse/AMBARI-6261
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>             Fix For: 1.6.1
>
>
> Name : ambari-server  
> Arch : noarch  
> Version : 1.6.1  
> Release : 72
> 2 out of 3 agents registered fine. The remaining one timed out.  
> Problem #1: The error message didn't show the timeout value; it still had the
> unreplaced placeholder "timeout = {0} seconds".  
> Problem #2: Even though I'm in a registration error situation, the UI still
> says "Installing...", so I can't go back, remove the host, or try again.  
>   
> {"status":"ERROR","hostsStatus":[{"hostName":"ip-10-164-165-204.ec2.internal",
> "status":"RUNNING","log":"==========================\nCopying common functions
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:48:01\n\nWarning: Permanently added
> 'ip-10-164-165-204.ec2.internal,10.164.165.204' (RSA) to the list of known
> hosts.\nscp /usr/lib/python2.6/site-
> packages/ambari_commons\nhost=ip-10-164-165-204.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:48:26\n\n==========================\nCopying OS type check
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:48:26\n\nscp /usr/lib/python2.6/site-
> packages/ambari_server/os_check_type.py\nhost=ip-10-164-165-204.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:49:11\n\n==========================\nRunning OS type
> check...\n==========================\n\nCommand start time 2014-06-24
> 08:49:11\nCluster primary/cluster OS type is redhat6 and local/current OS type
> is redhat6\n\nConnection to ip-10-164-165-204.ec2.internal closed.\nSSH
> command execution finished\nhost=ip-10-164-165-204.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:49:52\n\n==========================\nChecking 'sudo' package on remote
> host...\n==========================\n\nCommand start time 2014-06-24
> 08:49:52\nsudo-1.8.6p3-12.el6.x86_64\n\nConnection to
> ip-10-164-165-204.ec2.internal closed.\nSSH command execution
> finished\nhost=ip-10-164-165-204.ec2.internal, exitcode=0\nCommand end time
> 2014-06-24 08:50:27\n\n==========================\nCopying repo file to 'tmp'
> folder...\n==========================\n\nCommand start time 2014-06-24
> 08:50:27\n\nscp
> /etc/yum.repos.d/ambari.repo\nhost=ip-10-164-165-204.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:50:58\n\n==========================\nMoving file to repo
> dir...\n==========================\n\nCommand start time 2014-06-24
> 08:50:58\n\nConnection to ip-10-164-165-204.ec2.internal closed.\nSSH command
> execution finished\nhost=ip-10-164-165-204.ec2.internal, exitcode=0\nCommand
> end time 2014-06-24 08:51:33\n\n==========================\nCopying setup
> script file...\n==========================\n\nCommand start time 2014-06-24
> 08:51:33\n\nscp /usr/lib/python2.6/site-
> packages/ambari_server/setupAgent.py\nhost=ip-10-164-165-204.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:52:13\n\n==========================\nRunning setup agent
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:52:13\nAutomatic Agent registration timed out (timeout = {0}
> seconds). Check your network connectivity and retry registration, or use
> manual agent registration."},
> {"hostName":"ip-10-136-91-58.ec2.internal","status":"DONE","statusCode":"0","l
> og":"==========================\nCopying common functions
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:48:01\n\nWarning: Permanently added
> 'ip-10-136-91-58.ec2.internal,10.136.91.58' (RSA) to the list of known
> hosts.\nscp /usr/lib/python2.6/site-
> packages/ambari_commons\nhost=ip-10-136-91-58.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:48:37\n\n==========================\nCopying OS type check
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:48:37\n\nscp /usr/lib/python2.6/site-
> packages/ambari_server/os_check_type.py\nhost=ip-10-136-91-58.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:49:22\n\n==========================\nRunning OS type
> check...\n==========================\n\nCommand start time 2014-06-24
> 08:49:22\nCluster primary/cluster OS type is redhat6 and local/current OS type
> is redhat6\n\nConnection to ip-10-136-91-58.ec2.internal closed.\nSSH command
> execution finished\nhost=ip-10-136-91-58.ec2.internal, exitcode=0\nCommand end
> time 2014-06-24 08:49:42\n\n==========================\nChecking 'sudo'
> package on remote host...\n==========================\n\nCommand start time
> 2014-06-24 08:49:42\nsudo-1.8.6p3-12.el6.x86_64\n\nConnection to
> ip-10-136-91-58.ec2.internal closed.\nSSH command execution
> finished\nhost=ip-10-136-91-58.ec2.internal, exitcode=0\nCommand end time
> 2014-06-24 08:50:19\n\n==========================\nCopying repo file to 'tmp'
> folder...\n==========================\n\nCommand start time 2014-06-24
> 08:50:19\n\nscp
> /etc/yum.repos.d/ambari.repo\nhost=ip-10-136-91-58.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:50:54\n\n==========================\nMoving file to repo
> dir...\n==========================\n\nCommand start time 2014-06-24
> 08:50:54\n\nConnection to ip-10-136-91-58.ec2.internal closed.\nSSH command
> execution finished\nhost=ip-10-136-91-58.ec2.internal, exitcode=0\nCommand end
> time 2014-06-24 08:51:35\n\n==========================\nCopying setup script
> file...\n==========================\n\nCommand start time 2014-06-24
> 08:51:35\n\nscp /usr/lib/python2.6/site-
> packages/ambari_server/setupAgent.py\nhost=ip-10-136-91-58.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:52:00\n\n==========================\nRunning setup agent
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:52:00\n/bin/sh: /usr/sbin/ambari-agent: No such file or
> directory\nRestarting ambari-agent\nVerifying Python version
> compatibility...\nUsing python /usr/bin/python2.6\nambari-agent is not
> running. No PID found at /var/run/ambari-agent/ambari-agent.pid\nVerifying
> Python version compatibility...\nUsing python /usr/bin/python2.6\nChecking for
> previously running Ambari Agent...\nStarting ambari-agent\nVerifying ambari-
> agent process status...\nAmbari Agent successfully started\nAgent PID at:
> /var/run/ambari-agent/ambari-agent.pid\nAgent out at: /var/log/ambari-agent
> /ambari-agent.out\nAgent log at: /var/log/ambari-agent/ambari-
> agent.log\n('INFO 2014-06-24 08:52:46,109 main.py:83 -
> loglevel=logging.INFO\nINFO 2014-06-24 08:52:46,110 DataCleaner.py:36 - Data
> cleanup thread started\nINFO 2014-06-24 08:52:46,111 DataCleaner.py:71 - Data
> cleanup started\nINFO 2014-06-24 08:52:46,111 DataCleaner.py:73 - Data cleanup
> finished\nINFO 2014-06-24 08:52:46,235 PingPortListener.py:51 - Ping port
> listener started on port: 8670\nINFO 2014-06-24 08:52:46,236 main.py:227 -
> Connecting to the server at: https://ip-10-164-165-204.ec2.internal:8440\nINFO
> 2014-06-24 08:52:46,236 NetUtil.py:72 - DEBUG: Trying to connect to the server
> at https://ip-10-164-165-204.ec2.internal:8440\nINFO 2014-06-24 08:52:46,236
> NetUtil.py:42 - Connecting to the following url
> https://ip-10-164-165-204.ec2.internal:8440/cert/ca\n', None)\n\nConnection to
> ip-10-136-91-58.ec2.internal closed.\nSSH command execution
> finished\nhost=ip-10-136-91-58.ec2.internal, exitcode=0\nCommand end time
> 2014-06-24 08:52:48\n"}
> ,
> {"hostName":"ip-10-95-170-54.ec2.internal","status":"DONE","statusCode":"0","l
> og":"==========================\nCopying common functions
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:48:01\n\nWarning: Permanently added
> 'ip-10-95-170-54.ec2.internal,10.95.170.54' (RSA) to the list of known
> hosts.\nscp /usr/lib/python2.6/site-
> packages/ambari_commons\nhost=ip-10-95-170-54.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:48:17\n\n==========================\nCopying OS type check
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:48:17\n\nscp /usr/lib/python2.6/site-
> packages/ambari_server/os_check_type.py\nhost=ip-10-95-170-54.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:48:47\n\n==========================\nRunning OS type
> check...\n==========================\n\nCommand start time 2014-06-24
> 08:48:47\nCluster primary/cluster OS type is redhat6 and local/current OS type
> is redhat6\n\nConnection to ip-10-95-170-54.ec2.internal closed.\nSSH command
> execution finished\nhost=ip-10-95-170-54.ec2.internal, exitcode=0\nCommand end
> time 2014-06-24 08:49:22\n\n==========================\nChecking 'sudo'
> package on remote host...\n==========================\n\nCommand start time
> 2014-06-24 08:49:22\nsudo-1.8.6p3-12.el6.x86_64\n\nConnection to
> ip-10-95-170-54.ec2.internal closed.\nSSH command execution
> finished\nhost=ip-10-95-170-54.ec2.internal, exitcode=0\nCommand end time
> 2014-06-24 08:49:58\n\n==========================\nCopying repo file to 'tmp'
> folder...\n==========================\n\nCommand start time 2014-06-24
> 08:49:58\n\nscp
> /etc/yum.repos.d/ambari.repo\nhost=ip-10-95-170-54.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:50:34\n\n==========================\nMoving file to repo
> dir...\n==========================\n\nCommand start time 2014-06-24
> 08:50:34\n\nConnection to ip-10-95-170-54.ec2.internal closed.\nSSH command
> execution finished\nhost=ip-10-95-170-54.ec2.internal, exitcode=0\nCommand end
> time 2014-06-24 08:50:49\n\n==========================\nCopying setup script
> file...\n==========================\n\nCommand start time 2014-06-24
> 08:50:49\n\nscp /usr/lib/python2.6/site-
> packages/ambari_server/setupAgent.py\nhost=ip-10-95-170-54.ec2.internal,
> exitcode=0\nCommand end time 2014-06-24
> 08:51:20\n\n==========================\nRunning setup agent
> script...\n==========================\n\nCommand start time 2014-06-24
> 08:51:20\n/bin/sh: /usr/sbin/ambari-agent: No such file or
> directory\nRestarting ambari-agent\nVerifying Python version
> compatibility...\nUsing python /usr/bin/python2.6\nambari-agent is not
> running. No PID found at /var/run/ambari-agent/ambari-agent.pid\nVerifying
> Python version compatibility...\nUsing python /usr/bin/python2.6\nChecking for
> previously running Ambari Agent...\nStarting ambari-agent\nVerifying ambari-
> agent process status...\nAmbari Agent successfully started\nAgent PID at:
> /var/run/ambari-agent/ambari-agent.pid\nAgent out at: /var/log/ambari-agent
> /ambari-agent.out\nAgent log at: /var/log/ambari-agent/ambari-
> agent.log\n('INFO 2014-06-24 08:51:45,593 main.py:83 -
> loglevel=logging.INFO\nINFO 2014-06-24 08:51:45,593 DataCleaner.py:36 - Data
> cleanup thread started\nINFO 2014-06-24 08:51:45,594 DataCleaner.py:71 - Data
> cleanup started\nINFO 2014-06-24 08:51:45,595 DataCleaner.py:73 - Data cleanup
> finished\nINFO 2014-06-24 08:51:45,722 PingPortListener.py:51 - Ping port
> listener started on port: 8670\nINFO 2014-06-24 08:51:45,723 main.py:227 -
> Connecting to the server at: https://ip-10-164-165-204.ec2.internal:8440\nINFO
> 2014-06-24 08:51:45,723 NetUtil.py:72 - DEBUG: Trying to connect to the server
> at https://ip-10-164-165-204.ec2.internal:8440\nINFO 2014-06-24 08:51:45,723
> NetUtil.py:42 - Connecting to the following url
> https://ip-10-164-165-204.ec2.internal:8440/cert/ca\n', None)\n\nConnection to
> ip-10-95-170-54.ec2.internal closed.\nSSH command execution
> finished\nhost=ip-10-95-170-54.ec2.internal, exitcode=0\nCommand end time
> 2014-06-24 08:51:48\n"}
> ],"log":"\n\nINFO:root:BootStrapping hosts
> ['ip-10-164-165-204.ec2.internal',\n 'ip-10-136-91-58.ec2.internal',\n
> 'ip-10-95-170-54.ec2.internal'] using /usr/lib/python2.6/site-
> packages/ambari_server cluster primary OS: redhat6 with user 'ec2-user' sshKey
> File /var/run/ambari-server/bootstrap/1/sshKey password File null using tmp
> dir /var/run/ambari-server/bootstrap/1 ambari: ip-10-164-165-204.ec2.internal;
> server_port: 8080; ambari version: 1.6.1\nINFO:root:Executing parallel
> bootstrap\nWARNING:root:Bootstrap at host ip-10-164-165-204.ec2.internal timed
> out and will be interrupted\nTraceback (most recent call last):\n File
> \"/usr/lib/python2.6/site-packages/ambari_server/bootstrap.py\", line 660, in
> <module>\n main(sys.argv)\n File \"/usr/lib/python2.6/site-
> packages/ambari_server/bootstrap.py\", line 655, in main\n pbootstrap.run()\n
> File \"/usr/lib/python2.6/site-packages/ambari_server/bootstrap.py\", line
> 582, in run\n bootstrap.interruptBootstrap()\n File \"/usr/lib/python2.6/site-
> packages/ambari_server/bootstrap.py\", line 544, in interruptBootstrap\n
> self.host_log.write(\"Automatic Agent registration timed out (timeout = {0}
> seconds). \" \\\\\nAttributeError: 'NoneType' object has no attribute
> 'format'\n"}
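
The last lines of the traceback above suggest that `.format` was applied to the return value of `host_log.write(...)`, which is None, rather than to the message string. The source of bootstrap.py is not shown in this message, so this is an inference. That would explain both symptoms: the literal "{0}" in the host log and the AttributeError. Below is a minimal Python sketch of that failure mode and one possible fix. `HostLog`, `interrupt_bootstrap_buggy`, `interrupt_bootstrap_fixed`, and the value of `HOST_BOOTSTRAP_TIMEOUT` are hypothetical names and values for illustration, not the actual Ambari code.

```python
# Hypothetical stand-ins for illustration; not the actual Ambari classes.
HOST_BOOTSTRAP_TIMEOUT = 300  # assumed value, in seconds

MESSAGE = ("Automatic Agent registration timed out (timeout = {0} seconds). "
           "Check your network connectivity and retry registration, "
           "or use manual agent registration.")


class HostLog:
    """Collects log text in memory; write() returns None."""

    def __init__(self):
        self.lines = []

    def write(self, text):
        self.lines.append(text)


def interrupt_bootstrap_buggy(host_log):
    # .format() is applied to the return value of write() (None), so the
    # placeholder is written unformatted and then AttributeError is raised.
    host_log.write(MESSAGE).format(HOST_BOOTSTRAP_TIMEOUT)


def interrupt_bootstrap_fixed(host_log):
    # Format the message first, then write the finished string.
    host_log.write(MESSAGE.format(HOST_BOOTSTRAP_TIMEOUT))
```

In the buggy variant the unformatted text reaches the log before the exception is raised, which matches the "timeout = {0} seconds" string seen in the host log.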

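Problem #2 is visible in the response itself: the overall status is ERROR while one host is still reported as RUNNING. Below is a small Python sketch of a consistency check a client could apply to such a response. The function name `hosts_stuck_running` and the trimmed sample payload are assumptions for illustration. Only the field names `status`, `hostsStatus`, and `hostName` are taken from the response quoted above.

```python
import json


def hosts_stuck_running(response_text):
    """Return host names still RUNNING although the overall status is ERROR."""
    response = json.loads(response_text)
    if response.get("status") != "ERROR":
        return []
    return [host["hostName"]
            for host in response.get("hostsStatus", [])
            if host.get("status") == "RUNNING"]


# Trimmed sample modeled on the quoted response (per-host logs omitted).
sample = json.dumps({
    "status": "ERROR",
    "hostsStatus": [
        {"hostName": "ip-10-164-165-204.ec2.internal", "status": "RUNNING"},
        {"hostName": "ip-10-136-91-58.ec2.internal", "status": "DONE", "statusCode": "0"},
        {"hostName": "ip-10-95-170-54.ec2.internal", "status": "DONE", "statusCode": "0"},
    ],
})

print(hosts_stuck_running(sample))
```

A non-empty result means the bootstrap ended in error without every host reaching a terminal state, which is the situation that left the installer stuck.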


--
This message was sent by Atlassian JIRA
(v6.2#6252)