Posted to dev@ambari.apache.org by "Sumit Mohanty (JIRA)" <ji...@apache.org> on 2013/07/16 06:28:48 UTC

[jira] [Commented] (AMBARI-2651) If there is at least one host that is not heartbeating with host components in INSTALL_FAILED state, service operations fail

    [ https://issues.apache.org/jira/browse/AMBARI-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709453#comment-13709453 ] 

Sumit Mohanty commented on AMBARI-2651:
---------------------------------------

Looks like INSTALL_FAILED is the state that causes this issue; other combinations work fine. The problem is that once an SCH (ServiceComponentHost) is in INSTALL_FAILED state, the only way to get out of it is to perform a successful INSTALL.

* Tasks get created because only SCHs in MAINTENANCE or UNKNOWN state are ignored
* SCHs in INSTALL_FAILED do not go to UNKNOWN

Possible solutions:
* Do not create tasks when HOST is in HEARTBEAT_LOST or UNHEALTHY state
* Allow going into MAINTENANCE from more states than just INSTALLED

We should implement the first solution. The second one can be discussed later.
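
A minimal sketch of the first solution, for illustration only. The class, enum, and method names here (TaskFilterSketch, Sch, selectForTasks) are hypothetical and are not taken from the Ambari code base or the attached patch; HEALTHY is an assumed name for a host that is heartbeating normally. The idea is to keep the existing check that ignores SCHs in MAINTENANCE or UNKNOWN, and to additionally skip any SCH whose host is in HEARTBEAT_LOST or UNHEALTHY state, so that an INSTALL_FAILED component on a dead host no longer produces an install task.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class TaskFilterSketch {
    // Hypothetical stand-ins for the host and host-component (SCH) states
    // mentioned in this comment; not Ambari's actual enums.
    enum HostState { HEALTHY, HEARTBEAT_LOST, UNHEALTHY }
    enum SchState { INSTALLED, INSTALL_FAILED, MAINTENANCE, UNKNOWN }

    // Minimal illustrative model of a service component host.
    static final class Sch {
        final String hostName;
        final HostState hostState;
        final SchState schState;

        Sch(String hostName, HostState hostState, SchState schState) {
            this.hostName = hostName;
            this.hostState = hostState;
            this.schState = schState;
        }
    }

    // Existing behavior described above: only these SCH states are ignored.
    static final Set<SchState> IGNORED_SCH_STATES =
            EnumSet.of(SchState.MAINTENANCE, SchState.UNKNOWN);

    // Proposed addition: never create tasks for hosts in these states.
    static final Set<HostState> SKIPPED_HOST_STATES =
            EnumSet.of(HostState.HEARTBEAT_LOST, HostState.UNHEALTHY);

    // Returns the SCHs for which tasks should be created.
    static List<Sch> selectForTasks(List<Sch> candidates) {
        List<Sch> selected = new ArrayList<>();
        for (Sch sch : candidates) {
            if (IGNORED_SCH_STATES.contains(sch.schState)) {
                continue; // already ignored today
            }
            if (SKIPPED_HOST_STATES.contains(sch.hostState)) {
                continue; // host cannot run the task, so do not schedule it
            }
            selected.add(sch);
        }
        return selected;
    }
}
```

With this filter, an SCH in INSTALL_FAILED on a host in HEARTBEAT_LOST is skipped, while an INSTALL_FAILED SCH on a heartbeating host is still selected and can recover through a successful INSTALL.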
                
> If there is at least one host that is not heartbeating with host components in INSTALL_FAILED state, service operations fail
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-2651
>                 URL: https://issues.apache.org/jira/browse/AMBARI-2651
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.2.5
>            Reporter: Sumit Mohanty
>            Assignee: Sumit Mohanty
>            Priority: Critical
>             Fix For: 1.2.5
>
>         Attachments: AMBARI-2651.patch
>
>
> This was a 3-host cluster. Tried to add one host via the Add Hosts Wizard.
> Forced an install failure and stopped ambari-agent on it. The Add Hosts Wizard was stuck in the "Install, Start and Test" state. Fired an API call to get out of this state. This left the host in a state where its host components are in INSTALL_FAILED state.
> Invoked MapReduce stop from the UI. This created host component install tasks on the host as stage 1 tasks, which caused the stage 2 tasks (in this example, JobTracker stop) to be aborted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira