Posted to dev@ambari.apache.org by "Dmytro Sen (JIRA)" <ji...@apache.org> on 2015/08/10 13:18:45 UTC

[jira] [Updated] (AMBARI-12688) HDFS check fails after move NameNode on NN HA cluster

     [ https://issues.apache.org/jira/browse/AMBARI-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Sen updated AMBARI-12688:
--------------------------------
    Description: 
This ticket is related to
https://issues.apache.org/jira/browse/AMBARI-10750 Initial merge of advanced api provisioning work

Detailed scenario:
1. Deploy a 3-node cluster from a blueprint (NameNode is on host1, so NN is mapped to host1 in the cluster topology)
2. Enable NN HA (NameNodes are now on host1 and host2)
3. Move NN from host1 to host3 (NameNodes are now on host3 and host2, but according to the cluster topology NN is still mapped to host1)
4. HDFS service check fails

Because of the code below, we always get phantom host components added from the topology manager.

org/apache/ambari/server/utils/StageUtils.java:354
{code}
    // add components from topology manager
    for (Map.Entry<String, Collection<String>> entry : pendingHostComponents.entrySet()) {
      String hostname = entry.getKey();
      Collection<String> hostComponents = entry.getValue();

      for (String hostComponent : hostComponents) {
        String roleName = componentToClusterInfoKeyMap.get(hostComponent);
        if (null == roleName) {
          roleName = additionalComponentToClusterInfoKeyMap.get(hostComponent);
        }
        if (null == roleName) {
          // even though all mappings are being added, componentToClusterInfoKeyMap is
          // a higher priority lookup
          for (Service service : cluster.getServices().values()) {
            for (ServiceComponent sc : service.getServiceComponents().values()) {
              if (!sc.isClientComponent() && sc.getName().equals(hostComponent)) {
                roleName = hostComponent.toLowerCase() + "_hosts";
                additionalComponentToClusterInfoKeyMap.put(hostComponent, roleName);
              }
            }
          }
        }

        if (roleName != null) {
          SortedSet<Integer> hostsForComponentsHost = hostRolesInfo.get(roleName);

          if (hostsForComponentsHost == null) {
            hostsForComponentsHost = new TreeSet<Integer>();
            hostRolesInfo.put(roleName, hostsForComponentsHost);
          }

          int hostIndex = hostsList.indexOf(hostname);
          if (hostIndex != -1) {
            if (!hostsForComponentsHost.contains(hostIndex)) {
              hostsForComponentsHost.add(hostIndex);
            }
          } else {
            //todo: I don't think that this can happen
            //todo: determine if it can and if so, handle properly
            //todo: if it 'cant' should probably enforce invariant
            throw new RuntimeException("Unable to get host index for host: " + hostname);
          }
        }
      }
    }
{code}

clusterHostInfo is the merged result of the actual host component mapping and the component mapping from the topology manager (which is stale if a component has been moved or removed).
{noformat}
    "clusterHostInfo": {
...
        "namenode_host": [
            "host1",
            "host2",
            "host3"
        ],
{noformat}

I think clusterHostInfo is incorrect if any component has been moved or removed from the node where it had been initially deployed by a blueprint.
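
To make the merge described above easier to reason about, here is a minimal standalone sketch of the scenario. It is not the Ambari code: the class PhantomHostDemo and the methods mergedHosts and collect are hypothetical names used only for illustration, and it works with host names directly, whereas StageUtils stores host indexes. It only assumes that the live mapping and the topology manager mapping are unioned per component.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Ambari.
public class PhantomHostDemo {

  // Unions the hosts of one component from the live mapping and from the
  // (possibly stale) topology mapping, the way the description says
  // clusterHostInfo is assembled.
  static SortedSet<String> mergedHosts(String component,
                                       Map<String, Collection<String>> actual,
                                       Map<String, Collection<String>> pending) {
    SortedSet<String> hosts = new TreeSet<String>();
    collect(component, actual, hosts);
    collect(component, pending, hosts);
    return hosts;
  }

  // Adds every host whose component list contains the given component.
  static void collect(String component,
                      Map<String, Collection<String>> hostToComponents,
                      SortedSet<String> out) {
    for (Map.Entry<String, Collection<String>> e : hostToComponents.entrySet()) {
      if (e.getValue().contains(component)) {
        out.add(e.getKey());
      }
    }
  }

  public static void main(String[] args) {
    // Live state after step 3: NameNodes run on host2 and host3.
    Map<String, Collection<String>> actual = new LinkedHashMap<String, Collection<String>>();
    actual.put("host2", Arrays.asList("NAMENODE"));
    actual.put("host3", Arrays.asList("NAMENODE"));

    // Topology from the blueprint still maps NameNode to host1.
    Map<String, Collection<String>> pending = new LinkedHashMap<String, Collection<String>>();
    pending.put("host1", Arrays.asList("NAMENODE"));

    // prints [host1, host2, host3]; host1 is the phantom entry
    System.out.println(mergedHosts("NAMENODE", actual, pending));
  }
}
```

With only the live mapping as input the result would be host2 and host3, so the extra host1 comes entirely from the stale topology entry.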

  was:
This ticket is related to
https://issues.apache.org/jira/browse/AMBARI-10750 Initial merge of advanced api provisioning work

Detailed scenario:
1. Deploy a 3-node cluster from a blueprint (NameNode is on host1, so NN is mapped to host1 in the cluster topology)
2. Enable NN HA (NameNodes are now on host1 and host2)
3. Move NN from host1 to host3 (NameNodes are now on host3 and host2, but according to the cluster topology NN is still mapped to host1)

Because of the code below, we always get phantom host components added from the topology manager.

org/apache/ambari/server/utils/StageUtils.java:354
{code}
    // add components from topology manager
    for (Map.Entry<String, Collection<String>> entry : pendingHostComponents.entrySet()) {
      String hostname = entry.getKey();
      Collection<String> hostComponents = entry.getValue();

      for (String hostComponent : hostComponents) {
        String roleName = componentToClusterInfoKeyMap.get(hostComponent);
        if (null == roleName) {
          roleName = additionalComponentToClusterInfoKeyMap.get(hostComponent);
        }
        if (null == roleName) {
          // even though all mappings are being added, componentToClusterInfoKeyMap is
          // a higher priority lookup
          for (Service service : cluster.getServices().values()) {
            for (ServiceComponent sc : service.getServiceComponents().values()) {
              if (!sc.isClientComponent() && sc.getName().equals(hostComponent)) {
                roleName = hostComponent.toLowerCase() + "_hosts";
                additionalComponentToClusterInfoKeyMap.put(hostComponent, roleName);
              }
            }
          }
        }

        if (roleName != null) {
          SortedSet<Integer> hostsForComponentsHost = hostRolesInfo.get(roleName);

          if (hostsForComponentsHost == null) {
            hostsForComponentsHost = new TreeSet<Integer>();
            hostRolesInfo.put(roleName, hostsForComponentsHost);
          }

          int hostIndex = hostsList.indexOf(hostname);
          if (hostIndex != -1) {
            if (!hostsForComponentsHost.contains(hostIndex)) {
              hostsForComponentsHost.add(hostIndex);
            }
          } else {
            //todo: I don't think that this can happen
            //todo: determine if it can and if so, handle properly
            //todo: if it 'cant' should probably enforce invariant
            throw new RuntimeException("Unable to get host index for host: " + hostname);
          }
        }
      }
    }
{code}

clusterHostInfo is the merged result of the actual host component mapping and the component mapping from the topology manager (which is stale if a component has been moved or removed).
{noformat}
    "clusterHostInfo": {
...
        "namenode_host": [
            "host1",
            "host2",
            "host3"
        ],
{noformat}

I think clusterHostInfo is incorrect if any component has been moved or removed from the node where it had been initially deployed by a blueprint.


> HDFS check fails after move NameNode on NN HA cluster
> -----------------------------------------------------
>
>                 Key: AMBARI-12688
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12688
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server, blueprints
>    Affects Versions: 2.1.1
>            Reporter: Dmytro Sen
>            Assignee: Dmytro Sen
>            Priority: Critical
>             Fix For: 2.1.1
>
>


