Posted to dev@ambari.apache.org by "Tom Beerbower (JIRA)" <ji...@apache.org> on 2012/11/30 18:09:59 UTC

[jira] [Created] (AMBARI-1044) API is not returning Ganglia metrics for one of the hosts in the cluster

Tom Beerbower created AMBARI-1044:
-------------------------------------

             Summary: API is not returning Ganglia metrics for one of the hosts in the cluster
                 Key: AMBARI-1044
                 URL: https://issues.apache.org/jira/browse/AMBARI-1044
             Project: Ambari
          Issue Type: Sub-task
            Reporter: Tom Beerbower
            Assignee: Tom Beerbower




A cluster was deployed with 4 hosts, with Ambari Server running on a different host.
Host graphs are showing for 3 of the hosts.
For one of the hosts, the API is not returning any temporal data.
Ganglia is showing host-level metrics for that host.

UI: http://ec2-54-242-174-25.compute-1.amazonaws.com:8080/#/main/hosts/ip-10-224-42-108.ec2.internal/summary
Ganglia UI: http://ec2-174-129-70-110.compute-1.amazonaws.com/ganglia/mobile_helper.php?show_host_metrics=1&h=ip-10-224-42-108.ec2.internal&c=HDPNameNode&r=hour&cs=&ce=

API response:
{
"href" : "http://ec2-54-242-174-25.compute-1.amazonaws.com:8080/api/v1/clusters/C2/hosts/ip-10-224-42-108.ec2.internal?fields=metrics/cpu/cpu_user1354227417,1354231017,15,metrics/cpu/cpu_wio1354227417,1354231017,15,metrics/cpu/cpu_nice1354227417,1354231017,15,metrics/cpu/cpu_aidle1354227417,1354231017,15,metrics/cpu/cpu_system1354227417,1354231017,15,metrics/cpu/cpu_idle1354227417,1354231017,15",
"Hosts" : { "cluster_name" : "C2", "host_name" : "ip-10-224-42-108.ec2.internal" }
}

We need to understand the root cause.
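
For reference, here is a minimal sketch of how a temporal request like the one above is put together, and how to tell that a response carries no temporal data. It assumes the bracketed [start,end,step] range syntax from the Ambari API documentation (the href above shows the values without brackets, which were most likely lost in rendering) and assumes that a healthy response carries a top-level "metrics" section. The function names and the placeholder base URL are hypothetical and not part of Ambari.

```python
# Hypothetical helpers for illustration; none of these names come from Ambari.
CPU_METRICS = ["cpu_user", "cpu_wio", "cpu_nice", "cpu_aidle", "cpu_system", "cpu_idle"]


def temporal_fields(metrics, start, end, step):
    """Build the fields value: one metrics/cpu/<name>[start,end,step] entry per metric."""
    return ",".join(f"metrics/cpu/{name}[{start},{end},{step}]" for name in metrics)


def host_metrics_url(base_url, cluster, host, metrics, start, end, step):
    """Build the host resource URL with a temporal fields query."""
    fields = temporal_fields(metrics, start, end, step)
    return f"{base_url}/api/v1/clusters/{cluster}/hosts/{host}?fields={fields}"


def has_temporal_data(response):
    """True if the parsed JSON response has a non-empty "metrics" section (assumed layout)."""
    return bool(response.get("metrics"))


# Values taken from the issue description; the base URL is a placeholder.
url = host_metrics_url(
    "http://ambari-server:8080", "C2", "ip-10-224-42-108.ec2.internal",
    CPU_METRICS, 1354227417, 1354231017, 15,
)

# The response reported above: href and Hosts only, no metrics.
reported_response = {
    "href": url,
    "Hosts": {"cluster_name": "C2", "host_name": "ip-10-224-42-108.ec2.internal"},
}
```

Applied to the response reported above, has_temporal_data returns False, which is the symptom described in this issue.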


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (AMBARI-1044) API is not returning Ganglia metrics for one of the hosts in the cluster

Posted by "Tom Beerbower (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AMBARI-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Beerbower updated AMBARI-1044:
----------------------------------

    Attachment: AMBARI-1044.patch
    


[jira] [Commented] (AMBARI-1044) API is not returning Ganglia metrics for one of the hosts in the cluster

Posted by "Tom Beerbower (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/AMBARI-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507459#comment-13507459 ] 

Tom Beerbower commented on AMBARI-1044:
---------------------------------------

I don't see any related exceptions in the server log, which means that either it's not attempting to get the metrics for this host or they are just not being set on the host resource.

I think that I see what is happening. One of the arguments that can be specified for the rrd query is the Ganglia cluster (HDPHBaseMaster, HDPJobTracker, HDPNameNode or HDPSlaves). The question is, for a host level query, which Ganglia cluster should we specify?

It's hard to say, since a host isn't necessarily associated with any of the services related to those clusters... or it may be associated with more than one. It turns out it doesn't really matter. In this case I can see the system level rrd files that we use for host level metrics for ip-10-224-42-108.ec2.internal under any of the Ganglia cluster folders. For example ...
{code}
[root@ip-10-40-91-121 rrds]# ls ./HDPHBaseMaster/ip-10-224-42-108.ec2.internal
boottime.rrd  bytes_out.rrd  cpu_idle.rrd  cpu_num.rrd    cpu_system.rrd  cpu_wio.rrd    disk_total.rrd    load_five.rrd  mem_buffers.rrd  mem_free.rrd    mem_total.rrd      pkts_in.rrd   proc_run.rrd    swap_free.rrd
bytes_in.rrd  cpu_aidle.rrd  cpu_nice.rrd  cpu_speed.rrd  cpu_user.rrd    disk_free.rrd  load_fifteen.rrd  load_one.rrd   mem_cached.rrd   mem_shared.rrd  part_max_used.rrd  pkts_out.rrd  proc_total.rrd  swap_total.rrd

...

[root@ip-10-40-91-121 rrds]# ls HDPNameNode/ip-10-224-42-108.ec2.internal
boottime.rrd  bytes_out.rrd  cpu_idle.rrd  cpu_num.rrd    cpu_system.rrd  cpu_wio.rrd    disk_total.rrd    load_five.rrd  mem_buffers.rrd  mem_free.rrd    mem_total.rrd      pkts_in.rrd   proc_run.rrd    swap_free.rrd
bytes_in.rrd  cpu_aidle.rrd  cpu_nice.rrd  cpu_speed.rrd  cpu_user.rrd    disk_free.rrd  load_fifteen.rrd  load_one.rrd   mem_cached.rrd   mem_shared.rrd  part_max_used.rrd  pkts_out.rrd  proc_total.rrd  swap_total.rrd
{code}
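
The same check can be scripted. This is a rough sketch only: it assumes the rrds directory is laid out as <Ganglia cluster>/<host>/<metric>.rrd, as in the listing above, and the function name and the rrd_root parameter are hypothetical.

```python
from pathlib import Path

# The four Ganglia clusters named in this comment.
GANGLIA_CLUSTERS = ["HDPHBaseMaster", "HDPJobTracker", "HDPNameNode", "HDPSlaves"]


def clusters_with_rrd(rrd_root, host, metric):
    """Return the Ganglia clusters whose folder holds <host>/<metric>.rrd.

    rrd_root is the directory containing the per-cluster folders (the
    "rrds" directory in the listing above); its location is an assumption
    left to the caller.
    """
    root = Path(rrd_root)
    return [
        cluster
        for cluster in GANGLIA_CLUSTERS
        if (root / cluster / host / f"{metric}.rrd").is_file()
    ]
```

Calling clusters_with_rrd(rrd_root, "ip-10-224-42-108.ec2.internal", "cpu_user") would then list every cluster folder that has the file for this host.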
The approach that I've been using is to look through the host components for the host that we are interested in and try to map one of its component names back to a Ganglia cluster. In this case it looks like the host with the missing metrics is not associated with any component that would map back given the mapping method that I am using.

Given what I am currently seeing with the system level metrics, I think that it would be safe to simply use HDPSlaves as the Ganglia cluster for host level queries.
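
To make the difference concrete, here is a minimal sketch of the two approaches. It is not the code in the attached patch: the component-to-cluster mapping is a hypothetical example, and the function names are made up for illustration.

```python
# Hypothetical mapping from host component names to Ganglia clusters.
# The real mapping used by the server may differ.
COMPONENT_TO_GANGLIA_CLUSTER = {
    "NAMENODE": "HDPNameNode",
    "JOBTRACKER": "HDPJobTracker",
    "HBASE_MASTER": "HDPHBaseMaster",
    "DATANODE": "HDPSlaves",
    "TASKTRACKER": "HDPSlaves",
}

HOST_LEVEL_GANGLIA_CLUSTER = "HDPSlaves"


def cluster_by_component(component_names):
    """Current approach: map one of the host's components back to a Ganglia cluster.

    Returns None when no component maps back, which is the case that
    leaves the host without metrics.
    """
    for name in component_names:
        cluster = COMPONENT_TO_GANGLIA_CLUSTER.get(name)
        if cluster is not None:
            return cluster
    return None


def cluster_for_host_query(component_names):
    """Proposed approach: always use HDPSlaves for host level queries.

    The component names are ignored; the parameter is kept only so both
    functions can be called the same way.
    """
    return HOST_LEVEL_GANGLIA_CLUSTER
```

With a host whose components all fall outside the mapping, cluster_by_component returns None while cluster_for_host_query still returns HDPSlaves, so the rrd query can be issued.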
                


[jira] [Updated] (AMBARI-1044) API is not returning Ganglia metrics for one of the hosts in the cluster

Posted by "Tom Beerbower (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/AMBARI-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Beerbower updated AMBARI-1044:
----------------------------------

    Attachment: AMBARI-1044-2.patch
    
