Posted to dev@ambari.apache.org by "Oleksandr Diachenko (JIRA)" <ji...@apache.org> on 2013/09/04 20:06:54 UTC

[jira] [Commented] (AMBARI-3013) Powering off RM node increases API latency by a factor of 6

    [ https://issues.apache.org/jira/browse/AMBARI-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758083#comment-13758083 ] 

Oleksandr Diachenko commented on AMBARI-3013:
---------------------------------------------

I've managed to reproduce the issue on my 4-host VM cluster. Like [~srimanth.gunturi], I used the default layout, so the Ganglia server was on the same host as the RM. After turning off the node with the RM and Ganglia, connection timeout exceptions appear in ambari-server.log:
{code}
17:41:02,730 ERROR [qtp1424875934-587] GangliaPropertyProvider:491 - Caught exception getting Ganglia metrics : spec=http://c6402.ambari.apache.org/cgi-bin/rrd.py?c=HDPResourceManager&h=c6402.ambari.apache.org&m=cpu_wio&e=now&pt=true
java.net.SocketTimeoutException: connect timed out
{code}
The latency increase is caused by the unavailable Ganglia server: each metrics request to it blocks until the connection attempt times out. Our current connection timeout is 5000 ms.
I tried changing the connection timeout to 1000 ms and compared the results:
With a 5000 ms timeout, the latency increase factor is ~8.
With a 1000 ms timeout, it is ~3.
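
For illustration, here is a minimal sketch of how a connection timeout bounds the time spent waiting on an unreachable host, using the standard java.net API. This is not the actual GangliaPropertyProvider code; the class name GangliaTimeoutSketch, the helper open(), and the read timeout value are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical sketch, not Ambari code: shows where a connect timeout is applied.
final class GangliaTimeoutSketch {

    // Opens (without connecting yet) an HTTP connection with explicit timeouts.
    // connectTimeoutMs limits how long connect() may block when the host is down;
    // readTimeoutMs limits how long a read may block once connected.
    static HttpURLConnection open(String spec, int connectTimeoutMs, int readTimeoutMs) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(spec).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Host and path taken from the log above; the query string is omitted.
        HttpURLConnection conn =
            open("http://c6402.ambari.apache.org/cgi-bin/rrd.py", 1000, 1000);
        System.out.println("connect timeout (ms): " + conn.getConnectTimeout());
        // A later conn.connect() or conn.getInputStream() against a powered-off
        // host would fail with SocketTimeoutException once the connect timeout expires.
    }
}
```

With a lower connect timeout, each failed request to the powered-off host gives up sooner, which is consistent with the smaller increase factor measured at 1000 ms.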
                
> Powering off RM node increases API latency by a factor of 6
> -----------------------------------------------------------
>
>                 Key: AMBARI-3013
>                 URL: https://issues.apache.org/jira/browse/AMBARI-3013
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 1.4.0
>            Reporter: Srimanth Gunturi
>            Assignee: Mahadev konar
>              Labels: perfomance
>             Fix For: 1.4.0
>
>         Attachments: Response Time Graph_conn_timeout1000.png, Response Time Graph_conn_timeout5000.png, RMpaused.png
>
>
> On a 4 node cluster I was testing the below API call.
> {noformat}
> /api/v1/clusters/${cluster}/services?fields=components/ServiceComponentInfo,components/host_components,components/host_components/HostRoles,components/host_components/metrics/jvm/memHeapUsedM,components/host_components/metrics/jvm/memHeapCommittedM,components/host_components/metrics/mapred/jobtracker/trackers_decommissioned,components/host_components/metrics/cpu/cpu_wio,components/host_components/metrics/rpc/RpcQueueTime_avg_time,components/host_components/metrics/flume/flume,components/host_components/metrics/yarn/Queue
> {noformat}
> When everything was working the latency was ~500ms. 
> I then powered off the RM node, and immediately the call latency spiked to 30 times the original (~15000ms). After some time it dropped, but was still 6 times the original latency (~3000ms). When the machine came back online, the call fell back to its original ~500ms latency.
> Images attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira