Posted to dev@ambari.apache.org by "Jaimin D Jetly (JIRA)" <ji...@apache.org> on 2014/02/21 00:47:19 UTC

[jira] [Resolved] (AMBARI-2617) History server should be managed as separate component

     [ https://issues.apache.org/jira/browse/AMBARI-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaimin D Jetly resolved AMBARI-2617.
------------------------------------

    Resolution: Duplicate

> History server should be managed as separate component
> ------------------------------------------------------
>
>                 Key: AMBARI-2617
>                 URL: https://issues.apache.org/jira/browse/AMBARI-2617
>             Project: Ambari
>          Issue Type: Improvement
>    Affects Versions: 1.2.4
>            Reporter: Jeff Sposetti
>            Assignee: Arsen Babych
>
> Ambari currently does not track the history server as a separate master component of the MapReduce service. This makes it hard to diagnose MapReduce start failures unless you already know to go onto the host and check the history server logs.
> The history server should be a separate component, similar to the JobTracker. I think it will be OK if we always place the history server on the same machine as the JobTracker, but it needs to be handled just like the JobTracker, with distinct and clear start/stop operation results and host component start/stop controls.
> The challenge of not having a separate history server component is easy to see:
> 1) Stop HDFS and MapReduce
> 2) Start only MapReduce
> 3) The start MapReduce operation fails because the MapReduce Check execute step fails
> 4) Nothing indicates which component failed to start (JobTracker shows as started OK, which is true)
> 5) MapReduce shows a green dot, as if it started OK
> 6) Go to the Hosts > Host page: the JobTracker is running
> 7) So you assume everything started fine and begin to suspect that something is wrong with the MapReduce configs or something else...
> Problem: the Hosts > Host page doesn't list the history server, so you don't know it failed to start. And the operations didn't show a distinct history server start failure, so the user wasn't aware of it.
> Once you figure out that the history server didn't start, you go onto the machine and see that the historyserver process isn't running. Then you figure out how to check the logs and see that it failed to start (because the NameNode isn't up).
> Note: we do have a Nagios alert watching the history server web UI, so an alert is raised. But that alert alone is not enough to help people troubleshoot what is wrong in their cluster related to the history server.
> 2013-06-06 07:43:38,930 FATAL org.apache.hadoop.mapred.JobHistoryServer: java.net.ConnectException: Call to xx-xx-xx-xx/xx-xx-xx-xx:8020 failed on connection exception: java.net.ConnectException: Connection refused
> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1147)
> at org.apache.hadoop.ipc.Client.call(Client.java:1123)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
> at $Proxy5.getProtocolVersion(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
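
The trace above shows the history server failing because the NameNode RPC port (8020 in the log) refused the connection. As a rough illustration only, and not something Ambari itself does, a small pre-start check along these lines could confirm whether the NameNode is reachable before blaming the MapReduce configs. The function name, the default timeout, and the host name in the usage note are hypothetical; only port 8020 comes from the log.

```python
import socket


def namenode_reachable(host: str, port: int = 8020, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: port 8020 is taken from the log above; the name,
    signature, and timeout are assumptions made for this sketch.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For example, namenode_reachable("namenode.example.com") returning False would match the "Connection refused" seen in the log, suggesting that HDFS should be started before MapReduce. A successful TCP connect only shows that something is listening on the port; it does not prove the NameNode is healthy.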



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)