Posted to issues@kylin.apache.org by "Richard Calaba (JIRA)" <ji...@apache.org> on 2016/05/02 22:53:13 UTC

[jira] [Commented] (KYLIN-1319) Find a better way to check hadoop job status

    [ https://issues.apache.org/jira/browse/KYLIN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267485#comment-15267485 ] 

Richard Calaba commented on KYLIN-1319:
---------------------------------------

OK, I think we have found the correct way to get this working properly.

1) If kylin.job.yarn.app.rest.check.status.url IS SET in the conf/kylin.properties file, that URL is used to check the MR job status. In this scenario there is currently (version 1.5.1) no support for specifying multiple URLs and using them in round-robin fashion. A developer could add this.

2) If you enable YARN HA, the property yarn.resourcemanager.webapp.address (in yarn-site.xml) defines which host is used when calling the Job Status web service.

The MR job log indicates this with the following entries:

2016-04-30 01:40:31,361 INFO  [pool-2-thread-1] execution.AbstractExecutable:218 : kylin.job.yarn.app.rest.check.status.url is not set, read from job configuration
2016-04-30 01:40:31,362 INFO  [pool-2-thread-1] execution.AbstractExecutable:234 : yarn.resourcemanager.webapp.address:http://<hostname>:8088
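
The lookup order described above can be sketched roughly as follows. This is only an illustration in Python, not Kylin's actual code: the function name resolve_status_url and the plain dict inputs are hypothetical, the "http://" default scheme is an assumption, and the "${job_id}" placeholder and the "/ws/v1/cluster/apps/..." path are taken from the issue description quoted below.

```python
from typing import Mapping

KYLIN_STATUS_URL_KEY = "kylin.job.yarn.app.rest.check.status.url"
YARN_WEBAPP_KEY = "yarn.resourcemanager.webapp.address"


def resolve_status_url(kylin_props: Mapping[str, str],
                       job_conf: Mapping[str, str],
                       job_id: str) -> str:
    """Hypothetical helper: pick the URL used to poll the MR job status."""
    # Case 1: the Kylin property is set, so it wins.
    explicit = kylin_props.get(KYLIN_STATUS_URL_KEY, "").strip()
    if explicit:
        # Assumption: the configured URL carries a ${job_id} placeholder.
        return explicit.replace("${job_id}", job_id)

    # Case 2: fall back to the resource manager address from the job configuration.
    rm_address = job_conf.get(YARN_WEBAPP_KEY, "").strip()
    if not rm_address:
        raise ValueError(
            "neither %s nor %s is set" % (KYLIN_STATUS_URL_KEY, YARN_WEBAPP_KEY))

    # Assumption: default to plain http when the value has no scheme.
    if not rm_address.startswith(("http://", "https://")):
        rm_address = "http://" + rm_address
    return "%s/ws/v1/cluster/apps/%s?anonymous=true" % (rm_address.rstrip("/"), job_id)
```

With only yarn.resourcemanager.webapp.address available, the helper builds the URL from that host; with the Kylin property set, the yarn-site.xml value is ignored entirely, which is why a single configured URL cannot follow an RM failover.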

HA Enabled (in yarn-site.xml):
========================

  <!-- Resource Manager MapR HA Configs -->
  <property>
    <name>yarn.resourcemanager.ha.custom-ha-enabled</name>
    <value>true</value>
    <description>MapR Zookeeper based RM Reconnect Enabled. If this is true, set the failover proxy to be the class MapRZKBasedRMFailoverProxyPr$
  </property>
  <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider</value>
    <description>Zookeeper based reconnect proxy provider. Should be set if and only if mapr-ha-enabled property is true.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <description>RM Recovery Enabled</description>
  </property>


Job Status WebService Host (in yarn-site.xml):
========================

(replace <rm_host_name> with your own host name)

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value><rm_host_name>:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value><rm_host_name>:8090</value>
  </property>
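
To verify which host a given yarn-site.xml would hand to the job status check, something like the following Python sketch can be used. The helper name read_hadoop_property and the sample host rm1.example.com are made up for illustration; a real file needs an actual host name in place of <rm_host_name>, since the placeholder itself is not valid XML.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_hadoop_property(xml_text: str, name: str) -> Optional[str]:
    """Hypothetical helper: return the value of a Hadoop-style <property>, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        # Each property is a <name>/<value> pair; match on the name.
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Sample content with an invented host name, mirroring the snippet above.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm1.example.com:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>rm1.example.com:8090</value>
  </property>
</configuration>"""

webapp_address = read_hadoop_property(
    SAMPLE_YARN_SITE, "yarn.resourcemanager.webapp.address")
```

A return value of None means the property is missing, which is the situation where the status check would fail unless the Kylin property is set.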


> Find a better way to check hadoop job status
> --------------------------------------------
>
>                 Key: KYLIN-1319
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1319
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: liyang
>            Assignee: Zhong Yanghong
>              Labels: newbie
>
> Currently Kylin retrieves job status via a resource manager web service like {code}https://<your_rm_server>:<port>/ws/v1/cluster/apps/${job_id}?anonymous=true{code}
> This is not very robust. Some users do not have "yarn.resourcemanager.webapp.address" set in yarn-site.xml, so the status check fails out of the box. They have to set the Kylin property "kylin.job.yarn.app.rest.check.status.url" to work around this, which is not user-friendly.
> Kerberos authentication might cause problems too if security is enabled.
> Is there a more robust way to check job status? Via Job API?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)