Posted to issues@kylin.apache.org by "Richard Calaba (JIRA)" <ji...@apache.org> on 2016/05/02 22:53:13 UTC
[jira] [Commented] (KYLIN-1319) Find a better way to check hadoop job status
[ https://issues.apache.org/jira/browse/KYLIN-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267485#comment-15267485 ]
Richard Calaba commented on KYLIN-1319:
---------------------------------------
OK, I think we have found how to get this working properly.
1) If kylin.job.yarn.app.rest.check.status.url is set in conf/kylin.properties, that URL is used to check the MR job status. In this scenario there is currently (as of version 1.5.1) no support for configuring multiple URLs and using them in round-robin fashion. That could be added by a developer.
2) If you enable YARN HA (and the Kylin property above is not set), the property yarn.resourcemanager.webapp.address (in yarn-site.xml) defines which host is used when calling the Job Status web service.
This is indicated in the MR job log by these entries:
2016-04-30 01:40:31,361 INFO [pool-2-thread-1] execution.AbstractExecutable:218 : kylin.job.yarn.app.rest.check.status.url is not set, read from job configuration
2016-04-30 01:40:31,362 INFO [pool-2-thread-1] execution.AbstractExecutable:234 : yarn.resourcemanager.webapp.address:http://<hostname>:8088
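The lookup order described above can be sketched roughly as follows. This is only a minimal Python illustration, not Kylin's actual (Java) code: the function name resolve_status_url and the two dict arguments are hypothetical stand-ins for conf/kylin.properties and the job configuration, and it assumes the override URL carries a ${job_id} placeholder as in the URL pattern quoted in the issue description.

```python
def resolve_status_url(kylin_props, job_conf, job_id):
    """Return the REST URL used to poll the status of a YARN application.

    Hypothetical helper: both arguments are plain dicts standing in for
    conf/kylin.properties and the MR job configuration.
    """
    override = kylin_props.get("kylin.job.yarn.app.rest.check.status.url", "").strip()
    if override:
        # Case 1: explicit override. Assumed to contain a ${job_id} placeholder.
        return override.replace("${job_id}", job_id)

    # Case 2: fall back to the RM web address read from the job configuration.
    rm_address = job_conf.get("yarn.resourcemanager.webapp.address", "").strip()
    if not rm_address:
        raise ValueError("yarn.resourcemanager.webapp.address is not set")
    if not rm_address.startswith(("http://", "https://")):
        # The property is often given as host:port without a scheme.
        rm_address = "http://" + rm_address
    return f"{rm_address}/ws/v1/cluster/apps/{job_id}?anonymous=true"
```

With an empty kylin_props dict the function takes the second branch, which corresponds to the "is not set, read from job configuration" log entry above.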
HA Enabled (in yarn-site.xml):
========================
<!-- Resource Manager MapR HA Configs -->
<property>
<name>yarn.resourcemanager.ha.custom-ha-enabled</name>
<value>true</value>
<description>MapR Zookeeper based RM Reconnect Enabled. If this is true, set the failover proxy to be the class MapRZKBasedRMFailoverProxyPr$
</property>
<property>
<name>yarn.client.failover-proxy-provider</name>
<value>org.apache.hadoop.yarn.client.MapRZKBasedRMFailoverProxyProvider</value>
<description>Zookeeper based reconnect proxy provider. Should be set if and only if mapr-ha-enabled property is true.</description>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
<description>RM Recovery Enabled</description>
</property>
Job Status WebService Host (in yarn-site.xml):
========================
(replace <rm_host_name> with your own host name)
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value><rm_host_name>:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value><rm_host_name>:8090</value>
</property>
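To verify which host a given yarn-site.xml would hand to the status check, the properties can be read with a few lines of Python. This is a small sketch using only the standard library; the function name read_hadoop_properties and the host rm.example.com are made up for illustration, and the sample wraps the properties in a <configuration> root element as a complete yarn-site.xml does.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Collect <property> name/value pairs from Hadoop-style XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Illustrative content only; rm.example.com is a placeholder host name.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm.example.com:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>rm.example.com:8090</value>
  </property>
</configuration>"""

props = read_hadoop_properties(SAMPLE)
print(props["yarn.resourcemanager.webapp.address"])  # rm.example.com:8088
```

Note that the literal <rm_host_name> placeholder must be replaced with a real host name first, otherwise the file is not well-formed XML and parsing fails.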
> Find a better way to check hadoop job status
> --------------------------------------------
>
> Key: KYLIN-1319
> URL: https://issues.apache.org/jira/browse/KYLIN-1319
> Project: Kylin
> Issue Type: Improvement
> Reporter: liyang
> Assignee: Zhong Yanghong
> Labels: newbie
>
> Currently Kylin retrieves job status via a resource manager web service like {code}https://<your_rm_server>:<port>/ws/v1/cluster/apps/${job_id}?anonymous=true{code}
> This is not very robust. Some users do not have "yarn.resourcemanager.webapp.address" set in yarn-site.xml, so the status check fails out of the box. They have to set the Kylin property "kylin.job.yarn.app.rest.check.status.url" to work around this, which is not user friendly.
> Kerberos authentication might cause problems too if security is enabled.
> Is there a more robust way to check job status? Via Job API?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)