Posted to issues@hive.apache.org by "Chao Sun (JIRA)" <ji...@apache.org> on 2017/06/28 19:08:00 UTC

[jira] [Assigned] (HIVE-16984) HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver died

     [ https://issues.apache.org/jira/browse/HIVE-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Sun reassigned HIVE-16984:
-------------------------------


> HoS: avoid waiting for RemoteSparkJobStatus::getAppID() when remote driver died
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-16984
>                 URL: https://issues.apache.org/jira/browse/HIVE-16984
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>
> In HoS, after a RemoteDriver is launched, it may fail to initialize a Spark context, in which case the ApplicationMaster eventually dies. When this happens, there are two issues related to RemoteSparkJobStatus::getAppID():
> 1. Currently we call {{getAppID()}} before starting the monitoring job. The former waits for up to {{hive.spark.client.future.timeout}}, and the latter waits for up to {{hive.spark.job.monitor.timeout}}. The error message for the latter presents {{hive.spark.job.monitor.timeout}} as the total time spent waiting for job submission. This is inaccurate, since it doesn't account for the additional {{hive.spark.client.future.timeout}} spent in {{getAppID()}}.
> 2. If the RemoteDriver dies suddenly, we may still wait out both timeouts to no avail. This could be avoided if we detect that the channel between the client and the remote driver has closed, and fail fast instead of waiting.
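
The fail-fast idea in point 2 can be sketched as follows. This is a minimal illustration, not actual Hive code: the class and method names ({{EarlyAbortWait}}, {{onChannelClosed}}) are hypothetical, and it only demonstrates the general pattern of failing a pending future when the connection drops, so that waiters return immediately instead of blocking for the full timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

// Sketch of the early-abort pattern: instead of blocking for the full
// future timeout, give up as soon as the channel to the remote driver
// is known to be closed. All names here are hypothetical.
public class EarlyAbortWait {
    private final CompletableFuture<String> appIdFuture = new CompletableFuture<>();
    private volatile boolean channelClosed = false;

    // Would be called by the RPC layer when the driver connection drops.
    public void onChannelClosed() {
        channelClosed = true;
        // Fail the pending future so any waiters wake up immediately.
        appIdFuture.completeExceptionally(
            new IllegalStateException("Remote driver channel closed"));
    }

    // Would be called when the driver reports its application ID.
    public void complete(String appId) {
        appIdFuture.complete(appId);
    }

    // Wait for the application ID, but no longer than timeoutMs, and
    // abort at once if the channel has already closed.
    public String getAppId(long timeoutMs) throws Exception {
        if (channelClosed) {
            throw new IllegalStateException("Remote driver channel closed");
        }
        try {
            return appIdFuture.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // Unwrap the failure injected by onChannelClosed().
            throw new IllegalStateException(e.getCause().getMessage());
        }
    }

    public static void main(String[] args) throws Exception {
        EarlyAbortWait w = new EarlyAbortWait();
        w.onChannelClosed(); // simulate the remote driver dying
        String result;
        try {
            result = w.getAppId(60_000); // would otherwise block for a minute
        } catch (IllegalStateException e) {
            result = "aborted: " + e.getMessage();
        }
        System.out.println(result);
    }
}
```

The key point is that the channel-closed callback completes the future exceptionally, so a caller blocked in {{get(timeout, unit)}} returns right away rather than waiting out {{hive.spark.client.future.timeout}} and then {{hive.spark.job.monitor.timeout}}.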



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)