
Posted to issues@hive.apache.org by "Sahil Takiar (JIRA)" <ji...@apache.org> on 2018/02/12 21:05:00 UTC

[jira] [Commented] (HIVE-18684) Race condition in RemoteSparkJobMonitor

    [ https://issues.apache.org/jira/browse/HIVE-18684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361440#comment-16361440 ] 

Sahil Takiar commented on HIVE-18684:
-------------------------------------

Right now, the code in {{RemoteSparkJobMonitor}} appears to be purely poll-based: it polls the {{RemoteDriver}} for information once per second and displays whatever it finds. Ideally it would be event-driven, so that whenever the {{SparkClient}} receives an update from the {{RemoteDriver}}, the update is logged immediately. However, implementing an event-driven model would require rewriting a lot of this code. Unless there is a more compelling reason to switch, we should probably stick with the current polling design. There should be a simpler workaround for the bug reported in this JIRA anyway.
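A minimal sketch of one possible workaround that keeps the polling loop: remember whether the {{STARTED}} information has been printed, and if the monitor observes {{SUCCEEDED}} without ever having seen {{STARTED}}, print the skipped information before the completion message. The class, the three-value {{State}} enum, and the log strings are hypothetical stand-ins chosen for illustration; they are not Hive's actual {{RemoteSparkJobMonitor}} or {{JobHandle.State}} API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the workaround; names are illustrative, not Hive's code.
public class StartedInfoGuard {

    // Simplified stand-in for the job states the monitor can observe.
    enum State { QUEUED, STARTED, SUCCEEDED }

    private boolean startedInfoPrinted = false;
    private final List<String> messages = new ArrayList<>();

    // Called once per poll with the state the monitor just observed.
    void onPoll(State state) {
        switch (state) {
            case STARTED:
                printStartedInfoOnce();
                break;
            case SUCCEEDED:
                // If the job moved past STARTED between two polls, emit the STARTED info now.
                printStartedInfoOnce();
                messages.add("Finished successfully");
                break;
            default:
                break;
        }
    }

    // Emits the STARTED info at most once per job.
    private void printStartedInfoOnce() {
        if (!startedInfoPrinted) {
            messages.add("Query Hive on Spark job"); // placeholder for the real STARTED logging
            startedInfoPrinted = true;
        }
    }

    List<String> messages() {
        return messages;
    }

    public static void main(String[] args) {
        StartedInfoGuard fast = new StartedInfoGuard();
        fast.onPoll(State.QUEUED);
        fast.onPoll(State.SUCCEEDED);          // STARTED was never observed by a poll
        System.out.println(fast.messages());   // [Query Hive on Spark job, Finished successfully]
    }
}
```

Because the flag guards the {{STARTED}} output, a job that is observed in {{STARTED}} over several consecutive polls still prints that information only once.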

> Race condition in RemoteSparkJobMonitor
> ---------------------------------------
>
>                 Key: HIVE-18684
>                 URL: https://issues.apache.org/jira/browse/HIVE-18684
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>
> There is a race condition in {{RemoteSparkJobMonitor}}. Sometimes the info in {{RemoteSparkJobMonitor#startMonitor.STARTED}} gets printed out, sometimes it doesn't. This can be easily verified by running a qtest on {{TestMiniSparkOnYarnCliDriver}} and counting the number of times {{Query Hive on Spark job}} is printed vs. the number of times {{Finished successfully in}} gets printed.
> The issue is that {{RemoteSparkJobMonitor}} runs once per second and checks the state of the {{JobHandle}}. Depending on the state, it prints out some logging info. The content of the logs contains an implicit assumption that the logs for the {{STARTED}} state are printed before the logs for the {{SUCCEEDED}} state. However, this isn't always the case. The state transitions are driven by how long the remote Spark job takes to run, and if it finishes within one second, the logs for the {{STARTED}} state are never printed.
> This can be confusing to users, and there is key debugging information that is printed in the {{STARTED}} state.
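To illustrate the race described in the issue, the following sketch simulates a monitor that samples a job's state at a fixed interval. The timeline model, the three-value {{State}} enum, and the timings are hypothetical and simplified. With a 1000 ms interval, a job that starts at 200 ms and finishes at 800 ms goes from {{QUEUED}} to {{SUCCEEDED}} between two polls, so {{STARTED}} is never observed; a job that is still running at the 1000 ms poll is seen in {{STARTED}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a polling monitor; not Hive's actual RemoteSparkJobMonitor.
public class PollingRaceDemo {

    enum State { QUEUED, STARTED, SUCCEEDED }

    // Returns the job state at time t (ms), given when the job starts and finishes.
    static State stateAt(long t, long startMs, long finishMs) {
        if (t >= finishMs) return State.SUCCEEDED;
        if (t >= startMs) return State.STARTED;
        return State.QUEUED;
    }

    // Samples the state every intervalMs until SUCCEEDED is seen,
    // recording each distinct state in the order it was observed.
    static List<State> poll(long startMs, long finishMs, long intervalMs) {
        List<State> observed = new ArrayList<>();
        for (long t = 0; ; t += intervalMs) {
            State s = stateAt(t, startMs, finishMs);
            if (observed.isEmpty() || observed.get(observed.size() - 1) != s) {
                observed.add(s);
            }
            if (s == State.SUCCEEDED) return observed;
        }
    }

    public static void main(String[] args) {
        // Both transitions fall between the polls at 0 ms and 1000 ms.
        System.out.println(poll(200, 800, 1000));   // [QUEUED, SUCCEEDED]
        // The poll at 1000 ms catches the job while it is running.
        System.out.println(poll(200, 2500, 1000));  // [QUEUED, STARTED, SUCCEEDED]
    }
}
```

Whether a short job's {{STARTED}} phase is caught depends on where it falls relative to the poll ticks, which is consistent with the {{STARTED}} info being printed only some of the time.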



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)