Posted to issues@hive.apache.org by "Thejas Nair (Jira)" <ji...@apache.org> on 2021/06/02 20:13:00 UTC
[jira] [Commented] (HIVE-25191) Modernize Hive Thrift CLI Service Protocol

    [ https://issues.apache.org/jira/browse/HIVE-25191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355944#comment-17355944 ] 

Thejas Nair commented on HIVE-25191:
------------------------------------

[~mattmccline] - Thanks for working on this. I would recommend creating two separate jiras for the two issues you mention here, with titles that are more self-explanatory.

For query execution, we already have a long-poll mechanism, so you don't need a long-lived persistent connection (the hive.server2.long.polling.timeout config is relevant to that). I think [~vgumashta] did some work on async query compilation as well, but that might not be complete.
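
To illustrate the long-poll pattern I mean, here is a minimal client-side sketch. It is not the actual HiveDriver code: get_status is a hypothetical callable standing in for a status request that blocks on the server for at most the long-poll timeout and then returns the current state, and the state names are assumptions for illustration only.

```python
import time
from typing import Callable

# Assumed state names, for illustration only.
TERMINAL_STATES = {"FINISHED", "ERROR", "CANCELED"}


def wait_for_query(get_status: Callable[[float], str],
                   long_poll_timeout_s: float = 5.0,
                   overall_deadline_s: float = 3600.0) -> str:
    """Issue repeated short-lived status requests until the query is done.

    get_status is a hypothetical stand-in for one status call: it may block
    for up to long_poll_timeout_s on the server side, then returns the state.
    No single request stays open longer than that timeout.
    """
    deadline = time.monotonic() + overall_deadline_s
    while time.monotonic() < deadline:
        state = get_status(long_poll_timeout_s)
        if state in TERMINAL_STATES:
            return state
    raise TimeoutError("query did not finish before the overall deadline")
```

Each request here is bounded by the long-poll timeout, so an intermediate proxy only ever sees short requests, provided its own timeout is larger than that value.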

About the cleanup when the client goes away: there is already support for hive.server2.idle.session.timeout and hive.server2.idle.operation.timeout. Does that not address the use case?
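
Roughly, the idle-timeout idea can be sketched as below. This is only an illustration of the concept, not how HiveServer2 implements it: the IdleOperationTracker class and its method names are hypothetical, and the caller is assumed to cancel whatever expire_idle returns.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IdleOperationTracker:
    """Hypothetical tracker: records last client access per operation."""
    idle_timeout_s: float
    last_access: Dict[str, float] = field(default_factory=dict)

    def touch(self, op_id: str, now: Optional[float] = None) -> None:
        # Called whenever the client makes a request for this operation.
        self.last_access[op_id] = time.monotonic() if now is None else now

    def expire_idle(self, now: Optional[float] = None) -> List[str]:
        # Returns (and forgets) operations with no client activity for
        # longer than idle_timeout_s; the caller would cancel these.
        now = time.monotonic() if now is None else now
        expired = [op for op, seen in self.last_access.items()
                   if now - seen > self.idle_timeout_s]
        for op in expired:
            del self.last_access[op]
        return expired
```

A periodic background check calling expire_idle would then release locks and resources for clients that have gone away, without the client having to send a cancel.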

> Modernize Hive Thrift CLI Service Protocol
> ------------------------------------------
>
>                 Key: HIVE-25191
>                 URL: https://issues.apache.org/jira/browse/HIVE-25191
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Major
>
> Unnecessary errors are occurring with the advent of proxy use such as Gateways between the Hive client and Hive Server 2. Query failures can be due to arbitrary proxy timeouts. This proposal avoids the timeouts by changing the protocol to do regular polling. Currently, the Hive client uses one request for the query compile request. Long query compile times make those requests vulnerable to the arbitrary proxy timeouts.
> Another issue is Hive Server 2 sometimes does not notice the client has failed or has lost interest in a potentially long running query. This causes Hive locks and Big Data query resources to be held unnecessarily. The assumption is the client issues a cancel query request when it gets an error. This assumption does not always hold. If the proxy returned an error itself, that proxy may reject the subsequent cancel request, too. And, if the client is killed or the network is down, the client cannot complete a cancel request. The proposed solution here is for Hive Server 2 to watch that the client is sending regular polling requests for status. If a client ceases those requests, then Hive Server 2 will cancel the query.
> Hive owns the JDBC path (i.e. HiveDriver). The ODBC path may be more challenging because vendors provide ODBC drivers and Hive does not own the ODBC protocol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)