Posted to user@hive.apache.org by Bertrand Dechoux <de...@gmail.com> on 2012/08/27 18:00:40 UTC

HiveServer can not handle concurrent requests from more than one client?

Hi,

I would like to have more information about this specific sentence from the
documentation.
"HiveServer can not handle concurrent requests from more than one client."
https://cwiki.apache.org/Hive/hiveserver.html

Does it mean it is not possible with this server to provide JDBC access
to an 'almost closed' environment for multiple users?

Regards

Bertrand

Re: HiveServer can not handle concurrent requests from more than one client?

Posted by Ranjith <ra...@gmail.com>.

Thanks, guys, for the clarification. What about multiple queries run through a single session? Do they get queued and executed one after the other?

Thanks,
Ranjith

On Aug 27, 2012, at 5:27 PM, Bertrand Dechoux <de...@gmail.com> wrote:

> Thanks a lot.
> 
> >  It's possible to live with this limitation if you're ok with sometimes fetching other people's result sets instead of your own. 
> I hadn't thought about that, only about the states of variables. That consequence isn't nice. It won't be a security issue really in my context but that can be very inconvenient.
> 
> > Yes, this is true regardless of the configuration. Ranjith's statement is incorrect.
> Ok, so the only true solution, as proposed in the jira, is to 'serialize' the calls with a kind of proxy, like a queue. But that would go against the multi-user goals and the relatively low latency that Hive could provide.
> 
> > These two configuration properties are actually completely orthogonal to the HiveServer multi-client issue
> I thought so but wasn't sure. Thank you for the full explanation and for making the difference clear.
> 
>  > I recommend taking a look at the Beeswax web interface for Hive. More details (including screenshots) are available here: https://ccp.cloudera.com/display/CDHDOC/Beeswax
> 
> I know about that, but I am afraid it would mean changing the distribution currently in use, which is not a small thing. But I will consider that solution more seriously. I take it from your answer that the backend is different? I could not find much information about it and wasn't sure if the same issues applied to Beeswax.
> 
> Thanks a lot, again.
> 
> Bertrand
> 
> On Tue, Aug 28, 2012 at 12:04 AM, Carl Steinbach <cw...@apache.org> wrote:
> Hi Bertrand,
> 
> > According to the proposal for HiveServer2, the current hive server provides no guarantee about "session state in between calls".
> > If that was all, it is something that can be lived with. It only means that for a JDBC client, all requests should be conceived as isolated.
> 
> In the HiveServer Thrift API Execute() and Fetch() are two separate calls and require two separate RPCs. In between these calls HiveServer has to maintain session state so that when the Fetch() call is made it knows which result set to look at. The current HiveServer Thrift API assumes that Thrift will consistently map the same physical connection to the same Thrift worker thread, and consequently it stores the session state in a thread local variable. Unfortunately, this assumption is false. It's possible to live with this limitation if you're ok with sometimes fetching other people's result sets instead of your own.
> 
> > The page of the Hive Server (1) says "HiveServer can not handle concurrent requests from more than one client."
> > According to the jira, one may run into issues when multiple users are running it. Is that true regardless of the configuration?
> > It should not be interpreted as "query will be executed one after the other", like Ranjith said?
> 
> Yes, this is true regardless of the configuration. Ranjith's statement is incorrect.
> 
> > Eg what would be the impact of hive.exec.parallel or hive.support.concurrency?
> 
> These two configuration properties are actually completely orthogonal to the HiveServer multi-client issue, though it's hard to know that since the configuration property names were very poorly chosen. hive.exec.parallel controls whether or not the MR jobs in the query plan DAG are executed in parallel on the cluster (https://issues.apache.org/jira/browse/HIVE-549). hive.support.concurrency controls whether or not Hive supports coarse-grained locks on tables and partitions (see https://cwiki.apache.org/confluence/display/Hive/Locking).
> 
> > What would be the recommended way for providing Hive access for multiple users to a production environment that is tightly firewalled? SSH is not a viable solution in my context and the hive web interface does not seem mature enough.
> 
> I recommend taking a look at the Beeswax web interface for Hive. More details (including screenshots) are available here: https://ccp.cloudera.com/display/CDHDOC/Beeswax
> 
> Thanks.
> 
> Carl
> 
> 
> 
> 
> -- 
> Bertrand Dechoux

Re: HiveServer can not handle concurrent requests from more than one client?

Posted by Bertrand Dechoux <de...@gmail.com>.

Thanks a lot.

> It's possible to live with this limitation if you're ok with sometimes
> fetching other people's result sets instead of your own.
I hadn't thought about that, only about the states of variables. That
consequence isn't nice. It won't be a security issue really in my context
but that can be very inconvenient.

> Yes, this is true regardless of the configuration. Ranjith's statement is
> incorrect.
Ok, so the only true solution, as proposed in the jira, is to 'serialize'
the calls with a kind of proxy, like a queue. But that would go against the
multi-user goals and the relatively low latency that Hive could provide.
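
The 'serialize the calls' idea can be pictured as a proxy that holds a lock
across each execute-and-fetch pair, so that no other client can slip in
between the two calls. What follows is only a minimal sketch in Python under
stated assumptions: SerializingProxy and FakeBackend are hypothetical names
invented for illustration, and FakeBackend merely stands in for a single
shared server connection. Neither is part of Hive.

```python
import threading


class FakeBackend:
    """Hypothetical stand-in for one shared server connection (not a Hive client)."""

    def __init__(self):
        self._last_query = None

    def execute(self, sql):
        # Remember the most recent query, as a server holding one result set would.
        self._last_query = sql

    def fetch_all(self):
        # Return "rows" for whatever query was executed last.
        return [f"row for {self._last_query}"]


class SerializingProxy:
    """Hypothetical proxy: only one client at a time may run execute + fetch."""

    def __init__(self, backend):
        self._backend = backend
        self._lock = threading.Lock()

    def query(self, sql):
        # The lock covers both calls, so the pair behaves as one atomic request.
        with self._lock:
            self._backend.execute(sql)
            return self._backend.fetch_all()


def run_clients(n=8):
    """Start n concurrent clients and collect what each one received."""
    proxy = SerializingProxy(FakeBackend())
    results = {}

    def client(i):
        results[i] = proxy.query(f"SELECT {i}")

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every client gets its own rows back, but only because all requests wait on
the same lock, which is exactly the latency cost mentioned above.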

> These two configuration properties are actually completely orthogonal to
> the HiveServer multi-client issue
I thought so but wasn't sure. Thank you for the full explanation and for
making the difference clear.

> I recommend taking a look at the Beeswax web interface for Hive. More
> details (including screenshots) are available here:
> https://ccp.cloudera.com/display/CDHDOC/Beeswax

I know about that, but I am afraid it would mean changing the
distribution currently in use, which is not a small thing. But I will
consider that solution more seriously. I take it from your answer that the
backend is different? I could not find much information about it and wasn't
sure if the same issues applied to Beeswax.

Thanks a lot, again.

Bertrand

On Tue, Aug 28, 2012 at 12:04 AM, Carl Steinbach <cw...@apache.org> wrote:

> Hi Bertrand,
>
>> According to the proposal for HiveServer2, the current hive server
>> provides no guarantee about "session state in between calls".
>> If that was all, it is something that can be lived with. It only means
>> that for a JDBC client, all requests should be conceived as isolated.
>>
>
> In the HiveServer Thrift API Execute() and Fetch() are two separate calls
> and require two separate RPCs. In between these calls HiveServer has to
> maintain session state so that when the Fetch() call is made it knows which
> result set to look at. The current HiveServer Thrift API assumes that
> Thrift will consistently map the same physical connection to the same
> Thrift worker thread, and consequently it stores the session state in a
> thread local variable. Unfortunately, this assumption is false. It's
> possible to live with this limitation if you're ok with sometimes fetching
> other people's result sets instead of your own.
>
>
>> The page of the Hive Server (1) says "HiveServer can not handle
>> concurrent requests from more than one client."
>> According to the jira, one may run into issues when multiple users are
>> running it. Is that true regardless of the configuration?
>> It should not be interpreted as "query will be executed one after the
>> other", like Ranjith said?
>>
>
> Yes, this is true regardless of the configuration. Ranjith's statement is
> incorrect.
>
>
>> Eg what would be the impact of hive.exec.parallel or
>> hive.support.concurrency?
>>
>
> These two configuration properties are actually completely orthogonal to
> the HiveServer multi-client issue, though it's hard to know that since the
> configuration property names were very poorly chosen. hive.exec.parallel
> controls whether or not the MR jobs in the query plan DAG are executed
> in parallel on the cluster (https://issues.apache.org/jira/browse/HIVE-549).
> hive.support.concurrency controls whether or not Hive supports
> coarse-grained locks on tables and partitions (see
> https://cwiki.apache.org/confluence/display/Hive/Locking).
>
>
>> What would be the recommended way for providing Hive access for multiple
>> users to a production environment that is tightly firewalled? SSH is
>> not a viable solution in my context and the hive web interface does not
>> seem mature enough.
>>
>
> I recommend taking a look at the Beeswax web interface for Hive. More
> details (including screenshots) are available here:
> https://ccp.cloudera.com/display/CDHDOC/Beeswax
>
> Thanks.
>
> Carl
>
>


-- 
Bertrand Dechoux

Re: HiveServer can not handle concurrent requests from more than one client?

Posted by Carl Steinbach <cw...@apache.org>.

Hi Bertrand,

> According to the proposal for HiveServer2, the current hive server provides
> no guarantee about "session state in between calls".
> If that was all, it is something that can be lived with. It only means
> that for a JDBC client, all requests should be conceived as isolated.
>

In the HiveServer Thrift API Execute() and Fetch() are two separate calls
and require two separate RPCs. In between these calls HiveServer has to
maintain session state so that when the Fetch() call is made it knows which
result set to look at. The current HiveServer Thrift API assumes that
Thrift will consistently map the same physical connection to the same
Thrift worker thread, and consequently it stores the session state in a
thread local variable. Unfortunately, this assumption is false. It's
possible to live with this limitation if you're ok with sometimes fetching
other people's result sets instead of your own.
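
The failure mode described above can be reproduced with a toy model. The
sketch below is Python, not HiveServer code: execute(), fetch() and rpc() are
hypothetical helpers, and two single-thread executors stand in for Thrift
worker threads. The only assumption carried over from the description is that
the session state lives in a thread-local variable rather than being keyed by
connection.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# State is stored per worker thread, NOT per client connection.
_session = threading.local()


def execute(query):
    """Toy Execute(): stash a 'result set' in the current worker thread."""
    _session.result = f"rows for: {query}"


def fetch():
    """Toy Fetch(): return whatever the current worker thread last stored."""
    return getattr(_session, "result", None)


# Two single-thread pools play the role of two server worker threads.
workers = [ThreadPoolExecutor(max_workers=1) for _ in range(2)]


def rpc(worker_index, call, *args):
    """Dispatch one RPC to a chosen worker and wait for its reply."""
    return workers[worker_index].submit(call, *args).result()


def demo():
    rpc(0, execute, "SELECT * FROM a")  # client A lands on worker 0
    rpc(0, execute, "SELECT * FROM b")  # client B lands on worker 0 as well
    same_worker = rpc(0, fetch)         # client A fetches on worker 0
    other_worker = rpc(1, fetch)        # client A's fetch lands on worker 1
    return same_worker, other_worker
```

In this model client A either receives client B's rows (same worker) or
nothing at all (different worker), depending only on which thread happens to
serve the second call.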


> The page of the Hive Server (1) says "HiveServer can not handle
> concurrent requests from more than one client."
> According to the jira, one may run into issues when multiple users are
> running it. Is that true regardless of the configuration?
> It should not be interpreted as "query will be executed one after the
> other", like Ranjith said?
>

Yes, this is true regardless of the configuration. Ranjith's statement is
incorrect.


> Eg what would be the impact of hive.exec.parallel or
> hive.support.concurrency?
>

These two configuration properties are actually completely orthogonal to
the HiveServer multi-client issue, though it's hard to know that since the
configuration property names were very poorly chosen. hive.exec.parallel
controls whether or not the MR jobs in the query plan DAG are executed
in parallel on the cluster (https://issues.apache.org/jira/browse/HIVE-549).
hive.support.concurrency controls whether or not Hive supports
coarse-grained locks on tables and partitions (see
https://cwiki.apache.org/confluence/display/Hive/Locking).
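
To make the first property more concrete, here is a rough toy model of what
running the stages of a plan DAG "in parallel" versus one at a time means. It
is a Python sketch only: run_plan, PLAN and the stage names are hypothetical,
and nothing here reflects how Hive actually schedules its jobs. The point is
just that this setting concerns stages within one query, not multiple clients.

```python
from concurrent.futures import ThreadPoolExecutor


def run_plan(deps, run_stage, parallel):
    """Run every stage in deps after its prerequisites; return the batches used.

    deps maps a stage name to the names it depends on. With parallel=True all
    stages whose prerequisites are done run together; otherwise one at a time.
    """
    remaining = {stage: set(prereqs) for stage, prereqs in deps.items()}
    done = set()
    waves = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        while remaining:
            # Stages whose prerequisites have all completed are ready to run.
            ready = sorted(s for s, prereqs in remaining.items() if prereqs <= done)
            if not ready:
                raise ValueError("plan has a cycle or a missing stage")
            batch = ready if parallel else ready[:1]
            # list() forces completion of the whole batch before moving on.
            list(pool.map(run_stage, batch))
            waves.append(batch)
            done.update(batch)
            for stage in batch:
                del remaining[stage]
    return waves


# Hypothetical three-stage plan: two independent scans feeding one join.
PLAN = {"scan_a": [], "scan_b": [], "join": ["scan_a", "scan_b"]}
```

With parallel=True the two scans share one batch and the join follows; with
parallel=False the same three stages run in three separate batches.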


> What would be the recommended way for providing Hive access for multiple
> users to a production environment that is tightly firewalled? SSH is
> not a viable solution in my context and the hive web interface does not
> seem mature enough.
>

I recommend taking a look at the Beeswax web interface for Hive. More
details (including screenshots) are available here:
https://ccp.cloudera.com/display/CDHDOC/Beeswax

Thanks.

Carl

Re: HiveServer can not handle concurrent requests from more than one client?

Posted by Bertrand Dechoux <de...@gmail.com>.

Thanks for the answers.

I had already read it, but both pages (and the jira) are not very explicit
about the problem.

According to the proposal for HiveServer2, the current hive server provides
no guarantee about "session state in between calls".
If that was all, it is something that can be lived with. It only means that
for a JDBC client, all requests should be conceived as isolated.

The page of the Hive Server (1) says "HiveServer can not handle concurrent
requests from more than one client."
According to the jira, one may run into issues when multiple users are
running it. Is that true regardless of the configuration?
It should not be interpreted as "query will be executed one after the
other", like Ranjith said?

Eg what would be the impact of hive.exec.parallel or
hive.support.concurrency?

What would be the recommended way for providing Hive access for multiple
users to a production environment that is tightly firewalled? SSH is
not a viable solution in my context and the hive web interface does not
seem mature enough.

Bertrand

On Mon, Aug 27, 2012 at 9:15 PM, Carl Steinbach <ca...@cloudera.com> wrote:

> HiveServer is multi-threaded, but there is a defect in the current
> HiveServer Thrift API that prevents it from robustly handling concurrent
> connections. This problem is described in more detail here:
>
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API
>
> Thanks.
>
> Carl
>
> On Mon, Aug 27, 2012 at 9:03 AM, Raghunath, Ranjith <
> Ranjith.Raghunath1@usaa.com> wrote:
>
>> Bertrand,
>>
>> The Hive Server is a thrift service that provides an interface for Hive.
>> You can connect to it using JDBC. It is not secure (out of the box) as
>> there are no user ID and password restrictions. On the concurrency part,
>> it is single threaded: one query gets executed after the other.
>>
>> Thanks,
>>
>> Ranjith
>>
>> From: Bertrand Dechoux [mailto:dechouxb@gmail.com]
>> Sent: Monday, August 27, 2012 11:01 AM
>> To: user@hive.apache.org
>> Subject: HiveServer can not handle concurrent requests from more than
>> one client?
>>
>> Hi,
>>
>> I would like to have more information about this specific sentence from
>> the documentation.
>> "HiveServer can not handle concurrent requests from more than one client."
>> https://cwiki.apache.org/Hive/hiveserver.html
>>
>> Does it mean it is not possible with this server to provide JDBC access
>> to an 'almost closed' environment for multiple users?
>>
>> Regards
>>
>> Bertrand
>>
>
>

-- 
Bertrand Dechoux

Re: HiveServer can not handle concurrent requests from more than one client?

Posted by Carl Steinbach <ca...@cloudera.com>.

HiveServer is multi-threaded, but there is a defect in the current
HiveServer Thrift API that prevents it from robustly handling concurrent
connections. This problem is described in more detail here:

https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API

Thanks.

Carl

On Mon, Aug 27, 2012 at 9:03 AM, Raghunath, Ranjith <
Ranjith.Raghunath1@usaa.com> wrote:

> Bertrand,
>
> The Hive Server is a thrift service that provides an interface for Hive.
> You can connect to it using JDBC. It is not secure (out of the box) as
> there are no user ID and password restrictions. On the concurrency part,
> it is single threaded: one query gets executed after the other.
>
> Thanks,
>
> Ranjith
>
> From: Bertrand Dechoux [mailto:dechouxb@gmail.com]
> Sent: Monday, August 27, 2012 11:01 AM
> To: user@hive.apache.org
> Subject: HiveServer can not handle concurrent requests from more than
> one client?
>
> Hi,
>
> I would like to have more information about this specific sentence from
> the documentation.
> "HiveServer can not handle concurrent requests from more than one client."
> https://cwiki.apache.org/Hive/hiveserver.html
>
> Does it mean it is not possible with this server to provide JDBC access
> to an 'almost closed' environment for multiple users?
>
> Regards
>
> Bertrand
>

RE: HiveServer can not handle concurrent requests from more than one client?

Posted by "Raghunath, Ranjith" <Ra...@usaa.com>.

Bertrand,

The Hive Server is a thrift service that provides an interface for Hive. You can connect to it using JDBC. It is not secure (out of the box) as there are no user ID and password restrictions. On the concurrency part, it is single threaded: one query gets executed after the other.

Thanks,
Ranjith

From: Bertrand Dechoux [mailto:dechouxb@gmail.com]
Sent: Monday, August 27, 2012 11:01 AM
To: user@hive.apache.org
Subject: HiveServer can not handle concurrent requests from more than one client?

Hi,

I would like to have more information about this specific sentence from the documentation.
"HiveServer can not handle concurrent requests from more than one client."
https://cwiki.apache.org/Hive/hiveserver.html

Does it mean it is not possible with this server to provide JDBC access to an 'almost closed' environment for multiple users?

Regards

Bertrand