Posted to user@hive.apache.org by Stephen Sprague <sp...@gmail.com> on 2016/08/10 00:12:24 UTC

beeline/hiveserver2 + logging

hey guys,
try as i might i cannot seem to get beeline (via jdbc) to log information
back from hiveserver2 like job_id, progress and that kind of information
(similar to what the local beeline or hive clients do.)

i see this ticket that is closed:
https://issues.apache.org/jira/browse/HIVE-7615 which leads me to believe
things should work.

info:
   * running hive v2.1.0
   * i have enabled all the hive.server2.logging.operation.* settings


do other people have this working? what's the secret?

Thanks,
stephen.

1. beeline -u jdbc:hive2://  # (local beeline-cli) reports progress
2. hive --service cli  # (local hive-cli) reports progress

3. beeline -u "jdbc:hive2://dwrdevnn1:10001/default;auth=noSasl"  # (remote
beeline client) *does not* report progress

i have HiveServer2 running in binary mode (not http); not sure whether that
makes any difference.

Re: beeline/hiveserver2 + logging

Posted by Stephen Sprague <sp...@gmail.com>.
Hi Gopal,
Aha!  thank you for the background behind this.  that makes things much more
understandable.

and ~3000 queries across 10 HS2 servers. sweet. now that's what i call
pushing the edge. I like it!

Thanks again,
Stephen.


Re: beeline/hiveserver2 + logging

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> not get the progress messages back until the query finishes which
> somewhat defeats the purpose of interactive usage.

That happens entirely on the client side btw.

So to avoid a hard sleep() + check loop causing pointless HTTP traffic,
HiveServer2 now does a long poll on the server side.

"hive.server2.long.polling.timeout", "5000ms"


This means that it is edge-triggered to return whenever the query finishes
instead of adding extra time when the results are ready but beeline
doesn't know about it yet.


However, the get_logs() synchronizes on the same HiveStatement and is
mutexed out by the long poll for getting results.

You can escape this on a low-concurrency cluster by changing the
long.polling.timeout to 0.5s instead of 5s & restarting HS2.
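
As a rough illustration of the locking behaviour described above, here is a
toy Python model. It is not Hive code: the class, its methods and the
scaled-down timeouts are all made up for the sketch. A status call holds a
lock for up to the long-poll timeout, and a log fetch on the same object has
to wait for that lock, so a shorter timeout lets the log fetch through sooner.

```python
import threading
import time


class ToyStatement:
    """Toy stand-in for a client statement; every name here is hypothetical."""

    def __init__(self, long_poll_timeout_s):
        self.long_poll_timeout_s = long_poll_timeout_s
        self._lock = threading.Lock()        # plays the role of the shared monitor
        self._polling = threading.Event()    # set once a long poll holds the lock
        self._finished = threading.Event()   # would be set when the "query" completes

    def wait_for_status(self):
        # Long poll: keep the lock until the query finishes or the timeout expires.
        with self._lock:
            self._polling.set()
            return self._finished.wait(self.long_poll_timeout_s)

    def get_logs(self):
        # Needs the same lock, so it blocks while a long poll is in flight.
        with self._lock:
            return ["log line"]


def log_fetch_delay(long_poll_timeout_s):
    """Seconds a log fetch waits when issued right after a long poll starts."""
    stmt = ToyStatement(long_poll_timeout_s)
    poller = threading.Thread(target=stmt.wait_for_status)
    poller.start()
    stmt._polling.wait()          # make sure the poller really holds the lock
    start = time.monotonic()
    stmt.get_logs()
    delay = time.monotonic() - start
    poller.join()
    return delay


if __name__ == "__main__":
    for timeout in (0.1, 0.5):    # scaled-down stand-ins for 0.5s and 5s
        waited = log_fetch_delay(timeout)
        print(f"timeout={timeout}s -> log fetch waited about {waited:.2f}s")
```

In this model the query never finishes during the poll, so the log fetch
always waits out the full timeout; that is the worst case, not a measurement
of what HS2 does.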

However as the total # of concurrent queries goes up, the current setting
does very well due to the reduction in total # of "Nope, come back" http
noise (largest parallel workload I've seen is about ~3000 queries on 10
HS2 nodes load-balanced).

Cheers,
Gopal



Re: beeline/hiveserver2 + logging

Posted by Stephen Sprague <sp...@gmail.com>.
finishing this thread off... yes, it worked to a degree.

by setting hive.async.log.enabled=false one does get the job_id and the
tracking url returned to the client before the job is launched on the
cluster. So that part is good.  However, one does not get the progress
messages back until the query finishes which somewhat defeats the purpose
of interactive usage.  For batch though not that bad.

as a side note the "operation_logs" directory is updated in real-time - and
deleted - after it's copied back to the client.  so if you're hard-up for
that progress as it's happening seek out that operation_log dir & associated
file.
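
For anyone who wants to watch that file, something along these lines might
do. It is only a sketch: the default directory in the script is an assumed
placeholder (check hive.server2.logging.operation.log.location on the HS2
host for the real one), and the helper names are made up. It picks the most
recently modified file under the directory and prints lines as they are
appended, stopping once the file has been removed.

```python
import os
import sys
import time


def newest_file(root):
    """Return the most recently modified regular file under root, or None."""
    newest, newest_mtime = None, -1.0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file vanished between listing and stat
            if mtime > newest_mtime:
                newest, newest_mtime = path, mtime
    return newest


def follow(path, poll_s=0.5):
    """Yield lines appended to path; stop once the file is gone and drained."""
    with open(path, "r", errors="replace") as fh:
        while True:
            line = fh.readline()
            if line:
                yield line.rstrip("\n")
            elif not os.path.exists(path):
                return  # nothing left to read and the file was deleted
            else:
                time.sleep(poll_s)


if __name__ == "__main__":
    # Assumed placeholder path; pass the real operation_logs dir as argv[1].
    root = sys.argv[1] if len(sys.argv) > 1 else "/tmp/hive/operation_logs"
    path = newest_file(root)
    if path is None:
        sys.exit(f"no log files found under {root}")
    for line in follow(path):
        print(line)
```

This has to run on the HS2 host itself, as a user that can read the
directory, and it assumes a Unix-like system where an open file can still be
read after it has been unlinked.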


Thanks,
Stephen.


Re: beeline/hiveserver2 + logging

Posted by Stephen Sprague <sp...@gmail.com>.
well, well. i just found this:
https://issues.apache.org/jira/browse/HIVE-14183   seems something changed
between 1.2.1 and 2.1.0.

i'll see if the Rx as prescribed in that ticket does indeed work for me.

Thanks,
Stephen.
