Posted to dev@hive.apache.org by Kristopher Glover <kg...@appnexus.com> on 2013/08/15 18:12:32 UTC

threading with hive client

Hi Everyone,

I'm experiencing a threading issue with the Hive client where I want to run multiple queries on the same JVM.

The problem I'm having is that org.apache.hadoop.hive.ql.Driver#run (line 907) has the following lines of code:

  synchronized (compileMonitor) {
    ret = compile(command);
  }

The compileMonitor is a static field, so it blocks all threads even though I'm using different instances of the Driver class. I could explicitly call Driver#compile and then Driver#execute to avoid the synchronized block, but I don't know whether it serves a special purpose. Does anyone know why that synchronized block is there and whether it's really necessary?


Thanks,

Kris
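Because compileMonitor is static, there is a single monitor for the whole JVM rather than one per Driver. The following is a minimal, self-contained sketch of that locking pattern only. SketchDriver, peakConcurrentCompiles and the 20 ms sleep standing in for compile(command) are hypothetical and are not part of Hive. It runs four queries in parallel, each on its own instance, and records how many threads were inside the compile section at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class StaticMonitorDemo {

    /** Hypothetical stand-in for Hive's Driver, reduced to the locking pattern in run(). */
    static class SketchDriver {
        // static: one monitor for the whole JVM, shared by every SketchDriver instance
        private static final Object COMPILE_MONITOR = new Object();
        private static final AtomicInteger compilesInProgress = new AtomicInteger();
        static final AtomicInteger peakCompiles = new AtomicInteger();

        int run(String command) throws InterruptedException {
            synchronized (COMPILE_MONITOR) {
                int now = compilesInProgress.incrementAndGet();
                peakCompiles.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // stands in for compile(command)
                } finally {
                    compilesInProgress.decrementAndGet();
                }
            }
            // execute() would follow here, outside the monitor
            return 0;
        }
    }

    /** Runs each query on its own SketchDriver in parallel; returns the peak number of concurrent compiles. */
    static int peakConcurrentCompiles(int queries) throws Exception {
        SketchDriver.peakCompiles.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(queries);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < queries; i++) {
                SketchDriver driver = new SketchDriver(); // a distinct instance per query
                String query = "SELECT " + i;
                results.add(pool.submit(() -> driver.run(query)));
            }
            for (Future<Integer> f : results) {
                f.get(); // wait for completion, rethrowing any failure
            }
        } finally {
            pool.shutdown();
        }
        return SketchDriver.peakCompiles.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(peakConcurrentCompiles(4));
    }
}
```

With the static monitor the peak is always 1: separate driver instances do not get separate compile locks, which matches the blocking Kris describes. Only the part after the synchronized block (execute, in Driver#run) can overlap between threads.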

Re: threading with hive client

Posted by Xuefu Zhang <xz...@cloudera.com>.
To add,

1. Being public doesn't necessarily guarantee thread-safety. Of course,
this is no excuse for not documenting thread-safety.
2. Sometimes a method is made public for testing, which is bad in my
opinion, but I have seen many instances of this before.

--Xuefu



On Thu, Aug 15, 2013 at 1:11 PM, Brock Noland <br...@cloudera.com> wrote:

> Well you would have probably found the areas we need to fix! :) The hive
> source is is not strict about methods and member visibility. The good news
> is that we have been making significant improvements in this aspect.
>
> Brock

Re: threading with hive client

Posted by Brock Noland <br...@cloudera.com>.
Well, you have probably found some of the areas we need to fix! :) The Hive
source is not strict about method and member visibility. The good news
is that we have been making significant improvements in this area.

Brock


On Thu, Aug 15, 2013 at 2:55 PM, Kristopher Glover <kg...@appnexus.com> wrote:

> Interesting, I didn't realize that. If that's the case then I suppose it'd
> be really bad for me to circumvent the lock by reproducing the Driver#run
> method by calling Driver#compile and Driver#execute directly from within
> my app.
>
> If that is the case why make Driver#compile and Driver#execute public
> methods? There doesn't seem to be any inheritance that requires them to be
> public and the fact that they are public opens up a thread safety issue.
>
> Thanks,
> Kris


-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org

Re: threading with hive client

Posted by Kristopher Glover <kg...@appnexus.com>.
Interesting, I didn't realize that. If that's the case, then I suppose it'd
be really bad for me to circumvent the lock by reproducing the Driver#run
method by calling Driver#compile and Driver#execute directly from within
my app.

If that is the case, why make Driver#compile and Driver#execute public
methods? There doesn't seem to be any inheritance that requires them to be
public, and the fact that they are public opens up a thread-safety issue.

Thanks,
Kris
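Calling Driver#compile and Driver#execute directly means the application's compiles are not made under the monitor that Driver#run takes. The sketch below illustrates why that matters, using two hypothetical monitors rather than Hive objects: HIVE_MONITOR stands in for compileMonitor, and APP_MONITOR is a lock an application might add around its own direct calls. overlaps() is likewise a hypothetical helper, not a Hive API. It checks whether two synchronized sections guarded by the given monitors can be occupied at the same time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class BypassDemo {
    // Hypothetical stand-ins, not Hive objects:
    // HIVE_MONITOR plays the role of the static compileMonitor taken inside Driver#run;
    // APP_MONITOR is a lock an application might put around its own direct compile() calls.
    static final Object HIVE_MONITOR = new Object();
    static final Object APP_MONITOR = new Object();

    /**
     * Thread A enters a "compile" section guarded by monitorA and, while still holding it,
     * waits for thread B to enter a section guarded by monitorB.
     * Returns true if B got in while A was still inside, i.e. the two sections overlapped.
     */
    static boolean overlaps(Object monitorA, Object monitorB, long timeoutMs)
            throws InterruptedException {
        CountDownLatch aInside = new CountDownLatch(1);
        CountDownLatch bInside = new CountDownLatch(1);
        AtomicBoolean overlapped = new AtomicBoolean(false);

        Thread a = new Thread(() -> {
            synchronized (monitorA) {
                aInside.countDown();
                try {
                    // CountDownLatch.await does not release monitorA while waiting.
                    overlapped.set(bInside.await(timeoutMs, TimeUnit.MILLISECONDS));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        Thread b = new Thread(() -> {
            synchronized (monitorB) {
                bInside.countDown();
            }
        });

        a.start();
        aInside.await(); // make sure A holds monitorA before B tries its monitor
        b.start();
        a.join();
        b.join();
        return overlapped.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Different monitors: B is not blocked by A, so the sections can overlap.
        System.out.println(overlaps(HIVE_MONITOR, APP_MONITOR, 2000));
        // Same monitor: B stays blocked until A gives up waiting and leaves.
        System.out.println(overlaps(HIVE_MONITOR, HIVE_MONITOR, 200));
    }
}
```

In this sketch, sections on different monitors normally overlap, while sections on the same monitor never do. An application-level lock would therefore only serialize compiles that all go through that same object; it would not exclude a compile running inside Driver#run on another thread.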

On 8/15/13 1:11 PM, "Brock Noland" <br...@cloudera.com> wrote:

>The hive semantic analyzer is not fully thread safe.  We'd like to remove
>that lock but it will be a large project.
>
>Brock


Re: threading with hive client

Posted by Brock Noland <br...@cloudera.com>.
The Hive semantic analyzer is not fully thread-safe. We'd like to remove
that lock, but it will be a large project.

Brock
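Brock does not say which parts of the semantic analyzer are unsafe. One common reason for a JVM-wide lock rather than a per-instance one, assumed here purely for illustration, is mutable state shared by every instance. In this sketch the hypothetical SketchAnalyzer allocates ids from a static int counter. Because ++ is a read-modify-write, the counter stays correct only if every instance updates it under the same static monitor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedStateDemo {

    /** Hypothetical analyzer whose mutable state is static, i.e. shared by every instance. */
    static class SketchAnalyzer {
        // One JVM-wide monitor guarding the static state below.
        private static final Object COMPILE_MONITOR = new Object();
        // Plain int: nextOperatorId++ is a read-modify-write, so unguarded updates can be lost.
        private static int nextOperatorId = 0;

        static void reset() {
            synchronized (COMPILE_MONITOR) {
                nextOperatorId = 0;
            }
        }

        static int allocatedSoFar() {
            synchronized (COMPILE_MONITOR) {
                return nextOperatorId;
            }
        }

        /** Pretend compile step that allocates one id per operator from the shared counter. */
        void compile(int operators) {
            synchronized (COMPILE_MONITOR) {
                for (int i = 0; i < operators; i++) {
                    nextOperatorId++;
                }
            }
        }
    }

    /** Compiles on `threads` separate analyzer instances in parallel; returns the ids allocated. */
    static int compileConcurrently(int threads, int operatorsPerQuery) throws Exception {
        SketchAnalyzer.reset();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                SketchAnalyzer analyzer = new SketchAnalyzer(); // a separate instance per thread
                pending.add(pool.submit(() -> analyzer.compile(operatorsPerQuery)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion, rethrowing any failure
            }
        } finally {
            pool.shutdown();
        }
        return SketchAnalyzer.allocatedSoFar();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileConcurrently(8, 100_000));
    }
}
```

With the static monitor the total is exactly 8 x 100,000 = 800,000. Dropping the synchronized block, or locking on a per-instance object, can lose updates and yield a smaller, run-dependent total. In this sketch, that is why such a lock cannot simply be removed until the shared state itself is made safe.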


On Thu, Aug 15, 2013 at 11:12 AM, Kristopher Glover <kg...@appnexus.com> wrote:

> Hi Everyone,
>
> I'm experiencing a threading issue with the Hive client where I want to
> run multiple queries on the same JVM.
>
>  The problem I'm having is that org.apache.hadoop.hive.ql.Driver#run (line
> 907)  has the following few lines of code :
>
>  synchronized (compileMonitor) {
>
>       ret = compile(command);
>
>     }
>
>
> The compileMonitor is a static so it blocks all threads even though I'm
> using different instances of the Driver class. I could explicitly call
> Driver#compile then Driver#execute to avoid the synchronized block but I
> don't know if it's serving a special purpose. Does anyone know why that
> synchronized block is there and if its really necessary ?
>
>
> Thanks,
>
> Kris
>



-- 
Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org