Posted to dev@hive.apache.org by Jayanth Muthya <ja...@gmail.com> on 2012/06/21 10:16:51 UTC

Concurrency in hive

Hi,
I was looking into some of the source code for Hive and had a few
questions regarding parallelism in Hive. Can a map task in
Hive exploit parallelism and run multiple threads? If it can, does
it do so by default, or does a user have to configure the settings?
This question seems really basic; I just started looking into Hadoop/Hive.
Thanks in advance!

-Jay

Re: Concurrency in hive

Posted by Edward Capriolo <ed...@gmail.com>.
Almost all operations in Hive can exploit MapReduce for parallelism.
(It is not really done at the thread level.) Essentially, if you run a
Hive job and there are multiple mappers or reducers, that is the parallelism.

On Fri, Jun 22, 2012 at 5:14 AM, Jayanth Muthya <ja...@gmail.com> wrote:
> Thanks for clarifying, I'll look into it too and see if I can find anything.
>
> -Jayanth

Re: Concurrency in hive

Posted by Jayanth Muthya <ja...@gmail.com>.
Thanks for clarifying, I'll look into it too and see if I can find anything.

-Jayanth

On Thu, Jun 21, 2012 at 10:47 PM, Jerome Banks <je...@klout.com> wrote:

> set hive.exec.parallel=true;
>
> This will run Hive jobs in parallel, if they are able to do so.
>
> As for multi-threading in the actual job itself, I don't think so, but I'm
> not sure. The query planner will merge steps together, in order to try to
> minimize the number of MR jobs needed to run a query, but I think those are
> chained together in a single thread, on both the mapper and the reducer.
>
> When I was at Quantcast, we had some multi-threading in the mappers and
> reducers, to try to increase throughput by utilizing the CPU when the job
> would otherwise be blocked on IO. This helps if your IO is very slow,
> but once IO is no longer the bottleneck, you spend a lot of time
> context-switching, and it is no longer efficient.
>
> Interesting question, I'll look into it some more. Let me know if you find
> out anything.
>
> -- jerome

Re: Concurrency in hive

Posted by Jerome Banks <je...@klout.com>.
set hive.exec.parallel=true;

This will run Hive jobs in parallel, if they are able to do so.
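
As a minimal sketch of what that looks like in a session (the table names
clicks and views are hypothetical, and whether the two branches actually
overlap depends on the plan Hive produces and on free cluster slots):

```sql
-- Allow independent stages of a single query to be submitted concurrently.
set hive.exec.parallel=true;
-- Upper bound on how many stages run at once (8 is the documented default).
set hive.exec.parallel.thread.number=8;

-- Hypothetical tables: the two branches of the UNION ALL do not depend on
-- each other, so their MapReduce jobs are candidates for parallel execution.
SELECT u.src, u.cnt
FROM (
  SELECT 'clicks' AS src, count(*) AS cnt FROM clicks
  UNION ALL
  SELECT 'views' AS src, count(*) AS cnt FROM views
) u;
```

Running EXPLAIN on the query prints a STAGE DEPENDENCIES section, which
shows which stages are independent of each other. Note that this is
parallelism between jobs, not threads inside a single map task.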

As for multi-threading in the actual job itself, I don't think so, but I'm
not sure. The query planner will merge steps together, in order to try to
minimize the number of MR jobs needed to run a query, but I think those are
chained together in a single thread, on both the mapper and the reducer.

When I was at Quantcast, we had some multi-threading in the mappers and
reducers, to try to increase throughput by utilizing the CPU when the job
would otherwise be blocked on IO. This helps if your IO is very slow,
but once IO is no longer the bottleneck, you spend a lot of time
context-switching, and it is no longer efficient.

Interesting question, I'll look into it some more. Let me know if you find
out anything.

-- jerome

On Thu, Jun 21, 2012 at 1:16 AM, Jayanth Muthya <ja...@gmail.com> wrote:

> Hi,
> I was looking into some of the source code for Hive and had a few
> questions regarding parallelism in Hive. Can a map task in
> Hive exploit parallelism and run multiple threads? If it can, does
> it do so by default, or does a user have to configure the settings?
> This question seems really basic; I just started looking into Hadoop/Hive.
> Thanks in advance!
>
> -Jay
>