Posted to user@hive.apache.org by Chunky Gupta <ch...@vizury.com> on 2012/10/22 13:57:45 UTC

How to run multiple Hive queries in parallel

Hi,

I have one name node machine, under which there are 4 slave machines to
run the jobs.

The way users run queries is:
- They ssh into the name node machine
- They start hive and submit their queries

Currently, multiple users log in with the same credentials and submit
queries.

Whenever 2 or more users try to run queries at the same time from different
hive consoles, only one query runs at a time; the next query starts
executing only when that one has finished, and so on.

In this scenario, if a large query is submitted first, all the other
queries have to wait for it to complete.

I want to run multiple queries at the same time. Is there any way, or any
configuration parameter, to do this?

PS: The data is in S3, and I am running Hive on AWS EMR infrastructure in
interactive mode.

Thank You,
Chunky.

Re: How to run multiple Hive queries in parallel

Posted by Bejoy KS <be...@yahoo.com>.

Hi 

From the JobTracker web UI you can get the total number of map and reduce slots. The same web UI also shows the number of running map/reduce tasks. Subtracting the second value from the first gives you the available slots.
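
To make the arithmetic concrete, here is a minimal Python sketch. The function name and the slot counts are hypothetical; in practice you would read the two numbers off the JobTracker web UI by hand.

```python
def available_slots(total_slots: int, running_tasks: int) -> int:
    """Return free task slots: total capacity minus tasks currently running."""
    if total_slots < 0 or running_tasks < 0:
        raise ValueError("slot counts must be non-negative")
    # Clamp at zero in case the two readings were taken at slightly different times.
    return max(total_slots - running_tasks, 0)


# Hypothetical readings copied by hand from the JobTracker web UI.
map_free = available_slots(total_slots=16, running_tasks=16)
reduce_free = available_slots(total_slots=8, running_tasks=3)
print(f"free map slots: {map_free}, free reduce slots: {reduce_free}")
```

If the free map slots stay at zero while a large query runs, newly submitted queries have nothing to run on until that query releases its slots.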

The fair scheduler is a feature of MapReduce, not of Hive. It is primarily used to control the number of slots used by each user/pool in a cluster. You can read more at:

http://hadoop.apache.org/docs/mapreduce/r0.20.2/fair_scheduler.html 
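
As a rough illustration of what "controlling the number of slots used by each pool" can mean, the Python sketch below splits a fixed number of slots among pools so that no pool gets more than it asks for and leftover slots go to pools that still want more. This is a simplified max-min fairness model written for this thread, not the scheduler's actual code; the function name, pool names and numbers are all hypothetical.

```python
def fair_shares(total_slots, demands):
    """Max-min fair split of total_slots among pools, given each pool's demand."""
    shares = {pool: 0 for pool in demands}
    remaining = total_slots
    # Pools that still want more slots than they have been given.
    unsatisfied = {p for p, d in demands.items() if d > 0}
    while remaining > 0 and unsatisfied:
        # Offer every unsatisfied pool an equal slice of what is left (at least one slot).
        slice_size = max(remaining // len(unsatisfied), 1)
        for pool in sorted(unsatisfied):
            grant = min(slice_size, demands[pool] - shares[pool], remaining)
            shares[pool] += grant
            remaining -= grant
        unsatisfied = {p for p in unsatisfied if shares[p] < demands[p]}
    return shares


# Hypothetical cluster with 16 slots: one large query and one small query.
print(fair_shares(16, {"alice": 100, "bob": 4}))
```

In this model the small query gets the 4 slots it needs right away instead of waiting behind the large one, and the large query keeps the remaining 12.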

Regards
Bejoy KS

Sent from handheld, please excuse typos.

Re: How to run multiple Hive queries in parallel

Posted by Chunky Gupta <ch...@vizury.com>.

Hi Bejoy and Bertrand

Thanks for the quick reply.

I think task slots are not available in my cluster because I have only 4
slave machines.
Actually, I am a beginner with Hive, so could you let me know how I can
check whether task slots are available or not?

I have different user credentials to log in to my name node machine, but
I don't know much about the fair scheduler.

If task slots are not available and are exhausted, could you please point
me to a publicly available fair scheduler that I can integrate with Hive
to solve my problem?

Thank You,
Chunky.

Re: How to run multiple Hive queries in parallel

Posted by Bertrand Dechoux <de...@gmail.com>.

Bejoy is right. I just want to say explicitly that the scheduler
configuration is orthogonal to the use of Hive (i.e., you would have the
same problem with Pig or standard MapReduce jobs).

Regards

Bertrand

PS: There is also the capacity scheduler.

-- 
Bertrand Dechoux

Re: How to run multiple Hive queries in parallel

Posted by Bejoy KS <be...@yahoo.com>.

Hi

Are your Hive queries in waiting mode even though there are task slots available on your cluster?

If task slots are getting exhausted and you need parallelism here, then you may need to look at using the fair scheduler with a different user account for each user, so that each user gets a fair share of task slots.
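
A small Python sketch of why separate user accounts matter here. It assumes jobs are grouped into one pool per submitting user name, which is what the suggestion above relies on; the function, account names and query names are hypothetical.

```python
from collections import defaultdict


def group_into_pools(jobs):
    """Group (user, query) pairs into one pool per submitting user name."""
    pools = defaultdict(list)
    for user, query in jobs:
        pools[user].append(query)
    return dict(pools)


# Hypothetical submissions: everyone logged in with the same shared account.
shared = [("hadoop", "big_report"), ("hadoop", "quick_count")]
# The same queries submitted from separate, hypothetical per-user accounts.
separate = [("alice", "big_report"), ("bob", "quick_count")]

print(len(group_into_pools(shared)))    # one pool: the scheduler sees a single user
print(len(group_into_pools(separate)))  # two pools: each user can be given a share
```

With a shared login, both queries land in the same pool, so there is nothing for a per-user scheduler to balance between.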



Regards
Bejoy KS

Sent from handheld, please excuse typos.
