Posted to user@hive.apache.org by Raajay <ra...@gmail.com> on 2015/08/25 15:21:24 UTC

Run multiple queries simultaneously

Hello,

I want to compare the running time of a query when run alone against its
running time in the presence of other queries.

What is the ideal setup for running this experiment? Should I have two
Hive CLIs open and issue queries simultaneously? How can I script such an
experiment in Hive?

Raajay

Re: Run multiple queries simultaneously

Posted by Sergey Shelukhin <se...@hortonworks.com>.
You can start HiveServer2, then submit queries to it using JDBC. If you open multiple sessions from multiple threads, you will be able to submit queries in parallel, although query compilation is currently still serialized.
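
A minimal sketch of the one-session-per-thread approach described above, written in Python rather than Java/JDBC to keep it short. The names `time_queries`, `run_query`, and `fake_run_query` are hypothetical and not part of Hive: `run_query` stands for whatever function opens its own HiveServer2 session, executes one query, and blocks until it completes. Here it is replaced by a stub that only sleeps, so the harness can be tried without a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from threading import Barrier


def time_queries(run_query, queries):
    """Run every query in its own thread and return {query: seconds}.

    run_query is a caller-supplied function (hypothetical here) that opens
    its own session, executes one query, and blocks until it finishes.
    Query strings are used as dictionary keys, so they must be distinct.
    """
    start_line = Barrier(len(queries))  # release all threads together

    def timed(query):
        start_line.wait()  # wait until every worker thread is ready
        started = time.perf_counter()
        run_query(query)
        return query, time.perf_counter() - started

    # One worker per query, so no query waits for a free thread.
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        return dict(pool.map(timed, queries))


def fake_run_query(query):
    # Stand-in for a real HiveServer2 call (assumption); it only sleeps.
    time.sleep(0.05)


if __name__ == "__main__":
    alone = time_queries(fake_run_query, ["q1"])
    together = time_queries(fake_run_query, ["q1", "q2"])
    print("alone:", alone)
    print("together:", together)
```

Calling `time_queries` once with a single query and once with the full set gives the two timings to compare. Because compilation is serialized, the concurrent timings would include any time a query spends waiting to be compiled.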

Re: Run multiple queries simultaneously

Posted by Raajay <ra...@gmail.com>.
The back-end execution engine is Tez, and I use YARN for resource
management.

I completely agree with your deduction that the impact on the run time will
be dependent on the nature of the queries. I would like to conduct some
experiments (for a given workload, cluster configuration) to quantify the
impact.

For this, I need to be able to run queries simultaneously and measure their
running times. From what I glean from other threads, it should be good
enough to fire up two CLIs and issue the queries.
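
One way to quantify the impact, sketched here under the assumption that each query is timed several times both alone and alongside the other queries, is to compare median running times. The function name `slowdown` and the sample timings are purely illustrative, not measurements from any cluster.

```python
from statistics import median


def slowdown(solo_seconds, concurrent_seconds):
    """Return median concurrent run time divided by median solo run time.

    A value of 1.0 means no measurable impact; 2.0 means the query took
    twice as long when other queries were running.
    """
    return median(concurrent_seconds) / median(solo_seconds)


if __name__ == "__main__":
    # Illustrative placeholder timings in seconds (not real data).
    solo = [10.0, 12.0, 11.0]
    concurrent = [15.0, 16.5, 18.0]
    print(slowdown(solo, concurrent))  # prints 1.5
```

Using the median rather than the mean keeps a single unusually slow run from dominating the comparison.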

Raajay

On Tue, Aug 25, 2015 at 4:17 PM, Ryan Harris <Ry...@zionsbancorp.com>
wrote:

> You need to be a bit more clear with your environment and objective
> here....
>
> What is your back-end execution engine?  MapReduce, Spark, or Tez?
>
> What are you using for resource management? YARN or MapReduce?
>
>
>
> The running time of one query in the presence of other queries will
> depend entirely on the cost and complexity of the queries.  If each query
> is able to fully utilize the resources allocated to you on the cluster,
> then the queries will run slower when run at the same time.  However, in
> Hive it is often the case that, depending on the query and the cluster
> resources, a single query uses only a fraction of the cluster.  In that
> case, multiple queries can run at the same time with no detrimental
> impact on performance, as long as they aren't updating the same Hive
> table.
> ------------------------------
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>

RE: Run multiple queries simultaneously

Posted by Ryan Harris <Ry...@zionsbancorp.com>.
You need to be a bit clearer about your environment and objective here.
What is your back-end execution engine?  MapReduce, Spark, or Tez?
What are you using for resource management? YARN or MapReduce?

The running time of one query in the presence of other queries will depend entirely on the cost and complexity of the queries.  If each query is able to fully utilize the resources allocated to you on the cluster, then the queries will run slower when run at the same time.  However, in Hive it is often the case that, depending on the query and the cluster resources, a single query uses only a fraction of the cluster.  In that case, multiple queries can run at the same time with no detrimental impact on performance, as long as they aren't updating the same Hive table.


Re: Run multiple queries simultaneously

Posted by Raajay <ra...@gmail.com>.
Noam,

I am concerned with cases where the network is the bottleneck. Will I be
able to control that in YARN? Ideally, I would like to run multiple queries
simultaneously.

Raajay


On Tue, Aug 25, 2015 at 9:31 AM, Noam Hasson <no...@kenshoo.com>
wrote:

> I would just limit the resources given to the user on YARN.
>
> This e-mail, as well as any attached document, may contain material which
> is confidential and privileged and may include trademark, copyright and
> other intellectual property rights that are proprietary to Kenshoo Ltd,
>  its subsidiaries or affiliates ("Kenshoo"). This e-mail and its
> attachments may be read, copied and used only by the addressee for the
> purpose(s) for which it was disclosed herein. If you have received it in
> error, please destroy the message and any attachment, and contact us
> immediately. If you are not the intended recipient, be aware that any
> review, reliance, disclosure, copying, distribution or use of the contents
> of this message without Kenshoo's express permission is strictly prohibited.

Re: Run multiple queries simultaneously

Posted by Noam Hasson <no...@kenshoo.com>.
I would just limit the resources given to the user on YARN.
