Posted to user@hive.apache.org by sh...@accenture.com on 2014/05/31 00:05:35 UTC
Need urgent help on hive query performance
Hi,
Can anybody help urgently with optimizing Hive query performance? I am looking at this more from a Hadoop tuning point of view. Currently, even a small table takes a long time to query.
We are running an EMR cluster with 1 master node, 2 core nodes, and task nodes.
Quick help is much appreciated.
Thanks,
Shouvanik
________________________________
This message is for the designated recipient only and may contain privileged, proprietary, or otherwise confidential information. If you have received it in error, please notify the sender immediately and delete the original. Any other use of the e-mail by you is prohibited. Where allowed by local law, electronic communications with Accenture and its affiliates, including e-mail and instant messaging (including content), may be scanned by our systems for the purposes of information security and assessment of internal compliance with Accenture policy.
______________________________________________________________________________________
www.accenture.com
Re: Need urgent help on hive query performance
Posted by Ashish Garg <ga...@gmail.com>.
hive> CREATE EXTERNAL TABLE Emp(
> id INT,
> name STRING,
> Salary INT)
> PARTITIONED BY (Country STRING, State STRING)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LOCATION '/user/data/';
Now load the data for a specific partition. For example:
hive> LOAD DATA LOCAL INPATH '---'
> OVERWRITE INTO TABLE Emp
> PARTITION (Country='US', State='NJ');
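Now try running queries like:
hive> SELECT COUNT(*), MAX(Salary) FROM Emp WHERE Country='US' AND
> State='NJ';
Because the WHERE clause filters on the partition columns, Hive reads only the matching partition instead of scanning the whole table, which should improve query performance.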
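One way to check that a query like this really touches only one partition is to list the table's partitions and inspect the query plan. This is a minimal sketch, assuming the Emp table and the US/NJ partition from the example in this message; the exact plan output varies by Hive version.

```sql
-- List the partitions Hive knows about for the Emp table.
SHOW PARTITIONS Emp;

-- Print the plan without running the query. With a filter on the
-- partition columns, the plan should reference only the
-- Country=US/State=NJ partition rather than every partition.
EXPLAIN EXTENDED
SELECT COUNT(*), MAX(Salary)
FROM Emp
WHERE Country = 'US' AND State = 'NJ';
```

If the plan lists more partitions than expected, the filter is probably not written directly against the partition columns.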
RE: Need urgent help on hive query performance
Posted by sh...@accenture.com.
Thanks to all for your suggestions. I really appreciate them.
But we have a constraint on Amazon EMR. It would be great if I could get any pointers on how to tune the Hadoop configuration (e.g. core-site.xml, mapred-site.xml, etc.) so that Hive queries execute faster.
Please help ASAP. Sorry for the urgency.
Thanks,
Shouvanik
Re: Need urgent help on hive query performance
Posted by Bala Krishna Gangisetty <ba...@altiscale.com>.
Another dimension:
Try storing the Hive table in ORC format. In my experience, it significantly
improves performance compared to other formats.
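As a rough sketch of the ORC suggestion: assuming an existing text-format table named emp_text (a hypothetical name, not from this thread) with columns id, name, and salary, the data could be copied into an ORC-backed table as follows. ORC has been available since Hive 0.11, so it should apply to the Hive 0.12 setup mentioned elsewhere in this thread.

```sql
-- emp_text and emp_orc are hypothetical table names for illustration.
CREATE TABLE emp_orc (
  id     INT,
  name   STRING,
  salary INT
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "ZLIB");

-- Rewrite the existing rows into the ORC table.
INSERT OVERWRITE TABLE emp_orc
SELECT id, name, salary
FROM emp_text;
```

Queries then run against emp_orc instead of emp_text; how much this helps depends on the data and on how many columns each query reads.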
Since you mentioned join queries, on a side note: as a long-term
goal, you will probably want to explore Hive with Tez.
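For reference, switching a Hive session to Tez is a single setting. This sketch assumes Hive 0.13 or later with Tez installed on the cluster, so it would not apply as-is to the Hive 0.12 setup mentioned elsewhere in this thread.

```sql
-- Requires Hive 0.13+ and a Tez installation on the cluster.
SET hive.execution.engine=tez;

-- Switch back to classic MapReduce, e.g. to compare run times.
SET hive.execution.engine=mr;
```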
--Bala G.
Re: Need urgent help on hive query performance
Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.
> It has a very large number of joins. Since it is a client-specific query,
> you will understand that I cannot share it. Sorry about that.
Like I said, joins are slow and, if not done correctly, can have terrible
performance. A couple of handy techniques depend on exactly how you are
trying to perform the join. For instance, if you are joining a smaller
table to a larger one, a map join could work well, since the smaller table
is kept in memory while the join is performed. Also, if you are able to
break your tables down into smaller buckets, you might be able to use a
bucketed map join. The following links should be helpful [1][2].
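To make the two join techniques concrete, here is a minimal sketch. The table and column names (orders, customers, and their bucketed copies) and the bucket count are hypothetical. The settings are standard Hive properties, but their defaults differ between Hive versions, so treat the values as starting points rather than recommendations.

```sql
-- 1. Map join: let Hive convert a join into a map-side join when one
--    side is small enough to be held in memory.
SET hive.auto.convert.join=true;
-- Size threshold (in bytes) below which a table counts as "small".
SET hive.mapjoin.smalltable.filesize=25000000;

SELECT o.order_id, c.name
FROM orders o
JOIN customers c ON (o.customer_id = c.customer_id);

-- 2. Bucketed map join: both tables are bucketed on the join key, with
--    bucket counts that are equal or multiples of each other.
SET hive.enforce.bucketing=true;       -- make inserts honor the bucket spec
SET hive.optimize.bucketmapjoin=true;  -- allow bucket-wise map joins

CREATE TABLE orders_bucketed (
  order_id    INT,
  customer_id INT
)
CLUSTERED BY (customer_id) INTO 32 BUCKETS;

CREATE TABLE customers_bucketed (
  customer_id INT,
  name        STRING
)
CLUSTERED BY (customer_id) INTO 32 BUCKETS;

-- Populate the bucketed tables from the unbucketed originals.
INSERT OVERWRITE TABLE orders_bucketed
SELECT order_id, customer_id FROM orders;

INSERT OVERWRITE TABLE customers_bucketed
SELECT customer_id, name FROM customers;

-- Join on the bucketing column so each mapper only needs the
-- matching bucket of the smaller table.
SELECT o.order_id, c.name
FROM orders_bucketed o
JOIN customers_bucketed c ON (o.customer_id = c.customer_id);
```

Whether Hive actually picks a map join or a bucketed map join can be confirmed with EXPLAIN on the query.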
Hope this helps.
[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
[2] http://stackoverflow.com/questions/20199077/hive-efficient-join-of-two-tables
--
Swarnim
RE: Need urgent help on hive query performance
Posted by sh...@accenture.com.
Please find my answers inline below.
From: kulkarni.swarnim@gmail.com [mailto:kulkarni.swarnim@gmail.com]
Sent: Friday, May 30, 2014 3:34 PM
To: user@hive.apache.org
Subject: Re: Need urgent help on hive query performance
> I feel it's pretty hard to answer this without understanding the following:
> 1. What exactly are you trying to query? CSV? Avro? ....
A Hive table.
> 2. Where is your data? HDFS? HBase? Local filesystem?
The data is in S3.
> 3. What version of Hive are you using?
Hive 0.12.
> 4. What is an example of a query that is slow? Some queries, such as joins, are inherently slower than simpler ones (though they can be optimized).
It has a very large number of joins. Since it is a client-specific query, you will understand that I cannot share it. Sorry about that.
> Thanks,
> --
> Swarnim
Re: Need urgent help on hive query performance
Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.
I feel it's pretty hard to answer this without understanding the following:
1. What exactly are you trying to query? CSV? Avro? ....
2. Where is your data? HDFS? HBase? Local filesystem?
3. What version of Hive are you using?
4. What is an example of a query that is slow? Some queries, such as joins,
are inherently slower than simpler ones (though they can be optimized).
Thanks,
--
Swarnim
RE: Need urgent help on hive query performance
Posted by sh...@accenture.com.
Can you please give a specific example or a blog post to refer to? I did not understand.
Re: Need urgent help on hive query performance
Posted by Ashish Garg <ga...@gmail.com>.
Try partitioning the table and running queries that are partition
specific. Hope this helps.
Thanks and Regards,
Ashish Garg.