Posted to user@hive.apache.org by abhiTowson cal <ab...@gmail.com> on 2012/07/24 03:54:36 UTC
Hive query optimization
Hi all,
Some queries in Hive are taking too long to execute, so I have overridden
some parameters. For some queries performance improved sharply
after I overrode these properties; for other queries there was no change in
performance. Can anyone
tell me about any other optimizations in Hive, apart from partitions and
buckets? These are the settings I used:
set io.sort.mb=512;
set io.sort.factor=100;
set mapred.reduce.parallel.copies=40;
set hive.map.aggr=true;
set hive.exec.parallel=true;
set hive.groupby.skewindata=true;
set mapred.job.reuse.jvm.num.tasks=-1;
The default values were:
io.sort.mb=256;
io.sort.factor=10;
mapred.reduce.parallel.copies=10;
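To confirm what a given session is actually using, the Hive CLI prints a property's current value when set is given a name without a value. A small sketch, using only the property names from this post:

```sql
-- Print the value currently in effect for each property.
set io.sort.mb;
set io.sort.factor;
set mapred.reduce.parallel.copies;

-- Print all Hive and Hadoop configuration variables.
set -v;
```

Checking before and after an override makes it easier to tell which change, if any, a speedup should be attributed to.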
Thanks
Abhishek
Re: Hive query optimization
Posted by Abhishek <ab...@gmail.com>.
Hi Tatarinov,
Thanks for the reply. If I understand correctly, do you mean setting the number of reduce tasks equal to the number of reduce slots in the cluster?
Regards
Abhi
Re: Hive query optimization
Posted by Igor Tatarinov <ig...@decide.com>.
Here are my 2 cents.
The parameters you are looking at are quite specific. Unless you know what
you are doing, it might be hard to set them exactly right, and they shouldn't
make that much of a difference, again unless you know the specifics.
What worked for me is using a single "wave" of reducers. Basically, you
want to set the number of reduce tasks to be equal to the number of reduce
slots (assuming your job will run by itself).
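A minimal sketch of the single-wave idea, assuming a hypothetical cluster of 10 worker nodes with 4 reduce slots each (on MRv1 the per-node slot count comes from mapred.tasktracker.reduce.tasks.maximum). The numbers are illustrative only; substitute your own:

```sql
-- Assumed cluster: 10 nodes x 4 reduce slots per node = 40 reduce slots.
-- Setting the reducer count to 40 lets every reducer start in the same wave,
-- provided no other job is competing for the slots.
set mapred.reduce.tasks=40;
```

With more reduce tasks than slots, the extra reducers have to wait for a slot to free up, which is the second wave this setting is meant to avoid.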
It might also help to re-arrange your joins so that the larger table is
streamed (https://cwiki.apache.org/Hive/languagemanual-joins.html).
That seems especially important with map joins since those fail if there is
not enough memory and have to be rerun as regular joins.
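As a rough illustration of the join advice, here is a sketch using two hypothetical tables, small_dim (small) and big_fact (large); the table and column names are made up. Per the Hive joins manual linked above, Hive buffers the earlier tables in a join and streams the last one, and the STREAMTABLE and MAPJOIN hints can override that choice:

```sql
-- Put the large table last so that it is the one being streamed.
SELECT f.id, d.name
FROM small_dim d
JOIN big_fact f ON (f.dim_id = d.id);

-- Or name the table to stream explicitly, regardless of join order.
SELECT /*+ STREAMTABLE(f) */ f.id, d.name
FROM big_fact f
JOIN small_dim d ON (f.dim_id = d.id);

-- Map join: ask Hive to hold the small table in memory on the mappers.
SELECT /*+ MAPJOIN(d) */ f.id, d.name
FROM big_fact f
JOIN small_dim d ON (f.dim_id = d.id);
```

Whether the map join variant pays off depends on small_dim actually fitting in memory, which is the failure case described above.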
Hope this helps.