Posted to user@hive.apache.org by Keith Wiley <kw...@keithwiley.com> on 2013/03/23 00:02:24 UTC

Query crawls through reducer

The following query translates into a many-map-single-reduce job (which is common) and also slogs through the reduce stage... it's killing the overall query:

select * from a where b >= 'c' order by b desc limit 100

Note that b is a partition column. What component is making the reducer heavy? Is it the order by or the limit? (I'm sure it's not the partition-specific where clause, right?) Are there ways to improve its performance?
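Here is a rough sketch of where the reduce-side cost can come from. It assumes the plan runs the way a total-order ORDER BY usually does on MapReduce: every row that passes the where filter is shuffled to a single reducer, and the limit is applied only after that reducer has received and ordered all of them. This is a Python simulation of the data flow, not Hive's implementation. The row shape, the function names and the two-mapper input are all hypothetical. The sketch compares that plan with a per-mapper top-k, which yields the same b values in the result but ships at most k rows per mapper.

```python
import heapq
from typing import List, Tuple

# Hypothetical row shape: (b, rest_of_row). With `select *`, every column of
# table `a` would travel through the shuffle along with b.
Row = Tuple[str, str]


def by_b(row: Row) -> str:
    """Sort key: the partition column b."""
    return row[0]


def single_reducer_plan(mapper_outputs: List[List[Row]], k: int) -> Tuple[List[Row], int]:
    """Every mapper ships every filtered row to the one reducer, which orders
    them by b descending and only then applies the limit."""
    shuffled = [row for rows in mapper_outputs for row in rows]
    top = sorted(shuffled, key=by_b, reverse=True)[:k]
    return top, len(shuffled)


def map_side_top_k_plan(mapper_outputs: List[List[Row]], k: int) -> Tuple[List[Row], int]:
    """Each mapper keeps only its own k largest rows by b before the shuffle,
    so the single reducer receives at most k rows per mapper."""
    shuffled: List[Row] = []
    for rows in mapper_outputs:
        shuffled.extend(heapq.nlargest(k, rows, key=by_b))
    top = sorted(shuffled, key=by_b, reverse=True)[:k]
    return top, len(shuffled)


if __name__ == "__main__":
    # Two hypothetical mappers reading partitions b='c', 'd' and 'e'.
    mappers = [
        [("d", f"m0-d{i}") for i in range(50)] + [("c", f"m0-c{i}") for i in range(500)],
        [("e", f"m1-e{i}") for i in range(80)] + [("c", f"m1-c{i}") for i in range(500)],
    ]
    naive, naive_shuffled = single_reducer_plan(mappers, 100)
    pushed, pushed_shuffled = map_side_top_k_plan(mappers, 100)
    print(naive_shuffled, pushed_shuffled)  # 1130 200
```

Under these assumptions, the limit itself is cheap. The expensive part is that the order by funnels every matching row (all columns, because of `select *`) through one reducer before the limit can discard anything.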

________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
                                           --  Keith Wiley
________________________________________________________________________________


Re: Query crawls through reducer

Posted by Keith Wiley <kw...@keithwiley.com>.
Thanks.

On Mar 22, 2013, at 21:02, Nitin Pawar wrote:

> Instead of >=, can you just try = if you want to limit to the top 100? (b being a partition, I guess it will have more than 100 records to fill your limit.)
> 
> To improve your query performance, your table's file format matters as well. Which one are you using?
> How many partitions are there?
> What's the size of the cluster?
> You can set the number of reducers, but if your query has just one key, only one reducer will get the data and the rest will run empty.
> 
> 
> 
> On Sat, Mar 23, 2013 at 4:32 AM, Keith Wiley <kw...@keithwiley.com> wrote:
> The following query translates into a many-map-single-reduce job (which is common) and also slogs through the reduce stage... it's killing the overall query:
> 
> select * from a where b >= 'c' order by b desc limit 100
> 
> Note that b is a partition column. What component is making the reducer heavy? Is it the order by or the limit? (I'm sure it's not the partition-specific where clause, right?) Are there ways to improve its performance?
> 
> -- 
> Nitin Pawar


________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
                                           --  Keith Wiley
________________________________________________________________________________


Re: Query crawls through reducer

Posted by Nitin Pawar <ni...@gmail.com>.
Instead of >=, can you just try = if you want to limit to the top 100? (b
being a partition, I guess it will have more than 100 records to fill your
limit.)
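One way to read the suggestion above, sketched in Python with a hypothetical partition layout: partition pruning means a filter on the partition column decides which partitions are read at all, so `b >= 'c'` reads every partition from 'c' upward, while `b = 'e'` reads just one. If the partition with the largest b value that passes the filter already holds at least 100 rows, `order by b desc limit 100` takes all 100 rows from that partition. An equality filter on that single value (assumed known, for example from SHOW PARTITIONS) then returns rows from that same partition while reading far less data. The function names and partition sizes are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table layout: partition value of b -> rows in that partition.
Partitions = Dict[str, List[str]]


def pruned(parts: Partitions, keep: Callable[[str], bool]) -> List[str]:
    """Partition pruning: only partitions whose b value passes the filter
    are read at all."""
    return sorted(value for value in parts if keep(value))


def order_by_b_desc_limit(
    parts: Partitions, keep: Callable[[str], bool], k: int
) -> Tuple[List[Tuple[str, str]], int]:
    """Simulates `select * ... where keep(b) order by b desc limit k`.
    Returns the result rows and the number of rows read."""
    rows = [(value, row) for value in pruned(parts, keep) for row in parts[value]]
    rows.sort(key=lambda pair: pair[0], reverse=True)  # stable, even with reverse=True
    return rows[:k], len(rows)


if __name__ == "__main__":
    sizes = {"a": 300, "c": 300, "d": 300, "e": 150}
    parts = {b: [f"{b}-{i}" for i in range(n)] for b, n in sizes.items()}
    wide, wide_read = order_by_b_desc_limit(parts, lambda b: b >= "c", 100)
    narrow, narrow_read = order_by_b_desc_limit(parts, lambda b: b == "e", 100)
    print(wide_read, narrow_read, wide == narrow)  # 750 150 True
```

The equality only holds because partition 'e' has at least 100 rows. If it had fewer, the >= query would also pull rows from 'd', and a single-value filter would return a short result.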

To improve your query performance, your table's file format matters as
well. Which one are you using?
How many partitions are there?
What's the size of the cluster?
You can set the number of reducers, but if your query has just one key,
only one reducer will get the data and the rest will run empty.
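The point about reducers can be illustrated with a minimal Python sketch of hash partitioning, the usual way MapReduce assigns map-output keys to reducers. It follows the shape of Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, but uses `zlib.crc32` as a stand-in hash. That choice is an assumption; the hash Hive actually applies to its reduce keys is different. Because every row with the same key lands on the same reducer, a job whose rows all share one key keeps exactly one reducer busy, however many you configure.

```python
import zlib
from collections import Counter
from typing import Iterable


def reducer_for(key: str, num_reducers: int) -> int:
    """Same shape as Hadoop's HashPartitioner,
    (hash & Integer.MAX_VALUE) % numReduceTasks, but with crc32 as an
    assumed, deterministic stand-in for the real key hash."""
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def rows_per_reducer(keys: Iterable[str], num_reducers: int) -> Counter:
    """Count how many map-output rows each reducer would receive."""
    load = Counter({r: 0 for r in range(num_reducers)})
    for key in keys:
        load[reducer_for(key, num_reducers)] += 1
    return load


if __name__ == "__main__":
    one_key = rows_per_reducer(["c"] * 1000, num_reducers=8)
    many_keys = rows_per_reducer([f"k{i}" for i in range(1000)], num_reducers=8)
    # With a single key, one of the 8 reducers gets all 1000 rows.
    print(sorted(one_key.values()))  # [0, 0, 0, 0, 0, 0, 0, 1000]
    # With many distinct keys, the rows spread over several reducers.
    print(sum(1 for n in many_keys.values() if n > 0))
```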



On Sat, Mar 23, 2013 at 4:32 AM, Keith Wiley <kw...@keithwiley.com> wrote:

> The following query translates into a many-map-single-reduce job (which is
> common) and also slogs through the reduce stage... it's killing the overall
> query:
>
> select * from a where b >= 'c' order by b desc limit 100
>
> Note that b is a partition column. What component is making the reducer
> heavy? Is it the order by or the limit? (I'm sure it's not the
> partition-specific where clause, right?) Are there ways to improve its
> performance?
>


-- 
Nitin Pawar