Posted to user@hive.apache.org by Vaibhav Jain <va...@gmail.com> on 2014/02/21 05:32:14 UTC

Query regarding Hive Parallel Orderby

Hi,

Hive 0.12 added support for parallel ORDER BY. I have a few questions about
how it works.
From the source code, I have figured out that to do a parallel ORDER BY, a
partition table needs to be created, which is provided as input to
TotalOrderPartitioner. To create the partition table, a sample of the Hive
table is stored as an ArrayList of byte arrays and then sorted.
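A minimal sketch of the mechanism described above. This is not Hive's code: the class and method names (`PartitionTableSketch`, `buildPartitionTable`, `partition`) are hypothetical, the sample is assumed to be already-serialized keys held in an `ArrayList<byte[]>`, split points are assumed to be taken at evenly spaced positions in the sorted sample, and routing uses a plain binary search instead of calling Hadoop's `TotalOrderPartitioner` itself. It also shows why memory is a concern: the whole sample list must be resident in the heap before it can be sorted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PartitionTableSketch {

    // Unsigned lexicographic comparison of raw key bytes (assumed ordering
    // for serialized keys; the real comparator depends on the key format).
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Hypothetical helper: copy the whole in-memory sample, sort it, and pick
    // numReducers - 1 evenly spaced keys as the partition table (split points).
    static List<byte[]> buildPartitionTable(List<byte[]> sample, int numReducers) {
        List<byte[]> sorted = new ArrayList<>(sample); // entire sample lives on the heap
        sorted.sort(PartitionTableSketch::compareBytes);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numReducers; i++) {
            int idx = (int) ((long) i * sorted.size() / numReducers);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    // Hypothetical helper: route a key to reducer r, where r is the number of
    // split points <= key, found by binary search over the sorted splits.
    static int partition(byte[] key, List<byte[]> splits) {
        int lo = 0;
        int hi = splits.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareBytes(splits.get(mid), key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<byte[]> sample = new ArrayList<>();
        for (String s : new String[] {"pear", "apple", "fig", "kiwi",
                                      "banana", "cherry", "grape", "lemon"}) {
            sample.add(s.getBytes(StandardCharsets.UTF_8));
        }
        List<byte[]> splits = buildPartitionTable(sample, 4);
        for (byte[] split : splits) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
        System.out.println(partition("date".getBytes(StandardCharsets.UTF_8), splits));
    }
}
```

With the eight sample keys in `main` and four reducers, the split points are `cherry`, `grape` and `lemon`, and the key `date` is routed to reducer 1. Because every reducer receives a contiguous key range, concatenating the reducer outputs in order yields a totally ordered result.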

So I have the following questions:

1) Is my understanding correct?

2) Isn't it a possibility that storing the entire sample in memory would
become a bottleneck when the sample size is large?


-- 
Thanks
Vaibhav Jain

Re: Query regarding Hive Parallel Orderby

Posted by Navis류승우 <na...@nexr.com>.
bq. Is my understanding correct?

Yes.

bq. Isn't it a possibility that storing the entire sample in memory would
become a bottleneck when the sample size is large?

Yes.

Thanks,

