Posted to user@hive.apache.org by Vaibhav Jain <va...@gmail.com> on 2014/02/21 05:32:14 UTC
Query regarding Hive Parallel Orderby
Hi,
Hive 0.12 added parallel ORDER BY, and I have a few questions about how it works.
From the source code, I have figured out that to do a parallel ORDER BY, a
partition table needs to be created, which is provided as input to
TotalOrderPartitioner. To create the partition table, a sample of the Hive
table is stored as an ArrayList of byte arrays and then sorted.
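The steps described above can be sketched in a few lines. This is a simplified illustration, not Hive's source: the class and method names are hypothetical, the split points are returned as a list rather than written to the partition file that TotalOrderPartitioner reads, and the reducer lookup is a plain binary search, which may differ from the lookup structure the real partitioner builds. It shows the whole sample being held and sorted in memory, then `numPartitions - 1` evenly spaced keys chosen as boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class PartitionTableSketch {

    // Unsigned, lexicographic byte-by-byte comparison of two serialized keys.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Sorts the in-memory sample and takes numPartitions - 1 evenly spaced keys
    // as split points. Assumes a non-empty sample.
    public static List<byte[]> buildPartitionTable(List<byte[]> samples, int numPartitions) {
        List<byte[]> sorted = new ArrayList<>(samples); // the whole sample lives on the heap here
        sorted.sort(PartitionTableSketch::compareBytes);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) sorted.size() * i / numPartitions);
            splits.add(sorted.get(idx));
        }
        return splits;
    }

    // Returns the number of split points <= key, so every key routed to
    // partition i sorts before every key routed to partition i + 1.
    public static int getPartition(List<byte[]> splits, byte[] key) {
        int lo = 0;
        int hi = splits.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareBytes(splits.get(mid), key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        List<byte[]> samples = new ArrayList<>();
        for (String s : new String[] {"pear", "apple", "fig", "kiwi", "banana", "grape", "lime", "date"}) {
            samples.add(s.getBytes(StandardCharsets.UTF_8));
        }
        List<byte[]> splits = buildPartitionTable(samples, 4);
        for (String k : new String[] {"cherry", "fig", "grape", "zucchini"}) {
            int p = getPartition(splits, k.getBytes(StandardCharsets.UTF_8));
            System.out.println(k + " -> reducer " + p);
        }
    }
}
```

With four reducers the split points are `date`, `grape` and `lime`, so `cherry`, `fig`, `grape` and `zucchini` go to reducers 0, 1, 2 and 3. Because each reducer's output is sorted and the ranges do not overlap, concatenating the reducer outputs in order yields a total order.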
So I have the following questions:
1) Is my understanding correct?
2) Isn't it possible that storing the entire sample in memory would
become a bottleneck when the sample size is large?
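To make the memory concern in question 2 concrete: the heap used by the sample grows with the number of sampled rows, so one standard way to cap it is to draw a fixed-capacity sample. The sketch below uses reservoir sampling (Algorithm R), swapped in purely as an illustration; it is not a claim about how Hive draws its sample, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

public class ReservoirSampleSketch {

    // Keeps at most `capacity` items from a stream of unknown length. After n
    // items have been seen, each of them is in the reservoir with probability
    // capacity / n, and memory stays O(capacity) regardless of n.
    public static <T> List<T> sample(Iterable<T> stream, int capacity, Random rnd) {
        List<T> reservoir = new ArrayList<>(capacity);
        long seen = 0;
        for (T item : stream) {
            seen++;
            if (reservoir.size() < capacity) {
                reservoir.add(item); // fill phase
            } else {
                long j = (long) (rnd.nextDouble() * seen); // roughly uniform in [0, seen)
                if (j < capacity) {
                    reservoir.set((int) j, item); // replace a random slot
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // A million "rows" streamed lazily; only 5 are ever held at once.
        Iterable<Integer> rows = () -> IntStream.range(0, 1_000_000).iterator();
        System.out.println(sample(rows, 5, new Random(42)));
    }
}
```

The trade-off is that a smaller sample gives less accurate split points, so reducers may receive unevenly sized key ranges.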
--
Thanks
Vaibhav Jain
Re: Query regarding Hive Parallel Orderby
Posted by Navis류승우 <na...@nexr.com>.
bq. Is my understanding correct?
Yes.
bq. Isn't it a possibility that storing the entire sample in memory would
become a bottleneck when the sample size is large?
Yes.
Thanks,