You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by 俊杰陈 <cj...@gmail.com> on 2017/08/04 06:13:07 UTC

Impala Sorter just sort small partition?

Hi
I'm looking Sorter.cc and found that Sorter::SortHelper just sort smaller
partition. Is there anything I missed?

-- 
Thanks & Best Regards

Re: Impala Sorter just sort small partition?

Posted by Tim Armstrong <ta...@cloudera.com>.
No problems at all, thanks for your interest.

On Fri, Aug 4, 2017 at 9:07 PM, cjjnjust@gmail.com <cj...@gmail.com>
wrote:

> oh, there is an assignment to low.  thanks for patience:)
>
> ---Original---
> *From:* "Tim Armstrong "<ta...@cloudera.com>
> *Date:* 2017/8/5 11:27:21
> *To:* "dev@impala"<de...@impala.incubator.apache.org>;
> *Subject:* Re: Impala Sorter just sort small partition?
>
> It does sort both left and right partitions - it just recurses on the small
> partition and the next iteration of the loop processes the large partition.
>
> This is a pretty common optimisation. This page has a nice explanation:http://www.geeksforgeeks.org/quicksort-tail-call-optimization-reducing-worst-case-space-log-n/
>
> On Fri, Aug 4, 2017 at 6:12 PM, 俊杰陈 <cj...@gmail.com> wrote:
>
> > Thanks for your detail description.
> >
> > My question should be more specific to quicksort part. This line
> > <https://github.com/apache/incubator-impala/blob/master/
> > be/src/runtime/sorter.cc#L1258>
> > say
> > recurse on the small partition due to stack consideration, while as my
> > understanding quicksort should recurse on both left partition and right
> > partition, so I'm curious how it keep one run sorted, does it sort in later
> > merge sort or somewhere else?   But the merge process should take sorted
> > runs as input.
> >
> > 2017-08-05 0:18 GMT+08:00 Tim Armstrong <ta...@cloudera.com>:
> >
> > > The Sorter does a 3-level hybrid sort with merge sort, quicksort and
> > > insertion sort.
> > >
> > > SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
> > > arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
> > > begin and end of the sorted run, it will sort the full run. It does
> > > quicksort recursively then switches to insertion sort once the partitions
> > > are less than INSERTION_THRESHOLD = 16.
> > >
> > > Sorter also supports an external merge sort - if the full input doesn't
> > fit
> > > in memory, it sorts in-memory runs with SortHelper() then does merge sort
> > > with the sorted runs.
> > >
> > > On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈 <cjjnjust@gmail.com
> > wrote:
> > >
> > > > Hi
> > > > I'm looking Sorter.cc and found that Sorter::SortHelper just sort
> > smaller
> > > > partition. Is there anything I missed?
> > > >
> > > > --
> > > > Thanks & Best Regards
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Best Regards
> >
>
>

Re: Impala Sorter just sort small partition?

Posted by Tim Armstrong <ta...@cloudera.com>.
It does sort both left and right partitions - it just recurses on the small
partition and the next iteration of the loop processes the large partition.

This is a pretty common optimisation. This page has a nice explanation:
http://www.geeksforgeeks.org/quicksort-tail-call-optimization-reducing-worst-case-space-log-n/

On Fri, Aug 4, 2017 at 6:12 PM, 俊杰陈 <cj...@gmail.com> wrote:

> Thanks for your detail description.
>
> My question should be more specific to quicksort part. This line
> <https://github.com/apache/incubator-impala/blob/master/
> be/src/runtime/sorter.cc#L1258>
> say
> recurse on the small partition due to stack consideration, while as my
> understanding quicksort should recurse on both left partition and right
> partition, so I'm curious how it keep one run sorted, does it sort in later
> merge sort or somewhere else?   But the merge process should take sorted
> runs as input.
>
> 2017-08-05 0:18 GMT+08:00 Tim Armstrong <ta...@cloudera.com>:
>
> > The Sorter does a 3-level hybrid sort with merge sort, quicksort and
> > insertion sort.
> >
> > SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
> > arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
> > begin and end of the sorted run, it will sort the full run. It does
> > quicksort recursively then switches to insertion sort once the partitions
> > are less than INSERTION_THRESHOLD = 16.
> >
> > Sorter also supports an external merge sort - if the full input doesn't
> fit
> > in memory, it sorts in-memory runs with SortHelper() then does merge sort
> > with the sorted runs.
> >
> > On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈 <cj...@gmail.com> wrote:
> >
> > > Hi
> > > I'm looking Sorter.cc and found that Sorter::SortHelper just sort
> smaller
> > > partition. Is there anything I missed?
> > >
> > > --
> > > Thanks & Best Regards
> > >
> >
>
>
>
> --
> Thanks & Best Regards
>

Re: Impala Sorter just sort small partition?

Posted by 俊杰陈 <cj...@gmail.com>.
Thanks for your detail description.

My question should be more specific to quicksort part. This line
<https://github.com/apache/incubator-impala/blob/master/be/src/runtime/sorter.cc#L1258>
say
recurse on the small partition due to stack consideration, while as my
understanding quicksort should recurse on both left partition and right
partition, so I'm curious how it keep one run sorted, does it sort in later
merge sort or somewhere else?   But the merge process should take sorted
runs as input.

2017-08-05 0:18 GMT+08:00 Tim Armstrong <ta...@cloudera.com>:

> The Sorter does a 3-level hybrid sort with merge sort, quicksort and
> insertion sort.
>
> SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
> arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
> begin and end of the sorted run, it will sort the full run. It does
> quicksort recursively then switches to insertion sort once the partitions
> are less than INSERTION_THRESHOLD = 16.
>
> Sorter also supports an external merge sort - if the full input doesn't fit
> in memory, it sorts in-memory runs with SortHelper() then does merge sort
> with the sorted runs.
>
> On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈 <cj...@gmail.com> wrote:
>
> > Hi
> > I'm looking Sorter.cc and found that Sorter::SortHelper just sort smaller
> > partition. Is there anything I missed?
> >
> > --
> > Thanks & Best Regards
> >
>



-- 
Thanks & Best Regards

Re: Impala Sorter just sort small partition?

Posted by Tim Armstrong <ta...@cloudera.com>.
The Sorter does a 3-level hybrid sort with merge sort, quicksort and
insertion sort.

SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
begin and end of the sorted run, it will sort the full run. It does
quicksort recursively then switches to insertion sort once the partitions
are less than INSERTION_THRESHOLD = 16.

Sorter also supports an external merge sort - if the full input doesn't fit
in memory, it sorts in-memory runs with SortHelper() then does merge sort
with the sorted runs.

On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈 <cj...@gmail.com> wrote:

> Hi
> I'm looking Sorter.cc and found that Sorter::SortHelper just sort smaller
> partition. Is there anything I missed?
>
> --
> Thanks & Best Regards
>