Posted to issues@trafodion.apache.org by "Prashanth Vasudev (JIRA)" <ji...@apache.org> on 2016/10/04 04:05:20 UTC

[jira] [Commented] (TRAFODION-2259) Sort TopN operator

    [ https://issues.apache.org/jira/browse/TRAFODION-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544272#comment-15544272 ] 

Prashanth Vasudev commented on TRAFODION-2259:
----------------------------------------------

This improvement involves two sets of changes:
1. An executor change to the sort operator that implements the TopN sort.
2. A compiler change to push TopN down to the sort operator.

The executor sort implementation will be as follows:

1. The sort initially maintains an array of N elements (the TopN array).
2. Read records into the TopN array.
3. Once the TopN array is full, heapify it into a max heap. The top node of the heap is always the largest element.
4. Each subsequent record is either discarded (if it is greater than the top node) or replaces the top node (if it is less than the top node). If the top node was replaced, rebalance the heap.
5. Repeat step 4 until the last record is read.
6. Sort the final heap using heap sort.
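
The steps above can be sketched roughly as follows. This is only an illustration in Python, not the Trafodion executor code: the names top_n and sift_down are hypothetical, records are assumed to be comparable with "<", and a record equal to the top node is discarded, which the steps above leave unspecified.

```python
def sift_down(heap, start, end):
    """Restore the max-heap property below index start, within heap[:end]."""
    root = start
    while True:
        child = 2 * root + 1              # left child
        if child >= end:
            return
        # Pick the larger of the two children.
        if child + 1 < end and heap[child] < heap[child + 1]:
            child += 1
        if heap[root] < heap[child]:
            heap[root], heap[child] = heap[child], heap[root]
            root = child
        else:
            return


def heapify(heap):
    """Turn the whole list into a max heap (step 3)."""
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, i, len(heap))


def top_n(records, n):
    """Return the n smallest records in ascending order (hypothetical helper)."""
    if n <= 0:
        return []
    heap = []
    heapified = False
    for rec in records:
        if len(heap) < n:
            heap.append(rec)              # steps 1-2: fill the TopN array
            if len(heap) == n:
                heapify(heap)             # step 3: array is full
                heapified = True
        elif rec < heap[0]:
            heap[0] = rec                 # step 4: replace the top node
            sift_down(heap, 0, n)         # and rebalance
        # otherwise the record is discarded (assumption: ties are discarded too)
    if not heapified:
        heapify(heap)                     # fewer than n records were read
    # Step 6: in-place heap sort; a max heap yields ascending order.
    for end in range(len(heap) - 1, 0, -1):
        heap[0], heap[end] = heap[end], heap[0]
        sift_down(heap, 0, end)
    return heap


print(top_n([5, 1, 9, 3, 7, 2, 8], 3))
```

Only N records are held at any time, so memory use is bounded by N rather than by the input size, and each incoming record costs at most O(log N) comparisons.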



> Sort TopN operator
> ------------------
>
>                 Key: TRAFODION-2259
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2259
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: sql-exe
>    Affects Versions: 2.1-incubating
>            Reporter: Prashanth Vasudev
>            Assignee: Prashanth Vasudev
>
> The sort operator consumes all records before producing sorted records. For use cases where only the top N records are required, sort today still reads all records into memory and overflows (spills) to disk, which hurts performance.
> If TopN is pushed down to sort, only the required memory needs to be allocated, and sort holds only the TopN records in memory. Once all records have been read, the sorted TopN records are returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)