You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "James Taylor (JIRA)" <ji...@apache.org> on 2015/10/27 01:46:27 UTC
[jira] [Commented] (PHOENIX-2126) Improving performance of merge sort by multi-threaded and minheap implementation

    [ https://issues.apache.org/jira/browse/PHOENIX-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975439#comment-14975439 ] 

James Taylor commented on PHOENIX-2126:
---------------------------------------

Thanks for the patch, [~ankit.singhal]. There's some really excellent work there - we really appreciate it. It'd be good to break up the patch into more manageable pieces as it's pretty sizable. I think different parts of your patch may be applicable for different query optimizations. In general, though, we don't want to pull over all rows from the server into a memory mapped file on the client for all the cases doing merge sort. In the case of a multi billion row table, just imagine the disk space that would be required (FWIW, we used to do this when a SELECT * query was done with our spooling, but changed it for just that reason).

Here's an idea on how to get this committed in pieces:
- Use a priority queue in MergeSortResultIterator. This will greatly reduce the number of key comparisons we're doing. The queue depth would be bounded by the number of child iterators, so memory consumption would not be an issue.
- Use your parallelized two level heap implementation to perform a client-side sort when it's necessary. This would be a big performance boost in these situations as we're doing this client side sort single threaded currently (and we're already buffering on the client side so there's no additional cost for that).

Thoughts?


> Improving performance of merge sort by multi-threaded and minheap implementation
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2126
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2126
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.1.0, 4.2.0
>            Reporter: Ankit Singhal
>         Attachments: PHOENIX-2126_v1.0.patch, PHOENIX-2126_v2.0.patch, PHOENIX-2126_v3.patch
>
>
> {code}
> CREATE TABLE IF NOT EXISTS test (
> dim1 INTEGER NOT NULL,
> A.B INTEGER,
> A.M DECIMAL,
> CONSTRAINT PK PRIMARY KEY
> (dim1))
> SALT_BUCKETS =256,DEFAULT_COLUMN_FAMILY='A';
> {code}
> *Query to benchmark:-*
> {code}
> select dim1,sum(b),sum(m) from test where Datemth>=201505 and Datemth<=201505 and dim1 IS NOT NULL  group by dim1 order by sum(m) desc nulls last limit 10;
> {code}
> *current scenario:-*
> *CASE 1: * consider the case when dim1 is high cardinality attribute (10K+) and table have salt bucket set to 256, we will get 256 iterators from above query at the client and MergeSortRowKeyResultIterator has to merge these 256 iterators with single thread. So let's say each iterator has 10k tuples returned, then merge sort needs to merge 2.5M tuples which will be costly if it is done with single thread and the query spend most of its time on client
> *CASE 2: * consider the case when dim1 is high cardinality attribute (10K+) and table have salt bucket set to 1, we will get 1 iterator from  above query at the client and MergeSortRowKeyResultIterator doesn't need to merge anything. Here, it is fine with single thread.
> *CASE 3: * consider the case when dim1 is low cardinality attribute (10-100) and table have salt bucket set to 256, we will get 256 iterator from  above query at the client and MergeSortRowKeyResultIterator has to merge these 256 iterators with single thread. here the single thread is also fine as he has to merge only 2560 tuples.
> *Solution for case1 problem is:-*
> Optimized the implementation of merging 'n'-sorted iterators(having 'm' tuples)  by using "min heap" which optimizes the time complexity from 
> O(n2m) to O(nmLogn) (as heapify takes (Logn) time).
> And, By using multiple-threads('t') to process group of iterators which further optimized the complexity to 
> T(nm)=T(nm)/t+T(t)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)