You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/07/02 20:46:00 UTC
[jira] [Commented] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN

    [ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530424#comment-16530424 ] 

ASF GitHub Bot commented on PHOENIX-4751:
-----------------------------------------

GitHub user geraldss opened a pull request:

    https://github.com/apache/phoenix/pull/308

    Client-side hash aggregation

    Client-side hash aggregation for use with sort-merge join.
    
    Implements https://issues.apache.org/jira/browse/PHOENIX-4751


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/geraldss/phoenix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #308
    
----
commit c8acc6cb39e222a5206c79566552c5c27cbe27f1
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-14T19:49:30Z

    PHOENIX-4751 Add HASH_AGGREGATE hint

commit a261b3f94f753b4a8d6baaad6168e76f97d76bb6
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-16T04:17:32Z

    PHOENIX-4751 Begin implementation of client hash aggregation

commit 863d24e34a83282f90d5d2db05522b678dfced74
Author: Rajeshbabu Chintaguntla <ra...@...>
Date:   2018-06-15T22:38:44Z

    PHOENIX-4786 Reduce log level to debug when logging new aggregate row key found and added results for scan ordered queries(Rajeshbabu)

commit cfae7ddcfa5b58a367cd0c57c23f394ceb9f1259
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-16T04:55:00Z

    Merge remote-tracking branch 'upstream/master'

commit 1f453308a24be49a8036292671d51eb25137d680
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-20T17:47:34Z

    PHOENIX-4751 Generated aggregated results

commit 66aaacfd989c63e18fb9a5c5b9e133519ab93507
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-24T23:18:14Z

    PHOENIX-4751 Sort results of client hash aggregation

commit a6c2b7ce738710cfdffc1e9e4d1d234d2090a225
Author: James Taylor <ja...@...>
Date:   2018-06-18T13:00:02Z

    PHOENIX-4789 Exception when setting TTL on Tephra transactional table

commit fba4196fcace83d4e42e902d2cb6295bb519ed39
Author: Ankit Singhal <an...@...>
Date:   2018-06-21T23:11:02Z

    PHOENIX-4785 Unable to write to table if index is made active during retry

commit 05de081b386c502b6c90ff24357ed7dbbc6dedd2
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-29T05:01:55Z

    PHOENIX-4751 Add integration test for client hash aggregation

commit b7960d0daedc6ce3c2fbcf0794e4a95639d7ba3c
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-30T00:03:59Z

    PHOENIX-4751 Fix and run integration tests for query results

commit a3629ac64b90c117f5caceddbb45fb9dc14649b8
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-30T06:22:43Z

    PHOENIX-4751 Add integration test for EXPLAIN

commit 3aa85d5c04309f6e0c5167c002e9dcb6091ea757
Author: Gerald Sangudi <gs...@...>
Date:   2018-06-30T17:13:17Z

    PHOENIX-4751 Verify EXPLAIN plan for both salted and unsalted

----


> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Priority: Major
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash aggregation in some cases, for improved performance.
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash aggregation. It instead performs a CLIENT SORT followed by a CLIENT AGGREGATE. The performance can be improved if (a) the GROUP BY output does not need to be sorted, and (b) the GROUP BY input is large enough and has low cardinality.
> The hash aggregation can initially be a hint. Here is an example from Phoenix 4.13.1 that would benefit from hash aggregation if the GROUP BY input is large with low cardinality.
> CREATE TABLE unsalted (
>  keyA BIGINT NOT NULL,
>  keyB BIGINT NOT NULL,
>  val SMALLINT,
>  CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>  );
> EXPLAIN
>  SELECT /*+ USE_SORT_MERGE_JOIN */ 
>  t1.val v1, t2.val v2, COUNT(\*) c 
>  FROM unsalted t1 JOIN unsalted t2 
>  ON (t1.keyA = t2.keyA) 
>  GROUP BY t1.val, t2.val;
>  +-------------------------------------------------------------+----------------++------------------+
> |PLAN|EST_BYTES_READ|EST_ROWS_READ| |
> +-------------------------------------------------------------+----------------++------------------+
> |SORT-MERGE-JOIN (INNER) TABLES|null|null| |
> |    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| |
> |AND|null|null| |
> |    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| |
> |CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]|null|null| |
> |CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]|null|null| |
> +-------------------------------------------------------------+----------------++------------------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)