You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by Gerald Sangudi <gs...@23andme.com> on 2018/05/18 17:57:13 UTC

Hash aggregation

Hello,

Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
should I file a ticket? We have aggregation queries that do not require
sorted results.

For example, this EXPLAIN plan shows a CLIENT SORT.

*CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB BIGINT NOT
NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA, keyB));*


*EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP BY t1.val,
t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
                           PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
|+------------------------------------------------------------+-----------------+----------------+--+|
SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
| || AND                                                        | null |
null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
| null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
   | null | null |
|+------------------------------------------------------------+-----------------+----------------+--+*
Thanks,
Gerald

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hi folks,

I've received a couple of reviews and applied some of the feedback.

Is there any way to know when this pull request will be merged?

Thanks,
Gerald

On Mon, Jul 9, 2018 at 9:36 AM, Gerald Sangudi <gs...@23andme.com> wrote:

> Hi folks,
>
> Any idea of when this might be reviewed? I realize there are many open
> tasks.
>
> Thanks,
> Gerald
>
>
> On Mon, Jul 2, 2018 at 1:54 PM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Hello all,
>>
>> I've submitted a patch for this issue: https://github.com/apac
>> he/phoenix/pull/308
>>
>> The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751
>>
>> Thanks,
>> Gerald
>>
>>
>> On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Thanks James. Looking into that.
>>>
>>> Gerald
>>>
>>>
>>> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <ja...@apache.org>
>>> wrote:
>>>
>>>> Hi Gerald,
>>>> No further suggestions than my comments on the JIRA. Maybe a good next
>>>> step would be a patch?
>>>> Thanks,
>>>> James
>>>>
>>>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hi Maryann and James,
>>>>>
>>>>> Any further guidance on PHOENIX-4751
>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gsangudi@23andme.com
>>>>> > wrote:
>>>>>
>>>>>> Hi Maryann,
>>>>>>
>>>>>> I filed PHOENIX-4751
>>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>>>
>>>>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>>>>> look at the Phoenix source to estimate the scope / impact?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Since the performance running a group-by aggregation on client side
>>>>>>> is most likely bad, it’s usually not desired. The original implementation
>>>>>>> was for functionality completeness only so it chose the easiest way, which
>>>>>>> reused some existing classes. In some cases, though, the client group-by
>>>>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maryann
>>>>>>>
>>>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Any guidance or thoughts on the thread below?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>
>>>>>>>>> Maryann,
>>>>>>>>>
>>>>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Gerald
>>>>>>>>>
>>>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <
>>>>>>>>> maryann.xue@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Gerald,
>>>>>>>>>>
>>>>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>>>>>> only be on the client-side.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Marynn
>>>>>>>>>>
>>>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that do not
>>>>>>>>>>> require sorted results.
>>>>>>>>>>>
>>>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>>>>
>>>>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>>>>>> keyB));*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>>>>>> BY t1.val,
>>>>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>>>>>> | || AND                                                        | null |
>>>>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>>>>    | null | null |
>>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Gerald
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hi folks,

Any idea of when this might be reviewed? I realize there are many open
tasks.

Thanks,
Gerald

On Mon, Jul 2, 2018 at 1:54 PM, Gerald Sangudi <gs...@23andme.com> wrote:

> Hello all,
>
> I've submitted a patch for this issue: https://github.com/
> apache/phoenix/pull/308
>
> The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751
>
> Thanks,
> Gerald
>
>
> On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Thanks James. Looking into that.
>>
>> Gerald
>>
>>
>> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <ja...@apache.org>
>> wrote:
>>
>>> Hi Gerald,
>>> No further suggestions than my comments on the JIRA. Maybe a good next
>>> step would be a patch?
>>> Thanks,
>>> James
>>>
>>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Hi Maryann and James,
>>>>
>>>> Any further guidance on PHOENIX-4751
>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hi Maryann,
>>>>>
>>>>> I filed PHOENIX-4751
>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>>
>>>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>>>> look at the Phoenix source to estimate the scope / impact?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Since the performance running a group-by aggregation on client side
>>>>>> is most likely bad, it’s usually not desired. The original implementation
>>>>>> was for functionality completeness only so it chose the easiest way, which
>>>>>> reused some existing classes. In some cases, though, the client group-by
>>>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Any guidance or thoughts on the thread below?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Maryann,
>>>>>>>>
>>>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <
>>>>>>>> maryann.xue@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Gerald,
>>>>>>>>>
>>>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>>>>> only be on the client-side.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Marynn
>>>>>>>>>
>>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that do not
>>>>>>>>>> require sorted results.
>>>>>>>>>>
>>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>>>
>>>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>>>>> keyB));*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>>>>> BY t1.val,
>>>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>>>>> | || AND                                                        | null |
>>>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>>>    | null | null |
>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>>>> Thanks,
>>>>>>>>>> Gerald
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hello all,

I've submitted a patch for this issue:
https://github.com/apache/phoenix/pull/308

The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751

Thanks,
Gerald


On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gs...@23andme.com>
wrote:

> Thanks James. Looking into that.
>
> Gerald
>
>
> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <ja...@apache.org>
> wrote:
>
>> Hi Gerald,
>> No further suggestions than my comments on the JIRA. Maybe a good next
>> step would be a patch?
>> Thanks,
>> James
>>
>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Hi Maryann and James,
>>>
>>> Any further guidance on PHOENIX-4751
>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Hi Maryann,
>>>>
>>>> I filed PHOENIX-4751
>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>
>>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>>> look at the Phoenix source to estimate the scope / impact?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Since the performance running a group-by aggregation on client side is
>>>>> most likely bad, it’s usually not desired. The original implementation was
>>>>> for functionality completeness only so it chose the easiest way, which
>>>>> reused some existing classes. In some cases, though, the client group-by
>>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Any guidance or thoughts on the thread below?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>>
>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>> gsangudi@23andme.com> wrote:
>>>>>>
>>>>>>> Maryann,
>>>>>>>
>>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <maryann.xue@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hi Gerald,
>>>>>>>>
>>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>>>> only be on the client-side.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marynn
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that do not
>>>>>>>>> require sorted results.
>>>>>>>>>
>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>>
>>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>>>> keyB));*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>>>> BY t1.val,
>>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>>>> | || AND                                                        | null |
>>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>>    | null | null |
>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>>> Thanks,
>>>>>>>>> Gerald
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Thanks James. Looking into that.

Gerald


On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <ja...@apache.org>
wrote:

> Hi Gerald,
> No further suggestions than my comments on the JIRA. Maybe a good next
> step would be a patch?
> Thanks,
> James
>
> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Hi Maryann and James,
>>
>> Any further guidance on PHOENIX-4751
>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>
>> Thanks,
>> Gerald
>>
>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Hi Maryann,
>>>
>>> I filed PHOENIX-4751
>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>
>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>> look at the Phoenix source to estimate the scope / impact?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>>> wrote:
>>>
>>>> Since the performance running a group-by aggregation on client side is
>>>> most likely bad, it’s usually not desired. The original implementation was
>>>> for functionality completeness only so it chose the easiest way, which
>>>> reused some existing classes. In some cases, though, the client group-by
>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Any guidance or thoughts on the thread below?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>>
>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gsangudi@23andme.com
>>>>> > wrote:
>>>>>
>>>>>> Maryann,
>>>>>>
>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Gerald,
>>>>>>>
>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>>> only be on the client-side.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marynn
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that do not
>>>>>>>> require sorted results.
>>>>>>>>
>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>
>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>>> keyB));*
>>>>>>>>
>>>>>>>>
>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>>> BY t1.val,
>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>>> | || AND                                                        | null |
>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>    | null | null |
>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

Re: Hash aggregation

Posted by James Taylor <ja...@apache.org>.

Hi Gerald,
No further suggestions than my comments on the JIRA. Maybe a good next step
would be a patch?
Thanks,
James

On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
wrote:

> Hi Maryann and James,
>
> Any further guidance on PHOENIX-4751
> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>
> Thanks,
> Gerald
>
> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Hi Maryann,
>>
>> I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>
>> .
>>
>> Is this likely to be reviewed soon (say next few weeks), or should I look
>> at the Phoenix source to estimate the scope / impact?
>>
>> Thanks,
>> Gerald
>>
>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>> wrote:
>>
>>> Since the performance running a group-by aggregation on client side is
>>> most likely bad, it’s usually not desired. The original implementation was
>>> for functionality completeness only so it chose the easiest way, which
>>> reused some existing classes. In some cases, though, the client group-by
>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Any guidance or thoughts on the thread below?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>>
>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Maryann,
>>>>>
>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Gerald,
>>>>>>
>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>> only be on the client-side.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Marynn
>>>>>>
>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>> gsangudi@23andme.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap,
>>>>>>> or should I file a ticket? We have aggregation queries that do not require
>>>>>>> sorted results.
>>>>>>>
>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>
>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>> keyB));*
>>>>>>>
>>>>>>>
>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>> BY t1.val,
>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>> | || AND                                                        | null |
>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>    | null | null |
>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hi Maryann and James,

Any further guidance on PHOENIX-4751
<https://issues.apache.org/jira/browse/PHOENIX-4751>?

Thanks,
Gerald

On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
wrote:

> Hi Maryann,
>
> I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>
> Is this likely to be reviewed soon (say next few weeks), or should I look
> at the Phoenix source to estimate the scope / impact?
>
> Thanks,
> Gerald
>
> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
> wrote:
>
>> Since the performance running a group-by aggregation on client side is
>> most likely bad, it’s usually not desired. The original implementation was
>> for functionality completeness only so it chose the easiest way, which
>> reused some existing classes. In some cases, though, the client group-by
>> can still be tolerable if there aren’t many distinct keys. So yes, please
>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>
>>
>> Thanks,
>> Maryann
>>
>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Any guidance or thoughts on the thread below?
>>>
>>> Thanks,
>>> Gerald
>>>
>>>
>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Maryann,
>>>>
>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gerald,
>>>>>
>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>> which is a client driven join, and after that join stage all operations can
>>>>> only be on the client-side.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Marynn
>>>>>
>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gsangudi@23andme.com
>>>>> > wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap,
>>>>>> or should I file a ticket? We have aggregation queries that do not require
>>>>>> sorted results.
>>>>>>
>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>
>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>> keyB));*
>>>>>>
>>>>>>
>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>> BY t1.val,
>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>> | || AND                                                        | null |
>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>    | null | null |
>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hi Maryann,

I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>.

Is this likely to be reviewed soon (say next few weeks), or should I look
at the Phoenix source to estimate the scope / impact?

Thanks,
Gerald

On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com> wrote:

> Since the performance running a group-by aggregation on client side is
> most likely bad, it’s usually not desired. The original implementation was
> for functionality completeness only so it chose the easiest way, which
> reused some existing classes. In some cases, though, the client group-by
> can still be tolerable if there aren’t many distinct keys. So yes, please
> open a JIRA for implementing hash aggregation on client side. Thank you!
>
>
> Thanks,
> Maryann
>
> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Hello,
>>
>> Any guidance or thoughts on the thread below?
>>
>> Thanks,
>> Gerald
>>
>>
>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Maryann,
>>>
>>> Can Phoenix provide hash aggregation on the client side? Are there
>>> design / implementation reasons not to, or should I file a ticket for this?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
>>> wrote:
>>>
>>>> Hi Gerald,
>>>>
>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>> aggregation is used in your query plan is that the aggregation happens on
>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>> which is a client driven join, and after that join stage all operations can
>>>> only be on the client-side.
>>>>
>>>>
>>>> Thanks,
>>>> Marynn
>>>>
>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap,
>>>>> or should I file a ticket? We have aggregation queries that do not require
>>>>> sorted results.
>>>>>
>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>
>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB BIGINT
>>>>> NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>> keyB));*
>>>>>
>>>>>
>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>> BY t1.val,
>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>> | || AND                                                        | null |
>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>    | null | null |
>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>
>>>>
>>>
>>

Re: Hash aggregation

Posted by Maryann Xue <ma...@gmail.com>.

Since the performance running a group-by aggregation on client side is most
likely bad, it’s usually not desired. The original implementation was for
functionality completeness only so it chose the easiest way, which reused
some existing classes. In some cases, though, the client group-by can still
be tolerable if there aren’t many distinct keys. So yes, please open a JIRA
for implementing hash aggregation on client side. Thank you!


Thanks,
Maryann
On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
wrote:

> Hello,
>
> Any guidance or thoughts on the thread below?
>
> Thanks,
> Gerald
>
>
> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Maryann,
>>
>> Can Phoenix provide hash aggregation on the client side? Are there design
>> / implementation reasons not to, or should I file a ticket for this?
>>
>> Thanks,
>> Gerald
>>
>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
>> wrote:
>>
>>> Hi Gerald,
>>>
>>> Phoenix does have hash aggregation. The reason why sort-based
>>> aggregation is used in your query plan is that the aggregation happens on
>>> the client side. And that is because sort-merge join is used (as hinted)
>>> which is a client driven join, and after that join stage all operations can
>>> only be on the client-side.
>>>
>>>
>>> Thanks,
>>> Marynn
>>>
>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
>>>> should I file a ticket? We have aggregation queries that do not require
>>>> sorted results.
>>>>
>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>
>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB BIGINT
>>>> NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>> keyB));*
>>>>
>>>>
>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>> BY t1.val,
>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>> | || AND                                                        | null |
>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>    | null | null |
>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>> Thanks,
>>>> Gerald
>>>>
>>>
>>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hello,

Any guidance or thoughts on the thread below?

Thanks,
Gerald

On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gs...@23andme.com>
wrote:

> Maryann,
>
> Can Phoenix provide hash aggregation on the client side? Are there design
> / implementation reasons not to, or should I file a ticket for this?
>
> Thanks,
> Gerald
>
> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
> wrote:
>
>> Hi Gerald,
>>
>> Phoenix does have hash aggregation. The reason why sort-based aggregation
>> is used in your query plan is that the aggregation happens on the client
>> side. And that is because sort-merge join is used (as hinted) which is a
>> client driven join, and after that join stage all operations can only be on
>> the client-side.
>>
>>
>> Thanks,
>> Marynn
>>
>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
>>> should I file a ticket? We have aggregation queries that do not require
>>> sorted results.
>>>
>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>
>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB BIGINT
>>> NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>> keyB));*
>>>
>>>
>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*)
>>> c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP BY t1.val,
>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>> | || AND                                                        | null |
>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>    | null | null |
>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>> Thanks,
>>> Gerald
>>>
>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Maryann,

Can Phoenix provide hash aggregation on the client side? Are there design /
implementation reasons not to, or should I file a ticket for this?

Thanks,
Gerald

On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com> wrote:

> Hi Gerald,
>
> Phoenix does have hash aggregation. The reason why sort-based aggregation
> is used in your query plan is that the aggregation happens on the client
> side. And that is because sort-merge join is used (as hinted) which is a
> client driven join, and after that join stage all operations can only be on
> the client-side.
>
>
> Thanks,
> Marynn
>
> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Hello,
>>
>> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
>> should I file a ticket? We have aggregation queries that do not require
>> sorted results.
>>
>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>
>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB BIGINT
>> NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>> keyB));*
>>
>>
>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*)
>> c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP BY t1.val,
>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>> |+------------------------------------------------------------+-----------------+----------------+--+|
>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>> | || AND                                                        | null |
>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>    | null | null |
>> |+------------------------------------------------------------+-----------------+----------------+--+*
>> Thanks,
>> Gerald
>>
>
>

Re: Hash aggregation

Posted by Maryann Xue <ma...@gmail.com>.

Hi Gerald,

Phoenix does have hash aggregation. The reason why sort-based aggregation
is used in your query plan is that the aggregation happens on the client
side. And that is because sort-merge join is used (as hinted) which is a
client driven join, and after that join stage all operations can only be on
the client-side.


Thanks,
Marynn

On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gs...@23andme.com>
wrote:

> Hello,
>
> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
> should I file a ticket? We have aggregation queries that do not require
> sorted results.
>
> For example, this EXPLAIN plan shows a CLIENT SORT.
>
> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB BIGINT NOT
> NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA, keyB));*
>
>
> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
> FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP BY t1.val,
> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
> |+------------------------------------------------------------+-----------------+----------------+--+|
> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
> | || AND                                                        | null |
> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>    | null | null |
> |+------------------------------------------------------------+-----------------+----------------+--+*
> Thanks,
> Gerald
>