You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@phoenix.apache.org by Gerald Sangudi <gs...@23andme.com> on 2018/06/12 18:15:33 UTC

Re: Hash aggregation

Hi Maryann and James,

Any further guidance on PHOENIX-4751
<https://issues.apache.org/jira/browse/PHOENIX-4751>?

Thanks,
Gerald

On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
wrote:

> Hi Maryann,
>
> I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>
> Is this likely to be reviewed soon (say next few weeks), or should I look
> at the Phoenix source to estimate the scope / impact?
>
> Thanks,
> Gerald
>
> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
> wrote:
>
>> Since the performance running a group-by aggregation on client side is
>> most likely bad, it’s usually not desired. The original implementation was
>> for functionality completeness only so it chose the easiest way, which
>> reused some existing classes. In some cases, though, the client group-by
>> can still be tolerable if there aren’t many distinct keys. So yes, please
>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>
>>
>> Thanks,
>> Maryann
>>
>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Any guidance or thoughts on the thread below?
>>>
>>> Thanks,
>>> Gerald
>>>
>>>
>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Maryann,
>>>>
>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gerald,
>>>>>
>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>> which is a client driven join, and after that join stage all operations can
>>>>> only be on the client-side.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Marynn
>>>>>
>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gsangudi@23andme.com
>>>>> > wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap,
>>>>>> or should I file a ticket? We have aggregation queries that do not require
>>>>>> sorted results.
>>>>>>
>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>
>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>> keyB));*
>>>>>>
>>>>>>
>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>> BY t1.val,
>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>> | || AND                                                        | null |
>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>    | null | null |
>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hi folks,

I've received a couple of reviews and applied some of the feedback.

Is there any way to know when this pull request will be merged?

Thanks,
Gerald

On Mon, Jul 9, 2018 at 9:36 AM, Gerald Sangudi <gs...@23andme.com> wrote:

> Hi folks,
>
> Any idea of when this might be reviewed? I realize there are many open
> tasks.
>
> Thanks,
> Gerald
>
>
> On Mon, Jul 2, 2018 at 1:54 PM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Hello all,
>>
>> I've submitted a patch for this issue: https://github.com/apac
>> he/phoenix/pull/308
>>
>> The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751
>>
>> Thanks,
>> Gerald
>>
>>
>> On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Thanks James. Looking into that.
>>>
>>> Gerald
>>>
>>>
>>> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <ja...@apache.org>
>>> wrote:
>>>
>>>> Hi Gerald,
>>>> No further suggestions than my comments on the JIRA. Maybe a good next
>>>> step would be a patch?
>>>> Thanks,
>>>> James
>>>>
>>>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hi Maryann and James,
>>>>>
>>>>> Any further guidance on PHOENIX-4751
>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gsangudi@23andme.com
>>>>> > wrote:
>>>>>
>>>>>> Hi Maryann,
>>>>>>
>>>>>> I filed PHOENIX-4751
>>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>>>
>>>>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>>>>> look at the Phoenix source to estimate the scope / impact?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Since the performance running a group-by aggregation on client side
>>>>>>> is most likely bad, it’s usually not desired. The original implementation
>>>>>>> was for functionality completeness only so it chose the easiest way, which
>>>>>>> reused some existing classes. In some cases, though, the client group-by
>>>>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maryann
>>>>>>>
>>>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Any guidance or thoughts on the thread below?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>
>>>>>>>>> Maryann,
>>>>>>>>>
>>>>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Gerald
>>>>>>>>>
>>>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <
>>>>>>>>> maryann.xue@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Gerald,
>>>>>>>>>>
>>>>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>>>>>> only be on the client-side.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Marynn
>>>>>>>>>>
>>>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that do not
>>>>>>>>>>> require sorted results.
>>>>>>>>>>>
>>>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>>>>
>>>>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>>>>>> keyB));*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>>>>>> BY t1.val,
>>>>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>>>>>> | || AND                                                        | null |
>>>>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>>>>    | null | null |
>>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Gerald
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hi folks,

Any idea of when this might be reviewed? I realize there are many open
tasks.

Thanks,
Gerald

On Mon, Jul 2, 2018 at 1:54 PM, Gerald Sangudi <gs...@23andme.com> wrote:

> Hello all,
>
> I've submitted a patch for this issue: https://github.com/
> apache/phoenix/pull/308
>
> The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751
>
> Thanks,
> Gerald
>
>
> On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Thanks James. Looking into that.
>>
>> Gerald
>>
>>
>> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <ja...@apache.org>
>> wrote:
>>
>>> Hi Gerald,
>>> No further suggestions than my comments on the JIRA. Maybe a good next
>>> step would be a patch?
>>> Thanks,
>>> James
>>>
>>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Hi Maryann and James,
>>>>
>>>> Any further guidance on PHOENIX-4751
>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hi Maryann,
>>>>>
>>>>> I filed PHOENIX-4751
>>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>>
>>>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>>>> look at the Phoenix source to estimate the scope / impact?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Since the performance running a group-by aggregation on client side
>>>>>> is most likely bad, it’s usually not desired. The original implementation
>>>>>> was for functionality completeness only so it chose the easiest way, which
>>>>>> reused some existing classes. In some cases, though, the client group-by
>>>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Any guidance or thoughts on the thread below?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Maryann,
>>>>>>>>
>>>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <
>>>>>>>> maryann.xue@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Gerald,
>>>>>>>>>
>>>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>>>>> only be on the client-side.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Marynn
>>>>>>>>>
>>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that do not
>>>>>>>>>> require sorted results.
>>>>>>>>>>
>>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>>>
>>>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>>>>> keyB));*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>>>>> BY t1.val,
>>>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>>>>> | || AND                                                        | null |
>>>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>>>    | null | null |
>>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>>>> Thanks,
>>>>>>>>>> Gerald
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Hello all,

I've submitted a patch for this issue:
https://github.com/apache/phoenix/pull/308

The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751

Thanks,
Gerald


On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi <gs...@23andme.com>
wrote:

> Thanks James. Looking into that.
>
> Gerald
>
>
> On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <ja...@apache.org>
> wrote:
>
>> Hi Gerald,
>> No further suggestions than my comments on the JIRA. Maybe a good next
>> step would be a patch?
>> Thanks,
>> James
>>
>> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Hi Maryann and James,
>>>
>>> Any further guidance on PHOENIX-4751
>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Hi Maryann,
>>>>
>>>> I filed PHOENIX-4751
>>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>>
>>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>>> look at the Phoenix source to estimate the scope / impact?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>>>> wrote:
>>>>
>>>>> Since the performance running a group-by aggregation on client side is
>>>>> most likely bad, it’s usually not desired. The original implementation was
>>>>> for functionality completeness only so it chose the easiest way, which
>>>>> reused some existing classes. In some cases, though, the client group-by
>>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>>>>> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Any guidance or thoughts on the thread below?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>>
>>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <
>>>>>> gsangudi@23andme.com> wrote:
>>>>>>
>>>>>>> Maryann,
>>>>>>>
>>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <maryann.xue@gmail.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hi Gerald,
>>>>>>>>
>>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>>>> only be on the client-side.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marynn
>>>>>>>>
>>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that do not
>>>>>>>>> require sorted results.
>>>>>>>>>
>>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>>
>>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>>>> keyB));*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>>>> BY t1.val,
>>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>>>> | || AND                                                        | null |
>>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>>    | null | null |
>>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>>> Thanks,
>>>>>>>>> Gerald
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>

Re: Hash aggregation

Posted by Gerald Sangudi <gs...@23andme.com>.

Thanks James. Looking into that.

Gerald


On Thu, Jun 14, 2018 at 6:30 AM, James Taylor <ja...@apache.org>
wrote:

> Hi Gerald,
> No further suggestions than my comments on the JIRA. Maybe a good next
> step would be a patch?
> Thanks,
> James
>
> On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Hi Maryann and James,
>>
>> Any further guidance on PHOENIX-4751
>> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>>
>> Thanks,
>> Gerald
>>
>> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
>> wrote:
>>
>>> Hi Maryann,
>>>
>>> I filed PHOENIX-4751
>>> <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>>>
>>> Is this likely to be reviewed soon (say next few weeks), or should I
>>> look at the Phoenix source to estimate the scope / impact?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>>> wrote:
>>>
>>>> Since the performance running a group-by aggregation on client side is
>>>> most likely bad, it’s usually not desired. The original implementation was
>>>> for functionality completeness only so it chose the easiest way, which
>>>> reused some existing classes. In some cases, though, the client group-by
>>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Any guidance or thoughts on the thread below?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>>
>>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gsangudi@23andme.com
>>>>> > wrote:
>>>>>
>>>>>> Maryann,
>>>>>>
>>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>>
>>>>>> Thanks,
>>>>>> Gerald
>>>>>>
>>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Gerald,
>>>>>>>
>>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>>> only be on the client-side.
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marynn
>>>>>>>
>>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>>> gsangudi@23andme.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the
>>>>>>>> roadmap, or should I file a ticket? We have aggregation queries that do not
>>>>>>>> require sorted results.
>>>>>>>>
>>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>>
>>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>>> keyB));*
>>>>>>>>
>>>>>>>>
>>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>>> BY t1.val,
>>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>>> | || AND                                                        | null |
>>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>>    | null | null |
>>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>>> Thanks,
>>>>>>>> Gerald
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

Re: Hash aggregation

Posted by James Taylor <ja...@apache.org>.

Hi Gerald,
No further suggestions than my comments on the JIRA. Maybe a good next step
would be a patch?
Thanks,
James

On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gs...@23andme.com>
wrote:

> Hi Maryann and James,
>
> Any further guidance on PHOENIX-4751
> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>
> Thanks,
> Gerald
>
> On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gs...@23andme.com>
> wrote:
>
>> Hi Maryann,
>>
>> I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>
>> .
>>
>> Is this likely to be reviewed soon (say next few weeks), or should I look
>> at the Phoenix source to estimate the scope / impact?
>>
>> Thanks,
>> Gerald
>>
>> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <ma...@gmail.com>
>> wrote:
>>
>>> Since the performance running a group-by aggregation on client side is
>>> most likely bad, it’s usually not desired. The original implementation was
>>> for functionality completeness only so it chose the easiest way, which
>>> reused some existing classes. In some cases, though, the client group-by
>>> can still be tolerable if there aren’t many distinct keys. So yes, please
>>> open a JIRA for implementing hash aggregation on client side. Thank you!
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gs...@23andme.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> Any guidance or thoughts on the thread below?
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>>>
>>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gs...@23andme.com>
>>>> wrote:
>>>>
>>>>> Maryann,
>>>>>
>>>>> Can Phoenix provide hash aggregation on the client side? Are there
>>>>> design / implementation reasons not to, or should I file a ticket for this?
>>>>>
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <ma...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Gerald,
>>>>>>
>>>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>>>> aggregation is used in your query plan is that the aggregation happens on
>>>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>>>> which is a client driven join, and after that join stage all operations can
>>>>>> only be on the client-side.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Marynn
>>>>>>
>>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <
>>>>>> gsangudi@23andme.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap,
>>>>>>> or should I file a ticket? We have aggregation queries that do not require
>>>>>>> sorted results.
>>>>>>>
>>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>>>
>>>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB
>>>>>>> BIGINT NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>>>> keyB));*
>>>>>>>
>>>>>>>
>>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>>>> BY t1.val,
>>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | null
>>>>>>> | || AND                                                        | null |
>>>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null
>>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>>>    | null | null |
>>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>>>> Thanks,
>>>>>>> Gerald
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>