
Posted to issues@tajo.apache.org by "Jaehwa Jung (JIRA)" <ji...@apache.org> on 2014/08/18 11:53:18 UTC

[jira] [Created] (TAJO-1010) Improve multiple DISTINCT aggregation.

Jaehwa Jung created TAJO-1010:
---------------------------------

             Summary: Improve multiple DISTINCT aggregation.
                 Key: TAJO-1010
                 URL: https://issues.apache.org/jira/browse/TAJO-1010
             Project: Tajo
          Issue Type: Improvement
          Components: planner/optimizer
            Reporter: Jaehwa Jung
            Assignee: Jaehwa Jung


Currently, Tajo provides a three-stage optimization for DISTINCT aggregation queries, but it supports only one column for distinct aggregation, as in the following query:
{code:title=Query1|borderStyle=solid}
select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
from table1 a
group by a.flag
{code}

If you specify two or more columns for distinct aggregation, the optimized distinct aggregation cannot be applied, as in the following query:
{code:title=Query2|borderStyle=solid}
select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
, count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
from table1 a
group by a.flag
{code}

In this case, the query may perform poorly. We therefore need to improve multiple DISTINCT aggregation. More precisely, we should support the three-stage optimization for multiple DISTINCT aggregation.
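
To make the goal concrete, here is a minimal Python sketch of one generic way to evaluate several DISTINCT aggregates in three stages. It is not Tajo's planner code and may differ from the design that is eventually implemented. The sample rows, the function names, and the partitioning scheme are all assumptions made for illustration, and NULL handling is ignored. The idea is to tag each value with the column it came from, so that a single deduplication pass serves every DISTINCT column in Query2.

```python
from collections import defaultdict

# Hypothetical sample rows for table1: (flag, id, name, code).
ROWS = [
    ("A", 1, "x", "c1"),
    ("A", 1, "y", "c1"),
    ("A", 2, "x", "c2"),
    ("B", 3, "z", "c3"),
    ("B", 3, "z", "c4"),
]

# DISTINCT columns used in Query2, mapped to their position in a row.
DISTINCT_COLUMNS = {"id": 1, "name": 2, "code": 3}


def stage1_expand(partition):
    """Stage 1 (per partition): emit one tagged (flag, column, value)
    record per DISTINCT column and drop local duplicates."""
    seen = set()
    for row in partition:
        for col, idx in DISTINCT_COLUMNS.items():
            seen.add((row[0], col, row[idx]))
    return seen


def stage2_dedup(partials):
    """Stage 2: merge the partial sets so that each (flag, column, value)
    record survives exactly once across all partitions."""
    merged = set()
    for partial in partials:
        merged |= partial
    return merged


def stage3_aggregate(records):
    """Stage 3: group by flag and compute the final aggregates over the
    already-deduplicated values."""
    values = defaultdict(lambda: defaultdict(list))
    for flag, col, value in records:
        values[flag][col].append(value)
    return {
        flag: {
            "cnt": len(cols["id"]),     # count(distinct a.id)
            "total": sum(cols["id"]),   # sum(distinct a.id)
            "cnt2": len(cols["name"]),  # count(distinct a.name)
            "cnt3": len(cols["code"]),  # count(distinct a.code)
        }
        for flag, cols in values.items()
    }


def multi_distinct(rows, num_partitions=2):
    """Run the three stages over rows split into simulated partitions."""
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    partials = [stage1_expand(p) for p in partitions]
    return stage3_aggregate(stage2_dedup(partials))


if __name__ == "__main__":
    for flag, aggregates in sorted(multi_distinct(ROWS).items()):
        print(flag, aggregates)
```

Because duplicates are removed before the final stage, the last step only counts and sums values that are already unique, and the result does not depend on how the rows were partitioned.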


