Posted to issues@tajo.apache.org by "Jaehwa Jung (JIRA)" <ji...@apache.org> on 2014/08/18 11:53:18 UTC
[jira] [Created] (TAJO-1010) Improve multiple DISTINCT aggregation.
Jaehwa Jung created TAJO-1010:
---------------------------------
Summary: Improve multiple DISTINCT aggregation.
Key: TAJO-1010
URL: https://issues.apache.org/jira/browse/TAJO-1010
Project: Tajo
Issue Type: Improvement
Components: planner/optimizer
Reporter: Jaehwa Jung
Assignee: Jaehwa Jung
Currently, Tajo provides a three-stage plan for optimizing distinct aggregation queries. However, it supports only one distinct column, as in the following query:
{code:title=Query1|borderStyle=solid}
select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
from table1 a
group by a.flag
{code}
If you apply distinct aggregation to two or more columns, the optimized distinct aggregation cannot be used, as in the following query:
{code:title=Query2|borderStyle=solid}
select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
, count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
from table1 a
group by a.flag
{code}
In this case, the query may perform poorly. Thus, we need to improve multiple DISTINCT aggregation. Specifically, we should support the three-stage plan for aggregation over multiple DISTINCT columns.
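To make the idea concrete, here is a minimal Python sketch of one way a staged evaluation over several DISTINCT columns could work. It is only an illustration and not Tajo's planner or executor code: the function names, the sample rows, and the split into partitions are all hypothetical, and it assumes the input contains no NULL values. Each row is expanded into one tagged record per distinct column, duplicates are dropped first per partition and then across partitions, and the final stage computes the aggregates of Query2 per group.

```python
from collections import defaultdict


def stage1_expand_and_dedup(rows, group_col, distinct_cols):
    """Per partition: emit one (group, column, value) record for each
    distinct column and drop duplicates seen within this partition."""
    seen = set()
    for row in rows:
        for col in distinct_cols:
            seen.add((row[group_col], col, row[col]))
    return seen


def stage2_merge(partials):
    """After a (simulated) shuffle: drop duplicates that appeared in
    different partitions by taking the union of the partial sets."""
    merged = set()
    for partial in partials:
        merged |= partial
    return merged


def stage3_aggregate(records, aggregates):
    """Final stage: apply each aggregate to the deduplicated values.
    `aggregates` is a list of (output name, function, column)."""
    values = defaultdict(list)
    for group, col, value in records:
        values[(group, col)].append(value)
    groups = {group for group, _ in values}
    return {
        group: {name: fn(values[(group, col)]) for name, fn, col in aggregates}
        for group in groups
    }


def run_query2(partitions):
    """Hypothetical driver mirroring the aggregates of Query2."""
    aggregates = [
        ("cnt", len, "id"),
        ("total", sum, "id"),
        ("cnt2", len, "name"),
        ("cnt3", len, "code"),
    ]
    partials = [
        stage1_expand_and_dedup(p, "flag", ["id", "name", "code"])
        for p in partitions
    ]
    return stage3_aggregate(stage2_merge(partials), aggregates)


# Made-up sample data, split into two partitions.
partition1 = [
    {"flag": "A", "id": 1, "name": "x", "code": "c1"},
    {"flag": "A", "id": 1, "name": "y", "code": "c1"},
    {"flag": "B", "id": 3, "name": "x", "code": "c2"},
]
partition2 = [
    {"flag": "A", "id": 2, "name": "x", "code": "c1"},
    {"flag": "A", "id": 1, "name": "z", "code": "c3"},
    {"flag": "B", "id": 3, "name": "x", "code": "c2"},
]

result = run_query2([partition1, partition2])
for flag in sorted(result):
    print(flag, result[flag])
```

Tagging each record with its column name is what lets a single pass deduplicate all distinct columns at once, so the raw rows do not have to be scanned once per column. A real engine would also need to handle NULLs and non-distinct aggregates in the same query, which this sketch leaves out.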
--
This message was sent by Atlassian JIRA
(v6.2#6252)