Posted to issues@tajo.apache.org by Hyunsik Choi <hy...@apache.org> on 2014/02/18 13:03:45 UTC

Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/
-----------------------------------------------------------

Review request for Tajo.


Bugs: TAJO-601
    https://issues.apache.org/jira/browse/TAJO-601


Repository: tajo


Description
-------

Currently, distinct aggregation queries are executed as follows:
* the first stage only shuffles tuples by hashing the grouping keys.
* the second stage sorts them and executes a sort aggregation.

This approach executes queries containing distinct aggregation functions in only two stages, but it produces a large volume of intermediate data during the shuffle phase, because every input tuple is shuffled without any pre-aggregation.

This kind of query can be rewritten as two nested aggregations:

[Original query]
----------
SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col from rel1 group by grp1, grp2;
----------

[Rewritten query]
----------
SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
  SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2, grp3
) tmp1 group by grp1, grp2;
----------

I expect this rewrite to significantly reduce the intermediate data volume and the query response time in most cases.
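The equivalence behind this rewrite can be checked on a toy in-memory table. The following Java 16+ sketch is illustrative only: `DistinctRewriteSketch`, `Row`, `original` and `rewritten` are hypothetical names, not Tajo classes, and the table contents are made up. It evaluates both forms of the query with plain collections and compares the results, including the NULL case: `count(distinct grp3)` skips NULLs, and so does the outer `count(grp3)` of the rewritten query.

```java
import java.util.*;

// Toy check (not Tajo code): the two-level rewrite returns the same
// (total, distinct_col) per (grp1, grp2) as the original DISTINCT query.
public class DistinctRewriteSketch {
  // One row of the hypothetical table rel1; grp3 may be NULL (null).
  record Row(String grp1, String grp2, String grp3) {}

  // Original: SELECT grp1, grp2, count(*), count(distinct grp3) FROM rel1 GROUP BY grp1, grp2
  static Map<List<String>, List<Long>> original(List<Row> rel1) {
    Map<List<String>, Long> total = new HashMap<>();
    Map<List<String>, Set<String>> distinct = new HashMap<>();
    for (Row r : rel1) {
      List<String> key = Arrays.asList(r.grp1(), r.grp2());
      total.merge(key, 1L, Long::sum);
      Set<String> seen = distinct.computeIfAbsent(key, k -> new HashSet<>());
      if (r.grp3() != null) {
        seen.add(r.grp3()); // count(distinct grp3) ignores NULLs
      }
    }
    Map<List<String>, List<Long>> out = new HashMap<>();
    for (Map.Entry<List<String>, Long> e : total.entrySet()) {
      out.put(e.getKey(), List.of(e.getValue(), (long) distinct.get(e.getKey()).size()));
    }
    return out;
  }

  // Rewritten: inner GROUP BY grp1, grp2, grp3 with count(*) AS cnt,
  // then outer GROUP BY grp1, grp2 with sum(cnt) and count(grp3).
  static Map<List<String>, List<Long>> rewritten(List<Row> rel1) {
    Map<List<String>, Long> tmp1 = new HashMap<>();
    for (Row r : rel1) {
      tmp1.merge(Arrays.asList(r.grp1(), r.grp2(), r.grp3()), 1L, Long::sum);
    }
    Map<List<String>, List<Long>> out = new HashMap<>();
    for (Map.Entry<List<String>, Long> e : tmp1.entrySet()) {
      List<String> k = e.getKey();
      List<String> key = Arrays.asList(k.get(0), k.get(1));
      long nonNull = k.get(2) == null ? 0L : 1L; // count(grp3) skips the NULL group
      out.merge(key, List.of(e.getValue(), nonNull),
          (a, b) -> List.of(a.get(0) + b.get(0), a.get(1) + b.get(1)));
    }
    return out;
  }

  public static void main(String[] args) {
    List<Row> rel1 = List.of(
        new Row("a", "x", "1"), new Row("a", "x", "1"), new Row("a", "x", "2"),
        new Row("a", "y", "3"), new Row("b", "x", null), new Row("b", "x", "4"));
    System.out.println(original(rel1).equals(rewritten(rel1)));
    System.out.println(rewritten(rel1).get(Arrays.asList("b", "x")));
  }
}
```

Running `main` should print `true`, then `[2, 1]` for the group (b, x): two rows in total, but only one non-NULL distinct `grp3` value.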


Diffs
-----

  tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d4 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd720 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfa 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 624518b 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java efa1e05 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658d 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java a0c0eeb 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb4 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5eb 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e38 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d756242 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c028 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql 6fe604e 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql 6bf8a8a 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation1.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation2.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result f2ad32a 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result 9164120 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation1.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation2.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION 

Diff: https://reviews.apache.org/r/18210/diff/


Testing
-------

mvn clean install


Thanks,

Hyunsik Choi


Re: Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34693
-----------------------------------------------------------


The current approach performs poorly; it is outlined in the description of this issue.

This patch improves the performance of distinct aggregation. Unlike the current approach, GlobalPlanner in this patch builds a three-phase plan that uses two hash shuffles, and then adds a sort aggregation enforcer to the final execution block. As a result, the patch can significantly reduce the intermediate data volume, depending on the cardinality of the grouping columns.
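The three-phase plan can be simulated in memory to see where the intermediate data shrinks. This is a hypothetical Java 16+ sketch, not the patch's GlobalPlanner code: input splits stand in for first-phase tasks, `Math.floorMod(hashCode, n)` stands in for hash partitioning, and `Row`, `Partial`, `Result` and `run` are illustrative names. It assumes one grouping column `grp`, one distinct column `val`, and no NULLs.

```java
import java.util.*;

// Hypothetical simulation of a three-phase distinct aggregation plan:
// phase 1 pre-aggregates per (grp, val) and hash-shuffles on (grp, val),
// phase 2 merges those partial counts and hash-shuffles on grp only,
// phase 3 sorts each partition by grp and runs a sort aggregation.
public class ThreePhaseSketch {
  record Row(String grp, String val) {}
  record Partial(String grp, String val, long cnt) {}
  record Result(long shuffle1Rows, long shuffle2Rows, Map<String, List<Long>> output) {}

  static int partition(Object key, int n) { return Math.floorMod(key.hashCode(), n); }

  static List<List<Partial>> emptyPartitions(int n) {
    List<List<Partial>> parts = new ArrayList<>();
    for (int i = 0; i < n; i++) parts.add(new ArrayList<>());
    return parts;
  }

  static Result run(List<List<Row>> splits, int n) {
    // Phase 1: local pre-aggregation per split, then shuffle on (grp, val).
    List<List<Partial>> stage2In = emptyPartitions(n);
    long shuffle1 = 0;
    for (List<Row> split : splits) {
      Map<Row, Long> local = new HashMap<>();
      for (Row r : split) local.merge(r, 1L, Long::sum);
      for (Map.Entry<Row, Long> e : local.entrySet()) {
        Row k = e.getKey();
        stage2In.get(partition(k, n)).add(new Partial(k.grp(), k.val(), e.getValue()));
        shuffle1++;
      }
    }
    // Phase 2: each (grp, val) now lives in one partition; merge, then shuffle on grp.
    List<List<Partial>> stage3In = emptyPartitions(n);
    long shuffle2 = 0;
    for (List<Partial> part : stage2In) {
      Map<Row, Long> merged = new HashMap<>();
      for (Partial p : part) merged.merge(new Row(p.grp(), p.val()), p.cnt(), Long::sum);
      for (Map.Entry<Row, Long> e : merged.entrySet()) {
        Row k = e.getKey();
        stage3In.get(partition(k.grp(), n)).add(new Partial(k.grp(), k.val(), e.getValue()));
        shuffle2++;
      }
    }
    // Phase 3: sort by grp and aggregate each run of equal keys:
    // count(*) = sum(cnt), count(distinct val) = number of rows in the run.
    Map<String, List<Long>> output = new TreeMap<>();
    for (List<Partial> part : stage3In) {
      part.sort(Comparator.comparing(Partial::grp));
      int i = 0;
      while (i < part.size()) {
        String grp = part.get(i).grp();
        long total = 0, distinct = 0;
        while (i < part.size() && part.get(i).grp().equals(grp)) {
          total += part.get(i).cnt();
          distinct++;
          i++;
        }
        output.put(grp, List.of(total, distinct));
      }
    }
    return new Result(shuffle1, shuffle2, output);
  }

  public static void main(String[] args) {
    List<Row> split1 = new ArrayList<>(Collections.nCopies(4, new Row("k1", "p1")));
    split1.add(new Row("k1", "p2"));
    List<Row> split2 = new ArrayList<>(Collections.nCopies(3, new Row("k1", "p1")));
    split2.addAll(Collections.nCopies(2, new Row("k2", "p3")));
    Result r = run(List.of(split1, split2), 2);
    System.out.println("input rows: " + (split1.size() + split2.size())); // 10
    System.out.println("shuffle 1 rows: " + r.shuffle1Rows());            // 4
    System.out.println("shuffle 2 rows: " + r.shuffle2Rows());            // 3
    System.out.println(r.output());                                       // {k1=[8, 2], k2=[2, 1]}
  }
}
```

With this made-up data, a plan that shuffles every input tuple would move all 10 rows, while the two shuffles here move 4 and 3 rows. The saving depends on how many duplicate (grp, val) pairs each phase can collapse; when there are almost no duplicates, the two shuffles together can move more rows than the single shuffle would.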
 
This patch also allows Tajo to support multiple distinct aggregation functions in a single query. For example, the following query works:
 
select l_orderkey, count(distinct l_partkey), sum(distinct l_partkey) from lineitem group by l_orderkey;
 
However, the current patch still has a limitation. The query above contains two distinct aggregation functions, count(distinct) and sum(distinct), which use the same distinct column 'l_partkey', so it works. In contrast, queries with two or more different distinct columns, such as the following, are not supported yet:
 
select l_orderkey, count(distinct l_partkey), sum(distinct l_linenumber) from lineitem group by l_orderkey;
 
If you submit such a query, you will get the following message: "different DISTINCT columns are not supported yet: l_partkey, l_linenumber". Supporting this kind of query requires additional physical executors; I'll add this feature later in a separate Jira issue.
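The restriction can be sketched as a planning-time check, together with the reason a shared column is easy to support: once the values of one group are deduplicated, every DISTINCT aggregate on that column can be evaluated over the same set. This is a hypothetical Java 16+ sketch, not the patch's code; `DistinctAgg`, `checkSingleDistinctColumn`, `evaluate` and the local `PlanningException` class are illustrative stand-ins, and only the error message text is taken from the message quoted above.

```java
import java.util.*;

// Hypothetical sketch; none of these names are Tajo's actual classes.
public class DistinctColumnCheckSketch {
  // A DISTINCT aggregate such as sum(distinct l_partkey): function name + column.
  record DistinctAgg(String function, String column) {}

  // Local stand-in for a planner exception.
  static class PlanningException extends Exception {
    PlanningException(String msg) { super(msg); }
  }

  // Rejects queries whose DISTINCT aggregates use more than one column,
  // listing the columns in the order they first appear.
  static void checkSingleDistinctColumn(List<DistinctAgg> aggs) throws PlanningException {
    Set<String> columns = new LinkedHashSet<>();
    for (DistinctAgg a : aggs) columns.add(a.column());
    if (columns.size() > 1) {
      throw new PlanningException(
          "different DISTINCT columns are not supported yet: " + String.join(", ", columns));
    }
  }

  // With a single distinct column, all DISTINCT aggregates of one group
  // are computed over the same deduplicated set of values.
  static Map<String, Long> evaluate(List<DistinctAgg> aggs, List<Long> groupValues) {
    Set<Long> dedup = new HashSet<>(groupValues);
    Map<String, Long> out = new LinkedHashMap<>();
    for (DistinctAgg a : aggs) {
      String label = a.function() + "(distinct " + a.column() + ")";
      switch (a.function()) {
        case "count" -> out.put(label, (long) dedup.size());
        case "sum" -> out.put(label, dedup.stream().mapToLong(Long::longValue).sum());
        default -> throw new IllegalArgumentException("unsupported: " + a.function());
      }
    }
    return out;
  }

  public static void main(String[] args) throws PlanningException {
    List<DistinctAgg> same = List.of(
        new DistinctAgg("count", "l_partkey"), new DistinctAgg("sum", "l_partkey"));
    checkSingleDistinctColumn(same); // passes: only l_partkey is used
    // Made-up l_partkey values for one l_orderkey group.
    System.out.println(evaluate(same, List.of(3L, 5L, 3L)));
    try {
      checkSingleDistinctColumn(List.of(
          new DistinctAgg("count", "l_partkey"), new DistinctAgg("sum", "l_linenumber")));
    } catch (PlanningException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Running `main` should print `{count(distinct l_partkey)=2, sum(distinct l_partkey)=8}` followed by `different DISTINCT columns are not supported yet: l_partkey, l_linenumber`.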

- Hyunsik Choi




Re: Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

Posted by Hyunsik Choi <hy...@apache.org>.

> On Feb. 20, 2014, 10:51 a.m., Jung JaeHwa wrote:
> > Hyunsik, thank you for waiting.
> > 
> > I tested the patch on my local cluster. 
> > But the validation for different DISTINCT columns doesn't work as expected. For example, the following queries finished without a PlanningException:
> > 
> > - select count(distinct id), sum(distinct score) from table1
> > - select id, count(distinct id), sum(distinct name) from table1 group by id
> > 
> > For reference, I created the table described on the Tajo wiki.
> > 
> > Anyway, I found that the validation is never called. Please check this.
> > 
> > Also, if that's okay with you, I'd like to suggest adding unit test cases for unsupported queries.
> > But if you think that's a waste of resources, feel free to disregard this. :)

Could you check the patch once again? I tried your test queries, but I get the following messages:

tajo> select count(distinct l_orderkey), sum(distinct l_partkey) from lineitem;
different DISTINCT columns are not supported yet: l_orderkey, l_partkey

tajo> select id, count(distinct l_orderkey), sum(distinct l_partkey) from lineitem group by id;
different DISTINCT columns are not supported yet: l_orderkey, l_partkey

tajo> select count(distinct id), sum(distinct score) from table1;
different DISTINCT columns are not supported yet: id, score

tajo> select id, count(distinct id), sum(distinct name) from table1 group by id;
different DISTINCT columns are not supported yet: id, name


Thanks!


- Hyunsik


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34962
-----------------------------------------------------------




Re: Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

Posted by Jung JaeHwa <jh...@gruter.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34962
-----------------------------------------------------------


Hyunsik, thank you for waiting.

I tested the patch on my local cluster. 
But the validation for different DISTINCT columns doesn't work as expected. For example, the following queries finished without a PlanningException:

- select count(distinct id), sum(distinct score) from table1
- select id, count(distinct id), sum(distinct name) from table1 group by id

For reference, I created the table described on the Tajo wiki.

Anyway, I found that the validation is never called. Please check this.

Also, if that's okay with you, I'd like to suggest adding unit test cases for unsupported queries.
But if you think that's a waste of resources, feel free to disregard this. :)

- Jung JaeHwa




Re: Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34987
-----------------------------------------------------------


Thank you for the review. I've fixed all the issues you mentioned and committed the patch to the master branch.

- Hyunsik Choi


On Feb. 20, 2014, 2:28 p.m., Hyunsik Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18210/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2014, 2:28 p.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-601
>     https://issues.apache.org/jira/browse/TAJO-601
> 
> 
> Repository: tajo
> 
> 
> Diffs
> -----
> 
>   tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/SortSpec.java 3ef73d5c5385b40fcfb3b0ecbbc35b783224c760 
>   tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d43f42f68945cf53a7b8b9bbdca97a4f205 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739b8feff0e04b1762f8000b1f3818c773a2 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/FunctionEval.java 0555bdec8aff6fa79c02b640c81ad55d4666b90a 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd7205f29c82adf87816737598ce762ee0ebc9 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448ee5b3ce0dfca67c6a9b942f1803cc91f9 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfab78cb3416e7a2ed263cc362917023e3ca 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java 67f56303e04787bf950c4a9a703faec58fb74cd4 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 7d5e2fc7e085cc36527383a208277384035263e7 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031218c650b9c1c86811b4552fe6d82da0c1 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/enforce/Enforcer.java dd46996eca7eb9c38f87d97813f5dcc7220429ed 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java 9f5c6bf9dd7b549308724ce1e8044aff1630cef1 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52f378a2d7e84e40876df4a4b416af912ef 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658dab395620f5a891f51407b3676b07a8fa5 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java 791781e526c54f216152e935682bc2c3147a9e0c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java 53a1c24197c40c77153f79f90c05882c90aae957 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c66bb8a62074facd0bbbe9b3b8e891c067 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb40414e0b2e2e40bccebe24069ee4d9301b 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1857533b02c4ecc6913c740fd2e3722845 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5ebb97f8c4287ffd11262b2932d2f8b1250c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e3854abaa891f72b368144942164e5dffab7 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 56c26797aad1dbe95945567961e9425fef72fa96 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d7562426647a6a9d6aae5207a67ddcdd03d0ee3a 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce23c74e3abdcbf9bc0553ec30244d6bd93 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c02833e80dd931807fa6314965e687d7b26c0 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d7e9d7853b0f872eee1016cbae504c9c6b 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql  
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql  
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithUnion1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result  
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result  
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithUnion1.result PRE-CREATION 
>   tajo-storage/src/main/java/org/apache/tajo/storage/RawFile.java c3a7525154e0f36d51dcca211949f21f57a9f1c8 
> 
> Diff: https://reviews.apache.org/r/18210/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Hyunsik Choi
> 
>


Re: Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

Posted by Jung JaeHwa <jh...@gruter.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34986
-----------------------------------------------------------

Ship it!


+1 for the patch.

Sorry, Hyunsik.
I found a misconfiguration on my local cluster; PlanningException works as expected.


- Jung JaeHwa


On Feb. 20, 2014, 5:28 a.m., Hyunsik Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18210/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2014, 5:28 a.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-601
>     https://issues.apache.org/jira/browse/TAJO-601
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> Currently, distinct aggregation queries are executed as follows:
> * The first stage only shuffles tuples by hashing the grouping keys.
> * The second stage sorts them and performs a sort-based aggregation.
> 
> This approach executes queries containing distinct aggregation functions in only two stages, but it produces a large volume of intermediate data during the shuffle phase.
> 
> This kind of query can be rewritten as two nested aggregation queries:
> 
> [Original query]
> ----------
> SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col from rel1 group by grp1, grp2;
> ----------
> 
> [Rewritten query]
> ----------
> SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
>   SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2, grp3
> ) tmp1 group by grp1, grp2;
> ----------
> 
> I expect this rewrite to significantly reduce the intermediate data volume and the query response time in most cases, since the inner query can collapse duplicate (grp1, grp2, grp3) rows before they are shuffled.
> 
> 
> Diffs
> -----
> 
>   tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/SortSpec.java 3ef73d5c5385b40fcfb3b0ecbbc35b783224c760 
>   tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d43f42f68945cf53a7b8b9bbdca97a4f205 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739b8feff0e04b1762f8000b1f3818c773a2 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/FunctionEval.java 0555bdec8aff6fa79c02b640c81ad55d4666b90a 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd7205f29c82adf87816737598ce762ee0ebc9 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448ee5b3ce0dfca67c6a9b942f1803cc91f9 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfab78cb3416e7a2ed263cc362917023e3ca 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java 67f56303e04787bf950c4a9a703faec58fb74cd4 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 7d5e2fc7e085cc36527383a208277384035263e7 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031218c650b9c1c86811b4552fe6d82da0c1 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/enforce/Enforcer.java dd46996eca7eb9c38f87d97813f5dcc7220429ed 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java 9f5c6bf9dd7b549308724ce1e8044aff1630cef1 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52f378a2d7e84e40876df4a4b416af912ef 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658dab395620f5a891f51407b3676b07a8fa5 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java 791781e526c54f216152e935682bc2c3147a9e0c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java 53a1c24197c40c77153f79f90c05882c90aae957 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c66bb8a62074facd0bbbe9b3b8e891c067 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb40414e0b2e2e40bccebe24069ee4d9301b 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1857533b02c4ecc6913c740fd2e3722845 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5ebb97f8c4287ffd11262b2932d2f8b1250c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e3854abaa891f72b368144942164e5dffab7 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 56c26797aad1dbe95945567961e9425fef72fa96 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d7562426647a6a9d6aae5207a67ddcdd03d0ee3a 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce23c74e3abdcbf9bc0553ec30244d6bd93 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c02833e80dd931807fa6314965e687d7b26c0 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d7e9d7853b0f872eee1016cbae504c9c6b 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql  
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql  
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithUnion1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result  
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result  
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithUnion1.result PRE-CREATION 
>   tajo-storage/src/main/java/org/apache/tajo/storage/RawFile.java c3a7525154e0f36d51dcca211949f21f57a9f1c8 
> 
> Diff: https://reviews.apache.org/r/18210/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Hyunsik Choi
> 
>


Re: Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

Posted by Hyunsik Choi <hy...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/
-----------------------------------------------------------

(Updated Feb. 20, 2014, 2:28 p.m.)


Review request for Tajo.


Changes
-------

Rebased against the latest revision.


Bugs: TAJO-601
    https://issues.apache.org/jira/browse/TAJO-601


Repository: tajo


Description
-------

Currently, distinct aggregation queries are executed as follows:
* The first stage only shuffles tuples by hashing the grouping keys.
* The second stage sorts them and performs a sort-based aggregation.

This approach executes queries containing distinct aggregation functions in only two stages, but it produces a large volume of intermediate data during the shuffle phase.
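The two stages described above can be sketched in a single Java process as follows. This is a simplified illustration, not Tajo code: the class `TwoStageDistinctSketch`, the `Row` record, and the methods `shuffle`, `sortAggregate`, and `run` are hypothetical names, and grp3 is assumed to be non-null. The sketch makes the cost visible: `shuffle` moves every input row, no matter how many duplicates of (grp1, grp2, grp3) exist.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;

/**
 * Hypothetical single-process sketch of the two-stage plan for
 *   SELECT grp1, grp2, count(*), count(distinct grp3) FROM rel1 GROUP BY grp1, grp2
 * Assumes grp3 is never null.
 */
public class TwoStageDistinctSketch {
  record Row(String grp1, String grp2, String grp3) {}

  // Stage 1: every input row crosses the shuffle; the partition depends only on (grp1, grp2).
  static List<List<Row>> shuffle(List<Row> rel1, int numPartitions) {
    List<List<Row>> partitions = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new ArrayList<>());
    }
    for (Row r : rel1) {
      int p = Math.floorMod(Objects.hash(r.grp1(), r.grp2()), numPartitions);
      partitions.get(p).add(r);
    }
    return partitions;
  }

  // Stage 2: sort a partition by (grp1, grp2, grp3), then aggregate each group in one scan.
  static List<String> sortAggregate(List<Row> partition) {
    List<Row> sorted = new ArrayList<>(partition);
    sorted.sort(Comparator.comparing(Row::grp1)
        .thenComparing(Row::grp2)
        .thenComparing(Row::grp3));
    List<String> out = new ArrayList<>();
    int i = 0;
    while (i < sorted.size()) {
      Row first = sorted.get(i);
      long total = 0;
      long distinct = 0;
      String prevGrp3 = null;
      while (i < sorted.size()
          && sorted.get(i).grp1().equals(first.grp1())
          && sorted.get(i).grp2().equals(first.grp2())) {
        String grp3 = sorted.get(i).grp3();
        total++;
        if (!grp3.equals(prevGrp3)) {
          distinct++;  // input is sorted, so a new grp3 value shows up as a change
        }
        prevGrp3 = grp3;
        i++;
      }
      out.add(first.grp1() + "," + first.grp2() + "," + total + "," + distinct);
    }
    return out;
  }

  static List<String> run(List<Row> rel1, int numPartitions) {
    TreeSet<String> results = new TreeSet<>();  // sorted only to make output deterministic
    for (List<Row> partition : shuffle(rel1, numPartitions)) {
      results.addAll(sortAggregate(partition));
    }
    return new ArrayList<>(results);
  }

  public static void main(String[] args) {
    List<Row> rel1 = List.of(
        new Row("a", "x", "1"), new Row("a", "x", "1"), new Row("a", "x", "2"),
        new Row("b", "y", "3"), new Row("b", "y", "3"));
    System.out.println(run(rel1, 2));  // [a,x,3,2, b,y,2,1]
  }
}
```

With the sample data in `main`, all five input rows pass through `shuffle` even though only three distinct (grp1, grp2, grp3) combinations exist.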

This kind of query can be rewritten as two nested aggregation queries:

[Original query]
----------
SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col from rel1 group by grp1, grp2;
----------

[Rewritten query]
----------
SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
  SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2, grp3
) tmp1 group by grp1, grp2;
----------

I expect this rewrite to significantly reduce the intermediate data volume and the query response time in most cases, since the inner query can collapse duplicate (grp1, grp2, grp3) rows before they are shuffled.
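To check that the rewrite preserves the original query's results, the following hedged Java sketch evaluates the rewritten query as two in-memory hash aggregations and compares it with a direct evaluation of the original query. All names (`DistinctRewriteSketch`, `Row`, `Key2`, `Agg`, `inner`, `outer`, `original`) are hypothetical, it ignores distribution entirely, and it assumes grp3 is non-null. The key observation is that every tmp1 row carries a different grp3 for its (grp1, grp2), so count(grp3) in the outer query equals count(distinct grp3), while sum(cnt) reconstructs count(*).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory check of the count(distinct) rewrite; grp3 assumed non-null. */
public class DistinctRewriteSketch {
  record Row(String grp1, String grp2, String grp3) {}
  record Key2(String grp1, String grp2) {}
  record Agg(long total, long distinctCol) {
    Agg plus(Agg other) {
      return new Agg(total + other.total, distinctCol + other.distinctCol);
    }
  }

  // Inner query: SELECT grp1, grp2, grp3, count(*) AS cnt FROM rel1 GROUP BY grp1, grp2, grp3
  static Map<Row, Long> inner(List<Row> rel1) {
    Map<Row, Long> tmp1 = new HashMap<>();
    for (Row r : rel1) {
      tmp1.merge(r, 1L, Long::sum);
    }
    return tmp1;
  }

  // Outer query: SELECT grp1, grp2, sum(cnt), count(grp3) FROM tmp1 GROUP BY grp1, grp2
  // Each tmp1 row adds its cnt to sum(cnt) and exactly 1 to count(grp3).
  static Map<Key2, Agg> outer(Map<Row, Long> tmp1) {
    Map<Key2, Agg> result = new HashMap<>();
    tmp1.forEach((row, cnt) ->
        result.merge(new Key2(row.grp1(), row.grp2()), new Agg(cnt, 1), Agg::plus));
    return result;
  }

  // Reference semantics of the original query, using a set of grp3 values per group.
  static Map<Key2, Agg> original(List<Row> rel1) {
    Map<Key2, Long> total = new HashMap<>();
    Map<Key2, Set<String>> distinct = new HashMap<>();
    for (Row r : rel1) {
      Key2 k = new Key2(r.grp1(), r.grp2());
      total.merge(k, 1L, Long::sum);
      distinct.computeIfAbsent(k, unused -> new HashSet<>()).add(r.grp3());
    }
    Map<Key2, Agg> result = new HashMap<>();
    total.forEach((k, t) -> result.put(k, new Agg(t, distinct.get(k).size())));
    return result;
  }

  public static void main(String[] args) {
    List<Row> rel1 = List.of(
        new Row("a", "x", "1"), new Row("a", "x", "1"), new Row("a", "x", "2"),
        new Row("b", "y", "3"), new Row("b", "y", "3"));
    Map<Row, Long> tmp1 = inner(rel1);
    System.out.println(outer(tmp1).equals(original(rel1)));  // true
    System.out.println(rel1.size() + " input rows -> " + tmp1.size() + " tmp1 rows");  // 5 input rows -> 3 tmp1 rows
  }
}
```

In SQL, both count(grp3) and count(distinct grp3) ignore NULLs, so the rewrite should also hold for a nullable grp3; the sketch simply does not model NULLs.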


Diffs (updated)
-----

  tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/SortSpec.java 3ef73d5c5385b40fcfb3b0ecbbc35b783224c760 
  tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d43f42f68945cf53a7b8b9bbdca97a4f205 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739b8feff0e04b1762f8000b1f3818c773a2 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/FunctionEval.java 0555bdec8aff6fa79c02b640c81ad55d4666b90a 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd7205f29c82adf87816737598ce762ee0ebc9 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448ee5b3ce0dfca67c6a9b942f1803cc91f9 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfab78cb3416e7a2ed263cc362917023e3ca 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java 67f56303e04787bf950c4a9a703faec58fb74cd4 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 7d5e2fc7e085cc36527383a208277384035263e7 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031218c650b9c1c86811b4552fe6d82da0c1 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/enforce/Enforcer.java dd46996eca7eb9c38f87d97813f5dcc7220429ed 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java 9f5c6bf9dd7b549308724ce1e8044aff1630cef1 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52f378a2d7e84e40876df4a4b416af912ef 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658dab395620f5a891f51407b3676b07a8fa5 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java 791781e526c54f216152e935682bc2c3147a9e0c 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java 53a1c24197c40c77153f79f90c05882c90aae957 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c66bb8a62074facd0bbbe9b3b8e891c067 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb40414e0b2e2e40bccebe24069ee4d9301b 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1857533b02c4ecc6913c740fd2e3722845 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5ebb97f8c4287ffd11262b2932d2f8b1250c 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e3854abaa891f72b368144942164e5dffab7 
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 56c26797aad1dbe95945567961e9425fef72fa96 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d7562426647a6a9d6aae5207a67ddcdd03d0ee3a 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce23c74e3abdcbf9bc0553ec30244d6bd93 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c02833e80dd931807fa6314965e687d7b26c0 
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d7e9d7853b0f872eee1016cbae504c9c6b 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql  
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql  
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithUnion1.sql PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result  
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result  
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION 
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithUnion1.result PRE-CREATION 
  tajo-storage/src/main/java/org/apache/tajo/storage/RawFile.java c3a7525154e0f36d51dcca211949f21f57a9f1c8 

Diff: https://reviews.apache.org/r/18210/diff/


Testing
-------

mvn clean install


Thanks,

Hyunsik Choi


Re: Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

Posted by Jung JaeHwa <jh...@gruter.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34855
-----------------------------------------------------------



tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java
<https://reviews.apache.org/r/18210/#comment65247>

    The signature needs to be updated as follows:
    INT8 sum(value INT4)



tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java
<https://reviews.apache.org/r/18210/#comment65248>

    The signature needs to be updated as follows:
    INT8 sum(value INT8)
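The point of the INT8 return type is that a sum of many INT4 values can exceed the int range. A minimal Java sketch of a distinct sum with that signature follows; `SumIntDistinctSketch`, `eval`, and `terminate` are hypothetical names that do not use Tajo's aggregate-function API, and the hash set used to drop duplicate values is an assumption (the patch's SumIntDistinct may instead rely on the sorted input produced by the sort aggregation).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of INT8 sum(distinct value INT4); not Tajo's AggFunction API. */
public class SumIntDistinctSketch {
  private final Set<Integer> seen = new HashSet<>();
  private long sum = 0L;  // INT8 accumulator: the total of INT4 values can exceed Integer.MAX_VALUE

  public void eval(int value) {
    if (seen.add(value)) {  // add() returns false for a value that was already summed
      sum += value;
    }
  }

  public long terminate() {
    return sum;
  }

  public static void main(String[] args) {
    SumIntDistinctSketch agg = new SumIntDistinctSketch();
    for (int v : new int[] {Integer.MAX_VALUE, Integer.MAX_VALUE, 1}) {
      agg.eval(v);
    }
    System.out.println(agg.terminate());  // 2147483648, which would overflow an int accumulator
  }
}
```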


- Jung JaeHwa


On Feb. 18, 2014, 12:03 p.m., Hyunsik Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18210/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 12:03 p.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-601
>     https://issues.apache.org/jira/browse/TAJO-601
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> Currently, distinct aggregation queries are executed as follows:
> * The first stage only shuffles tuples by hashing the grouping keys.
> * The second stage sorts them and performs a sort-based aggregation.
> 
> This approach executes queries containing distinct aggregation functions in only two stages, but it produces a large volume of intermediate data during the shuffle phase.
> 
> This kind of query can be rewritten as two nested aggregation queries:
> 
> [Original query]
> ----------
> SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col from rel1 group by grp1, grp2;
> ----------
> 
> [Rewritten query]
> ----------
> SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
>   SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2, grp3
> ) tmp1 group by grp1, grp2;
> ----------
> 
> I expect this rewrite to significantly reduce the intermediate data volume and the query response time in most cases, since the inner query can collapse duplicate (grp1, grp2, grp3) rows before they are shuffled.
> 
> 
> Diffs
> -----
> 
>   tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d4 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd720 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfa 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 624518b 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java efa1e05 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658d 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java a0c0eeb 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb4 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5eb 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e38 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d756242 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c028 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql 6fe604e 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql 6bf8a8a 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation2.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result f2ad32a 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result 9164120 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation1.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation2.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18210/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Hyunsik Choi
> 
>


Re: Review Request 18210: TAJO-601: Improve distinct aggregation query processing.

Posted by Jung JaeHwa <jh...@gruter.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34856
-----------------------------------------------------------


Hi Hyunsik.
I'm reviewing your patch.
So far I have found some typos. After I review the rest of the code, I'll comment again.

- Jung JaeHwa


On Feb. 18, 2014, 12:03 p.m., Hyunsik Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18210/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 12:03 p.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-601
>     https://issues.apache.org/jira/browse/TAJO-601
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> Currently, distinct aggregation queries are executed as follows:
> * The first stage only shuffles tuples by hashing the grouping keys.
> * The second stage sorts them and performs a sort-based aggregation.
> 
> This approach executes queries containing distinct aggregation functions in only two stages, but it produces a large volume of intermediate data during the shuffle phase.
> 
> This kind of query can be rewritten as two nested aggregation queries:
> 
> [Original query]
> ----------
> SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col from rel1 group by grp1, grp2;
> ----------
> 
> [Rewritten query]
> ----------
> SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
>   SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2, grp3
> ) tmp1 group by grp1, grp2;
> ----------
> 
> I expect this rewrite to significantly reduce the intermediate data volume and the query response time in most cases, since the inner query can collapse duplicate (grp1, grp2, grp3) rows before they are shuffled.
> 
> 
> Diffs
> -----
> 
>   tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d4 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd720 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfa 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 624518b 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java efa1e05 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658d 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java a0c0eeb 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb4 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5eb 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e38 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d756242 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c028 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql 6fe604e 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql 6bf8a8a 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation2.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result f2ad32a 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result 9164120 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation1.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation2.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18210/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Hyunsik Choi
> 
>