Posted to issues@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2014/02/10 17:05:19 UTC
[jira] [Created] (TAJO-593) outer groupby and groupby in derived table causes only one shuffle output number
Hyunsik Choi created TAJO-593:
---------------------------------
Summary: outer groupby and groupby in derived table causes only one shuffle output number
Key: TAJO-593
URL: https://issues.apache.org/jira/browse/TAJO-593
Project: Tajo
Issue Type: Bug
Components: distributed query plan
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
Fix For: 0.8-incubating
See the following query case:
{code:sql}
select count(*) from (select l_orderkey, l_partkey, count(*) from lineitem group by l_orderkey, l_partkey) t1;
{code}
In this case, SubQuery::calculateShuffleOutputNum() is called twice to choose the number of shuffle outputs. Each time, the method looks for a GroupByNode to determine the number of grouping keys. Here is the bug: SubQuery::calculateShuffleOutputNum() always uses the topmost GroupByNode. In most cases this works well, but an outer groupby combined with a groupby in a derived table causes a problem. In this case, we must use the bottommost GroupByNode. In fact, using the bottommost node is always correct.
This patch fixes SubQuery::calculateShuffleOutputNum() to use the bottommost GroupByNode.
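The difference between the two lookups can be illustrated with a minimal, self-contained sketch. It does not use Tajo's real classes: PlanNode, findTopmostGroupBy(), findBottommostGroupBy() and examplePlan() are hypothetical names invented for illustration, and the plan is reduced to a single chain of nodes. The example plan mirrors the query above, assuming the outer count(*) becomes a GroupByNode with no grouping keys, which is presumably why only one shuffle output was chosen.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class GroupByLookupSketch {
    // Hypothetical, simplified plan node; NOT Tajo's actual LogicalNode API.
    static class PlanNode {
        final String type;               // e.g. "GROUP_BY", "TABLE_SUBQUERY", "SCAN"
        final List<String> groupingKeys; // empty for non-groupby nodes
        final PlanNode child;            // single child; null at the leaf

        PlanNode(String type, List<String> groupingKeys, PlanNode child) {
            this.type = type;
            this.groupingKeys = groupingKeys;
            this.child = child;
        }
    }

    // Behavior described as buggy: stop at the first groupby seen from the root.
    static PlanNode findTopmostGroupBy(PlanNode root) {
        for (PlanNode n = root; n != null; n = n.child) {
            if ("GROUP_BY".equals(n.type)) {
                return n;
            }
        }
        return null;
    }

    // Behavior described as the fix: keep walking and remember the last groupby.
    static PlanNode findBottommostGroupBy(PlanNode root) {
        PlanNode found = null;
        for (PlanNode n = root; n != null; n = n.child) {
            if ("GROUP_BY".equals(n.type)) {
                found = n;
            }
        }
        return found;
    }

    // Assumed plan shape for the query in this issue:
    // outer count(*) -> derived table t1 -> inner group by -> scan of lineitem.
    static PlanNode examplePlan() {
        PlanNode scan = new PlanNode("SCAN", Collections.<String>emptyList(), null);
        PlanNode inner = new PlanNode("GROUP_BY", Arrays.asList("l_orderkey", "l_partkey"), scan);
        PlanNode derived = new PlanNode("TABLE_SUBQUERY", Collections.<String>emptyList(), inner);
        return new PlanNode("GROUP_BY", Collections.<String>emptyList(), derived);
    }

    public static void main(String[] args) {
        PlanNode plan = examplePlan();
        System.out.println("topmost keys: " + findTopmostGroupBy(plan).groupingKeys.size());
        System.out.println("bottommost keys: " + findBottommostGroupBy(plan).groupingKeys.size());
    }
}
```

In this sketch the topmost lookup returns the outer groupby, which has no grouping keys, while the bottommost lookup returns the inner groupby with its two keys (l_orderkey, l_partkey). How the key count then maps to a shuffle output number is left out, since the issue does not describe it.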
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)