Posted to issues@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2015/05/13 08:07:59 UTC

[jira] [Created] (TAJO-1600) Invalid query planning for distinct group-by

Jihoon Son created TAJO-1600:
--------------------------------

             Summary: Invalid query planning for distinct group-by
                 Key: TAJO-1600
                 URL: https://issues.apache.org/jira/browse/TAJO-1600
             Project: Tajo
          Issue Type: Bug
          Components: planner/optimizer
            Reporter: Jihoon Son
             Fix For: 0.11.0


For a query involving the distinct operator, the group-by is always executed as the last step of the query plan, even when the query also has an order-by. Consider the following example query.
{noformat}
default> select distinct a.col3 from test as a left outer join lineitem b on a.col1 = b.l_orderkey order by a.col3;
{noformat}

The plan for this query places GROUP_BY(5) above SORT(3), so the group-by for the distinct runs after the sort:
{noformat}
GROUP_BY(5)(col3)
  => target list: default.a.col3 (TEXT)
  => out schema:{(1) default.a.col3 (TEXT)}
  => in schema:{(1) default.a.col3 (TEXT)}
   SORT(3)
     => Sort Keys: default.a.col3 (TEXT) (asc)
      JOIN(7)(LEFT_OUTER)
        => Join Cond: default.a.col1 (INT4) = default.b.l_orderkey (INT4)
        => target list: default.a.col3 (TEXT)
        => out schema: {(1) default.a.col3 (TEXT)}
        => in schema: {(3) default.a.col3 (TEXT), default.a.col1 (INT4), default.b.l_orderkey (INT4)}
         SCAN(1) on default.lineitem_large as b
           => target list: default.b.l_orderkey (INT4)
           => out schema: {(1) default.b.l_orderkey (INT4)}
           => in schema: {(16) default.b.l_orderkey (INT4), default.b.l_partkey (INT4), default.b.l_suppkey (INT4), default.b.l_linenumber (INT4), default.b.l_quantity (FLOAT8), default.b.l_extendedprice (FLOAT8), default.b.l_discount (FLOAT8), default.b.l_tax (FLOAT8), default.b.l_returnflag (TEXT), default.b.l_linestatus (TEXT), default.b.l_shipdate (TEXT), default.b.l_commitdate (TEXT), default.b.l_receiptdate (TEXT), default.b.l_shipinstruct (TEXT), default.b.l_shipmode (TEXT), default.b.l_comment (TEXT)}
         PARTITIONS_SCAN(8) on default.testbroadcastmulticolumnpartitiontable as a
           => target list: default.a.col3 (TEXT), default.a.col1 (INT4)
           => num of filtered paths: 3
           => out schema: {(2) default.a.col3 (TEXT), default.a.col1 (INT4)}
           => in schema: {(2) default.a.col1 (INT4), default.a.col2 (FLOAT4)}
           => 0: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=01/col4=1996
           => 1: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=10/col4=1993
           => 2: hdfs://localhost:52705/tajo/warehouse/default/testbroadcastmulticolumnpartitiontable/col3=12/col4=1996
{noformat}
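
To make the problem in the plan above easier to check, here is a minimal sketch of a test that flags a GROUP_BY node sitting above a SORT node. It uses a toy plan tree: PlanNode and group_by_above_sort are hypothetical names for illustration, not Tajo's planner classes. The "expected" shape, with SORT as the final step, is my assumption about what a correct plan would look like.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Toy stand-in for a logical plan node (hypothetical, not Tajo's class)."""
    kind: str  # e.g. "GROUP_BY", "SORT", "JOIN", "SCAN"
    children: List["PlanNode"] = field(default_factory=list)


def group_by_above_sort(node: PlanNode, under_group_by: bool = False) -> bool:
    """Return True if any SORT node appears beneath a GROUP_BY node."""
    if node.kind == "SORT" and under_group_by:
        return True
    below = under_group_by or node.kind == "GROUP_BY"
    return any(group_by_above_sort(child, below) for child in node.children)


# Shape of the reported plan: GROUP_BY -> SORT -> JOIN -> two scans.
reported = PlanNode("GROUP_BY", [
    PlanNode("SORT", [
        PlanNode("JOIN", [PlanNode("SCAN"), PlanNode("PARTITIONS_SCAN")]),
    ]),
])

# Assumed correct shape: the sort is the final step, above the group-by.
expected = PlanNode("SORT", [
    PlanNode("GROUP_BY", [
        PlanNode("JOIN", [PlanNode("SCAN"), PlanNode("PARTITIONS_SCAN")]),
    ]),
])

print(group_by_above_sort(reported))  # True
print(group_by_above_sort(expected))  # False
```

A check of this kind could back a regression test for distinct queries with order-by, assuming the real plan tree can be walked in a similar way.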



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)