Posted to issues@trafodion.apache.org by "Weishiun Tsai (JIRA)" <ji...@apache.org> on 2016/03/11 19:24:38 UTC

[jira] [Created] (TRAFODION-1889) Unique constraints converted from HIVE SORTED BY causes query to return wrong results

Weishiun Tsai created TRAFODION-1889:
----------------------------------------

             Summary: Unique constraints converted from HIVE SORTED BY causes query to return wrong results
                 Key: TRAFODION-1889
                 URL: https://issues.apache.org/jira/browse/TRAFODION-1889
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-cmp
    Affects Versions: 2.0-incubating
            Reporter: Weishiun Tsai


When a Hive table is defined with the SORTED BY option, the SORTED BY columns are currently converted to key columns and a unique constraint is generated for them. Hive's SORTED BY only orders the rows within each bucket and does not guarantee uniqueness, so this constraint can be false, and a Trafodion query on such a Hive table can return wrong results. The suggestion is to suppress unique constraints for Hive tables.

Here is some initial analysis and a small test case to see the problem:

The issue is that we define a "key" on Hive tables when they use the SORTED BY clause. The LINEITEM table in this case has such a SORTED BY clause. We create a uniqueness constraint for the key of a Hive table, and that later causes a group-by to be eliminated when it shouldn't be: the constraint claims the key values are already distinct, even though the sort columns can contain duplicates.

Here is a simple test case (create the table in Hive, do the select in Trafodion):

-- Hive:

create table lineitem1(L_ORDERKEY int, L_PARTKEY int, L_SUPPKEY int, L_QUANTITY string,
                       L_EXTENDEDPRICE string, L_DISCOUNT string, L_TAX string, L_RETURNFLAG string,
                       L_LINESTATUS string, L_SHIPDATE string, L_COMMITDATE string, L_RECEIPTDATE string,
                       L_SHIPINSTRUCT string, L_SHIPMODE string, L_COMMENT string)
partitioned by (L_LINENUMBER int)
clustered by (L_ORDERKEY) sorted by (L_ORDERKEY) into 4 buckets
row format delimited fields terminated by '|';

-- sqlci:
explain options 'f' select distinct l_orderkey from hive.hive.lineitem1;
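
To make the failure mode concrete, here is a toy model in Python of what happens when a duplicate-eliminating group-by is dropped because of a false uniqueness assumption. This is not Trafodion code: the function select_distinct, its assume_unique flag, and the sample rows are all made up for illustration, and the real optimizer logic may differ in detail.

```python
# Toy model only: hypothetical names, not Trafodion's optimizer code.

# Several line items can share one order key, so L_ORDERKEY is sorted
# but NOT unique (sample data invented for this sketch).
rows = [
    {"l_orderkey": 1, "l_linenumber": 1},
    {"l_orderkey": 1, "l_linenumber": 2},
    {"l_orderkey": 2, "l_linenumber": 1},
]


def select_distinct(rows, column, assume_unique):
    """Model SELECT DISTINCT <column>, optionally skipping the group-by."""
    values = [row[column] for row in rows]
    if assume_unique:
        # A unique constraint says the values are already distinct,
        # so the duplicate-eliminating group-by is removed.
        return values
    # Without the constraint, the group-by removes duplicates.
    return sorted(set(values))


correct = select_distinct(rows, "l_orderkey", assume_unique=False)
wrong = select_distinct(rows, "l_orderkey", assume_unique=True)

print(correct)  # [1, 2]
print(wrong)    # [1, 1, 2]
```

With the false uniqueness assumption, the duplicate order key comes back in the result, which matches the kind of wrong result described above.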



