You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@trafodion.apache.org by "Weishiun Tsai (JIRA)" <ji...@apache.org> on 2016/03/11 19:24:38 UTC
[jira] [Created] (TRAFODION-1889) Unique constraints converted from
HIVE SORTED BY causes query to return wrong results
Weishiun Tsai created TRAFODION-1889:
----------------------------------------
Summary: Unique constraints converted from HIVE SORTED BY causes query to return wrong results
Key: TRAFODION-1889
URL: https://issues.apache.org/jira/browse/TRAFODION-1889
Project: Apache Trafodion
Issue Type: Bug
Components: sql-cmp
Affects Versions: 2.0-incubating
Reporter: Weishiun Tsai
When a hive table is defined with the SORTED BY option, the SORTED BY option is currently converted to key columns and a unique constraint is generated for these columns. This causes a Trafodion query on these hive tables to return wrong results. The suggestion is to suppress unique constraints for hive tables.
Here is some initial analysis and a small test case to see the problem:
The issue is that we define a "key" on Hive tables when they use the SORTED BY clause. The LINEITEM table in this case has such a SORTED BY clause. We create a uniqueness constraint for the key of a Hive table, and that will later cause a groupby to be eliminated when it shouldn't.
Here is a simple test case (create the table in Hive, do the select in Trafodion):
-- Hive:
create table lineitem1(L_ORDERKEY int, L_PARTKEY int, L_SUPPKEY int, L_QUANTITY string,
L_EXTENDEDPRICE string, L_DISCOUNT string, L_TAX string, L_RETURNFLAG string,
L_LINESTATUS string, L_SHIPDATE string, L_COMMITDATE string, L_RECEIPTDATE string,
L_SHIPINSTRUCT string, L_SHIPMODE string, L_COMMENT string)
partitioned by (L_LINENUMBER int)
clustered by (L_ORDERKEY) sorted by (L_ORDERKEY) into 4 buckets
row format delimited fields terminated by '|';
-- sqlci:
explain options 'f' select distinct l_orderkey from hive.hive.lineitem1;
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)