Posted to issues@impala.apache.org by "Matthew Jacobs (JIRA)" <ji...@apache.org> on 2017/05/22 19:20:04 UTC

[jira] [Resolved] (IMPALA-5343) Sort by Column(s) added as part of inserting into Kudu table is incorrect

     [ https://issues.apache.org/jira/browse/IMPALA-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Jacobs resolved IMPALA-5343.
------------------------------------
       Resolution: Not A Problem
    Fix Version/s: Impala 2.9.0

The plan and sort are correct. The "KuduPartition" expr is there because multiple partitions can end up at a given sink fragment, and we want the rows inserted into Kudu to be grouped per partition and then ordered by PK within each partition.
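
To illustrate the ordering described above, here is a minimal Python sketch. It is not Impala or Kudu code: toy_partition() is a hypothetical stand-in for KuduPartition() (Kudu's real hash function differs), NUM_PARTITIONS and the sample rows are made up, and only the three primary key columns from the DDL are modelled. It compares a sort on the PK alone with a sort on (partition, PK) for rows that arrive at a single sink fragment.

```python
from typing import NamedTuple

NUM_PARTITIONS = 4  # illustrative only; the table in this issue has 140


class Row(NamedTuple):
    # Primary key columns, in PK order, as in the DDL.
    l_shipdate: str
    l_orderkey: int
    l_linenumber: int


def toy_partition(l_orderkey: int) -> int:
    """Hypothetical stand-in for KuduPartition(l_orderkey)."""
    return l_orderkey % NUM_PARTITIONS


def sink_sort_key(row: Row):
    # Partition index first, then the primary key columns in PK order.
    return (toy_partition(row.l_orderkey),
            row.l_shipdate, row.l_orderkey, row.l_linenumber)


def partition_runs(ordered) -> int:
    """Count the runs of consecutive rows that target the same partition."""
    parts = [toy_partition(r.l_orderkey) for r in ordered]
    return 1 + sum(a != b for a, b in zip(parts, parts[1:]))


# Made-up rows that all landed on the same sink fragment.
rows = [
    Row("1995-03-02", 6, 1),
    Row("1994-01-15", 5, 1),
    Row("1996-07-09", 1, 2),
    Row("1993-11-30", 2, 1),
    Row("1996-07-09", 1, 1),
]

by_pk_only = sorted(rows)                               # PK columns only
by_partition_then_pk = sorted(rows, key=sink_sort_key)  # as in the plan

print(partition_runs(by_pk_only))            # partitions interleave
print(partition_runs(by_partition_then_pk))  # one run per partition
for r in by_partition_then_pk:
    print(toy_partition(r.l_orderkey), r)
```

With the PK-only sort the rows for different partitions interleave, while the (partition, PK) sort produces one contiguous, PK-ordered run per partition, which is the shape the resolution comment describes.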

> Sort by Column(s) added as part of inserting into Kudu table is incorrect 
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-5343
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5343
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Mostafa Mokhtar
>            Assignee: Thomas Tauber-Marshall
>            Priority: Critical
>              Labels: kudu
>             Fix For: Impala 2.9.0
>
>
> The planner includes KuduPartition(PARTITION_COLUMN) among the columns in the sort's order by clause. The sort should match the columns of the primary key.
> Plan
> {code}
> Query: explain insert into lineitem_kudu_ts  select * from lineitem_kudu
> | INSERT INTO KUDU [scan_primitives_tpch_3tb.lineitem_kudu_ts]                                                                                                                    |
> | |                                                                                                                                                                               |
> | 02:SORT                                                                                                                                                                         |
> | |  order by: KuduPartition(scan_primitives_tpch_3tb.lineitem_kudu.l_orderkey) ASC NULLS LAST, l_shipdate ASC NULLS LAST, l_orderkey ASC NULLS LAST, l_linenumber ASC NULLS LAST |
> | |                                                                                                                                                                               |
> | 01:EXCHANGE [KUDU(KuduPartition(scan_primitives_tpch_3tb.lineitem_kudu.l_orderkey))]                                                                                            |
> | |                                                                                                                                                                               |
> | 00:SCAN KUDU [scan_primitives_tpch_3tb.lineitem_kudu]                                                                                                                           |
> {code}
> DDL 
> {code}
> [vd1302.halxg.cloudera.com:21000] > show create table scan_primitives_tpch_3tb.lineitem_kudu_ts;
> Query: show create table scan_primitives_tpch_3tb.lineitem_kudu_ts
>  CREATE TABLE scan_primitives_tpch_3tb.lineitem_kudu_ts (                                                
>    l_shipdate STRING NOT NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                    
>    l_orderkey BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                      
>    l_linenumber BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                    
>    l_partkey BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                       
>    l_suppkey BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                       
>    l_quantity DOUBLE NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                          
>    l_extendedprice DOUBLE NULL ENCODING PLAIN_ENCODING COMPRESSION LZ4,                                  
>    l_discount DOUBLE NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                          
>    l_tax DOUBLE NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                               
>    l_returnflag STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                      
>    l_linestatus STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                      
>    l_commitdate TIMESTAMP NULL ENCODING BIT_SHUFFLE COMPRESSION LZ4,                                     
>    l_receiptdate STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                     
>    l_shipinstruct STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                    
>    l_shipmode STRING NULL ENCODING DICT_ENCODING COMPRESSION LZ4,                                        
>    l_comment STRING NULL ENCODING PLAIN_ENCODING COMPRESSION LZ4,                                        
>    PRIMARY KEY (l_shipdate, l_orderkey, l_linenumber)                                                    
>  )                                                                                                       
>  PARTITION BY HASH (l_orderkey) PARTITIONS 140                                                           
>  STORED AS KUDU                                                                                          
>  TBLPROPERTIES ('kudu.master_addresses'='vd1301.halxg.cloudera.com:7051,vd1128.halxg.cloudera.com:7051') 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)