
Posted to issues@carbondata.apache.org by "chenerlu (JIRA)" <ji...@apache.org> on 2017/05/22 10:41:04 UTC

[jira] [Created] (CARBONDATA-1076) Join Issue caused by dictionary and shuffle exchange

chenerlu created CARBONDATA-1076:
------------------------------------

             Summary: Join Issue caused by dictionary and shuffle exchange
                 Key: CARBONDATA-1076
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1076
             Project: CarbonData
          Issue Type: Bug
         Environment: Carbon + spark 2.1
            Reporter: chenerlu
            Assignee: Ravindra Pesala


This issue can be reproduced with the following steps:

Step1: create a carbon table
 
carbon.sql("CREATE TABLE IF NOT EXISTS carbon_table (col1 int, col2 int, col3 int) STORED by 'carbondata' TBLPROPERTIES('DICTIONARY_INCLUDE'='col1,col2,col3','TABLE_BLOCKSIZE'='4')")
 
Step2: load data
carbon.sql("LOAD DATA LOCAL INPATH '/opt/carbon_table' INTO TABLE carbon_table")
 
The carbon_table data file is available in the attachment.
 
Step3: do the query
 
[expected] The Hive table and the Parquet table return the same result as below, and it should be correct.
 
 
[actually] Carbon returns null because of a wrong match.
 
 
Root cause analysis:
 
This query has two subqueries. One subquery does the dictionary decode after the exchange, while the other does the decode before the exchange, and this may lead to wrong matches when the full join is executed.
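
To illustrate how a decode placed on different sides of the exchange could produce a wrong match, here is a minimal sketch in plain Scala. It does not use any Spark or CarbonData API. The dictionary contents, the partition count, and all names (DecodeExchangeSketch, partitionOf, and so on) are hypothetical, and the assumption that the mismatch comes from one side being shuffled by dictionary surrogate keys while the other is shuffled by decoded values is only one plausible reading of the analysis above.

```scala
// Illustrative only: plain Scala, no Spark or CarbonData APIs.
object DecodeExchangeSketch {
  // Hypothetical dictionary: surrogate key -> actual column value.
  val dictionary: Map[Int, Int] = Map(1 -> 100, 2 -> 200, 3 -> 300)

  // Hypothetical number of shuffle partitions.
  val numPartitions = 4

  // Stand-in for the hash partitioning done by a shuffle exchange.
  def partitionOf(key: Int): Int = Math.floorMod(key, numPartitions)

  // Side A: exchange first (partition by surrogate key), then decode.
  def decodeAfterExchange(surrogates: Seq[Int]): Map[Int, Seq[Int]] =
    surrogates
      .groupBy(partitionOf)
      .map { case (p, keys) => p -> keys.map(dictionary) }

  // Side B: decode first, then exchange (partition by actual value).
  def decodeBeforeExchange(surrogates: Seq[Int]): Map[Int, Seq[Int]] =
    surrogates.map(dictionary).groupBy(partitionOf)

  // Partition-local equi-join: only rows in the same partition can match.
  def matchedValues(a: Map[Int, Seq[Int]], b: Map[Int, Seq[Int]]): Set[Int] =
    (for {
      (p, left) <- a.toSeq
      v <- left
      if b.getOrElse(p, Seq.empty).contains(v)
    } yield v).toSet
}
```

With these made-up values, matchedValues(decodeAfterExchange(Seq(1, 2, 3)), decodeBeforeExchange(Seq(1, 2, 3))) is empty, because equal values end up in different partitions, so a full join would pad every row with null. When both sides decode before the exchange, all three values match.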
 
My idea: Can we move the decode before the exchange? I am not very familiar with Carbon query processing, so does anyone have ideas about this?




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)