
Posted to dev@hive.apache.org by "Ryan P (JIRA)" <ji...@apache.org> on 2015/04/09 21:27:14 UTC

[jira] [Created] (HIVE-10283) HIVE-4240 may be causing issue with bucketed tables

Ryan P created HIVE-10283:
-----------------------------

             Summary: HIVE-4240 may be causing issue with bucketed tables 
                 Key: HIVE-10283
                 URL: https://issues.apache.org/jira/browse/HIVE-10283
             Project: Hive
          Issue Type: Bug
          Components: Hive
            Reporter: Ryan P


I suspect that HIVE-4240, by removing the reducer, may be causing issues. Because of this, inserts will not consolidate 'buckets' into single files, which is problematic when attempting to use bucketmapjoin.

CREATE TABLE IF NOT EXISTS buckettestinput( 
data string 
) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 


CREATE TABLE IF NOT EXISTS buckettestoutput1( 
data string 
) CLUSTERED BY (data)
INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 


CREATE TABLE IF NOT EXISTS buckettestoutput2( 
data string 
) CLUSTERED BY (data)
INTO 2 BUCKETS 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
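
For reference, here is a minimal Python model of how a row might be assigned to one of the 2 buckets declared by CLUSTERED BY (data) INTO 2 BUCKETS. It assumes a Java-style 31-multiplier hash over the string's UTF-8 bytes, masked to a non-negative value and taken modulo the bucket count. The function names are hypothetical, and Hive's exact hash may differ between versions.

```python
# Hypothetical model of bucket assignment for CLUSTERED BY (data) INTO 2 BUCKETS.
# Assumption: a Java-style 31-multiplier hash over the UTF-8 bytes (bytes treated
# as signed, as in Java), wrapped to a signed 32-bit int. Not Hive's actual code.

def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, like Java int overflow."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

def string_hash(s: str) -> int:
    h = 0
    for b in s.encode("utf-8"):
        signed_b = b - 256 if b > 127 else b  # Java bytes are signed
        h = to_int32(h * 31 + signed_b)
    return h

def bucket_for(value: str, num_buckets: int) -> int:
    # Clear the sign bit so the result is non-negative, then take the modulus.
    return (string_hash(value) & 0x7FFFFFFF) % num_buckets

if __name__ == "__main__":
    rows = [f"firstinsert{i}" for i in range(1, 9)] + [f"secondinsert{i}" for i in range(1, 9)]
    for r in rows:
        print(r, "-> bucket", bucket_for(r, 2))
```

The point of the model is only that every row maps deterministically to one bucket. A table that is bucketed correctly should therefore hold exactly one file per bucket.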

Then I inserted the following data into the "buckettestinput" table:

firstinsert1 
firstinsert2 
firstinsert3 
firstinsert4 
firstinsert5 
firstinsert6 
firstinsert7 
firstinsert8 
secondinsert1 
secondinsert2 
secondinsert3 
secondinsert4 
secondinsert5 
secondinsert6 
secondinsert7 
secondinsert8 

set hive.enforce.bucketing = true; 
set hive.enforce.sorting=true; 

insert into table buckettestoutput1 
select * from buckettestinput where data like 'first%'; 

SELECT * 
FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s; 


insert into table buckettestoutput1 
select * from buckettestinput where data like 'second%'; 

Re-run the TABLESAMPLE query above and check the results.
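
The file-count problem can be sketched with a hypothetical Python model. It assumes that each INSERT INTO writes one new file per bucket alongside the existing files rather than merging into them. The _copy_N file naming is also an assumption, used only for illustration.

```python
# Hypothetical model: each INSERT INTO writes one file per bucket and adds it
# next to the existing files instead of merging them. The "_copy_N" suffix for
# name clashes is an illustrative assumption.

def insert_into(existing_files: list[str], num_buckets: int) -> list[str]:
    """Return the table's file list after one more INSERT INTO."""
    new_files = list(existing_files)
    for bucket in range(num_buckets):
        base = f"{bucket:06d}_0"
        name = base
        copy_n = 0
        # Pick a name that does not collide with files already in the table.
        while name in new_files:
            copy_n += 1
            name = f"{base}_copy_{copy_n}"
        new_files.append(name)
    return new_files

files: list[str] = []
files = insert_into(files, 2)  # insert ... where data like 'first%'
files = insert_into(files, 2)  # insert ... where data like 'second%'
print(files)
# ['000000_0', '000001_0', '000000_0_copy_1', '000001_0_copy_1']
```

Under that assumption, two inserts into a 2-bucket table leave 4 files. This matches the counts reported in the error message further down.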

For a sort-merge bucket map join:

set hive.auto.convert.sortmerge.join=true; 
set hive.optimize.bucketmapjoin = true; 
set hive.optimize.bucketmapjoin.sortedmerge = true; 
set hive.auto.convert.sortmerge.join.noconditionaltask=true; 


select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); 

hive> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data); 
FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4 
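
Below is a hypothetical Python sketch of the consistency check that this error message describes. It assumes the planner simply compares the number of files in the table directory with the declared bucket count. The names SemanticError, bucket_metadata_ok and check_bucket_metadata are illustrative and are not Hive internals.

```python
# Hypothetical re-creation of the check behind Error 10141 (not Hive's code).
# Assumption: a bucketed map join requires exactly one file per bucket.

class SemanticError(Exception):
    """Illustrative stand-in for the planner's SemanticException."""

def bucket_metadata_ok(num_buckets: int, files: list[str]) -> bool:
    return len(files) == num_buckets

def check_bucket_metadata(table: str, num_buckets: int, files: list[str]) -> None:
    if not bucket_metadata_ok(num_buckets, files):
        raise SemanticError(
            f"The number of buckets for table {table} is {num_buckets}, "
            f"whereas the number of files is {len(files)}"
        )

files = ["000000_0", "000001_0", "000000_0_copy_1", "000001_0_copy_1"]
try:
    check_bucket_metadata("buckettestoutput1", 2, files)
except SemanticError as err:
    print(err)
# The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4
```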




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)