Posted to user@hive.apache.org by Ajo Fod <aj...@gmail.com> on 2011/01/17 20:03:25 UTC

On bucketing: fewer files than buckets.

Hello,

I read in the documentation that each partition gets as many files as
there are buckets. In the sample script below, I declared 32 buckets for
the table alltrades, but I find only 2 files in each partition directory.
Am I missing something?
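One way to compare the declared bucket count with the files actually written is to run the following from the Hive CLI. This is a sketch: the path assumes the default warehouse location (hive.metastore.warehouse.dir = /user/hive/warehouse), which may differ on this installation.

```sql
-- Show table metadata; the storage descriptor should report numBuckets:32.
DESCRIBE EXTENDED alltrades;

-- List the partition directory from inside the Hive CLI.
-- Assumed default warehouse path; adjust if hive.metastore.warehouse.dir differs.
dfs -ls /user/hive/warehouse/alltrades/dt=2001-05-22;
```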

The script loads a tab-separated file from local disk into the table
trades and then copies the data into alltrades, following the example at:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL/BucketedTables
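The same wiki page also describes a manual alternative to hive.enforce.bucketing: set the number of reducers equal to the number of buckets and distribute rows by the bucketing column. The wiki example uses CLUSTER BY. The sketch below substitutes DISTRIBUTE BY symbol SORT BY time so that each reducer's output also follows the table's SORTED BY (time ASC). That substitution is an adaptation, not taken from the wiki.

```sql
-- Manual alternative: one reducer per bucket (must equal INTO 32 BUCKETS).
SET mapred.reduce.tasks = 32;

FROM trades
INSERT OVERWRITE TABLE alltrades
PARTITION (dt='2001-05-22')
SELECT symbol, time, exchange, price, volume
WHERE dt='2001-05-22'
-- Route rows to reducers by the bucketing column, then sort each
-- reducer's output by time (adapted from the wiki's CLUSTER BY example).
DISTRIBUTE BY symbol SORT BY time ASC;
```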

BTW, another question: how does one put comments in a Hive .q script file?
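For reference, Hive scripts run with `hive -f` treat lines beginning with two dashes as comments. The sketch below keeps every comment on its own line. It does not rely on trailing comments after a statement, since older CLI versions may only recognize comments at the start of a line (an assumption worth checking against the installed version).

```sql
-- Enable bucketed inserts for this session.
SET hive.enforce.bucketing=true;

-- Comments go on their own lines, before or between statements.
SELECT COUNT(1) FROM alltrades WHERE dt='2001-05-22';
```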

-------- sample script ------------
SET hive.enforce.bucketing=TRUE;

CREATE TABLE trades
       (symbol STRING, time STRING, exchange STRING, price FLOAT, volume INT)
PARTITIONED BY (dt STRING)
CLUSTERED BY (symbol)
SORTED BY (time ASC)
INTO 1 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH 'data/2001-05-22'
     INTO TABLE trades
     PARTITION (dt='2001-05-22');

CREATE TABLE alltrades
       (symbol STRING, time STRING, exchange STRING, price FLOAT, volume INT)
PARTITIONED BY (dt STRING)
CLUSTERED BY (symbol)
SORTED BY (time ASC)
INTO 32 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

FROM trades
INSERT OVERWRITE TABLE alltrades
PARTITION (dt='2001-05-22')
SELECT symbol, time, exchange, price, volume
WHERE dt='2001-05-22';
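If the insert did bucket the data, each bucket of the partition should contain only the symbols that hash to it. A sketch for inspecting a single bucket with Hive's TABLESAMPLE clause (bucket 1 of 32, on the bucketing column symbol):

```sql
-- Read only the first of the 32 buckets in one partition.
SELECT symbol, time, price
FROM alltrades TABLESAMPLE(BUCKET 1 OUT OF 32 ON symbol)
WHERE dt='2001-05-22';
```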