Posted to user@hive.apache.org by Vijay Ramachandran <vi...@linkedin.com> on 2016/12/01 03:08:05 UTC

Bucketed table info

Hi.

If I have an ORC table bucketed and sorted on a column, where does Hive
keep the mapping from column value to bucket? Specifically, if I know the
column value and need to find the specific HDFS file, is there an API to
do this?

Related, is there any documentation on how the read path works for
bucketed, sorted tables?

Thanks

Re: Bucketed table info

Posted by Gopal Vijayaraghavan <go...@apache.org>.

> If I have an ORC table bucketed and sorted on a column, where does Hive keep the mapping from column value to bucket? Specifically, if I know the column value and need to find the specific HDFS file, is there an API to do this?

The closest to an API is ObjectInspectorUtils.getBucketNumber().

The Tez bucket pruning optimizer should be helpful in understanding how that method can be used.

The optimizer prunes all other buckets for a query like "select * from table where id=?" if the table is bucketed on id.
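
As a rough illustration of the value-to-bucket step, here is a standalone sketch that does not call Hive at all. It assumes the bucket number is the non-negative hash modulo the bucket count, which is my reading of ObjectInspectorUtils.getBucketNumber(), and that an int column hashes to its own value. The class name, method names and the "00000N_0" file naming are assumptions for illustration only, so check them against your Hive version and the actual table directory.

```java
// Standalone sketch; does not depend on Hive. All names here are hypothetical.
final class BucketSketch {

    // Assumed to mirror the arithmetic behind ObjectInspectorUtils.getBucketNumber():
    // clear the sign bit so the hash is non-negative, then take it modulo the bucket count.
    static int bucketFor(int hashCode, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive: " + numBuckets);
        }
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // Assumption: bucket N is stored in a file named like "00000N_0" under the
    // table or partition directory. Verify against a listing of the real directory.
    static String assumedFileName(int bucket) {
        return String.format("%06d_0", bucket);
    }

    public static void main(String[] args) {
        int id = 42;         // value from "where id=?"
        int numBuckets = 8;  // bucket count from the table definition
        // Assumption: an int column's hash is the int value itself.
        int bucket = bucketFor(id, numBuckets);
        System.out.println("bucket=" + bucket + " file=" + assumedFileName(bucket));
    }
}
```

Other column types (strings, dates, multi-column bucketing) hash differently, so for those it is safer to call the Hive method with the proper ObjectInspector than to reimplement the hash.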

Planning side:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java#L223

Execution side:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L233

Cheers,
Gopal