Posted to user@hive.apache.org by Vijay Ramachandran <vi...@linkedin.com> on 2016/12/01 03:08:05 UTC
Bucketed table info
Hi.
If I have an ORC table bucketed and sorted on a column, where does Hive
keep the mapping from column value to bucket? Specifically, if I know the
column value and need to find the specific HDFS file, is there an API to
do this?
Relatedly, is there any documentation on how the read path works for
bucketed, sorted tables?
Thanks
Re: Bucketed table info
Posted by Gopal Vijayaraghavan <go...@apache.org>.
> If I have an orc table bucketed and sorted on a column, where does hive keep the mapping from column value to bucket? Specifically, if I know the column value, and need to find the specific hdfs file, is there an api to do this?
The closest thing to an API is ObjectInspectorUtils.getBucketNumber().
The Tez bucket pruning optimizer should be helpful in understanding how that can be used.
It prunes all other buckets for a query like "select * from table where id=?" if the table is bucketed on id.
Planning side:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/FixedBucketPruningOptimizer.java#L223
Execution side:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java#L233
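To make the value-to-bucket mapping concrete, here is a minimal standalone sketch of the arithmetic, not Hive's actual code. It assumes the table is bucketed on a single int column, that an int hashes to its own value, and that the bucket is the non-negative hash modulo the bucket count. The class and method names are hypothetical, and the "000009_0" file naming is an assumption about how bucket files are typically laid out, so check it against the files in your table directory. Newer Hive versions may use a different hash function for bucketing, in which case only the overall shape carries over.

```java
public class BucketSketch {

    // Assumption: an int bucketing column hashes to its own value.
    public static int hashInt(int value) {
        return value;
    }

    // Clear the sign bit so negative hashes still map to a valid bucket,
    // then take the remainder by the number of buckets.
    public static int bucketNumber(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // Assumption: bucket files are named by zero-padded bucket index
    // followed by "_0". Verify against the actual HDFS listing.
    public static String bucketFileName(int bucket) {
        return String.format("%06d_0", bucket);
    }

    public static void main(String[] args) {
        int numBuckets = 16; // hypothetical: CLUSTERED BY (id) INTO 16 BUCKETS
        int id = 12345;      // the known column value

        int bucket = bucketNumber(hashInt(id), numBuckets);
        System.out.println("id=" + id + " -> bucket " + bucket
                + ", file " + bucketFileName(bucket));
    }
}
```

In a real client you would call ObjectInspectorUtils.getBucketNumber() with the column's ObjectInspector rather than hand-rolling the hash, since the hash depends on the column type.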
Cheers,
Gopal