Posted to user@hive.apache.org by "Natarajan, Prabakaran 1. (NSN - IN/Bangalore)" <pr...@nsn.com> on 2014/08/22 08:01:25 UTC
Ideal Bucket Size
Hi
How can I determine an ideal bucket size?
Info:
1) I have 2 billion rows in a Hive table, stored in ORC format.
2) I want to bucket the table on column X.
3) Column X has 100 million unique values.
4) Reason for bucketing: I want an efficient distinct count on X, computed by my own UDAF. Since each value of X would fall in exactly one bucket, the merge function can just do count++ instead of merging the Set.
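
To illustrate the idea in point 4, here is a minimal sketch in Python (not Hive code, and not my actual UDAF). It assumes every value of X is hashed to exactly one bucket, so the per-bucket distinct counts can simply be added. The names bucket_of, distinct_count_by_bucket and avg_per_bucket are hypothetical, Python's built-in hash() is only a stand-in for whatever hash Hive uses, and the bucket count of 256 in the usage lines is an arbitrary example, not a recommendation.

```python
from collections import defaultdict


def bucket_of(value, num_buckets):
    # Stand-in for the bucketing hash (assumption: not Hive's real function).
    # Python's % with a positive modulus always yields a non-negative index.
    return hash(value) % num_buckets


def distinct_count_by_bucket(values, num_buckets):
    # "Map side": keep one set of distinct values per bucket.
    per_bucket = defaultdict(set)
    for v in values:
        per_bucket[bucket_of(v, num_buckets)].add(v)
    # "Merge side": a value can only live in one bucket, so the
    # per-bucket counts are added; the sets are never merged.
    return sum(len(s) for s in per_bucket.values())


def avg_per_bucket(total_rows, distinct_values, num_buckets):
    # Average rows and average distinct values per bucket,
    # assuming the hash spreads values evenly (an idealization).
    return total_rows / num_buckets, distinct_values / num_buckets


sample = [1, 2, 2, 3, 3, 3, 10, 17]
total = distinct_count_by_bucket(sample, 4)
rows, uniques = avg_per_bucket(2_000_000_000, 100_000_000, 256)
```

With the figures above and the example count of 256 buckets, each bucket would hold about 7.8 million rows and about 390 thousand unique values of X on average, which is the size of the Set a single task would have to keep in memory.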
Thanks and Regards
Prabakaran.N aka NP
Nokia Networks, Bangalore
When "I" is replaced by "We" - even Illness becomes "Wellness"