Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/08/07 01:36:47 UTC
[Hadoop Wiki] Update of "Hive/LanguageManual/Sampling" by AMammenT
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by AMammenT:
http://wiki.apache.org/hadoop/Hive/LanguageManual/Sampling
The comment on the change is:
Clear up confusion around cluster vs. bucket and how they interact.
------------------------------------------------------------------------------
So in the above example, if table 'source' was created with 'CLUSTERED BY id INTO 32 BUCKETS'
{{{
- TABLESAMPLE(BUCKET 3 OUT OF 16)
+ TABLESAMPLE(BUCKET 3 OUT OF 16 ON id)
}}}
- would pick out the 3rd and 19th buckets.
+ would pick out the 3rd and 19th clusters as each bucket would be composed of (32/16)=2 clusters.
  On the other hand, the tablesample clause
{{{
TABLESAMPLE(BUCKET 3 OUT OF 64 ON id)
}}}
- would pick out half of the 3rd bucket.
+ would pick out half of the 3rd cluster as each bucket would be composed of (32/64)=1/2 of a cluster.
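
The arithmetic in the two examples above can be sketched in a few lines of Python. This is only an illustrative model, not Hive's implementation: the helper name `sampled_clusters` is hypothetical, clusters are numbered from 1 as in the text, and the sketch assumes that the number of clusters and the sampling denominator divide one another evenly.

```python
from fractions import Fraction


def sampled_clusters(x, y, num_clusters):
    """Model TABLESAMPLE(BUCKET x OUT OF y ON col) against a table
    created with CLUSTERED BY col INTO num_clusters BUCKETS.

    Returns (cluster_numbers, fraction), where cluster_numbers are the
    1-based clusters touched and fraction is the share of each one read.
    Hypothetical helper for illustration only.
    """
    if not 1 <= x <= y:
        raise ValueError("bucket number must satisfy 1 <= x <= y")
    if num_clusters % y == 0:
        # Each sampling bucket is made of num_clusters / y whole clusters:
        # x, x + y, x + 2y, ...
        return list(range(x, num_clusters + 1, y)), Fraction(1)
    if y % num_clusters == 0:
        # Each sampling bucket is only num_clusters / y of a single cluster.
        return [(x - 1) % num_clusters + 1], Fraction(num_clusters, y)
    raise ValueError("this sketch only handles evenly dividing sizes")


whole = sampled_clusters(3, 16, 32)   # BUCKET 3 OUT OF 16, 32 clusters
half = sampled_clusters(3, 64, 32)    # BUCKET 3 OUT OF 64, 32 clusters
```

Under these assumptions, `whole` names clusters 3 and 19 read in full, and `half` names cluster 3 with a fraction of 1/2, matching the two cases described above.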