Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/08/07 01:36:47 UTC

[Hadoop Wiki] Update of "Hive/LanguageManual/Sampling" by AMammenT

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AMammenT:
http://wiki.apache.org/hadoop/Hive/LanguageManual/Sampling

The comment on the change is:
Clear up confusion around cluster vs. bucket and how they interact.  

------------------------------------------------------------------------------
  
  So in the above example, if table 'source' was created with 'CLUSTERED BY id INTO 32 BUCKETS' 
  {{{
-     TABLESAMPLE(BUCKET 3 OUT OF 16) 
+     TABLESAMPLE(BUCKET 3 OUT OF 16 ON id) 
  }}}
- would pick out the 3rd and 19th buckets. 
+ would pick out the 3rd and 19th clusters as each bucket would be composed of (32/16)=2 clusters. 
  
  On the other hand the tablesample clause
  {{{
      TABLESAMPLE(BUCKET 3 OUT OF 64 ON id) 
  }}}
- would pick out half of the 3rd bucket. 
+ would pick out half of the 3rd cluster as each bucket would be composed of (32/64)=1/2 of a cluster.
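
The arithmetic in the two examples above can be checked with a small sketch. It is an illustration only, not Hive's implementation: the function name sampled_clusters is hypothetical, clusters and buckets are numbered from 1, and it assumes a row lands in a cluster by (hash of id) mod 32 and in a sample bucket by (hash of id) mod y, so that the two numbers line up only when one divides the other.

```python
from fractions import Fraction


def sampled_clusters(x, y, num_clusters):
    """Map TABLESAMPLE(BUCKET x OUT OF y) onto the table's clusters.

    Hypothetical helper for illustration. Returns a dict of
    {cluster_number: fraction_of_that_cluster_selected}, 1-indexed.
    Assumes 1 <= x <= y and that y divides num_clusters or vice versa.
    """
    if not 1 <= x <= y:
        raise ValueError("x must satisfy 1 <= x <= y")

    if num_clusters % y == 0:
        # Each sample bucket is made of (num_clusters / y) whole clusters:
        # x, x + y, x + 2y, ...
        return {c: Fraction(1) for c in range(x, num_clusters + 1, y)}

    if y % num_clusters == 0:
        # Each sample bucket is (num_clusters / y) of a single cluster.
        cluster = (x - 1) % num_clusters + 1
        return {cluster: Fraction(num_clusters, y)}

    raise ValueError("sketch only covers the case where one count divides the other")


# BUCKET 3 OUT OF 16 on a table with 32 clusters: clusters 3 and 19, in full.
print(sampled_clusters(3, 16, 32))
# BUCKET 3 OUT OF 64 on the same table: half of cluster 3.
print(sampled_clusters(3, 64, 32))
```

With 32 clusters, the first call selects clusters 3 and 19 in full, and the second selects one half of cluster 3, matching the text above.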