Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/03/10 23:31:29 UTC

[Hadoop Wiki] Update of "Hive/Tutorial" by PrasadChakka

The following page has been changed by PrasadChakka:
http://wiki.apache.org/hadoop/Hive/Tutorial

------------------------------------------------------------------------------
      COMMENT 'This is the page view table' 
      PARTITIONED BY(dt STRING, country STRING) 
      ROW FORMAT DELIMITED
-             FIELDS TERMINATED BY '\001' 
+             FIELDS TERMINATED BY '1' 
-             LINES TERMINATED BY '\012' 
+             LINES TERMINATED BY '12' 
      STORED AS SEQUENCEFILE; 
  }}}
  
@@ -217, +217 @@

      PARTITIONED BY(dt STRING, country STRING) 
      CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS 
      ROW FORMAT DELIMITED
-             FIELDS TERMINATED BY '\001' 
+             FIELDS TERMINATED BY '1' 
-             COLLECTION ITEMS TERMINATED BY '\002' 
+             COLLECTION ITEMS TERMINATED BY '2' 
-             MAP KEYS TERMINATED BY '\003' 
+             MAP KEYS TERMINATED BY '3' 
-             LINES TERMINATED BY '\012' 
+             LINES TERMINATED BY '12' 
      STORED AS SEQUENCEFILE; 
  }}}
  In the example above, the table is bucketed (clustered) by userid, and within each bucket the data is sorted in increasing order of viewTime. Such an organization allows the user to sample efficiently on the clustered column, in this case userid. The sorting property lets internal operators exploit the known ordering of the data while evaluating queries, which also improves efficiency.
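
  The bucketing idea described above can be illustrated with a minimal Python sketch. It is not Hive's implementation: the modulo hash in `bucket_for`, the helper names, and the sample rows are all assumptions made for illustration. It only shows why reading a single bucket yields a cheap sample on userid, and why rows inside a bucket arrive already ordered by viewTime.

```python
from collections import defaultdict

NUM_BUCKETS = 32  # mirrors INTO 32 BUCKETS in the DDL above


def bucket_for(userid, num_buckets=NUM_BUCKETS):
    # Assumption: a stand-in hash (non-negative integer id modulo bucket count).
    # The hash Hive actually uses is not described on this page.
    return userid % num_buckets


def build_buckets(rows, num_buckets=NUM_BUCKETS):
    """Group (userid, viewTime) rows by bucket, then sort each bucket by viewTime."""
    buckets = defaultdict(list)
    for userid, view_time in rows:
        buckets[bucket_for(userid, num_buckets)].append((userid, view_time))
    for bucket_rows in buckets.values():
        bucket_rows.sort(key=lambda row: row[1])  # SORTED BY(viewTime)
    return buckets


def sample_bucket(buckets, bucket_id):
    """Read one bucket only, instead of scanning every row."""
    return buckets.get(bucket_id, [])


# Hypothetical (userid, viewTime) rows.
rows = [(1, 300), (33, 100), (2, 50), (65, 200), (34, 10)]
buckets = build_buckets(rows)
sample = sample_bucket(buckets, 1)
print(sample)
```

  Because all rows for a given userid land in the same bucket, a sample taken from one bucket contains complete per-user histories rather than a random scattering of individual rows.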
@@ -233, +233 @@

      PARTITIONED BY(dt STRING, country STRING) 
      CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS 
      ROW FORMAT DELIMITED
-             FIELDS TERMINATED BY '\001' 
+             FIELDS TERMINATED BY '1' 
-             COLLECTION ITEMS TERMINATED BY '\002' 
+             COLLECTION ITEMS TERMINATED BY '2' 
-             MAP KEYS TERMINATED BY '\003' 
+             MAP KEYS TERMINATED BY '3' 
-             LINES TERMINATED BY '\012' 
+             LINES TERMINATED BY '12' 
      STORED AS SEQUENCEFILE; 
  }}}
  
@@ -290, +290 @@

                      ip STRING COMMENT 'IP Address of the User', 
                      country STRING COMMENT 'country of origination') 
      COMMENT 'This is the staging page view table' 
-     ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054' LINES TERMINATED BY '\012' 
+     ROW FORMAT DELIMITED FIELDS TERMINATED BY '54' LINES TERMINATED BY '12' 
      STORED AS TEXTFILE 
      LOCATION '/user/data/stagging/page_view';