You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/10/10 01:16:56 UTC

[Hadoop Wiki] Update of "Hive/LanguageManual/DDL" by Ning Zhang

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/DDL" page has been changed by Ning Zhang:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff&rev1=16&rev2=17

    [ROW FORMAT row_format]
    [STORED AS file_format]
    [LOCATION hdfs_path]
+   [AS select_statement]
  
  data_type
    : primitive_type
@@ -57, +58 @@

  Partitioned tables can be created using PARTIONED BY clause. A table can have one or more partition columns and a separate data directory is created for each set of partition columns values. Further tables or partitions can be bucketed using CLUSTERD BY columns and data can be sorted with in that bucket by SORT BY columns. This can improve performance on certain kind of queries.
  
  Table names and column names are case insensitive but SerDe and property names are case sensitive.
+ 
+ Tables can also be created and populated by the results of a query in one CTAS (create-table-as-select) statement. There are two parts in CTAS, the select part can be any select statement supported by HiveQL. The create part of the CTAS takes the result schema (column names are aliases in the select clause and data types are derived from the select expressions) from the select part and create the target table with other properties (such as SerDe and storage format). The only restrictions in the CTAS is that the target table cannot be a partitioned table nor an external table. 
  
  Examples:
  
@@ -119, +122 @@

  
  You can use above statement to create page_view table which points to data location any hdfs location. But you have to make sure that the data is delimited as specified in the query above.
  
+ 
+ {{{
+ CREATE TABLE new_key_value_store
+    ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe" 
+    STORED AS RCFile AS 
+ SELECT (key % 1024) new_key, concat(key, value) key_value_pair 
+ FROM src 
+ SORT BY new_key, key_value_pair;
+ }}}
+ 
+ The above CTAS statement create the target table new_key_value_store with the schema indicated by the results of the SELECT statement. In addition, the new target table is using a specific SerDe and storage format independent of the source tables in the SELECT statement. 
+ 
  ==== Inserting Data Into Bucketed Tables ====
  The CLUSTER BY and SORTED BY creation commands do not effect how data is inserted into a table -- only how it is read.  This means that users must actively insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query.