You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2011/04/15 19:37:05 UTC

[Hadoop Wiki] Update of "Hive/LanguageManual/DDL" by PhiloVivero

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/DDL" page has been changed by PhiloVivero.
The comment on this change is: Add example for changing serde for CSV - this seems a common need, and it took me over an hour to figure this out because no good example exists.
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff&rev1=82&rev2=83

--------------------------------------------------

  = Hive Data Definition Language =
- 
  ## page was renamed from Hive/LanguageManual/LanguageManual/DDL
  <<TableOfContents>>
  
@@ -61, +60 @@

    | RCFILE     (Note:  only available starting with 0.6.0)
    | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
  }}}
- 
  CREATE TABLE creates a table with the given name. An error is thrown if a table or view with the same name already exists. You can use IF NOT EXISTS to skip the error.
  
  The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes in handy if you already have data generated. When dropping an EXTERNAL table, data in the table is NOT deleted from the file system.
@@ -172, +170 @@

  
  See the next section on ALTER TABLE for how to drop partitions.
  
- Otherwise, the table information is removed from the metastore and the raw data is removed as if by 'hadoop dfs -rm'. In many cases, this results in the table data being moved into the user's .Trash folder in their home directory; users who mistakenly DROP TABLEs mistakenly may thus be able to recover their lost data by re-creating a table with the same schema, re-creating any necessary partitions, and then moving the data back into place manually using Hadoop. This solution is subject to change over time or across installations as it relies on the underlying implementation; users are strongly encouraged not to drop tables capriciously. 
+ Otherwise, the table information is removed from the metastore and the raw data is removed as if by 'hadoop dfs -rm'. In many cases, this results in the table data being moved into the user's .Trash folder in their home directory; users who mistakenly DROP TABLEs mistakenly may thus be able to recover their lost data by re-creating a table with the same schema, re-creating any necessary partitions, and then moving the data back into place manually using Hadoop. This solution is subject to change over time or across installations as it relies on the underlying implementation; users are strongly encouraged not to drop tables capriciously.
  
  In Hive 0.70 or later, DROP returns an error if the table doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.
  
@@ -191, +189 @@

  {{{
  ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808' PARTITION (dt='2008-08-09', country='us') location '/path/to/us/part080809';
  }}}
- 
  An error is thrown if the partition_spec for the table already exists. You can use IF NOT EXISTS to skip the error.
  
  === Drop Partitions ===
@@ -209, +206 @@

  {{{
  ALTER TABLE table_name RENAME TO new_table_name
  }}}
- This statement lets you change the name of a table to a different name. 
+ This statement lets you change the name of a table to a different name.
  
  As of version 0.6, a rename on a managed table moves its HDFS location as well.  (Older Hive versions just renamed the table in the metastore without moving the HDFS location.)
  
@@ -258, +255 @@

  }}}
  This statement enables you to add user defined metadata to table SerDe object. The serde properties are passed to the table's SerDe when it is being initialized by Hive to serialize and deserialize data. So users can store any information required for their custom serde here. Refer to SerDe section of Users Guide for more information.
  
+ Example, note that both property_name and property_value must be quoted:
+ 
+ {{{
+ ALTER TABLE table_name SET SERDEPROPERTIES ('field.delim' = ',');
+ }}}
+ 
  === Alter Table File Format and Organization ===
  {{{
  ALTER TABLE table_name [partitionSpec] SET FILEFORMAT file_format
@@ -273, +276 @@

  {{{
  ALTER TABLE table_name [partitionSpec] SET LOCATION "new location"
  }}}
- 
  === Alter Table Touch ===
  {{{
  ALTER TABLE table_name TOUCH;
  ALTER TABLE table_name TOUCH PARTITION partition_spec;
  }}}
- 
  TOUCH reads the metadata, and writes it back. This has the effect of causing the pre/post execute hooks to fire. An example use case is if you have a hook that logs all the tables/partitions that were modified, along with an external script that alters the files on HDFS directly. Since the script modifies files outside of hive, the modification wouldn't be logged by the hook. The external script could call TOUCH to fire the hook and mark the said table or partition as modified.
  
  Also, it may be useful later if we incorporate reliable last modified times. Then touch would update that time as well.
  
- Note that TOUCH doesn't create a table or partition if it doesn't already exist. (See [[Hive/LanguageManual/DDL#Create.2BAC8-Drop_Table|Create Table]])
+ Note that TOUCH doesn't create a table or partition if it doesn't already exist. (See [[#Create.2BAC8-Drop_Table|Create Table]])
- 
  
  === Alter Table (Un)Archive ===
- 
  {{{
  ALTER TABLE table_name ARCHIVE PARTITION partition_spec;
  ALTER TABLE table_name UNARCHIVE PARTITION parition_spec;
  }}}
- 
  Archiving is a feature to moves a partition's files into a Hadoop Archive (HAR). Note that only the file count will be reduced; HAR does not provide any compression. See [[Hive/LanguageManual/Archiving]] for more information
+ 
  == Create/Drop View ==
- 
  ''Note:'' View support is only available starting in Hive 0.6.
  
  === Create View ===
- 
  {{{
  CREATE VIEW [IF NOT EXISTS] view_name [ (column_name [COMMENT column_comment], ...) ]
  [COMMENT view_comment]
  [TBLPROPERTIES (property_name = property_value, ...)]
  AS SELECT ...
  }}}
- 
  CREATE VIEW creates a view with the given name. An error is thrown if a table or view with the same name already exists. You can use IF NOT EXISTS to skip the error.
  
  If no column names are supplied, the names of the view's columns will be derived automatically from the defining SELECT expression.  (If the SELECT contains unaliased scalar expressions such as x+y, the resulting view column names will be generated in the form _C0, _C1, etc.)  When renaming columns, column comments can also optionally be supplied.  (Comments are not automatically inherited from underlying columns.)
@@ -325, +321 @@

  Example of view creation:
  
  {{{
- CREATE VIEW onion_referrers(url COMMENT 'URL of Referring page') 
+ CREATE VIEW onion_referrers(url COMMENT 'URL of Referring page')
  COMMENT 'Referrers to The Onion website'
  AS
  SELECT DISTINCT referrer_url
  FROM page_view
  WHERE page_url='http://www.theonion.com';
  }}}
- 
  === Drop View ===
- 
  {{{
  DROP VIEW [IF EXISTS] view_name
  }}}
- 
  DROP VIEW removes metadata for the specified view.  (It is illegal to use DROP TABLE on a view.)
  
  When dropping a view referenced by other views, no warning is given (the dependent views are left dangling as invalid and must be dropped or recreated by the user).
@@ -350, +343 @@

  {{{
  DROP VIEW onion_referrers;
  }}}
- 
- 
  === Alter View Properties ===
  {{{
  ALTER VIEW view_name SET TBLPROPERTIES table_properties
@@ -359, +350 @@

  table_properties:
    : (property_name = property_value, property_name = property_value, ...)
  }}}
- 
  As with ALTER TABLE, you can use this statement to add your own metadata to a view.
  
  == Create/Drop Function ==
@@ -375, +365 @@

  {{{
  DROP TEMPORARY FUNCTION [IF EXISTS] function_name
  }}}
- 
  In Hive 0.70 or later, DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable hive.exec.drop.ignorenonexistent is set to true.
  
  == Create/Drop Index ==
  Not available until 0.7 release
+ 
  === Create Index ===
  {{{
- CREATE INDEX index_name 
+ CREATE INDEX index_name
  ON TABLE base_table_name (col_name, ...)
  AS index_type
  [ WITH DEFERRED REBUILD ]
@@ -397, +387 @@

  [ TBLPROPERTIES (...) ]
  [COMMENT "index comment"]
  }}}
- 
  CREATE INDEX creates an index on a table using the given list of columns as keys.  See http://wiki.apache.org/hadoop/Hive/IndexDev#CREATE_INDEX
  
  === Drop Index ===
@@ -424, +413 @@

  SHOW PARTITIONS lists all the existing partitions for a given base table. Partitions are listed in alphabetical order.
  
  It is also possible to specify parts of a partition specification to filter the resulting list. Note: This feature is only available starting in version 0.6.
+ 
  {{{
  SHOW PARTITIONS table_name PARTITION(ds='2010-03-03');
  SHOW PARTITIONS table_name PARTITION(hr='12');
  SHOW PARTITIONS table_name PARTITION(ds='2010-03-03', hr='12');
  }}}
- 
- 
  === Show Table/Partitions Extended ===
  {{{
  SHOW TABLE EXTENDED [IN|FROM database_name] LIKE identifier_with_wildcards [PARTITION(partition_desc)]
@@ -460, +448 @@

  
  For a view, DESCRIBE TABLE EXTENDED can be used to retrieve the view's definition.  Two relevant attributes are provided:  both the original view definition as specified by the user, and an expanded definition used internally by Hive.
  
- 
  === Describe Partition ===
  {{{
  DESCRIBE [EXTENDED] table_name partition_spec