Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2011/05/14 21:35:36 UTC

[Hadoop Wiki] Update of "Hive/Design" by AllenSmith

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/Design" page has been changed by AllenSmith.
http://wiki.apache.org/hadoop/Hive/Design?action=diff&rev1=13&rev2=14

--------------------------------------------------

  The Metastore provides two important but often overlooked features of a data warehouse: data abstraction and data discovery. Without the data abstractions provided in Hive, a user has to provide information about data formats, extractors, and loaders along with the query. In Hive, this information is given during table creation and reused every time the table is referenced. This is very similar to traditional warehousing systems. The second feature, data discovery, enables users to discover and explore relevant and specific data in the warehouse. Other tools can be built using this metadata to expose and possibly enhance the information about the data and its availability. Hive accomplishes both of these features by providing a metadata repository that is tightly integrated with the Hive query processing system so that data and metadata are in sync.
  
  === Metadata Objects ===
   * Database - is a namespace for tables. It can be used as an administrative unit in the future. The database 'default' is used for tables with no user-supplied database name.
   * Table - Metadata for a table contains the list of columns, owner, storage, and SerDe information. It can also contain any user-supplied key and value data. Storage information includes the location of the underlying data, file input and output formats, and bucketing information. SerDe metadata includes the implementation class of the serializer and deserializer and any supporting information required by the implementation. All of this information can be provided during the creation of the table.
   * Partition - Each partition can have its own columns, SerDe, and storage information. This facilitates schema changes without affecting older partitions.
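
The relationship between these metadata objects can be sketched roughly as follows. This is a minimal illustration in Python, not Hive's actual metastore schema or API: the class names (`StorageInfo`, `SerDeInfo`, `Partition`, `Table`), the `add_partition` helper, and the sample table, paths, and format names are all hypothetical. It assumes that a partition snapshots the table's columns, storage, and SerDe information when it is created, which is one way to get the behaviour described above, where a later schema change leaves older partitions untouched.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column name, column type)


@dataclass
class StorageInfo:
    location: str        # location of the underlying data
    input_format: str    # file input format (illustrative name only)
    output_format: str   # file output format (illustrative name only)
    bucket_columns: List[str] = field(default_factory=list)
    num_buckets: int = 0


@dataclass
class SerDeInfo:
    serde_class: str     # implementation class of the serializer/deserializer
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Partition:
    values: Dict[str, str]   # partition key -> value
    columns: List[Column]    # the partition's own copy of the schema
    storage: StorageInfo
    serde: SerDeInfo


@dataclass
class Table:
    name: str
    owner: str
    columns: List[Column]
    storage: StorageInfo
    serde: SerDeInfo
    database: str = "default"  # used when no database name is supplied
    properties: Dict[str, str] = field(default_factory=dict)  # user key/value data
    partitions: List[Partition] = field(default_factory=list)

    def add_partition(self, values: Dict[str, str], location: str) -> Partition:
        """Create a partition that copies the table's current metadata."""
        storage = copy.deepcopy(self.storage)
        storage.location = location
        part = Partition(
            values=dict(values),
            columns=list(self.columns),
            storage=storage,
            serde=copy.deepcopy(self.serde),
        )
        self.partitions.append(part)
        return part


def build_example() -> Table:
    table = Table(
        name="page_view",
        owner="alice",
        columns=[("userid", "int"), ("url", "string")],
        storage=StorageInfo(
            location="/warehouse/page_view",
            input_format="TextInputFormat",
            output_format="TextOutputFormat",
        ),
        serde=SerDeInfo(serde_class="ExampleSerDe"),
    )
    table.add_partition({"ds": "2011-05-13"}, "/warehouse/page_view/ds=2011-05-13")
    # Schema change on the table: the older partition keeps its two columns.
    table.columns.append(("referrer", "string"))
    table.add_partition({"ds": "2011-05-14"}, "/warehouse/page_view/ds=2011-05-14")
    return table
```

Calling `build_example()` returns a table in the `default` database whose first partition still has the original two columns, while the second partition also has the `referrer` column added afterwards.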