Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2010/06/08 23:42:16 UTC
[Hadoop Wiki] Trivial Update of "Hive/StorageHandlers" by CarlSteinbach
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "Hive/StorageHandlers" page has been changed by CarlSteinbach.
http://wiki.apache.org/hadoop/Hive/StorageHandlers?action=diff&rev1=3&rev2=4
--------------------------------------------------
+ = Hive Storage Handlers =
+
+ <<TableOfContents>>
+
- = Introduction =
+ == Introduction ==
This page documents the storage handler support being added to Hive as
part of work on [[Hive/HBaseIntegration]]. The motivation is to make
@@ -24, +28 @@
managing object definitions in both the Hive metastore and the
other system's catalog simultaneously and consistently.
- = Terminology =
+ == Terminology ==
Before storage handlers, Hive already had a concept of ''managed'' vs
''external'' tables. A managed table is one for which the definition
@@ -50, +54 @@
Note that we avoid the term ''file-based'' in these definitions, since
the form of storage used by the other system is irrelevant.
- = DDL =
+ == DDL ==
Storage handlers are associated with a table when it is created via
the new STORED BY clause, an alternative to the existing ROW FORMAT
@@ -89, +93 @@
DROP TABLE works as usual, but ALTER TABLE is not yet supported for
non-native tables.
- = Storage Handler Interface =
+ == Storage Handler Interface ==
The Java interface which must be implemented by a storage handler is
reproduced below; for details, see the Javadoc in the code:
@@ -127, +131 @@
attributes on jobProperties. At execution time, only these jobProperties
will be available to the input format, output format, and serde.
- = HiveMetaHook Interface =
+ == HiveMetaHook Interface ==
The {{{HiveMetaHook}}} interface is reproduced below; for details, see
the Javadoc in the code:
@@ -165, +169 @@
result, there is a small window in which a crash during DDL can lead
to the two systems getting out of sync.
- = Open Issues =
+ == Open Issues ==
* The storage handler class name is currently saved to the metastore via table property {{{storage_handler}}}; this should probably be a first-class attribute on MStorageDescriptor instead
* Names of helper classes such as input format and output format are saved into the metastore based on what the storage handler returns during CREATE TABLE; it would be better to leave these null in case they are changed later as part of a handler upgrade