You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tajo.apache.org by ji...@apache.org on 2015/01/20 14:04:20 UTC
[2/5] tajo git commit: TAJO-1290: Add HBase Storage Integration
Documentation. (jaehwa)
TAJO-1290: Add HBase Storage Integration Documentation. (jaehwa)
Closes #352
Project: http://git-wip-us.apache.org/repos/asf/tajo/repo
Commit: http://git-wip-us.apache.org/repos/asf/tajo/commit/9d749d6b
Tree: http://git-wip-us.apache.org/repos/asf/tajo/tree/9d749d6b
Diff: http://git-wip-us.apache.org/repos/asf/tajo/diff/9d749d6b
Branch: refs/heads/index_support
Commit: 9d749d6b798d8a3cfb5c11f0b6ad9e8f98c52fd4
Parents: 76ece8b
Author: JaeHwa Jung <bl...@apache.org>
Authored: Mon Jan 19 14:20:25 2015 +0900
Committer: JaeHwa Jung <bl...@apache.org>
Committed: Mon Jan 19 14:20:25 2015 +0900
----------------------------------------------------------------------
CHANGES | 2 +
tajo-docs/src/main/sphinx/hbase_integration.rst | 181 +++++++++++++++++++
tajo-docs/src/main/sphinx/index.rst | 3 +-
3 files changed, 185 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/tajo/blob/9d749d6b/CHANGES
----------------------------------------------------------------------
diff --git a/CHANGES b/CHANGES
index 212131d..1be019a 100644
--- a/CHANGES
+++ b/CHANGES
@@ -27,6 +27,8 @@ Release 0.9.1 - unreleased
IMPROVEMENT
+ TAJO-1290: Add HBase Storage Integration Documentation. (jaehwa)
+
TAJO-1293: Tajo have to accept hostname beginning with digits.
(Jinhang Choi via jihun)
http://git-wip-us.apache.org/repos/asf/tajo/blob/9d749d6b/tajo-docs/src/main/sphinx/hbase_integration.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/hbase_integration.rst b/tajo-docs/src/main/sphinx/hbase_integration.rst
new file mode 100644
index 0000000..73ef6d1
--- /dev/null
+++ b/tajo-docs/src/main/sphinx/hbase_integration.rst
@@ -0,0 +1,181 @@
+*************************************
+HBase Integration
+*************************************
+
+Apache Tajo™ storage supports integration with Apache HBase™.
+This integration allows Tajo to access all tables used in Apache HBase.
+
+In order to use this feature, you need to build add some configs into ``conf/tajo-env.sh`` and then add some properties into a table create statement.
+
+This section describes how to setup HBase integration.
+
+First, you need to set your HBase home directory to the environment variable ``HBASE_HOME`` in conf/tajo-env.sh as follows: ::
+
+ export HBASE_HOME=/path/to/your/hbase/directory
+
+If you set the directory, Tajo will add HBase library file to classpath.
+
+
+
+========================
+CREATE TABLE
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+ CREATE [EXTERNAL] TABLE [IF NOT EXISTS] <table_name> [(<column_name> <data_type>, ... )]
+ USING hbase
+ WITH ('table'='<hbase_table_name>'
+ , 'columns'=':key,<column_family_name>:<qualifier_name>, ...'
+ , 'hbase.zookeeper.quorum'='<zookeeper_address>')
+
+Options
+
+* ``table`` : Set hbase origin table name. If you want to create an external table, the table must exists on HBase. The other way, if you want to create a managed table, the table must doesn't exist on HBase.
+* ``columns`` : :key means HBase row key. The number of columns entry need to equals to the number of Tajo table column
+* ``hbase.zookeeper.quorum`` : Set zookeeper quorum address. You can use different zookeeper cluster on the same Tajo database. If you don't set the zookeeper address, Tajo will refer the property of hbase-site.xml file.
+
+
+``IF NOT EXISTS`` allows ``CREATE [EXTERNAL] TABLE`` statement to avoid an error which occurs when the table does not exist.
+
+
+
+========================
+ DROP TABLE
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+ DROP TABLE [IF EXISTS] <table_name> [PURGE]
+
+``IF EXISTS`` allows ``DROP TABLE`` statement to avoid an error which occurs when the table does not exist. ``DROP TABLE`` statement removes a table from Tajo catalog, but it does not remove the contents on HBase cluster. If ``PURGE`` option is given, ``DROP TABLE`` statement will eliminate the entry in the catalog as well as the contents on HBase cluster.
+
+
+========================
+INSERT (OVERWRITE) INTO
+========================
+
+INSERT OVERWRITE statement overwrites a table data of an existing table. Tajo's INSERT OVERWRITE statement follows ``INSERT INTO SELECT`` statement of SQL. The examples are as follows:
+
+.. code-block:: sql
+
+ -- when a target table schema and output schema are equivalent to each other
+ INSERT OVERWRITE INTO t1 SELECT l_orderkey, l_partkey, l_quantity FROM lineitem;
+ -- or
+ INSERT OVERWRITE INTO t1 SELECT * FROM lineitem;
+
+ -- when the output schema are smaller than the target table schema
+ INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem;
+
+ -- when you want to specify certain target columns
+ INSERT OVERWRITE INTO t1 (col1, col3) SELECT l_orderkey, l_quantity FROM lineitem;
+
+
+.. note::
+
+ If you don't set row key option, You are never able to use your table data. Because Tajo need to have some key columns for sorting before creating result data.
+
+
+
+========================
+Usage
+========================
+
+In order to create a new HBase table which is to be managed by Tajo, use the USING clause on CREATE TABLE:
+
+.. code-block:: sql
+
+ CREATE EXTERNAL TABLE blog (rowkey text, author text, register_date text, title text)
+ USING hbase WITH (
+ 'table'='blog'
+ , 'columns'=':key,info:author,info:date,content:title');
+
+After executing the command above, you should be able to see the new table in the HBase shell:
+
+.. code-block:: sql
+
+ $ hbase shell
+ create 'blog', {NAME=>'info'}, {NAME=>'content'}
+ put 'blog', 'hyunsik-02', 'content:title', 'Getting started with Tajo on your desktop'
+ put 'blog', 'hyunsik-02', 'info:author', 'Hyunsik Choi'
+ put 'blog', 'hyunsik-02', 'info:date', '2014-12-03'
+ put 'blog', 'blrunner-01', 'content:title', 'Apache Tajo: A Big Data Warehouse System on Hadoop'
+ put 'blog', 'blrunner-01', 'info:author', 'Jaehwa Jung'
+ put 'blog', 'blrunner-01', 'info:date', '2014-10-31'
+ put 'blog', 'jhkim-01', 'content:title', 'APACHE TAJO™ v0.9 HAS ARRIVED!'
+ put 'blog', 'jhkim-01', 'info:author', 'Jinho Kim'
+ put 'blog', 'jhkim-01', 'info:date', '2014-10-22'
+
+And then create the table and query the table meta data with ``\d`` option:
+
+.. code-block:: sql
+
+ default> \d blog;
+
+ table name: default.blog
+ table path:
+ store type: HBASE
+ number of rows: unknown
+ volume: 0 B
+ Options:
+ 'columns'=':key,info:author,info:date,content:title'
+ 'table'='blog'
+
+ schema:
+ rowkey TEXT
+ author TEXT
+ register_date TEXT
+ title TEXT
+
+
+And then query the table as follows:
+
+.. code-block:: sql
+
+ default> SELECT * FROM blog;
+ rowkey, author, register_date, title
+ -------------------------------
+ blrunner-01, Jaehwa Jung, 2014-10-31, Apache Tajo: A Big Data Warehouse System on Hadoop
+ hyunsik-02, Hyunsik Choi, 2014-12-03, Getting started with Tajo on your desktop
+ jhkim-01, Jinho Kim, 2014-10-22, APACHE TAJO™ v0.9 HAS ARRIVED!
+
+ default> SELECT * FROM blog WHERE rowkey = 'blrunner-01';
+ Progress: 100%, response time: 2.043 sec
+ rowkey, author, register_date, title
+ -------------------------------
+ blrunner-01, Jaehwa Jung, 2014-10-31, Apache Tajo: A Big Data Warehouse System on Hadoop
+
+
+Here's how to insert data the HBase table:
+
+.. code-block:: sql
+
+ CREATE TABLE blog_backup(rowkey text, author text, register_date text, title text)
+ USING hbase WITH (
+ 'table'='blog_backup'
+ , 'columns'=':key,info:author,info:date,content:title');
+ INSERT OVERWRITE INTO blog_backup SELECT * FROM blog;
+
+
+Use HBase shell to verify that the data actually got loaded:
+
+.. code-block:: sql
+
+ hbase(main):004:0> scan 'blog_backup'
+ ROW COLUMN+CELL
+ blrunner-01 column=content:title, timestamp=1421227531054, value=Apache Tajo: A Big Data Warehouse System on Hadoop
+ blrunner-01 column=info:author, timestamp=1421227531054, value=Jaehwa Jung
+ blrunner-01 column=info:date, timestamp=1421227531054, value=2014-10-31
+ hyunsik-02 column=content:title, timestamp=1421227531054, value=Getting started with Tajo on your desktop
+ hyunsik-02 column=info:author, timestamp=1421227531054, value=Hyunsik Choi
+ hyunsik-02 column=info:date, timestamp=1421227531054, value=2014-12-03
+ jhkim-01 column=content:title, timestamp=1421227531054, value=APACHE TAJO\xE2\x84\xA2 v0.9 HAS ARRIVED!
+ jhkim-01 column=info:author, timestamp=1421227531054, value=Jinho Kim
+ jhkim-01 column=info:date, timestamp=1421227531054, value=2014-10-22
+ 3 row(s) in 0.0470 seconds
+
+
http://git-wip-us.apache.org/repos/asf/tajo/blob/9d749d6b/tajo-docs/src/main/sphinx/index.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/index.rst b/tajo-docs/src/main/sphinx/index.rst
index 667f270..1222f54 100644
--- a/tajo-docs/src/main/sphinx/index.rst
+++ b/tajo-docs/src/main/sphinx/index.rst
@@ -40,7 +40,8 @@ Table of Contents:
index_overview
backup_and_restore
hcatalog_integration
- jdbc_driver
+ hbase_integration
+ jdbc_driver
tajo_client_api
faq