Posted to dev@hive.apache.org by "schubert zhang (JIRA)" <ji...@apache.org> on 2009/08/10 19:26:15 UTC
[jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables

    [ https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741431#action_12741431 ] 

schubert zhang commented on HIVE-705:
-------------------------------------

Hi Samuel,

Thanks for your great work.
Your patch modifies many Java files, which is clearly a big effort. I wonder whether there is any way to avoid such a large change.

Regarding the schema mapping between an HBase table and a Hive SQL table, I have the following considerations.
1. We just want to use HBase as a scalable structured data store, or key-value store.
2. In our past experience, performance was poor when we mapped each SQL column to its own HBase column. For example, if a table has 20 columns, then each read or write of a row involves 20 key-value operations, which is inefficient.

How about considering a more flexible schema mapping:
1. One HBase column can map to multiple Hive SQL columns with a SerDe, e.g. cf1:q1 => {(col1, col2, col3), Default SerDe}
2. One HBase column family can map to multiple Hive SQL columns with a SerDe, e.g. cf2: => {(col3, col5, col6), Default SerDe}
3. Your MAP column (in the Hive table) for a sparse column family. [Optional] Since Hive is a structured data analysis front end, we can omit this feature at the beginning.
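
To make the difference concrete, here is a minimal Python sketch (not Hive or HBase code) comparing one key-value per SQL column with several SQL columns packed into a single HBase cell. The helper names, the cf1:q1 cell name, and the choice of Ctrl-A as separator (the default field delimiter of Hive's text SerDe) are assumptions for illustration only; a real SerDe would also need to handle NULLs and escaping.

```python
# Hypothetical illustration only; none of these helpers exist in Hive or HBase.
SEP = "\x01"  # Ctrl-A, assumed here as the field separator inside one cell


def pack(values):
    """Serialize several SQL column values into one cell value."""
    return SEP.join(str(v) for v in values)


def unpack(cell, types):
    """Split one cell value back into typed SQL column values."""
    return [t(p) for t, p in zip(types, cell.split(SEP))]


def cells_per_column(row):
    """One key-value per SQL column."""
    return {f"cf1:{name}": str(value) for name, value in row.items()}


def packed_cell(row):
    """One key-value holding every SQL column of the group."""
    return {"cf1:q1": pack(row.values())}


row = {f"col{i}": i for i in range(1, 21)}  # a row with 20 SQL columns
per_column = cells_per_column(row)
packed = packed_cell(row)
restored = unpack(packed["cf1:q1"], [int] * 20)
print(len(per_column), len(packed))
```

With the packed layout, a row read or write touches one key-value instead of 20, at the cost of decoding the whole cell even when only one SQL column is needed.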

For example:

CREATE EXTERNAL TABLE hive_table (pkey STRING, col1 STRING, col2 INT, col3 INT, col4 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MyHBaseSerDe'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf1:(col1,col2,col3) with DefaultSerDe, cf2:c1 (col4) with DefaultSerDe"
)
STORED AS HBASETABLE
LOCATION '<hbase_table_location>'
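
As a rough sketch of how such a mapping string could be interpreted, the Python snippet below parses the proposed "hbase.columns.mapping" value into (family, qualifier, SQL columns, SerDe) entries. The grammar is only inferred from the single example above, and the function and field names are hypothetical; this is not the syntax or parser of any existing Hive release.

```python
import re

# Assumed grammar, inferred from the example: family ":" [qualifier] "(" cols ")" "with" serde
ENTRY = re.compile(
    r"(?P<family>\w+):(?P<qualifier>\w*)\s*"
    r"\((?P<columns>[^)]*)\)\s*"
    r"with\s+(?P<serde>\w+)"
)


def parse_mapping(spec):
    """Return one dict per mapping entry found in the spec string."""
    entries = []
    for m in ENTRY.finditer(spec):
        columns = [c.strip() for c in m.group("columns").split(",") if c.strip()]
        entries.append({
            "family": m.group("family"),
            # An empty qualifier means the whole column family is mapped.
            "qualifier": m.group("qualifier") or None,
            "columns": columns,
            "serde": m.group("serde"),
        })
    return entries


spec = "cf1:(col1,col2,col3) with DefaultSerDe, cf2:c1 (col4) with DefaultSerDe"
entries = parse_mapping(spec)
for entry in entries:
    print(entry)
```

A regular expression is used instead of splitting on commas because commas also separate the SQL column names inside the parentheses.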

Usually, we want a more advanced data store backend than HDFS, to achieve more flexible data placement and indexing. HBase's data model meets this requirement well, but we may not need the full features of HBase here.

--
I look forward to communicating more with you in Chinese, at your convenience.

Schubert

> Let Hive can analyse hbase's tables
> -----------------------------------
>
>                 Key: HIVE-705
>                 URL: https://issues.apache.org/jira/browse/HIVE-705
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Samuel Guo
>         Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored in hbase easily.

-- 
This message is automatically generated by JIRA.