Posted to dev@hive.apache.org by "Weidong Bian (JIRA)" <ji...@apache.org> on 2012/07/17 09:53:36 UTC

[jira] [Commented] (HIVE-2373) Importing hive tables into hbase+hive requires a lot of work which often can be implied

    [ https://issues.apache.org/jira/browse/HIVE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415990#comment-13415990 ] 

Weidong Bian commented on HIVE-2373:
------------------------------------

I've also run into this issue and have a quick-and-dirty fix for it.
The attached preliminary patch supplies a hard-coded default mapping when the "hbase.columns.mapping" entry in WITH SERDEPROPERTIES is missing.
It uses the first column specified by the user as :key and "cf" as the column family name, so it only works if all columns are mapped to a single column family.
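
To make the default concrete, here is a minimal Python sketch of the rule described above. It is not the code from the attached patch (which changes Hive itself); the function name default_hbase_mapping and its parameters are made up for illustration.

```python
def default_hbase_mapping(columns, family="cf"):
    """Build an hbase.columns.mapping value from Hive column names.

    Hypothetical helper, not part of Hive: the first column becomes the
    HBase row key and every other column goes into one column family.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Entries are positional: the i-th entry maps the i-th Hive column.
    parts = [":key"] + [f"{family}:{name}" for name in columns[1:]]
    return ",".join(parts)


print(default_hbase_mapping(["id_1", "id_2", "id_3"]))
# prints :key,cf:id_2,cf:id_3
```
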
A better approach would be to allow the user to specify something like WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key@2") to designate the second column as the :key and have the rest added automatically. If anyone is interested, I can work on this.
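
As a rough sketch of how the proposed ":key@2" shorthand could expand, assuming the position is 1-based and the remaining columns all go into the "cf" family. The syntax is only a proposal and does not exist in Hive; expand_key_shorthand is a hypothetical name.

```python
import re


def expand_key_shorthand(spec, columns, family="cf"):
    """Expand a proposed ":key@N" shorthand into a full column mapping.

    Hypothetical helper: N is taken as the 1-based position of the Hive
    column that becomes the row key. Any other spec is returned unchanged.
    """
    match = re.fullmatch(r":key@(\d+)", spec.strip())
    if match is None:
        return spec  # already a full mapping; leave it untouched
    position = int(match.group(1))
    if not 1 <= position <= len(columns):
        raise ValueError(f"key position {position} is out of range")
    # Keep the entries in Hive column order, since the mapping is positional.
    return ",".join(
        ":key" if index == position else f"{family}:{name}"
        for index, name in enumerate(columns, start=1)
    )


print(expand_key_shorthand(":key@2", ["id_1", "id_2", "id_3"]))
# prints cf:id_1,:key,cf:id_3
```
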
                
> Importing hive tables into hbase+hive requires a lot of work which often can be implied
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-2373
>                 URL: https://issues.apache.org/jira/browse/HIVE-2373
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Alex Newman
>            Priority: Minor
>
> The HiveQL way of creating an HBase table looks something like
> CREATE TABLE bla(id_1 type_1, id_2 type_2..., id_n type_n)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:id_2,cf:id_3") TBLPROPERTIES ("hbase.table.name" = "blah");
> But in most cases much of this can be inferred from the original table description. In fact, in most cases, especially when the data was imported from MySQL, it is trivial to generate at least one HBase backing for that data. I have written a Python script that our users can use to make things simpler. Would anyone be interested in that script? Would it make sense to make this easy from within Hive? I hate to add reserved words, so any suggestions are welcome.
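
For illustration only, a minimal Python sketch of generating such a statement from a table description. This is not the reporter's script, which is not shown in the issue; hbase_create_table and its parameters are hypothetical, and it assumes the first column is the row key and a single column family named "cf".

```python
def hbase_create_table(table, columns, hbase_table=None, family="cf"):
    """Render a Hive CREATE TABLE statement backed by HBase.

    Hypothetical helper: `columns` is a list of (name, hive_type) pairs,
    the first pair becomes the row key, the rest share one column family.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    # No spaces between mapping entries, so names are not padded.
    mapping = ",".join([":key"] + [f"{family}:{name}" for name, _ in columns[1:]])
    return (
        f"CREATE TABLE {table}({column_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}")\n'
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table or table}");'
    )


print(hbase_create_table("bla", [("id", "int"), ("name", "string")], "blah"))
```
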

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira