Posted to dev@hive.apache.org by "Alex Newman (JIRA)" <ji...@apache.org> on 2011/08/13 00:31:31 UTC

[jira] [Created] (HIVE-2373) Importing hive tables into hbase+hive requires a lot of work which often can be implied

Importing hive tables into hbase+hive requires a lot of work which often can be implied
---------------------------------------------------------------------------------------

                 Key: HIVE-2373
                 URL: https://issues.apache.org/jira/browse/HIVE-2373
             Project: Hive
          Issue Type: Improvement
            Reporter: Alex Newman
            Priority: Minor


The HiveQL way of creating an HBase table looks something like
CREATE TABLE bla(id_1 type_1, id_2 type_2..., id_n type_n)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:id_2, cf:id_3") TBLPROPERTIES ("hbase.table.name" = "blah");

But in most cases, much of this can be inferred from the original table description. In fact, in most cases, especially when the data was imported from MySQL, it is trivial to generate at least one HBase backing for that data. I have written a Python script that our users can use to make things simpler. Would anyone be interested in that script? Would it make sense to make this easy from within Hive? I hate to add reserved words, so any suggestions are welcome.
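
For illustration only, here is a minimal sketch of what such a generator could look like. It is not the script mentioned above. The function name hbase_ddl, its parameters, and the choice of the first column as the row key with a single column family "cf" are all assumptions made for this example.

```python
def hbase_ddl(table, columns, hbase_table=None, family="cf"):
    """Build a CREATE TABLE statement for an HBase-backed Hive table.

    columns is a list of (name, hive_type) pairs. Assumption for this
    sketch: the first column becomes the HBase row key and every other
    column is stored as <family>:<column name>.
    """
    if not columns:
        raise ValueError("at least one column is required for the row key")

    col_defs = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)

    # First column maps to :key, the rest to family:column_name.
    mapping = [":key"] + [f"{family}:{name}" for name, _ in columns[1:]]
    mapping_str = ",".join(mapping)

    return "\n".join([
        f"CREATE TABLE {table}({col_defs})",
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping_str}")',
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table or table}");',
    ])


if __name__ == "__main__":
    print(hbase_ddl(
        "bla",
        [("id_1", "int"), ("id_2", "string"), ("id_3", "string")],
        hbase_table="blah",
    ))
```

The column list would normally come from the existing Hive table description; here it is passed in by hand to keep the sketch self-contained.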

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2373) Importing hive tables into hbase+hive requires a lot of work which often can be implied

Posted by "Alex Newman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085498#comment-13085498 ] 

Alex Newman commented on HIVE-2373:
-----------------------------------

@John, I'm doing some internal review before I paste it. I guess I'm looking for some guidance on what it should look like from someone else, if they have thought about it. I guess I could just add an auto option for the configuration parameter.


        

[jira] [Updated] (HIVE-2373) Importing hive tables into hbase+hive requires a lot of work which often can be implied

Posted by "Weidong Bian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weidong Bian updated HIVE-2373:
-------------------------------

    Status: Patch Available  (was: Open)
    

        

[jira] [Commented] (HIVE-2373) Importing hive tables into hbase+hive requires a lot of work which often can be implied

Posted by "Weidong Bian (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415990#comment-13415990 ] 

Weidong Bian commented on HIVE-2373:
------------------------------------

I've also encountered this issue and have a quick and dirty fix for it.
The attached preliminary patch specifies a hard-coded default mapping if WITH SERDEPROPERTIES ("hbase.columns.mapping") is missing.
It uses the first column specified by the user as :key and "cf" as the column family name, and of course only works if all columns are mapped to one column family.
A better approach would be to allow the user to specify something like WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key@2") to make the second column the :key and add the rest automatically. If anyone is interested, I can work on this.
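
To make the two behaviours concrete, here is a rough Python sketch of the expansion logic. It is not taken from the attached patch. The function expand_mapping and its parameters are hypothetical names, and the ":key@N" shorthand is only the proposal above, not an existing Hive feature.

```python
def expand_mapping(column_names, spec=None, family="cf"):
    """Expand a missing or shorthand mapping into a full hbase.columns.mapping.

    spec None      -> the first column is the row key (the default described above).
    spec ':key@N'  -> the N-th column (1-based) is the row key (proposed shorthand).
    Any other spec is treated as an explicit mapping and returned unchanged.
    """
    prefix = ":key@"
    if spec is None:
        key_index = 0
    elif spec.startswith(prefix):
        key_index = int(spec[len(prefix):]) - 1
    else:
        return spec

    if not 0 <= key_index < len(column_names):
        raise ValueError(f"key position out of range: {spec}")

    # The key column maps to :key; every other column goes into one family.
    return ",".join(
        ":key" if i == key_index else f"{family}:{name}"
        for i, name in enumerate(column_names)
    )


if __name__ == "__main__":
    cols = ["id_1", "id_2", "id_3"]
    print(expand_mapping(cols))            # default: first column is the key
    print(expand_mapping(cols, ":key@2"))  # shorthand: second column is the key
```

As noted, this only covers the case where all non-key columns live in a single column family.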
                

        

[jira] [Updated] (HIVE-2373) Importing hive tables into hbase+hive requires a lot of work which often can be implied

Posted by "Weidong Bian (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weidong Bian updated HIVE-2373:
-------------------------------

    Attachment: HIVE-2373_v1.patch
    

        

[jira] [Commented] (HIVE-2373) Importing hive tables into hbase+hive requires a lot of work which often can be implied

Posted by "Shengsheng Huang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490469#comment-13490469 ] 

Shengsheng Huang commented on HIVE-2373:
----------------------------------------

Review Request submitted @ https://reviews.apache.org/r/7866/
                

[jira] [Commented] (HIVE-2373) Importing hive tables into hbase+hive requires a lot of work which often can be implied

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085478#comment-13085478 ] 

John Sichi commented on HIVE-2373:
----------------------------------

Posting the script somewhere and linking it from the wiki would be a good start:

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

Putting something into Hive's HBase handler would be even better.  There's no need to add new reserved words; you could add new optional automapping configuration parameters to the HBase handler.

