Posted to dev@hive.apache.org by "John Sichi (JIRA)" <ji...@apache.org> on 2011/08/10 23:27:27 UTC

[jira] [Created] (HIVE-2365) SQL support for bulk load into HBase

SQL support for bulk load into HBase
------------------------------------

                 Key: HIVE-2365
                 URL: https://issues.apache.org/jira/browse/HIVE-2365
             Project: Hive
          Issue Type: Improvement
          Components: HBase Handler
            Reporter: John Sichi


Support the "as simple as this" SQL for bulk load from Hive into HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HIVE-2365) SQL support for bulk load into HBase

Posted by "Alex Newman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085420#comment-13085420 ] 

Alex Newman commented on HIVE-2365:
-----------------------------------

I would love to take this up, but I have a couple of questions.

Is the issue that the CREATE statement is too complicated? You do have to specify a lot of things.
Do you want all of the steps in one SQL command?
What's the right balance between configurability and doing things automagically?



        

[jira] [Commented] (HIVE-2365) SQL support for bulk load into HBase

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085477#comment-13085477 ] 

John Sichi commented on HIVE-2365:
----------------------------------

(Just realized I forgot to link the original doc where "as simple as this" is mentioned.)

https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad

This issue pertains to INSERT of large amounts of data into HBase from Hive (not CREATE; I'll follow up separately in HIVE-2373).

The major challenges here are:

* automating the sampling needed to determine the range partitioning for the global sort
* extending Hive's INSERT to express the whole thing
* chaining together the sampling job with the actual load job and tying together the relevant bits such as temporary file locations (we've had success doing something similar via reentrant SQL for index load/query statements)
* making the load use the HBase bulk load API, which was added after the original Hive work
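
To make the first challenge concrete, here is a rough sketch of sample-based range partitioning for a global sort. It is only an illustration in Python, not Hive or HBase code: the function names choose_split_points and partition_for are hypothetical, and it assumes row keys are plain strings that compare in the desired sort order.

```python
import bisect


def choose_split_points(sample_keys, num_partitions):
    """Pick up to num_partitions - 1 boundary keys from a sample of row keys.

    Hypothetical helper for illustration; not part of Hive or HBase.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    ordered = sorted(sample_keys)
    if not ordered:
        return []
    splits = []
    for i in range(1, num_partitions):
        # Take evenly spaced quantiles of the sorted sample as boundaries.
        idx = i * len(ordered) // num_partitions
        splits.append(ordered[idx])
    # A skewed sample can yield repeated boundaries; keep each one once.
    return sorted(set(splits))


def partition_for(key, split_points):
    """Return the index of the range partition that should receive key.

    Keys below the first split point go to partition 0; a key equal to a
    split point goes to the partition that starts at that split point.
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    sample = ["a", "b", "c", "d", "e", "f", "g", "h"]
    splits = choose_split_points(sample, 4)
    for key in ["a", "c", "d", "h"]:
        print(key, "->", partition_for(key, splits))
```

Because every key in partition i sorts before every key in partition i + 1, sorting each partition independently and concatenating the results gives a globally sorted output, which is what a bulk load into a range-partitioned store needs. The quality of the sample determines how evenly the data is spread across partitions.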


