Posted to user@hive.apache.org by John Sichi <js...@facebook.com> on 2010/10/01 03:45:07 UTC

Re: Incremental load from Hive into HBase?

Good point.  In retrospect, I guess I should have modified the grammar to support a regular INSERT (without OVERWRITE) and require usage of that for HBase (but prohibit it for native tables).  Probably too late for that now, so I guess we'll just say that the OVERWRITE in the HBase case means that if keys match existing rows, those rows are overwritten (but existing rows are not deleted as they would be with a native Hive table).
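
For concreteness, here is a minimal sketch of the semantics described above; the table names hbase_table and staging are hypothetical placeholders:

    -- hbase_table is a Hive table backed by HBaseStorageHandler;
    -- staging is an ordinary (native) Hive table holding new rows.
    INSERT OVERWRITE TABLE hbase_table
    SELECT key, val FROM staging;
    -- Rows in staging whose keys already exist in HBase replace those
    -- rows; rows with new keys are added.  Existing HBase rows whose
    -- keys do not appear in staging are left untouched (a native Hive
    -- table would instead be truncated before the load).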

If that's OK, I'll update the wiki accordingly.

JVS

On Sep 28, 2010, at 10:50 PM, Leo Alekseyev wrote:

> On Tue, Sep 28, 2010 at 7:50 PM, Leo Alekseyev <dn...@gmail.com> wrote:
>> I can create and load data into an HBase table as per the instructions
>> from Hive/HBase Integration wiki page using something like
>> create table ...
>> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ....);
>> 
>> Is it possible to then load more data from Hive into this table?  I
>> keep seeing references to "bulk inserts" vs. "incremental inserts" in
>> people's slides, as well as references to HBASE-1923, but no concrete
>> examples.
> 
> I will start by answering my own question: with HBaseStorageHandler,
> an INSERT OVERWRITE TABLE foo ... statement appears to append rows
> (given that the row keys are unique).  Note that this is different
> from "regular" Hive tables, which would get overwritten under similar
> circumstances.  Perhaps this should be spelled out in the wiki...
> (A complete end-to-end sketch follows at the end of this thread.)
> 
> This resolves the original question, but further comments on the issue
> are always welcome :)
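
For reference, a minimal end-to-end sketch of the workflow discussed in this thread.  The table names, the "hbase.table.name" value, and the "cf1:val" column mapping are hypothetical; adapt them to your schema:

    -- Hive table backed by an HBase table named "target".
    CREATE TABLE hbase_target(key INT, val STRING)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    TBLPROPERTIES ("hbase.table.name" = "target");

    -- Initial load from an ordinary Hive table.
    INSERT OVERWRITE TABLE hbase_target
    SELECT key, val FROM hive_source;

    -- Incremental load: re-running the same statement later upserts by
    -- row key rather than truncating, per the semantics above.
    INSERT OVERWRITE TABLE hbase_target
    SELECT key, val FROM hive_source_new;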