Posted to dev@hbase.apache.org by Chetan Khatri <ch...@gmail.com> on 2017/01/27 18:49:56 UTC
Incremental import from HBase to Hive
Hello Community,
I am working with HBase 1.2.4. What would be the best approach to do an
incremental load from HBase to Hive?
Thanks.
Re: Incremental import from HBase to Hive
Posted by Rohit Jain <ro...@esgyn.com>.
An example of that (a leading time component in the row key) is how, in Trafodion, one can generate a divisioning column, such as a week number derived from a date column, that becomes the leading part of a multi-column HBase key. Of course, Trafodion also prefixes the key with a salt to spread the data evenly across the regions, but you may not need that in your scenario. You can then use that leading column to access only the data for the last week.
Rohit
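A minimal Python sketch of the key layout described above, assuming a two-digit salt bucket, an ISO year-week string as the divisioning column, and an entity ID as the last component. The bucket count, the `|` separator, and every function name here are hypothetical; this is not meant to reproduce Trafodion's internal key encoding. Because the data for one week is spread over every salt bucket, reading a week takes one range per bucket; each (start, stop) pair would map to one HBase scan, whose stop row is exclusive.

```python
import hashlib
from datetime import date

# Hypothetical number of salt buckets (must stay below 100 for the 2-digit prefix).
NUM_SALT_BUCKETS = 4


def salt(entity_id):
    """Stable bucket for an entity, so all of its rows share one salt prefix."""
    digest = hashlib.md5(entity_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SALT_BUCKETS


def week_column(d):
    """Divisioning value derived from a date: ISO year + ISO week, e.g. '201704'."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year:04d}{iso_week:02d}"


def row_key(entity_id, d):
    """Salt, then week, then entity ID, joined by a '|' separator."""
    return f"{salt(entity_id):02d}|{week_column(d)}|{entity_id}".encode("utf-8")


def week_scan_ranges(d):
    """One (start, stop) row range per salt bucket covering the week of d.

    The stop row is the prefix with its last byte incremented, which is an
    exclusive upper bound for every key that starts with that prefix.
    """
    week = week_column(d)
    ranges = []
    for bucket in range(NUM_SALT_BUCKETS):
        prefix = f"{bucket:02d}|{week}|".encode("utf-8")
        stop = prefix[:-1] + bytes([prefix[-1] + 1])
        ranges.append((prefix, stop))
    return ranges


if __name__ == "__main__":
    key = row_key("order-42", date(2017, 1, 27))
    ranges = week_scan_ranges(date(2017, 1, 23))
    print(any(start <= key < stop for start, stop in ranges))  # True
```

Salting trades a single contiguous scan for one short scan per bucket; if write hot-spotting on the newest week is not a concern, the salt can be dropped and a week becomes a single range.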
> On Jan 28, 2017, at 1:34 PM, Josh Elser <el...@apache.org> wrote:
>
> (Please stop adding the dev@hbase mailing list. This is a question for the user@ list only.)
>
> Unless your HBase data includes a time component, the only way to find all "new" data in HBase using the cell timestamps is to scan the entire table. A full table scan is not ideal, because it is not an access pattern that HBase is optimized for.
>
> Consider making time the leading component of your rowKey, or creating an index table that maps load time to rowKey, so that these lookups can be done efficiently.
>
> Chetan Khatri wrote:
>> Sure. Several applications talk to HBase and populate data. Now I want to
>> load data incrementally from HBase, apply transformations such as data
>> quality filters, and save the results in Hive.
>>
>> By incremental load I mean: I want to run this job weekly and make sure
>> there is no duplication at the Hive level.
>>
>> Thanks.
>>
>>> On Sat, Jan 28, 2017 at 1:00 AM, Josh Elser <el...@apache.org> wrote:
>>>
>>> (-cc dev)
>>>
>>> Might you be able to be more specific in the context of your question?
>>>
>>> What kind of requirements do you have?
>>>
>>>
>>> Chetan Khatri wrote:
>>>
>>>> Hello Community,
>>>>
>>>> I am working with HBase 1.2.4. What would be the best approach to do an
>>>> incremental load from HBase to Hive?
>>>>
>>>> Thanks.
>>>>
>>>>
>>
Re: Incremental import from HBase to Hive
Posted by Josh Elser <el...@apache.org>.
(Please stop adding the dev@hbase mailing list. This is a question for
the user@ list only.)
Unless your HBase data includes a time component, the only way to find
all "new" data in HBase using the cell timestamps is to scan the entire
table. A full table scan is not ideal, because it is not an access
pattern that HBase is optimized for.
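To make the cost difference concrete, here is a toy Python model (not the HBase client API) of a table as a list of (row key, cell timestamp) pairs sorted by row key. When time is not part of the key, selecting a time window means reading every row and filtering; when time leads the key, a binary search can seek straight to the window, much as an HBase scan with start and stop rows does. The data, the 13-digit millisecond key format, and the function names are hypothetical.

```python
import bisect


def scan_by_cell_timestamp(rows, t_start, t_end):
    """Time is not in the key: read every row and filter on its timestamp."""
    examined, hits = 0, []
    for key, ts in rows:
        examined += 1
        if t_start <= ts < t_end:
            hits.append(key)
    return hits, examined


def scan_by_key_range(rows, start_key, stop_key):
    """Time leads the key: seek to start_key and stop before stop_key."""
    # Comparing against a 1-tuple finds the first row whose key is >= the bound.
    lo = bisect.bisect_left(rows, (start_key,))
    hi = bisect.bisect_left(rows, (stop_key,))
    return [key for key, _ in rows[lo:hi]], hi - lo


if __name__ == "__main__":
    # 100 hypothetical rows written at timestamps 1000..1099.
    by_id = sorted((f"user{i:03d}", 1000 + i) for i in range(100))
    by_time = sorted((f"{1000 + i:013d}|user{i:03d}", 1000 + i) for i in range(100))
    print(scan_by_cell_timestamp(by_id, 1090, 1100)[1])  # 100
    print(scan_by_key_range(by_time, f"{1090:013d}", f"{1100:013d}")[1])  # 10
```

Both calls return the same ten rows, but the first has to touch the whole table to find them.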
Consider making time the leading component of your rowKey, or creating
an index table that maps load time to rowKey, so that these lookups can
be done efficiently.
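The index-table option can be sketched as follows, assuming the main table keeps its existing rowKey and a second table stores one empty row per write under a hypothetical `<13-digit ms load time>|<rowKey>` key. Both tables are modeled with plain Python structures rather than the HBase client, and the class and method names are illustrative only.

```python
import bisect


class TimeIndexedTable:
    """Toy model of a main table plus an index table of load time -> rowKey."""

    def __init__(self):
        self.main = {}          # stands in for the main table: rowKey -> value
        self.index_keys = []    # stands in for the index table: sorted row keys

    def put(self, row_key, value, load_ts_ms):
        # Write the data row, then an index row whose key points back to it.
        self.main[row_key] = value
        bisect.insort(self.index_keys, f"{load_ts_ms:013d}|{row_key}")

    def loaded_between(self, t_start_ms, t_end_ms):
        """RowKeys loaded in [t_start_ms, t_end_ms), found by range-reading the index."""
        lo = bisect.bisect_left(self.index_keys, f"{t_start_ms:013d}")
        hi = bisect.bisect_left(self.index_keys, f"{t_end_ms:013d}")
        return [k.split("|", 1)[1] for k in self.index_keys[lo:hi]]


if __name__ == "__main__":
    table = TimeIndexedTable()
    table.put("cust-7", "a", 1000)
    table.put("cust-3", "b", 2000)
    table.put("cust-9", "c", 3000)
    new_keys = table.loaded_between(1500, 3000)
    print(new_keys, [table.main[k] for k in new_keys])  # ['cust-3'] ['b']
```

Two caveats apply to a real deployment. HBase guarantees atomicity only within a single row, so the data write and the index write can be separated by a failure, and readers should tolerate an index entry whose main row is missing (or the reverse). Also, every update adds a new index entry, so a window can return the same rowKey more than once unless the caller deduplicates.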
Chetan Khatri wrote:
> Sure. Several applications talk to HBase and populate data. Now I want to
> load data incrementally from HBase, apply transformations such as data
> quality filters, and save the results in Hive.
>
> By incremental load I mean: I want to run this job weekly and make sure
> there is no duplication at the Hive level.
>
> Thanks.
>
> On Sat, Jan 28, 2017 at 1:00 AM, Josh Elser <el...@apache.org> wrote:
>
>> (-cc dev)
>>
>> Might you be able to be more specific in the context of your question?
>>
>> What kind of requirements do you have?
>>
>>
>> Chetan Khatri wrote:
>>
>>> Hello Community,
>>>
>>> I am working with HBase 1.2.4. What would be the best approach to do an
>>> incremental load from HBase to Hive?
>>>
>>> Thanks.
>>>
>>>
>
Re: Incremental import from HBase to Hive
Posted by Chetan Khatri <ch...@gmail.com>.
Sure. Several applications talk to HBase and populate data. Now I want to
load data incrementally from HBase, apply transformations such as data
quality filters, and save the results in Hive.
By incremental load I mean: I want to run this job weekly and make sure
there is no duplication at the Hive level.
Thanks.
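One way to meet the "weekly, without duplication" requirement, sketched in Python under stated assumptions: each record already carries a load time (for example from a time-prefixed rowKey or an index table, as suggested elsewhere in this thread); each run covers a half-open window [Monday 00:00, next Monday 00:00), so a record on a boundary is picked up by exactly one run; repeated row keys inside a window collapse to their newest version; and the result replaces that week's partition instead of being appended, so re-running a week cannot create duplicates. In Hive, that last step would correspond to an `INSERT OVERWRITE TABLE ... PARTITION (...)` statement; here partitions are a plain dict, and all names (including the `keep` data-quality hook) are hypothetical.

```python
from datetime import date, datetime, timedelta


def week_window(any_day):
    """Half-open [Monday 00:00, next Monday 00:00) window containing any_day."""
    monday = any_day - timedelta(days=any_day.weekday())
    start = datetime(monday.year, monday.month, monday.day)
    return start, start + timedelta(days=7)


def latest_per_key(records):
    """Collapse repeated row keys, keeping the record with the newest load time."""
    latest = {}
    for rec in records:
        current = latest.get(rec["row_key"])
        if current is None or rec["loaded_at"] > current["loaded_at"]:
            latest[rec["row_key"]] = rec
    return latest


def run_weekly_load(records, any_day, hive_partitions, keep=lambda rec: True):
    """Filter one week's records, dedupe them, and overwrite that week's partition.

    `keep` is a hypothetical hook for data-quality filters; `hive_partitions`
    is a dict standing in for a Hive table partitioned by week start date.
    """
    start, end = week_window(any_day)
    selected = [r for r in records if start <= r["loaded_at"] < end and keep(r)]
    partition = start.date().isoformat()
    # Replace, never append: re-running the same week yields the same partition.
    hive_partitions[partition] = sorted(
        latest_per_key(selected).values(), key=lambda r: r["row_key"]
    )
    return partition


if __name__ == "__main__":
    rows = [
        {"row_key": "a", "loaded_at": datetime(2017, 1, 23, 9, 0), "v": 1},
        {"row_key": "a", "loaded_at": datetime(2017, 1, 25, 9, 0), "v": 2},
        {"row_key": "b", "loaded_at": datetime(2017, 1, 30, 0, 0), "v": 3},
    ]
    partitions = {}
    week = run_weekly_load(rows, date(2017, 1, 27), partitions)
    run_weekly_load(rows, date(2017, 1, 27), partitions)  # rerun: no duplicates
    print(week, [(r["row_key"], r["v"]) for r in partitions[week]])  # 2017-01-23 [('a', 2)]
```

Overwriting whole partitions keeps each run idempotent without having to compare against rows already in Hive; the trade-off is that a late-arriving record for an earlier week is only picked up if that week is re-run.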
On Sat, Jan 28, 2017 at 1:00 AM, Josh Elser <el...@apache.org> wrote:
> (-cc dev)
>
> Might you be able to be more specific in the context of your question?
>
> What kind of requirements do you have?
>
>
> Chetan Khatri wrote:
>
>> Hello Community,
>>
>> I am working with HBase 1.2.4. What would be the best approach to do an
>> incremental load from HBase to Hive?
>>
>> Thanks.
>>
>>
Re: Incremental import from HBase to Hive
Posted by Josh Elser <el...@apache.org>.
(-cc dev)
Might you be able to be more specific in the context of your question?
What kind of requirements do you have?
Chetan Khatri wrote:
> Hello Community,
>
> I am working with HBase 1.2.4. What would be the best approach to do an
> incremental load from HBase to Hive?
>
> Thanks.
>