You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@sqoop.apache.org by shengjie min <ke...@gmail.com> on 2013/08/07 18:04:39 UTC

incremental updates mysql to HBase

Hi guys,

TO simplify my question, Let's say, I have a mysql table called 'student', looks like this:

+----+----------+-----+
| id | name     | sex | 
+----+----------+-----+
|  1 | Alice       |   0  | 
|  2 | Bob         |   1  |  
|  3 | Charles  |   1  |  
+----+----------+-----+

I want to import this table to HBase periodically which means I will run this sqoop job periodically. There are two goals:

A.  every time there is a new record inserted to mysql table, e.g. (4, David, 1), I hope my next sqoop import will catch it and put it in HBase.
B. if  there is any updates have been made to mysql rows 1, 2, 3, I want to have the updates in HBase too after next round sqoop import.

I checked two types incremental updates sqoop has:  Append mode seems only satisfied goal A while Last-modified mode will require my mysql table has a timestamp column for each row(which I don't in real life). I know if I don't use incremental updates options at all, I can just get way with it by running a fresh import every time, but if my mysql table is really huge and fresh import might be a performance killer.

Is there anyway I can just do incremental updates instead of having to re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?


Shengjie

Re: incremental updates mysql to HBase

Posted by Shengjie Min <ke...@gmail.com>.
Thanks Joanne,

so normally what's the best practice here if I were to use sqoop? For now,
I feel like just doing a brand new re-import everytime makes more sense,
but I will need to keep eyes on the performance side, what you think?

Shengjie


On 8 August 2013 00:15, Joanne Chan <jc...@shutterstock.com> wrote:

> 1. No
> 2. Then, you are out of luck.
>
> Sqoop isn't magic, imagine how would sqoop know what data was updated
> without any indicators?
>
>
> On Wed, Aug 7, 2013 at 12:12 PM, shengjie min <ke...@gmail.com>wrote:
>
>>
>>
>> - Add an update_time column to the source table and do the incremental
>> load by that update_time column.
>>
>> two questions to that then:
>>
>> 1. will sqoop job updates that field "update_time" automatically when it
>> runs OR my application needs to write that field?
>> 2. What if I don't wanna or I can't  touch the schema on the source table?
>>
>> THanks,
>> Shengjie
>>
>> On 8 Aug 2013, at 00:06, Joanne Chan <jc...@shutterstock.com> wrote:
>>
>> Add an update_time column to the source table and do the incremental load
>> by that update_time column.
>>
>>
>> On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <ke...@gmail.com>wrote:
>>
>>> Hi guys,
>>>
>>> TO simplify my question, Let's say, I have a mysql table called
>>> 'student', looks like this:
>>>
>>> +----+----------+-----+
>>> | id | name     | sex |
>>> +----+----------+-----+
>>> |  1 | Alice       |   0  |
>>> |  2 | Bob         |   1  |
>>> |  3 | Charles  |   1  |
>>> +----+----------+-----+
>>>
>>> I want to import this table to HBase periodically which means I will run
>>> this sqoop job periodically. There are two goals:
>>>
>>> A.  every time there is a new record inserted to mysql table, e.g. (4,
>>> David, 1), I hope my next sqoop import will catch it and put it in HBase.
>>> B. if  there is any updates have been made to mysql rows 1, 2, 3, I want
>>> to have the updates in HBase too after next round sqoop import.
>>>
>>> I checked two types incremental updates sqoop has:  Append mode seems
>>> only satisfied goal A while Last-modified mode will require my mysql table
>>> has a timestamp column for each row(which I don't in real life). I know if
>>> I don't use incremental updates options at all, I can just get way with it
>>> by running a fresh import every time, but if my mysql table is really huge
>>> and fresh import might be a performance killer.
>>>
>>> Is there anyway I can just do incremental updates instead of having to
>>> re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
>>>
>>>
>>> Shengjie
>>
>>
>>
>>
>> --
>> -- JChan
>>
>>
>>
>
>
> --
> -- JChan
>
>


-- 
All the best,
Shengjie Min

Re: incremental updates mysql to HBase

Posted by Joanne Chan <jc...@shutterstock.com>.
1. No
2. Then, you are out of luck.

Sqoop isn't magic, imagine how would sqoop know what data was updated
without any indicators?


On Wed, Aug 7, 2013 at 12:12 PM, shengjie min <ke...@gmail.com> wrote:

>
>
> - Add an update_time column to the source table and do the incremental
> load by that update_time column.
>
> two questions to that then:
>
> 1. will sqoop job updates that field "update_time" automatically when it
> runs OR my application needs to write that field?
> 2. What if I don't wanna or I can't  touch the schema on the source table?
>
> THanks,
> Shengjie
>
> On 8 Aug 2013, at 00:06, Joanne Chan <jc...@shutterstock.com> wrote:
>
> Add an update_time column to the source table and do the incremental load
> by that update_time column.
>
>
> On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <ke...@gmail.com>wrote:
>
>> Hi guys,
>>
>> TO simplify my question, Let's say, I have a mysql table called
>> 'student', looks like this:
>>
>> +----+----------+-----+
>> | id | name     | sex |
>> +----+----------+-----+
>> |  1 | Alice       |   0  |
>> |  2 | Bob         |   1  |
>> |  3 | Charles  |   1  |
>> +----+----------+-----+
>>
>> I want to import this table to HBase periodically which means I will run
>> this sqoop job periodically. There are two goals:
>>
>> A.  every time there is a new record inserted to mysql table, e.g. (4,
>> David, 1), I hope my next sqoop import will catch it and put it in HBase.
>> B. if  there is any updates have been made to mysql rows 1, 2, 3, I want
>> to have the updates in HBase too after next round sqoop import.
>>
>> I checked two types incremental updates sqoop has:  Append mode seems
>> only satisfied goal A while Last-modified mode will require my mysql table
>> has a timestamp column for each row(which I don't in real life). I know if
>> I don't use incremental updates options at all, I can just get way with it
>> by running a fresh import every time, but if my mysql table is really huge
>> and fresh import might be a performance killer.
>>
>> Is there anyway I can just do incremental updates instead of having to
>> re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
>>
>>
>> Shengjie
>
>
>
>
> --
> -- JChan
>
>
>


-- 
-- JChan

Re: incremental updates mysql to HBase

Posted by shengjie min <ke...@gmail.com>.

> - Add an update_time column to the source table and do the incremental load by that update_time column.
two questions to that then:

1. will sqoop job updates that field "update_time" automatically when it runs OR my application needs to write that field?
2. What if I don't wanna or I can't  touch the schema on the source table?

THanks,
Shengjie

On 8 Aug 2013, at 00:06, Joanne Chan <jc...@shutterstock.com> wrote:

> Add an update_time column to the source table and do the incremental load by that update_time column.
> 
> 
> On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <ke...@gmail.com> wrote:
> Hi guys,
> 
> TO simplify my question, Let's say, I have a mysql table called 'student', looks like this:
> 
> +----+----------+-----+
> | id | name     | sex |
> +----+----------+-----+
> |  1 | Alice       |   0  |
> |  2 | Bob         |   1  |
> |  3 | Charles  |   1  |
> +----+----------+-----+
> 
> I want to import this table to HBase periodically which means I will run this sqoop job periodically. There are two goals:
> 
> A.  every time there is a new record inserted to mysql table, e.g. (4, David, 1), I hope my next sqoop import will catch it and put it in HBase.
> B. if  there is any updates have been made to mysql rows 1, 2, 3, I want to have the updates in HBase too after next round sqoop import.
> 
> I checked two types incremental updates sqoop has:  Append mode seems only satisfied goal A while Last-modified mode will require my mysql table has a timestamp column for each row(which I don't in real life). I know if I don't use incremental updates options at all, I can just get way with it by running a fresh import every time, but if my mysql table is really huge and fresh import might be a performance killer.
> 
> Is there anyway I can just do incremental updates instead of having to re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
> 
> 
> Shengjie
> 
> 
> 
> -- 
> -- JChan
> 


Re: incremental updates mysql to HBase

Posted by Joanne Chan <jc...@shutterstock.com>.
Add an update_time column to the source table and do the incremental load
by that update_time column.


On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <ke...@gmail.com> wrote:

> Hi guys,
>
> TO simplify my question, Let's say, I have a mysql table called 'student',
> looks like this:
>
> +----+----------+-----+
> | id | name     | sex |
> +----+----------+-----+
> |  1 | Alice       |   0  |
> |  2 | Bob         |   1  |
> |  3 | Charles  |   1  |
> +----+----------+-----+
>
> I want to import this table to HBase periodically which means I will run
> this sqoop job periodically. There are two goals:
>
> A.  every time there is a new record inserted to mysql table, e.g. (4,
> David, 1), I hope my next sqoop import will catch it and put it in HBase.
> B. if  there is any updates have been made to mysql rows 1, 2, 3, I want
> to have the updates in HBase too after next round sqoop import.
>
> I checked two types incremental updates sqoop has:  Append mode seems only
> satisfied goal A while Last-modified mode will require my mysql table has a
> timestamp column for each row(which I don't in real life). I know if I
> don't use incremental updates options at all, I can just get way with it by
> running a fresh import every time, but if my mysql table is really huge and
> fresh import might be a performance killer.
>
> Is there anyway I can just do incremental updates instead of having to
> re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
>
>
> Shengjie




-- 
-- JChan