Posted to general@hadoop.apache.org by Guillaume percoclap <pe...@gmail.com> on 2014/04/09 22:01:10 UTC

Re: Import only Delta from SDL DB using Sqoop

Hi, thanks for this answer.

So I should use this kind of command, since the SQL table I want to import
is keyed on row ID values:

sqoop import --check-column "PersonID" --incremental "append" --last-value 30

This will import only new entries with an ID greater than 30. But if I want
to automate this, I obviously cannot check the current ID each day and
modify the script manually.
So what would be the best way to automatically import only the entries with
an ID greater than the last ID imported?
Thanks for everything.



2014-03-30 22:41 GMT+02:00 Peyman Mohajerian <mo...@gmail.com>:

> There are two ways, based on ID or on timestamp. The documentation
> explains both clearly:
>
> https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports
>
>
> On Sun, Mar 30, 2014 at 12:58 PM, Guillaume percoclap
> <pe...@gmail.com> wrote:
>
> > Hi
> >
> > Using Sqoop, how can I import only the data that has changed since the
> > last import from a SQL DB?
> >
> > Let's say I first run this command: sqoop import-all-tables --connect
> > 'jdbc:sqlserver://xxxxssxs.com ;username=user1;password=xxxxx'
> > --hive-import
> >
> > and the next day I want to import only the modified/added/deleted data.
> > Is there a specific command that allows this?
> >
> > Thanks in advance
> >
> > Perco
> >
>

Re: Import only Delta from SDL DB using Sqoop

Posted by Guillaume percoclap <pe...@gmail.com>.
Exactly, it works well. Regarding importing a complete SQL DB with Sqoop,
what are the best practices?
Should I launch a Sqoop job for each table, specifying its incremental mode?
Thanks for your helpful advice.
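
One possible shape for the per-table approach asked about above, as a sketch rather than an established best practice: keep a small list of tables with their check column, incremental mode, and starting value, and create one saved job per entry. The function name create_jobs, the delta_ job-name prefix, the list format, and the SQOOP_BIN override (useful for a dry run) are all hypothetical; the sqoop flags themselves are the ones from the incremental-import section of the user guide.

```shell
#!/bin/sh
# Sketch only: create_jobs, the "delta_" prefix, and the list format are
# illustrative assumptions, not part of Sqoop itself.

# Reads lines of the form "table check_column mode initial_value" from
# stdin and creates one saved Sqoop job per line.
#   $1 = JDBC connection string
# mode is "append" or "lastmodified".
create_jobs() {
    while read -r table column mode initial; do
        # Skip blank lines and comment lines.
        case "$table" in ''|'#'*) continue ;; esac
        # stdin is redirected so the command cannot swallow the table list.
        "${SQOOP_BIN:-sqoop}" job --create "delta_${table}" -- import \
            --connect "$1" --table "$table" \
            --check-column "$column" --incremental "$mode" \
            --last-value "$initial" </dev/null
    done
}
```

With a file tables.txt containing lines such as "Person PersonID append 0", running `create_jobs "$JDBC_URL" < tables.txt` once would create the jobs, and each one could then be run on a schedule with `sqoop job --exec delta_Person`. Setting SQOOP_BIN=echo prints the commands instead of running them.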




2014-04-09 22:13 GMT+02:00 Peyman Mohajerian <mo...@gmail.com>:

> The same documentation also describes 'saved jobs', which essentially act
> as a placeholder for the last value. You can of course also store the last
> value yourself and pass it in each time.

Re: Import only Delta from SDL DB using Sqoop

Posted by Peyman Mohajerian <mo...@gmail.com>.
The same documentation also describes 'saved jobs', which essentially act as
a placeholder for the last value. You can of course also store the last
value yourself and pass it in each time.
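
Both options can be sketched in shell. This is a minimal sketch under assumptions, not a tested setup: the job name person_delta, the table name Person, the state file, the SQOOP_BIN override, and the three helper functions are hypothetical. Only the sqoop flags are taken from the Sqoop user guide. How the new highest PersonID is obtained after a run (for example with a SELECT MAX(PersonID) query) is deliberately left out.

```shell
#!/bin/sh
# Option 1: a saved job. Sqoop keeps the last imported value in the job's
# stored state, so nothing has to be edited by hand. Create it once:
#   sqoop job --create person_delta -- import \
#       --connect "$JDBC_URL" --table Person \
#       --check-column PersonID --incremental append --last-value 0
# then run it on a schedule (e.g. from cron):
#   sqoop job --exec person_delta

# Option 2: keep the last value yourself in a small state file.
# All names below are illustrative assumptions.

# Print the stored last value, or 0 if nothing has been imported yet.
#   $1 = state file
read_last_value() {
    if [ -s "$1" ]; then cat "$1"; else echo 0; fi
}

# Record a new last value after a successful import.
#   $1 = state file, $2 = new last value
save_last_value() {
    printf '%s\n' "$2" > "$1"
}

# Run an append-mode incremental import starting after the given value.
#   $1 = JDBC URL, $2 = table, $3 = check column, $4 = last value
run_incremental_import() {
    "${SQOOP_BIN:-sqoop}" import \
        --connect "$1" --table "$2" \
        --check-column "$3" --incremental append --last-value "$4"
}
```

A daily script could then call `run_incremental_import "$JDBC_URL" Person PersonID "$(read_last_value person.last)"` and, once the import succeeds, call save_last_value with the new highest ID. The saved job is usually less work, because Sqoop does that bookkeeping itself.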

