Posted to user@sqoop.apache.org by Tanzir Musabbir <tm...@outlook.com> on 2013/05/08 19:00:19 UTC

Using Sqoop incremental import as chunk



Hello everyone,
Is it really possible to import chunk-wise data through sqoop incremental import?
Say I have a table with id 1,2,3..... N (here N is 100) and now I want to import it as chunk. Like
1st import: 1,2,3.... 20
2nd import: 21,22,23.....40
last import: 81,82,83....100
I have read about the Sqoop job with incremental import and also know the --last-value parameter but do not know how to pass the chunk size. For the above example, chunk size here is 20.

Any information will be highly appreciated. Thanks in advance.

Re: Using Sqoop incremental import as chunk

Posted by Felix GV <fe...@mate1inc.com>.
That's the only way I see you being able to achieve this, yes.

(Assuming you want many separate sequential imports, because if importing
the chunks in parallel is fine with you then you could use a single sqoop
command and let the size of your chunks be a by-product of the number of
mappers you choose.)

--
Felix
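
A rough sketch of the single-command alternative described above, assuming the ids are dense so that an even split of the id range gives evenly sized chunks. The JDBC URL, user, table and target directory are hypothetical placeholders, credentials are left out, and the script only prints the command instead of running it:

```shell
#!/bin/sh
# Sketch only: nothing here talks to a database or a cluster.

# Approximate rows per mapper when ids are dense between MIN and MAX.
# $1 = min id, $2 = max id, $3 = number of mappers
rows_per_mapper() {
    echo $(( ($2 - $1 + 1 + $3 - 1) / $3 ))   # ceiling division
}

# Print a single parallel import; $1 = number of mappers.
# Connection string, user, table and directory are made-up placeholders.
build_parallel_import() {
    printf '%s\n' "sqoop import --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username scott --table EVENTS --split-by ID --num-mappers $1 --target-dir /data/events"
}

rows_per_mapper 1 100 5      # prints 20
build_parallel_import 5
```

With 100 ids and 5 mappers each mapper would handle about 20 ids, which matches the chunk size in the original question, but all chunks are imported in one run rather than one per run.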


On Wed, May 8, 2013 at 2:17 PM, Tanzir Musabbir <tm...@outlook.com> wrote:

> Thanks a lot Felix & Jarcec. So it looks like, if I am running an Oozie
> coordinator job which periodically imports chunk data through Sqoop, before
> calling the Sqoop action I need to change the boundary query value every
> time. Like
>
> --boundary-query 'select 1,20' - for the 1st run
> --boundary-query 'select 21,40' - for the 2nd run
>
> Please correct me if I'm wrong. Thanks again.

RE: Using Sqoop incremental import as chunk

Posted by Tanzir Musabbir <tm...@outlook.com>.
Sure Jarcec,

Actually, we would like to import data (from Oracle) 4-5 times a day and then process it (analytics) about the same number of times. Each chunk may have around 10 million records. Records are continuously added to that table, and sometimes the number added in a given time frame may exceed 10 M. In that case we will not import all of the records; instead we will import only 10 M of them. That's why we are trying to import them in chunks.

Tanzir
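
A minimal sketch of how the 10 M cap described above could be turned into per-run boundaries, assuming the ids are dense and only ever increase (so an id range of 10 M is roughly 10 M records). The function name and the idea of keeping the last imported id somewhere between runs are assumptions for illustration, not something Sqoop provides:

```shell
#!/bin/sh
# next_chunk LAST_ID MAX_ID CHUNK
# Prints "LOW HIGH" (inclusive) for the next import, or nothing when
# there are no new rows. LAST_ID is the highest id already imported,
# MAX_ID is the current highest id in the source table.
next_chunk() {
    nc_low=$(( $1 + 1 ))
    nc_high=$(( $1 + $3 ))
    if [ "$nc_low" -gt "$2" ]; then
        return 0            # nothing new to import
    fi
    if [ "$nc_high" -gt "$2" ]; then
        nc_high=$2          # fewer than CHUNK new rows: stop at the current max
    fi
    echo "$nc_low $nc_high"
}

next_chunk 40000000 63000000 10000000   # prints "40000001 50000000"
```

The two numbers printed would then be the values placed in the boundary query for that run; rows beyond the upper bound wait for a later run.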

> Date: Wed, 8 May 2013 11:23:25 -0700
> From: jarcec@apache.org
> To: user@sqoop.apache.org
> Subject: Re: Using Sqoop incremental import as chunk
> 
> Hi Tanzir,
> would you mind describing a bit more about your use case? Is there a reason why you do not want your Oozie job to import all missing data?
> 
> Jarcec

Re: Using Sqoop incremental import as chunk

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Tanzir,
would you mind describing a bit more about your use case? Is there a reason why you do not want your Oozie job to import all missing data?

Jarcec

On Thu, May 09, 2013 at 12:17:03AM +0600, Tanzir Musabbir wrote:
> Thanks a lot Felix & Jarcec. So it looks like, if I am running an Oozie coordinator job which periodically imports chunk data through Sqoop, before calling the Sqoop action I need to change the boundary query value every time. Like
> --boundary-query 'select 1,20' - for the 1st run
> --boundary-query 'select 21,40' - for the 2nd run
> Please correct me if I'm wrong. Thanks again.

RE: Using Sqoop incremental import as chunk

Posted by Tanzir Musabbir <tm...@outlook.com>.
Thanks a lot Felix & Jarcec. So it looks like, if I am running an Oozie coordinator job which periodically imports chunk data through Sqoop, before calling the Sqoop action I need to change the boundary query value every time. Like
--boundary-query 'select 1,20' - for the 1st run
--boundary-query 'select 21,40' - for the 2nd run
Please correct me if I'm wrong. Thanks again.
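
The per-run values above follow directly from the run number and the chunk size, so they could be generated rather than edited by hand. A small sketch, where the function name is made up for illustration and runs are numbered from 1:

```shell
#!/bin/sh
# boundary_query RUN CHUNK_SIZE
# Prints the boundary query for the given run, e.g. run 2 with
# chunk size 20 covers ids 21..40.
boundary_query() {
    bq_low=$(( ($1 - 1) * $2 + 1 ))
    bq_high=$(( $1 * $2 ))
    echo "select $bq_low,$bq_high"
}

# Show the argument for each of the five runs needed for ids 1..100.
for run in 1 2 3 4 5; do
    echo "run $run: --boundary-query '$(boundary_query "$run" 20)'"
done
```

On Oracle releases that require a FROM clause, the generated query would likely need to end in "from dual" (for example select 21,40 from dual).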

> Date: Wed, 8 May 2013 11:08:05 -0700
> From: jarcec@apache.org
> To: user@sqoop.apache.org
> Subject: Re: Using Sqoop incremental import as chunk
> 
> Hi Tanzir,
> Incremental import does not work in chunks; it always imports everything since the last import, i.e. everything from --last-value up. You can simulate chunks if needed using the --boundary-query argument, as Felix advised.
> 
> Jarcec

Re: Using Sqoop incremental import as chunk

Posted by Jarek Jarcec Cecho <ja...@apache.org>.
Hi Tanzir,
Incremental import does not work in chunks; it always imports everything since the last import, i.e. everything from --last-value up. You can simulate chunks if needed using the --boundary-query argument, as Felix advised.

Jarcec
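
To make the difference concrete, here is a sketch that prints the two kinds of command side by side without running them. The connection string, user, table, column and directory are hypothetical placeholders, credentials are omitted, and the "from dual" suffix is an assumption for an Oracle source:

```shell
#!/bin/sh
# Hypothetical connection details shared by both commands.
BASE="sqoop import --connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username scott --table EVENTS --target-dir /data/events"

# Incremental append: only a lower bound, so every row with ID > $1 is imported.
incremental_import() {
    echo "$BASE --incremental append --check-column ID --last-value $1"
}

# Simulated chunk: the boundary query supplies both ends of the id range.
# $1 = lowest id, $2 = highest id
chunk_import() {
    echo "$BASE --append --split-by ID --boundary-query 'select $1,$2 from dual'"
}

incremental_import 20
chunk_import 21 40
```

The first command has no way to say "stop after 20 more ids", which is why the upper bound has to come from the boundary query in the second one.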

On Wed, May 08, 2013 at 01:46:47PM -0400, Felix GV wrote:
> --boundary-query
> 
> http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_connecting_to_a_database_server
> 
> --
> Felix

Re: Using Sqoop incremental import as chunk

Posted by Felix GV <fe...@mate1inc.com>.
--boundary-query

http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_connecting_to_a_database_server

--
Felix


On Wed, May 8, 2013 at 1:00 PM, Tanzir Musabbir <tm...@outlook.com> wrote:

>  Hello everyone,
>
> Is it really possible to import chunk-wise data through sqoop incremental
> import?
>
> Say I have a table with id 1,2,3..... N (here N is 100) and now I want to
> import it as chunk. Like
> 1st import: 1,2,3.... 20
> 2nd import: 21,22,23.....40
> last import: 81,82,83....100
>
> I have read about the Sqoop job with incremental import and also know the
> --last-value parameter but do not know how to pass the chunk size. For the
> above example, chunk size here is 20.
>
>
> Any information will be highly appreciated. Thanks in advance.
>