You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by Jagrut Sharma <ja...@gmail.com> on 2017/07/19 19:02:58 UTC

Getting upper bound in --incremental mode

Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v 1.4.2)
generates a query like:
WHERE column >= last_modified_time and column < current_time

The --last-value is set to the current_time and gets used for the next run.

Here, the upper bound is always set to the current_time. In some cases,
this upper bound is required to be taken from the database table column
itself. So, the query is required of the form:
WHERE column >= last_modified_time and column < max_time_in_db_table_column

And the --last-value for next run needs to be set as
the max_time_in_db_table_column (and not the current_time).

Is this possible to set while running sqoop?  If no, is there any
workaround suggested for this?

Thanks a lot.
-- 
Jagrut

Re: Getting upper bound in --incremental mode

Posted by Markus Kemper <ma...@cloudera.com>.
I am not aware of a method to derive the --last-value from the target RDBMS
table.  You could use Sqoop eval to select the value and pass it to
--last-value however that is not very elegant.


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Thu, Jul 20, 2017 at 6:30 PM, Jagrut Sharma <ja...@gmail.com>
wrote:

> Hi Markus - I'm using Sqoop v1, but with a custom metastore (not the one
> that Sqoop provides). My original question was on deriving the --last-value
> based on table column values (and not time of job execution).
>
> Thanks.
> --
> Jagrut
>
> On Thu, Jul 20, 2017 at 2:48 PM, Markus Kemper <ma...@cloudera.com>
> wrote:
>
>> Hey Jagrut,
>>
>> Are you using the Sqoop1 Metastore job tool (assuming yes)?
>> Are you wanting to override the current stored --last-value when
>> executing the Sqoop job?
>>
>>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>>
>> On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <ja...@gmail.com>
>> wrote:
>>
>>> Hi Markus - The question was that --incremental with --lastmodified
>>> option always takes the current time as the upper bound, and this gets
>>> stored as the --last-value for the next run.
>>>
>>> In certain cases, it is desirable that the upper bound should come from
>>> the actual column values, and that should get set for the --last-value for
>>> next run.
>>> -
>>> Jagrut
>>>
>>>
>>>
>>> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com>
>>> wrote:
>>>
>>>> Hey Jagrut,
>>>>
>>>> Can you elaborate more about the problem you are facing and what you
>>>> mean
>>>> by (Is this possible to set while running sqoop?).
>>>>
>>>>
>>>> Markus Kemper
>>>> Customer Operations Engineer
>>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Tony - I was under the assumption that append mode will not work
>>>> for
>>>> > timestamp column. But I gave it a try after your reply, and it works.
>>>> And
>>>> > it gets the upper bound from the database itself. Thanks.
>>>> >
>>>> > --
>>>> > Jagrut
>>>> >
>>>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io>
>>>> wrote:
>>>> >
>>>> >> Does `--incremental append` work for you?
>>>> >>
>>>> >> > You should specify append mode when importing a table where new
>>>> rows
>>>> >> are continually being added with increasing row id values
>>>> >>
>>>> >> Tony
>>>> >>
>>>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <jagrutsharma@gmail.com
>>>> >
>>>> >> wrote:
>>>> >> >
>>>> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop
>>>> (v
>>>> >> 1.4.2)
>>>> >> > generates a query like:
>>>> >> > WHERE column >= last_modified_time and column < current_time
>>>> >> >
>>>> >> > The --last-value is set to the current_time and gets used for the
>>>> next
>>>> >> run.
>>>> >> >
>>>> >> > Here, the upper bound is always set to the current_time. In some
>>>> cases,
>>>> >> > this upper bound is required to be taken from the database table
>>>> column
>>>> >> > itself. So, the query is required of the form:
>>>> >> > WHERE column >= last_modified_time and column <
>>>> >> max_time_in_db_table_column
>>>> >> >
>>>> >> > And the --last-value for next run needs to be set as
>>>> >> > the max_time_in_db_table_column (and not the current_time).
>>>> >> >
>>>> >> > Is this possible to set while running sqoop?  If no, is there any
>>>> >> > workaround suggested for this?
>>>> >> >
>>>> >> > Thanks a lot.
>>>> >> > --
>>>> >> > Jagrut
>>>> >>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Jagrut
>>>
>>>
>>
>
>
> --
> Jagrut
>
>

Re: Getting upper bound in --incremental mode

Posted by Mario Amatucci <ma...@gmail.com>.
ciao Jagrut,
what about invoking sqoop
playing a bit with WHERE $CONDITIONS ...  in your script
you could put some smart logic and filter the data regardless the actual
value stored in the metastore (just an idea) hope it's not a dead end track
for your use case
mario



*Kind Regards,*Mario Amatucci

On 21 July 2017 at 00:30, Jagrut Sharma <ja...@gmail.com> wrote:

> Hi Markus - I'm using Sqoop v1, but with a custom metastore (not the one
> that Sqoop provides). My original question was on deriving the --last-value
> based on table column values (and not time of job execution).
>
> Thanks.
> --
> Jagrut
>
> On Thu, Jul 20, 2017 at 2:48 PM, Markus Kemper <ma...@cloudera.com>
> wrote:
>
>> Hey Jagrut,
>>
>> Are you using the Sqoop1 Metastore job tool (assuming yes)?
>> Are you wanting to override the current stored --last-value when
>> executing the Sqoop job?
>>
>>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>>
>> On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <ja...@gmail.com>
>> wrote:
>>
>>> Hi Markus - The question was that --incremental with --lastmodified
>>> option always takes the current time as the upper bound, and this gets
>>> stored as the --last-value for the next run.
>>>
>>> In certain cases, it is desirable that the upper bound should come from
>>> the actual column values, and that should get set for the --last-value for
>>> next run.
>>> -
>>> Jagrut
>>>
>>>
>>>
>>> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com>
>>> wrote:
>>>
>>>> Hey Jagrut,
>>>>
>>>> Can you elaborate more about the problem you are facing and what you
>>>> mean
>>>> by (Is this possible to set while running sqoop?).
>>>>
>>>>
>>>> Markus Kemper
>>>> Customer Operations Engineer
>>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Tony - I was under the assumption that append mode will not work
>>>> for
>>>> > timestamp column. But I gave it a try after your reply, and it works.
>>>> And
>>>> > it gets the upper bound from the database itself. Thanks.
>>>> >
>>>> > --
>>>> > Jagrut
>>>> >
>>>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io>
>>>> wrote:
>>>> >
>>>> >> Does `--incremental append` work for you?
>>>> >>
>>>> >> > You should specify append mode when importing a table where new
>>>> rows
>>>> >> are continually being added with increasing row id values
>>>> >>
>>>> >> Tony
>>>> >>
>>>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <jagrutsharma@gmail.com
>>>> >
>>>> >> wrote:
>>>> >> >
>>>> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop
>>>> (v
>>>> >> 1.4.2)
>>>> >> > generates a query like:
>>>> >> > WHERE column >= last_modified_time and column < current_time
>>>> >> >
>>>> >> > The --last-value is set to the current_time and gets used for the
>>>> next
>>>> >> run.
>>>> >> >
>>>> >> > Here, the upper bound is always set to the current_time. In some
>>>> cases,
>>>> >> > this upper bound is required to be taken from the database table
>>>> column
>>>> >> > itself. So, the query is required of the form:
>>>> >> > WHERE column >= last_modified_time and column <
>>>> >> max_time_in_db_table_column
>>>> >> >
>>>> >> > And the --last-value for next run needs to be set as
>>>> >> > the max_time_in_db_table_column (and not the current_time).
>>>> >> >
>>>> >> > Is this possible to set while running sqoop?  If no, is there any
>>>> >> > workaround suggested for this?
>>>> >> >
>>>> >> > Thanks a lot.
>>>> >> > --
>>>> >> > Jagrut
>>>> >>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Jagrut
>>>
>>>
>>
>
>
> --
> Jagrut
>
>

Re: Getting upper bound in --incremental mode

Posted by Markus Kemper <ma...@cloudera.com>.
I am not aware of a method to derive the --last-value from the target RDBMS
table.  You could use Sqoop eval to select the value and pass it to
--last-value however that is not very elegant.


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Thu, Jul 20, 2017 at 6:30 PM, Jagrut Sharma <ja...@gmail.com>
wrote:

> Hi Markus - I'm using Sqoop v1, but with a custom metastore (not the one
> that Sqoop provides). My original question was on deriving the --last-value
> based on table column values (and not time of job execution).
>
> Thanks.
> --
> Jagrut
>
> On Thu, Jul 20, 2017 at 2:48 PM, Markus Kemper <ma...@cloudera.com>
> wrote:
>
>> Hey Jagrut,
>>
>> Are you using the Sqoop1 Metastore job tool (assuming yes)?
>> Are you wanting to override the current stored --last-value when
>> executing the Sqoop job?
>>
>>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>>
>> On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <ja...@gmail.com>
>> wrote:
>>
>>> Hi Markus - The question was that --incremental with --lastmodified
>>> option always takes the current time as the upper bound, and this gets
>>> stored as the --last-value for the next run.
>>>
>>> In certain cases, it is desirable that the upper bound should come from
>>> the actual column values, and that should get set for the --last-value for
>>> next run.
>>> -
>>> Jagrut
>>>
>>>
>>>
>>> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com>
>>> wrote:
>>>
>>>> Hey Jagrut,
>>>>
>>>> Can you elaborate more about the problem you are facing and what you
>>>> mean
>>>> by (Is this possible to set while running sqoop?).
>>>>
>>>>
>>>> Markus Kemper
>>>> Customer Operations Engineer
>>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>>
>>>>
>>>>
>>>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Tony - I was under the assumption that append mode will not work
>>>> for
>>>> > timestamp column. But I gave it a try after your reply, and it works.
>>>> And
>>>> > it gets the upper bound from the database itself. Thanks.
>>>> >
>>>> > --
>>>> > Jagrut
>>>> >
>>>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io>
>>>> wrote:
>>>> >
>>>> >> Does `--incremental append` work for you?
>>>> >>
>>>> >> > You should specify append mode when importing a table where new
>>>> rows
>>>> >> are continually being added with increasing row id values
>>>> >>
>>>> >> Tony
>>>> >>
>>>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <jagrutsharma@gmail.com
>>>> >
>>>> >> wrote:
>>>> >> >
>>>> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop
>>>> (v
>>>> >> 1.4.2)
>>>> >> > generates a query like:
>>>> >> > WHERE column >= last_modified_time and column < current_time
>>>> >> >
>>>> >> > The --last-value is set to the current_time and gets used for the
>>>> next
>>>> >> run.
>>>> >> >
>>>> >> > Here, the upper bound is always set to the current_time. In some
>>>> cases,
>>>> >> > this upper bound is required to be taken from the database table
>>>> column
>>>> >> > itself. So, the query is required of the form:
>>>> >> > WHERE column >= last_modified_time and column <
>>>> >> max_time_in_db_table_column
>>>> >> >
>>>> >> > And the --last-value for next run needs to be set as
>>>> >> > the max_time_in_db_table_column (and not the current_time).
>>>> >> >
>>>> >> > Is this possible to set while running sqoop?  If no, is there any
>>>> >> > workaround suggested for this?
>>>> >> >
>>>> >> > Thanks a lot.
>>>> >> > --
>>>> >> > Jagrut
>>>> >>
>>>> >>
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Jagrut
>>>
>>>
>>
>
>
> --
> Jagrut
>
>

Re: Getting upper bound in --incremental mode

Posted by Jagrut Sharma <ja...@gmail.com>.
Hi Markus - I'm using Sqoop v1, but with a custom metastore (not the one
that Sqoop provides). My original question was on deriving the --last-value
based on table column values (and not time of job execution).

Thanks.
--
Jagrut

On Thu, Jul 20, 2017 at 2:48 PM, Markus Kemper <ma...@cloudera.com> wrote:

> Hey Jagrut,
>
> Are you using the Sqoop1 Metastore job tool (assuming yes)?
> Are you wanting to override the current stored --last-value when executing
> the Sqoop job?
>
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
> On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <ja...@gmail.com>
> wrote:
>
>> Hi Markus - The question was that --incremental with --lastmodified
>> option always takes the current time as the upper bound, and this gets
>> stored as the --last-value for the next run.
>>
>> In certain cases, it is desirable that the upper bound should come from
>> the actual column values, and that should get set for the --last-value for
>> next run.
>> -
>> Jagrut
>>
>>
>>
>> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com>
>> wrote:
>>
>>> Hey Jagrut,
>>>
>>> Can you elaborate more about the problem you are facing and what you mean
>>> by (Is this possible to set while running sqoop?).
>>>
>>>
>>> Markus Kemper
>>> Customer Operations Engineer
>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>
>>>
>>>
>>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
>>> wrote:
>>>
>>> > Hi Tony - I was under the assumption that append mode will not work for
>>> > timestamp column. But I gave it a try after your reply, and it works.
>>> And
>>> > it gets the upper bound from the database itself. Thanks.
>>> >
>>> > --
>>> > Jagrut
>>> >
>>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io>
>>> wrote:
>>> >
>>> >> Does `--incremental append` work for you?
>>> >>
>>> >> > You should specify append mode when importing a table where new rows
>>> >> are continually being added with increasing row id values
>>> >>
>>> >> Tony
>>> >>
>>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
>>> >> wrote:
>>> >> >
>>> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
>>> >> 1.4.2)
>>> >> > generates a query like:
>>> >> > WHERE column >= last_modified_time and column < current_time
>>> >> >
>>> >> > The --last-value is set to the current_time and gets used for the
>>> next
>>> >> run.
>>> >> >
>>> >> > Here, the upper bound is always set to the current_time. In some
>>> cases,
>>> >> > this upper bound is required to be taken from the database table
>>> column
>>> >> > itself. So, the query is required of the form:
>>> >> > WHERE column >= last_modified_time and column <
>>> >> max_time_in_db_table_column
>>> >> >
>>> >> > And the --last-value for next run needs to be set as
>>> >> > the max_time_in_db_table_column (and not the current_time).
>>> >> >
>>> >> > Is this possible to set while running sqoop?  If no, is there any
>>> >> > workaround suggested for this?
>>> >> >
>>> >> > Thanks a lot.
>>> >> > --
>>> >> > Jagrut
>>> >>
>>> >>
>>> >
>>>
>>
>>
>>
>> --
>> Jagrut
>>
>>
>


-- 
Jagrut

Re: Getting upper bound in --incremental mode

Posted by Jagrut Sharma <ja...@gmail.com>.
Hi Markus - I'm using Sqoop v1, but with a custom metastore (not the one
that Sqoop provides). My original question was on deriving the --last-value
based on table column values (and not time of job execution).

Thanks.
--
Jagrut

On Thu, Jul 20, 2017 at 2:48 PM, Markus Kemper <ma...@cloudera.com> wrote:

> Hey Jagrut,
>
> Are you using the Sqoop1 Metastore job tool (assuming yes)?
> Are you wanting to override the current stored --last-value when executing
> the Sqoop job?
>
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
> On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <ja...@gmail.com>
> wrote:
>
>> Hi Markus - The question was that --incremental with --lastmodified
>> option always takes the current time as the upper bound, and this gets
>> stored as the --last-value for the next run.
>>
>> In certain cases, it is desirable that the upper bound should come from
>> the actual column values, and that should get set for the --last-value for
>> next run.
>> -
>> Jagrut
>>
>>
>>
>> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com>
>> wrote:
>>
>>> Hey Jagrut,
>>>
>>> Can you elaborate more about the problem you are facing and what you mean
>>> by (Is this possible to set while running sqoop?).
>>>
>>>
>>> Markus Kemper
>>> Customer Operations Engineer
>>> [image: www.cloudera.com] <http://www.cloudera.com>
>>>
>>>
>>>
>>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
>>> wrote:
>>>
>>> > Hi Tony - I was under the assumption that append mode will not work for
>>> > timestamp column. But I gave it a try after your reply, and it works.
>>> And
>>> > it gets the upper bound from the database itself. Thanks.
>>> >
>>> > --
>>> > Jagrut
>>> >
>>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io>
>>> wrote:
>>> >
>>> >> Does `--incremental append` work for you?
>>> >>
>>> >> > You should specify append mode when importing a table where new rows
>>> >> are continually being added with increasing row id values
>>> >>
>>> >> Tony
>>> >>
>>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
>>> >> wrote:
>>> >> >
>>> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
>>> >> 1.4.2)
>>> >> > generates a query like:
>>> >> > WHERE column >= last_modified_time and column < current_time
>>> >> >
>>> >> > The --last-value is set to the current_time and gets used for the
>>> next
>>> >> run.
>>> >> >
>>> >> > Here, the upper bound is always set to the current_time. In some
>>> cases,
>>> >> > this upper bound is required to be taken from the database table
>>> column
>>> >> > itself. So, the query is required of the form:
>>> >> > WHERE column >= last_modified_time and column <
>>> >> max_time_in_db_table_column
>>> >> >
>>> >> > And the --last-value for next run needs to be set as
>>> >> > the max_time_in_db_table_column (and not the current_time).
>>> >> >
>>> >> > Is this possible to set while running sqoop?  If no, is there any
>>> >> > workaround suggested for this?
>>> >> >
>>> >> > Thanks a lot.
>>> >> > --
>>> >> > Jagrut
>>> >>
>>> >>
>>> >
>>>
>>
>>
>>
>> --
>> Jagrut
>>
>>
>


-- 
Jagrut

Re: Getting upper bound in --incremental mode

Posted by Markus Kemper <ma...@cloudera.com>.
Hey Jagrut,

Are you using the Sqoop1 Metastore job tool (assuming yes)?
Are you wanting to override the current stored --last-value when executing
the Sqoop job?



Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <ja...@gmail.com>
wrote:

> Hi Markus - The question was that --incremental with --lastmodified option
> always takes the current time as the upper bound, and this gets stored as
> the --last-value for the next run.
>
> In certain cases, it is desirable that the upper bound should come from
> the actual column values, and that should get set for the --last-value for
> next run.
> -
> Jagrut
>
>
>
> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com>
> wrote:
>
>> Hey Jagrut,
>>
>> Can you elaborate more about the problem you are facing and what you mean
>> by (Is this possible to set while running sqoop?).
>>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>>
>>
>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
>> wrote:
>>
>> > Hi Tony - I was under the assumption that append mode will not work for
>> > timestamp column. But I gave it a try after your reply, and it works.
>> And
>> > it gets the upper bound from the database itself. Thanks.
>> >
>> > --
>> > Jagrut
>> >
>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io> wrote:
>> >
>> >> Does `--incremental append` work for you?
>> >>
>> >> > You should specify append mode when importing a table where new rows
>> >> are continually being added with increasing row id values
>> >>
>> >> Tony
>> >>
>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
>> >> wrote:
>> >> >
>> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
>> >> 1.4.2)
>> >> > generates a query like:
>> >> > WHERE column >= last_modified_time and column < current_time
>> >> >
>> >> > The --last-value is set to the current_time and gets used for the
>> next
>> >> run.
>> >> >
>> >> > Here, the upper bound is always set to the current_time. In some
>> cases,
>> >> > this upper bound is required to be taken from the database table
>> column
>> >> > itself. So, the query is required of the form:
>> >> > WHERE column >= last_modified_time and column <
>> >> max_time_in_db_table_column
>> >> >
>> >> > And the --last-value for next run needs to be set as
>> >> > the max_time_in_db_table_column (and not the current_time).
>> >> >
>> >> > Is this possible to set while running sqoop?  If no, is there any
>> >> > workaround suggested for this?
>> >> >
>> >> > Thanks a lot.
>> >> > --
>> >> > Jagrut
>> >>
>> >>
>> >
>>
>
>
>
> --
> Jagrut
>
>

Re: Getting upper bound in --incremental mode

Posted by Markus Kemper <ma...@cloudera.com>.
Hey Jagrut,

Are you using the Sqoop1 Metastore job tool (assuming yes)?
Are you wanting to override the current stored --last-value when executing
the Sqoop job?



Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Thu, Jul 20, 2017 at 5:16 PM, Jagrut Sharma <ja...@gmail.com>
wrote:

> Hi Markus - The question was that --incremental with --lastmodified option
> always takes the current time as the upper bound, and this gets stored as
> the --last-value for the next run.
>
> In certain cases, it is desirable that the upper bound should come from
> the actual column values, and that should get set for the --last-value for
> next run.
> -
> Jagrut
>
>
>
> On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com>
> wrote:
>
>> Hey Jagrut,
>>
>> Can you elaborate more about the problem you are facing and what you mean
>> by (Is this possible to set while running sqoop?).
>>
>>
>> Markus Kemper
>> Customer Operations Engineer
>> [image: www.cloudera.com] <http://www.cloudera.com>
>>
>>
>>
>> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
>> wrote:
>>
>> > Hi Tony - I was under the assumption that append mode will not work for
>> > timestamp column. But I gave it a try after your reply, and it works.
>> And
>> > it gets the upper bound from the database itself. Thanks.
>> >
>> > --
>> > Jagrut
>> >
>> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io> wrote:
>> >
>> >> Does `--incremental append` work for you?
>> >>
>> >> > You should specify append mode when importing a table where new rows
>> >> are continually being added with increasing row id values
>> >>
>> >> Tony
>> >>
>> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
>> >> wrote:
>> >> >
>> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
>> >> 1.4.2)
>> >> > generates a query like:
>> >> > WHERE column >= last_modified_time and column < current_time
>> >> >
>> >> > The --last-value is set to the current_time and gets used for the
>> next
>> >> run.
>> >> >
>> >> > Here, the upper bound is always set to the current_time. In some
>> cases,
>> >> > this upper bound is required to be taken from the database table
>> column
>> >> > itself. So, the query is required of the form:
>> >> > WHERE column >= last_modified_time and column <
>> >> max_time_in_db_table_column
>> >> >
>> >> > And the --last-value for next run needs to be set as
>> >> > the max_time_in_db_table_column (and not the current_time).
>> >> >
>> >> > Is this possible to set while running sqoop?  If no, is there any
>> >> > workaround suggested for this?
>> >> >
>> >> > Thanks a lot.
>> >> > --
>> >> > Jagrut
>> >>
>> >>
>> >
>>
>
>
>
> --
> Jagrut
>
>

Re: Getting upper bound in --incremental mode

Posted by Jagrut Sharma <ja...@gmail.com>.
Hi Markus - The question was that --incremental with --lastmodified option
always takes the current time as the upper bound, and this gets stored as
the --last-value for the next run.

In certain cases, it is desirable that the upper bound should come from the
actual column values, and that should get set for the --last-value for next
run.
-
Jagrut



On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com> wrote:

> Hey Jagrut,
>
> Can you elaborate more about the problem you are facing and what you mean
> by (Is this possible to set while running sqoop?).
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
> wrote:
>
> > Hi Tony - I was under the assumption that append mode will not work for
> > timestamp column. But I gave it a try after your reply, and it works. And
> > it gets the upper bound from the database itself. Thanks.
> >
> > --
> > Jagrut
> >
> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io> wrote:
> >
> >> Does `--incremental append` work for you?
> >>
> >> > You should specify append mode when importing a table where new rows
> >> are continually being added with increasing row id values
> >>
> >> Tony
> >>
> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
> >> wrote:
> >> >
> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
> >> 1.4.2)
> >> > generates a query like:
> >> > WHERE column >= last_modified_time and column < current_time
> >> >
> >> > The --last-value is set to the current_time and gets used for the next
> >> run.
> >> >
> >> > Here, the upper bound is always set to the current_time. In some
> cases,
> >> > this upper bound is required to be taken from the database table
> column
> >> > itself. So, the query is required of the form:
> >> > WHERE column >= last_modified_time and column <
> >> max_time_in_db_table_column
> >> >
> >> > And the --last-value for next run needs to be set as
> >> > the max_time_in_db_table_column (and not the current_time).
> >> >
> >> > Is this possible to set while running sqoop?  If no, is there any
> >> > workaround suggested for this?
> >> >
> >> > Thanks a lot.
> >> > --
> >> > Jagrut
> >>
> >>
> >
>



-- 
Jagrut

Re: Getting upper bound in --incremental mode

Posted by Jagrut Sharma <ja...@gmail.com>.
Hi Markus - The question was that --incremental with --lastmodified option
always takes the current time as the upper bound, and this gets stored as
the --last-value for the next run.

In certain cases, it is desirable that the upper bound should come from the
actual column values, and that should get set for the --last-value for next
run.
-
Jagrut



On Wed, Jul 19, 2017 at 2:56 PM, Markus Kemper <ma...@cloudera.com> wrote:

> Hey Jagrut,
>
> Can you elaborate more about the problem you are facing and what you mean
> by (Is this possible to set while running sqoop?).
>
>
> Markus Kemper
> Customer Operations Engineer
> [image: www.cloudera.com] <http://www.cloudera.com>
>
>
> On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
> wrote:
>
> > Hi Tony - I was under the assumption that append mode will not work for
> > timestamp column. But I gave it a try after your reply, and it works. And
> > it gets the upper bound from the database itself. Thanks.
> >
> > --
> > Jagrut
> >
> > On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io> wrote:
> >
> >> Does `--incremental append` work for you?
> >>
> >> > You should specify append mode when importing a table where new rows
> >> are continually being added with increasing row id values
> >>
> >> Tony
> >>
> >> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
> >> wrote:
> >> >
> >> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
> >> 1.4.2)
> >> > generates a query like:
> >> > WHERE column >= last_modified_time and column < current_time
> >> >
> >> > The --last-value is set to the current_time and gets used for the next
> >> run.
> >> >
> >> > Here, the upper bound is always set to the current_time. In some
> cases,
> >> > this upper bound is required to be taken from the database table
> column
> >> > itself. So, the query is required of the form:
> >> > WHERE column >= last_modified_time and column <
> >> max_time_in_db_table_column
> >> >
> >> > And the --last-value for next run needs to be set as
> >> > the max_time_in_db_table_column (and not the current_time).
> >> >
> >> > Is this possible to set while running sqoop?  If no, is there any
> >> > workaround suggested for this?
> >> >
> >> > Thanks a lot.
> >> > --
> >> > Jagrut
> >>
> >>
> >
>



-- 
Jagrut

Re: Getting upper bound in --incremental mode

Posted by Markus Kemper <ma...@cloudera.com>.
Hey Jagrut,

Can you elaborate more about the problem you are facing and what you mean
by (Is this possible to set while running sqoop?).


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
wrote:

> Hi Tony - I was under the assumption that append mode will not work for
> timestamp column. But I gave it a try after your reply, and it works. And
> it gets the upper bound from the database itself. Thanks.
>
> --
> Jagrut
>
> On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io> wrote:
>
>> Does `--incremental append` work for you?
>>
>> > You should specify append mode when importing a table where new rows
>> are continually being added with increasing row id values
>>
>> Tony
>>
>> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
>> wrote:
>> >
>> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
>> 1.4.2)
>> > generates a query like:
>> > WHERE column >= last_modified_time and column < current_time
>> >
>> > The --last-value is set to the current_time and gets used for the next
>> run.
>> >
>> > Here, the upper bound is always set to the current_time. In some cases,
>> > this upper bound is required to be taken from the database table column
>> > itself. So, the query is required of the form:
>> > WHERE column >= last_modified_time and column <
>> max_time_in_db_table_column
>> >
>> > And the --last-value for next run needs to be set as
>> > the max_time_in_db_table_column (and not the current_time).
>> >
>> > Is this possible to set while running sqoop?  If no, is there any
>> > workaround suggested for this?
>> >
>> > Thanks a lot.
>> > --
>> > Jagrut
>>
>>
>

Re: Getting upper bound in --incremental mode

Posted by Markus Kemper <ma...@cloudera.com>.
Hey Jagrut,

Can you elaborate more about the problem you are facing and what you mean
by (Is this possible to set while running sqoop?).


Markus Kemper
Customer Operations Engineer
[image: www.cloudera.com] <http://www.cloudera.com>


On Wed, Jul 19, 2017 at 5:43 PM, Jagrut Sharma <ja...@gmail.com>
wrote:

> Hi Tony - I was under the assumption that append mode will not work for
> timestamp column. But I gave it a try after your reply, and it works. And
> it gets the upper bound from the database itself. Thanks.
>
> --
> Jagrut
>
> On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io> wrote:
>
>> Does `--incremental append` work for you?
>>
>> > You should specify append mode when importing a table where new rows
>> are continually being added with increasing row id values
>>
>> Tony
>>
>> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
>> wrote:
>> >
>> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
>> 1.4.2)
>> > generates a query like:
>> > WHERE column >= last_modified_time and column < current_time
>> >
>> > The --last-value is set to the current_time and gets used for the next
>> run.
>> >
>> > Here, the upper bound is always set to the current_time. In some cases,
>> > this upper bound is required to be taken from the database table column
>> > itself. So, the query is required of the form:
>> > WHERE column >= last_modified_time and column <
>> max_time_in_db_table_column
>> >
>> > And the --last-value for next run needs to be set as
>> > the max_time_in_db_table_column (and not the current_time).
>> >
>> > Is this possible to set while running sqoop?  If no, is there any
>> > workaround suggested for this?
>> >
>> > Thanks a lot.
>> > --
>> > Jagrut
>>
>>
>

Re: Getting upper bound in --incremental mode

Posted by Jagrut Sharma <ja...@gmail.com>.
Hi Tony - I was under the assumption that append mode will not work for
timestamp column. But I gave it a try after your reply, and it works. And
it gets the upper bound from the database itself. Thanks.

--
Jagrut

On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io> wrote:

> Does `--incremental append` work for you?
>
> > You should specify append mode when importing a table where new rows are
> continually being added with increasing row id values
>
> Tony
>
> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
> wrote:
> >
> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
> 1.4.2)
> > generates a query like:
> > WHERE column >= last_modified_time and column < current_time
> >
> > The --last-value is set to the current_time and gets used for the next
> run.
> >
> > Here, the upper bound is always set to the current_time. In some cases,
> > this upper bound is required to be taken from the database table column
> > itself. So, the query is required of the form:
> > WHERE column >= last_modified_time and column <
> max_time_in_db_table_column
> >
> > And the --last-value for next run needs to be set as
> > the max_time_in_db_table_column (and not the current_time).
> >
> > Is this possible to set while running sqoop?  If no, is there any
> > workaround suggested for this?
> >
> > Thanks a lot.
> > --
> > Jagrut
>
>

Re: Getting upper bound in --incremental mode

Posted by Jagrut Sharma <ja...@gmail.com>.
Hi Tony - I was under the assumption that append mode will not work for
timestamp column. But I gave it a try after your reply, and it works. And
it gets the upper bound from the database itself. Thanks.

--
Jagrut

On Wed, Jul 19, 2017 at 12:18 PM, Tony Foerster <to...@phdata.io> wrote:

> Does `--incremental append` work for you?
>
> > You should specify append mode when importing a table where new rows are
> continually being added with increasing row id values
>
> Tony
>
> > On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com>
> wrote:
> >
> > Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v
> 1.4.2)
> > generates a query like:
> > WHERE column >= last_modified_time and column < current_time
> >
> > The --last-value is set to the current_time and gets used for the next
> run.
> >
> > Here, the upper bound is always set to the current_time. In some cases,
> > this upper bound is required to be taken from the database table column
> > itself. So, the query is required of the form:
> > WHERE column >= last_modified_time and column <
> max_time_in_db_table_column
> >
> > And the --last-value for next run needs to be set as
> > the max_time_in_db_table_column (and not the current_time).
> >
> > Is this possible to set while running sqoop?  If no, is there any
> > workaround suggested for this?
> >
> > Thanks a lot.
> > --
> > Jagrut
>
>

Re: Getting upper bound in --incremental mode

Posted by Tony Foerster <to...@phdata.io>.
Does `--incremental append` work for you?

> You should specify append mode when importing a table where new rows are continually being added with increasing row id values

Tony

> On Jul 19, 2017, at 2:02 PM, Jagrut Sharma <ja...@gmail.com> wrote:
> 
> Hi all - For --incremental mode with 'lastmodified' option, Sqoop (v 1.4.2)
> generates a query like:
> WHERE column >= last_modified_time and column < current_time
> 
> The --last-value is set to the current_time and gets used for the next run.
> 
> Here, the upper bound is always set to the current_time. In some cases,
> this upper bound is required to be taken from the database table column
> itself. So, the query is required of the form:
> WHERE column >= last_modified_time and column < max_time_in_db_table_column
> 
> And the --last-value for next run needs to be set as
> the max_time_in_db_table_column (and not the current_time).
> 
> Is this possible to set while running sqoop?  If no, is there any
> workaround suggested for this?
> 
> Thanks a lot.
> -- 
> Jagrut