Posted to user@sqoop.apache.org by "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vt...@cisco.com> on 2014/08/08 23:18:07 UTC

Sqoop --incremental lastmodified --last-value

Hi All,

I am using --incremental lastmodified in Sqoop to import updated data, and everything looks good except the --last-value that Sqoop prints in its output after a successful import.
I need more insight into what value Sqoop prints for --last-value and how it arrives at that value.

Sqoop output after successful import:
[Screen capture]

Maximum value in that column
[Screen capture]

I initially thought it prints the maximum value of the --check-column (max(LAST_UPDATE_DATE) in my case), but that is not what happens.

Please help me understand this.


[http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]

Vikash Talanki
Engineer - Software
vtalanki@cisco.com
Phone: +1 (408)838 4078

Cisco Systems Limited
SJ-J 3
255 W Tasman Dr
San Jose
CA - 95134
United States
Cisco.com<http://www.cisco.com/>





[Think before you print.]Think before you print.

This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message.
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html




RE: Sqoop --incremental lastmodified --last-value

Posted by "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vt...@cisco.com>.
I have one more question on this.

Which records will Sqoop import when we use, say, --incremental lastmodified --check-column LAST_UPDATE_DATE --last-value "2009-12-31 12:14:28"?
All the records that were updated after (>) "2009-12-31 12:14:28", or those that were updated on or after (>=) "2009-12-31 12:14:28"?

The Sqoop cookbook says that all the records where LAST_UPDATE_DATE > 2009-12-31 12:14:28 will be imported. But when I use this, I also get records that were updated at exactly 2009-12-31 12:14:28.
That means it is using >= 2009-12-31 12:14:28 instead of > 2009-12-31 12:14:28.

Please help me understand this.

Thanks,
Vikash Talanki
+1 (408)838-4078
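
The difference between the two predicates can be checked on a toy table. The sketch below is only an illustration in Python with an in-memory SQLite database, not Sqoop itself; the table contents and the upper boundary are made up, and the `>=` / `<` shape is taken from the BoundingValsQuery log line quoted elsewhere in this thread. Behaviour may differ between Sqoop versions.

```python
import sqlite3

# Toy table standing in for the source table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lastmod (id INTEGER PRIMARY KEY, LAST_UPDATE_DATE TEXT)")
conn.executemany(
    "INSERT INTO lastmod VALUES (?, ?)",
    [
        (1, "2009-12-31 12:14:27"),
        (2, "2009-12-31 12:14:28"),  # updated exactly at --last-value
        (3, "2009-12-31 12:14:29"),
        (4, "2010-01-01 00:00:00"),  # updated exactly at the upper boundary
    ],
)

LAST_VALUE = "2009-12-31 12:14:28"
UPPER = "2010-01-01 00:00:00"  # hypothetical "current time" of the import


def ids(sql):
    """Run a two-parameter query and return the matching ids."""
    return [row[0] for row in conn.execute(sql, (LAST_VALUE, UPPER))]


# Predicate shape seen in the logged bounding query: inclusive lower bound.
inclusive = ids(
    "SELECT id FROM lastmod "
    "WHERE LAST_UPDATE_DATE >= ? AND LAST_UPDATE_DATE < ? ORDER BY id"
)

# Predicate described in the book: exclusive lower bound.
exclusive = ids(
    "SELECT id FROM lastmod "
    "WHERE LAST_UPDATE_DATE > ? AND LAST_UPDATE_DATE < ? ORDER BY id"
)

print("inclusive lower bound:", inclusive)
print("exclusive lower bound:", exclusive)
```

With the inclusive form, the row stamped exactly at --last-value is selected again, which matches the behaviour reported above.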

From: Gwen Shapira [mailto:gshapira@cloudera.com]
Sent: Friday, August 08, 2014 3:56 PM
To: user@sqoop.apache.org
Subject: Re: Sqoop --incremental lastmodified --last-value

I think it's the same bug as this one:
https://issues.apache.org/jira/browse/SQOOP-1420


Re: Sqoop --incremental lastmodified --last-value

Posted by Gwen Shapira <gs...@cloudera.com>.
I think it's the same bug as this one:
https://issues.apache.org/jira/browse/SQOOP-1420


On Fri, Aug 8, 2014 at 3:41 PM, Abraham Elmahrek <ab...@cloudera.com> wrote:

> Huh, that's really weird. What is the result of the date command in linux
> on that machine?

RE: Sqoop --incremental lastmodified --last-value

Posted by "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vt...@cisco.com>.
It's just a DATE column.

[Screen capture]


Thanks,
Vikash Talanki
+1 (408)838-4078

From: Gwen Shapira [mailto:gshapira@cloudera.com]
Sent: Friday, August 08, 2014 4:36 PM
To: user@sqoop.apache.org
Subject: Re: Sqoop --incremental lastmodified --last-value

How is your timestamp column in Oracle defined? "timestamp with timezone"? If so, which timezone?


Re: Sqoop --incremental lastmodified --last-value

Posted by Gwen Shapira <gs...@cloudera.com>.
How is your timestamp column in Oracle defined? "timestamp with
timezone"? If so, which timezone?


On Fri, Aug 8, 2014 at 4:00 PM, Vikash Talanki -X (vtalanki - INFOSYS
LIMITED at Cisco) <vt...@cisco.com> wrote:

> It's in the PDT time zone.
>
>
>
> [image: Screen capture]
>
>
>
>
>
> Thanks,
>
> Vikash Talanki
>
> +1 (408)838-4078

RE: Sqoop --incremental lastmodified --last-value

Posted by "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vt...@cisco.com>.
It's in the PDT time zone.

[Screen capture]


Thanks,
Vikash Talanki
+1 (408)838-4078

From: Abraham Elmahrek [mailto:abe@cloudera.com]
Sent: Friday, August 08, 2014 3:42 PM
To: user@sqoop.apache.org
Subject: Re: Sqoop --incremental lastmodified --last-value

Huh, that's really weird. What is the result of the date command in linux on that machine?


Re: Sqoop --incremental lastmodified --last-value

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Huh, that's really weird. What is the result of the date command in linux
on that machine?

On Fri, Aug 8, 2014 at 3:16 PM, Vikash Talanki -X (vtalanki - INFOSYS
LIMITED at Cisco) <vt...@cisco.com> wrote:

>  Thanks a ton Abraham. Understood the point.
>
>
>
> But in my case it's showing the current time (upper boundary) in the GMT
> time zone
>
>
>
> Even when I use *-D oracle.sessionTimeZone=America/Los_Angeles* in my
> sqoop command, it still takes the upper boundary in GMT
>
>
>
> Is there a way to handle this time zone issue, so that I will not lose
> records updated in that 7-hour difference (GMT – Los Angeles TZ = 7 hours)
> when this job is executed next time?
>
>
>
> Also, please let me know if using max(last_update_date) from the current
> run as *--last-value* for the subsequent run gives me the same result. I
> think it should.
>
>
>
> Thanks,
>
> Vikash Talanki
>
> +1 (408)838-4078

RE: Sqoop --incremental lastmodified --last-value

Posted by "Vikash Talanki -X (vtalanki - INFOSYS LIMITED at Cisco)" <vt...@cisco.com>.
Thanks a ton, Abraham. Understood the point.

But in my case it's showing the current time (upper boundary) in the GMT time zone
[Screen capture]

Even when I use -D oracle.sessionTimeZone=America/Los_Angeles in my sqoop command, it still takes the upper boundary in GMT
[Screen capture]

Is there a way to handle this time zone issue, so that I will not lose records updated in that 7-hour difference (GMT – Los Angeles TZ = 7 hours) when this job is executed next time?

Also, please let me know if using max(last_update_date) from the current run as --last-value for the subsequent run gives me the same result. I think it should.

Thanks,
Vikash Talanki
+1 (408)838-4078
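
The concern about the 7-hour gap can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the column is taken to hold plain PDT wall-clock values, PDT is modelled as a fixed UTC-7 offset, the rows and run times are invented, and `as_naive_gmt` and `select` are hypothetical helpers rather than anything Sqoop provides. It compares reusing the reported GMT value against reusing the largest LAST_UPDATE_DATE seen, which is the idea proposed above and is not confirmed anywhere in this thread.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset stands in for PDT in August 2014.
PDT = timezone(timedelta(hours=-7), "PDT")


def as_naive_gmt(local: datetime) -> datetime:
    """Wall-clock GMT value for a PDT instant, with the zone information dropped."""
    return local.astimezone(timezone.utc).replace(tzinfo=None)


def select(rows: dict, lower: datetime, upper: datetime) -> list:
    """Ids whose timestamp falls in the half-open window [lower, upper)."""
    return sorted(i for i, ts in rows.items() if lower <= ts < upper)


# Run 1 starts at 14:45:08 PDT, but the boundary is reported in GMT.
run1_upper = as_naive_gmt(datetime(2014, 8, 8, 14, 45, 8, tzinfo=PDT))
skew = run1_upper - datetime(2014, 8, 8, 14, 45, 8)

# Hypothetical rows; LAST_UPDATE_DATE holds naive PDT wall-clock values.
rows = {
    1: datetime(2014, 8, 8, 14, 0, 0),   # existed before run 1
    2: datetime(2014, 8, 8, 15, 30, 0),  # updated after run 1 finished
}

# Run 2 happens one day later, again with a GMT upper boundary.
run2_upper = as_naive_gmt(datetime(2014, 8, 9, 14, 45, 8, tzinfo=PDT))

# Option A: run 2 reuses the GMT value reported by run 1 as its lower bound.
with_reported_value = select(rows, run1_upper, run2_upper)

# Option B: run 2 uses the largest LAST_UPDATE_DATE that run 1 imported.
max_seen = rows[1]
with_max_value = select(rows, max_seen, run2_upper)

print("skew:", skew)
print("reported value as lower bound:", with_reported_value)
print("max(LAST_UPDATE_DATE) as lower bound:", with_max_value)
```

Under these assumptions, option A skips the row updated at 15:30 PDT, while option B picks it up but also re-imports row 1, because the lower bound is inclusive.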

From: Abraham Elmahrek [mailto:abe@cloudera.com]
Sent: Friday, August 08, 2014 2:48 PM
To: user@sqoop.apache.org
Subject: Re: Sqoop --incremental lastmodified --last-value

Hey there,

Sqoop returns the current timestamp. If you look closely at the bounding query, it uses whatever is supplied in "--last-value" as the lower boundary and the current system time as the upper boundary:
sqoop import --connect "jdbc:mysql:///test" --table lastmod --incremental lastmodified --check-column created --last-value "2014-08-08 11:19:41.0" ...
14/08/08 14:45:10 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `lastmod` WHERE ( `created` >= '2014-08-08 11:19:41.0' AND `created` < '2014-08-08 14:45:08.0' )
In the above example, 2014-08-08 11:19:41.0 is the lower boundary or the supplied "--last-value". Also, 2014-08-08 14:45:08.0 was the current system time at the time of running the script.
-Abe


Re: Sqoop --incremental lastmodified --last-value

Posted by Abraham Elmahrek <ab...@cloudera.com>.
Hey there,

Sqoop returns the current timestamp. If you look closely at the bounding
query, it uses whatever is supplied in "--last-value" as the lower boundary
and the current system time as the upper boundary:

sqoop import --connect "jdbc:mysql:///test" --table lastmod --incremental lastmodified --check-column created --last-value "2014-08-08 11:19:41.0" ...

14/08/08 14:45:10 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `lastmod` WHERE ( `created` >= '2014-08-08 11:19:41.0' AND `created` < '2014-08-08 14:45:08.0' )

In the above example, 2014-08-08 11:19:41.0 is the lower boundary or the
supplied "--last-value". Also, 2014-08-08 14:45:08.0 was the current system
time at the time of running the script.
-Abe
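
The window described above can be sketched in a few lines of Python. This is only an illustration and not Sqoop's code: `bounding_where` and `next_last_value` are hypothetical helper names, and the predicate shape is copied from the BoundingValsQuery log line in this message.

```python
def bounding_where(check_column: str, last_value: str, current_time: str) -> str:
    """Build a WHERE clause shaped like the one in the logged BoundingValsQuery.

    The lower boundary (the supplied --last-value) is inclusive and the
    upper boundary (the current system time) is exclusive.
    """
    return (
        f"( `{check_column}` >= '{last_value}' "
        f"AND `{check_column}` < '{current_time}' )"
    )


def next_last_value(current_time: str) -> str:
    """The value reported after the import is this run's upper boundary."""
    return current_time


where = bounding_where("created", "2014-08-08 11:19:41.0", "2014-08-08 14:45:08.0")
print(where)
print("next --last-value:", next_last_value("2014-08-08 14:45:08.0"))
```

Feeding the reported value back in as the next --last-value makes consecutive windows meet without a gap, as long as both timestamps are expressed in the same time zone.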


On Fri, Aug 8, 2014 at 2:18 PM, Vikash Talanki -X (vtalanki - INFOSYS
LIMITED at Cisco) <vt...@cisco.com> wrote:

>  Hi All,
>
>
>
> I am using *--incremental lastmodified* in sqoop to get updated data and
> everything seems to be good except the *--last-value* that sqoop prints
> in output after successful import.
>
> I need more insight into what value Sqoop prints in its output
> for --last-value and how it arrives at that value.
>
>
>
> Sqoop output after successful import:
>
> [image: Screen capture]
>
>
>
> Maximum value in that column
>
> [image: Screen capture]
>
>
>
> I initially thought it prints the maximum of *--check-column* value
> (max(LAST_UPDATE_DATE) column in my case) but that doesn’t happen.
>
>
>
> Please help me understand this.