Posted to user@sqoop.apache.org by burberry blues <bl...@gmail.com> on 2013/11/08 09:11:43 UTC

Extracting Updated records using Sqoop

Hi Harsh,

I am trying to use Sqoop to import modified records, in addition to the
incremental updates, from an Oracle database into a Hive table.

However, I am getting duplicate entries when I import based on a
particular --last-value.

Below is my Sqoop command:

sqoop import --connect jdbc:oracle:thin:xxx:xxx:xxx --username xxx \
  --password xxx --hive-import --table xxx --target-dir xxx --hive-table xxx \
  --incremental append --check-column COLUMN_3 --split-by COLUMN_2 \
  --columns COLUMN_1,COLUMN_2,COLUMN_3 --last-value "2013-11-05 00:00:00"


My output is as follows:

Column_1      Column_2  Column_3
new change1   1.0       2013-11-07 11:05:55.0
change3       3.0       2013-11-07 11:19:25.0
change1       1.0       2013-11-05 11:15:50.0
new change1   2.0       2013-11-07 11:18:55.0
NULL          4.0       2013-11-07 12:13:00.0
change2       2.0       2013-11-05 11:15:55.0

The highlighted record is being inserted again instead of updating the
existing record.
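
To make the desired result concrete, here is a minimal Python sketch of
"update instead of insert" applied to the rows above. It is not Sqoop code
and does not describe how Sqoop works internally. It only shows the table
contents one would expect if each key kept its most recent version. It
assumes COLUMN_2 is the unique key of the table, which the post does not
state, and the helper names parse_ts and latest_per_key are hypothetical.

```python
from datetime import datetime

# Rows as they appear in the Hive table after the import (copied from the post).
# Layout: (Column_1, Column_2, Column_3)
rows = [
    ("new change1", 1.0, "2013-11-07 11:05:55.0"),
    ("change3", 3.0, "2013-11-07 11:19:25.0"),
    ("change1", 1.0, "2013-11-05 11:15:50.0"),
    ("new change1", 2.0, "2013-11-07 11:18:55.0"),
    (None, 4.0, "2013-11-07 12:13:00.0"),
    ("change2", 2.0, "2013-11-05 11:15:55.0"),
]


def parse_ts(text):
    """Parse a timestamp in the format shown in the post."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def latest_per_key(rows):
    """Keep one row per Column_2 value: the one with the newest Column_3.

    ASSUMPTION: Column_2 uniquely identifies a record.
    """
    latest = {}
    for row in rows:
        key = row[1]
        if key not in latest or parse_ts(row[2]) > parse_ts(latest[key][2]):
            latest[key] = row
    # Sort by key so the result is easy to compare with the source table.
    return sorted(latest.values(), key=lambda r: r[1])


deduped = latest_per_key(rows)
for row in deduped:
    print(row)
```

Under that assumption, the two rows dated 2013-11-05 (keys 1.0 and 2.0) are
superseded by their 2013-11-07 versions, leaving one row per key.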


Is there a Sqoop option or command that updates the existing records
instead of appending them?



Thanks,

Burberry

Fwd: Extracting Updated records using Sqoop

Posted by burberry blues <bl...@gmail.com>.
---------- Forwarded message ----------
From: burberry blues <bl...@gmail.com>
Date: Fri, Nov 8, 2013 at 12:11 AM
Subject: Extracting Updated records using Sqoop
To: user@sqoop.apache.org

