Posted to user@sqoop.apache.org by burberry blues <bl...@gmail.com> on 2013/11/08 09:11:43 UTC
Extracting Updated records using Sqoop
Hi Harsh,
I am trying to use Sqoop to extract modified records, in addition to the
incremental (newly added) rows, from an Oracle database into a Hive table.
However, I get duplicate entries when I import from a particular
--last-value.
Below is my Sqoop command:
sqoop import --connect jdbc:oracle:thin:xxx:xxx:xxx --username xxx \
  --password xxx --hive-import --table xxx --target-dir xxx --hive-table xxx \
  --incremental append --check-column COLUMN_3 --split-by COLUMN_2 \
  --columns COLUMN_1,COLUMN_2,COLUMN_3 --last-value "2013-11-05 00:00:00"
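For context, --incremental append imports only the rows whose --check-column value is greater than --last-value and adds them to the target; it does not look for existing rows to update. The Python sketch below is a simplified, hypothetical model of that rule (append_import and parse_ts are illustrative names, not Sqoop code). It models only the lower bound and uses the timestamp formats that appear in this post.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse timestamps in the two formats seen in this post,
    e.g. "2013-11-05 00:00:00" and "2013-11-07 11:05:55.0"."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {value!r}")


def append_import(target, source, check_col, last_value):
    """Hypothetical model of --incremental append.

    Selects source rows whose check column is strictly greater than
    last_value and appends them to target unchanged. Rows already in
    target are never updated: if an earlier run imported an older version
    of a row, the modified version is added alongside it instead of
    replacing it. Returns (new_target, new_last_value).
    """
    cutoff = parse_ts(last_value)
    selected = [row for row in source if parse_ts(row[check_col]) > cutoff]
    # The next run would start from the largest check-column value seen.
    new_last = max((row[check_col] for row in selected),
                   key=parse_ts, default=last_value)
    return target + selected, new_last
```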
My output is as follows:
Column_1      Column_2  Column_3
new change1   1.0       2013-11-07 11:05:55.0
change3       3.0       2013-11-07 11:19:25.0
change1       1.0       2013-11-05 11:15:50.0
new change1   2.0       2013-11-07 11:18:55.0
NULL          4.0       2013-11-07 12:13:00.0
change2       2.0       2013-11-05 11:15:55.0
The highlighted record is inserted again instead of updating the
existing record.
Is there a command or option for this?
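Updating instead of appending means treating some column as a unique key and letting the newer row win. The Sqoop user guide describes its merge tool (with --merge-key) along these lines: rows in the newer dataset are used in preference to rows in the older dataset that share the same key. The Python sketch below only models that rule. It assumes COLUMN_2 is the unique key, which the post does not state (COLUMN_2 is only the --split-by column), and merge_by_key is a hypothetical name, not a Sqoop API.

```python
def merge_by_key(older, newer, key_col):
    """Hypothetical model of "newer row wins" merging.

    Builds one row per key_col value: rows from `older` are loaded first,
    then each row from `newer` replaces any existing row with the same key.
    Keys keep the position of their first appearance.
    """
    merged = {}
    for row in older:
        merged[row[key_col]] = row
    for row in newer:
        merged[row[key_col]] = row  # replace, never duplicate
    return list(merged.values())
```

If the two 2013-11-05 rows above are treated as the older dataset and the four 2013-11-07 rows as the newer one, keying on COLUMN_2 leaves four rows, one per COLUMN_2 value, with the 2013-11-07 versions replacing change1 and change2.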
Thanks,
Burberry