You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "zhaohong (JIRA)" <ji...@apache.org> on 2012/12/22 14:07:13 UTC
[jira] [Updated] (SQOOP-801) Missing data when Importing data
split by the column of the date type from oracle
[ https://issues.apache.org/jira/browse/SQOOP-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhaohong updated SQOOP-801:
---------------------------
Priority: Major (was: Critical)
> Missing data when Importing data split by the column of the date type from oracle
> -------------------------------------------------------------------------------------
>
> Key: SQOOP-801
> URL: https://issues.apache.org/jira/browse/SQOOP-801
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.2
> Environment: source dababase: oracle
> os: centos 5.4
> hadoop version: 0.20.2-cdh3u2
> Reporter: zhaohong
> Labels: patch
>
> The command is below:
> ============================
> sqoop import --connect 'jdbc:oracle:thin:@???.???.???.???:1521:staging1' --username v_stage --password 'v_stage'
> --table shoppingcart
> --columns 'trace_id,checkout_status ,entered_price,promotion_id ,permanent_id ,from_platform,points,shop_id ,creation_date,productid ,category_path,data_date ,cust_id ,cart_id ,last_changed_date ,order_from_ip,counts,reference_url,cart_item_id ,special_sale'
> --where "last_changed_date >= TO_DATE('2012-12-11','yyyy-mm-dd') and last_changed_date < TO_DATE('2012-12-12','yyyy-mm-dd')"
> --fields-terminated-by ''
> --lines-terminated-by '
> ' --split-by last_changed_date
> --target-dir /share/comm/customer/zhao/
> --hive-delims-replacement ' '
> -m 20
> ============================
> The DataBase is Oracle and the type of last_changed_date is date.
> There are 600010 records between 2012-12-11 and 2012-12-12 but I can only pull about 12 records using this command.
> the BoundingValsQuery is select min(last_change_date), max(last_change_date)
> from shoppingcart where last_chage_date >=TO_DATA('2012-12-11','yyyy-mm-dd') and last_chage_date<TO_DATA('2012-12-12','yyyy-mm-dd')
> I debuged the source code and found that the min and max val are both equal to 2012-12-11 and The query condition look like :
> where last_chage_date>=TO_TIMESTAMP(‘2012-12-11’,‘YYYY-MM-DD’) and last_chage_date<=TO_TIMESTAMP(‘2012-12-11’,‘YYYY-MM-DD’) that's why I can only get 12 records whose he last_chage_date is equal to 2012-12-11 00:00:00 exactly.
> I do some modified and the result is right.
> diff --git a/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java b/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> index 31e9351..812c6bb 100644
> --- a/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> +++ b/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> @@ -140,7 +144,7 @@ public class DateSplitter extends IntegerSplitter {
> try {
> switch (sqlDataType) {
> case Types.DATE:
> - return rs.getDate(colNum).getTime();
> + return rs.getTimestamp(colNum).getTime();
> case Types.TIME:
> return rs.getTime(colNum).getTime();
> case Types.TIMESTAMP:
> @@ -160,7 +164,7 @@ public class DateSplitter extends IntegerSplitter {
> private Date longToDate(long val, int sqlDataType) {
> switch (sqlDataType) {
> case Types.DATE:
> - return new java.sql.Date(val);
> + return new java.sql.Timestamp(val);
> case Types.TIME:
> return new java.sql.Time(val);
> case Types.TIMESTAMP:
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira