You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "zhaohong (JIRA)" <ji...@apache.org> on 2012/12/22 14:07:13 UTC

[jira] [Updated] (SQOOP-801) Missing data when Importing data split by the column of the date type from oracle

     [ https://issues.apache.org/jira/browse/SQOOP-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaohong updated SQOOP-801:
---------------------------

    Priority: Major  (was: Critical)
    
> Missing data  when Importing data split by  the column of the date type from oracle  
> -------------------------------------------------------------------------------------
>
>                 Key: SQOOP-801
>                 URL: https://issues.apache.org/jira/browse/SQOOP-801
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.2
>         Environment: source dababase: oracle
> os: centos 5.4
> hadoop version: 0.20.2-cdh3u2
>            Reporter: zhaohong
>              Labels: patch
>
> The command is below:
> ============================
> sqoop import --connect 'jdbc:oracle:thin:@???.???.???.???:1521:staging1' --username v_stage --password 'v_stage' 
> --table shoppingcart 
> --columns 'trace_id,checkout_status ,entered_price,promotion_id ,permanent_id ,from_platform,points,shop_id ,creation_date,productid ,category_path,data_date ,cust_id ,cart_id ,last_changed_date ,order_from_ip,counts,reference_url,cart_item_id ,special_sale' 
> --where "last_changed_date >= TO_DATE('2012-12-11','yyyy-mm-dd') and last_changed_date < TO_DATE('2012-12-12','yyyy-mm-dd')" 
> --fields-terminated-by '' 
> --lines-terminated-by '
> ' --split-by last_changed_date 
> --target-dir /share/comm/customer/zhao/ 
> --hive-delims-replacement ' ' 
> -m 20
> ============================
> The DataBase is Oracle and the type of last_changed_date is date. 
> There are 600010 records between 2012-12-11 and 2012-12-12  but I can only pull about 12 records using this command.
> the BoundingValsQuery is select min(last_change_date), max(last_change_date)
> from shoppingcart where last_chage_date >=TO_DATA('2012-12-11','yyyy-mm-dd') and last_chage_date<TO_DATA('2012-12-12','yyyy-mm-dd')
> I debuged the source code and found that the min and max val are both equal to 2012-12-11 and The query condition look like : 
> where last_chage_date>=TO_TIMESTAMP(‘2012-12-11’,‘YYYY-MM-DD’) and last_chage_date<=TO_TIMESTAMP(‘2012-12-11’,‘YYYY-MM-DD’) that's why I can only get 12 records whose he last_chage_date is equal to 2012-12-11 00:00:00 exactly.
> I do some modified and the result is right.
>  diff --git a/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java b/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> index 31e9351..812c6bb 100644
> --- a/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> +++ b/src/java/org/apache/sqoop/mapreduce/db/DateSplitter.java
> @@ -140,7 +144,7 @@ public class DateSplitter extends IntegerSplitter {
>      try {
>        switch (sqlDataType) {
>        case Types.DATE:
> -        return rs.getDate(colNum).getTime();
> +       return rs.getTimestamp(colNum).getTime();
>        case Types.TIME:
>          return rs.getTime(colNum).getTime();
>        case Types.TIMESTAMP:
> @@ -160,7 +164,7 @@ public class DateSplitter extends IntegerSplitter {
>    private Date longToDate(long val, int sqlDataType) {
>      switch (sqlDataType) {
>      case Types.DATE:
> -      return new java.sql.Date(val);
> +      return new java.sql.Timestamp(val);
>      case Types.TIME:
>        return new java.sql.Time(val);
>      case Types.TIMESTAMP:
>  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira