You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sqoop.apache.org by "Michael Hsu (JIRA)" <ji...@apache.org> on 2019/02/20 15:49:00 UTC

[jira] [Commented] (SQOOP-1946) DateSplitter relies on database string-to-date conversion when creating splits based on date columns

    [ https://issues.apache.org/jira/browse/SQOOP-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773124#comment-16773124 ] 

Michael Hsu commented on SQOOP-1946:
------------------------------------

It's been almost 1 month. IMHO, this is a important fix for the Oracle data source. I'm asking the board to review it.

> DateSplitter relies on database string-to-date conversion when creating splits based on date columns
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1946
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1946
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/generic
>    Affects Versions: 1.4.5
>            Reporter: Andre Araujo
>            Priority: Major
>         Attachments: SQOOP-1946.patch
>
>
> When running a sqoop import job with a split-by column of data type DATE, the DateSplitter relies on the database implicit string-to-date convertion when generating the lower and upper bound clauses for the splits. The splits are generated with clauses similar to the following:
> {code}
> date_col >= '2013-08-26 00:00:00.0'
> date_col <= '2013-08-26 00:00:00.0'
> {code}
> This forces the database to either implcitly convert the date_col to string or implicitly cast the literal string to a date type. In case the database default date format is not the expected, this could lead to either exceptions, as in the example below, or to unexpected behavior (string comparison of to dates in different string formats).
> For Oracle databases, for example, we may see the following errors when the database cannot implicitly convert the string above to date:
> {code}
> 2014-12-23 12:38:25,690 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'
> 2014-12-23 12:38:25,745 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: sales_date >= '2013-08-26 00:00:00.0' AND sales_date <= '2013-08-26 00:00:00.0'
> 2014-12-23 12:38:25,860 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: select * from schema.TABLE where ( date_col >= '2013-08-26 00:00:00.0' ) AND ( date_col <= '2013-08-26 00:00:00.0' )
> 2014-12-23 12:38:25,909 ERROR [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Top level exception:  
> java.sql.SQLDataException: ORA-01861: literal does not match format string 
> {code}
> A workaround for that is to set the default date format for the database user used by Sqoop to the "expected" format using a logon trigger in Oracle, like the one below:
> {code}
> CREATE OR REPLACE TRIGGER tr_a_l_set_date_format
>    AFTER LOGON ON DATABASE WHEN (USER = 'SQOOP_USER')
>    BEGIN
>      execute immediate 'alter session set nls_date_format="yyyy-mm-dd hh24:mi:ss"';
>    END;
> {code}
> A better form, though, would be to explicitly convert the date string literal to a DATE using a specific format, in the exact same way that OracleManager.datetimeToQueryString() does.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)