Posted to dev@sqoop.apache.org by "Benyi Wang (JIRA)" <ji...@apache.org> on 2014/11/12 06:21:33 UTC

[jira] [Created] (SQOOP-1714) DateSplitter makes wrong splits

Benyi Wang created SQOOP-1714:
---------------------------------

             Summary: DateSplitter makes wrong splits
                 Key: SQOOP-1714
                 URL: https://issues.apache.org/jira/browse/SQOOP-1714
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.4.4
         Environment: CentOS 6.4 CDH-5.1.0
            Reporter: Benyi Wang


If the split-by column is a Date type, Sqoop sends a query to read min(Date) and max(Date), and those two values are passed to DateSplitter. DateSplitter converts them to long values and splits the range using num-mappers. But this method is wrong. If min(Date) and max(Date) are 2013-09-26 and 2013-09-28, how many days do we have? 3 days. But java.sql.Date#getTime for 2013-09-28 returns the value for 2013-09-28 00:00:00, so maxVal - minVal covers only two days.

I encountered this issue when I tried to import a Teradata table. Given dates between 2013-09-26 and 2013-09-28 and num-mappers=3, there are 3 tasks with these conditions:
# date >= 2013-09-26 and date < 2013-09-26
# date >= 2013-09-26 and date < 2013-09-27
# date >= 2013-09-27 and date <= 2013-09-28
The first one selects nothing, and the last one covers two days.

Because the difference between minVal and maxVal is two days (24*2*3600*1000 ms), the split size is 2/3 of a day. When minVal plus one split size is converted back to a Date, it is still 2013-09-26, which is why the first partition is empty.
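
The arithmetic above can be reproduced with a small standalone sketch. This is not Sqoop's DateSplitter code: the class DateSplitSketch and its methods are hypothetical names, and it assumes dates are mapped to epoch milliseconds at midnight UTC through java.time (rather than java.sql.Date, whose values depend on the JVM time zone) and truncated back to a calendar date when the conditions are built.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical re-creation of the arithmetic, not Sqoop's DateSplitter. */
final class DateSplitSketch {
    static final long MILLIS_PER_DAY = 24L * 3600L * 1000L;

    // Midnight UTC of the given ISO date, in epoch milliseconds (assumption: UTC).
    static long toMillis(String isoDate) {
        return LocalDate.parse(isoDate).toEpochDay() * MILLIS_PER_DAY;
    }

    // Truncate epoch milliseconds back to a calendar date, dropping the time of day.
    static String toDate(long millis) {
        return LocalDate.ofEpochDay(Math.floorDiv(millis, MILLIS_PER_DAY)).toString();
    }

    // Split [min, max] into numSplits ranges by dividing the millisecond difference evenly.
    static List<String> splitConditions(String min, String max, int numSplits) {
        long minVal = toMillis(min);
        long maxVal = toMillis(max);
        long splitSize = Math.max(1L, (maxVal - minVal) / numSplits);
        List<String> conditions = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            boolean last = (i == numSplits - 1);
            String lo = toDate(minVal + i * splitSize);
            // The last range is closed at maxVal; the others are half-open.
            String hi = last ? toDate(maxVal) : toDate(minVal + (i + 1) * splitSize);
            conditions.add("date >= " + lo + " and date " + (last ? "<= " : "< ") + hi);
        }
        return conditions;
    }

    public static void main(String[] args) {
        for (String c : splitConditions("2013-09-26", "2013-09-28", 3)) {
            System.out.println(c);
        }
    }
}
```

Under these assumptions the sketch yields the same three conditions listed above: the first boundary lands 16 hours after minVal, which truncates back to 2013-09-26, so the first range is empty.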



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)