Posted to dev@sqoop.apache.org by "Oren Benjamin (JIRA)" <ji...@apache.org> on 2013/02/21 02:12:11 UTC

[jira] [Commented] (SQOOP-603) Support small intervals in IntegerSplitter implementation

    [ https://issues.apache.org/jira/browse/SQOOP-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582758#comment-13582758 ] 

Oren Benjamin commented on SQOOP-603:
-------------------------------------

Why not use half-open intervals for all the splits instead of special-casing this? Incrementing the max value and using only half-open intervals (lower bound inclusive, upper bound exclusive) would remove the existing special case for the closed last interval rather than add another one.
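
A minimal sketch of that idea, for illustration only. It is not the Sqoop IntegerSplitter source; the class name HalfOpenSplits and the method split are hypothetical, and it assumes the maximum is below Long.MAX_VALUE and the range is small enough that the boundary arithmetic does not overflow. Every range is treated the same way: lower bound inclusive, upper bound exclusive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the Sqoop IntegerSplitter source.
final class HalfOpenSplits {

    /**
     * Splits the inclusive range [min, max] into at most numSplits
     * half-open ranges {lo, hi}, each meaning lo <= x < hi.
     */
    static List<long[]> split(long min, long max, int numSplits) {
        if (numSplits < 1 || min > max) {
            throw new IllegalArgumentException("need numSplits >= 1 and min <= max");
        }
        // Exclusive upper bound. Assumes max < Long.MAX_VALUE and a range small
        // enough that count * i does not overflow; a real splitter must guard both.
        long end = max + 1;
        long count = end - min;
        List<long[]> splits = new ArrayList<>();
        long lo = min;
        for (int i = 1; i <= numSplits; i++) {
            long hi = min + count * i / numSplits;
            if (hi > lo) { // skip empty ranges when numSplits > count
                splits.add(new long[] {lo, hi});
                lo = hi;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        for (long[] s : split(0, 5, 6)) {
            System.out.println(s[0] + " <= x < " + s[1]);
        }
    }
}
```

For the interval [0, 5] with 6 splits, this sketch yields six single-value ranges, the last being 5 <= x < 6, with no special case for the final range. With 5 splits the six values cannot be divided evenly, so one range still holds two values.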
                
> Support small intervals in IntegerSplitter implementation
> ---------------------------------------------------------
>
>                 Key: SQOOP-603
>                 URL: https://issues.apache.org/jira/browse/SQOOP-603
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.4.2
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>             Fix For: 1.4.3
>
>         Attachments: SQOOP-603.patch
>
>
> IntegerSplitter currently creates splits of the following form:
> {code}
> minimal value <= x < splitPoint1
> splitPoint1 <= x < splitPoint2
> ...
> splitPointN <= x <= maximal value
> {code}
> Note that the upper bound always uses the condition "<", except for the last split, which uses "<=". This is perfectly fine when creating a reasonable number of splits over a very large interval.
> This approach, however, causes issues on very small intervals. For example, the following splits are created on the interval [0, 5] with 5 splits:
> * 0 <= x < 1
> * 1 <= x < 2 
> * 2 <= x < 3 
> * 3 <= x < 4 
> * 4 <= x <= 5
> Notice that every split contains the same number of values except the last one, which contains two: 4 and 5. This becomes a serious issue when, for example, the user needs one split per partition: one mapper ends up moving two partitions, so that mapper, and with it the entire job, takes about twice as long as the other mappers.
> Jarcec
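
The split pattern described in the issue can be reproduced with a small sketch. This is an illustration written for this thread, not the Sqoop source: the class ClosedLastSplitDemo and the method describeSplits are hypothetical, and the step computation (integer division, with a minimum step of 1) is an assumption chosen to match the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the Sqoop IntegerSplitter source.
final class ClosedLastSplitDemo {

    /** Returns the split conditions as text, with a closed ("<=") last split. */
    static List<String> describeSplits(long min, long max, int numSplits) {
        List<String> out = new ArrayList<>();
        if (min == max) {
            out.add(min + " <= x <= " + max);
            return out;
        }
        // Assumed step rule: integer division, never smaller than 1.
        long step = Math.max(1, (max - min) / numSplits);
        long lo = min;
        while (lo < max) {
            long hi = lo + step;
            boolean last = hi >= max || out.size() == numSplits - 1;
            if (last) {
                // The last split is closed so that the maximal value is included.
                out.add(lo + " <= x <= " + max);
                break;
            }
            out.add(lo + " <= x < " + hi);
            lo = hi;
        }
        return out;
    }

    public static void main(String[] args) {
        describeSplits(0, 5, 5).forEach(System.out::println);
    }
}
```

Run on the interval [0, 5] with 5 splits, the sketch prints the five conditions listed in the issue, ending with 4 <= x <= 5, the split that covers two values.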
