
Posted to common-dev@hadoop.apache.org by "Julien Serdaru (JIRA)" <ji...@apache.org> on 2013/05/01 03:23:13 UTC

[jira] [Created] (HADOOP-9530) DBInputSplit creates one invalid range on Oracle

Julien Serdaru created HADOOP-9530:
--------------------------------------

             Summary: DBInputSplit creates one invalid range on Oracle
                 Key: HADOOP-9530
                 URL: https://issues.apache.org/jira/browse/HADOOP-9530
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 1.1.2
            Reporter: Julien Serdaru


When used against Oracle, DBInputFormat does not create a valid range for every split.

In DBInputFormat.getSplits(), line 263 reads:

          split = new DBInputSplit(i * chunkSize, (i * chunkSize)
              + chunkSize);

The first split (i = 0) therefore has a start value of 0 (0 * chunkSize).
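To make the split arithmetic concrete, here is a minimal standalone sketch of the loop around line 263. SplitRanges and Range are hypothetical names standing in for DBInputFormat and DBInputSplit. The sketch also assumes that the last chunk absorbs the remainder, so the ranges cover every row. It is an illustration, not the Hadoop source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone sketch of the split arithmetic quoted above. */
public class SplitRanges {
    // Hypothetical stand-in for DBInputSplit: the row range [start, end).
    static final class Range {
        final long start;
        final long end;
        Range(long start, long end) { this.start = start; this.end = end; }
        long length() { return end - start; }
    }

    // Assumption: the last chunk runs to `count` so no rows are dropped.
    static List<Range> splits(long count, int chunks) {
        long chunkSize = count / chunks;
        List<Range> result = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;              // 0 for the first split
            long end = (i + 1 == chunks) ? count : start + chunkSize;
            result.add(new Range(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 rows over 3 map tasks gives [0,3) [3,6) [6,10).
        for (Range r : splits(10, 3)) {
            System.out.println(r.start + " -> " + r.end);
        }
    }
}
```

Whatever the row count, the first range always starts at 0.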

However, line 84 of OracleDBRecordReader reads:

      if (split.getLength() > 0 && split.getStart() > 0){

Because the first range starts at 0, this condition is false for the first split, so the block that restricts the query to that split's range is skipped. As a result, one of the map tasks processes the entire data set rather than its own partition.
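The effect of the line-84 condition can be illustrated with a small sketch. GuardCheck, restricts and rowsRead are hypothetical names. The sketch makes an idealized assumption: a restricted query returns exactly the split's `length` rows, and an unrestricted query returns the whole table. It does not reproduce the actual Oracle query text.

```java
/** Hypothetical sketch: which splits are restricted under the line-84 condition. */
public class GuardCheck {
    // Condition as quoted from OracleDBRecordReader, line 84.
    static boolean restricts(long start, long length) {
        return length > 0 && start > 0;
    }

    // Idealized assumption: restricted -> `length` rows, otherwise the whole table.
    static long rowsRead(long start, long length, long totalRows) {
        return restricts(start, length) ? length : totalRows;
    }

    public static void main(String[] args) {
        long totalRows = 10;
        // {start, length} for 10 rows split across 3 map tasks.
        long[][] splits = { {0, 3}, {3, 3}, {6, 4} };
        long sum = 0;
        for (long[] s : splits) {
            long n = rowsRead(s[0], s[1], totalRows);
            sum += n;
            System.out.println("start=" + s[0] + " rows=" + n);
        }
        // The first task reads all 10 rows, so 17 rows are read instead of 10.
        System.out.println("total rows read: " + sum);
    }
}
```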

I assume the fix is trivial: remove the second check (split.getStart() > 0) from the if condition.
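The proposed change keeps only the length check. The sketch below applies that predicate to the same example splits. FixedGuard and restricts are hypothetical names, and the sketch keeps the same idealized assumption that a restricted query returns exactly the split's rows. This is a sketch of the suggested condition, not a committed patch.

```java
/** Hypothetical sketch of the proposed condition: only the length check remains. */
public class FixedGuard {
    // `start` is kept for symmetry with the original condition but is no longer consulted.
    static boolean restricts(long start, long length) {
        return length > 0;
    }

    public static void main(String[] args) {
        long totalRows = 10;
        long[][] splits = { {0, 3}, {3, 3}, {6, 4} }; // {start, length}
        long sum = 0;
        for (long[] s : splits) {
            // Idealized assumption: a restricted query returns exactly `length` rows.
            sum += restricts(s[0], s[1]) ? s[1] : totalRows;
        }
        // Every split, including the one starting at 0, is restricted: 3 + 3 + 4 = 10.
        System.out.println("total rows read: " + sum);
    }
}
```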

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira