Posted to mapreduce-issues@hadoop.apache.org by "Sunil Govindan (JIRA)" <ji...@apache.org> on 2018/11/23 12:01:02 UTC

[jira] [Updated] (MAPREDUCE-6996) FileInputFormat#getBlockIndex should include file name in the exception.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated MAPREDUCE-6996:
--------------------------------------
    Target Version/s: 3.3.0  (was: 3.2.0)

Bulk update: moved all 3.2.0 non-blocker issues; please move back if this is a blocker.

> FileInputFormat#getBlockIndex should include file name in the exception.
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6996
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6996
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Priority: Minor
>              Labels: newbie++
>
> {code:title=FileInputFormat.java|borderStyle=solid}
>  protected int getBlockIndex(BlockLocation[] blkLocations, 
>                               long offset) {
> ...
> ...
>     BlockLocation last = blkLocations[blkLocations.length -1];
>     long fileLength = last.getOffset() + last.getLength() -1;
>     throw new IllegalArgumentException("Offset " + offset + 
>                                        " is outside of file (0.." +
>                                        fileLength + ")");
> }
> {code}
> When the file is open for writing, {{last.getLength()}} and {{last.getOffset()}} will both be zero, and we see the following exception stack trace (a sketch of the resulting arithmetic follows it).
> {noformat}
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:288)
> Caused by: java.lang.IllegalArgumentException: Offset 0 is outside of file (0..-1)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:453)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:413)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
> ... 18 more
> {noformat}
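>
> The arithmetic behind the {{(0..-1)}} range, as a minimal sketch with assumed values (no actual HDFS calls):
> {code:title=Sketch|borderStyle=solid}
> // Hypothetical values for a file still open for writing: the last
> // reported block has offset 0 and length 0.
> long lastOffset = 0L;   // what last.getOffset() returns here
> long lastLength = 0L;   // what last.getLength() returns here
> long fileLength = lastOffset + lastLength - 1;   // evaluates to -1
> // getBlockIndex then throws
> //   IllegalArgumentException("Offset 0 is outside of file (0..-1)")
> // without saying which file it was computing splits for.
> {code}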
> It's difficult to debug which file was open.
> So I am creating this ticket to include the file name in the exception.
> Since {{FileInputFormat#getBlockIndex}} is protected, we can't change that method's signature to add the file name as an argument.
> The only way I can think of to fix this is:
> {code:title=FileInputFormat.java|borderStyle=solid}
>  public InputSplit[] getSplits(JobConf job, int numSplits)
>     throws IOException {
> ...
> ...
>    for (FileStatus file: files) {
>       Path path = file.getPath();
>       long length = file.getLen();
>       if (length != 0) {
>         FileSystem fs = path.getFileSystem(job);
>         BlockLocation[] blkLocations;
>         if (file instanceof LocatedFileStatus) {
>           blkLocations = ((LocatedFileStatus) file).getBlockLocations();
>         } else {
>           blkLocations = fs.getFileBlockLocations(file, 0, length);
>         }
>         if (isSplitable(fs, path)) {
>           long blockSize = file.getBlockSize();
>           long splitSize = computeSplitSize(goalSize, minSize, blockSize);
>           long bytesRemaining = length;
>           while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
>             String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
>                 length-bytesRemaining, splitSize, clusterMap);
>             splits.add(makeSplit(path, length-bytesRemaining, splitSize,
>                 splitHosts[0], splitHosts[1]));
>             bytesRemaining -= splitSize;
>           }
>           if (bytesRemaining != 0) {
>             String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations, length
>                 - bytesRemaining, bytesRemaining, clusterMap);
>             splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
>                 splitHosts[0], splitHosts[1]));
>           }
>         } else {
>           String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,0,length,clusterMap);
>           splits.add(makeSplit(path, 0, length, splitHosts[0], splitHosts[1]));
>         }
>       } else { 
>         //Create empty hosts array for zero length files
>         splits.add(makeSplit(path, 0, length, new String[0]));
>       }
>     }
> {code}
> Wrap the above code chunk in a try-catch block, catch {{IllegalArgumentException}}, and check whether the message is {{Offset 0 is outside of file (0..-1)}}.
> If it is, add the file name and rethrow the {{IllegalArgumentException}}, as in the sketch below.
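>
> A minimal sketch of that approach (the wrapping is illustrative only; {{path}} comes from the loop above, and the matched message text is an assumption based on the current wording in {{getBlockIndex}}):
> {code:title=FileInputFormat.java (sketch)|borderStyle=solid}
>     for (FileStatus file: files) {
>       Path path = file.getPath();
>       try {
>         // ... existing block-location and split-computation code for this file ...
>       } catch (IllegalArgumentException iae) {
>         // Only decorate the specific "offset outside of file" message thrown
>         // by getBlockIndex; rethrow anything else untouched.
>         String msg = iae.getMessage();
>         if (msg != null && msg.startsWith("Offset") && msg.contains("is outside of file")) {
>           throw new IllegalArgumentException(msg + " for file " + path, iae);
>         }
>         throw iae;
>       }
>     }
> {code}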


