Posted to mapreduce-dev@hadoop.apache.org by "Shane Kumpf (JIRA)" <ji...@apache.org> on 2014/02/05 15:44:10 UTC
[jira] [Created] (MAPREDUCE-5740) Shuffle error when the MiniMRYARNCluster work path contains special characters

Shane Kumpf created MAPREDUCE-5740:
--------------------------------------

             Summary: Shuffle error when the MiniMRYARNCluster work path contains special characters
                 Key: MAPREDUCE-5740
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5740
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.2.0
            Reporter: Shane Kumpf
            Priority: Minor


Tests that use MiniMRYARNCluster fail during the Jenkins build, although the same tests pass on local workstations.

The exception found is as follows: 
{quote}
2014-01-30 10:59:28,649 ERROR [ShuffleHandler.java:510] Shuffle error :
java.io.IOException: Error Reading IndexFile
	at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:123)
	at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:68)
	at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:592)
	at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:503)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
	at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
	at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
	at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
	at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
	at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
	at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
	at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /home/sitebuild/jenkins/workspace/%7Binventory-engineering%7D-snapshot-workflow-%7BS7274%7D/target/Integration-Tests/Integration-Tests-localDir-nm-0_2/usercache/sitebuild/appcache/application_1391108343099_0001/output/attempt_1391108343099_0001_m_000000_0/file.out.index
	at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:210)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
	at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:156)
	at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
	at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
	at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
	... 32 more
{quote}


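It was found that org.apache.hadoop.mapred.SpillRecord calls toUri() on the indexFileName Path object (line 71) and then uses the raw path of the resulting URI. Jenkins uses {} in the workspace name to denote team and branch. These {} characters are URL encoded as %7B and %7D, so the file name passed down no longer matches the directory on disk, which causes the FileNotFoundException during the shuffle phase.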

Interestingly, the code is as follows, and it seems a little strange to call Path.toUri() so high up in the call chain:

{code}
public SpillRecord(Path indexFileName, JobConf job, Checksum crc,
                   String expectedIndexOwner) throws IOException {
    final FileSystem rfs = FileSystem.getLocal(job).getRaw();
    final FSDataInputStream in =
        SecureIOUtils.openFSDataInputStream(
            new File(indexFileName.toUri().getRawPath()),
            expectedIndexOwner, null);
    ....
}
{code}

SecureIOUtils then creates a Path from the File object again (!):

{code}
public static FSDataInputStream openFSDataInputStream(File file,
      String expectedOwner, String expectedGroup) throws IOException {
    if (!UserGroupInformation.isSecurityEnabled()) {
      return rawFilesystem.open(new Path(file.getAbsolutePath()));
    }
    return forceSecureOpenFSDataInputStream(file, expectedOwner, expectedGroup);
  }
{code}
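
The lookup failure can be reproduced with the JDK alone. The following is only a minimal sketch, not Hadoop code: it assumes that java.io.File.toURI() quotes {} the same way Path.toUri() does, and the class name, method names and directory name are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public final class EncodedLookupDemo {

    // Creates a temporary directory tree whose name contains braces
    // (hypothetical name "{team}-ws") and an empty index file inside it.
    public static File createIndexFile() {
        try {
            File base = Files.createTempDirectory("shuffle-demo").toFile();
            File dir = new File(base, "{team}-ws");
            File index = new File(dir, "file.out.index");
            if (!dir.mkdir() || !index.createNewFile()) {
                throw new IOException("could not create " + index);
            }
            // Clean up when the JVM exits (deleted in reverse order).
            base.deleteOnExit();
            dir.deleteOnExit();
            index.deleteOnExit();
            return index;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the pattern in SpillRecord: build a File from the raw URI path,
    // in which '{' and '}' stay percent-encoded.
    public static File viaRawPath(File f) {
        return new File(f.toURI().getRawPath());
    }

    // Alternative: build the File from the decoded URI path.
    public static File viaDecodedPath(File f) {
        return new File(f.toURI().getPath());
    }

    public static void main(String[] args) {
        File index = createIndexFile();
        System.out.println("raw:     " + viaRawPath(index) + " exists=" + viaRawPath(index).exists());
        System.out.println("decoded: " + viaDecodedPath(index) + " exists=" + viaDecodedPath(index).exists());
    }
}
```

On a typical local file system the raw variant points at a directory literally named with %7B and %7D, which does not exist, while the decoded variant finds the file.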

The rawFilesystem.open(Path) call in SecureIOUtils goes through the abstract class FileSystem, which delegates to the concrete child class at runtime. That child class could be any of:
	•	ChRootedFileSystem
	•	ChecksumFileSystem
	•	DistributedFileSystem
	•	FTPFileSystem
	•	WebHdfsFileSystem
	•	and others

URL escaping makes sense for WebHdfsFileSystem and some others, but not for all of them. It would make more sense to URL escape only within the FileSystem implementations that require it.
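
As a rough illustration of that idea, the sketch below lets each back end decide whether to escape. Everything in it is hypothetical (PathTranslator, LOCAL and HTTP are not Hadoop types); it only uses java.net.URI to show a local back end passing the name through unchanged while an HTTP-style back end percent-encodes it.

```java
import java.net.URI;
import java.net.URISyntaxException;

public final class EscapePerFileSystemDemo {

    // Hypothetical abstraction for this sketch; not a Hadoop interface.
    public interface PathTranslator {
        String toNativeName(String path);
    }

    // A local file system wants the name exactly as it is on disk.
    public static final PathTranslator LOCAL = path -> path;

    // An HTTP-based file system needs a percent-encoded path in its URLs.
    public static final PathTranslator HTTP = path -> {
        try {
            // The multi-argument URI constructor quotes illegal characters.
            return new URI(null, null, path, null, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    };

    public static void main(String[] args) {
        // Hypothetical path, shortened from the one in the stack trace.
        String path = "/workspace/{team}-snapshot-{S1}/file.out.index";
        System.out.println("local: " + LOCAL.toNativeName(path));
        System.out.println("http:  " + HTTP.toNativeName(path));
    }
}
```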

Also of note: MiniMRYarnCluster allows most of the directories it uses to be changed via org.apache.hadoop.yarn.conf.YarnConfiguration, but testWorkDir is not one of them. testWorkDir is hardcoded as follows in org.apache.hadoop.yarn.server.MiniYARNCluster:

{code}
public MiniYARNCluster(String testName, int noOfNodeManagers,
                         int numLocalDirs, int numLogDirs) {
    super(testName.replace("$", ""));
    this.numLocalDirs = numLocalDirs;
    this.numLogDirs = numLogDirs;
    this.testWorkDir = new File("target",
        testName.replace("$", ""));
....
}
{code}

If modifications to SpillRecord are undesirable, allowing testWorkDir to be configurable might be a good workaround.
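
A minimal sketch of what such a workaround might look like is given below. It is not a patch against MiniYARNCluster: the class name, the resolve helper and the property name test.minicluster.work.dir are all hypothetical, and the fallback to "target" simply mirrors the hardcoded value shown above.

```java
import java.io.File;

public final class WorkDirResolver {

    // Hypothetical property name; not an existing Hadoop or YARN key.
    public static final String WORK_DIR_PROPERTY = "test.minicluster.work.dir";

    // Returns the work directory for a test: the configured base directory
    // if one is given, otherwise the current hardcoded default "target".
    public static File resolve(String testName, String configuredBase) {
        String base = (configuredBase == null || configuredBase.isEmpty())
            ? "target"
            : configuredBase;
        // Same '$' stripping as the existing constructor.
        return new File(base, testName.replace("$", ""));
    }

    public static void main(String[] args) {
        File dir = resolve("Integration$Tests", System.getProperty(WORK_DIR_PROPERTY));
        System.out.println(dir.getPath());
    }
}
```

With something along these lines, a Jenkins job could point the work directory at a location whose path contains no {} characters, for example by passing -Dtest.minicluster.work.dir on the command line.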



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)