Posted to mapreduce-issues@hadoop.apache.org by "divya (Commented) (JIRA)" <ji...@apache.org> on 2012/01/27 11:16:41 UTC

[jira] [Commented] (MAPREDUCE-5) Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194590#comment-13194590 ] 

divya commented on MAPREDUCE-5:
-------------------------------

We get the above exception on a cluster setup, but not on a single node. Because of it, the map/reduce job slows down considerably (from 1 min 35 sec to 4 min 24 sec).
The one difference in our case is the cause of the exception:

org.mortbay.jetty.EofException
     
Caused by: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.write0(Native Method)
       

Could this be caused by a connectivity issue among the nodes?
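
One way to rule out the simplest form of connectivity trouble is to check, from each node, whether a TCP connection to the other nodes' TaskTracker HTTP port can be opened at all. The following is only a minimal sketch under assumptions: the class name ShufflePortCheck and the method canConnect are hypothetical, and port 50060 is assumed to be the TaskTracker HTTP port (the 0.20.x default for mapred.task.tracker.http.address; a cluster may configure it differently). A successful connect does not rule out resets in the middle of a transfer, so this only narrows the question down.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hadoop.
public class ShufflePortCheck {

    // Returns true if a TCP connection to host:port can be opened within
    // timeoutMs milliseconds, false on refusal, timeout, or unknown host.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 50060 is the TaskTracker HTTP port on this cluster.
        int port = 50060;
        for (String host : args) {
            String status = canConnect(host, port, 3000) ? "reachable" : "NOT reachable";
            System.out.println(host + ":" + port + " -> " + status);
        }
    }
}
```

Run it on each node with the other nodes' hostnames as arguments, for example: java ShufflePortCheck node1 node2 (the hostnames here are placeholders).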
                
> Shuffle's getMapOutput() fails with EofException, followed by IllegalStateException
> -----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: Sun Java 1.6.0_13, OpenSolaris, running on a SunFire 4150 (x64) 10 node cluster
>            Reporter: George Porter
>
> During the shuffle phase, I'm seeing a large sequence of the following actions:
> 1) WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : org.mortbay.jetty.EofException
> 2) WARN org.mortbay.log: Committed before 410 getMapOutput(attempt_200905181452_0002_m_000010_0,0) failed : org.mortbay.jetty.EofException
> 3) ERROR org.mortbay.log: /mapOutput java.lang.IllegalStateException: Committed
> The map phase completes with 100%, and then the reduce phase crawls along with the above errors in each of the TaskTracker logs.  None of the tasktrackers get lost.  When I run non-data jobs like the 'pi' test from the example jar, everything works fine.
