Posted to mapreduce-dev@hadoop.apache.org by Prashant Sharma <pr...@gmail.com> on 2011/10/17 10:33:51 UTC

Problem while submitting jobs to NM started with ephemeral ports.

I am using the following properties in yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.address</name>
  <value>localhost:0</value>
</property>
<property>
  <name>yarn.nodemanager.localizer.address</name>
  <value>localhost:0</value>
</property>

All daemons start without errors, but when I submit a job it gets
stuck, and the NM logs show it trying to connect to 'localhost:0'.
Localization takes forever. Why?
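
For reference, here is a minimal Java sketch of what port 0 means at the
socket level. It is illustrative only: the class and method names are
hypothetical, and this is not NodeManager code. It assumes nothing about
how the NM publishes its address; it only shows that a server bound to
port 0 ends up listening on a different, OS-assigned port, so a client
that is handed the configured value rather than the bound one has no
usable port to connect to.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class EphemeralPortDemo {

    /**
     * Binds a server socket to host:configuredPort and returns the port
     * the OS actually assigned. With configuredPort == 0 the OS picks a
     * free ephemeral port.
     */
    static int bindAndGetActualPort(String host, int configuredPort) throws IOException {
        try (ServerSocket server =
                 new ServerSocket(configuredPort, 50, InetAddress.getByName(host))) {
            // getLocalPort() reports the real listening port, never 0 once bound.
            return server.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int configured = 0; // same value as in "localhost:0"
        int actual = bindAndGetActualPort("localhost", configured);
        System.out.println("configured port: " + configured);
        System.out.println("actual port:     " + actual);
    }
}
```

A server started this way has to advertise the value returned by
getLocalPort(), not the configured 0, for clients to reach it. Whether
that is what goes wrong in the NM here is only a guess on the basis of
the 'localhost:0' message.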

Please see the NM logs below.

http://pastebin.com/QfQDZeqF

Thanks,
Prashant

Re: Problem while submitting jobs to NM started with ephemeral ports.

Posted by Prashant Sharma <pr...@gmail.com>.
I also tried commenting out the last two properties in the yarn-site
configuration mentioned above and keeping the following property in mapred-site:

    <property>
      <name>mapreduce.shuffle.port</name>
      <value>0</value>
    </property>

I got this exception while running a wordcount.

mapreduce.Job (Job.java:printTaskEvents(1315)) - Task Id : attempt_1318840789401_0005_r_000000_0, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#5
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:126)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:365)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
	at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
	at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:227)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)


Otherwise, everything works out of the box.

Thanks,
Prashant.
