Posted to mapreduce-dev@hadoop.apache.org by Prashant Sharma <pr...@gmail.com> on 2011/10/17 10:33:51 UTC
Problem while submitting jobs to NM started with ephemeral ports.
I am using the following properties in yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>localhost:0</value>
</property>
<property>
<name>yarn.nodemanager.localizer.address</name>
<value>localhost:0</value>
</property>
Everything starts fine (all daemons come up without errors), but
when I submit a job it gets stuck, and the NM log shows it trying
to connect to 'localhost:0'. Localization takes forever. Why?
Please see the NM logs below.
http://pastebin.com/QfQDZeqF
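For reference, here is a minimal sketch of what port 0 means at the socket level. It uses plain java.net only and is not NodeManager code; the class and method names (EphemeralPortDemo, bindEphemeral, canConnect) are hypothetical and exist just for illustration. Binding to port 0 asks the OS to pick a free port, so the usable port has to be read back from the bound socket. A client given the literal configured value 'localhost:0' is not expected to reach the listener. Whether that is what happens inside the NM here is only a guess based on the log line.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EphemeralPortDemo {

    /** Binds a listener to loopback with port 0, letting the OS choose a free port. */
    public static ServerSocket bindEphemeral() throws IOException {
        ServerSocket server = new ServerSocket();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return server;
    }

    /** Returns true if a TCP connection to loopback:port succeeds within one second. */
    public static boolean canConnect(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
            return true;
        } catch (IOException e) {
            // Connection refused, address not available, timeout, etc.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = bindEphemeral()) {
            // The configured port was 0; the real port is only known after bind().
            int actual = server.getLocalPort();
            System.out.println("configured port: 0, bound port: " + actual);
            System.out.println("connect to bound port: " + canConnect(actual));
            System.out.println("connect to port 0: " + canConnect(0));
        }
    }
}
```

The point of the sketch is that anything advertising the listener's address to other components would need to use the value from getLocalPort() after binding, not the configured string.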
Thanks,
Prashant
Re: Problem while submitting jobs to NM started with ephemeral ports.
Posted by Prashant Sharma <pr...@gmail.com>.
I also tried commenting out the last two properties in yarn-site.xml
mentioned above and keeping the following property in mapred-site.xml:
<property>
<name> mapreduce.shuffle.port</name>
<value>0</value>
</property>
I got this exception while running a wordcount.
mapreduce.Job (Job.java:printTaskEvents(1315)) - Task Id :
attempt_1318840789401_0005_r_000000_0, Status : FAILED
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in
shuffle in fetcher#5
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:126)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:365)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:227)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)
And everything works out of the box otherwise.
Thanks,
Prashant.