Posted to common-user@hadoop.apache.org by Chris Curtin <cu...@gmail.com> on 2011/12/05 21:45:15 UTC

MAX_FETCH_RETRIES_PER_MAP (TaskTracker dying?)

Hi,

Using: Version 0.20.2-cdh3u0, r81256ad0f2e4ab2bd34b04f53d25a6c23686dd14, on an 8-node cluster, 64-bit CentOS

We are occasionally seeing MAX_FETCH_RETRIES_PER_MAP errors on reducer
jobs. When we investigate, it looks like the TaskTracker on the node being
fetched from is not running. In the logs we see what looks like a
self-initiated shutdown:

2011-12-05 14:10:48,632 INFO org.apache.hadoop.mapred.JvmManager: JVM :
jvm_201112050908_0222_r_1100711673 exited with exit code 0. Number of tasks
it ran: 0
2011-12-05 14:10:48,632 ERROR org.apache.hadoop.mapred.JvmManager: Caught
Throwable in JVMRunner. Aborting TaskTracker.
java.lang.NullPointerException
        at org.apache.hadoop.mapred.DefaultTaskController.logShExecStatus(DefaultTaskController.java:145)
        at org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:129)
        at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:472)
        at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:446)
2011-12-05 14:10:48,634 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at had11.atlis1/10.120.41.118
************************************************************/

Then the reducers have the following:


2011-12-05 14:12:00,962 WARN org.apache.hadoop.mapred.ReduceTask:
java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
 at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
 at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 at java.net.Socket.connect(Socket.java:529)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
 at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
 at sun.net.www.http.HttpClient.New(HttpClient.java:306)
 at sun.net.www.http.HttpClient.New(HttpClient.java:323)
 at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
 at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
 at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1525)
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1482)
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1390)
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

2011-12-05 14:12:00,962 INFO org.apache.hadoop.mapred.ReduceTask: Task
attempt_201112050908_0169_r_000005_0: Failed fetch #2 from
attempt_201112050908_0169_m_000002_0
2011-12-05 14:12:00,962 INFO org.apache.hadoop.mapred.ReduceTask: Failed to
fetch map-output from attempt_201112050908_0169_m_000002_0 even after
MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to
the JobTracker
2011-12-05 14:12:00,962 FATAL org.apache.hadoop.mapred.ReduceTask: Shuffle
failed with too many fetch failures and insufficient progress!Killing task
attempt_201112050908_0169_r_000005_0.
2011-12-05 14:12:00,966 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_201112050908_0169_r_000005_0 adding host had11.atlis1 to penalty
box, next contact in 8 seconds
2011-12-05 14:12:00,966 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201112050908_0169_r_000005_0: Got 1 map-outputs from previous
failures

The job then fails.

Several questions:
1. What is causing the TaskTracker to fail/exit? This happens after running
hundreds to thousands of jobs, so it's not just at start-up.
2. Why isn't Hadoop detecting that the reducers need something from a dead
mapper and restarting the map task, even if it means aborting the reducers?
3. Why isn't the DataNode being used to fetch the blocks? It is still up
and running when this happens, so shouldn't it know where the files are in
HDFS?

Thanks,

Chris

Re: MAX_FETCH_RETRIES_PER_MAP (TaskTracker dying?)

Posted by Chris Curtin <cu...@gmail.com>.
Thanks, guys. I'll get with operations to do the upgrade.

Chris


Re: MAX_FETCH_RETRIES_PER_MAP (TaskTracker dying?)

Posted by Bejoy Ks <be...@gmail.com>.
Hi Chris

From the stack trace, it looks like a JVM corruption issue. It is a known
issue and has been fixed in CDH3u2; I believe an upgrade would solve your
issues.
https://issues.apache.org/jira/browse/MAPREDUCE-3184
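
For illustration only, here is a minimal standalone sketch (hypothetical
names, not the actual Hadoop source) of the failure mode your TaskTracker
log shows: any Throwable escaping the task-launch path is treated as fatal
to the whole daemon, so a single NullPointerException in logging code takes
the TaskTracker down.

    // Sketch of the abort-on-Throwable pattern; all names are hypothetical.
    public class JvmRunnerSketch {
        static void launchTask() {
            String shExecStatus = null;               // status never populated
            System.out.println(shExecStatus.trim());  // NullPointerException, as in the trace
        }

        public static void main(String[] args) {
            try {
                launchTask();
            } catch (Throwable t) {
                // One NPE during launch aborts the entire TaskTracker, which
                // is why the node then refuses shuffle connections.
                System.err.println("Caught Throwable in JVMRunner. Aborting TaskTracker.");
                t.printStackTrace();
                System.exit(1);
            }
        }
    }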

Regarding your queries, I'll try to help you out a bit. In MapReduce, the
data transfer between map and reduce happens over HTTP. If Jetty is down,
that transfer can't happen, which means map output on one node won't be
accessible to a reducer on another node. The map outputs live on the local
file system, not on HDFS, so even if the DataNode on the machine is up, we
can't get the data under these circumstances.
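
To make that concrete: in 0.20 each reducer pulls map output from the
TaskTracker's embedded Jetty over HTTP. A rough probe of that endpoint is
sketched below; the default port 50060 and the mapOutput servlet parameters
are my assumptions from 0.20-era code, so verify them against your build
(the job and attempt IDs are taken from your logs). With the TaskTracker
down, connect() fails with the same ConnectException your reducers report.

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Hypothetical probe of a TaskTracker shuffle endpoint; the port and
    // servlet parameters are 0.20-era assumptions, IDs are from the logs.
    public class ShuffleProbe {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://had11.atlis1:50060/mapOutput"
                    + "?job=job_201112050908_0169"
                    + "&map=attempt_201112050908_0169_m_000002_0"
                    + "&reduce=5");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.connect();  // throws ConnectException when Jetty is down
            System.out.println("Shuffle endpoint answered: HTTP " + conn.getResponseCode());
            conn.disconnect();
        }
    }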

Hope it helps!

Regards
Bejoy.K.S


Re: MAX_FETCH_RETRIES_PER_MAP (TaskTracker dying?)

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Chris,

I'd suggest updating to a newer version of your Hadoop distro; you're
hitting some bugs that were fixed last summer. In particular, you're
missing the "amendment" patch from MAPREDUCE-2373, as well as some patches
to MR that make the fetch retry behavior more aggressive.
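
Until the upgrade lands, one stopgap you could experiment with is widening
the shuffle copy backoff window, from which MAX_FETCH_RETRIES_PER_MAP is
derived in 0.20-era code, so reducers tolerate a longer TaskTracker outage.
A sketch only, assuming the 0.20-era property name
("mapred.reduce.copy.backoff", default 300 seconds); verify it exists in
your build before relying on it.

    import org.apache.hadoop.mapred.JobConf;

    // Stopgap sketch only: widen the copy backoff window so reducers wait
    // longer before declaring fetch failure. The property name is an
    // assumption from 0.20-era configs; confirm it against your distro.
    public class ShuffleBackoffTuning {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setInt("mapred.reduce.copy.backoff", 600);  // double the 300s default
            System.out.println("copy backoff = " + conf.get("mapred.reduce.copy.backoff"));
        }
    }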

-Todd


-- 
Todd Lipcon
Software Engineer, Cloudera