Posted to common-user@hadoop.apache.org by David Parks <da...@yahoo.com> on 2013/03/11 09:23:45 UTC

Unexpected Hadoop behavior: map task re-running after reducer has been running

I can't explain this behavior; can someone help me here:

  Kind  % Complete Num Tasks Pending Running Complete Killed Failed/Killed Task Attempts
    map      100.00%    23547       0       1    23546      0       247 / 0
   reduce     62.40%    10000    3738      30     6232      0       336 / 0

This job has been running for 48 hours and the reducer is quite a ways through its processing. But we've hit a snag (I'm not sure what snag exactly).

The map tasks were 100% complete, none running, but now I see 1 map task running. In a few minutes that map task will finish, I'll see "Running map tasks" change to 0, the number of failed map tasks will increment by 1, and the map task will run again a short time thereafter. This seems perpetual: we had 1 failed map task at the end of the map phase, when the reducer started running; now we have 247 failed map tasks, and this is the pattern I've been watching for hours.

Anyone want to venture some guesses here?

Thanks,
David

Re: Unexpected Hadoop behavior: map task re-running after reducer has been running

Posted by Ravi Prakash <ra...@ymail.com>.
This is not unexpected behavior. If there are fetch failures on the Reduce (i.e., it's not able to get the map outputs) then a map may be rerun.

Re: Unexpected Hadoop behavior: map task re-running after reducer has been running

Posted by Bejoy Ks <be...@gmail.com>.
Hi David

The issue with the maps getting re-triggered is that one of the nodes
where the map outputs are stored is getting lost during the reduce phase.
As a result, the map outputs are no longer available from that node for
the reduce to process, and the map is triggered again.

Can you try re-triggering the job after increasing the value of
tasktracker.http.threads?

Regards
Bejoy KS
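
For context: tasktracker.http.threads controls how many worker threads each TaskTracker's embedded HTTP server (the Jetty instance visible in the stack trace above) uses to serve map output to reducers; the Hadoop 1.x default is 40. It is a daemon-side setting, so it belongs in mapred-site.xml on every TaskTracker node and the TaskTrackers need a restart for it to take effect; it cannot be changed per job. A minimal sketch, with an illustrative value rather than a recommendation:

<?xml version="1.0"?>
<!-- mapred-site.xml on each TaskTracker node; restart the TaskTrackers after changing. -->
<configuration>
  <property>
    <name>tasktracker.http.threads</name>
    <!-- Default is 40. More threads let the shuffle server handle more
         concurrent reducer fetches, at the cost of some extra memory
         per TaskTracker. The value below is illustrative only. -->
    <value>80</value>
  </property>
</configuration>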

Re: Unexpected Hadoop behavior: map task re-running after reducer has been running

Posted by David Parks <da...@yahoo.com>.
I should have included the obvious log files here.


Task attempt_201303080219_0002_r_005846_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_005857_0 failed to report status for 7203 seconds. Killing!
Task attempt_201303080219_0002_r_005858_0 failed to report status for 7200 seconds. Killing!
Map output lost, rescheduling: getMapOutput(attempt_201303080219_0002_m_006433_0,6169) failed :
java.io.IOException: Error Reading IndexFile
        at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:113)
        at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:66)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3998)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
        at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:842)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.IOException: Premature EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:155)
        at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:73)
        at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:54)
        at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:109)
        ... 23 more

Task attempt_201303080219_0002_r_005908_0 failed to report status for 7200 seconds. Killing!
Task attempt_201303080219_0002_r_005962_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_005971_0 failed to report status for 7202 seconds. Killing!
Task attempt_201303080219_0002_r_005980_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_005983_0 failed to report status for 7202 seconds. Killing!
Task attempt_201303080219_0002_r_005988_0 failed to report status for 7200 seconds. Killing!
attempt_201303080219_0002_r_005988_0: [Fatal Error] :1:1: Premature end of file.
Task attempt_201303080219_0002_r_005986_0 failed to report status for 7202 seconds. Killing!
Task attempt_201303080219_0002_r_005992_0 failed to report status for 7202 seconds. Killing!
attempt_201303080219_0002_r_005992_0: [Fatal Error] :1:1: Premature end of file.
Task attempt_201303080219_0002_r_005993_0 failed to report status for 7200 seconds. Killing!
Task attempt_201303080219_0002_r_006004_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_005999_0 failed to report status for 7201 seconds. Killing!
attempt_201303080219_0002_r_005999_0: [Fatal Error] :1:1: Premature end of file.
Too many fetch-failures
Task attempt_201303080219_0002_r_006005_0 failed to report status for 7202 seconds. Killing!
attempt_201303080219_0002_r_006005_0: [Fatal Error] :1:1: Premature end of file.
Task attempt_201303080219_0002_r_006007_0 failed to report status for 7201 seconds. Killing!
attempt_201303080219_0002_r_006007_0: [Fatal Error] :1:1: Premature end of file.
Task attempt_201303080219_0002_r_006010_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_006013_0 failed to report status for 7201 seconds. Killing!
Too many fetch-failures
Too many fetch-failures
Task attempt_201303080219_0002_r_006026_0 failed to report status for 7201 seconds. Killing!
attempt_201303080219_0002_r_006026_0: [Fatal Error] :1:1: Premature end of file.
Too many fetch-failures
Too many fetch-failures
Too many fetch-failures
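
Two separate things are visible in these logs. The "Map output lost, rescheduling: getMapOutput(...) failed" entry, together with the "Too many fetch-failures" lines, is the mechanism Ravi describes: reducers could not fetch that map's output, so the completed map attempt is declared lost and the map is scheduled to run again. The "failed to report status for 7200 seconds. Killing!" lines are a different problem: reduce attempts that run that long without reporting progress are killed by the task timeout, controlled by mapred.task.timeout (the Hadoop 1.x default is 600000 ms, i.e. 10 minutes; the 7200-second figure suggests it has already been raised on this cluster). A reducer that legitimately works for a long time between outputs should call Reporter.progress() (old API) or context.progress() (new API) periodically rather than relying on a large timeout, but for reference, a sketch of the timeout setting with an illustrative value:

<?xml version="1.0"?>
<!-- mapred-site.xml, or set in the job configuration before submission. -->
<configuration>
  <property>
    <name>mapred.task.timeout</name>
    <!-- Milliseconds a task may go without reporting status before the
         TaskTracker kills it. Default is 600000 (10 minutes); 7200000
         matches the 7200 seconds seen in the log above. -->
    <value>7200000</value>
  </property>
</configuration>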



