Posted to common-user@hadoop.apache.org by David Parks <da...@yahoo.com> on 2013/03/11 09:23:45 UTC
Unexpected Hadoop behavior: map task re-running after reducer has been running
I can't explain this behavior, can someone help me here:
Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      23547       0         1         23546      0        247 / 0
reduce   62.40%       10000       3738      30        6232       0        336 / 0
This job has been running for 48 hours and the reducer is quite a ways
through its processing. But we've hit a snag (I'm not sure what snag
exactly).
The map tasks were 100% complete, with none running, but now I see 1 map task running. Within a few minutes that map task will finish, "Running map tasks" will drop back to 0, the number of failed map tasks will increment by 1, and the map task will run again a short time later. This seems perpetual: we had 1 failed map task at the end of the map phase, when the reducer started running; now we have 247 failed map tasks, and this is the pattern I've been watching for hours.
Anyone want to venture some guesses here?
Thanks,
David
Re: Unexpected Hadoop behavior: map task re-running after reducer has been running
Posted by Ravi Prakash <ra...@ymail.com>.
This is not unexpected behavior. If there are fetch failures on the reduce side (i.e., it's not able to get the map outputs), then a map may be re-run.
________________________________
From: David Parks <da...@yahoo.com>
To: "user@hadoop.apache.org" <us...@hadoop.apache.org>
Sent: Monday, March 11, 2013 3:30 AM
Subject: Re: Unexpected Hadoop behavior: map task re-running after reducer has been running
I should have included the obvious log files here.
Task attempt_201303080219_0002_r_005846_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_005857_0 failed to report status for 7203 seconds. Killing!
Task attempt_201303080219_0002_r_005858_0 failed to report status for 7200 seconds. Killing!
Map output lost, rescheduling: getMapOutput(attempt_201303080219_0002_m_006433_0,6169) failed :
java.io.IOException: Error Reading IndexFile
at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:113)
at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:66)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3998)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:842)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:155)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:73)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:54)
at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:109)
... 23 more
Task attempt_201303080219_0002_r_005908_0 failed to report status for 7200 seconds. Killing!
Task attempt_201303080219_0002_r_005962_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_005971_0 failed to report status for 7202 seconds. Killing!
Task attempt_201303080219_0002_r_005980_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_005983_0 failed to report status for 7202 seconds. Killing!
Task attempt_201303080219_0002_r_005988_0 failed to report status for 7200 seconds. Killing!
attempt_201303080219_0002_r_005988_0: [Fatal Error] :1:1: Premature end of file.
Task attempt_201303080219_0002_r_005986_0 failed to report status for 7202 seconds. Killing!
Task attempt_201303080219_0002_r_005992_0 failed to report status for 7202 seconds. Killing!
attempt_201303080219_0002_r_005992_0: [Fatal Error] :1:1: Premature end of file.
Task attempt_201303080219_0002_r_005993_0 failed to report status for 7200 seconds. Killing!
Task attempt_201303080219_0002_r_006004_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_005999_0 failed to report status for 7201 seconds. Killing!
attempt_201303080219_0002_r_005999_0: [Fatal Error] :1:1: Premature end of file.
Too many fetch-failures
Task attempt_201303080219_0002_r_006005_0 failed to report status for 7202 seconds. Killing!
attempt_201303080219_0002_r_006005_0: [Fatal Error] :1:1: Premature end of file.
Task attempt_201303080219_0002_r_006007_0 failed to report status for 7201 seconds. Killing!
attempt_201303080219_0002_r_006007_0: [Fatal Error] :1:1: Premature end of file.
Task attempt_201303080219_0002_r_006010_0 failed to report status for 7201 seconds. Killing!
Task attempt_201303080219_0002_r_006013_0 failed to report status for 7201 seconds. Killing!
Too many fetch-failures
Too many fetch-failures
Task attempt_201303080219_0002_r_006026_0 failed to report status for 7201 seconds. Killing!
attempt_201303080219_0002_r_006026_0: [Fatal Error] :1:1: Premature end of file.
Too many fetch-failures
Too many fetch-failures
Too many fetch-failures
Re: Unexpected Hadoop behavior: map task re-running after reducer has been running
Posted by Bejoy Ks <be...@gmail.com>.
Hi David
The maps are getting re-triggered because one of the nodes storing map outputs is being lost during the reduce phase. As a result, the map outputs are no longer available from that node for the reduce to fetch, and the map is re-run.
Can you try re-running the job after increasing the value of
tasktracker.http.threads
Regards
Bejoy KS
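A minimal sketch of what that change might look like, assuming classic MRv1 where this property lives in mapred-site.xml on each TaskTracker node; the value 80 below is an illustrative bump over the historical default of 40, not a figure recommended anywhere in this thread:

```xml
<!-- mapred-site.xml (MRv1) — illustrative value only -->
<property>
  <!-- Number of worker threads in the TaskTracker's embedded HTTP server
       that serves map outputs to reducers; too few threads under heavy
       shuffle load can contribute to fetch failures. Default was 40. -->
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>
```

The TaskTracker would need to be restarted for the new value to take effect.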
I can’t explain this behavior, can someone help me here:
Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map      100.00%      23547       0         1         23546      0        247 / 0
reduce   62.40%       10000       3738      30        6232       0        336 / 0
This job has been running for 48 hours and the reducer is quite a ways through its processing, but we've hit a snag (I'm not sure what snag exactly).
The map tasks were 100% complete with none running, but now I see 1 map task running. In a few minutes that map task will finish, "Running map tasks" will drop back to 0, the number of failed map tasks will increment by 1, and the map task will run again shortly thereafter. This appears perpetual: we had 1 failed map task at the end of the map phase when the reducers started running, we now have 247 failed map tasks, and this is the pattern I've been watching for hours.
Anyone want to venture some guesses here?
Thanks,
David
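The timeouts above ("failed to report status for 7200 seconds") and the repeated "Too many fetch-failures" messages point at the shuffle: when reducers cannot fetch a completed map's output, the JobTracker marks that map attempt as failed and reschedules it. A minimal mapred-site.xml sketch of the knobs involved follows; the property names are from Hadoop 1.x MR1 (matching the TaskTracker stack trace in the logs), and the values shown are illustrative assumptions, not recommendations:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- How long a task may go without reporting progress before it is
       killed. 7200000 ms = 2 h matches the "7200 seconds" in the logs,
       suggesting this job raised it from the 600000 ms default. -->
  <property>
    <name>mapred.task.timeout</name>
    <value>7200000</value>
  </property>

  <!-- Jetty threads serving map output to reducers; too few threads on
       a busy TaskTracker is one common source of fetch failures. -->
  <property>
    <name>tasktracker.http.threads</name>
    <value>40</value>
  </property>

  <!-- Parallel fetches each reducer runs during the shuffle. -->
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>5</value>
  </property>
</configuration>
```

The "Error Reading IndexFile ... Premature EOF from inputStream" trace shows the TaskTracker failing to serve a map's spill index, so the reducers' fetches fail, the map is re-run, and if the underlying disk or file problem on that node persists the cycle repeats, which would match the perpetual rerun described above.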