You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Nigel Daley (JIRA)" <ji...@apache.org> on 2007/03/09 03:17:24 UTC
[jira] Created: (HADOOP-1098) output blocks lost when speculative
execution is used
output blocks lost when speculative execution is used
-----------------------------------------------------
Key: HADOOP-1098
URL: https://issues.apache.org/jira/browse/HADOOP-1098
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.12.0
Reporter: Nigel Daley
Priority: Critical
Fix For: 0.13.0
The Sort benchmark completes successfully for me on the latest trunk (0.12.1 candidate) with speculation turned on. Validation of the Sort benchmark output, however, is failing. I see one sort output file (part-00375) that is way smaller than all the rest. In fact, it is exactly 1 block long.
dfs ls output:
...
/user/hadoopqa/sortBenchmark100/output/part-00373 <r 3> 2971688212
/user/hadoopqa/sortBenchmark100/output/part-00374 <r 3> 2973451660
/user/hadoopqa/sortBenchmark100/output/part-00375 <r 3> 134217728
/user/hadoopqa/sortBenchmark100/output/part-00376 <r 3> 2972933208
/user/hadoopqa/sortBenchmark100/output/part-00377 <r 3> 2972309956
...
During the Sort Benchmark, I see 9 AlreadyBeingCreatedExceptions in the NameNode log for this file (and more of these exceptions for other files too). I also include here the 1 PendingReplicationMonitor WARN message from the NameNode log in case it's relevant:
...
2007-03-08 21:56:31,747 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-849195508701590166
...
2007-03-08 22:04:35,471 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
2007-03-08 22:04:35,476 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:701)
at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
...
During sort validation, I get this exception in the JobTracker log:
2007-03-08 22:51:32,017 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_001379_0: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1525)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1436)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1482)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:72)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
I also saw this in the DataNode log during sort validation, but it could be unrelated:
2007-03-09 01:04:41,323 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Unexpected error trying to delete block blk_-5047673597270588432. Block not found in blockMap.
at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:596)
at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:429)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1053)
at java.lang.Thread.run(Thread.java:619)
Since the default for 0.12.1 will be for speculative execution to be turned off, I am assigning this to 0.13.0
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HADOOP-1098) output blocks lost when speculative
execution is used
Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley resolved HADOOP-1098.
-----------------------------------
Resolution: Duplicate
This was fixed by HADOOP-1127.
> output blocks lost when speculative execution is used
> -----------------------------------------------------
>
> Key: HADOOP-1098
> URL: https://issues.apache.org/jira/browse/HADOOP-1098
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.0
> Reporter: Nigel Daley
> Priority: Critical
> Fix For: 0.13.0
>
>
> The Sort benchmark completes successfully for me on the latest trunk (0.12.1 candidate) with speculation turned on. Validation of the Sort benchmark output, however, is failing. I see one sort output file (part-00375) that is way smaller than all the rest. In fact, it is exactly 1 block long.
> dfs ls output:
> ...
> /user/hadoopqa/sortBenchmark100/output/part-00373 <r 3> 2971688212
> /user/hadoopqa/sortBenchmark100/output/part-00374 <r 3> 2973451660
> /user/hadoopqa/sortBenchmark100/output/part-00375 <r 3> 134217728
> /user/hadoopqa/sortBenchmark100/output/part-00376 <r 3> 2972933208
> /user/hadoopqa/sortBenchmark100/output/part-00377 <r 3> 2972309956
> ...
> During the Sort Benchmark, I see 9 AlreadyBeingCreatedExceptions in the NameNode log for this file (and more of these exceptions for other files too). I also include here the 1 PendingReplicationMonitor WARN message from the NameNode log in case it's relevant:
> ...
> 2007-03-08 21:56:31,747 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-849195508701590166
> ...
> 2007-03-08 22:04:35,471 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> 2007-03-08 22:04:35,476 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:701)
> at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
> at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> ...
> During sort validation, I get this exception in the JobTracker log:
> 2007-03-08 22:51:32,017 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_001379_0: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1525)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1436)
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1482)
> at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:72)
> at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
> I also saw this in the DataNode log during sort validation, but it could be unrelated:
> 2007-03-09 01:04:41,323 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Unexpected error trying to delete block blk_-5047673597270588432. Block not found in blockMap.
> at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:596)
> at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:429)
> at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1053)
> at java.lang.Thread.run(Thread.java:619)
> Since the default for 0.12.1 will be for speculative execution to be turned off, I am assigning this to 0.13.0
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.