You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Nigel Daley (JIRA)" <ji...@apache.org> on 2007/03/09 03:17:24 UTC

[jira] Created: (HADOOP-1098) output blocks lost when speculative execution is used

output blocks lost when speculative execution is used
-----------------------------------------------------

                 Key: HADOOP-1098
                 URL: https://issues.apache.org/jira/browse/HADOOP-1098
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.12.0
            Reporter: Nigel Daley
            Priority: Critical
             Fix For: 0.13.0


The Sort benchmark completes successfully for me on the latest trunk (0.12.1 candidate) with speculation turned on.  Validation of the Sort benchmark output, however, is failing.  I see one sort output file (part-00375) that is way smaller than all the rest.  In fact, it is exactly 1 block long.

dfs ls output:
...
/user/hadoopqa/sortBenchmark100/output/part-00373       <r 3>   2971688212
/user/hadoopqa/sortBenchmark100/output/part-00374       <r 3>   2973451660
/user/hadoopqa/sortBenchmark100/output/part-00375       <r 3>   134217728
/user/hadoopqa/sortBenchmark100/output/part-00376       <r 3>   2972933208
/user/hadoopqa/sortBenchmark100/output/part-00377       <r 3>   2972309956
...


During the Sort Benchmark, I see 9 AlreadyBeingCreatedExceptions in the NameNode log for this file (and more of these exceptions for other files too).  I also include here the 1 PendingReplicationMonitor WARN message from the NameNode log in case it's relevant:
...
2007-03-08 21:56:31,747 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-849195508701590166
...
2007-03-08 22:04:35,471 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
2007-03-08 22:04:35,476 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:701)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
...


During sort validation, I get this exception in the JobTracker log:
2007-03-08 22:51:32,017 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_001379_0: java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1525)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1436)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1482)
        at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:72)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)


I also saw this in the DataNode log during sort validation, but it could be unrelated:
2007-03-09 01:04:41,323 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Unexpected error trying to delete block blk_-5047673597270588432. Block not found in blockMap.
	at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:596)
	at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:429)
	at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1053)
	at java.lang.Thread.run(Thread.java:619)

Since the default for 0.12.1 will be for speculative execution to be turned off, I am assigning this to 0.13.0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HADOOP-1098) output blocks lost when speculative execution is used

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-1098.
-----------------------------------

    Resolution: Duplicate

This was fixed by HADOOP-1127.

> output blocks lost when speculative execution is used
> -----------------------------------------------------
>
>                 Key: HADOOP-1098
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1098
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Nigel Daley
>            Priority: Critical
>             Fix For: 0.13.0
>
>
> The Sort benchmark completes successfully for me on the latest trunk (0.12.1 candidate) with speculation turned on.  Validation of the Sort benchmark output, however, is failing.  I see one sort output file (part-00375) that is way smaller than all the rest.  In fact, it is exactly 1 block long.
> dfs ls output:
> ...
> /user/hadoopqa/sortBenchmark100/output/part-00373       <r 3>   2971688212
> /user/hadoopqa/sortBenchmark100/output/part-00374       <r 3>   2973451660
> /user/hadoopqa/sortBenchmark100/output/part-00375       <r 3>   134217728
> /user/hadoopqa/sortBenchmark100/output/part-00376       <r 3>   2972933208
> /user/hadoopqa/sortBenchmark100/output/part-00377       <r 3>   2972309956
> ...
> During the Sort Benchmark, I see 9 AlreadyBeingCreatedExceptions in the NameNode log for this file (and more of these exceptions for other files too).  I also include here the 1 PendingReplicationMonitor WARN message from the NameNode log in case it's relevant:
> ...
> 2007-03-08 21:56:31,747 WARN org.apache.hadoop.fs.FSNamesystem: PendingReplicationMonitor timed out block blk_-849195508701590166
> ...
> 2007-03-08 22:04:35,471 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> 2007-03-08 22:04:35,476 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8020 call error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /user/hadoopqa/sortBenchmark100/output/part-00375 for DFSClient_task_0002_r_000375_1 on client 72.30.38.16 because pendingCreates is non-null.
>         at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:701)
>         at org.apache.hadoop.dfs.NameNode.create(NameNode.java:283)
>         at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:336)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:559)
> ...
> During sort validation, I get this exception in the JobTracker log:
> 2007-03-08 22:51:32,017 INFO org.apache.hadoop.mapred.TaskInProgress: Error from task_0003_m_001379_0: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:57)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:91)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1525)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1436)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1482)
>         at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:72)
>         at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
> I also saw this in the DataNode log during sort validation, but it could be unrelated:
> 2007-03-09 01:04:41,323 WARN org.apache.hadoop.dfs.DataNode: java.io.IOException: Unexpected error trying to delete block blk_-5047673597270588432. Block not found in blockMap.
> 	at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:596)
> 	at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:429)
> 	at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1053)
> 	at java.lang.Thread.run(Thread.java:619)
> Since the default for 0.12.1 will be for speculative execution to be turned off, I am assigning this to 0.13.0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.