Posted to common-dev@hadoop.apache.org by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org> on 2008/07/01 03:32:44 UTC

[jira] Updated: (HADOOP-3657) HDFS writes get stuck trying to recoverBlock

     [ https://issues.apache.org/jira/browse/HADOOP-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3657:
-------------------------------------------

         Priority: Blocker  (was: Major)
    Fix Version/s: 0.18.0

The "java.io.IOException: Connection reset by peer" error is very easy to reproduce, so I am promoting this issue to a 0.18 blocker.


> HDFS writes get stuck trying to recoverBlock
> --------------------------------------------
>
>                 Key: HADOOP-3657
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3657
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> A few reduces got stuck in a sort500 job with the following thread dump:
> {noformat}
> "main" prio=10 tid=0x0805b800 nid=0x1951 waiting for monitor entry [0xf7e6d000..0xf7e6e1f8]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2485)
>   - waiting to lock <0xe905e8f8> (a java.util.LinkedList)
>   - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
>   at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>   - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>   - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:58)
>   - locked <0xe905e928> (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
>   at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
>   at java.io.DataOutputStream.writeInt(DataOutputStream.java:181)
>   at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1014)
>   - locked <0xe90889e8> (a org.apache.hadoop.io.SequenceFile$Writer)
>   at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:70)
>   at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:298)
>   at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:39)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:316)
>   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2157)
> "DataStreamer for file /rw/out/_temporary/_attempt_200806261801_0006_r_000712_0/part-00712 block blk_-3923696991063961587_9628" daemon prio=10 tid=0x08413c00 nid=0x367a in Object.wait() [0xd00e4000..0xd00e4f20]
>    java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:485)
>   at org.apache.hadoop.ipc.Client.call(Client.java:701)
>   - locked <0xf167d540> (a org.apache.hadoop.ipc.Client$Call)
>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>   at org.apache.hadoop.dfs.$Proxy2.recoverBlock(Unknown Source)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2186)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1737)
>   at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1891)
>   - locked <0xe905e8f8> (a java.util.LinkedList)
> {noformat}
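
The thread dump above shows "main" blocked in writeChunk waiting for the java.util.LinkedList monitor <0xe905e8f8>, while the DataStreamer thread holds that same monitor and waits inside the recoverBlock RPC. The following is only a minimal sketch of that pattern (a monitor held across a blocking call), not the DFSClient code itself. The class name, the dataQueue field, and the CountDownLatch standing in for the RPC reply are all assumptions made for illustration.

```java
import java.util.LinkedList;
import java.util.concurrent.CountDownLatch;

public class LockHeldAcrossRpc {
    // Hypothetical stand-in for the LinkedList monitor seen in the thread dump.
    static final LinkedList<byte[]> dataQueue = new LinkedList<>();

    /** Returns the writer's thread state observed while the simulated RPC is pending. */
    static Thread.State observeWriterState() throws InterruptedException {
        CountDownLatch streamerHoldsLock = new CountDownLatch(1);
        // Stands in for the recoverBlock reply that never arrived in the report.
        CountDownLatch rpcReply = new CountDownLatch(1);

        Thread streamer = new Thread(() -> {
            synchronized (dataQueue) {
                streamerHoldsLock.countDown();
                try {
                    rpcReply.await(); // blocking call made while the queue lock is held
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "streamer");

        Thread writer = new Thread(() -> {
            synchronized (dataQueue) { // analogous to the monitor wait in writeChunk
                dataQueue.add(new byte[0]);
            }
        }, "writer");

        streamer.start();
        streamerHoldsLock.await(); // make sure the streamer owns the monitor first
        writer.start();

        // Poll (for at most 5 seconds) until the writer is parked on the monitor.
        Thread.State state = writer.getState();
        long deadline = System.currentTimeMillis() + 5000;
        while (state != Thread.State.BLOCKED && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
            state = writer.getState();
        }

        rpcReply.countDown(); // let the simulated RPC return so both threads can finish
        streamer.join();
        writer.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("writer state while RPC pending: " + observeWriterState());
    }
}
```

In this sketch the writer only makes progress once the simulated reply arrives. If the reply never came, the writer would stay BLOCKED indefinitely, which matches the state reported for "main" in the dump.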
