Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2008/03/09 07:47:46 UTC

[jira] Updated: (HADOOP-2976) Blocks staying underreplicated (for unclosed file)

     [ https://issues.apache.org/jira/browse/HADOOP-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-2976:
-------------------------------------

    Attachment: leaseExpiryReplication.patch

When the namenode processes a lease expiry event, it checks whether every block of the file has reached its intended replication target. Blocks with fewer replicas than their target are inserted into neededReplication.
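
A minimal Java sketch of the check described above, assuming a simplified in-memory model. The names LeaseExpirySketch, Block, and onLeaseExpiry are hypothetical, and neededReplication is modeled as a plain set rather than the namenode's real data structure. The replica counts in main are assumed for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: class, field and method names are illustrative only,
// not the actual Hadoop namenode code.
public class LeaseExpirySketch {

    /** A block id with its target and currently reported replica counts. */
    static final class Block {
        final long id;
        final int targetReplicas;
        final int liveReplicas;

        Block(long id, int targetReplicas, int liveReplicas) {
            this.id = id;
            this.targetReplicas = targetReplicas;
            this.liveReplicas = liveReplicas;
        }
    }

    /** Stand-in for the namenode's neededReplication structure. */
    final Set<Long> neededReplication = new LinkedHashSet<>();

    /** Runs when the lease on a still-open file expires. */
    void onLeaseExpiry(List<Block> fileBlocks) {
        for (Block b : fileBlocks) {
            // Queue only blocks that are below their replication target.
            if (b.liveReplicas < b.targetReplicas) {
                neededReplication.add(b.id);
            }
        }
    }

    public static void main(String[] args) {
        LeaseExpirySketch namenode = new LeaseExpirySketch();
        // Block id from the log in this issue; the replica counts are assumed.
        Block logged = new Block(-7848645760735416126L, 3, 1);
        // A hypothetical block that already has all of its replicas.
        Block healthy = new Block(1001L, 3, 3);
        namenode.onLeaseExpiry(List.of(logged, healthy));
        System.out.println(namenode.neededReplication); // prints [-7848645760735416126]
    }
}
```

Only the under-replicated block is queued; the fully replicated one is left alone, so the lease-expiry path adds no work for healthy files.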

> Blocks staying underreplicated (for unclosed file)
> --------------------------------------------------
>
>                 Key: HADOOP-2976
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2976
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.3
>            Reporter: Koji Noguchi
>            Assignee: dhruba borthakur
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: leaseExpiryReplication.patch
>
>
> We had two files staying underreplicated for over a day.
> I checked that these under-replicated blocks were not corrupted.
> (They were both task tmp files and most likely didn't get closed.)
> Taking one file, /aaa/_task_200803040823_0001_r_000421_0/part-00421, the namenode log showed:
> namenode.log.2008-03-04 2008-03-04 16:19:21,478 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /aaa/_task_200803040823_0001_r_000421_0/part-00421.  blk_-7848645760735416126
> 2008-03-04 16:19:24,357 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 11.1.111.111:22222 is added to blk_-7848645760735416126
> The datanode 11.1.111.111 showed:
> 2008-03-04 16:19:24,358 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-7848645760735416126 from /55.55.55.55 and operation failed at /22.2.222.22
> The second datanode, 22.2.222.22, showed:
> 2008-03-04 16:19:21,578 INFO org.apache.hadoop.dfs.DataNode: Exception writing to mirror 33.3.33.33
> java.net.SocketException: Connection reset
>   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>   at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>   at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveChunk(DataNode.java:1333)
>   at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1386)
>   at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:938)
>   at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
>   at java.lang.Thread.run(Thread.java:619)
> 2008-03-04 16:19:24,358 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.net.SocketException: Broken pipe
>   at java.net.SocketOutputStream.socketWrite0(Native Method)
>   at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>   at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>   at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>   at java.io.DataOutputStream.flush(DataOutputStream.java:106)
>   at org.apache.hadoop.dfs.DataNode$BlockReceiver.receiveBlock(DataNode.java:1394)
>   at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:938)
>   at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804)
>   at java.lang.Thread.run(Thread.java:619)
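
A simplified, hypothetical Java model of the write failure shown in the logs above. The pipeline order (11.1.111.111, then 22.2.222.22, then its mirror 33.3.33.33) is inferred from the datanode logs, and the rule that only nodes upstream of the failing hop end up with a reported replica is a simplification consistent with the namenode log, which shows addStoredBlock only for 11.1.111.111. The target of 3 assumes one replica per pipeline node; PipelineSketch and replicasAfterWrite are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a block write pipeline; not Hadoop's DataNode code.
public class PipelineSketch {

    /**
     * Returns the datanodes assumed to hold a reported replica when the
     * pipeline breaks at failedIndex (-1 means no failure).
     */
    static List<String> replicasAfterWrite(List<String> pipeline, int failedIndex) {
        List<String> stored = new ArrayList<>();
        for (int i = 0; i < pipeline.size(); i++) {
            if (i == failedIndex) {
                break; // the failing node and everything downstream is lost
            }
            stored.add(pipeline.get(i));
        }
        return stored;
    }

    public static void main(String[] args) {
        List<String> pipeline = List.of("11.1.111.111", "22.2.222.22", "33.3.33.33");
        // 11.1.111.111 logged "operation failed at /22.2.222.22" (index 1).
        List<String> stored = replicasAfterWrite(pipeline, 1);
        int target = pipeline.size();
        System.out.println(stored + " -> " + stored.size() + " of " + target + " replicas");
        // prints [11.1.111.111] -> 1 of 3 replicas
    }
}
```

Under this model the block is left with one replica out of three, and because the file was never closed, nothing else prompted the namenode to re-replicate it.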

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.