Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/01/28 01:40:59 UTC

[jira] Created: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
-------------------------------------------------------------------------------

                 Key: HADOOP-5133
                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.18.2
            Reporter: Hairong Kuang
             Fix For: 0.19.1


Currently the NameNode treats either the new replica or the existing replicas as corrupt if the new replica's length is inconsistent with the block length recorded by the NN. The correct behavior should be:
1. For a block that is not under construction, the new replica should be marked as corrupt if its length is inconsistent (whether shorter or longer) with the NN-recorded block length;
2. For a block under construction, if the new replica's length is shorter than the NN-recorded block length, the new replica could be marked as corrupt; if the new replica's length is longer, the NN should update its recorded block length, but it should not mark existing replicas as corrupt. This is because the NN-recorded length for an under-construction block does not accurately match the block length on the datanode's disk. The NN should not judge an under-construction replica to be corrupt based on inaccurate information, namely its own recorded block length.
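
The two rules above can be read as a small decision function. The following is only a sketch of that reading: the class, enum, and method names are hypothetical and are not taken from FSNamesystem, and the "could be marked as corrupt" case is treated here as "is marked as corrupt".

```java
/**
 * Hypothetical sketch of the proposed rule. Names are illustrative only
 * and do not correspond to the real FSNamesystem code.
 */
final class ReplicaLengthRule {
    enum Action { ACCEPT, MARK_NEW_REPLICA_CORRUPT, UPDATE_RECORDED_LENGTH }

    /**
     * Decides what to do with a newly reported replica.
     * Existing replicas are never marked corrupt by this rule.
     */
    static Action decide(boolean underConstruction, long recordedLen, long reportedLen) {
        if (reportedLen == recordedLen) {
            return Action.ACCEPT;
        }
        if (!underConstruction) {
            // Rule 1: any mismatch on a finalized block condemns the new replica.
            return Action.MARK_NEW_REPLICA_CORRUPT;
        }
        // Rule 2: the recorded length of an under-construction block may lag
        // behind the datanodes, so a longer report only updates the record.
        return reportedLen < recordedLen
            ? Action.MARK_NEW_REPLICA_CORRUPT
            : Action.UPDATE_RECORDED_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, 63, 6));
        System.out.println(decide(true, 6, 63));
    }
}
```

Returning an action rather than mutating state keeps the rule easy to test in isolation; a real implementation would apply the action to the block map.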

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5133:
----------------------------------

    Attachment: inconsistentLen.patch

This patch implements summary 3 assuming summary 1.

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: inconsistentLen.patch
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668190#action_12668190 ] 

dhruba borthakur commented on HADOOP-5133:
------------------------------------------

>2. For an under construction block, if the new replica's length is shorter than the NN rec

If a block is under construction, addStoredBlock() should completely ignore it. This block is being written to by a client. The NN has relinquished control of this block to the client/datanode. Am I missing something here?

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>             Fix For: 0.19.1
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672357#action_12672357 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

Next two lines of the log:
WARN  hdfs.StateChange (FSNamesystem.java:addStoredBlock(2872)) - BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_2248817250507458558_1011 on 127.0.0.1:51024 size 63   
WARN  hdfs.StateChange (FSNamesystem.java:addStoredBlock(2872)) - BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_2248817250507458558_1011 on 127.0.0.1:51021 size 63

The blockReceived from 127.0.0.1:51021 did come. This time the NN did not complain about the length but about a redundant addStoredBlock. The replica was added to the blocksMap but was of no use because the block was already marked as corrupt.

What went wrong here was that 127.0.0.1:51021 had a perfectly good replica, but the NN wrongly marked it as corrupt based on stale information.

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.4
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671287#action_12671287 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

Summary of an offline discussion:
1. No block locations should be added to blocksMap for an incomplete block. So HADOOP-5134 should be fixed;
2. The length of the previous block should be set to the default block length when the client calls addBlock asking for an additional block for the file;
3. When receiving blockReceived from DN, NameNode checks the length of the new replica:
    If the new replica's length is greater than the default block length or smaller than the current block length, mark the new replica as corrupt;
    If the new replica's length is greater than the current block length, set the block's length to be the new replica's length and mark the existing replicas of the block as corrupt.

I believe that most of the logic for 3 is already in the 0.18.3 branch.
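
As an illustration of item 3 only, here is a minimal sketch of the length check on blockReceived. The class, its fields, and the datanode names are hypothetical; this is not the code in the 0.18.3 branch. It assumes that a replica already marked corrupt stays corrupt, and that "default block length" means the file's preferred block size.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of the last block of a file under construction. */
final class LastBlockState {
    final long preferredBlockSize;
    long recordedLen;
    final Set<String> liveReplicas = new LinkedHashSet<>();
    final Set<String> corruptReplicas = new LinkedHashSet<>();

    LastBlockState(long preferredBlockSize, long recordedLen) {
        this.preferredBlockSize = preferredBlockSize;
        this.recordedLen = recordedLen;
    }

    /** Applies the summary-3 check to a replica reported by a datanode. */
    void blockReceived(String datanode, long reportedLen) {
        if (reportedLen > preferredBlockSize || reportedLen < recordedLen) {
            // Too long for this file or shorter than what is recorded:
            // the new replica is the one marked corrupt.
            liveReplicas.remove(datanode);
            corruptReplicas.add(datanode);
            return;
        }
        if (reportedLen > recordedLen) {
            // A longer replica wins: record its length and mark the
            // existing (shorter) replicas as corrupt.
            corruptReplicas.addAll(liveReplicas);
            liveReplicas.clear();
            recordedLen = reportedLen;
        }
        if (!corruptReplicas.contains(datanode)) {
            liveReplicas.add(datanode);
        }
    }
}
```

For example, with a preferred size of 128 and a recorded length of 10, a report of length 20 from a new datanode raises the recorded length to 20 and moves every previously accepted replica to the corrupt set.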

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.19.1
>
>


[jira] Issue Comment Edited: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672476#action_12672476 ] 

hairong edited comment on HADOOP-5133 at 2/10/09 5:18 PM:
----------------------------------------------------------------

The logic in summary 3 should only apply to the last block of a file under construction. A complete picture should be:
{noformat}
if the reported block is the last block of an under-construction file {
     do summary 3
} else {
    if the reported block is not the last block && its NN recorded length is not equal to the preferred size
         set the block's NN recorded length to be the preferred block size;
    if the reported length is not equal to the NN recorded length
         mark the reported block as corrupt;
}
{noformat}
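
The same picture as a compilable sketch, for readers who want to experiment with it. All names here are hypothetical and the summary-3 branch is only signalled, not implemented; the recorded lengths are modelled as a plain array.

```java
/** Hypothetical sketch of the "complete picture" above. */
final class ReportedBlockCheck {
    enum Result { APPLY_SUMMARY_3, ACCEPT, MARK_REPORTED_CORRUPT }

    /**
     * @param recordedLens NN-recorded lengths of the file's blocks, in order
     *                     (may be updated in place for non-last blocks)
     * @param blockIndex   index of the reported block within the file
     */
    static Result check(long[] recordedLens, int blockIndex,
                        boolean fileUnderConstruction,
                        long preferredBlockSize, long reportedLen) {
        boolean isLast = blockIndex == recordedLens.length - 1;
        if (isLast && fileUnderConstruction) {
            return Result.APPLY_SUMMARY_3;
        }
        if (!isLast && recordedLens[blockIndex] != preferredBlockSize) {
            // A non-last block must be full, so correct the stale record.
            recordedLens[blockIndex] = preferredBlockSize;
        }
        return reportedLen == recordedLens[blockIndex]
            ? Result.ACCEPT
            : Result.MARK_REPORTED_CORRUPT;
    }
}
```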

      was (Author: hairong):
    The logic in summary 3 should only apply to the last block of a file under construction. A complete picture should be:
if the reported block is the last block of an under-construction file {
     do summary 3
} else {
    if the reported block is not the last block of a file and NN recorded length is not equal to the preferred block length, set the block's NN recorded length to be the preferred block size;
    if the reported length is not equal to the NN recorded length, mark the reported block as corrupt;
}
  
> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: inconsistentLen.patch
>
>


[jira] Updated: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5133:
----------------------------------

    Fix Version/s:     (was: 0.19.1)
                   0.18.4
         Assignee: Hairong Kuang

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.4
>
>


[jira] Updated: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5133:
----------------------------------

    Attachment: inconsistentLen2.patch

This patch adds a unit test and fixes a bug in the previous patch.

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.20.0
>
>         Attachments: inconsistentLen.patch, inconsistentLen1.patch, inconsistentLen2.patch
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671334#action_12671334 ] 

dhruba borthakur commented on HADOOP-5133:
------------------------------------------

> if the new replica's length is greater than the default block length or smaller than

Just to be more explicit: the "default block length" referred to here is the preferredBlockSize for this file. It does not refer to the default block size of the filesystem.

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.19.1
>
>


[jira] Updated: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5133:
----------------------------------

    Priority: Blocker  (was: Major)

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.19.1
>
>


[jira] Updated: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5133:
----------------------------------

         Priority: Major  (was: Blocker)
    Fix Version/s:     (was: 0.18.4)
                   0.20.0

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.20.0
>
>         Attachments: inconsistentLen.patch, inconsistentLen1.patch
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668096#action_12668096 ] 

Konstantin Shvachko commented on HADOOP-5133:
---------------------------------------------

I do not think name-node's decision on whether the block is corrupt or not should be based on its length. This assumes that files can only grow. If we ever decide to implement truncates, which is a reasonable extension of append, this whole logic will have to be reconsidered.
I think the decision should rather be based on generation stamps, etc.
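
The comment does not spell out what a generation-stamp-based check would look like. One possible reading, shown purely as an assumption, is that a replica whose generation stamp is older than the one the name-node has recorded is treated as stale regardless of its length. The sketch below relies on block names of the form blk_<id>_<generation stamp>, as in blk_2248817250507458558_1011; the class and method names are hypothetical.

```java
/** Hypothetical sketch: judging a replica by generation stamp, not length. */
final class GenerationStampCheck {
    /** Extracts the trailing generation stamp from a name like blk_<id>_<genstamp>. */
    static long generationStampOf(String blockName) {
        int idx = blockName.lastIndexOf('_');
        return Long.parseLong(blockName.substring(idx + 1));
    }

    /** Assumed rule: an older generation stamp than the recorded one means stale. */
    static boolean isStale(long recordedGenStamp, String reportedBlockName) {
        return generationStampOf(reportedBlockName) < recordedGenStamp;
    }
}
```

Under this assumed rule a truncated replica would not be condemned for being shorter, which is the concern raised about a length-based check.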

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>             Fix For: 0.19.1
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672198#action_12672198 ] 

dhruba borthakur commented on HADOOP-5133:
------------------------------------------

>WARN namenode.FSNamesystem (FSNamesystem.java:addStoredBlock(2791)) - Inconsistent size for block blk_2248817250507458558_1011 reported from 127.0.0.1:51024 current size is 6 reported size is 63
>WARN namenode.FSNamesystem (FSNamesystem.java:addStoredBlock(2816)) - Mark existing replica blk_2248817250507458558_1011from 127.0.0.1:51021 as corrupt because its length is shorter than the new one.

Why wasn't a blockReceived generated by 127.0.0.1:51021 after the above two log lines?

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.4
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679419#action_12679419 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

If we mark all replicas as corrupt, two questions remain to be answered:
1. What should the length of the block be: the longer one or the shorter one?
2. Should the file remain open, or could we close it?

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.20.0
>
>         Attachments: inconsistentLen.patch, inconsistentLen1.patch, inconsistentLen2.patch
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672476#action_12672476 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

The logic in summary 3 should apply only to the last block of a file under construction. The complete picture should be:
if the reported block is the last block of an under-construction file {
    do summary 3
} else {
    if the reported block is not the last block of a file and the NN recorded length is not equal to the preferred block size, set the block's NN recorded length to the preferred block size;
    if the reported length is not equal to the NN recorded length, mark the reported block as corrupt;
}
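
For readers who prefer code, the branching above might be sketched in Java roughly as follows. This is only an illustration: the class, enum, and parameter names are hypothetical and do not come from FSNamesystem, and the "summary 3" branch is left as a placeholder result because that logic is described in a separate comment.

```java
// Hypothetical sketch; none of these names come from the Hadoop source.
final class ReportedBlockCheck {
    enum Action { APPLY_UNDER_CONSTRUCTION_RULES, MARK_REPORTED_CORRUPT, ACCEPT }

    /** Mutable stand-in for the NameNode's record of one block. */
    static final class BlockRecord {
        long recordedLength;
        BlockRecord(long recordedLength) { this.recordedLength = recordedLength; }
    }

    static Action check(BlockRecord block, long reportedLength, long preferredBlockSize,
                        boolean isLastBlock, boolean fileUnderConstruction) {
        if (isLastBlock && fileUnderConstruction) {
            // The "summary 3" rules apply here; they are not reproduced in this sketch.
            return Action.APPLY_UNDER_CONSTRUCTION_RULES;
        }
        if (!isLastBlock && block.recordedLength != preferredBlockSize) {
            // A block that is not the last one in its file is expected to be full.
            block.recordedLength = preferredBlockSize;
        }
        return reportedLength != block.recordedLength
                ? Action.MARK_REPORTED_CORRUPT
                : Action.ACCEPT;
    }
}
```

Note that in this sketch the recorded length of a non-last block is corrected before the comparison, so a full-size replica is accepted even if the NN had recorded a shorter length.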

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: inconsistentLen.patch
>
>


[jira] Updated: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5133:
----------------------------------

    Summary: FSNameSystem#addStoredBlock does not handle inconsistent block length correctly  (was: FSNameSystem#addStoredBlock does not handle incosistent block length correctly)



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678989#action_12678989 ] 

Konstantin Shvachko commented on HADOOP-5133:
---------------------------------------------

This patch definitely reduces the probability of "committing" an incorrect block. Choosing the longest block out of all replicas is better than selecting a shorter one.
But this does not answer the question of what happens if (as a result of a software bug or unfortunate circumstance) the longest block is actually a wrong replica. In general the name-node does not have a definite criterion for judging which replica is right and which is not, except for the generation stamp. And it would be wrong to *silently* make such a decision based on the size (or any other artificial convention).
I am coming to the conclusion that the honest way to deal with this is to declare all replicas corrupt in this case, that is, when this is the last block of a file being written to. This will be reported in fsck, and an administrator or the user can deal with it.
This should happen only as a result of an error in the code, so maybe we should just treat it as corruption.
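
A minimal sketch of this proposal, assuming the replica lengths for the last block of a file being written are available as a simple map from datanode to reported length. The class and method names are hypothetical and only illustrate the idea of refusing to pick a winner by size.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; these names do not come from the Hadoop source.
final class LastBlockConsistency {
    /**
     * replicaLengths maps a datanode name to the length it reported for the
     * last block of a file being written (all with the same generation stamp).
     * Returns the datanodes whose replicas would be declared corrupt.
     */
    static Set<String> corruptReplicas(Map<String, Long> replicaLengths) {
        Set<Long> distinctLengths = new HashSet<>(replicaLengths.values());
        if (distinctLengths.size() <= 1) {
            // All reported lengths agree, so nothing is flagged.
            return Collections.emptySet();
        }
        // Lengths disagree: do not choose by size, flag every replica instead.
        return new HashSet<>(replicaLengths.keySet());
    }
}
```

With this rule the decision is surfaced (for example through fsck) rather than made silently on the basis of size.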



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668194#action_12668194 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

> addStoredBlock() should completely ignore it.
addStoredBlock is triggered by blockReceived, not by block report processing.



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668139#action_12668139 ] 

Konstantin Shvachko commented on HADOOP-5133:
---------------------------------------------

[See related comment here|https://issues.apache.org/jira/browse/HADOOP-5027#action_12668136]



[jira] Updated: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5133:
----------------------------------

    Attachment: inconsistentLen1.patch

This patch needs more intensive testing, but I am uploading it now for initial review.

> FSNameSystem#addStoredBlock does not handle inconsistent block length correctly
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-5133
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5133
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: inconsistentLen.patch, inconsistentLen1.patch
>
>


[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668197#action_12668197 ] 

dhruba borthakur commented on HADOOP-5133:
------------------------------------------

> addStoredBlock is triggered by blockReceived, not by block report processing.

Agreed. My point was that addStoredBlock() should completely avoid making decisions about this block if the block is under construction.



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667901#action_12667901 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

Below is part of the log that illustrates the bug.
1. a block was allocated for a file
    INFO  hdfs.StateChange (FSNamesystem.java:allocateBlock(1398)) - BLOCK* NameSystem.allocateBlock: /xx/file7. blk_2248817250507458558_1010
2. a write pipeline error occurred and a lease recovery added two datanodes to the block's blockMap (this is a bug reported at HADOOP-5134) and set this block's length to be 6
    INFO  namenode.FSNamesystem (FSNamesystem.java:commitBlockSynchronization(1835)) - commitBlockSynchronization(lastblock=blk_2248817250507458558_1010, newgenerationstamp=1011, newlength=6, newtargets=[127.0.0.1:51021, 127.0.0.1:51024])
3. when the block was finalized, a datanode sent blockReceived to NN. NN then called addStoredBlock which triggered the error below. DataNode 127.0.0.1:51021 did have a valid replica with a length of 63, but was wrongly marked as corrupt.
    WARN  namenode.FSNamesystem (FSNamesystem.java:addStoredBlock(2791)) - Inconsistent size for block blk_2248817250507458558_1011 reported from 127.0.0.1:51024 current size is 6 reported size is 63
    WARN  namenode.FSNamesystem (FSNamesystem.java:addStoredBlock(2816)) - Mark existing replica blk_2248817250507458558_1011from 127.0.0.1:51021 as corrupt because its length is shorter than the new one.






[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668181#action_12668181 ] 

Konstantin Shvachko commented on HADOOP-5133:
---------------------------------------------

Hairong, sorry I see I was vague.
What I really meant was that the *larger size* of the replica should not be the criterion for deciding that this is the *correct replica*.
An incorrect size should indicate that the replica is corrupted, yes, but this is all the size should mean.
Deciding which replica is correct should be based on properties other than the size.

In this case, as I understand it, there is a race condition between block received and recovery from an unsuccessful pipeline, right?



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679437#action_12679437 ] 

Konstantin Shvachko commented on HADOOP-5133:
---------------------------------------------

> 1. What should be the length of the block: longer one or shorter one?

The length doesn't matter since all replicas are corrupt. It could be the default block size or 0.

> 2. Should the file remain open, or could we close it?

I think according to our policies (at least one replica of each block should be reported) we cannot close the file. We should not wait in this case for more replicas to appear but rather return an error to the client once the problem is encountered.
Once again this is an error condition, and that is why it is better to let the client know that an error occurred than to silently make a decision on his behalf.



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668102#action_12668102 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

The replicas reported in this jira had the same generation stamp. Whether a block gets truncated or appended to, a new generation stamp will be generated for the block; that much is certain. However, for replicas with the same generation stamp, if we do not handle inconsistent block lengths well, we might run into the data loss reported in HADOOP-4810 as well as the problem reported in this jira.

By the way, should this be a blocker for 0.18.3?



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668114#action_12668114 ] 

Raghu Angadi commented on HADOOP-5133:
--------------------------------------

If not for 0.18.3, it should surely be a blocker for 0.18.4.



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668192#action_12668192 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

Yes, in this case the problem is caused by the recovery from an unsuccessful pipeline.

> Deciding which replica is correct should be based on properties other than the size.
Yes, I agree. But in some cases we can decide which one is corrupt. For a finalized block (NN has received blockReceived), if the reported block length is not the same as the NN recorded length, the reported block must be corrupt. For a block that is being written (calling addStoredBlock through blockReceived), if the reported length is shorter than the NN recorded one, it must be corrupt too. If it is longer, then it is hard to decide which replicas are corrupt, because the NN recorded length does not accurately match the length of the replicas on the DataNode disk.
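
The three cases above could be written down as a small decision function. The following is a sketch under stated assumptions, not the actual addStoredBlock code: the names LengthCheck, Verdict and judge are made up for illustration, and "finalized" is passed in as a plain flag.

```java
// Hypothetical sketch; the names are illustrative, not Hadoop's.
final class LengthCheck {
    enum Verdict { CONSISTENT, REPORTED_CORRUPT, UNDECIDED }

    static Verdict judge(boolean finalized, long recordedLength, long reportedLength) {
        if (reportedLength == recordedLength) {
            return Verdict.CONSISTENT;
        }
        if (finalized) {
            // Finalized block: any mismatch means the reported replica is corrupt.
            return Verdict.REPORTED_CORRUPT;
        }
        // Block still being written: only a shorter report is conclusive.
        return reportedLength < recordedLength
                ? Verdict.REPORTED_CORRUPT
                : Verdict.UNDECIDED;
    }

    public static void main(String[] args) {
        // Lengths from the log in this thread: NN recorded 6, datanode reported 63.
        System.out.println(judge(false, 6, 63)); // prints UNDECIDED
    }
}
```

The UNDECIDED result is the case where the size alone cannot tell which replicas are wrong, because the NN recorded length for a block being written is not reliable.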



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668214#action_12668214 ] 

dhruba borthakur commented on HADOOP-5133:
------------------------------------------

> addStoredBlock can not completely ignore it. It should at least update the stored block length and add the replica to the blocksMap.

Agreed.

Suppose two datanodes report inconsistent block length in their blockReceived confirmation of the same block. Suppose both replicas have the same generation stamp.
  1. If the file is not under construction, or the block is not the last block of the file, then the replica with the smaller size should be treated as corrupt. The larger size replica should be in the blocksMap.
  2. If the block is the last block of a file that is under construction, then keep the longer size replica in the blocksMap but do not delete the shorter size replica from the corresponding datanode (i.e. do not treat the shorter size replica as corrupt). Remove the shorter size replica from the blocksMap.

Case 1 typically happens when the lazy flush of OS buffers in the datanode encounters a transient error and one copy of a good replica is truncated on disk.

Case 2 could occur because a datanode prematurely (because of buggy code somewhere) sends a blockReceived to the NN. In this case, it is safe to not treat the replica as corrupt because the existence of the lease indicates that the NN does not "own" this block. This situation will be fixed when a block report is processed after the lease is closed.
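
To make the two cases concrete, here is a rough sketch of the proposed resolution for two replicas that carry the same generation stamp. It only mirrors the rule proposed in this comment; the class, field and parameter names are hypothetical and nothing here is taken from the NameNode code.

```java
// Hypothetical helper, not part of Hadoop: mirrors cases 1 and 2 above.
final class ReplicaConflict {

    static final class Resolution {
        final long lengthKeptInBlocksMap;             // length of the replica that stays
        final boolean shorterMarkedCorrupt;           // case 1 outcome
        final boolean shorterRemovedFromBlocksMapOnly; // case 2 outcome

        Resolution(long lengthKept, boolean markedCorrupt, boolean removedOnly) {
            this.lengthKeptInBlocksMap = lengthKept;
            this.shorterMarkedCorrupt = markedCorrupt;
            this.shorterRemovedFromBlocksMapOnly = removedOnly;
        }
    }

    /**
     * @param lastBlockOfOpenFile true if this is the last block of a file
     *                            that is still under construction
     * @param lenA                length reported by one datanode
     * @param lenB                length reported by the other datanode
     */
    static Resolution resolve(boolean lastBlockOfOpenFile, long lenA, long lenB) {
        long longer = Math.max(lenA, lenB);
        boolean conflict = lenA != lenB;
        // Case 1: closed file or non-last block, the shorter replica is corrupt.
        boolean markCorrupt = conflict && !lastBlockOfOpenFile;
        // Case 2: last block of an open file, drop the shorter replica from the
        // blocksMap but leave it on the datanode (not treated as corrupt).
        boolean removeOnly = conflict && lastBlockOfOpenFile;
        return new Resolution(longer, markCorrupt, removeOnly);
    }
}
```

In both cases the longer replica is the one kept in the blocksMap; the cases differ only in what happens to the shorter one.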



[jira] Commented: (HADOOP-5133) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668202#action_12668202 ] 

Hairong Kuang commented on HADOOP-5133:
---------------------------------------

When a datanode reports that the block is finalized, addStoredBlock can not completely ignore it. It should at least update the stored block length and add the replica to the blocksMap. Also, if two datanodes report inconsistent block lengths in their blockReceived confirmations of the same block, NN should be able to handle the problem.

Also, HADOOP-5134 triggered the problem. Dhruba, could you please comment on whether the behavior in HADOOP-5134 is expected? If it is expected, lease recovered blocks will be under the control of both client writes and NN's replication monitor.
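
As an illustration of the minimum that addStoredBlock should do for a finalized report (update the stored length and add the replica), here is a toy model. It is a sketch only: ToyBlocksMap and its methods are invented for this example, and raising the stored length only when the reported one is longer is an assumption taken from the issue description, not from the existing code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy stand-in for the NN blocksMap; not the real data structure.
final class ToyBlocksMap {

    private static final class Entry {
        long length;                                    // NN recorded block length
        final Set<String> datanodes = new HashSet<>();  // replica locations
    }

    private final Map<Long, Entry> blocks = new HashMap<>();

    /** Records a finalized replica instead of ignoring the report. */
    void addStoredBlock(long blockId, String datanode, long reportedLen) {
        Entry e = blocks.computeIfAbsent(blockId, id -> new Entry());
        // Assumption: a longer report updates the recorded length.
        if (reportedLen > e.length) {
            e.length = reportedLen;
        }
        e.datanodes.add(datanode);
    }

    /** Returns the recorded length, or -1 if the block is unknown. */
    long storedLength(long blockId) {
        Entry e = blocks.get(blockId);
        return e == null ? -1L : e.length;
    }

    /** Returns the datanodes known to hold the block (empty if unknown). */
    Set<String> locations(long blockId) {
        Entry e = blocks.get(blockId);
        return e == null ? Collections.<String>emptySet()
                         : Collections.unmodifiableSet(e.datanodes);
    }
}
```

What to do with a replica that reports a shorter length is left out on purpose, since that is the part under discussion in this issue.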
