You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2009/01/28 01:56:59 UTC

[jira] Created: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
--------------------------------------------------------------------------------------------

                 Key: HADOOP-5134
                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.18.2
            Reporter: Hairong Kuang
             Fix For: 0.19.1


>From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.

However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672775#action_12672775 ] 

Hairong Kuang commented on HADOOP-5134:
---------------------------------------

What if a blockReceived has received before commitBlockSync()? Should commitBlockSync() check and handle inconsistent block length? I do not feel comfortable about this patch because it breaks the rule that we set in HADOOP-5133.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673747#action_12673747 ] 

Hadoop QA commented on HADOOP-5134:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12400220/commitBlockSync3.patch
  against trunk revision 744406.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3866/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3866/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3866/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3866/console

This message is automatically generated.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch, commitBlockSync2.patch, commitBlockSync3.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-5134:
-------------------------------------

    Attachment: commitBlockSync.patch

This patch does the following:

1. A commitBlockSync does not add block locations of the last block to the blocksMap, it just stores it in the INodeUnderConstruction. However, if the commitBlockSync occurs as part of lease recovery that is triggered by the NN (i.e., the file is getting closed), then it does update the blocksMap.
2. Opening a file for append causes the last partial block of the file (if any) to move its block locations from the blocksMap to the INodeUnderConstruction.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: commitBlockSync.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672017#action_12672017 ] 

dhruba borthakur commented on HADOOP-5134:
------------------------------------------

I agree that "lease recovery initiating from NN is equal to file close from a client". Are you suggesting that the commitBlockSync will fail if the last block has not reached the minimum replication factor? I think this would not work too well, because restarting the lease recovery protocol would again stamp all the valid blocks with a new gs and each datanode will again have to send the blockReceived (with the new gs to the NN). It may so happen that the re-tried commitBlockSync call again reaches the NN before the new blockReceived calls. This can cause lease recovery to not end at all. Does this make sense?


> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5134:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I've just committed this. Thanks, Dhruba!

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync-0.18.patch, commitBlockSync.patch, commitBlockSync2.patch, commitBlockSync3.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5134:
----------------------------------

    Fix Version/s:     (was: 0.19.1)
                   0.18.4
         Assignee: dhruba borthakur

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672734#action_12672734 ] 

dhruba borthakur commented on HADOOP-5134:
------------------------------------------

This patch had nothing much to do with "appends", but it has mightily related to datanode(s) dropping off the write-pipeline during a write form a client.

When a client is writing to a file and encounters an error (bad datanode in pipleine), it invokes commitBlockSync(). This patch guarantees that this call to commitBlockSyn() does not add any replicas to the blocksMap. That's what you wanted, isn't it?

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: commitBlockSync.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5134:
----------------------------------

    Priority: Blocker  (was: Major)

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.19.1
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-5134:
-------------------------------------

    Status: Patch Available  (was: Open)

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch, commitBlockSync2.patch, commitBlockSync3.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671560#action_12671560 ] 

dhruba borthakur commented on HADOOP-5134:
------------------------------------------

At first I thought that the fix is simple. commitBlockSync should not update the blocksMap. But this could cause a subtle problem. The problem is the above fix could cause a call to "append" fail when it should not. let me try to explain.

Suppose, a writer is writing data to a file and dies before closing the file. A new writer starts and invokes "append" to try to append to this file. This "append" call triggers lease recovery, and thereby causes the Primary Datanode to invoke commitBlockSync. Meanwhle, the append call fails (as expected) with AlreadyBeingCreatedExcetion". The commitBlockSync call removes the lease but does not update block locations of the last block. The datanode(s) that have the last block  sends blockReceived to the NN, but before they reach the NN, the NN starts processing another call to "append". This append will now find that the file is not under construction anymore but that the last block of the file does not have any block locations associated with it. This means that the "append" call will fail. This is not expected behaviour.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673361#action_12673361 ] 

Hairong Kuang commented on HADOOP-5134:
---------------------------------------

After much thought, I'd like to +1 on the approach in this patch as an emergency fix to 0.18 and 0.19, but leaving the discussion of lease close semantics and block length inconsistency handling to a later time.

Comments to the patch:
 lines 1897-1900 in the merged FSNamestem.java: why descriptors is not declared in the else block. It seems that descriptors is used only there.
 line 1915: no need to initiate descriptors again
 line 1916: not sure if ">=" is needed. trunk uses only ">".

Could you please also upload a patch for 0.18?

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-5134:
-------------------------------------

    Attachment: commitBlockSync3.patch

Ok, removed checking the data-crc lengths. Wull do it as a separate JIRA

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch, commitBlockSync2.patch, commitBlockSync3.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-5134:
--------------------------------

    Fix Version/s: 0.19.1

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674108#action_12674108 ] 

Hairong Kuang commented on HADOOP-5134:
---------------------------------------

+1. The patch looks good to me.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch, commitBlockSync2.patch, commitBlockSync3.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-5134:
-------------------------------------

    Attachment: commitBlockSync-0.18.patch

Patch for 0.18 branch.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync-0.18.patch, commitBlockSync.patch, commitBlockSync2.patch, commitBlockSync3.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672743#action_12672743 ] 

Hairong Kuang commented on HADOOP-5134:
---------------------------------------

But how about lease recovery when lease expires? This is part of 0.18 even nobody invokes sync or append.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: commitBlockSync.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12673404#action_12673404 ] 

Hairong Kuang commented on HADOOP-5134:
---------------------------------------

> I also added a check that is triggered during lease recovery that matches the crc file with the data file (in length) before finalizing a block.
Checking the length seems not good enough. We should verify checksum for the last chunk in the block. Let's open a separate jira for this problem. Also in the code
 if (f.length() > maxDataSize || f.length() < minDataSize)  should be if (f.length() > maxDataSize || f.length() <= minDataSize).

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch, commitBlockSync2.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672703#action_12672703 ] 

Hairong Kuang commented on HADOOP-5134:
---------------------------------------

My point is that a lease should not be removed before all blocks of an under-construction file reaches the minimum replication factor. Whether this is part of lease recovery or not is an implementation detail.

With your patch, is the last block still at the risk described in HADOOP-5133? If so, I would prefer to do this right instead of worrying about append. This is a patch to 0.18 no append is supported in this version.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: commitBlockSync.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-5134:
-------------------------------------

    Attachment: commitBlockSync2.patch

Cleaned up the code as suggested by Hairong. I also added a check that is triggered during lease recovery that matches the crc file with the data file (in length) before finalizing a block. This ensures that the data in the block file is good.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch, commitBlockSync2.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671989#action_12671989 ] 

Hairong Kuang commented on HADOOP-5134:
---------------------------------------

>From my understanding, lease recovery initiating from NN is equal to file close from a client. So a lease should not be removed before the last block reaches the minimum replication factor. Would this solve the problem?

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674112#action_12674112 ] 

Hairong Kuang commented on HADOOP-5134:
---------------------------------------

Dhruba, could you please upload a patch for 0.18?

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync.patch, commitBlockSync2.patch, commitBlockSync3.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672760#action_12672760 ] 

dhruba borthakur commented on HADOOP-5134:
------------------------------------------

> But how about lease recovery when lease expires?

At this time, it should be safe to update the blocksMap. Since the file still has a lease when the replicas are added to the blocksMap (via commitBlockSync), the NN should not check/detect corrupt replicas. 

HADOOP-5133 caused a problem because the client wrote more data to the block after the first call to commitBlockSync(). However, when the NN triggers lease recovery, the client cannot write any more data to the block. Thus, in this case, it is safe for the NN to insert the replica in the blocksMap.

> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4
>
>         Attachments: commitBlockSync.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5134) FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674638#action_12674638 ] 

Hudson commented on HADOOP-5134:
--------------------------------

Integrated in Hadoop-trunk #758 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/758/])
    . FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap. Contributed by Dhruba Borthakur.


> FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5134
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5134
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.18.4, 0.19.1
>
>         Attachments: commitBlockSync-0.18.patch, commitBlockSync.patch, commitBlockSync2.patch, commitBlockSync3.patch
>
>
> From my understanding of sync/append design, an under construction block should not have any block locations associated with it in the blocksMap. So an under construction block will not be managed by ReplicationMonitor.
> However, if there is an error in the write pipeline, a lease recovery will trigger a call, commitBlockSynchronization, to NN. This call will add the successfully-recovered datanodes to blocksMap. This seems to violate the design. It should update the targets of the last block at INode instead. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.