Posted to common-dev@hadoop.apache.org by "Hairong Kuang (JIRA)" <ji...@apache.org> on 2008/10/29 23:02:44 UTC

[jira] Created: (HADOOP-4540) An invalidated block should be removed from the blockMap

An invalidated block should be removed from the blockMap
--------------------------------------------------------

                 Key: HADOOP-4540
                 URL: https://issues.apache.org/jira/browse/HADOOP-4540
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.18.0
            Reporter: Hairong Kuang


Currently, when the namenode schedules an over-replicated block for deletion, the replica to be deleted is not removed from the block map immediately. Instead, it is removed only when the next block report comes in. This causes three problems: 
1. getBlockLocations may return locations that do not contain the block;
2. Over-replication due to unsuccessful deletion cannot be detected, as described in HADOOP-4477.
3. The number of blocks shown on the dfs Web UI does not get updated for a source node when a large number of blocks have been moved from the source node to a target node, for example when running the balancer.
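
For illustration, here is a minimal sketch of the lazy removal described above; the class, field, and method names are hypothetical simplifications for this example, not the actual FSNamesystem code.

    import java.util.*;

    // Hypothetical, simplified namenode bookkeeping (illustrative only).
    class LazyInvalidationSketch {
        // blocksMap: block id -> datanodes believed to hold a replica
        static Map<Long, Set<String>> blocksMap = new HashMap<>();
        // recentInvalidateSet: datanode -> blocks scheduled for deletion there
        static Map<String, Set<Long>> recentInvalidateSet = new HashMap<>();

        // Current behavior: schedule the deletion, but leave blocksMap untouched.
        static void scheduleDeletion(long blockId, String datanode) {
            recentInvalidateSet.computeIfAbsent(datanode, d -> new HashSet<>()).add(blockId);
            // The stale blocksMap entry survives until the datanode's next
            // block report no longer lists the replica.
        }

        // getBlockLocations reads blocksMap directly, so it can return a node
        // that has already deleted, or is about to delete, its replica.
        static Set<String> getBlockLocations(long blockId) {
            return blocksMap.getOrDefault(blockId, Collections.emptySet());
        }

        public static void main(String[] args) {
            blocksMap.put(1L, new HashSet<>(Arrays.asList("dn1", "dn2", "dn3", "dn4")));
            scheduleDeletion(1L, "dn4");               // over-replicated: drop dn4's copy
            System.out.println(getBlockLocations(1L)); // still lists dn4 -> problem 1 above
        }
    }

Running the sketch prints a location list that still includes dn4 even though its replica is already queued for deletion.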



[jira] Commented: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644425#action_12644425 ] 

dhruba borthakur commented on HADOOP-4540:
------------------------------------------

I agree with Hairong's proposal that when the NN schedules a block to be deleted, it should delete it from the blocksMap. I have always wondered why the current code was written to not delete the block immediately.



[jira] Updated: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-4540:
----------------------------------

             Priority: Blocker  (was: Major)
    Affects Version/s:     (was: 0.18.0)
                       0.17.0
        Fix Version/s: 0.18.2
             Assignee: Hairong Kuang

This bug may cause block loss if a datanode containing the block repeatedly loses its heartbeat and re-registers itself within a short period. Thus marking it as a blocker.



[jira] Updated: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-4540:
----------------------------------

    Fix Version/s:     (was: 0.18.2)
                   0.18.3



[jira] Updated: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-4540:
----------------------------------

    Fix Version/s:     (was: 0.18.3)
                   0.20.0



[jira] Commented: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656097#action_12656097 ] 

Runping Qi commented on HADOOP-4540:
------------------------------------

Could this problem be related to HADOOP-3885?



[jira] Commented: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644165#action_12644165 ] 

Hairong Kuang commented on HADOOP-4540:
---------------------------------------

My proposal is to remove a replica from the blocks map when it is marked as "invalid" (i.e., when it is moved to the recentInvalidateSet) as a result of over-replication. Also, when a block report comes in and a newly reported replica is found to be marked as invalid, that replica does not get added to the blocks map.
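
A minimal sketch of what the proposal amounts to, using the same hypothetical, simplified structures as above rather than the actual namenode code:

    import java.util.*;

    // Hypothetical sketch of the proposal: eager removal plus a block-report filter.
    class EagerInvalidationSketch {
        static Map<Long, Set<String>> blocksMap = new HashMap<>();
        static Map<String, Set<Long>> recentInvalidateSet = new HashMap<>();

        // Part 1: when a replica is marked invalid, drop it from blocksMap as well.
        static void markInvalid(long blockId, String datanode) {
            recentInvalidateSet.computeIfAbsent(datanode, d -> new HashSet<>()).add(blockId);
            Set<String> locations = blocksMap.get(blockId);
            if (locations != null) {
                locations.remove(datanode);
            }
        }

        // Part 2: a replica reported by a datanode is ignored while it is still
        // pending invalidation on that datanode, so it never re-enters blocksMap.
        static void processReportedReplica(long blockId, String datanode) {
            Set<Long> pending = recentInvalidateSet.get(datanode);
            if (pending != null && pending.contains(blockId)) {
                return;  // scheduled for deletion; do not add it back
            }
            blocksMap.computeIfAbsent(blockId, b -> new HashSet<>()).add(datanode);
        }

        public static void main(String[] args) {
            blocksMap.put(1L, new HashSet<>(Arrays.asList("dn1", "dn2", "dn3", "dn4")));
            markInvalid(1L, "dn4");
            System.out.println(blocksMap.get(1L));     // dn4 already removed
            processReportedReplica(1L, "dn4");         // a late report does not resurrect it
            System.out.println(blocksMap.get(1L));
        }
    }

With this change, the stale location disappears as soon as the replica is queued for invalidation, and a late block report from the same datanode cannot resurrect it.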



[jira] Updated: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4540:
------------------------------------

    Priority: Major  (was: Blocker)



[jira] Commented: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644391#action_12644391 ] 

Raghu Angadi commented on HADOOP-4540:
--------------------------------------

I think this was the policy even in the pre-0.17.0 NameNode, i.e. blocks were deleted from blocksMap only lazily. Whether HADOOP-4477 has always been there or was made more probable by another policy change, I am not sure.

bq. My proposal is to remove a replica from the blocks map when it is marked as "invalid" (i.e., when it is moved to the recentInvalidateSet) as a result of over-replication. Also, when a block report comes in and a newly reported replica is found to be marked as invalid, that replica does not get added to the blocks map.

This probably needs more details.

We have so many maps: blocksMap, neededReplications, excessReplications, etc. These are all supposed to be consistent in some way, but what the consistency requirements are, or how they are enforced, is not explicitly defined anywhere. I am afraid that if we make one isolated change now, it is very hard to say for sure that we are not introducing issues similar to HADOOP-4477. 

We could probably do something smaller to avoid HADOOP-4477. But to change a policy that has been there since the beginning, as this jira proposes, I think we need to consider more. I propose we write down the maps involved and their relations (when and why a block moves to and from these maps, etc.).
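
As a starting point for such a write-up, here is a simplified summary of the structures named in this discussion; the types and the transition notes in the comments are illustrative assumptions, not the real field declarations.

    import java.util.*;

    // Illustrative declarations only; the real namenode fields differ in type and detail.
    class NamenodeBlockStateSketch {
        // blocksMap: block -> datanodes believed to hold a replica.
        //   Added when a datanode reports or confirms a replica; today removed only
        //   when a later block report omits it, which this jira proposes to change.
        Map<Long, Set<String>> blocksMap = new HashMap<>();

        // neededReplications: blocks with fewer live replicas than their target.
        //   Added when replicas are lost or the target replication is raised;
        //   removed once enough replicas exist again.
        Set<Long> neededReplications = new HashSet<>();

        // excessReplications: replicas chosen for removal because a block is
        //   over-replicated; these feed the invalidation queue below.
        Map<String, Set<Long>> excessReplications = new HashMap<>();

        // recentInvalidateSet: datanode -> replicas scheduled for deletion, drained
        //   as delete commands are sent to the datanode.
        Map<String, Set<Long>> recentInvalidateSet = new HashMap<>();
    }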





[jira] Issue Comment Edited: (HADOOP-4540) An invalidated block should be removed from the blockMap

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644391#action_12644391 ] 

rangadi edited comment on HADOOP-4540 at 10/31/08 11:40 AM:
-----------------------------------------------------------------

(Edit: corrected the jira number referred to.)

I think this was the policy even in the pre-0.17.0 NameNode, i.e. blocks were deleted from blocksMap only lazily. Whether HADOOP-4556 has always been there or was made more probable by another policy change, I am not sure.

bq. My proposal is to remove a replica from the blocks map when it is marked as "invalid" (i.e., when it is moved to the recentInvalidateSet) as a result of over-replication. Also, when a block report comes in and a newly reported replica is found to be marked as invalid, that replica does not get added to the blocks map.

This probably needs more details.

We have so many maps: blocksMap, neededReplications, excessReplications, etc. These are all supposed to be consistent in some way, but what the consistency requirements are, or how they are enforced, is not explicitly defined anywhere. I am afraid that if we make one isolated change now, it is very hard to say for sure that we are not introducing issues similar to HADOOP-4556. 

We could probably do something smaller to avoid HADOOP-4556. But to change a policy that has been there since the beginning, as this jira proposes, I think we need to consider more. I propose we write down the maps involved and their relations (when and why a block moves to and from these maps, etc.).


