Posted to common-dev@hadoop.apache.org by "Ahad Rana (JIRA)" <ji...@apache.org> on 2008/10/21 22:38:44 UTC

[jira] Created: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not not honor passed in maxblocks value

getBlockArray in DatanodeDescriptor does not not honor passed in maxblocks value
--------------------------------------------------------------------------------

                 Key: HADOOP-4483
                 URL: https://issues.apache.org/jira/browse/HADOOP-4483
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.18.1
         Environment: hadoop-0.18.1 running on a cluster of 16 nodes.
            Reporter: Ahad Rana
            Priority: Critical


The getBlockArray method in DatanodeDescriptor.java should honor the passed-in maxblocks parameter. In its current form it passes an array sized to min(maxblocks, blocks.size()) into Collection.toArray. As the javadoc for Collection.toArray(T[]) indicates, toArray may discard the passed-in array (and allocate a new one) if the number of elements returned by the iterator exceeds the size of the passed-in array. As a result, the flawed implementation returns all of a DataNode's invalid blocks in one go, which causes the NameNode to send the DataNode a DNA_INVALIDATE command with an excessively large number of blocks. That INVALIDATE command can take a very long time to process at the DataNode, and since DatanodeCommand(s) are processed in between heartbeats, the NameNode may then consider the DataNode offline / unresponsive (due to the lack of heartbeats).
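
To make the failure mode concrete, here is a minimal, self-contained sketch of the Collection.toArray(T[]) behaviour described above; the class and element type are illustrative and do not come from the Hadoop source:

    import java.util.TreeSet;

    public class ToArrayPitfall {
        public static void main(String[] args) {
            TreeSet<Integer> blocks = new TreeSet<Integer>();
            for (int i = 0; i < 1000; i++) {
                blocks.add(i);
            }
            int maxblocks = 100;
            // The passed-in array is sized to min(maxblocks, blocks.size()), but
            // toArray(T[]) allocates and fills a NEW array whenever the argument
            // is too small, so every pending element comes back.
            Integer[] out =
                blocks.toArray(new Integer[Math.min(maxblocks, blocks.size())]);
            System.out.println(out.length);   // prints 1000, not 100
        }
    }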

In our use case at CommonCrawl.org, we regularly do large-scale HDFS file deletions after certain stages of our map-reduce pipeline. These deletes would make certain DataNode(s) unresponsive, and thus impair the cluster's ability to properly balance file-system reads / writes across all available nodes. This problem only surfaced once we migrated from our 0.16.2 deployment to the current 0.18.1 release.




[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643650#action_12643650 ] 

Hairong Kuang commented on HADOOP-4483:
---------------------------------------

JUnit tests passed on my local machine:
BUILD SUCCESSFUL
Total time: 113 minutes 11 seconds

Ant test-patch result:

     [exec] +1 overall.

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.

     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.





[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4483:
------------------------------

    Attachment: HADOOP-4483-v2.patch

Revised version with a .clear() implementation for the case where n == blocks.size().



[jira] Assigned: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley reassigned HADOOP-4483:
-------------------------------------

    Assignee: Ahad Rana  (was: Abdul Qadeer)



[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not not honor passed in maxblocks value

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4483:
------------------------------

    Attachment: patch.HADOOP-4483

This fixes the getBlockArray method in DatanodeDescriptor to constrain the returned Block array to the maxBlocks value passed in.
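
For illustration, a rough sketch of the bounded copy-and-remove idea behind the patch, with placeholder names rather than the literal patch code:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.Iterator;
    import java.util.List;

    // Hypothetical helper showing the bounded copy-and-remove idea.
    public class BoundedDrain {
        static <T> List<T> drainUpTo(Collection<T> blocks, int maxblocks) {
            List<T> result = new ArrayList<T>();
            Iterator<T> it = blocks.iterator();
            while (it.hasNext() && result.size() < maxblocks) {
                result.add(it.next());
                it.remove();   // only the blocks actually returned leave the queue
            }
            return result;
        }
    }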



[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4483:
-------------------------------------

    Attachment: HADOOP-4483-v3.patch

Removed more empty lines from the earlier patch.



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644014#action_12644014 ] 

Hudson commented on HADOOP-4483:
--------------------------------

Integrated in Hadoop-trunk #647 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/647/])
    HADOOP-4483. Honor the max parameter in DatanodeDescriptor.getBlockArray(...).  (Ahad Rana and Hairong Kuang via szetszwo)




[jira] Issue Comment Edited: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641915#action_12641915 ] 

hairong edited comment on HADOOP-4483 at 10/22/08 10:56 AM:
------------------------------------------------------------------

When I looked at the code related to block invalidation, I had a question: why is invalidateBlocks implemented as a TreeSet, which requires O(log n) time for both inserting and removing? If we limit the size of invalidateBlocks to be no greater than blockInvalidateLimit, an array or ArrayList would be more efficient. Otherwise, a LinkedList would still be better than a TreeSet.



[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4483:
-------------------------------------

    Attachment: HADOOP-4483-v3.patch

Thanks Ahad for the patch. I changed the patch to delete empty lines and regenerated it using "svn diff" from the top level of the source tree.




[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641617#action_12641617 ] 

dhruba borthakur commented on HADOOP-4483:
------------------------------------------

Good catch. This is a candidate for 0.19.1.



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643392#action_12643392 ] 

Nigel Daley commented on HADOOP-4483:
-------------------------------------

Please include a unit test.



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641614#action_12641614 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4483:
------------------------------------------------

Good catch on the bug.

- Removing the elements from a collection one by one could be expensive.  Also, we have max >= n in most cases.  How about using the existing code (i.e. blocks.clear() instead of e.remove()) when max >= n?  (See the sketch below.)

- Could you also remove the whitespace changes, tabs, and trailing spaces?  This will keep the code style consistent.
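
A sketch of what the clear()-when-max >= n suggestion could look like; the Block class is a placeholder and the method shape is illustrative, not the committed patch:

    import java.util.Collection;
    import java.util.Iterator;

    public class GetBlockArraySketch {
        static class Block {}   // placeholder, not the real HDFS Block class

        static Block[] getBlockArray(Collection<Block> blocks, int max) {
            int n = blocks.size();
            if (max <= 0 || n == 0) {
                return null;
            }
            if (max >= n) {
                // Common case: return everything and reset the set in one shot.
                Block[] all = blocks.toArray(new Block[n]);
                blocks.clear();
                return all;
            }
            // Otherwise copy out and remove exactly max entries.
            Block[] some = new Block[max];
            Iterator<Block> it = blocks.iterator();
            for (int i = 0; i < max; i++) {
                some[i] = it.next();
                it.remove();
            }
            return some;
        }
    }

With a standard heartbeat configuration the pending set is usually smaller than max, so the clear() branch is the one that runs most of the time.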



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643654#action_12643654 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4483:
------------------------------------------------

Hairong, we also need a 0.18 patch.



[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4483:
------------------------------

    Summary: getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value  (was: getBlockArray in DatanodeDescriptor does not not honor passed in maxblocks value)



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641641#action_12641641 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4483:
------------------------------------------------

bq. I believe that since the underlying container is a tree, even Collection's internal code needs to use the iterator approach to remove items from the data structure.

That is wrong.  clear() removes all elements.  You only have to set root = null.

bq. Plus, in the standard configuration, where the heartbeat interval is set to 3 seconds, I believe max blocks <= 100.

That is exactly why we are removing the whole tree most of the time.

bq. Am I correct in assuming that the convention for the Hadoop codebase is spaces (instead of tabs)?

Yes, we do not use tabs in the source code.



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641648#action_12641648 ] 

Hairong Kuang commented on HADOOP-4483:
---------------------------------------

If we want to make the implementation more efficient, I have the following suggestions:
1. Store invalidateBlocks as an array or ArrayList instead of a TreeSet;
2. ReplicationMonitor makes sure that the size of invalidateBlocks does not go beyond blockInvalidateLimit;
3. getBlockArray does not need to worry about the number of invalidated blocks; it only needs to do an array copy and a reset (see the sketch below).
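
A rough sketch of that array-based approach; the field and method names are hypothetical and are not taken from the real DatanodeDescriptor:

    import java.util.Arrays;

    public class InvalidateBlocksSketch {
        static class Block {}   // placeholder type

        private final Block[] invalidateBlocks;   // capacity == blockInvalidateLimit
        private int count;                        // number of pending blocks

        InvalidateBlocksSketch(int blockInvalidateLimit) {
            this.invalidateBlocks = new Block[blockInvalidateLimit];
        }

        // ReplicationMonitor would be responsible for never pushing past capacity.
        synchronized boolean addBlock(Block b) {
            if (count == invalidateBlocks.length) {
                return false;   // caller retries on a later pass
            }
            invalidateBlocks[count++] = b;
            return true;
        }

        // getBlockArray becomes a plain copy-and-reset; no per-element removal.
        synchronized Block[] getBlockArray() {
            Block[] out = new Block[count];
            System.arraycopy(invalidateBlocks, 0, out, 0, count);
            Arrays.fill(invalidateBlocks, 0, count, null);   // let the blocks be GC'd
            count = 0;
            return out;
        }
    }

Bounding the pending set this way shifts the back-pressure onto the producer (point 2 above), so the consumer never has to do partial removals.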



[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-4483:
----------------------------------

    Attachment: invalidateBlocksCopy.patch

Uploaded a patch with a JUnit test.
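
The test in the patch is not reproduced here; the following is a hypothetical JUnit 3-style sketch of the contract it needs to cover, exercising a local stand-in helper rather than the real DatanodeDescriptor API:

    import java.util.Iterator;
    import java.util.TreeSet;

    import junit.framework.TestCase;

    public class TestGetBlockArrayContract extends TestCase {

        // Local stand-in for the fixed behaviour: return at most max entries
        // and remove only what is returned.
        private static Integer[] drain(TreeSet<Integer> blocks, int max) {
            int n = Math.min(max, blocks.size());
            Integer[] out = new Integer[n];
            Iterator<Integer> it = blocks.iterator();
            for (int i = 0; i < n; i++) {
                out[i] = it.next();
                it.remove();
            }
            return out;
        }

        public void testHonorsMaxBlocks() {
            TreeSet<Integer> pending = new TreeSet<Integer>();
            for (int i = 0; i < 1000; i++) {
                pending.add(i);
            }
            Integer[] batch = drain(pending, 100);
            assertEquals(100, batch.length);      // never more than max in one go
            assertEquals(900, pending.size());    // the rest stays queued

            Integer[] rest = drain(pending, 10000);
            assertEquals(900, rest.length);       // max >= size returns everything
            assertTrue(pending.isEmpty());
        }
    }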



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641915#action_12641915 ] 

Hairong Kuang commented on HADOOP-4483:
---------------------------------------

When I looked at the code related to block invalidation, I had a question: why is invalidateBlocks implemented as a TreeSet, which requires O(log n) time for both inserting and removing? If we limit the size of invalidateBlocks to be no greater than blockInvalidateLimit, an array or ArrayList would be more efficient. Otherwise, a LinkedList would still be better than a TreeSet.



[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-4483:
-------------------------------------------

      Resolution: Fixed
        Assignee: Abdul Qadeer
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Ahad Rana and Hairong Kuang!



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643609#action_12643609 ] 

Ahad Rana commented on HADOOP-4483:
-----------------------------------

Thanks Hairong. Sorry, I couldn't get to this in time. 



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641674#action_12641674 ] 

Ahad Rana commented on HADOOP-4483:
-----------------------------------

Re: That is wrong. clear() removes all elements. You only have to set root = null.

You are right. I see that TreeSet uses TreeMap as the underlying container, and each iterator.remove() causes a re-balance of the red-black tree. And, yes, it looks like TreeSet.clear() sets root = null, which is obviously speedier. I would still argue that under normal load (from my initial observations) the number of blocks in the collection is <= 10, and since a delete in the CLRS red-black tree implementation takes O(log n), there may not be much to gain from the clear() optimization. In the edge case where the system is under heavy load, I observed block counts in excess of 1000; in that case a partial removal of the blocks (bounded by the max-blocks limit) would still require the iterator.remove() pattern. So perhaps in the long term it would be better to replace the underlying data structure, as Hairong suggests. It would be interesting to find out whether the code (post-patch) is actually a performance bottleneck before undertaking the more aggressive modifications.
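For concreteness, here is a minimal sketch of the two drain strategies being compared, with a TreeSet<Long> of block IDs standing in for the actual invalidateBlocks collection (the names are illustrative, not taken from the patch):

    import java.util.Iterator;
    import java.util.TreeSet;

    class DrainComparisonSketch {
        // Removing every element through the iterator re-balances the tree on
        // each removal: roughly O(n log n) for n queued blocks.
        static void drainViaIterator(TreeSet<Long> invalidateBlocks) {
            for (Iterator<Long> it = invalidateBlocks.iterator(); it.hasNext(); ) {
                it.next();
                it.remove();
            }
        }

        // clear() drops the whole red-black tree in one shot (root = null
        // underneath), so a full drain is effectively constant time.
        static void drainViaClear(TreeSet<Long> invalidateBlocks) {
            invalidateBlocks.clear();
        }
    }

As noted above, a partial drain bounded by max blocks cannot use clear() and has to fall back to the iterator pattern either way.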



[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4483:
------------------------------

    Fix Version/s: 0.18.2
           Status: Patch Available  (was: Open)

This patch fixes the getBlockArray method in DatanodeDescriptor so that the returned Block array is constrained to the maxBlocks value passed in.
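The attached patch is authoritative; as a rough illustration of the approach only (bare long IDs stand in for Block objects, and the class and field names are made up for the sketch), a bounded getBlockArray looks roughly like this:

    import java.util.Iterator;
    import java.util.TreeSet;

    // Illustrative stand-in for the patched DatanodeDescriptor logic; the
    // real code operates on Block objects inside the NameNode.
    class InvalidateQueueSketch {
        private final TreeSet<Long> invalidateBlocks = new TreeSet<Long>();

        synchronized void add(long blockId) {
            invalidateBlocks.add(blockId);
        }

        // Hand back at most maxBlocks entries and remove exactly those
        // entries, so each block is sent for invalidation at most once.
        synchronized long[] getBlockArray(int maxBlocks) {
            int n = Math.min(maxBlocks, invalidateBlocks.size());
            long[] result = new long[n];
            Iterator<Long> it = invalidateBlocks.iterator();
            for (int i = 0; i < n && it.hasNext(); i++) {
                result[i] = it.next();
                it.remove();
            }
            return result;
        }
    }

The key point is that the array is sized up front to min(maxBlocks, size) and filled through the iterator, so toArray() never gets a chance to hand back a larger array behind the caller's back; whether a clear() fast path is worth adding for the full-drain case is the performance question discussed elsewhere in this thread.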



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641916#action_12641916 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4483:
------------------------------------------------

+1 HADOOP-4483-v2.patch looks good to me.

> Are you folks saying that the approach adopted by this patch is not sufficient and it needs more changes to make it efficient?

The current fix is good enough for this issue. If there is anything we could do for better performance, we could do it in a separate issue.



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641920#action_12641920 ] 

Hairong Kuang commented on HADOOP-4483:
---------------------------------------

> If there is anything we could do for better performance, we could do it in a separate issue.
I agree. Even Nicholas's suggestion is not necessary for this issue, because it makes the good case even better and the worst case even worse. Besides, the new patch covers the cases (n < available) and (n == available); if there is a programmatic error that causes (n > available), invalidateBlocks may not get cleared.



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641630#action_12641630 ] 

Ahad Rana commented on HADOOP-4483:
-----------------------------------

Re: Removing the elements from a collection one by one could be expensive. Also, we have max >= n in most of the cases. How about using the existing code (i.e. blocks.clear() instead of e.remove()) when max >= n?

I believe that since the underlying container is a tree, even the Collection's internal code needs to use the iterator approach to remove items from the data structure. Plus, in the standard configuration, where the heartbeat interval is set to 3 seconds, I believe max blocks <= 100. Better to stick to one code path in this instance.

Re: Could you also remove the white-space changes, tabs and the trailing spaces? This will keep the code in the same style.

Sorry, one of my editors must be set to tabs instead of spaces, or vice versa. Am I correct in assuming that the convention for the Hadoop codebase is spaces (instead of tabs)?




[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-4483:
----------------------------------

    Attachment: invalidateBlocksCopy-br18.patch

Here is the patch for the 0.18 branch.



[jira] Updated: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Ahad Rana (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahad Rana updated HADOOP-4483:
------------------------------

    Comment: was deleted



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643605#action_12643605 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-4483:
------------------------------------------------

+1 patch looks good.



[jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641719#action_12641719 ] 

dhruba borthakur commented on HADOOP-4483:
------------------------------------------

@Nicholas/Hairong: Are you folks saying that the approach adopted by this patch is not sufficient and it needs more changes to make it efficient?
