Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/04/25 23:24:15 UTC

[jira] Created: (HADOOP-1297) datanode sending block reports to namenode once every second

datanode sending block reports to namenode once every second
------------------------------------------------------------

                 Key: HADOOP-1297
                 URL: https://issues.apache.org/jira/browse/HADOOP-1297
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur
         Assigned To: dhruba borthakur


The namenode requests that a block be deleted. The datanode attempts the operation and encounters an error because the block is not in the blockMap. The processCommand() method raises an exception, and the variable lastBlockReport is not set when that happens. This means that the datanode immediately sends another block report to the namenode, which eats up quite a bit of CPU on the namenode.

In short, the above condition causes the datanode to send blockReports almost once every second!

I propose that we do the following:

1. In Datanode.offerService, replace the following piece of code (a fuller sketch of the surrounding loop follows item 2 below):

DatanodeCommand cmd = namenode.blockReport(dnRegistration,
                                           data.getBlockReport());
processCommand(cmd);
lastBlockReport = now;

with:

DatanodeCommand cmd = namenode.blockReport(dnRegistration,
                                           data.getBlockReport());
lastBlockReport = now;
processCommand(cmd);

2. In FSDataSet.invalidate (a sketch follows below):
a) continue to process all blocks in invalidBlks[] even if one in the middle encounters a problem;
b) if getFile() returns null, still invoke volumeMap.get() and print whether we found the block in the volumeMap or not. The volumeMap is used to generate the block report, so this might help in debugging.
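
To make the first change concrete, here is a minimal sketch of the offerService heartbeat loop; the loop structure and names such as shouldRun and blockReportInterval are assumptions for illustration, not the actual Hadoop source:

    long lastBlockReport = 0;
    while (shouldRun) {
      try {
        long now = System.currentTimeMillis();
        if (now - lastBlockReport > blockReportInterval) {
          DatanodeCommand cmd = namenode.blockReport(dnRegistration,
                                                     data.getBlockReport());
          // Record the report time BEFORE processing the command. If
          // processCommand() throws, lastBlockReport is already updated,
          // so the next iteration will not immediately re-send the report.
          lastBlockReport = now;
          processCommand(cmd);
        }
        // ... heartbeat and other periodic work elided ...
      } catch (Exception e) {
        // the exception is logged and the loop re-enters almost immediately;
        // with the old ordering this re-sent a block report every iteration
      }
    }

And a sketch of the second change; the method shape below is an assumption based on the description (getFile, volumeMap, and LOG are the names used above, but the exact signatures may differ):

    public void invalidate(Block[] invalidBlks) throws IOException {
      boolean error = false;
      for (int i = 0; i < invalidBlks.length; i++) {
        File f = getFile(invalidBlks[i]);
        if (f == null) {
          // (b) no file for this block: also consult volumeMap, which is
          // what the block report is generated from, to help debugging
          boolean inVolumeMap = (volumeMap.get(invalidBlks[i]) != null);
          LOG.warn("Could not delete block " + invalidBlks[i]
                   + "; found in volumeMap: " + inVolumeMap);
          error = true;
          continue;            // (a) keep processing the remaining blocks
        }
        f.delete();
      }
      if (error) {
        throw new IOException("Error in deleting one or more blocks.");
      }
    }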

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491819 ] 

Hadoop QA commented on HADOOP-1297:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12356274/datanodeDeleteBlocks2.patch applied and successfully tested against trunk revision r532083.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/82/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/82/console



[jira] Updated: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1297:
-------------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492253 ] 

Hadoop QA commented on HADOOP-1297:
-----------------------------------

Integrated in Hadoop-Nightly #71 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/71/)



[jira] Updated: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1297:
-------------------------------------

    Attachment: datanodeDeleteBlocks-0.12.3.patch

Patch for the 0.12.3 release.



[jira] Updated: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1297:
-------------------------------------

    Attachment:     (was: datanodeDeleteBlocks.patch)



[jira] Updated: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1297:
-------------------------------------

    Attachment: datanodeDeleteBlocks.patch



[jira] Updated: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1297:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.13.0
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Dhruba!



[jira] Updated: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "Marco Nicosia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Nicosia updated HADOOP-1297:
----------------------------------


Please include this in 0.12.4



[jira] Commented: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491797 ] 

Raghu Angadi commented on HADOOP-1297:
--------------------------------------

+1.

Couple of minor things:
 # {{invalidBlks[i] + "file "}} needs a space before 'file'
 # You might want to include a comment saying that the extra checks like {{f.exists()}} and the exceptions are temporary.
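
A hedged illustration of the first point (the enclosing log statement is assumed for illustration, not quoted from the patch):

    // without the space the message runs the values together, e.g. "blk_1234file ..."
    LOG.info("Deleting block " + invalidBlks[i] + "file " + f);
    // with the space:
    LOG.info("Deleting block " + invalidBlks[i] + " file " + f);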




[jira] Updated: (HADOOP-1297) datanode sending block reports to namenode once every second

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1297:
-------------------------------------

    Attachment: datanodeDeleteBlocks2.patch

Inserted a comment saying that the f.exists() check that follows the f.delete() is "temporary" and can be removed after HADOOP-1220 is resolved.
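
For reference, a sketch of the delete path this comment describes (reconstructed for illustration; the authoritative code is in the attached patch):

    if (!f.delete()) {
      throw new IOException("Unexpected error trying to delete block "
                            + invalidBlks[i] + " at file " + f);
    }
    // Temporary sanity check; can be removed once HADOOP-1220 is resolved.
    if (f.exists()) {
      throw new IOException("Block file " + f + " still exists after delete.");
    }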
