Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/04/30 20:41:55 UTC

[jira] Created: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

DFS write pipeline : only the last datanode needs to verify checksum
--------------------------------------------------------------------

                 Key: HADOOP-3328
                 URL: https://issues.apache.org/jira/browse/HADOOP-3328
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.16.0
            Reporter: Raghu Angadi



Currently, all the datanodes in the DFS write pipeline verify the checksum. Since the current protocol includes acks from the datanodes, an ack from the last node can also serve as verification that the checksum is OK; in that sense, only the last datanode needs to verify the checksum. Based on [this comment|http://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553] from HADOOP-1702, CPU consumption might go down by another 25-30% (4/14) after HADOOP-1702.
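A minimal sketch of the idea (class and method names here are hypothetical, not the actual org.apache.hadoop.dfs.DataNode code): a pipeline stage recomputes the CRC only when it has no downstream targets, i.e. when it is the last datanode.

```java
import java.util.zip.CRC32;

// Illustrative sketch of the proposed behavior, not the real Hadoop code.
public class PipelineStage {
    private final int numDownstreamNodes; // 0 => last datanode in the pipeline

    public PipelineStage(int numDownstreamNodes) {
        this.numDownstreamNodes = numDownstreamNodes;
    }

    /** Only the last datanode needs to verify the checksum itself. */
    public boolean mustVerifyChecksum() {
        return numDownstreamNodes == 0;
    }

    /** Returns true if the chunk is acceptable at this stage. */
    public boolean receiveChunk(byte[] data, long expectedCrc) {
        if (mustVerifyChecksum()) {
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            return crc.getValue() == expectedCrc; // verified only at the tail
        }
        return true; // intermediate nodes just store and forward the bytes
    }
}
```

Intermediate nodes still write the data and checksums to disk; they simply stop recomputing the CRC, relying on the downstream ack instead.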

This would also make it easier to use transferTo() and transferFrom() on intermediate datanodes, since they no longer need to look at the data.
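To illustrate that point (the helper name relayToDisk is made up, and the modern java.nio.file API is used for brevity): once an intermediate datanode no longer inspects the bytes, it can move them from the incoming channel to the block file with FileChannel.transferFrom, avoiding user-space copies.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyRelay {
    /**
     * Spill `len` bytes from the incoming channel straight to the block file.
     * Hypothetical helper, not Hadoop API: it shows why skipping checksum
     * verification makes transferFrom() usable on intermediate datanodes.
     */
    public static void relayToDisk(ReadableByteChannel in, Path blockFile, long len)
            throws IOException {
        try (FileChannel out = FileChannel.open(blockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long written = 0;
            while (written < len) {
                long n = out.transferFrom(in, written, len - written);
                if (n <= 0) {
                    throw new IOException("stream ended early");
                }
                written += n;
            }
        }
    }
}
```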

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi reassigned HADOOP-3328:
------------------------------------

    Assignee: Raghu Angadi



[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-3328:
------------------------------------

    Release Note:   (was: When client is writing data to DFS, only the last datanode in the pipeline needs to verify the checksum. Saves around 30% CPU on intermediate datanodes. )



[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599205#action_12599205 ] 

lohit vijayarenu commented on HADOOP-3328:
------------------------------------------

The idea of verifying the checksum only at the last datanode is very good. +1



[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3328:
---------------------------------

      Resolution: Fixed
    Release Note: When a client writes data to DFS, only the last datanode in the pipeline needs to verify the checksum. This saves around 30% CPU on intermediate datanodes.
          Status: Resolved  (was: Patch Available)

I just committed this.



[jira] Issue Comment Edited: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611792#action_12611792 ] 

rangadi edited comment on HADOOP-3328 at 7/8/08 1:56 PM:
--------------------------------------------------------------

CPU measurements for writing a 4 GB file (filled with zeros) to a 3-datanode cluster with replication 3 show a 20% combined CPU improvement on the datanodes. Since the last datanode's work in the write pipeline does not change, this amounts to a 30% CPU reduction on the intermediate datanodes. The results are the average of 3 runs. All three datanodes run on the same physical node, and the input for the 4 GB file is read from /dev/zero.

||CPU||User||Kernel||Total||% improvement||
|Trunk|17777|16971|34749|0%|
|Trunk + patch|10462|17314|27776|20%|

20% is a little less than the original estimate above, but is within the range.  

      was (Author: rangadi):
    
CPU measurements for writing a 4Gb file (filled with zeros) to a 3 datanode cluster with a replication of 3 shows 20% CPU combined improvement on Datanodes. Since the last datanode's work in the write pipeline does not change, this would be 30% CPU reduction on intermediate datanodes. The results are average of 3 runs. All the three datanodes are running on the same physical node and input for 4Gb file is read from /dev/zero.

||CPU |  User | Kernel | Total | % improvement ||
|| Trunk | 17777 | 169971 | 34749 | 0%
|| Trunk + patch | 10462 | 17314 |  27776 || 20% |

20% is a little less than the original estimate above, but is within the range.  
  


[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3328:
---------------------------------

    Attachment: HADOOP-3328.patch

Suggested patch. I will provide some CPU numbers for comparison.



[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lohit Vijayarenu updated HADOOP-3328:
-------------------------------------


    Hadoop Flags: [Reviewed]

+1 Looks good. 



[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611921#action_12611921 ] 

Hadoop QA commented on HADOOP-3328:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12385436/HADOOP-3328.patch
  against trunk revision 675078.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2826/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2826/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2826/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2826/console

This message is automatically generated.



[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3328:
---------------------------------

    Fix Version/s: 0.19.0
           Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599486#action_12599486 ] 

Raghu Angadi commented on HADOOP-3328:
--------------------------------------

Verified that this patch does not change any checksum guarantees provided by the current protocol; i.e., if corruption is detected at the last node, the write will not be considered complete at any of the datanodes. This policy needs to be re-checked if the protocol changes.
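That guarantee hinges on how acks propagate upstream. A hedged sketch of the idea (the enum and methods below are illustrative, not the real DataNode ack protocol types): each node reports success only if its own processing and the downstream ack both succeeded, so a checksum failure at the last node fails the whole pipeline.

```java
// Illustrative model of upstream ack propagation; not actual Hadoop code.
public class AckPropagation {
    public enum Status { SUCCESS, ERROR }

    /** An intermediate node's ack: its own result AND the downstream ack. */
    public static Status aggregate(Status local, Status downstream) {
        return (local == Status.SUCCESS && downstream == Status.SUCCESS)
                ? Status.SUCCESS : Status.ERROR;
    }

    /** Fold the tail node's verdict through every upstream node. */
    public static Status pipelineAck(Status tailVerdict, int intermediateNodes) {
        Status ack = tailVerdict;
        for (int i = 0; i < intermediateNodes; i++) {
            ack = aggregate(Status.SUCCESS, ack); // local write succeeded
        }
        return ack;
    }
}
```

Because the tail's ERROR survives aggregation at every intermediate node, the client never sees the write acknowledged as complete when the last node detects corruption.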



[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624729#action_12624729 ] 

Hudson commented on HADOOP-3328:
--------------------------------

Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])



[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611792#action_12611792 ] 

Raghu Angadi commented on HADOOP-3328:
--------------------------------------


CPU measurements for writing a 4 GB file (filled with zeros) to a 3-datanode cluster with replication 3 show a 20% combined CPU improvement on the datanodes. Since the last datanode's work in the write pipeline does not change, this amounts to a 30% CPU reduction on the intermediate datanodes. The results are the average of 3 runs. All three datanodes run on the same physical node, and the input for the 4 GB file is read from /dev/zero.

||CPU||User||Kernel||Total||% improvement||
|Trunk|17777|16971|34749|0%|
|Trunk + patch|10462|17314|27776|20%|

20% is a little less than the original estimate above, but is within the range.  
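The two figures are consistent with each other: with replication 3, only the two upstream datanodes skip verification, so a ~30% saving on each intermediate node shows up as ~20% of the combined total. A quick back-of-the-envelope check of the table's numbers (hypothetical script, not part of the patch):

```java
public class SavingsCheck {
    public static void main(String[] args) {
        double trunkTotal = 34749;    // combined CPU from the table (Trunk)
        double patchedTotal = 27776;  // combined CPU with the patch
        double combined = (trunkTotal - patchedTotal) / trunkTotal;
        // 2 of the 3 datanodes are intermediate, so scale up by 3/2:
        double perIntermediate = combined * 3.0 / 2.0;
        System.out.printf("combined: %.1f%%, per intermediate node: %.1f%%%n",
                100 * combined, 100 * perIntermediate);
        // prints: combined: 20.1%, per intermediate node: 30.1%
    }
}
```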



[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3328:
---------------------------------

    Attachment: HADOOP-3328.patch

Updated patch for trunk.
