Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/04/30 20:41:55 UTC
[jira] Created: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
DFS write pipeline : only the last datanode needs to verify checksum
--------------------------------------------------------------------
Key: HADOOP-3328
URL: https://issues.apache.org/jira/browse/HADOOP-3328
Project: Hadoop Core
Issue Type: Improvement
Components: dfs
Affects Versions: 0.16.0
Reporter: Raghu Angadi
Currently all the datanodes in the DFS write pipeline verify the checksum. Since the current protocol includes acks from the datanodes, an ack from the last node can also serve as verification that the checksum is ok. In that sense, only the last datanode needs to verify the checksum. Based on [this comment|http://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553] from HADOOP-1702, CPU consumption might go down by another 25-30% (4/14) after HADOOP-1702.
This would also make it easier to use transferTo() and transferFrom() on intermediate datanodes, since they don't need to look at the data.
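A minimal sketch of the idea, with hypothetical names (this is not the actual DataNode/BlockReceiver code): a node's position in the pipeline determines whether it verifies the checksum, and intermediate nodes simply forward bytes without inspecting them.

```java
import java.util.zip.CRC32;

/** Sketch: only the last datanode in the write pipeline verifies the checksum. */
class PipelineReceiver {
    private final int numDownstreamTargets; // 0 means this is the last datanode

    PipelineReceiver(int numDownstreamTargets) {
        this.numDownstreamTargets = numDownstreamTargets;
    }

    /** Intermediate nodes skip verification; the last node checks. */
    boolean shouldVerifyChecksum() {
        return numDownstreamTargets == 0;
    }

    /** Returns true if the packet is acceptable at this node. */
    boolean receivePacket(byte[] data, long expectedCrc) {
        if (!shouldVerifyChecksum()) {
            // Forward without touching the data; this is what makes
            // transferTo()/transferFrom() usable on intermediate nodes.
            return true;
        }
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue() == expectedCrc;
    }
}
```

Because the protocol already carries acks back through the pipeline, a corrupt packet detected at the last node fails the write for every node upstream of it.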
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi reassigned HADOOP-3328:
------------------------------------
Assignee: Raghu Angadi
[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Chansler updated HADOOP-3328:
------------------------------------
Release Note: (was: When the client is writing data to DFS, only the last datanode in the pipeline needs to verify the checksum. Saves around 30% CPU on intermediate datanodes. )
[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "lohit vijayarenu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599205#action_12599205 ]
lohit vijayarenu commented on HADOOP-3328:
------------------------------------------
The idea of verifying the checksum only at the last datanode is very good. +1
[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-3328:
---------------------------------
Resolution: Fixed
Release Note: When the client is writing data to DFS, only the last datanode in the pipeline needs to verify the checksum. This saves around 30% CPU on intermediate datanodes.
Status: Resolved (was: Patch Available)
I just committed this.
[jira] Issue Comment Edited: (HADOOP-3328) DFS write pipeline :
only the last datanode needs to verify checksum
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611792#action_12611792 ]
rangadi edited comment on HADOOP-3328 at 7/8/08 1:56 PM:
--------------------------------------------------------------
CPU measurements for writing a 4 GB file (filled with zeros) to a 3-datanode cluster with a replication of 3 show a 20% combined CPU improvement on the datanodes. Since the last datanode's work in the write pipeline does not change, this amounts to a 30% CPU reduction on the intermediate datanodes. The results are the average of 3 runs. All three datanodes run on the same physical machine, and the input for the 4 GB file is read from /dev/zero.
||CPU ||User ||Kernel ||Total ||% improvement ||
|Trunk |17777 |16971 |34749 |0% |
|Trunk + patch |10462 |17314 |27776 |20% |
20% is a little less than the original estimate above, but is within the range.
was (Author: rangadi):
CPU measurements for writing a 4Gb file (filled with zeros) to a 3 datanode cluster with a replication of 3 shows 20% CPU combined improvement on Datanodes. Since the last datanode's work in the write pipeline does not change, this would be 30% CPU reduction on intermediate datanodes. The results are average of 3 runs. All the three datanodes are running on the same physical node and input for 4Gb file is read from /dev/zero.
||CPU | User | Kernel | Total | % improvement ||
|| Trunk | 17777 | 169971 | 34749 | 0%
|| Trunk + patch | 10462 | 17314 | 27776 || 20% |
20% is a little less than the original estimate above, but is within the range.
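As a sanity check on the numbers above, the arithmetic relating the 20% combined saving to the ~30% per-intermediate-node saving can be worked out directly (assuming the three nodes cost about the same on trunk and that all of the saving comes from the two intermediate nodes, since the last node still verifies):

```java
public class ChecksumSavings {
    public static void main(String[] args) {
        double trunkTotal = 34749;   // combined CPU across all 3 datanodes, trunk
        double patchedTotal = 27776; // combined CPU with the patch

        // Combined saving over the whole pipeline.
        double combinedSaving = (trunkTotal - patchedTotal) / trunkTotal;
        System.out.printf("combined saving: %.1f%%%n", combinedSaving * 100);

        // The last node's cost is unchanged, so the saving is split
        // across the 2 intermediate nodes.
        double perNodeTrunk = trunkTotal / 3;
        double perIntermediateSaving =
            (trunkTotal - patchedTotal) / 2 / perNodeTrunk;
        System.out.printf("per intermediate node: %.1f%%%n",
                          perIntermediateSaving * 100);
    }
}
```

With the measured totals this gives roughly 20% combined and 30% per intermediate node, consistent with the comment above.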
[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-3328:
---------------------------------
Attachment: HADOOP-3328.patch
Suggested patch. I will provide some CPU numbers for comparison.
[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Lohit Vijayarenu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lohit Vijayarenu updated HADOOP-3328:
-------------------------------------
Description:
Currently all the datanodes in the DFS write pipeline verify the checksum. Since the current protocol includes acks from the datanodes, an ack from the last node can also serve as verification that the checksum is ok. In that sense, only the last datanode needs to verify the checksum. Based on [this comment|http://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553] from HADOOP-1702, CPU consumption might go down by another 25-30% (4/14) after HADOOP-1702.
This would also make it easier to use transferTo() and transferFrom() on intermediate datanodes, since they don't need to look at the data.
was:
Currently all the datanodes in DFS write pipeline verify checksum. Since the current protocol includes acks from the datanodes, an ack from the last node could also serve as verification that checksum ok. In that sense, only the last datanode needs to verify checksum. Based on [this comment|http://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553] from HADOOP-1702, CPU consumption might go down by another 25-30% (4/14) after HADOOP-1702.
Also this would make it easier to use transferTo() and transferFrom() on intermediate datanodes since they don't need to look at the data.
Hadoop Flags: [Reviewed]
+1 Looks good.
[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611921#action_12611921 ]
Hadoop QA commented on HADOOP-3328:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12385436/HADOOP-3328.patch
against trunk revision 675078.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2826/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2826/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2826/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2826/console
This message is automatically generated.
[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-3328:
---------------------------------
Fix Version/s: 0.19.0
Status: Patch Available (was: Open)
[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599486#action_12599486 ]
Raghu Angadi commented on HADOOP-3328:
--------------------------------------
Verified that this patch does not change any checksum guarantees provided by the current protocol; i.e., if corruption is detected at the last node, the write will not be considered complete at any of the datanodes. This policy needs to be checked again if the protocol changes.
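The guarantee above follows from how acks propagate: each node acks success upstream only after its own processing and the downstream ack both succeed, so a checksum failure at the last node fails the whole pipeline. A rough sketch with hypothetical types (the real logic lives in the datanode's ack handling, simplified here):

```java
import java.util.List;

/** Sketch: a success ack requires a successful downstream ack. */
class AckPipeline {
    /**
     * nodesOk.get(i) is whether node i's local processing succeeded;
     * only the last node actually verifies the checksum.
     * Returns the ack the client finally sees.
     */
    static boolean clientAck(List<Boolean> nodesOk) {
        // The ack travels from the last node back toward the client.
        boolean downstreamOk = true;
        for (int i = nodesOk.size() - 1; i >= 0; i--) {
            // Node i reports success only if it succeeded locally
            // AND everything downstream of it succeeded.
            downstreamOk = nodesOk.get(i) && downstreamOk;
        }
        return downstreamOk;
    }
}
```

In particular, a failed verification at the last node (the only one that checks the checksum) yields a failed ack at every upstream node, so no node treats the write as complete.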
[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12624729#action_12624729 ]
Hudson commented on HADOOP-3328:
--------------------------------
Integrated in Hadoop-trunk #581 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/])
[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611792#action_12611792 ]
Raghu Angadi commented on HADOOP-3328:
--------------------------------------
CPU measurements for writing a 4 GB file (filled with zeros) to a 3-datanode cluster with a replication of 3 show a 20% combined CPU improvement on the datanodes. Since the last datanode's work in the write pipeline does not change, this amounts to a 30% CPU reduction on the intermediate datanodes. The results are the average of 3 runs. All three datanodes run on the same physical machine, and the input for the 4 GB file is read from /dev/zero.
||CPU ||User ||Kernel ||Total ||% improvement ||
|Trunk |17777 |16971 |34749 |0% |
|Trunk + patch |10462 |17314 |27776 |20% |
20% is a little less than the original estimate above, but is within the range.
[jira] Updated: (HADOOP-3328) DFS write pipeline : only the last
datanode needs to verify checksum
Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated HADOOP-3328:
---------------------------------
Attachment: HADOOP-3328.patch
Updated patch for trunk.