Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/07/08 22:55:31 UTC

[jira] Commented: (HADOOP-3328) DFS write pipeline : only the last datanode needs to verify checksum

    [ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611792#action_12611792 ] 

Raghu Angadi commented on HADOOP-3328:
--------------------------------------


CPU measurements for writing a 4 GB file (filled with zeros) to a 3-datanode cluster with a replication of 3 show a 20% combined CPU improvement on the datanodes. Since the last datanode's work in the write pipeline does not change, this corresponds to roughly a 30% CPU reduction on each intermediate datanode. The results are the average of 3 runs. All three datanodes run on the same physical node, and the input for the 4 GB file is read from /dev/zero.

|| CPU || User || Kernel || Total || % improvement ||
| Trunk | 17777 | 16971 | 34749 | 0% |
| Trunk + patch | 10462 | 17314 | 27776 | 20% |

20% is a little less than the original estimate above, but within the expected range.
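As a quick sanity check, the 20% and 30% figures follow from the totals in the table. The even three-way split of trunk CPU across the datanodes, and attributing all savings to the two intermediate nodes, are assumptions made here for the arithmetic, not part of the measurements:

```java
// Sanity-check the improvement figures from the CPU table above.
// Assumption (not from the measurements): trunk CPU is split evenly
// across the 3 datanodes, and all savings land on the 2 intermediate ones.
public class CpuFigures {
    public static void main(String[] args) {
        double trunkTotal = 34749;        // user + kernel, all 3 datanodes
        double patchedTotal = 27776;
        double saved = trunkTotal - patchedTotal;

        double combined = 100.0 * saved / trunkTotal;                 // combined improvement
        double perNodeTrunk = trunkTotal / 3;                         // even-split assumption
        double perIntermediate = 100.0 * (saved / 2) / perNodeTrunk;  // per intermediate node

        // prints: combined: 20.1%, per intermediate: 30.1%
        System.out.printf("combined: %.1f%%, per intermediate: %.1f%%%n",
                          combined, perIntermediate);
    }
}
```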

> DFS write pipeline : only the last datanode needs to verify checksum
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3328
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3328
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-3328.patch, HADOOP-3328.patch
>
>
> Currently all the datanodes in the DFS write pipeline verify the checksum. Since the current protocol includes acks from the datanodes, an ack from the last node could also serve as verification that the checksum is ok. In that sense, only the last datanode needs to verify the checksum. Based on [this comment|http://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553] from HADOOP-1702, CPU consumption might go down by another 25-30% (4/14) after HADOOP-1702. 
> Also this would make it easier to use transferTo() and transferFrom() on intermediate datanodes since they don't need to look at the data.
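The idea in the description can be sketched as below. This is an illustrative toy, not the actual DataNode code; the class and method names are hypothetical. Only a node that knows it is last in the pipeline recomputes the CRC; intermediate nodes forward the bytes untouched and rely on the downstream ack to surface corruption:

```java
import java.util.zip.CRC32;

// Hypothetical sketch of "verify checksum only at the last datanode".
// Not the real DataNode implementation; names are made up for illustration.
public class PipelineNode {
    private final boolean isLastInPipeline;

    public PipelineNode(boolean isLastInPipeline) {
        this.isLastInPipeline = isLastInPipeline;
    }

    /**
     * Returns true if the packet is acceptable at this node.
     * Intermediate nodes skip verification entirely; a nack from the
     * last node propagates back through the acks, so the client still
     * learns about corruption anywhere in the pipeline.
     */
    public boolean receivePacket(byte[] data, long expectedChecksum) {
        if (isLastInPipeline) {
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            if (crc.getValue() != expectedChecksum) {
                return false; // corruption detected; nack goes upstream
            }
        }
        return true;
    }
}
```

Because intermediate nodes never need to look at the payload, they are free to move it with zero-copy calls such as FileChannel.transferTo()/transferFrom(), which is the second benefit mentioned in the description.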

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.