Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2008/07/31 00:35:32 UTC

[jira] Issue Comment Edited: (HADOOP-3860) Compare name-node performance when journaling is performed into local hard-drives or nfs.

    [ https://issues.apache.org/jira/browse/HADOOP-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618038#action_12618038 ] 

shv edited comment on HADOOP-3860 at 7/30/08 3:35 PM:
----------------------------------------------------------------------

I benchmarked three operations: _create_, _rename_, and _delete_ using {{NNThroughputBenchmark}}, which is a pure name-node benchmark. It calls the name-node methods directly without going through the RPC protocol, so the *RPC overhead is not included* in these results and should be measured separately, say with a synthetic load generator.
In a sense these benchmarks determine an upper bound for the HDFS operations, namely the maximum throughput the name-node can sustain under heavy load.

Each run starts with an empty file system and performs 1 million operations handled by 256 threads on the name-node. The output is the throughput, that is, the number of operations per second, calculated as 1,000,000/(tE-tB), where tB is when the first thread starts and tE is when all threads have stopped. The threads run in parallel.
Creates create empty files and do not close them. Renames change file names but do not move the files.
All test results are consistent except for one distortion in deletes on a remote NFS drive, which is far outside the expected range. I do not know what causes it; on some days the numbers were good, on others they were not.
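
The measurement scheme is simple enough to sketch. Below is a minimal, hypothetical harness illustrating the setup described above; it is *not* the actual {{NNThroughputBenchmark}} code, and {{nameNodeOp()}} is a placeholder for one direct name-node call. The 1,000,000 operations are split among 256 threads, and the throughput is the total number of operations divided by the wall-clock time from the first start to the last stop.
{code:java}
import java.util.concurrent.CountDownLatch;

/** Hypothetical sketch of the throughput measurement described above;
 *  not the actual NNThroughputBenchmark code. */
public class ThroughputSketch {
  static final int TOTAL_OPS = 1000000;
  static final int THREADS = 256;

  /** Placeholder for one direct name-node call (create, rename or delete). */
  static void nameNodeOp() { }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch done = new CountDownLatch(THREADS);
    long tB = System.currentTimeMillis();            // just before the first thread starts
    for (int i = 0; i < THREADS; i++) {
      new Thread(() -> {
        // integer division: roughly TOTAL_OPS operations in total
        for (int op = 0; op < TOTAL_OPS / THREADS; op++) {
          nameNodeOp();
        }
        done.countDown();
      }).start();
    }
    done.await();                                    // all threads have stopped
    long tE = System.currentTimeMillis();
    // throughput = ops / (tE - tB); guard against a zero interval for the no-op placeholder
    double opsPerSec = TOTAL_OPS * 1000.0 / Math.max(1, tE - tB);
    System.out.printf("throughput = %.0f ops/sec%n", opsPerSec);
  }
}
{code}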

Each test consists of 1,000,000 operations performed using 256 threads.
Results are in *ops/sec*.
||Log to	||open	||create (no close)	||rename	||delete||
|none		| 126,119| | | |
|1 Local HD	| |5,710	|8,400	|20,690|
|1 NFS HD	| |5,600	|8,290	|12,090|
|1 NFS Filer	| |5,676	|8,134	|21,100|
|4 Local HD	| |5,210| | |
|3 Local HD, 1 NFS HD	| |5,150| | |

Some conclusions:
-	A local drive is faster than NFS, and
-	an NFS filer is faster than a remote NFS-mounted drive;
-	but *the difference between NFS storage and local drives is very slim, only 2-3%*.
-	*Using 4 local drives instead of 1 degrades the performance by only 9%*, even though we write onto the drives sequentially (one after another).
_It would be fair to say that there is some parallelism in writing, since the current code batches the writes first and then syncs them all at once in large chunks. So while the writes are sequential, the syncs are parallel; see the sketch after this list._
-	Opens ({{getBlockLocations()}}) are 22 times faster than creates,
-	which means *journaling is the real bottleneck* for name-node operations,
-	and the *lack of fine-grained locking in the namespace data-structures is not a problem* so far. Otherwise, the throughputs for opens and for the other operations would be the same or at least close.
-	Further optimization of the name-node performance should, in my opinion, be focused on *efficient journaling*.
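
To make the batching point above concrete, here is a minimal, hypothetical sketch of the batch-then-sync (group commit) pattern referred to in the italicized note; the class and method names are illustrative only, and this is *not* the actual edit-log code.
{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical batch-then-sync journal, illustrating the group-commit idea
 *  mentioned above; not the actual edit-log implementation. */
class BatchedJournalSketch {
  private final OutputStream[] journals;        // one stream per journal directory
  private final StringBuilder buffer = new StringBuilder();

  BatchedJournalSketch(OutputStream[] journals) { this.journals = journals; }

  /** Handler threads append records to an in-memory batch (cheap, serialized). */
  synchronized void logEdit(String record) {
    buffer.append(record).append('\n');
  }

  /** Flushes the accumulated batch to every journal directory in turn.
   *  Only the snapshot of the batch is taken under the lock, so other threads
   *  can keep appending new records while the I/O is in progress. */
  void logSync() throws IOException {
    byte[] batch;
    synchronized (this) {
      batch = buffer.toString().getBytes(StandardCharsets.UTF_8);
      buffer.setLength(0);                      // start a new batch
    }
    for (OutputStream out : journals) {         // the drives are written one after another
      out.write(batch);
      out.flush();                              // a real journal would also fsync here
    }
  }
}
{code}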

Here is another set of statistics, characterizing the actual load on the name-node on some of our clusters. Unfortunately, the statistics for opens are broken, and we do not collect stats for renames, so I can only present creates and deletes. Please contribute if you have more data.

||Actual load (ops/sec)||open	||create	||delete||
|peak	| |144	|6,460|
|average	| |11	|50|

-	These numbers show that the actual peak load for creates is about 40 times lower than what the name-node can handle, and about 3 times lower for deletes. On average the gap is even more drastic:
*the name-node processing capability is 400-500 times higher than the actual average load on it* (the arithmetic is spelled out below).
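
The ratios quoted above follow directly from the two tables. A few lines of arithmetic reproduce them (throughput numbers from the benchmark table, load numbers from the table just above; the class name is just for illustration):
{code:java}
public class LoadHeadroom {
  public static void main(String[] args) {
    // benchmark throughput (ops/sec) divided by observed load (ops/sec)
    System.out.println("creates, peak:    " + 5710.0 / 144);   // ~40x headroom
    System.out.println("deletes, peak:    " + 20690.0 / 6460); // ~3x headroom
    System.out.println("creates, average: " + 5710.0 / 11);    // ~520x headroom
    System.out.println("deletes, average: " + 20690.0 / 50);   // ~414x headroom
  }
}
{code}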


> Compare name-node performance when journaling is performed into local hard-drives or nfs.
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3860
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3860
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.19.0
>
>         Attachments: NNThruputMoreOps.patch
>
>
> The goal of this issue is to measure how the name-node performance depends on where the edits log is written to.
> Three types of journal storage should be evaluated:
> # local hard drive;
> # remote drive mounted via nfs;
> # nfs filer.
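
For context on the three storage types listed above: the name-node journal destinations are simply the configured name directories, given as a comma-separated list that may mix local and NFS-mounted paths (by default the edits log goes to the same directories as the image). The fragment below is a hypothetical illustration only: the paths are made up, and the property name, assumed here to be the pre-2.0 {{dfs.name.dir}}, should be checked against the release in use.
{code:java}
import org.apache.hadoop.conf.Configuration;

/** Hypothetical illustration only: the paths are made up. */
public class JournalDirsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Comma-separated list of name-node storage directories: one local drive
    // plus one NFS mount, mirroring the set-ups compared in the tables above.
    conf.set("dfs.name.dir", "/disk1/hadoop/name,/mnt/nfs/hadoop/name");
    System.out.println(conf.get("dfs.name.dir"));
  }
}
{code}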
