Posted to common-user@hadoop.apache.org by Martin Mituzas <xi...@hotmail.com> on 2009/06/01 14:55:57 UTC

Question on HDFS write performance

I need to identify the bottleneck of my current cluster when running
IO-bound benchmarks.
I ran a test with 4 nodes: 1 node as job tracker and namenode, 3 nodes as
task trackers and datanodes.
I ran RandomWriter to generate 30G of data with 15 mappers, and then ran Sort
on the generated data with 15 reducers. Replication is 3.
I added logging code to HDFS and then analyzed the generated logs for
the RandomWriter period and the Sort period.
The results are as follows.
I measured the following values:
1) average block preparation time: the time DFSClient spent generating all
packets for a block.
2) average block writing time: from the time DFSClient gets an allocated
block from the namenode in nextBlockOutputStream() until all acks are received.
3) average network receiving time and average disk writing time for a block
on the first, second, and third datanode in the pipeline.
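For context, here is a minimal standalone sketch of the kind of wall-clock timing these measurements involve. The class and method names are hypothetical stand-ins; the actual patch (attached in a later message) instruments DFSClient and DataNode internals rather than anything like this:

```java
// Standalone sketch of wall-clock timing around a "block write".
// prepareBlock/writeBlockAndAwaitAcks are hypothetical stand-ins for
// the two phases measured above; the real patch logs inside HDFS.
public class BlockTimingSketch {
    // Stand-in for "generate all packets for one block".
    static void prepareBlock() throws InterruptedException { Thread.sleep(20); }

    // Stand-in for "write the block and wait for all pipeline acks".
    static void writeBlockAndAwaitAcks() throws InterruptedException { Thread.sleep(30); }

    public static void main(String[] args) throws InterruptedException {
        long t0 = System.currentTimeMillis();
        prepareBlock();
        long t1 = System.currentTimeMillis();
        writeBlockAndAwaitAcks();
        long t2 = System.currentTimeMillis();
        System.out.println("block preparation time (ms): " + (t1 - t0));
        System.out.println("block writing time (ms):     " + (t2 - t1));
    }
}
```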

RandomWriter
Total 528 blocks,  total size is 34063336366, full blocks(64M): 506
Average block preparation time by client is: 11456.21
Average writing time for one block(64M): 11931.49
Average time on No.0  target datanode:
  average network receiving time :112.44
  average disk writing time      :3035.04
Average time on No.1  target datanode:
  average network receiving time :3337.68
  average disk writing time      :2950.74
Average time on No.2  target datanode:
  average network receiving time :3171.18
  average disk writing time      :2646.38
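Assuming the logged times are milliseconds (the units are not stated in the logs, but the magnitudes for a 64M block make this the natural reading), the RandomWriter averages translate into per-stream throughput roughly as follows:

```java
// Convert the RandomWriter averages above into approximate per-stream
// throughput, assuming the logged times are milliseconds.
public class RandomWriterThroughput {
    public static void main(String[] args) {
        double blockMiB = 64.0;      // full block size
        double writeMs  = 11931.49;  // avg block writing time (client view)
        double diskMs   = 3035.04;   // avg disk writing time on datanode 0
        System.out.println("client stream MiB/s: " + blockMiB / (writeMs / 1000.0));
        System.out.println("datanode disk MiB/s: " + blockMiB / (diskMs / 1000.0));
    }
}
```

Under that assumption, each client writes a block at roughly 5.4 MiB/s end to end, while the disk write itself runs near 21 MiB/s per stream.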
 

Sort
Total 494 blocks,  total size is 32318504139, full blocks(64M): 479
Average block preparation time by client is: 16237.59
Average writing time for one block(64M): 16642.67
Average time on No.0  target datanode:
  average network receiving time :164.28
  average disk writing time      :3331.50
Average time on No.1  target datanode:
  average network receiving time :2125.62
  average disk writing time      :3436.32
Average time on No.2  target datanode:
  average network receiving time :2856.56
  average disk writing time      :3426.04

And my question is: why is the network receiving time on the second and
third nodes so much larger than on the first node? Another question: how can
I identify the bottleneck? Or you could tell me what other kinds of values
should be collected.

Thanks in advance.
-- 
View this message in context: http://www.nabble.com/Question-on-HDFS-write-performance-tp23814528p23814528.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Question on HDFS write performance

Posted by Martin Mituzas <xi...@hotmail.com>.
Thanks for the response.
I attached the measurement patch.
The avg block writing time I get from the task log, from the point
WRITE_STATUS_NEW_BLOCK to WRITE_STATUS_BLOCK_FINISH. You can refer to the
code to understand the meaning.
I set the properties mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum both to 5.
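For reference, those two settings live in mapred-site.xml; a fragment matching the values described (the property names are the standard ones quoted above):

```xml
<!-- mapred-site.xml: at most 5 concurrent map tasks and 5 concurrent
     reduce tasks per tasktracker, as described above -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>5</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>5</value>
</property>
```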
The first datanode spends very little time receiving data; I think this is
because the sender and receiver are on the same machine (e.g. from
10.0.0.8 to 10.0.0.8), so the receiving time is much smaller compared with
the next two datanodes in the pipeline.
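A toy store-and-forward model illustrates the point: the client and datanode 0 are co-located, so the first hop is loopback, while packets reach datanodes 1 and 2 only after crossing real network links. The link delays below are made-up illustrative numbers, not measurements from this cluster:

```java
// Toy model of the 3-node HDFS write pipeline. Each packet reaches
// node i only after traversing hops 0..i, so downstream nodes
// accumulate the upstream delay in their observed receive times.
public class PipelineToy {
    static double[] arrivalTimes(double[] linkMs) {
        double[] arrivals = new double[linkMs.length];
        double t = 0;
        for (int i = 0; i < linkMs.length; i++) {
            t += linkMs[i];        // delay of the hop into node i
            arrivals[i] = t;
        }
        return arrivals;
    }

    public static void main(String[] args) {
        // hop 0: loopback (client and datanode 0 share a host),
        // hops 1 and 2: real network links (illustrative values)
        double[] arrivals = arrivalTimes(new double[] {0.1, 3.0, 3.0});
        for (int i = 0; i < arrivals.length; i++) {
            System.out.printf("node %d sees the packet after %.1f ms%n", i, arrivals[i]);
        }
    }
}
```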


Raghu Angadi wrote:
> 
> 
> Can you post the patch for these measurements? I can guess where these 
> are measured but better to see the actual changes.
> 
> For example, the third datanode does only two things: receiving and 
> writing data to the disk. So "avg block writing time" for you should be 
> around the sum of these two (~6-7k), but it is much larger (CRC verification 
> should not affect it much). Not sure why that is the case.
> 
> How many simultaneous maps are you running?
> 
> Looking at the first datanode stats, given that it spends a lot more 
> time writing to disk than receiving, you are mostly hard-disk bound.
> 
> Raghu.
> 
> Martin Mituzas wrote:
> > [original message quoted in full; snipped, see above]
> 
> 
> 
http://www.nabble.com/file/p23825481/measure.patch measure.patch 
-- 
View this message in context: http://www.nabble.com/Question-on-HDFS-write-performance-tp23814528p23825481.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Question on HDFS write performance

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Can you post the patch for these measurements? I can guess where these 
are measured but better to see the actual changes.

For example, the third datanode does only two things: receiving and 
writing data to the disk. So "avg block writing time" for you should be 
around the sum of these two (~6-7k), but it is much larger (CRC verification 
should not affect it much). Not sure why that is the case.

How many simultaneous maps are you running?

Looking at the first datanode stats, given that it spends a lot more 
time writing to disk than receiving, you are mostly hard-disk bound.
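To put rough numbers on "disk bound" (assuming the logged times are milliseconds and one data disk per node, both of which are assumptions): with replication 3 across only 3 datanodes, every block written anywhere lands on every disk, and with up to 5 maps per node all 15 mappers can be writing at once:

```java
// Rough disk-load estimate from the RandomWriter numbers, assuming
// the logged times are milliseconds and one data disk per node.
public class DiskLoadEstimate {
    public static void main(String[] args) {
        double blockMiB = 64.0;
        double diskMs = 3035.04;                 // avg disk writing time, datanode 0
        double perStream = blockMiB / (diskMs / 1000.0);
        // Replication 3 on 3 datanodes means each disk receives a copy of
        // every block; 15 concurrent mappers is the worst case (5 per node).
        int concurrentStreams = 15;              // illustrative upper bound
        System.out.println("per-stream disk write MiB/s: " + perStream);
        System.out.println("worst-case aggregate demand MiB/s: "
                + perStream * concurrentStreams);
    }
}
```

At ~21 MiB/s per stream, 15 overlapping streams would demand on the order of 316 MiB/s from each disk, far beyond what a single disk of that era could sustain, which is consistent with the disk-bound diagnosis.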

Raghu.

Martin Mituzas wrote:
> [original message quoted in full; snipped, see above]