Posted to hdfs-user@hadoop.apache.org by Gaurav Dasgupta <gd...@gmail.com> on 2012/08/30 09:14:08 UTC

TestDFSIO info required

Hi,

I ran TestDFSIO in my Hadoop cluster:
    hadoop jar /usr/lib/hadoop-0.20/hadoop-test.jar TestDFSIO -write -nrFiles 100 -fileSize 10240

The report generated is:

    12/08/30 01:31:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
    12/08/30 01:31:34 INFO fs.TestDFSIO:            Date & time: Thu Aug 30 01:31:34 CDT 2012
    12/08/30 01:31:34 INFO fs.TestDFSIO:        Number of files: 100
    12/08/30 01:31:34 INFO fs.TestDFSIO: Total MBytes processed: 1024000.0
    12/08/30 01:31:34 INFO fs.TestDFSIO:      Throughput mb/sec: 5.54130695296031
    12/08/30 01:31:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.875064849853516
    12/08/30 01:31:34 INFO fs.TestDFSIO:  IO rate std deviation: 1.503623716482166
    12/08/30 01:31:34 INFO fs.TestDFSIO:     Test exec time sec: 3490.168

I was referring to this blog post:

http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/



As per my understanding from that blog, I calculated Throughput =
(1024000*1000)/3490.168 = 293395.61, which is of course not my throughput.

Then I found a file in the HDFS output directory of the job:

    hadoop fs -cat /benchmarks/TestDFSIO/io_write/part-00000

gave me this:

    f:rate 587506.5
    f:sqrate 3677727.2
    l:size 1073741824000
    l:tasks 100
    l:time 184793950

Then I used this time in the formula: Throughput =
(1024000*1000)/184793950 = 5.541, which matches my reported throughput.
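
For reference, the values in part-00000 can be read into named fields with a short Python sketch. It assumes the "f:" and "l:" prefixes mark float and long (integer) values, which is consistent with the numbers in the listing. The helper name parse_part_file is hypothetical and not part of Hadoop.

```python
def parse_part_file(text):
    """Parse 'type:name value' lines into a dict keyed by field name.

    Assumption: an 'l:' prefix means an integer value and anything
    else (here only 'f:') is treated as a float.
    """
    values = {}
    for line in text.strip().splitlines():
        key, raw = line.split()              # split on any whitespace
        type_prefix, name = key.split(":", 1)
        values[name] = int(raw) if type_prefix == "l" else float(raw)
    return values


# Contents of part-00000 as posted in this message.
sample = """f:rate 587506.5
f:sqrate 3677727.2
l:size 1073741824000
l:tasks 100
l:time 184793950"""

stats = parse_part_file(sample)
size_mbytes = stats["size"] // (1024 * 1024)  # bytes to MBytes
print(size_mbytes, stats["tasks"], stats["time"])
```

The size field divided by 1048576 gives the same "Total MBytes processed" value as the report.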



Can someone tell me what exactly this time in the HDFS output
directory file "part-00000" is?



Thanks,

Gaurav Dasgupta

Re: TestDFSIO info required

Posted by Gaurav Dasgupta <gd...@gmail.com>.
Hi All,

The formula is actually:

    Throughput = (size * 1000) / (time * MEGA)
               = (1073741824000 * 1000) / (184793950 * 1048576)
               = 5.54130695296031
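
The arithmetic in this formula can be checked with a minimal Python sketch. It assumes "time" is in milliseconds, which the factor of 1000 implies, and that MEGA is 1048576 as in the numbers above. The function name is only illustrative.

```python
MEGA = 1024 * 1024  # 1048576 bytes per MByte, as used in the formula


def throughput_mb_per_sec(size_bytes, time_ms):
    """Throughput = (size * 1000) / (time * MEGA).

    Assumption: time_ms is a duration in milliseconds, so the
    factor of 1000 converts the result to MBytes per second.
    """
    return (size_bytes * 1000.0) / (time_ms * MEGA)


size_bytes = 1073741824000  # l:size from part-00000
time_ms = 184793950         # l:time from part-00000

result = throughput_mb_per_sec(size_bytes, time_ms)
print(result)
```

The printed value should agree with the "Throughput mb/sec" line of the report to within floating-point rounding.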

The "time" is the sum of the "Exec Time" values of all task attempts
in the map phase, in milliseconds (hence the factor of 1000 in the formula).
These values can be found in the "Task Logs" of each task attempt.
So, solved.
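
To illustrate why summing the per-task times gives a different figure from dividing by the overall test time, here is a small Python sketch using the numbers from this thread. It assumes "l:time" is in milliseconds and that "Test exec time sec" is the wall-clock duration of the whole job; the variable names are only illustrative.

```python
tasks = 100                     # l:tasks from part-00000
total_task_time_ms = 184793950  # l:time, summed over all map task attempts
total_mbytes = 1024000.0        # "Total MBytes processed" from the report
test_exec_time_sec = 3490.168   # "Test exec time sec" from the report

# Mean time a single map task spent on its file.
mean_task_time_sec = total_task_time_ms / tasks / 1000.0

# Rate based on the summed task time (what TestDFSIO reports as Throughput).
per_task_rate = total_mbytes / (total_task_time_ms / 1000.0)

# Rate based on wall-clock time of the whole run (assumed interpretation).
wall_clock_rate = total_mbytes / test_exec_time_sec

print(f"mean task time:  {mean_task_time_sec:.1f} s")
print(f"per-task rate:   {per_task_rate:.2f} MB/s")
print(f"wall-clock rate: {wall_clock_rate:.1f} MB/s")
```

With these numbers the per-task figure is about 5.54 MB/s and the wall-clock figure about 293.4 MB/s. The second describes all tasks running in parallel, so the two are not expected to match.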

Thanks,
Gaurav Dasgupta

