Posted to mapreduce-user@hadoop.apache.org by Bart Vandewoestyne <Ba...@telenet.be> on 2014/10/07 09:27:14 UTC

TestDFSIO and hadoop config options

Hello list,

I would like to experiment with TestDFSIO and run some benchmarks under
different configuration settings.  For example, I would like to see how
the block replication factor (dfs.replication) influences the TestDFSIO
results.

I'm using the following version of Hadoop and CDH:

bart@sandy-quad-1:~$ hadoop version
Hadoop 2.3.0-cdh5.1.2
Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 
8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on 2014-08-26T01:36Z
Compiled with protoc 2.5.0
From source with checksum ec11b8ec19ca2bf3e7cb1bbe4ee182
This command was run using 
/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop/hadoop-common-2.3.0-cdh5.1.2.jar

My main problem is how I can easily change the replication factor for 
each run of TestDFSIO.  I see two options:

1) Change the dfs.replication configuration value in my Cloudera 
Manager, restart my cluster, and re-run TestDFSIO.

2) Pass a different dfs.replication value on the TestDFSIO command
line.  On
http://grokbase.com/t/cloudera/cdh-user/131zfsvves/testdfsio-slow-with-replication-1
I see that people run the TestDFSIO benchmark with the '-D
dfs.replication=1' option.  This is probably the better way to go?

Method 1 seems cumbersome, and it looks like method 2 does not give any 
errors on my cluster, but how can I check if TestDFSIO was indeed run 
with the replication factor I specified with the -D option?

Kind regards,
Bart

Re: TestDFSIO and hadoop config options

Posted by Ulul <ha...@ulul.org>.
Hi,

I would also go for passing the option on the TestDFSIO command line.
Once your write test is over, you can check how many replicas were
created for each file with:
hdfs fsck <path> -files -blocks
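
To tally the replica counts rather than read the fsck listing by eye, a
small Python sketch along these lines could help.  It assumes that each
block line in the fsck output carries a "repl=<n>" field; that is an
assumption about the output format, so check it against what your Hadoop
version actually prints.  The function names and the sample text are
hypothetical and only there for illustration.

```python
import re
import subprocess
from collections import Counter

# Assumption: each block line in the fsck output contains "repl=<n>",
# e.g. "0. <block id> len=134217728 repl=1". Verify on your version.
_REPL = re.compile(r"repl=(\d+)")


def replica_histogram(fsck_output):
    """Return a Counter mapping replica count -> number of blocks."""
    return Counter(int(m.group(1)) for m in _REPL.finditer(fsck_output))


def fsck_blocks(path):
    """Run `hdfs fsck <path> -files -blocks` and return its stdout as text."""
    result = subprocess.run(
        ["hdfs", "fsck", path, "-files", "-blocks"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,  # inspect the text ourselves instead of raising
    )
    return result.stdout


# Hypothetical sample text, not real cluster output.
SAMPLE = """\
/some/dir/file_0 268435456 bytes, 2 block(s):  OK
0. blk_0001 len=134217728 repl=1
1. blk_0002 len=134217728 repl=1
"""

print(dict(replica_histogram(SAMPLE)))  # {1: 2}
```

On a real cluster you would call replica_histogram(fsck_blocks(path)) with
the directory TestDFSIO wrote to.  If the run used '-D dfs.replication=1',
every block should show up under key 1.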

Ulul

On 07/10/2014 09:27, Bart Vandewoestyne wrote:
> Hello list,
>
> I would like to experiment with TestDFSIO and run some benchmarks
> under different configuration settings.  For example, I would like
> to see how the block replication factor (dfs.replication) influences
> the TestDFSIO results.
>
> I'm using the following version of Hadoop and CDH:
>
> bart@sandy-quad-1:~$ hadoop version
> Hadoop 2.3.0-cdh5.1.2
> Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 
> 8e266e052e423af592871e2dfe09d54c03f6a0e8
> Compiled by jenkins on 2014-08-26T01:36Z
> Compiled with protoc 2.5.0
> From source with checksum ec11b8ec19ca2bf3e7cb1bbe4ee182
> This command was run using 
> /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop/hadoop-common-2.3.0-cdh5.1.2.jar
>
> My main problem is how I can easily change the replication factor for 
> each run of TestDFSIO.  I see two options:
>
> 1) Change the dfs.replication configuration value in my Cloudera 
> Manager, restart my cluster, and re-run TestDFSIO.
>
> 2) Pass a different dfs.replication value on the TestDFSIO command
> line.  On
> http://grokbase.com/t/cloudera/cdh-user/131zfsvves/testdfsio-slow-with-replication-1
> I see that people run the TestDFSIO benchmark with the '-D
> dfs.replication=1' option.  This is probably the better way to go?
>
> Method 1 seems cumbersome, and it looks like method 2 does not give 
> any errors on my cluster, but how can I check if TestDFSIO was indeed 
> run with the replication factor I specified with the -D option?
>
> Kind regards,
> Bart

