Posted to mapreduce-user@hadoop.apache.org by Bart Vandewoestyne <Ba...@telenet.be> on 2014/10/07 09:27:14 UTC
TestDFSIO and hadoop config options
Hello list,
I would like to experiment with TestDFSIO and run some benchmarks under
different configuration settings. For example, I would like to see how
the block replication factor (dfs.replication) influences the TestDFSIO
results.
I'm using the following version of Hadoop and CDH:
bart@sandy-quad-1:~$ hadoop version
Hadoop 2.3.0-cdh5.1.2
Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on 2014-08-26T01:36Z
Compiled with protoc 2.5.0
From source with checksum ec11b8ec19ca2bf3e7cb1bbe4ee182
This command was run using /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop/hadoop-common-2.3.0-cdh5.1.2.jar
My main problem is how I can easily change the replication factor for
each run of TestDFSIO. I see two options:
1) Change the dfs.replication configuration value in my Cloudera
Manager, restart my cluster, and re-run TestDFSIO.
2) Somehow pass the different dfs.replication option to the command line
of TestDFSIO. On
http://grokbase.com/t/cloudera/cdh-user/131zfsvves/testdfsio-slow-with-replication-1
I see that people run the TestDFSIO benchmark with the '-D
dfs.replication=1' option. This is probably the better way to go?
Method 1 seems cumbersome, and it looks like method 2 does not give any
errors on my cluster, but how can I check if TestDFSIO was indeed run
with the replication factor I specified with the -D option?
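A minimal sketch of what method 2 could look like for a sweep over several replication factors. The jar path, the file count, the file size and the result-file names are placeholders chosen for illustration, not values from my cluster, and the sketch assumes that TestDFSIO accepts Hadoop's generic -D option ahead of its own options. By default the functions only print the commands; setting RUN=1 executes them.

```shell
#!/bin/sh
# Hypothetical jar location: point this at the MapReduce job-client tests jar
# that contains TestDFSIO in your distribution.
TESTS_JAR="${TESTS_JAR:-/path/to/hadoop-mapreduce-client-jobclient-tests.jar}"

# Build the TestDFSIO write command for one replication factor.
# The generic -D option is placed before the TestDFSIO-specific options.
# -nrFiles, -fileSize and -resFile values are illustrative assumptions.
dfsio_write_cmd() {
    repl="$1"
    printf 'hadoop jar %s TestDFSIO -D dfs.replication=%s -write -nrFiles 4 -fileSize 128MB -resFile dfsio_repl%s.log\n' \
        "$TESTS_JAR" "$repl" "$repl"
}

# Print one write-benchmark command per replication factor given as argument.
# With RUN=1 in the environment the commands are executed instead of printed.
run_dfsio_sweep() {
    for repl in "$@"; do
        cmd="$(dfsio_write_cmd "$repl")"
        if [ "${RUN:-0}" = "1" ]; then
            sh -c "$cmd"
        else
            echo "$cmd"
        fi
    done
}
```

Calling "run_dfsio_sweep 1 2 3" prints three commands that differ only in the dfs.replication value and the result file, so no cluster restart is needed between runs.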
Kind regards,
Bart
Re: TestDFSIO and hadoop config options
Posted by Ulul <ha...@ulul.org>.
Hi,
I would also go with passing the option on the TestDFSIO command line.
Once your write test is over, you can check how many replicas were
created for each file with:
hdfs fsck <path> -files -blocks
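To avoid reading the fsck listing block by block, its output can be condensed into one line per replica count. This is only a sketch: it assumes each block line carries a "repl=<n>" field, which is how Hadoop 2.x prints it as far as I know (other releases may label the field differently, for example "Live_repl=<n>", which this pattern also catches). The function name is made up for this example.

```shell
#!/bin/sh
# Summarise the replica counts found in `hdfs fsck <path> -files -blocks`
# output. Reads the fsck output on stdin and prints one line per distinct
# replica count, e.g. "8 block(s) with repl=1".
# Assumption: block lines contain a "repl=<n>" field.
summarise_replication() {
    grep -o 'repl=[0-9][0-9]*' | sort | uniq -c |
        awk '{ printf "%d block(s) with %s\n", $1, $2 }'
}
```

For example, assuming the benchmark wrote to TestDFSIO's default location: "hdfs fsck /benchmarks/TestDFSIO/io_data -files -blocks | summarise_replication". If every block reports the value you passed with -D, the option took effect for the write test.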
Ulul
On 07/10/2014 09:27, Bart Vandewoestyne wrote:
> Hello list,
>
> I would like to experiment with TestDFSIO and run some benchmarks
> under different configuration settings. One of the things I would
> like to experiment with is to see for example how the block
> replication factor (dfs.replication) has an influence on the TestDFSIO
> results.
>
> I'm using the following version of Hadoop and CDH:
>
> bart@sandy-quad-1:~$ hadoop version
> Hadoop 2.3.0-cdh5.1.2
> Subversion git://github.sf.cloudera.com/CDH/cdh.git -r
> 8e266e052e423af592871e2dfe09d54c03f6a0e8
> Compiled by jenkins on 2014-08-26T01:36Z
> Compiled with protoc 2.5.0
> From source with checksum ec11b8ec19ca2bf3e7cb1bbe4ee182
> This command was run using
> /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop/hadoop-common-2.3.0-cdh5.1.2.jar
>
> My main problem is how I can easily change the replication factor for
> each run of TestDFSIO. I see two options:
>
> 1) Change the dfs.replication configuration value in my Cloudera
> Manager, restart my cluster, and re-run TestDFSIO.
>
> 2) Somehow pass the different dfs.replication option to the command
> line of TestDFSIO. On
> http://grokbase.com/t/cloudera/cdh-user/131zfsvves/testdfsio-slow-with-replication-1
> I see that people run the TestDFSIO benchmark with the '-D
> dfs.replication=1' option. This is probably the better way to go?
>
> Method 1 seems cumbersome, and it looks like method 2 does not give
> any errors on my cluster, but how can I check if TestDFSIO was indeed
> run with the replication factor I specified with the -D option?
>
> Kind regards,
> Bart