Posted to mapreduce-user@hadoop.apache.org by Jeffrey Denton <de...@clemson.edu> on 2014/10/02 18:42:55 UTC

TestDFSIO with FS other than defaultFS

Hello all,

I'm trying to run TestDFSIO against a file system other than the configured
defaultFS, and it doesn't work for me:

$ hadoop org.apache.hadoop.fs.TestDFSIO
-Dtest.build.data=ofs://test/user/$USER/TestDFSIO -write -nrFiles 1
-fileSize 10240

14/10/02 11:24:19 INFO fs.TestDFSIO: TestDFSIO.1.7
14/10/02 11:24:19 INFO fs.TestDFSIO: nrFiles = 1
14/10/02 11:24:19 INFO fs.TestDFSIO: nrBytes (MB) = 10240.0
14/10/02 11:24:19 INFO fs.TestDFSIO: bufferSize = 1000000
14/10/02 11:24:19 INFO fs.TestDFSIO: baseDir = ofs://test/user/denton/TestDFSIO
14/10/02 11:24:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/02 11:24:20 WARN hdfs.BlockReaderLocal: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
14/10/02 11:24:20 INFO fs.TestDFSIO: creating control file: 10737418240 bytes, 1 files
java.lang.IllegalArgumentException: Wrong FS: ofs://test/user/denton/TestDFSIO/io_control, expected: hdfs://dsci
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:643)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:191)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:102)
    at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:595)
    at org.apache.hadoop.hdfs.DistributedFileSystem$11.doCall(DistributedFileSystem.java:591)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:591)
    at org.apache.hadoop.fs.TestDFSIO.createControlFile(TestDFSIO.java:290)
    at org.apache.hadoop.fs.TestDFSIO.run(TestDFSIO.java:751)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.TestDFSIO.main(TestDFSIO.java:650)

At Clemson University, we're running HDP-2.1 (Hadoop 2.4.0.2.1) on 16 data
nodes plus 3 separate master nodes hosting the resource manager and two
namenodes; for this test, however, the data nodes only run the map tasks,
with the job output written to 16 separate OrangeFS servers.

Ideally, we would like the HDFS cluster (16 data nodes and two namenodes) to
remain the defaultFS, while keeping the ability to run jobs against other
OrangeFS installations.

The above error does not occur when OrangeFS is configured to be the
defaultFS. Also, we have no problems running teragen/terasort/teravalidate
when OrangeFS IS NOT the defaultFS.

So, is it possible to run TestDFSIO using a FS other than the defaultFS?

If you're interested in the OrangeFS classes, they can be found here:
http://www.orangefs.org/svn/orangefs/branches/denton.hadoop2.trunk/src/client/hadoop/orangefs-hadoop2/src/main/java/org/apache/hadoop/fs/ofs/

I have not yet run any of the FS tests released with 2.5.1
(http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/testing.html)
but hope to soon.

Regards,

Jeff Denton
OrangeFS Developer
Clemson University
denton@clemson.edu

Re: TestDFSIO with FS other than defaultFS

Posted by Jeffrey Denton <de...@clemson.edu>.
Jay,

I have not tried the bigtop hcfs tests. Any tips on how to get started with
those?

Our configuration looks similar, except for the Gluster-specific options and
both fs.default.name (and fs.defaultFS), since we don't want OrangeFS to be
the default FS for this Hadoop cluster. I don't think the problem is caused
by a configuration issue, as the tera* suite works.

The problem is with how TestDFSIO determines the "fs" instance:

FileSystem fs = FileSystem.get(config);

This effectively forces the fs to be the configured fs.defaultFS. Shouldn't
TestDFSIO be capable of handling a non-default URI set via:

-Dtest.build.data=ofs://test/user/$USER/TestDFSIO

I think TestDFSIO should use:

FileSystem get(URI uri, Configuration conf)

with uri being the test.build.data property, if specified, or a sensible
default based on the defaultFS scheme and authority as well as the rest of
the desired URI.
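
For example (a rough sketch of the idea; the variable names are mine, not
the actual TestDFSIO code):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
// test.build.data may carry a full URI (ofs://test/...) or a bare path.
String baseDir = conf.get("test.build.data", "/benchmarks/TestDFSIO");
// FileSystem.get(URI, Configuration) honors the URI's scheme and authority,
// while a scheme-less value still falls back to the defaultFS, so plain
// HDFS runs would behave exactly as before.
FileSystem fs = FileSystem.get(URI.create(baseDir), conf);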

This means test.build.data should always be treated as a URI rather than a
String, so that the default value returned by the getBaseDir method in
class TestDFSIO can be based on the defaultFS. Currently, this isn't the
case:

private static String getBaseDir(Configuration conf) {
    return conf.get("test.build.data","/benchmarks/TestDFSIO");
}
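
A URI-aware replacement could look like the following sketch (just a
suggestion, not existing code; it assumes java.net.URI is imported and
that the defaultFS URI carries no trailing slash):

private static String getBaseDir(Configuration conf) {
    String raw = conf.get("test.build.data", "/benchmarks/TestDFSIO");
    if (URI.create(raw).getScheme() != null) {
        return raw; // already a full URI, e.g. ofs://test/user/denton/TestDFSIO
    }
    // Bare path: prefix the defaultFS scheme and authority (e.g. hdfs://dsci)
    // so downstream code can always resolve the FS from the returned value.
    return FileSystem.getDefaultUri(conf).toString() + raw;
}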

Thoughts?

Thanks,
Jeff


On Thu, Oct 2, 2014 at 4:02 PM, Jay Vyas <ja...@gmail.com>
wrote:

> Hi jeff.  Wrong fs means that your configuration doesn't know how to bind
> ofs to the OrangeFS file system class.
>
> You can debug the configuration using fs.dumpConfiguration(....), and you
> will likely see references to hdfs in there.
>
> By the way, have you tried our bigtop hcfs tests yet? We now support over
> 100 Hadoop file system compatibility tests...
>
> You can see a good sample of what parameters should be set for a hcfs
> implementation here:
> https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml

Re: TestDFSIO with FS other than defaultFS

Posted by Jay Vyas <ja...@gmail.com>.
Hi Jeff. "Wrong FS" means that your configuration doesn't know how to bind the ofs scheme to the OrangeFS file system class.

You can debug the configuration using fs.dumpConfiguration(...), and you will likely see references to hdfs in there.

By the way, have you tried our bigtop hcfs tests yet? We now support over 100 Hadoop file system compatibility tests...

You can see a good sample of what parameters should be set for an HCFS implementation here: https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml
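
As a concrete check, something like this minimal sketch would show the
binding (the fs.ofs.impl key follows the standard Hadoop 2 fs.<scheme>.impl
convention; the class it should name is whatever the OrangeFS jar provides):

import java.io.OutputStreamWriter;
import java.io.Writer;
import org.apache.hadoop.conf.Configuration;

public class DumpFsBindings {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hadoop 2 binds a URI scheme to a FileSystem class via fs.<scheme>.impl;
    // if this prints null, the ofs scheme is not bound on this classpath.
    System.out.println("fs.ofs.impl = " + conf.get("fs.ofs.impl"));
    // Dump the full live configuration; grep it for hdfs/ofs references.
    Writer out = new OutputStreamWriter(System.out);
    Configuration.dumpConfiguration(conf, out);
    out.flush();
  }
}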
