Posted to common-user@hadoop.apache.org by "Philip M. White" <pm...@qnan.org> on 2009/03/05 06:28:07 UTC

HDFS handle creates local files?

Hi, all,

I have a problem that sounds ridiculous, but I've been struggling with
this for a while now.  I hope you can help.

I have a small Java program that performs some very basic operations
within the HDFS.  The program either creates a text file, creates a new
blank file, or creates a directory.

I also have the latest stable version of Hadoop installed in a
single-machine configuration on Linux.  As far as I can tell, Hadoop
works great; I can run the sample M/R jobs.  My fs.default.name is
hdfs://localhost:9000/.
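
(I assume I could sidestep the configured default by naming the
filesystem explicitly, along these lines, since FileSystem.get(URI,
Configuration) should pick the filesystem from the URI scheme:

Configuration conf = new Configuration();
// Ask for the HDFS instance directly instead of relying on whatever
// fs.default.name resolves to at runtime (URI here is java.net.URI).
FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:9000/"), conf);

But I'd rather understand why the plain FileSystem.get(conf) call flips
between filesystems.)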

The problem is that sometimes my small Java program works perfectly --
it does what's asked of it on the HDFS -- but sometimes it instead
creates the requested file or directory on my local, native filesystem,
without ever affecting the HDFS.

For example, here's the relevant part of my test program:
Configuration h_conf = new Configuration();
FileSystem h_fs = FileSystem.get(h_conf);
FileSystem.mkdirs(h_fs, new Path("/tmp/junit-test"),
                  new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.NONE));
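
To see what FileSystem.get() actually handed back, I've been thinking
of printing the handle's URI and class right after obtaining it.  A
rough diagnostic sketch, assuming getUri() reports the scheme the
handle is bound to:

System.out.println("fs.default.name = " + h_conf.get("fs.default.name"));
System.out.println("filesystem URI  = " + h_fs.getUri());
System.out.println("filesystem impl = " + h_fs.getClass().getName());

If the last line reports LocalFileSystem rather than
DistributedFileSystem, that would at least confirm that the handle
itself is wrong, not the mkdirs call.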

Sometimes after I run it, I can do 'hadoop fs -ls /tmp' and see the
junit-test directory.  At other times that command doesn't show the
directory, but a plain 'ls /tmp' does!

The worst thing about this problem is that I haven't been able to
establish what circumstances affect which behavior.  Thus far it appears
truly random.

It's not random in the sense that every run gives a different result;
rather, during one programming session it works as intended no matter
how many times I rerun the code or restart Eclipse (my IDE), and during
another session it fails no matter what I do.

As one example, yesterday it was working fine until I replaced
hadoop-0.19.0-core.jar in my IDE with a hadoop-0.19.1-dev-core.jar that
I built myself from the 0.19.0 code.  At that point it switched to the
native-filesystem behavior.  I reverted the library to 0.19.0-core.jar,
but the native-filesystem behavior persisted, and I cannot get Hadoop
to write to the HDFS anymore.
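
If it helps with diagnosing this, I can also dump the Configuration
right after constructing it.  I assume its toString() still lists the
resource files (hadoop-default.xml, hadoop-site.xml) that it actually
loaded from the classpath:

Configuration conf = new Configuration();
// If hadoop-site.xml never made it onto the IDE's classpath, I'd expect
// it to be missing from this listing.
System.out.println(conf);

My unconfirmed guess is that when my hadoop-site.xml isn't visible to
the program, fs.default.name falls back to the local filesystem, but I
haven't been able to verify that.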

There is also a second problem that seems unrelated on the surface, but
I'm guessing it's connected.

I'm working on developing a Hadoop backend for Jena, so I often run
Hadoop-enabled Jena on the same computer and from the same IDE.
Occasionally (there's that word again), an exception is raised claiming
that the JVM is out of heap space, so Hadoop cannot execute.  Tweaking
the JVM's command-line arguments to change the amount of memory
allocated has no effect.  On other days everything works fine without
any errors.
When it doesn't work, I can see that Java is not /truly/ out of memory
-- there's as much available memory (if not more) as when Java runs
fine.
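
One way to make that comparison concrete would be to print the JVM's
own heap numbers right before the failing call.  A quick sketch using
the standard Runtime calls, nothing rigorous:

Runtime rt = Runtime.getRuntime();
System.out.println("max heap   = " + rt.maxMemory());
System.out.println("total heap = " + rt.totalMemory());
System.out.println("free heap  = " + rt.freeMemory());

That would at least let me compare the numbers between a session that
works and one that throws the heap-space error.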

In the past I was also able to fix the out-of-memory issue by
recompiling everything inside Eclipse.  Today that didn't help.

Today I experienced both problems (Hadoop writing to the native
filesystem in my small Java program, and out-of-memory exceptions in
Hadoop-enabled Jena) simultaneously.  On previous occasions I'm not
sure whether these issues occurred simultaneously or not.

What could be going on?  Any ideas would be greatly appreciated.

Thank you.

-- 
Philip