You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by Chris D <ch...@gmail.com> on 2010/07/01 20:28:48 UTC

Support for non-HDFS distributed, shared POSIX FS

Hi all,



I’d like to create a new URI for a distributed POSIX-compliant filesystem
shared between all nodes. A number of such filesystems currently exist
(think HDFS w/o the POSIX incompliance). We can, of course, run HDFS on top
of such a file system, but it adds an extra unnecessary and inefficient
layer. Why have a master retrieve a set of data from a FS cluster, only to
distribute it back out to the same cluster but on a different distributed FS
(HDFS)?



In the new URI I seek to create, each MapReduce slave would look for input
data from a seemingly local file:///, and write output to it as well. Assume
that the distributed FS handles concurrent reads, writes. Assuming
POSIX-compliance, the LocalFileSystem seems to be the best foundation.



Please let me know of any warnings or errors you see in this. Any advice is
strongly appreciated as well, as the source tree of Hadoop is new to me and
intimidating.



Best,

--Chris

Re: Support for non-HDFS distributed, shared POSIX FS

Posted by Chris D <ch...@gmail.com>.
With the following configuration, tasks complete successfully:
1. /mnt1 is the mount point for a shared, distributed FS on all machines
running hadoop.
2. hadoop.tmp.dir = /mnt1/tmp/hadoop
3. fs.default.name = file:///
4. `bin/hadoop jar hadoop-*-examples.jar wordcount /mnt1/input /mnt1/ouput`

top and intuition suggest that each machine is reading from it's own /mnt1
point and that computation is distributed. Thank you very much for the
advice Allen.

Best,
--Chris

On Thu, Jul 1, 2010 at 12:15 PM, Allen Wittenauer
<aw...@linkedin.com>wrote:

>
>
>
> On Jul 1, 2010, at 12:00 PM, Chris D wrote:
>
> > Yes, it is mountable on all machines simultaneously, and, for example,
> works
> > properly through file:///mnt/to/dfs in a single node cluster.
>
> Then file:// will likely work a multi-node cluster as well.  So I doubt
> you'll need to write anything at all. :)

Re: Support for non-HDFS distributed, shared POSIX FS

Posted by Allen Wittenauer <aw...@linkedin.com>.


On Jul 1, 2010, at 12:00 PM, Chris D wrote:

> Yes, it is mountable on all machines simultaneously, and, for example, works
> properly through file:///mnt/to/dfs in a single node cluster.

Then file:// will likely work a multi-node cluster as well.  So I doubt you'll need to write anything at all. :)

Re: Support for non-HDFS distributed, shared POSIX FS

Posted by Chris D <ch...@gmail.com>.
Yes, it is mountable on all machines simultaneously, and, for example, works
properly through file:///mnt/to/dfs in a single node cluster.

On Jul 1, 2010 11:51 AM, "Allen Wittenauer" <aw...@linkedin.com>
wrote:


On Jul 1, 2010, at 11:28 AM, Chris D wrote:
> In the new URI I seek to create, each MapReduce slave...
If this is a mountable file system, does LocalFileSystem work out of the box
with no modifications required?

Re: Support for non-HDFS distributed, shared POSIX FS

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jul 1, 2010, at 11:28 AM, Chris D wrote:
> In the new URI I seek to create, each MapReduce slave would look for input
> data from a seemingly local file:///, and write output to it as well. Assume
> that the distributed FS handles concurrent reads, writes. Assuming
> POSIX-compliance, the LocalFileSystem seems to be the best foundation.

If this is a mountable file system, does LocalFileSystem work out of the box with no modifications required?