Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2008/08/18 20:23:24 UTC

[Hadoop Wiki] Update of "MountableHDFS" by petewyckoff

http://wiki.apache.org/hadoop/MountableHDFS

New page:
= Mounting HDFS =

These projects allow HDFS to be mounted (on most flavors of Unix) as a standard file system using the mount command. Once mounted, the user can operate on an instance of HDFS using standard Unix utilities such as 'ls', 'cd', 'cp', 'mkdir', 'find', 'grep', etc.
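As a sketch of what this looks like in practice, a few standard utilities run against a mount point (smoke_test is a hypothetical helper, not part of any of these projects; on a real deployment you would pass your mount point, e.g. /mnt/dfs):

```shell
# Exercise ordinary Unix tools against a mounted file system.
smoke_test() {
  mnt="$1"
  mkdir -p "$mnt/scratch"           # mkdir works as on a local fs
  cp /etc/hosts "$mnt/scratch/"     # copy a local file in
  find "$mnt/scratch" -name hosts   # and find it again
  rm -r "$mnt/scratch"
}
# e.g.: smoke_test /mnt/dfs
```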

== Options ==

 * contrib/fuse-dfs is built on FUSE, some C glue, libhdfs, and hadoop-dev.jar
 * fuse-j-hdfs is built on FUSE, FUSE-J (FUSE for Java), and hadoop-dev.jar
 * hdfs-fuse - a Google Code project very similar to contrib/fuse-dfs

== Fuse-DFS ==

(currently this is just the README for fuse-dfs)

=== BUILDING ===

Requirements:

   1. a Linux kernel > 2.6.9, or the standalone kernel module from the
   FUSE project - i.e., you compile it yourself and then modprobe it.
   You are better off with the former option if possible. (Note that
   for now, if you use a kernel with fuse included, it does not allow
   you to export the mount through NFS, so be warned. See the FUSE
   mailing list for more about this.)

   2. FUSE should be installed in /usr/local, or its location should be
   given in the FUSE_HOME environment variable (read by the ant build)

To build:

   1. in HADOOP_HOME: ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1


NOTE: on the amd64 architecture, libhdfs will not compile unless you
edit src/c++/libhdfs/Makefile and set OS_ARCH=amd64 (probably the same
for other architectures too).
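The amd64 fix above can be scripted; a minimal sketch (set_os_arch is a hypothetical helper; run it before the ant build from step 1):

```shell
# Force OS_ARCH in a libhdfs-style Makefile: rewrite an existing
# OS_ARCH assignment, or append one if none is present.
set_os_arch() {
  makefile="$1"
  arch="$2"
  if grep -q '^OS_ARCH' "$makefile"; then
    sed -i "s/^OS_ARCH *=.*/OS_ARCH=$arch/" "$makefile"
  else
    echo "OS_ARCH=$arch" >> "$makefile"
  fi
}
# e.g.: set_os_arch src/c++/libhdfs/Makefile amd64
#       cd $HADOOP_HOME && ant compile-contrib -Dcompile.c++=1 -Dfusedfs=1
```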


=== CONFIGURING ===

Look at all the paths in fuse_dfs_wrapper.sh and either correct them
or set them in your environment before running. (Note that for automount
and mounting as root, you probably cannot control the environment, so
it is best to set them in the wrapper.)
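A minimal sketch of a preflight check before running the wrapper (check_env is a hypothetical helper; the variable names in the example are typical, but verify them against your copy of fuse_dfs_wrapper.sh):

```shell
# Print any of the named environment variables that are unset or
# empty; return nonzero if at least one is missing.
check_env() {
  missing=0
  for v in "$@"; do
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}
# e.g.: check_env JAVA_HOME HADOOP_HOME LD_LIBRARY_PATH CLASSPATH
```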

=== INSTALLING ===

1. mkdir /mnt/dfs (or wherever you want to mount it)

2. fuse_dfs_wrapper.sh dfs://hadoop_server1.foo.com:9000 /mnt/dfs -d,
and from another terminal, try ls /mnt/dfs

If step 2 works, try again without debug mode, i.e., drop the -d

(Note - common problems are that you don't have libhdfs.so, libjvm.so,
or libfuse.so on your LD_LIBRARY_PATH, or that your CLASSPATH does not
contain hadoop and the other required jars.)
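That library check can be scripted; a sketch (find_lib is a hypothetical helper) that searches LD_LIBRARY_PATH for a shared object:

```shell
# Print the first directory on LD_LIBRARY_PATH containing the given
# shared library; return nonzero if it is not found anywhere.
find_lib() {
  lib="$1"
  oldifs=$IFS
  IFS=:
  for dir in $LD_LIBRARY_PATH; do
    if [ -f "$dir/$lib" ]; then
      IFS=$oldifs
      echo "$dir/$lib"
      return 0
    fi
  done
  IFS=$oldifs
  return 1
}
# e.g.: for l in libhdfs.so libjvm.so libfuse.so; do
#         find_lib "$l" >/dev/null || echo "missing: $l"
#       done
```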

=== DEPLOYING ===

in a root shell do the following:

1. add the following to /etc/fstab -
  fuse_dfs#dfs://hadoop_server.foo.com:9000 /mnt/dfs fuse
  -oallow_other,rw,-ousetrash 0 0

2. mount /mnt/dfs. Expect problems with mount not finding fuse_dfs;
   you will probably need to copy it to /sbin. Then expect problems
   finding the three libraries above; add their directories with ldconfig.


Fuse DFS takes the following mount options (i.e., on the command line or in the comma-separated option list in /etc/fstab):

-oserver=%s (optional place to specify the server, but in fstab use the format above)
-oport=%d (optional port; see the comment on the server option)
-oentry_timeout=%d (how long directory entries are cached by fuse, in seconds - see the fuse docs)
-oattribute_timeout=%d (how long attributes are cached by fuse, in seconds - see the fuse docs)
-oprotected=%s (a colon-separated list of directories that fuse-dfs should not allow to be deleted or moved - e.g., /user:/tmp)
-oprivate (not often used, but means only the person who does the mount can use the filesystem - aka ! allow_other in fuse speak)
-ordbuffer=%d (how large a buffer, in KB, fuse-dfs should use when doing hdfs reads)
ro
rw
-ousetrash (should fuse-dfs move things to /Trash when deleting them)
-onotrash (opposite of usetrash)
-odebug (do not daemonize - aka -d in fuse speak)
-obig_writes (use the fuse big_writes option to allow better write performance on kernels >= 2.6.26)
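For example, an /etc/fstab line combining several of these options might look like the following (hostname, port, and mount point are placeholders; the option syntax mirrors the DEPLOYING example above):

```
fuse_dfs#dfs://hadoop_server.foo.com:9000 /mnt/dfs fuse
-oallow_other,rw,-ousetrash,-ordbuffer=32768,-obig_writes 0 0
```

Since rdbuffer is in KB, 32768 here is a 32 MB read buffer.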

The defaults are:

entry_timeout, attribute_timeout = 60 seconds
rdbuffer = 10 MB
protected = null
debug = 0
notrash
private = 0

=== EXPORTING ===

Add the following to /etc/exports:

  /mnt/hdfs *.foo.com(no_root_squash,rw,fsid=1,sync)

NOTE - you cannot export this with a FUSE module built into the kernel
- e.g., kernel 2.6.17. For info on this, refer to the FUSE wiki.

=== ADVANCED ===

You may want to ensure certain directories cannot be deleted from the
shell until the file system has permissions. You can set this in
src/contrib/fuse-dfs/build.xml

=== RECOMMENDATIONS ===

1. From /bin, ln -s $HADOOP_HOME/contrib/fuse-dfs/fuse_dfs* .
2. Always start with debug on, so you can see whether you are missing a classpath entry or something like that.
3. Use -obig_writes.

=== PERFORMANCE ===

1. If you alias ls to ls --color=auto and try listing a directory with lots (over thousands) of files, expect it to be slow, and at tens of thousands, expect it to be very, very slow. This is because --color=auto causes ls to stat every file in the directory, and since fuse-dfs does not cache attribute entries when doing a readdir, this is very slow. See https://issues.apache.org/jira/browse/HADOOP-3797
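A common workaround (assuming the slowdown really is the alias) is to bypass alias expansion for large directories; `command` prevents the first word from being treated as an alias or shell function (list_plain is a hypothetical helper):

```shell
# List a directory without triggering an `ls` alias such as
# --color=auto, so ls does not stat every entry just to pick colors.
list_plain() {
  command ls "$1"
}
# e.g.: list_plain /mnt/dfs/huge_dir
```

Typing `\ls` at an interactive prompt has the same effect.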

2. Writes are approximately 33% slower than the DFSClient. TBD how to optimize this; see https://issues.apache.org/jira/browse/HADOOP-3805. Try using -obig_writes; on a >= 2.6.26 kernel it should perform much better, since bigger writes imply less context switching.

3. Reads are ~20-30% slower even with the read buffering. 



== Fuse-j-HDFS ==

see https://issues.apache.org/jira/browse/HADOOP-4

== HDFS-FUSE ==

see http://code.google.com/p/hdfs-fuse/