Posted to common-user@hadoop.apache.org by Ling Kun <lk...@gmail.com> on 2013/04/19 11:18:45 UTC

Any ideas about Performance optimization inside Hadoop framework based on the NFS-like Shared FileSystem

Dear all,

   I am writing to ask about two things:
    1. Any ideas about performance optimization inside the Hadoop framework
when it runs on top of an NFS-like shared filesystem.
    2. This mail is also meant to start a discussion about whether HDFS
should support a POSIX or NFS-like interface.

    The Hadoop MapReduce framework depends heavily on the distributed file
system: the master (JobTracker) splits the input files and assigns the
individual parts to separate slaves (TaskTrackers). Since the files are
kept on the shared distributed file system, if one task or the node running
it fails, the master/JobTracker can immediately reassign the task to a
different TaskTracker that also holds a replica of the input file's data
block.

   Under some circumstances, local file access can be slower than remote
NFS file access. This could be true today or in the future. Suppose we have
a fast interconnect (say, 10 Gbps Ethernet or InfiniBand, which makes the
network faster than a local SATA hard disk), and we also have an NFS-like
shared filesystem that can be mounted on each TaskTracker node in the
Hadoop cluster. The NFS-like, POSIX-compatible shared FS could be the MapR
filesystem, Oracle Lustre, IBM GPFS, EMC Isilon, Ceph, Gluster, etc. These
systems have advanced caching algorithms that can hide remote disk access
latency through highly optimized prefetching strategies, and they can also
store a huge number of files (far more than 100 million).

    I am wondering what this means for the current Hadoop framework: which
components of Hadoop MapReduce could be modified to speed up overall
cluster performance and make Hadoop programming much easier?

     The following are some of the components that could be modified.

The following may improve performance:
     1. Partitioning of map output: currently the map output is partitioned
for the different reducers using an index file, and under this setup it
would be stored on the NFS. An HTTP file transfer is also needed for each
reducer to fetch its map output partition from each mapper. We could
instead have each map task directly generate separate output files for the
different reducers, so that each reducer can read its files directly (see
the first sketch after this list). There is also an open question about the
map-output merge thread.

     2. The input and output of the MapReduce job: using NFS-like storage,
we would not need to copy files into and out of HDFS, which would save time
in the job prologue and epilogue (see the second sketch after this list).
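
To make point 1 more concrete, here is a purely illustrative, stand-alone
sketch (not Hadoop's real shuffle code) of the layout I have in mind: each
map task writes one plain file per reducer partition into a shared mount,
and each reducer later opens only its own partition files by POSIX path
instead of fetching them over HTTP. The mount point /mnt/sharedfs, the
hash partitioning, and the text format are all assumptions for the example.

import java.io.*;
import java.util.*;

public class SharedFsShuffleSketch {
    static final String SHUFFLE_DIR = "/mnt/sharedfs/job_0001/shuffle"; // assumed mount
    static final int NUM_REDUCERS = 4;

    // "Map side": route each (key, value) record to a per-reducer file
    // under the shared directory instead of a local spill file.
    static void writeMapOutput(String mapTaskId, List<String[]> keyValues) throws IOException {
        PrintWriter[] partitions = new PrintWriter[NUM_REDUCERS];
        for (int r = 0; r < NUM_REDUCERS; r++) {
            File f = new File(SHUFFLE_DIR, mapTaskId + ".part-" + r);
            f.getParentFile().mkdirs();
            partitions[r] = new PrintWriter(new FileWriter(f));
        }
        for (String[] kv : keyValues) {
            int r = (kv[0].hashCode() & Integer.MAX_VALUE) % NUM_REDUCERS;
            partitions[r].println(kv[0] + "\t" + kv[1]);
        }
        for (PrintWriter w : partitions) w.close();
    }

    // "Reduce side": read every map task's file for this reducer's
    // partition directly from the shared directory, no HTTP fetch.
    static List<String> readReduceInput(int reducer) throws IOException {
        List<String> records = new ArrayList<String>();
        File[] files = new File(SHUFFLE_DIR).listFiles();
        if (files == null) return records;
        for (File f : files) {
            if (!f.getName().endsWith(".part-" + reducer)) continue;
            BufferedReader in = new BufferedReader(new FileReader(f));
            String line;
            while ((line = in.readLine()) != null) records.add(line);
            in.close();
        }
        return records;
    }
}

For point 2, the existing Hadoop FileSystem API already lets a job read and
write a POSIX-mounted path through the file:// scheme (LocalFileSystem), so
a minimal sketch of a job that never touches HDFS looks roughly like the
following. The mount point /mnt/sharedfs is an assumption, and the job
simply uses the default identity mapper and reducer.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SharedFsIdentityJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "identity-on-sharedfs");
        job.setJarByClass(SharedFsIdentityJob.class);

        // Default (identity) Mapper/Reducer: TextInputFormat produces
        // (LongWritable offset, Text line) pairs, which are passed through.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Explicit file:// URIs point the job at the shared POSIX mount,
        // so there is no "hadoop fs -put" into HDFS before the job and no
        // "hadoop fs -get" out of it afterwards.
        FileInputFormat.addInputPath(job, new Path("file:///mnt/sharedfs/input"));
        FileOutputFormat.setOutputPath(job, new Path("file:///mnt/sharedfs/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}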


The following could decrease job-submit latency, which would make jobs
start more quickly.
     1. Distributed cache: since files can be accessed directly through the
standard POSIX interface, the distributed cache is no longer necessary. We
can just put the data into an NFS-mounted directory so that all nodes can
open and read/write it directly (see the sketch after this list).
      2. Setup phase of the job and tasks: the job JAR, job token,
_partition.lst file, and other files that have to be delivered to each
TaskTracker can be stored in the NFS-mounted directory, where the other
TaskTrackers can access them directly.
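
As a sketch of point 1 (a minimal example, assuming a shared POSIX mount at
/mnt/sharedfs that is visible on every TaskTracker node): a mapper can load
its lookup table straight from that path in setup(), instead of shipping
the file through the DistributedCache. The path and the tab-separated file
format are assumptions for the illustration.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SharedFsLookupMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Assumed NFS mount point, not a Hadoop-managed or localized file.
        BufferedReader reader = new BufferedReader(
                new FileReader("/mnt/sharedfs/side-data/lookup.tsv"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
        } finally {
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Join each input line against the shared lookup table.
        String joined = lookup.get(value.toString());
        if (joined != null) {
            context.write(value, new Text(joined));
        }
    }
}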


In addition, removing the distributed cache and the data copies between the
local filesystem and HDFS would make Hadoop MapReduce programming much
easier.

Besides all of the above, HBase may also benefit from an NFS-like shared
filesystem.


As far as I know, plenty of NFS-like filesystems already support Hadoop
MapReduce, and several of them include optimizations based on their NFS
features, but unfortunately these are closed source.
    1. MapR has an NFS-like distributed filesystem and its own MapReduce
release ( http://www.mapr.com/Download-document/9-NFS-Technical-Brief ).
    2. Greenplum also has a team working on optimizing Greenplum HD for EMC
Isilon ( http://www.greenplum.com/products/pivotal-hd ).
    3. IBM once submitted a patch to the Hadoop JIRA that made Hadoop work
with IBM GPFS; however, the patch can no longer be found.

Does anyone have any details about these implementations?

This is very interesting, right?


Thanks

Ling Kun

-- 
http://www.lingcc.com