Posted to hdfs-user@hadoop.apache.org by Nathan Rutman <nr...@gmail.com> on 2011/01/27 22:44:02 UTC

TestDFSIO on Lustre vs HDFS

In case others are interested, I ran a comparison of TestDFSIO on HDFS vs Lustre.
This is on an 8-node Infiniband-connected cluster.  For the Lustre test, we replaced the HTTP transfer during the shuffle phase with a simple hardlink to the data (since all data is always visible on all nodes with Lustre).


Max Map Thread = 80; Max Reduce Thread = 1; File Size = 512MB; Scheduler = JobQueue; Buffer Size = Default; Number of Nodes = 8; Drive Speed = 80MB/s



The conclusion is that Lustre TestDFSIO performance is significantly better than HDFS when using a fast network (as it theoretically should be).  On a slower network (e.g. 1gigE), I would not expect Lustre to show much advantage over HDFS.

Re: TestDFSIO on Lustre vs HDFS

Posted by Allen Wittenauer <aw...@linkedin.com>.

On Jan 27, 2011, at 1:44 PM, Nathan Rutman wrote:

> This is on an 8-node Infiniband-connected cluster.

	In other words, not a realistic Hadoop grid.   

	Let me know how that 100 node test goes.

Re: TestDFSIO on Lustre vs HDFS

Posted by Allen Wittenauer <aw...@linkedin.com>.

On Jan 28, 2011, at 10:39 AM, Nathan Rutman wrote:
> Your storage type should depend on the kind of data you're storing, the quantity, the reliability, scalability, heterogenicity (sic), data access pattern, applications you're using, performance requirements, and system cost.   My point in posting this stuff is not to say that Lustre should be your choice for Hadoop backend in all situations.  It was really to show that HDFS was designed for a particular usage pattern and scale, and using it outside of that realm may not be the best choice.  I was looking to the HDFS community to poke holes in my arguments.

	People who approach HDFS from a pure filesystem perspective are often disappointed because they miss out on the fact that it is written primarily to support Hadoop's MapReduce framework.  In particular, this means having access to data locality information so that the network hit is mostly immaterial when reading or writing.   It is going to make a huge difference if you are reading a single TB file from one node for processing (which in turn will likely require many, many block fetches from across the network) vs. being able to distribute that read to multiple hosts (such that there is little-to-no network activity at all).
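
A rough back-of-the-envelope sketch of that difference, in Python. Everything here is an assumption for illustration: the 1 TB file is taken as 10^12 bytes, the single reader is assumed to be limited by a 1gigE link at its theoretical line rate (125 MB/s), and the distributed case assumes 8 nodes each reading only local blocks at the 80MB/s drive speed quoted in the original post. It ignores protocol overhead, replication, scheduling, and caching, so treat the numbers as an order-of-magnitude guide, not a measurement.

```python
# Illustrative model only; all rates below are assumptions, not measurements.

TB = 10**12          # bytes in the "single TB file" (decimal terabyte assumed)
MB = 10**6           # bytes per megabyte (decimal)


def single_reader_seconds(file_bytes, link_bytes_per_s):
    """Time for one node to pull the whole file across its network link."""
    return file_bytes / link_bytes_per_s


def distributed_local_seconds(file_bytes, nodes, disk_bytes_per_s):
    """Time when the read is spread evenly over `nodes`, each reading local blocks."""
    return file_bytes / (nodes * disk_bytes_per_s)


if __name__ == "__main__":
    remote = single_reader_seconds(TB, 125 * MB)        # assumed 1gigE line rate
    local = distributed_local_seconds(TB, 8, 80 * MB)   # 8 nodes, 80MB/s drives
    print(f"single reader over the network: {remote:.0f} s")
    print(f"8 nodes reading local blocks:   {local:.1f} s")
```

Under these assumptions the distributed read finishes several times sooner, and the gap widens as nodes are added, since the single reader's link does not get any faster.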

>  Also, to get improved Hadoop performance, the network needs to be more expensive than 1gigE.  

	Hardly, especially when trunking is thrown into the mix.  

> And Lustre requires more sysadmin care and understanding, which adds to total cost of ownership.
> But all of that is a "fixed" cost -- it does not scale linearly with your storage size. If you double your storage requirement, you'll pay ~1.2x for RAID parity and spare space with Lustre, but you'll pay 3x for HDFS disks.  The Lustre initial costs are higher.  So at some scale there will necessarily be a cost crossover.

	As nodes are added, the network costs will also go up, regardless of setup.  The only time they don't is if the original design had significantly over provisioned network vs. node count.  Only using 8 nodes hides this fact.

Re: TestDFSIO on Lustre vs HDFS

Posted by Nathan Rutman <nr...@gmail.com>.

Hi Rita, thanks for a great response.

On Jan 27, 2011, at 7:31 PM, Rita wrote:

> Comparing apples and oranges.
Certainly some factors are comparable, others are not.  I was primarily interested in performance of Hadoop IO.

> Lustre is a great filesystem but has no native fault tolerance. If you want a POSIX filesystem with high performance then Lustre does it. However, if you want to access data in a heterogeneous environment and do not need POSIX compliance then hdfs is the tool. 
I am so on the same page as you :)

Your storage type should depend on the kind of data you're storing, the quantity, the reliability, scalability, heterogenicity (sic), data access pattern, applications you're using, performance requirements, and system cost.   My point in posting this stuff is not to say that Lustre should be your choice for Hadoop backend in all situations.  It was really to show that HDFS was designed for a particular usage pattern and scale, and using it outside of that realm may not be the best choice.  I was looking to the HDFS community to poke holes in my arguments.

> I've read an earlier thread from you. Before you choose a filesystem, some things to consider:
> 
> Cost: Any exotic software or hardware needed? (Lustre and hdfs can run very well on commodity hardware) 
> Transparency: Any application change needed? Lustre wins in this! With hdfs you would have to convert or make changes in the way you access the data
> Scalability: Both scale well.
> Implementation cost: The cost of implementing a solution and maintaining it. HDFS wins.  It will run on any server which will run Java. No kernel modules, no kernel configuration, etc...it just works out of the box

I'd say that HDFS probably wins on the "exotic hardware" requirements -- Lustre failover typically requires standalone RAID boxes, redundant servers, and redundant network pathing in order to achieve data access reliability.  (It can run without this stuff, but that introduces single points of failure.)  Also, to get improved Hadoop performance, the network needs to be more expensive than 1gigE.  And Lustre requires more sysadmin care and understanding, which adds to total cost of ownership.
But all of that is a "fixed" cost -- it does not scale linearly with your storage size. If you double your storage requirement, you'll pay ~1.2x for RAID parity and spare space with Lustre, but you'll pay 3x for HDFS disks.  The Lustre initial costs are higher.  So at some scale there will necessarily be a cost crossover.
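
The crossover argument above can be sketched as a simple linear cost model. This is only an illustration: the 1.2x and 3x overhead factors come from the paragraph above, but the fixed costs and the per-TB disk price in the example are made-up placeholders, and the model assumes both systems pay the same price per raw TB, which may not hold for RAID enclosures vs. commodity disks.

```python
# Minimal linear cost model; all dollar figures are hypothetical placeholders.

def total_cost(fixed_cost, overhead_factor, usable_tb, cost_per_raw_tb):
    """Fixed (up-front) cost plus disk cost for `usable_tb` of usable capacity."""
    return fixed_cost + overhead_factor * usable_tb * cost_per_raw_tb


def crossover_tb(fixed_lustre, fixed_hdfs, cost_per_raw_tb,
                 lustre_overhead=1.2, hdfs_overhead=3.0):
    """Usable capacity (TB) at which the two total costs are equal.

    Solves fixed_lustre + 1.2*c*S == fixed_hdfs + 3*c*S for S.
    A positive result requires fixed_lustre > fixed_hdfs.
    """
    return (fixed_lustre - fixed_hdfs) / (
        (hdfs_overhead - lustre_overhead) * cost_per_raw_tb
    )


if __name__ == "__main__":
    # Hypothetical inputs: Lustre fixed cost 100000, HDFS fixed cost 10000,
    # 100 per raw TB of disk.
    s = crossover_tb(100_000, 10_000, 100)
    print(f"crossover at about {s:.0f} usable TB")
    print("Lustre:", total_cost(100_000, 1.2, s, 100))
    print("HDFS:  ", total_cost(10_000, 3.0, s, 100))
```

Below the crossover capacity the lower fixed cost dominates and HDFS is cheaper in this model; above it the 3x replication overhead dominates. Where the crossover actually falls depends entirely on the real fixed and per-TB costs.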

Some other factors: there is the cost per megabyte, and there is also a cost per megabyte per second.  If performance is important to you (again, it becomes more of an issue at larger scales), then that also must enter the calculation.  Or, if you only care about 100% data availability, that also will influence your choice.  Are you just using Hadoop or HBase, or do you need to run other distributed software?  

Thanks all for your time and responses.

> On Thu, Jan 27, 2011 at 4:44 PM, Nathan Rutman <nr...@gmail.com> wrote:
> In case others are interested, I ran a comparison of TestDFSIO on HDFS vs Lustre.
> This is on an 8-node Infiniband-connected cluster.  For the Lustre test, we replaced the HTTP transfer during the shuffle phase with a simple hardlink to the data (since all data is always visible on all nodes with Lustre).
> 
> 
> Max Map Thread = 80; Max Reduce Thread = 1; File Size = 512MB; Scheduler = JobQueue; Buffer Size = Default; Number of Nodes = 8; Drive Speed = 80MB/s
> 
> 
> 
> The conclusion is that Lustre TestDFSIO performance is significantly better than HDFS when using a fast network (as it theoretically should be).  On a slower network (e.g. 1gigE), I would not expect Lustre to show much advantage over HDFS.
> 
> 
> 
> 
> -- 
> --- Get your facts first, then you can distort them as you please.--

Re: TestDFSIO on Lustre vs HDFS

Posted by Rita <rm...@gmail.com>.

Comparing apples and oranges.

Lustre is a great filesystem but has no native fault tolerance. If you want
a POSIX filesystem with high performance then Lustre does it. However, if you
want to access data in a heterogeneous environment and do not need POSIX
compliance then hdfs is the tool.

I've read an earlier thread from you. Before you choose a filesystem, some
things to consider:

Cost: Any exotic software or hardware needed? (Lustre and hdfs can run very
well on commodity hardware)
Transparency: Any application change needed? Lustre wins in this! With hdfs
you would have to convert or make changes in the way you access the data
Scalability: Both scale well.
Implementation cost: The cost of implementing a solution and maintaining it.
HDFS wins.  It will run on any server which will run Java. No kernel
modules, no kernel configuration, etc...it just works out of the box

On Thu, Jan 27, 2011 at 4:44 PM, Nathan Rutman <nr...@gmail.com> wrote:

> In case others are interested, I ran a comparison of TestDFSIO on HDFS vs
> Lustre.
> This is on an 8-node Infiniband-connected cluster.  For the Lustre test, we
> replaced the HTTP transfer during the shuffle phase with a simple hardlink
> to the data (since all data is always visible on all nodes with Lustre).
>
>
> Max Map Thread = 80; Max Reduce Thread = 1; File Size = 512MB; Scheduler =
> JobQueue; Buffer Size = Default; Number of Nodes = 8; Drive Speed = 80MB/s
>
>
> The conclusion is that Lustre TestDFSIO performance is significantly better
> than HDFS when using a fast network (as it theoretically should be).  On a
> slower network (e.g. 1gigE), I would not expect Lustre to show much
> advantage over HDFS.
>
>

-- 
--- Get your facts first, then you can distort them as you please.--