Posted to hdfs-dev@hadoop.apache.org by Robert Evans <ev...@yahoo-inc.com> on 2011/08/25 15:26:26 UTC

New competition

I saw an article yesterday saying that GlusterFS 3.3 now has Hadoop bindings.
I also ran across XtreemFS a while back, which also supports Hadoop
bindings.  Both of them claim to be faster and more scalable than HDFS.

Has anyone in the community done some actual benchmarks on the same hardware
for some HDFS replacements?  I would love to see how true their claims are
and what we need to do to beat them.

http://www.infostor.com/storage-management/gluster-goes-after-hadoop-big-data.html

http://www.gluster.org/

http://www.xtreemfs.org/

--Bobby Evans


Re: New competition

Posted by Allen Wittenauer <aw...@apache.org>.
On Aug 25, 2011, at 6:26 AM, Robert Evans wrote:

> I saw an article yesterday saying that GlusterFS 3.3 now has Hadoop bindings.
> I also ran across XtreemFS a while back, which also supports Hadoop
> bindings.  Both of them claim to be faster and more scalable than HDFS.

... for various values of "faster" and "scalable".  

	For example, in the case of gluster, I haven't seen any references to PB-sized filesystems.  But gluster likely handles small files better.  Both are measurements of scale, but is one more scalable than the other?  Depending upon the use case, obviously yes.

	As Hadoop gains in importance, we're seeing more and more of these types of overly broad statements.  Consumers just need to be smart and do their research to find the correct bits for them.  I just hope folks actually dig into the details before spending their cash.

> Has anyone in the community done some actual benchmarks on the same hardware
> for some HDFS replacements?  I would love to see how true their claims are
> and what we need to do to beat them.

	I don't think it is a matter of 'beating' them.   Certain environments have needs that aren't met by HDFS and they will go out looking for alternatives.  Competition is also good in the sense that it drives people to improve the base stuff.  (The sudden importance of HA in certain camps is a great example of this.)

	Besides, for a lot of these commercial companies, the fact that they aren't submitting (viable) patches to be included in Apache Hadoop puts them at a disadvantage from the start.