Posted to user@hbase.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2011/11/11 22:42:37 UTC

HBase cluster on heterogeneous filesystems

Hello,

I was wondering if anyone has done an experiment with HBase or HDFS/MR where machines in the cluster have heterogeneous underlying file systems?
e.g.,
* 10 nodes with xfs
* 10 nodes with ext3
* 10 nodes with ext4

The goal is to compare the performance of MapReduce jobs reading from and writing to HBase (or just HDFS).


And does anyone have any reason to believe doing the above would be super risky and cause data loss?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

Re: HBase cluster on heterogeneous filesystems

Posted by "Jeronimo de A. Barros" <je...@i2.com.br>.
Hello,

On Fri, 11 Nov 2011, Edward Capriolo wrote:

> I have found that ext3 performance gets noticeably poor as disks get
> full.

Have you formatted your ext3 partitions with the "dir_index" option?
("Use hashed b-trees to speed up lookups in large directories.")

Does anyone know whether the "dir_index" and "extent" formatting options
and the "noatime" mount option really give better disk performance?
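One way to approach this question is to measure it on the hardware in question. The following is only a minimal sketch, not a rigorous benchmark: it fills a directory with many small files and then times one stat() per file, which is the kind of lookup that "dir_index" is meant to speed up. The function names and the "blk_" file prefix are made up for illustration. Files that were just created are likely to be served from the kernel's caches, so a meaningful comparison would run the two steps separately, with the caches dropped or the file system remounted in between.

```python
import os
import tempfile
import time


def populate(directory, n_files):
    """Create n_files empty files with predictable names in `directory`."""
    for i in range(n_files):
        with open(os.path.join(directory, "blk_%08d" % i), "w"):
            pass


def time_stats(directory, n_files):
    """Return the seconds taken to stat() each file created by populate()."""
    start = time.perf_counter()
    for i in range(n_files):
        os.stat(os.path.join(directory, "blk_%08d" % i))
    return time.perf_counter() - start


if __name__ == "__main__":
    # A temporary directory is used here only so the sketch runs anywhere;
    # for a real test, point it at a directory on the partition under test.
    with tempfile.TemporaryDirectory() as d:
        populate(d, 10000)
        print("stat pass took %.4f s" % time_stats(d, 10000))
```

Running the same script against partitions formatted or mounted with different options gives numbers that can at least be compared with each other.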

 	Thanks for any hint.

Jero

Re: HBase cluster on heterogeneous filesystems

Posted by Edward Capriolo <ed...@gmail.com>.
On Fri, Nov 11, 2011 at 4:42 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hello,
>
> I was wondering if anyone has done an experiment with HBase or HDFS/MR
> where machines in the cluster have heterogeneous underlying file systems?
> e.g.,
> * 10 nodes with xfs
> * 10 nodes with ext3
> * 10 nodes with ext4
>
> The goal is to compare the performance of MapReduce jobs reading from and
> writing to HBase (or just HDFS).
>
>
> And does anyone have any reason to believe doing the above would be super
> risky and cause data loss?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/


Since Hadoop abstracts you from the filesystem guts, the underlying file
systems can be mixed and matched. You can even mix and match across the
disks on a single machine.
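To see which file system actually sits under each data directory on a node, something like the following sketch could be used. It assumes a Linux host, where /proc/mounts lists one mount per line as "device mount_point fs_type options ...". The sample mount table and the directory names are hypothetical; on a real DataNode the directories would be whatever is listed in its data directory setting (dfs.data.dir in Hadoop releases of this era).

```python
def filesystem_for(path, mounts_text):
    """Return (mount_point, fs_type) of the mount containing `path`.

    `mounts_text` is text in /proc/mounts format. The longest matching
    mount point wins. Returns None if nothing matches. Escaped characters
    in mount points (such as \\040 for a space) are not decoded here.
    """
    best = None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if wanted.startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


# Hypothetical mount table for one node with mixed file systems.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,relatime 0 0
/dev/sdb1 /data/1 ext4 rw,noatime 0 0
/dev/sdc1 /data/2 xfs rw,noatime 0 0
"""

if __name__ == "__main__":
    # Hypothetical data directories, one per disk.
    for data_dir in ("/data/1/dfs/data", "/data/2/dfs/data"):
        print(data_dir, filesystem_for(data_dir, SAMPLE_MOUNTS))
```

On a live node, the sample text can be replaced with open("/proc/mounts").read().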

I have found that ext3 performance gets noticeably poor as disks get full.
I captured system vitals before and after an ext3-to-ext4 upgrade:

http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/a_great_reason_to_use

Also if you want to get the most out of your disks read this:

http://allthingshadoop.com/2011/05/20/faster-datanodes-with-less-wait-io-using-df-instead-of-du/
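Going only by the title of the post linked above (its contents are not reproduced in this thread), the idea is that a "du"-style measurement is far more I/O-hungry than a "df"-style one. The sketch below illustrates that difference in general terms and is not the post's actual method; the function names are made up. The du-style function has to stat every file under a directory, while the df-style function asks the file system for its counters in a single call.

```python
import os
import shutil


def used_bytes_du_style(root):
    """Walk the tree and sum file sizes: one lstat() per file, like `du`.

    Note: this sums apparent sizes (st_size); `du` by default reports
    allocated blocks, so the numbers can differ.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def used_bytes_df_style(path):
    """Ask the file system for its usage counters: one call, like `df`.

    Note: this covers the whole file system that holds `path`, not just
    the directory itself.
    """
    return shutil.disk_usage(path).used
```

The df-style number is only a fair substitute when the file system is dedicated to the data directory, since it also counts everything else stored on that file system.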

XFS is usually described as on par with or slightly better than ext4.
However, anecdotally, most hardcore sysadmins I know can recount one XFS
"I lost my superblock" story :)

Re: HBase cluster on heterogeneous filesystems

Posted by Fuad Efendi <fu...@efendi.ca>.
Hi Otis,


I had a super ugly experience with Amazon EC2 virtual nodes, and I even found bug reports related to Ubuntu... Problems with unpredictable "wall time", when everything stops and ZooKeeper sessions expire...
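Stalls of this kind can be observed from inside a process with a simple pause detector. The following is a minimal sketch under stated assumptions, not part of ZooKeeper or HBase: it sleeps for a short interval in a loop and records how much longer than requested each sleep took. The function name and parameters are hypothetical, and the threshold would be chosen relative to the configured ZooKeeper session timeout.

```python
import time


def detect_pauses(interval_s, iterations, threshold_s,
                  sleep=time.sleep, clock=time.monotonic):
    """Return the overshoot (in seconds) of every sleep that ran long.

    Each iteration sleeps for interval_s and measures the elapsed time on
    a monotonic clock. If the process (or the whole virtual machine) was
    stalled, the elapsed time exceeds interval_s by roughly the length of
    the stall. `sleep` and `clock` are injectable so the logic can be
    exercised without real waiting.
    """
    pauses = []
    for _ in range(iterations):
        start = clock()
        sleep(interval_s)
        overshoot = (clock() - start) - interval_s
        if overshoot > threshold_s:
            pauses.append(overshoot)
    return pauses


if __name__ == "__main__":
    # Watch for stalls longer than one second over about ten seconds.
    print(detect_pauses(interval_s=0.1, iterations=100, threshold_s=1.0))
```

A run of long overshoots on a virtual node, with none on a dedicated server under the same load, would be consistent with the behaviour described above.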

I don't have any problems with dedicated servers and CentOS.

It is super risky to do any kind of business without (sorry for the cliche) corporate standards (a knowledge base of corporate problems and corporate workarounds, lol); it is very specific... I suggest "stick with standards": it will lower TCO ;)



-Fuad




