Posted to user@hbase.apache.org by Chris K Wensel <ch...@wensel.net> on 2008/05/01 02:30:57 UTC

Re: comparing hbase backed by HDFS verses S3

Anything relating to S3 will be slower, so it probably shouldn't be
used as the default FileSystem for Hadoop.

It works great if you need to park data between cluster runs, assuming
you do not need applications external to Hadoop and the cluster to be
able to read the data, as data in S3FS is stuffed into S3 as blocks
(similar to HDFS).
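
For reference, pointing Hadoop at the S3 block filesystem (whether as the
default FileSystem or just for parking data) looks roughly like the sketch
below; the bucket name and keys are placeholders, and the property names
should be double-checked against the Hadoop release in use:

  <!-- hadoop-site.xml: use the S3 block filesystem as the default FileSystem -->
  <property>
    <name>fs.default.name</name>
    <value>s3://my-hbase-bucket</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_KEY</value>
  </property>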

Further,  once support for appends is added to Hadoop/HDFS, I am  
unsure if it will be inherited by S3FS. I think this is a critical  
issue for HBase.

Assuming you aren't expecting this cluster to live forever, maybe you  
should keep your authoritative data on s3 (native or s3fs) and just  
reload HBase on cluster init?
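
A rough sketch of that reload-on-init step, assuming the data was parked on
the S3 block filesystem (bucket, namenode address, and paths are
placeholders):

  # at cluster start, copy the parked HBase data from S3 back into HDFS
  bin/hadoop distcp s3://my-hbase-bucket/hbase hdfs://namenode:9000/hbase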

ckw

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/
http://www.cascading.org/

On Apr 30, 2008, at 1:02 PM, Clint Morgan wrote:

> We are considering using S3 as the DFS impl for hbase. I ran some
> benchmarks to get an idea for the performance differences. We are
> particularly interested in being able to serve data to users from
> hbase, so want low latency responses for getting 10s of rows.
>
> Each row ("transaction") has about 1K worth of data in about 5 columns
> in two families. I'm using HBASE-605 to maintain a secondary index on
> the transaction amount. There is also a "relation" to a customer
> table, so some reads will also do a get from this other table.
>
> First ran hbase backed by hdfs. Everything was run on EC2 small nodes.
> 1 node for Name node, 1 node for Data
> node, 1 node with Master and Region server, 1 node to load/read data
> from.
>
> Adding 50K transactions: [56610.166]ms
> Find all transactions: [35388.601]ms
> FindAll page 1: [125.058]ms (PageSize is 10)
> FindAll page 11: [71.89]ms
> FindAll page 51: [145.54]ms
> FindAll page 61: [268.486]ms
>
> FindAll sorted page 1: [139.881]ms
> FindAll sorted page 11: [1521.655]ms
> FindAll sorted page 21: [2729.641]ms
> FindAll sorted page 31: [3035.18]ms
>
> Then I ran hbase backed by s3. Everything else was the same.
>
> Adding 50K transactions: [104826.437]ms
> Findall transactions: [51622.039]ms
> Findall page 1: [5694.974]ms
> Findall page 11: [4878.234]ms
> Findall page 51: [5743.882]ms
> Findall page 61: [4167.133]ms
>
> Findall sorted page 1: [18535.306]ms
> Then the other sorted finds timed out on the RPC call.
>
> So to summarize:
> loading data: almost twice as slow
> A long scan is about 1.5 times slower
> short scans are over an order of magnitude slower
> and random reads (done on the sorted "scan") are over 2 orders of
> magnitude slower
>
> Do these results sound reasonable? Is S3 really that costly compared
> to HDFS? Thanks for your input.
> -clint
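
For context, the per-page figures quoted above are wall-clock timings.  A
minimal sketch of how such a number could be taken is below; PageSource.fetchPage()
is a hypothetical stand-in for the real HBase scanner / HBASE-605 secondary-index
read, not an actual HBase API:

  import java.util.List;

  // Minimal wall-clock timing sketch for the per-page figures quoted above.
  // PageSource.fetchPage() is a hypothetical stand-in for the real read path.
  public class PageTimer {

    interface PageSource {
      List<byte[]> fetchPage(int pageNumber, int pageSize) throws Exception;
    }

    static long timePage(PageSource source, int page, int pageSize) throws Exception {
      long start = System.currentTimeMillis();
      List<byte[]> rows = source.fetchPage(page, pageSize);
      long elapsed = System.currentTimeMillis() - start;
      System.out.println("FindAll page " + page + ": [" + elapsed + "]ms, "
          + rows.size() + " rows");
      return elapsed;
    }
  }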






Re: comparing hbase backed by HDFS verses S3

Posted by Chris K Wensel <ch...@wensel.net>.
No. It just means I have no idea how appends will be implemented and
how that will affect the other FileSystems.

On May 1, 2008, at 2:59 AM, Leon Mergen wrote:

> On Thu, May 1, 2008 at 2:30 AM, Chris K Wensel <ch...@wensel.net>  
> wrote:
>
>> Further,  once support for appends is added to Hadoop/HDFS, I am  
>> unsure if
>> it will be inherited by S3FS. I think this is a critical issue for  
>> HBase.
>>
>
> Does that mean that support for an S3-backed storage backend will be
> stopped?
>
> -- 
> Leon Mergen
> http://www.solatis.com

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/
http://www.cascading.org/





Re: comparing hbase backed by HDFS verses S3

Posted by Leon Mergen <le...@solatis.com>.
On Thu, May 1, 2008 at 2:30 AM, Chris K Wensel <ch...@wensel.net> wrote:

> Further,  once support for appends is added to Hadoop/HDFS, I am unsure if
> it will be inherited by S3FS. I think this is a critical issue for HBase.
>

Does that mean that support for an S3-backed storage backend will be stopped?

-- 
Leon Mergen
http://www.solatis.com

Re: comparing hbase backed by HDFS verses S3

Posted by Leon Mergen <le...@solatis.com>.
On Mon, May 5, 2008 at 6:12 PM, Clint Morgan <cm...@troove.net> wrote:

> Actually, I think a more simple approach will get what we want here:
> Give hbase a custom filesystem which writes to hdfs, then to s3, but
> reads just from hdfs.


Keep in mind that Amazon is about to release permanent file storage on EC2:

http://aws.typepad.com/aws/2008/04/block-to-the-fu.html

"These volumes can be thought of as raw, unformatted disk drives which can
be formatted and then used as desired (or even used as raw storage if you'd
like). Volumes can range in size from 1 GB on up to 1 TB; you can create and
attach several of them to each EC2 instance. They are designed for low
latency, high throughput access from Amazon EC2. Needless to say, you can
use these volumes to host a relational database."

This might save you or the Hadoop team from a lot of unnecessary work... or
aren't you talking about EC2 instances?
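
If and when such volumes do become available, one natural use (a sketch only,
with a made-up mount point) would be pointing the HDFS storage directories at
an attached volume instead of the instance's ephemeral disk:

  <!-- hadoop-site.xml: keep NameNode metadata and DataNode blocks on a
       persistent volume mounted at /vol1 (hypothetical mount point) -->
  <property>
    <name>dfs.name.dir</name>
    <value>/vol1/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/vol1/hadoop/dfs/data</value>
  </property>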

-- 
Leon Mergen
http://www.solatis.com

Re: comparing hbase backed by HDFS verses S3

Posted by "Jim R. Wilson" <wi...@gmail.com>.
> Actually, I think a more simple approach will get what we want here:
> Give hbase a custom filesystem which writes to hdfs, then to s3, but
> reads just from hdfs.

+1 :)

That would be fantastic for making geographically distributed live-backups.

-- Jim R. Wilson (jimbojw)


Re: comparing hbase backed by HDFS verses S3

Posted by stack <st...@duboce.net>.
Clint Morgan wrote:
> Actually, I think a more simple approach will get what we want here:
> Give hbase a custom filesystem which writes to hdfs, then to s3, but
> reads just from hdfs.
>   

That's an interesting idea, Clint.  What would you call it?  (hdfs3?)

...
> On Thu, May 1, 2008 at 5:14 PM, Clint Morgan <cm...@troove.net> wrote:
>   
>>>  What do you need for HBASE-50?  Is it sufficient forcing the cluster to go
>>> read-only flushing all in memory while the copy runs?
>>
>>  Hopefully we can minimize the time we are read-only. We'd like the
>>  system to behave as close to normally as possible while snapshotting.
>>  Is the only danger of allowing writes that some new hstores may get
>>  written so that we don't get a consistent view? Could we solve that by
>>  only copying files that were created before the time when flushing
>>  completed? Or just taking a listing at this point and only copying
>>  from this listing?
>>
>>     
HBase also removes files (for example, after a compaction it'll remove 
the old store files, and after a split is done with its parent, the 
parent is removed).  You could make a manifest that had all files at the 
time of snapshot, but we'd have to do something like rename files that 
are up for deletion, adding a '.deleted' suffix, or use something like 
hard links -- does this exist in hdfs? -- so that deletes would still be 
available to the copy task when it gets around to the copy.
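
A very rough sketch of the manifest idea follows (the paths and bucket are
placeholders, the listing is non-recursive, and the FileSystem factory and
listing calls shown may differ across Hadoop versions, so treat it as an
outline rather than working backup code):

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileUtil;
  import org.apache.hadoop.fs.Path;

  // Outline: take a one-time listing (the "manifest") of the HBase directory
  // and copy only the files named in it to the backup filesystem.  Files that
  // compactions or splits remove after the listing would need the
  // rename-to-'.deleted' (or hard link) trick discussed above to stay copyable.
  public class ManifestCopySketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem hdfs = FileSystem.get(conf);
      FileSystem backupFs = FileSystem.get(URI.create("s3://my-backup-bucket"), conf);

      Path hbaseRoot = new Path("/hbase");           // placeholder
      Path backupRoot = new Path("/backup/hbase");   // placeholder

      FileStatus[] manifest = hdfs.listStatus(hbaseRoot);  // a real version would recurse
      for (FileStatus entry : manifest) {
        Path src = entry.getPath();
        FileUtil.copy(hdfs, src, backupFs, new Path(backupRoot, src.getName()),
            false /* don't delete source */, conf);
      }
    }
  }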

>>  It would be nice for hbase to provide the orchestration when we need
>>  to restore from a snapshot. (Taking regions offline, copying over the
>>  appropriate parts of region and META, etc).
>>
>>     
Yes.  A tool that ran the 'fix' from backup and that verified and 
repaired the install.

>>  At this point I'm still not sure what types of failures to plan for.
>>  Any input on the sort of failures we should expect w.r.t data loss and
>>  corruption? Obviously name node failure, which we would handle with
>>  secondary name node. Should be able to recover from that just by bringing
>>  hdfs back online. So I guess the main concern is that we get a
>>  corrupted hdfs file.
Can you tolerate empty write-ahead-logs?  That is, loss of the in-memory 
content on regionserver crash because we don't yet have HADOOP-1700?


Otherwise, from what I've seen, failures generally come from our having 
problems writing hdfs.

St.Ack

Re: comparing hbase backed by HDFS verses S3

Posted by Clint Morgan <cm...@troove.net>.
Actually, I think a more simple approach will get what we want here:
Give hbase a custom filesystem which writes to hdfs, then to s3, but
reads just from hdfs.

This way we maintain a fresh backup on s3. Then when hdfs crashes we
can piece it back together from s3. Meanwhile, we could even try to
keep hbase responsive by reading from s3 while hdfs is down.
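
A bare-bones sketch of that write-through idea is below.  This is not a real
org.apache.hadoop.fs.FileSystem implementation; the class and bucket names are
made up, the FileSystem.get(URI, ...) factory may differ by Hadoop version,
and the real thing would also have to mirror renames and deletes:

  import java.io.IOException;
  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Bare-bones sketch of "write to hdfs and to s3, read only from hdfs".
  public class DualWriteSketch {
    private final FileSystem hdfs;
    private final FileSystem s3;

    public DualWriteSketch(Configuration conf) throws IOException {
      this.hdfs = FileSystem.get(conf);                                     // primary store
      this.s3 = FileSystem.get(URI.create("s3://my-backup-bucket"), conf);  // backup store
    }

    /** Every write goes to both filesystems so s3 stays a fresh copy of hdfs. */
    public void writeThrough(Path path, byte[] data) throws IOException {
      FSDataOutputStream primary = hdfs.create(path);
      FSDataOutputStream backup = s3.create(path);
      try {
        primary.write(data);
        backup.write(data);
      } finally {
        primary.close();
        backup.close();
      }
    }

    /** Reads are served from hdfs only; s3 is touched only for recovery. */
    public FSDataInputStream open(Path path) throws IOException {
      return hdfs.open(path);
    }
  }

Whether this is better done as a wrapper FileSystem or inside HBase itself is
an open question.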

While this does not provide a consistent snapshot at a given point in
time like HBASE-50 is asking, I think it would work for us as a data
backup.

cheers,
-clint

On Thu, May 1, 2008 at 5:14 PM, Clint Morgan <cm...@troove.net> wrote:
> >  What do you need for HBASE-50?  Is it sufficient forcing the cluster to go
> > read-only flushing all in memory while the copy runs?
>
>  Hopefully we can minimize the time we are read-only. We'd like the
>  system to behave as close to normally as possible while snapshotting.
>  Is the only danger of allowing writes that some new hstores may get
>  written so that we don't get a consistent view? Could we solve that by
>  only copying files that were created before the time when flushing
>  completed? Or just taking a listing at this point and only copying
>  from this listing?
>
>  It would be nice for hbase to provide the orchestration when we need
>  to restore from a snapshot. (Taking regions offline, copying over the
>  appropriate parts of region and META, etc).
>
>  At this point I'm still not sure what types of failures to plan for.
>  Any input on the sort of failures we should expect w.r.t data loss and
>  corruption? Obviously name node failure, which we would handle with
>  secondary name node. Should be able to recover from that just by bringing
>  hdfs back online. So I guess the main concern is that we get a
>  corrupted hdfs file.
>
>  cheers,
>  -clint
>

Re: comparing hbase backed by HDFS verses S3

Posted by Clint Morgan <cm...@troove.net>.
>  What do you need for HBASE-50?  Is it sufficient forcing the cluster to go
> read-only flushing all in memory while the copy runs?

Hopefully we can minimize the time we are read-only. We'd like the
system to behave as close to normally as possible while snapshotting.
Is the only danger of allowing writes that some new hstores may get
written so that we don't get a consistent view? Could we solve that by
only copying files that were created before the time when flushing
completed? Or just taking a listing at this point and only copying
from this listing?
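
A small sketch of that "only copy files created before the flush completed"
filter (assuming the FileSystem listing exposes per-file modification times,
which may depend on the Hadoop version; the names here are placeholders):

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  // Sketch: list the HBase directory once and keep only files whose last
  // modification time is older than the moment the memcache flush finished.
  public class SnapshotFilterSketch {
    static List<Path> filesBefore(FileSystem fs, Path dir, long flushDoneMillis)
        throws IOException {
      List<Path> toCopy = new ArrayList<Path>();
      for (FileStatus status : fs.listStatus(dir)) {
        if (status.getModificationTime() <= flushDoneMillis) {
          toCopy.add(status.getPath());  // written before the cutoff; safe to copy
        }
      }
      return toCopy;
    }
  }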

It would be nice for hbase to provide the orchestration when we need
to restore from a snapshot. (Taking regions offline, copying over the
appropriate parts of region and META, etc).

At this point I'm still not sure what types of failures to plan for.
Any input on the sort of failures we should expect w.r.t data loss and
corruption? Obviously name node failure, which we would handle with
secondary name node. Should be able to recover from that just by bringing
hdfs back online. So I guess the main concern is that we get a
corrupted hdfs file.
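
For the secondary name node piece, the usual arrangement is roughly as
sketched below (the host name and directory are placeholders, and the
checkpoint property names should be verified against the Hadoop release in
use):

  # conf/masters -- bin/start-dfs.sh starts the secondary namenode on this host
  secondary-nn.example.com

  <!-- hadoop-site.xml: where and how often the secondary checkpoints the image -->
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/vol1/hadoop/dfs/namesecondary</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>  <!-- seconds between checkpoints -->
  </property>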

cheers,
-clint


>  CopyFiles/distcp should be able to go between filesystems, running the copy
> in an MR job which is probably what you want.  You could go hdfs:// to
> s3://.  Upside would be that all of the s3 vagaries would be managed for you
> (size limits, etc.).  Downside would be that you'd have to put up an s3 dfs
> as copy sink.   Alternatively, maybe you can convince distcp to go between
> hdfs:// and http://?  Hbase files should never really be bigger than a 1G or
> so, so it should be 'safe'.
>
>  St.Ack

Re: comparing hbase backed by HDFS verses S3

Posted by stack <st...@duboce.net>.
Thanks for trying this "interesting" experiment, Clint.  I'm a little 
surprised the thing worked at all (smile).

What do you need for HBASE-50?  Is it sufficient forcing the cluster to 
go read-only flushing all in memory while the copy runs?

CopyFiles/distcp should be able to go between filesystems, running the 
copy in an MR job which is probably what you want.  You could go hdfs:// 
to s3://.  Upside would be that all of the s3 vagaries would be managed 
for you (size limits, etc.).  Downside would be that you'd have to put 
up an s3 dfs as copy sink.   Alternatively, maybe you can convince 
distcp to go between hdfs:// and http://?  Hbase files should never 
really be bigger than a 1G or so, so it should be 'safe'.
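
A concrete (untested) sketch of that, with placeholder bucket and namenode
addresses:

  # backup copy, run as an MR job: hdfs -> S3 block filesystem
  bin/hadoop distcp hdfs://namenode:9000/hbase s3://my-backup-bucket/hbase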

St.Ack

Clint Morgan wrote:
> Thanks for the input as it confirmed my suspicions.
>
> We were debating running off of S3 just to minimize moving parts. But
> it does not look feasible.
>
> We are wanting the cluster to "live forever" in that once the app is
> live, hbase will always be needed to serve data.
>
> A primary concern is data loss, so will probably still want to use S3
> as a backup medium. Moreover, we'd like to be able to quickly recover
> from HDFS failures to minimize downtime. This makes HBASE-50 look like
> the way to go.
>
> cheers,
> -clint
>


Re: comparing hbase backed by HDFS verses S3

Posted by Clint Morgan <cm...@troove.net>.
Thanks for the input as it confirmed my suspicions.

We were debating running off of S3 just to minimize moving parts. But
it does not look feasible.

We are wanting the cluster to "live forever" in that once the app is
live, hbase will always be needed to serve data.

A primary concern is data loss, so will probably still want to use S3
as a backup medium. Moreover, we'd like to be able to quickly recover
from HDFS failures to minimize downtime. This makes HBASE-50 look like
the way to go.

cheers,
-clint

On Wed, Apr 30, 2008 at 5:30 PM, Chris K Wensel <ch...@wensel.net> wrote:
> Anything relating to S3 will be slower, so it probably shouldn't be used as
> the default FileSystem for Hadoop.
>
>  It works great if you need to park data between cluster runs, assuming you
> do not need applications external to Hadoop and the cluster to be able to
> read the data, as data in S3FS is stuffed into S3 as blocks (similar to
> HDFS).
>
>  Further,  once support for appends is added to Hadoop/HDFS, I am unsure if
> it will be inherited by S3FS. I think this is a critical issue for HBase.
>
>  Assuming you aren't expecting this cluster to live forever, maybe you
> should keep your authoritative data on s3 (native or s3fs) and just reload
> HBase on cluster init?
>
>  ckw
>
>  Chris K Wensel
>  chris@wensel.net
>  http://chris.wensel.net/
>  http://www.cascading.org/
>