Posted to user@hbase.apache.org by David Parks <da...@yahoo.com> on 2012/10/10 10:16:00 UTC

How well does HBase run on low/medium memory/cpu clusters?

Looking at the AWS MapReduce version of HBase, it doesn't even give an
option to run it on lower-end hardware.

I am considering HBase as an alternative to one large table we have in MySQL
which is causing problems. It's 50M rows, a pretty straightforward set of
product items.

The challenge is that I need to do 10+ range scans a day, each over about 7M
items, checking for updates. This is ideal for HBase, but hell for
MySQL (a join of a 7M row table with a 50M row table is giving us
fits-a-plenty).
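
For concreteness, here is roughly what such a range scan looks like with the
HBase Java client (the table name and key scheme are made up for
illustration):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UpdateCheckScan {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "products");  // hypothetical table

            // Scan one contiguous slice of the keyspace: start key is
            // inclusive, stop key exclusive ('}' is '|' + 1, so this
            // covers every key with the prefix "feed42|").
            Scan scan = new Scan(Bytes.toBytes("feed42|"),
                                 Bytes.toBytes("feed42}"));
            scan.setCaching(1000);  // rows fetched per RPC round-trip

            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result row : scanner) {
                    // compare the stored item against the incoming update here
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }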

But beyond the daily range scans, the actual workload on the boxes should be
modest: just random-access reads. So it doesn't seem like the memory/CPU
requirements should be significant...

But here's where I can't find much information: as someone reasonably
new to HBase (I read a book, did the examples), am I missing anything in my
thinking?

David


RE: How well does HBase run on low/medium memory/cpu clusters?

Posted by Anoop Sam John <an...@huawei.com>.
> But perhaps I don't know enough. Is HBase typically CPU bound? Memory bound? Disk bound?

I would say HBase (the RegionServers) is memory bound more than anything else.
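
One practical consequence: keep the big periodic scans from churning the
block cache, so the memory serving your random reads stays hot. A minimal
sketch with the Java client (values are illustrative, not tuning advice):

    import org.apache.hadoop.hbase.client.Scan;

    public class BigScanSettings {
        // Build a scan for the large periodic pass that is gentle on
        // RegionServer memory.
        static Scan bigScan(byte[] startRow, byte[] stopRow) {
            Scan scan = new Scan(startRow, stopRow);
            scan.setCaching(1000);       // batch rows per round-trip
            scan.setCacheBlocks(false);  // don't evict hot random-read blocks
            return scan;
        }
    }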

-Anoop-

RE: How well does HBase run on low/medium memory/cpu clusters?

Posted by David Parks <da...@yahoo.com>.
Ah, the question I have isn't about schema design. What exists as multiple
tables in MySQL would probably become one table in HBase. My comment about
"joining" a 7M and a 50M row table in MySQL is because of our daily "scan"
to update that range of 7M rows. In MySQL, that's a CSV import followed by
an update (requiring a nasty join). This would go down pretty well with a
properly designed rowkey in HBase and perhaps a MapReduce job for the big
update.
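
For illustration (all names here are hypothetical): a rowkey like
feedId|itemId makes those 7M rows one contiguous range, and each CSV line
becomes a plain idempotent write instead of a join:

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DailyCsvUpdate {
        // Rowkey = feedId + "|" + itemId: every row for one feed sorts
        // into a single contiguous range, so the update needs no join --
        // each CSV line is just an overwrite of one row.
        static void upsert(HTable table, String feedId, String itemId,
                           String price) throws IOException {
            Put put = new Put(Bytes.toBytes(feedId + "|" + itemId));
            put.add(Bytes.toBytes("d"), Bytes.toBytes("price"),
                    Bytes.toBytes(price));
            table.put(put);  // buffered client-side when auto-flush is off
        }
    }

With auto-flush turned off on the HTable, those puts are buffered and flushed
in batches, which is the same behaviour a MapReduce job over the CSV would
give.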

My question is more about what kind of hardware I really need in order to
support a reasonable rate of random-access lookups, and the occasional
range scan over, say, 7M rows.

I would like to think that a cluster of dual-core, 1.7 GB RAM boxes could
perform reasonably well. That is to say, I don't need an expensive cluster
of 15 GB RAM boxes.

But perhaps I don't know enough. Is HBase typically CPU bound? Memory bound?
Disk bound? Assume a reasonable rate of random reads (from a web app, scaled
to the cluster size) plus roughly hourly range scans over 7M rows.

Dave




Re: How well does HBase run on low/medium memory/cpu clusters?

Posted by Michael Segel <mi...@hotmail.com>.
Well you don't want to do joins in HBase.

There are a couple of ways to do this; however, based on what you have said... the larger issue for either solution (HBase or MySQL) would be your schema design.

Basically you said you have Table A with 50 million rows and Table B with 7 million rows.

You don't really talk about any indexes or foreign-key constraints between the two tables,
or what that data is...

Can you provide more information? 

Right now you haven't provided enough information to solve your problem.



Re: How well does HBase run on low/medium memory/cpu clusters?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi David,

For production, good hardware is recommended. But
for testing, you can make do with almost anything.

I'm running a cluster in production with 6 region servers. My smallest
computer is a P4 with only 500 MB of memory... I'm not saying that it's
recommended, but it's working.

I can't use big cache values or I run out of memory and the region
servers shut down (though Hadoop keeps running). But it's working. I have a
few tables. My biggest one is 15M rows and growing every day. I have
pre-split it a lot to make sure the workload is shared between all the
servers (something like the sketch below).

I'm expecting to be at about 30M rows by the end of the week.
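
Pre-splitting happens when the table is created; with the old Java admin API
it looks roughly like this (the table name, column family, and split points
are made up):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            HTableDescriptor desc = new HTableDescriptor("mytable");
            desc.addFamily(new HColumnDescriptor("d"));

            // Create the table already split into five regions so the
            // load is spread across the region servers from day one.
            byte[][] splitPoints = {
                Bytes.toBytes("2"), Bytes.toBytes("4"),
                Bytes.toBytes("6"), Bytes.toBytes("8")
            };
            admin.createTable(desc, splitPoints);
            admin.close();
        }
    }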

The only big computer I have is my master (8 CPUs, 12 GB), which is also
hosting a region server (not recommended for a proper production
setup).

JM
