Posted to user@hbase.apache.org by Ramu M S <ra...@gmail.com> on 2013/10/25 06:45:46 UTC

Linear Scalability in HBase

Hi All,

I am running HBase 0.94.6 with 8 region servers and getting a throughput of
around 15K read OPS and 20K write OPS per server in YCSB tests. The table
is pre-created with 8 regions per region server and holds 120 million
records of 700 bytes each.

I increased the number of region servers to 25, pre-created the table with 8
regions per region server, and loaded 375 million records. I'm now getting a
throughput of 12K read OPS and 19K write OPS per server: a drop of 20% per
server for reads and about 5% per server for writes.
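
To make the comparison concrete, here is a minimal sketch that works out
aggregate throughput and scaling efficiency from the rounded per-server
figures quoted above. The function and variable names are illustrative only,
and "efficiency" is defined here (as an assumption) as measured aggregate
throughput divided by what perfectly linear scaling from the 8-server
baseline would give.

```python
# Per-server YCSB throughput as quoted in the post (ops/sec, rounded).
SMALL = {"servers": 8, "read": 15_000, "write": 20_000}
LARGE = {"servers": 25, "read": 12_000, "write": 19_000}


def scaling_report(small, large, op):
    """Compare measured aggregate throughput with ideal linear scaling."""
    aggregate_small = small["servers"] * small[op]
    aggregate_large = large["servers"] * large[op]
    # Ideal linear scaling keeps per-server throughput constant.
    ideal_large = large["servers"] * small[op]
    return {
        "aggregate_small": aggregate_small,
        "aggregate_large": aggregate_large,
        "speedup": aggregate_large / aggregate_small,
        "efficiency": aggregate_large / ideal_large,
        "per_server_drop": 1 - large[op] / small[op],
    }


for op in ("read", "write"):
    r = scaling_report(SMALL, LARGE, op)
    print(
        f"{op}: {r['aggregate_small']} -> {r['aggregate_large']} ops/sec, "
        f"speedup {r['speedup']:.2f}x, efficiency {r['efficiency']:.0%}, "
        f"per-server drop {r['per_server_drop']:.0%}"
    )
```

With these inputs the server count grows 3.125x, aggregate reads grow 2.5x
(80% efficiency), and aggregate writes grow about 2.97x (95% efficiency).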

Load is evenly distributed across all region servers in both scenarios,
for reads and writes.

I wanted to understand whether HBase scales performance linearly. Are there
any configurations I'm missing? Are there factors that might affect linear
scalability?

Regards,
Ramu

Re: Linear Scalability in HBase

Posted by Asaf Mesika <as...@gmail.com>.
That seems like too many client threads. How many MB/sec did you get on
that one RS?

On Friday, October 25, 2013, Vladimir Rodionov wrote:

> You cannot saturate a region server with one client (unless perhaps you
> use hbase-async) if all data is cached in RAM.
> In our performance tests we ran 10 clients (on different hosts) with
> 30 threads each to max out one RS when all data
> is in cache (block, page, etc.).

RE: Linear Scalability in HBase

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
You cannot saturate a region server with one client (unless perhaps you use hbase-async) if all data is cached in RAM.
In our performance tests we ran 10 clients (on different hosts) with 30 threads each to max out one RS when all data
is in cache (block, page, etc.).
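
Whether "all data is cached in RAM" applies to this test depends on how much
data each server holds. A rough sketch, using only the record counts and the
700-byte record size from the thread, is below. It computes raw payload only:
HBase stores the row key, column family, qualifier, and timestamp with every
cell, so the real footprint is larger, and the heap and block cache sizes are
not given in the thread. The helper name is illustrative.

```python
RECORD_BYTES = 700  # record size quoted in the thread

# (total records, region servers) for the two setups described in the thread.
SETUPS = {"8 servers": (120_000_000, 8), "25 servers": (375_000_000, 25)}


def raw_gib_per_server(records, servers, record_bytes=RECORD_BYTES):
    """Raw payload per server in GiB, ignoring cell overhead and replication."""
    return records / servers * record_bytes / 2**30


for name, (records, servers) in SETUPS.items():
    print(
        f"{name}: {records // servers:,} records/server, "
        f"{raw_gib_per_server(records, servers):.1f} GiB raw"
    )
```

Both setups come out at 15 million records and roughly 9.8 GiB of raw payload
per server, so the per-server data volume is the same in the two tests.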

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Re: Linear Scalability in HBase

Posted by Ramu M S <ra...@gmail.com>.
Hi,

For me, scalability means achieving the same throughput and latency as the
number of clients increases.

In my case the data set grows with the number of clients. That's why I
vary both clients and region servers.

I'm trying to identify how the cluster should grow to handle data from more
clients so that throughput and latency stay within defined limits.

Currently the limit is 15K OPS throughput and 1 ms latency.

To test, I have kept the data increase at around 15 million records per
server.

Each YCSB client actually runs 32 threads, so each added server means 15
million more records and 32 more client threads.

All machines are physical servers.

1) Read and write latency is around 1 ms in the first case, whereas in the
second case it is a little higher, at 1.1 to 1.2 ms.

2) Keeping the same number of clients as in the first case, latency dropped
to 0.7 ms, but throughput came down further, to just 9K OPS.
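
One way to read these two results together is Little's Law (throughput =
requests in flight / latency). The sketch below is an illustration under
stated assumptions, not a model of YCSB itself: it assumes each of the 32
threads per client is synchronous with at most one request in flight, that
the quoted latency is the full per-operation time seen by a thread, and that
the 9K OPS figure is per server. The function name is hypothetical.

```python
def closed_loop_bound(clients, threads_per_client, latency_ms, servers):
    """Upper bound on ops/sec per server for a closed-loop load generator.

    Little's Law: throughput = requests in flight / latency. Each
    synchronous thread contributes at most one request in flight.
    """
    in_flight = clients * threads_per_client
    cluster_ops_per_sec = in_flight / (latency_ms / 1000.0)
    return cluster_ops_per_sec / servers


# 8 clients against 8 servers at about 1 ms.
print(round(closed_loop_bound(8, 32, 1.0, 8)))
# 25 clients against 25 servers at about 1.2 ms.
print(round(closed_loop_bound(25, 32, 1.2, 25)))
# 8 clients against 25 servers at about 0.7 ms.
print(round(closed_loop_bound(8, 32, 0.7, 25)))
```

Under these assumptions, 8 clients spread over 25 servers cannot offer more
than roughly 14.6K OPS per server at 0.7 ms, whatever the servers could
sustain, so lower per-server throughput with fewer clients is expected. The
measured figures sit well below all three bounds, which may point to time
spent on the client side between operations rather than to the region
servers.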

For the tests, I'm running both the clients and the region servers on the
same machines. In the 8-server scenario I also tried running the clients on
different machines, but the results were almost the same as with the
clients on the same machines.

Ganglia shows that system load is around 30% in both scenarios.

What I want to understand is how to grow the cluster to meet both the
throughput and the latency requirements.

Regards,
Ramu

Re: Linear Scalability in HBase

Posted by Michael Segel <ms...@hotmail.com>.
How do you define linear scalability?
Is it that, as the cluster grows, the time it takes to fetch data stays roughly constant?

In your test… you’re changing both the number of users and the size at the same time. 

And here’s a bigger question… are these physical machines or are they on AWS? 


1)  How long does it take to do a get() on the first, smaller cluster?
    How long does it take to do a get() on the second, larger cluster?

2) If you increase the size and data of the cluster, running the test with the same number of clients, what do you see? 



Re: Linear Scalability in HBase

Posted by Ramu M S <ra...@gmail.com>.
Ted,

I am running 8 clients in the first setup and 25 in the second. The clients
run on the same machines as the region servers.

Regards,
Ramu

Re: Linear Scalability in HBase

Posted by Ted Yu <yu...@gmail.com>.
How many YCSB clients were used in each setup?

Thanks
