Posted to user@hbase.apache.org by Adam Silberstein <si...@yahoo-inc.com> on 2009/10/06 17:59:30 UTC

random read/write performance

Hi,

Just wanted to give a quick update on our HBase benchmarking efforts at
Yahoo.  The basic use case we're looking at is:

1K records

20GB of records per node (and 6GB of memory per node, so data is not
memory resident)

Workloads that do random reads/writes (e.g. 95% reads, 5% writes).

Multiple clients doing the reads/writes (i.e. 50-200)

Measure throughput vs. latency, and see how high we can push the
throughput.  

Note that although we want to see where throughput maxes out, the
workload is random, rather than scan-oriented.

 

I've been tweaking our HBase installation based on advice I've
read/gotten from a few people.  Currently, I'm running 0.20.0, have heap
size set to 6GB per server, and have iCMS off.  I'm still using the REST
server instead of the java client.  We're about to move our benchmarking
tool to java, so at that point we can use the java API and turn off the
WAL as well.  If anyone has more suggestions for this workload (either
things to try while still using REST, or things to try once I have a
java client), please let me know.
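
For reference, disabling the WAL from the java client would look roughly
like the sketch below, assuming Put.setWriteToWAL() is there in 0.20.0
(table, family and qualifier names are made up for illustration):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NoWalSample {
      public static void main(String[] args) throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        HTable table = new HTable(conf, "usertable");   // hypothetical table name
        Put put = new Put(Bytes.toBytes("user0001"));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("field0"), Bytes.toBytes("value"));
        put.setWriteToWAL(false);  // skip the write-ahead log: faster, but edits are lost on a crash
        table.put(put);
        Get get = new Get(Bytes.toBytes("user0001"));
        Result result = table.get(get);                 // random read of the same row
        System.out.println(result.isEmpty() ? "miss" : "hit");
      }
    }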

 

Given all that, I'm currently seeing maximal throughput of about 300
ops/sec/server.  Has anyone with a similar disk-resident and random
workload seen drastically different numbers, or guesses for what I can
expect with the java client?

 

Thanks!

Adam


Re: random read/write performance

Posted by Schubert Zhang <zs...@gmail.com>.
There is a performance evaluation result here:
http://cloudepr.blogspot.com/2009/08/hbase-0200-performance-evaluation.html
That benchmark does not use LZO; we will try it with LZO.
On Sat, Oct 10, 2009 at 11:21 AM, stack <st...@duboce.net> wrote:

> I should have said, to figure the count of regions, see the UI (HBase puts
> up UIs on port 60010 for the master by default... regionservers on 60030).
> St.Ack
>
> On Wed, Oct 7, 2009 at 10:14 PM, stack <st...@duboce.net> wrote:
>
> > On Tue, Oct 6, 2009 at 10:52 PM, Adam Silberstein <
> silberst@yahoo-inc.com>wrote:
> >
> >> Hey,
> >> Thanks for all the info...
> >>
> >> First, a few details to clarify my use case:
> >> -I have 6 region servers.
> >> -I loaded a total of 120GB in 1K records into my table, so 20GB per
> >> server.  I'm not sure how many regions that has created.
> >>
> >
> > You could run the rowcounter mapreduce job to see:
> >
> > ./bin/hadoop jar hbase.jar rowcounter
> >
> > That'll dump usage.  You pass a tablename, column and a tmpdir IIRC.
> >
> >
> >
> >> -My reported numbers are on workloads taking place once the 120GB is in
> >> place, rather than while loading the 120GB.
> >> -I've run with combinations of 50,100,200 clients hitting the REST
> >> server.  So that's e.g. 200 clients across all region servers, not per
> >> region server.  Each client just repeatedly a) generates a random record
> >> known to exist, and b) reads or updates it.
> >>
> >
> > Our client can be a bottleneck.  At its core is hadoop RPC with its
> single
> > connection to each server over which request/response are multiplexed.
>  As
> > per J-D's suggestion, you might be able to get more throughput by upping
> the
> > REST server count (or, should be non-issue when you move to java api).
> >
> > REST server base64's everything too so this'll add a bit of friction.
> >
> >
> >
> >> -I'm interested in both throughput and latency.  First, at medium
> >> throughputs (i.e. not at maximum capacity) what are average read/write
> >> latencies.  And then, what is the maximum possible throughput, even as
> >> that causes latencies to be very high.  What is the throughput wall?
> >> Plotting throughput vs. latency for different target throughputs reveals
> >> both of these.
> >>
> >
> > Good stuff.  Let us know how else we can help out.
> >
> >
> > When I have 50 clients across 6 region server, this is fairly close to
> >> your read throughput experiment with 8 clients on 1 region server.  Your
> >> 2.4 k/sec throughput is obviously a lot better than what I'm seeing at
> >> 300/sec.  Since you had 10GB loaded, is it reasonable to assume that
> >> ~50% of the reads were from memory?
> >
> >
> > I think I had 3G per RS with 40% given over to cache. I had 1RS so not
> too
> > much coming from hbase cache (OS cache probably played a big factor).
> >
> >
> >
> >>  In my case, with 20GB loaded and
> >> 6GB heapspace, I assume ~30% was served from memory.   I haven't run
> >> enough tests on different size tables to estimate the impact of having
> >> data in memory, though intuitively, in the time it takes to read a
> >> record from disk, you could read several from memory.  And the more the
> >> data is disk resident, the more the disk contention.
> >>
> >> Yes.
> >
> >
> >
> >> Finally, I haven't tried LZO or increasing the logroll multiplier yet,
> >>
> >
> > LZO would be good.  Logroll multiplier is more about writing which you
> are
> > doing little of so maybe its ok at default?
> >
> >
> >
> >> and I'm hoping to move to the java client soon.  As you might recall,
> >> we're working toward a benchmark for cloud serving stores.  We're
> >> testing the newest version of our tool now.  Since it's in java, we'll
> >> be able to use it with HBase.
> >>
> >
> > Tell us more?  You are comparing HBase to others with a tool of your
> > writing?
> >
> >
> > I'll report back when I find out how much these changes close the
> >> performance gap, and how much seems inherent when much of the data is
> >> disk resident.
> >>
> >>
> > Thanks Adam.
> > St.Ack
> >
> >
> >> -Adam
> >>
> >> -----Original Message-----
> >> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> >> stack
> >> Sent: Tuesday, October 06, 2009 1:08 PM
> >> To: hbase-user@hadoop.apache.org
> >> Subject: Re: random read/write performance
> >>
> >> Hey Adam:
> >>
> >> Thanks for checking in.
> >>
> >> I just did some rough loadings on a small (old hardware) cluster using
> >> less
> >> memory per regionserver than you.  Its described on this page:
> >> http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation.  Random
> >> writing
> >> 1k records with the PerformanceEvaluation script to a single
> >> regionserver, I
> >> can do about 8-10k/writes/second on average using the 0.20.1 release
> >> candidate 1 with a single client.  Sequential writes are about the same
> >> speed usually.  Random reads are about 650/second on average with single
> >> client and about 2.4k/second on average with 8 concurrent clients.
> >>
> >> So it seems like you should be able to do better than
> >> 300ops/persecond/permachine -- especially if you can do the java api.
> >>
> >> This single regionserver was carrying about 50 regions.  Thats about
> >> 10GB.
> >> How many regions loaded in your case?
> >>
> >> If throughput is important to you, lzo should help (as per J-D).
> >> Turning
> >> off WAL will also help with write throughput but that might not be what
> >> you
> >> want.  Random-read-wise, the best thing you can do is give it RAM (6G
> >> should
> >> be good).
> >>
> >> Is that 50-200 clients per regionserver or for the overall cluster?  If
> >> per
> >> regionserver, I can try that over here.   I can try with bigger regions
> >> if
> >> you'd like -- 1G regions -- to see if that'd help your use case (if you
> >> enable lzo, this should up your throughput and shrink the number of
> >> regions
> >> any one server is hosting).
> >>
> >> St.Ack
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Oct 6, 2009 at 8:59 AM, Adam Silberstein
> >> <si...@yahoo-inc.com>wrote:
> >>
> >> > Hi,
> >> >
> >> > Just wanted to give a quick update on our HBase benchmarking efforts
> >> at
> >> > Yahoo.  The basic use case we're looking at is:
> >> >
> >> > 1K records
> >> >
> >> > 20GB of records per node (and 6GB of memory per node, so data is not
> >> > memory resident)
> >> >
> >> > Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
> >> >
> >> > Multiple clients doing the reads/writes (i.e. 50-200)
> >> >
> >> > Measure throughput vs. latency, and see how high we can push the
> >> > throughput.
> >> >
> >> > Note that although we want to see where throughput maxes out, the
> >> > workload is random, rather than scan-oriented.
> >> >
> >> >
> >> >
> >> > I've been tweaking our HBase installation based on advice I've
> >> > read/gotten from a few people.  Currently, I'm running 0.20.0, have
> >> heap
> >> > size set to 6GB per server, and have iCMS off.  I'm still using the
> >> REST
> >> > server instead of the java client.  We're about to move our
> >> benchmarking
> >> > tool to java, so at that point we can use the java API.  At that
> >> point,
> >> > I want to turn off WAL as well.  If anyone has more suggestions for
> >> this
> >> > workload (either things to try while still using REST, or things to
> >> try
> >> > once I have a java client), please let me know.
> >> >
> >> >
> >> >
> >> > Given all that, I'm currently seeing maximal throughput of about 300
> >> > ops/sec/server.  Has anyone with a similar disk-resident and random
> >> > workload seen drastically different numbers, or guesses for what I can
> >> > expect with the java client?
> >> >
> >> >
> >> >
> >> > Thanks!
> >> >
> >> > Adam
> >> >
> >> >
> >>
> >
> >
>

Re: random read/write performance

Posted by stack <st...@duboce.net>.
I should have said, to figure the count of regions, see the UI (HBase puts
up UIs on port 60010 for the master by default... regionservers on 60030).
St.Ack

On Wed, Oct 7, 2009 at 10:14 PM, stack <st...@duboce.net> wrote:

> On Tue, Oct 6, 2009 at 10:52 PM, Adam Silberstein <si...@yahoo-inc.com>wrote:
>
>> Hey,
>> Thanks for all the info...
>>
>> First, a few details to clarify my use case:
>> -I have 6 region servers.
>> -I loaded a total of 120GB in 1K records into my table, so 20GB per
>> server.  I'm not sure how many regions that has created.
>>
>
> You could run the rowcounter mapreduce job to see:
>
> ./bin/hadoop jar hbase.jar rowcounter
>
> That'll dump usage.  You pass a tablename, column and a tmpdir IIRC.
>
>
>
>> -My reported numbers are on workloads taking place once the 120GB is in
>> place, rather than while loading the 120GB.
>> -I've run with combinations of 50,100,200 clients hitting the REST
>> server.  So that's e.g. 200 clients across all region servers, not per
>> region server.  Each client just repeatedly a) generates a random record
>> known to exist, and b) reads or updates it.
>>
>
> Our client can be a bottleneck.  At its core is hadoop RPC with its single
> connection to each server over which request/response are multiplexed.  As
> per J-D's suggestion, you might be able to get more throughput by upping the
> REST server count (or, should be non-issue when you move to java api).
>
> REST server base64's everything too so this'll add a bit of friction.
>
>
>
>> -I'm interested in both throughput and latency.  First, at medium
>> throughputs (i.e. not at maximum capacity) what are average read/write
>> latencies.  And then, what is the maximum possible throughput, even as
>> that causes latencies to be very high.  What is the throughput wall?
>> Plotting throughput vs. latency for different target throughputs reveals
>> both of these.
>>
>
> Good stuff.  Let us know how else we can help out.
>
>
> When I have 50 clients across 6 region server, this is fairly close to
>> your read throughput experiment with 8 clients on 1 region server.  Your
>> 2.4 k/sec throughput is obviously a lot better than what I'm seeing at
>> 300/sec.  Since you had 10GB loaded, is it reasonable to assume that
>> ~50% of the reads were from memory?
>
>
> I think I had 3G per RS with 40% given over to cache. I had 1RS so not too
> much coming from hbase cache (OS cache probably played a big factor).
>
>
>
>>  In my case, with 20GB loaded and
>> 6GB heapspace, I assume ~30% was served from memory.   I haven't run
>> enough tests on different size tables to estimate the impact of having
>> data in memory, though intuitively, in the time it takes to read a
>> record from disk, you could read several from memory.  And the more the
>> data is disk resident, the more the disk contention.
>>
>> Yes.
>
>
>
>> Finally, I haven't tried LZO or increasing the logroll multiplier yet,
>>
>
> LZO would be good.  Logroll multiplier is more about writing which you are
> doing little of so maybe its ok at default?
>
>
>
>> and I'm hoping to move to the java client soon.  As you might recall,
>> we're working toward a benchmark for cloud serving stores.  We're
>> testing the newest version of our tool now.  Since it's in java, we'll
>> be able to use it with HBase.
>>
>
> Tell us more?  You are comparing HBase to others with a tool of your
> writing?
>
>
> I'll report back when I find out how much these changes close the
>> performance gap, and how much seems inherent when much of the data is
>> disk resident.
>>
>>
> Thanks Adam.
> St.Ack
>
>
>> -Adam
>>
>> -----Original Message-----
>> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
>> stack
>> Sent: Tuesday, October 06, 2009 1:08 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: random read/write performance
>>
>> Hey Adam:
>>
>> Thanks for checking in.
>>
>> I just did some rough loadings on a small (old hardware) cluster using
>> less
>> memory per regionserver than you.  Its described on this page:
>> http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation.  Random
>> writing
>> 1k records with the PerformanceEvaluation script to a single
>> regionserver, I
>> can do about 8-10k/writes/second on average using the 0.20.1 release
>> candidate 1 with a single client.  Sequential writes are about the same
>> speed usually.  Random reads are about 650/second on average with single
>> client and about 2.4k/second on average with 8 concurrent clients.
>>
>> So it seems like you should be able to do better than
>> 300ops/persecond/permachine -- especially if you can do the java api.
>>
>> This single regionserver was carrying about 50 regions.  Thats about
>> 10GB.
>> How many regions loaded in your case?
>>
>> If throughput is important to you, lzo should help (as per J-D).
>> Turning
>> off WAL will also help with write throughput but that might not be what
>> you
>> want.  Random-read-wise, the best thing you can do is give it RAM (6G
>> should
>> be good).
>>
>> Is that 50-200 clients per regionserver or for the overall cluster?  If
>> per
>> regionserver, I can try that over here.   I can try with bigger regions
>> if
>> you'd like -- 1G regions -- to see if that'd help your use case (if you
>> enable lzo, this should up your throughput and shrink the number of
>> regions
>> any one server is hosting).
>>
>> St.Ack
>>
>>
>>
>>
>>
>> On Tue, Oct 6, 2009 at 8:59 AM, Adam Silberstein
>> <si...@yahoo-inc.com>wrote:
>>
>> > Hi,
>> >
>> > Just wanted to give a quick update on our HBase benchmarking efforts
>> at
>> > Yahoo.  The basic use case we're looking at is:
>> >
>> > 1K records
>> >
>> > 20GB of records per node (and 6GB of memory per node, so data is not
>> > memory resident)
>> >
>> > Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>> >
>> > Multiple clients doing the reads/writes (i.e. 50-200)
>> >
>> > Measure throughput vs. latency, and see how high we can push the
>> > throughput.
>> >
>> > Note that although we want to see where throughput maxes out, the
>> > workload is random, rather than scan-oriented.
>> >
>> >
>> >
>> > I've been tweaking our HBase installation based on advice I've
>> > read/gotten from a few people.  Currently, I'm running 0.20.0, have
>> heap
>> > size set to 6GB per server, and have iCMS off.  I'm still using the
>> REST
>> > server instead of the java client.  We're about to move our
>> benchmarking
>> > tool to java, so at that point we can use the java API.  At that
>> point,
>> > I want to turn off WAL as well.  If anyone has more suggestions for
>> this
>> > workload (either things to try while still using REST, or things to
>> try
>> > once I have a java client), please let me know.
>> >
>> >
>> >
>> > Given all that, I'm currently seeing maximal throughput of about 300
>> > ops/sec/server.  Has anyone with a similar disk-resident and random
>> > workload seen drastically different numbers, or guesses for what I can
>> > expect with the java client?
>> >
>> >
>> >
>> > Thanks!
>> >
>> > Adam
>> >
>> >
>>
>
>

Re: random read/write performance

Posted by stack <st...@duboce.net>.
On Tue, Oct 6, 2009 at 10:52 PM, Adam Silberstein <si...@yahoo-inc.com>wrote:

> Hey,
> Thanks for all the info...
>
> First, a few details to clarify my use case:
> -I have 6 region servers.
> -I loaded a total of 120GB in 1K records into my table, so 20GB per
> server.  I'm not sure how many regions that has created.
>

You could run the rowcounter mapreduce job to see:

./bin/hadoop jar hbase.jar rowcounter

That'll dump usage.  You pass a tablename, column and a tmpdir IIRC.



> -My reported numbers are on workloads taking place once the 120GB is in
> place, rather than while loading the 120GB.
> -I've run with combinations of 50,100,200 clients hitting the REST
> server.  So that's e.g. 200 clients across all region servers, not per
> region server.  Each client just repeatedly a) generates a random record
> known to exist, and b) reads or updates it.
>

Our client can be a bottleneck.  At its core is hadoop RPC with its single
connection to each server over which requests/responses are multiplexed.  As
per J-D's suggestion, you might be able to get more throughput by upping the
REST server count (or this should be a non-issue once you move to the java api).

The REST server also base64-encodes everything, so that adds a bit of friction.



> -I'm interested in both throughput and latency.  First, at medium
> throughputs (i.e. not at maximum capacity) what are average read/write
> latencies.  And then, what is the maximum possible throughput, even as
> that causes latencies to be very high.  What is the throughput wall?
> Plotting throughput vs. latency for different target throughputs reveals
> both of these.
>

Good stuff.  Let us know how else we can help out.


When I have 50 clients across 6 region server, this is fairly close to
> your read throughput experiment with 8 clients on 1 region server.  Your
> 2.4 k/sec throughput is obviously a lot better than what I'm seeing at
> 300/sec.  Since you had 10GB loaded, is it reasonable to assume that
> ~50% of the reads were from memory?


I think I had 3G per RS with 40% given over to the block cache.  I had 1 RS,
so not too much was coming from the hbase cache (the OS cache probably played
a big factor).



>  In my case, with 20GB loaded and
> 6GB heapspace, I assume ~30% was served from memory.   I haven't run
> enough tests on different size tables to estimate the impact of having
> data in memory, though intuitively, in the time it takes to read a
> record from disk, you could read several from memory.  And the more the
> data is disk resident, the more the disk contention.
>
> Yes.



> Finally, I haven't tried LZO or increasing the logroll multiplier yet,
>

LZO would be good.  The logroll multiplier is more about writing, which you
are doing little of, so maybe it's ok at the default?



> and I'm hoping to move to the java client soon.  As you might recall,
> we're working toward a benchmark for cloud serving stores.  We're
> testing the newest version of our tool now.  Since it's in java, we'll
> be able to use it with HBase.
>

Tell us more?  Are you comparing HBase to other stores with a tool of your
own writing?


I'll report back when I find out how much these changes close the
> performance gap, and how much seems inherent when much of the data is
> disk resident.
>
>
Thanks Adam.
St.Ack


> -Adam
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> stack
> Sent: Tuesday, October 06, 2009 1:08 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: random read/write performance
>
> Hey Adam:
>
> Thanks for checking in.
>
> I just did some rough loadings on a small (old hardware) cluster using
> less
> memory per regionserver than you.  Its described on this page:
> http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation.  Random
> writing
> 1k records with the PerformanceEvaluation script to a single
> regionserver, I
> can do about 8-10k/writes/second on average using the 0.20.1 release
> candidate 1 with a single client.  Sequential writes are about the same
> speed usually.  Random reads are about 650/second on average with single
> client and about 2.4k/second on average with 8 concurrent clients.
>
> So it seems like you should be able to do better than
> 300ops/persecond/permachine -- especially if you can do the java api.
>
> This single regionserver was carrying about 50 regions.  Thats about
> 10GB.
> How many regions loaded in your case?
>
> If throughput is important to you, lzo should help (as per J-D).
> Turning
> off WAL will also help with write throughput but that might not be what
> you
> want.  Random-read-wise, the best thing you can do is give it RAM (6G
> should
> be good).
>
> Is that 50-200 clients per regionserver or for the overall cluster?  If
> per
> regionserver, I can try that over here.   I can try with bigger regions
> if
> you'd like -- 1G regions -- to see if that'd help your use case (if you
> enable lzo, this should up your throughput and shrink the number of
> regions
> any one server is hosting).
>
> St.Ack
>
>
>
>
>
> On Tue, Oct 6, 2009 at 8:59 AM, Adam Silberstein
> <si...@yahoo-inc.com>wrote:
>
> > Hi,
> >
> > Just wanted to give a quick update on our HBase benchmarking efforts
> at
> > Yahoo.  The basic use case we're looking at is:
> >
> > 1K records
> >
> > 20GB of records per node (and 6GB of memory per node, so data is not
> > memory resident)
> >
> > Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
> >
> > Multiple clients doing the reads/writes (i.e. 50-200)
> >
> > Measure throughput vs. latency, and see how high we can push the
> > throughput.
> >
> > Note that although we want to see where throughput maxes out, the
> > workload is random, rather than scan-oriented.
> >
> >
> >
> > I've been tweaking our HBase installation based on advice I've
> > read/gotten from a few people.  Currently, I'm running 0.20.0, have
> heap
> > size set to 6GB per server, and have iCMS off.  I'm still using the
> REST
> > server instead of the java client.  We're about to move our
> benchmarking
> > tool to java, so at that point we can use the java API.  At that
> point,
> > I want to turn off WAL as well.  If anyone has more suggestions for
> this
> > workload (either things to try while still using REST, or things to
> try
> > once I have a java client), please let me know.
> >
> >
> >
> > Given all that, I'm currently seeing maximal throughput of about 300
> > ops/sec/server.  Has anyone with a similar disk-resident and random
> > workload seen drastically different numbers, or guesses for what I can
> > expect with the java client?
> >
> >
> >
> > Thanks!
> >
> > Adam
> >
> >
>

Re: random read/write performance

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Adam,

You could also try to run 1 REST server per client machine or, if the
clients are on the region servers, 1 REST server per RS. The Thrift
and REST servers are basically wrappers around the Java client API.

J-D

On Wed, Oct 7, 2009 at 1:52 AM, Adam Silberstein <si...@yahoo-inc.com> wrote:
> Hey,
> Thanks for all the info...
>
> First, a few details to clarify my use case:
> -I have 6 region servers.
> -I loaded a total of 120GB in 1K records into my table, so 20GB per
> server.  I'm not sure how many regions that has created.
> -My reported numbers are on workloads taking place once the 120GB is in
> place, rather than while loading the 120GB.
> -I've run with combinations of 50,100,200 clients hitting the REST
> server.  So that's e.g. 200 clients across all region servers, not per
> region server.  Each client just repeatedly a) generates a random record
> known to exist, and b) reads or updates it.
> -I'm interested in both throughput and latency.  First, at medium
> throughputs (i.e. not at maximum capacity) what are average read/write
> latencies.  And then, what is the maximum possible throughput, even as
> that causes latencies to be very high.  What is the throughput wall?
> Plotting throughput vs. latency for different target throughputs reveals
> both of these.
>
> When I have 50 clients across 6 region server, this is fairly close to
> your read throughput experiment with 8 clients on 1 region server.  Your
> 2.4 k/sec throughput is obviously a lot better than what I'm seeing at
> 300/sec.  Since you had 10GB loaded, is it reasonable to assume that
> ~50% of the reads were from memory?  In my case, with 20GB loaded and
> 6GB heapspace, I assume ~30% was served from memory.   I haven't run
> enough tests on different size tables to estimate the impact of having
> data in memory, though intuitively, in the time it takes to read a
> record from disk, you could read several from memory.  And the more the
> data is disk resident, the more the disk contention.
>
> Finally, I haven't tried LZO or increasing the logroll multiplier yet,
> and I'm hoping to move to the java client soon.  As you might recall,
> we're working toward a benchmark for cloud serving stores.  We're
> testing the newest version of our tool now.  Since it's in java, we'll
> be able to use it with HBase.
>
> I'll report back when I find out how much these changes close the
> performance gap, and how much seems inherent when much of the data is
> disk resident.
>
> -Adam
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> stack
> Sent: Tuesday, October 06, 2009 1:08 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: random read/write performance
>
> Hey Adam:
>
> Thanks for checking in.
>
> I just did some rough loadings on a small (old hardware) cluster using
> less
> memory per regionserver than you.  Its described on this page:
> http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation.  Random
> writing
> 1k records with the PerformanceEvaluation script to a single
> regionserver, I
> can do about 8-10k/writes/second on average using the 0.20.1 release
> candidate 1 with a single client.  Sequential writes are about the same
> speed usually.  Random reads are about 650/second on average with single
> client and about 2.4k/second on average with 8 concurrent clients.
>
> So it seems like you should be able to do better than
> 300ops/persecond/permachine -- especially if you can do the java api.
>
> This single regionserver was carrying about 50 regions.  Thats about
> 10GB.
> How many regions loaded in your case?
>
> If throughput is important to you, lzo should help (as per J-D).
> Turning
> off WAL will also help with write throughput but that might not be what
> you
> want.  Random-read-wise, the best thing you can do is give it RAM (6G
> should
> be good).
>
> Is that 50-200 clients per regionserver or for the overall cluster?  If
> per
> regionserver, I can try that over here.   I can try with bigger regions
> if
> you'd like -- 1G regions -- to see if that'd help your use case (if you
> enable lzo, this should up your throughput and shrink the number of
> regions
> any one server is hosting).
>
> St.Ack
>
>
>
>
>
> On Tue, Oct 6, 2009 at 8:59 AM, Adam Silberstein
> <si...@yahoo-inc.com>wrote:
>
>> Hi,
>>
>> Just wanted to give a quick update on our HBase benchmarking efforts
> at
>> Yahoo.  The basic use case we're looking at is:
>>
>> 1K records
>>
>> 20GB of records per node (and 6GB of memory per node, so data is not
>> memory resident)
>>
>> Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>>
>> Multiple clients doing the reads/writes (i.e. 50-200)
>>
>> Measure throughput vs. latency, and see how high we can push the
>> throughput.
>>
>> Note that although we want to see where throughput maxes out, the
>> workload is random, rather than scan-oriented.
>>
>>
>>
>> I've been tweaking our HBase installation based on advice I've
>> read/gotten from a few people.  Currently, I'm running 0.20.0, have
> heap
>> size set to 6GB per server, and have iCMS off.  I'm still using the
> REST
>> server instead of the java client.  We're about to move our
> benchmarking
>> tool to java, so at that point we can use the java API.  At that
> point,
>> I want to turn off WAL as well.  If anyone has more suggestions for
> this
>> workload (either things to try while still using REST, or things to
> try
>> once I have a java client), please let me know.
>>
>>
>>
>> Given all that, I'm currently seeing maximal throughput of about 300
>> ops/sec/server.  Has anyone with a similar disk-resident and random
>> workload seen drastically different numbers, or guesses for what I can
>> expect with the java client?
>>
>>
>>
>> Thanks!
>>
>> Adam
>>
>>
>

RE: random read/write performance

Posted by Adam Silberstein <si...@yahoo-inc.com>.
Hey, 
Thanks for all the info...

First, a few details to clarify my use case:
-I have 6 region servers.  
-I loaded a total of 120GB of 1KB records into my table, so 20GB per
server.  I'm not sure how many regions that has created.
-My reported numbers are on workloads taking place once the 120GB is in
place, rather than while loading the 120GB.
-I've run with combinations of 50, 100, and 200 clients hitting the REST
server.  So that's e.g. 200 clients across all region servers, not per
region server.  Each client just repeatedly a) picks a random record known
to exist, and b) reads or updates it.
-I'm interested in both throughput and latency.  First, at medium
throughputs (i.e. not at maximum capacity), what are the average read/write
latencies?  And then, what is the maximum possible throughput, even as that
pushes latencies very high?  What is the throughput wall?
Plotting throughput vs. latency for different target throughputs reveals
both of these.

When I have 50 clients across 6 region servers, this is fairly close to
your read throughput experiment with 8 clients on 1 region server.  Your
2.4 k/sec throughput is obviously a lot better than what I'm seeing at
300/sec.  Since you had 10GB loaded, is it reasonable to assume that
~50% of the reads were from memory?  In my case, with 20GB loaded and
6GB heapspace, I assume ~30% was served from memory.   I haven't run
enough tests on different size tables to estimate the impact of having
data in memory, though intuitively, in the time it takes to read a
record from disk, you could read several from memory.  And the more the
data is disk resident, the more the disk contention.

Finally, I haven't tried LZO or increasing the logroll multiplier yet,
and I'm hoping to move to the java client soon.  As you might recall,
we're working toward a benchmark for cloud serving stores.  We're
testing the newest version of our tool now.  Since it's in java, we'll
be able to use it with HBase.

I'll report back when I find out how much these changes close the
performance gap, and how much seems inherent when much of the data is
disk resident.

-Adam

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
stack
Sent: Tuesday, October 06, 2009 1:08 PM
To: hbase-user@hadoop.apache.org
Subject: Re: random read/write performance

Hey Adam:

Thanks for checking in.

I just did some rough loadings on a small (old hardware) cluster using
less
memory per regionserver than you.  Its described on this page:
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation.  Random
writing
1k records with the PerformanceEvaluation script to a single
regionserver, I
can do about 8-10k/writes/second on average using the 0.20.1 release
candidate 1 with a single client.  Sequential writes are about the same
speed usually.  Random reads are about 650/second on average with single
client and about 2.4k/second on average with 8 concurrent clients.

So it seems like you should be able to do better than
300ops/persecond/permachine -- especially if you can do the java api.

This single regionserver was carrying about 50 regions.  Thats about
10GB.
How many regions loaded in your case?

If throughput is important to you, lzo should help (as per J-D).
Turning
off WAL will also help with write throughput but that might not be what
you
want.  Random-read-wise, the best thing you can do is give it RAM (6G
should
be good).

Is that 50-200 clients per regionserver or for the overall cluster?  If
per
regionserver, I can try that over here.   I can try with bigger regions
if
you'd like -- 1G regions -- to see if that'd help your use case (if you
enable lzo, this should up your throughput and shrink the number of
regions
any one server is hosting).

St.Ack





On Tue, Oct 6, 2009 at 8:59 AM, Adam Silberstein
<si...@yahoo-inc.com>wrote:

> Hi,
>
> Just wanted to give a quick update on our HBase benchmarking efforts
at
> Yahoo.  The basic use case we're looking at is:
>
> 1K records
>
> 20GB of records per node (and 6GB of memory per node, so data is not
> memory resident)
>
> Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>
> Multiple clients doing the reads/writes (i.e. 50-200)
>
> Measure throughput vs. latency, and see how high we can push the
> throughput.
>
> Note that although we want to see where throughput maxes out, the
> workload is random, rather than scan-oriented.
>
>
>
> I've been tweaking our HBase installation based on advice I've
> read/gotten from a few people.  Currently, I'm running 0.20.0, have
heap
> size set to 6GB per server, and have iCMS off.  I'm still using the
REST
> server instead of the java client.  We're about to move our
benchmarking
> tool to java, so at that point we can use the java API.  At that
point,
> I want to turn off WAL as well.  If anyone has more suggestions for
this
> workload (either things to try while still using REST, or things to
try
> once I have a java client), please let me know.
>
>
>
> Given all that, I'm currently seeing maximal throughput of about 300
> ops/sec/server.  Has anyone with a similar disk-resident and random
> workload seen drastically different numbers, or guesses for what I can
> expect with the java client?
>
>
>
> Thanks!
>
> Adam
>
>

Re: random read/write performance

Posted by stack <st...@duboce.net>.
Hey Adam:

Thanks for checking in.

I just did some rough loadings on a small (old hardware) cluster using less
memory per regionserver than you.  It's described on this page:
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation.  Randomly writing
1k records with the PerformanceEvaluation script to a single regionserver, I
can do about 8-10k writes/second on average using the 0.20.1 release
candidate 1 with a single client.  Sequential writes are usually about the
same speed.  Random reads are about 650/second on average with a single
client and about 2.4k/second on average with 8 concurrent clients.
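
For reference, the PerformanceEvaluation invocations behind those numbers
look roughly like this (flags from memory -- run it with no arguments to get
the real usage dump for your build; randomWrite creates the test table that
the read runs then use):

    ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred randomWrite 1
    ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred randomRead 1
    ./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred randomRead 8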

So it seems like you should be able to do better than
300ops/persecond/permachine -- especially if you can do the java api.

This single regionserver was carrying about 50 regions.  Thats about 10GB.
How many regions loaded in your case?

If throughput is important to you, lzo should help (as per J-D).   Turning
off WAL will also help with write throughput but that might not be what you
want.  Random-read-wise, the best thing you can do is give it RAM (6G should
be good).

Is that 50-200 clients per regionserver or for the overall cluster?  If per
regionserver, I can try that over here.   I can try with bigger regions if
you'd like -- 1G regions -- to see if that'd help your use case (if you
enable lzo, this should up your throughput and shrink the number of regions
any one server is hosting).

St.Ack





On Tue, Oct 6, 2009 at 8:59 AM, Adam Silberstein <si...@yahoo-inc.com>wrote:

> Hi,
>
> Just wanted to give a quick update on our HBase benchmarking efforts at
> Yahoo.  The basic use case we're looking at is:
>
> 1K records
>
> 20GB of records per node (and 6GB of memory per node, so data is not
> memory resident)
>
> Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>
> Multiple clients doing the reads/writes (i.e. 50-200)
>
> Measure throughput vs. latency, and see how high we can push the
> throughput.
>
> Note that although we want to see where throughput maxes out, the
> workload is random, rather than scan-oriented.
>
>
>
> I've been tweaking our HBase installation based on advice I've
> read/gotten from a few people.  Currently, I'm running 0.20.0, have heap
> size set to 6GB per server, and have iCMS off.  I'm still using the REST
> server instead of the java client.  We're about to move our benchmarking
> tool to java, so at that point we can use the java API.  At that point,
> I want to turn off WAL as well.  If anyone has more suggestions for this
> workload (either things to try while still using REST, or things to try
> once I have a java client), please let me know.
>
>
>
> Given all that, I'm currently seeing maximal throughput of about 300
> ops/sec/server.  Has anyone with a similar disk-resident and random
> workload seen drastically different numbers, or guesses for what I can
> expect with the java client?
>
>
>
> Thanks!
>
> Adam
>
>

Re: random read/write performance

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Well you don't have to do this often, just try it once to see. You can
do it in the HBase shell:

major_compact '.META.'

And it takes 3-4 seconds.
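
If you would rather kick it off from code (say, from inside the MR job
itself), the java client has an admin call for it too.  A rough sketch,
assuming HBaseAdmin.majorCompact() accepts a table name in 0.20:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CompactMeta {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        admin.majorCompact(".META.");  // asynchronously requests a major compaction of .META.
      }
    }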

J-D

On Tue, Oct 6, 2009 at 12:31 PM, Adam Silberstein
<si...@yahoo-inc.com> wrote:
> Hi J-D,
> Thanks for the tips.  Tweaking the multiplier looks easy enough.  I'm not sure how to force a major compaction.  If from M/R, does that mean you did it with the HDFS/Hadoop API?  Any guess on how long that major compaction takes?  Just wondering what it does to availability.
>
> Thanks,
> Adam
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: Tuesday, October 06, 2009 9:11 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: random read/write performance
>
> Adam,
>
> Few thoughts:
>
>  - Do you use LZO?
>  - Instead of disabling the WAL, try first tweaking the safety net
> that's in place. For example, setting
> hbase.regionserver.logroll.multiplier to 1.5 or even higher will make
> it roll less often. The current value of 0.95 means you roll every
> ~62MB inserted in a regionserver. You can also set
> hbase.regionserver.maxlogs to something higher than 32 like 64.
>  - We flush the .META. table very very often and this results,
> sometimes after a big upload, in a lot of store files. Once I
> force-major compacted it during a MR job and speed went 500% faster
> because of the contention of all the clients.
>
> J-D
>
> On Tue, Oct 6, 2009 at 11:59 AM, Adam Silberstein
> <si...@yahoo-inc.com> wrote:
>> Hi,
>>
>> Just wanted to give a quick update on our HBase benchmarking efforts at
>> Yahoo.  The basic use case we're looking at is:
>>
>> 1K records
>>
>> 20GB of records per node (and 6GB of memory per node, so data is not
>> memory resident)
>>
>> Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>>
>> Multiple clients doing the reads/writes (i.e. 50-200)
>>
>> Measure throughput vs. latency, and see how high we can push the
>> throughput.
>>
>> Note that although we want to see where throughput maxes out, the
>> workload is random, rather than scan-oriented.
>>
>>
>>
>> I've been tweaking our HBase installation based on advice I've
>> read/gotten from a few people.  Currently, I'm running 0.20.0, have heap
>> size set to 6GB per server, and have iCMS off.  I'm still using the REST
>> server instead of the java client.  We're about to move our benchmarking
>> tool to java, so at that point we can use the java API.  At that point,
>> I want to turn off WAL as well.  If anyone has more suggestions for this
>> workload (either things to try while still using REST, or things to try
>> once I have a java client), please let me know.
>>
>>
>>
>> Given all that, I'm currently seeing maximal throughput of about 300
>> ops/sec/server.  Has anyone with a similar disk-resident and random
>> workload seen drastically different numbers, or guesses for what I can
>> expect with the java client?
>>
>>
>>
>> Thanks!
>>
>> Adam
>>
>>
>

RE: random read/write performance

Posted by Adam Silberstein <si...@yahoo-inc.com>.
Hi J-D, 
Thanks for the tips.  Tweaking the multiplier looks easy enough.  I'm not
sure how to force a major compaction.  If you did it from an MR job, does
that mean you did it with the HDFS/Hadoop API?  Any guess on how long that
major compaction takes?  Just wondering what it does to availability.

Thanks,
Adam

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Tuesday, October 06, 2009 9:11 AM
To: hbase-user@hadoop.apache.org
Subject: Re: random read/write performance

Adam,

Few thoughts:

 - Do you use LZO?
 - Instead of disabling the WAL, try first tweaking the safety net
that's in place. For example, setting
hbase.regionserver.logroll.multiplier to 1.5 or even higher will make
it roll less often. The current value of 0.95 means you roll every
~62MB inserted in a regionserver. You can also set
hbase.regionserver.maxlogs to something higher than 32 like 64.
 - We flush the .META. table very very often and this results,
sometimes after a big upload, in a lot of store files. Once I
force-major compacted it during a MR job and speed went 500% faster
because of the contention of all the clients.

J-D

On Tue, Oct 6, 2009 at 11:59 AM, Adam Silberstein
<si...@yahoo-inc.com> wrote:
> Hi,
>
> Just wanted to give a quick update on our HBase benchmarking efforts at
> Yahoo.  The basic use case we're looking at is:
>
> 1K records
>
> 20GB of records per node (and 6GB of memory per node, so data is not
> memory resident)
>
> Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>
> Multiple clients doing the reads/writes (i.e. 50-200)
>
> Measure throughput vs. latency, and see how high we can push the
> throughput.
>
> Note that although we want to see where throughput maxes out, the
> workload is random, rather than scan-oriented.
>
>
>
> I've been tweaking our HBase installation based on advice I've
> read/gotten from a few people.  Currently, I'm running 0.20.0, have heap
> size set to 6GB per server, and have iCMS off.  I'm still using the REST
> server instead of the java client.  We're about to move our benchmarking
> tool to java, so at that point we can use the java API.  At that point,
> I want to turn off WAL as well.  If anyone has more suggestions for this
> workload (either things to try while still using REST, or things to try
> once I have a java client), please let me know.
>
>
>
> Given all that, I'm currently seeing maximal throughput of about 300
> ops/sec/server.  Has anyone with a similar disk-resident and random
> workload seen drastically different numbers, or guesses for what I can
> expect with the java client?
>
>
>
> Thanks!
>
> Adam
>
>

Re: random read/write performance

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Adam,

Few thoughts:

 - Do you use LZO?
 - Instead of disabling the WAL, try first tweaking the safety net
that's in place.  For example, setting
hbase.regionserver.logroll.multiplier to 1.5 or even higher will make
it roll less often.  The current value of 0.95 means you roll after every
~62MB written to a regionserver (0.95 of the 64MB HDFS block size).  You
can also set hbase.regionserver.maxlogs to something higher than 32, like
64 (sample settings below).
 - We flush the .META. table very often and this results, sometimes after
a big upload, in a lot of store files.  Once I force-major-compacted it
during an MR job and the job went about 500% faster, because all of the
clients had been contending on those .META. store files.
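
For reference, those two properties go in hbase-site.xml on the
regionservers; a sample snippet using the example values above (not tuned
recommendations):

    <property>
      <name>hbase.regionserver.logroll.multiplier</name>
      <value>1.5</value>
    </property>
    <property>
      <name>hbase.regionserver.maxlogs</name>
      <value>64</value>
    </property>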

J-D

On Tue, Oct 6, 2009 at 11:59 AM, Adam Silberstein
<si...@yahoo-inc.com> wrote:
> Hi,
>
> Just wanted to give a quick update on our HBase benchmarking efforts at
> Yahoo.  The basic use case we're looking at is:
>
> 1K records
>
> 20GB of records per node (and 6GB of memory per node, so data is not
> memory resident)
>
> Workloads that do random reads/writes (e.g. 95% reads, 5% writes).
>
> Multiple clients doing the reads/writes (i.e. 50-200)
>
> Measure throughput vs. latency, and see how high we can push the
> throughput.
>
> Note that although we want to see where throughput maxes out, the
> workload is random, rather than scan-oriented.
>
>
>
> I've been tweaking our HBase installation based on advice I've
> read/gotten from a few people.  Currently, I'm running 0.20.0, have heap
> size set to 6GB per server, and have iCMS off.  I'm still using the REST
> server instead of the java client.  We're about to move our benchmarking
> tool to java, so at that point we can use the java API.  At that point,
> I want to turn off WAL as well.  If anyone has more suggestions for this
> workload (either things to try while still using REST, or things to try
> once I have a java client), please let me know.
>
>
>
> Given all that, I'm currently seeing maximal throughput of about 300
> ops/sec/server.  Has anyone with a similar disk-resident and random
> workload seen drastically different numbers, or guesses for what I can
> expect with the java client?
>
>
>
> Thanks!
>
> Adam
>
>

Re: random read/write performance

Posted by Ryan Rawson <ry...@gmail.com>.
Remember that random reads are the worst case of all db performance numbers.
You are limited by the spindle count and the seek time (typically 7-9ms).
Truly random reads on a data set much much larger than ram will never be
fast, no matter what the db system.

But fortunately the real world is more forgiving - reads are rarely truly
random, disk buffer caches help, and you can frequently exploit data
locality.
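
To put rough numbers on it: at ~8ms per seek, one spindle tops out around
125 random reads/sec, so a 6-node cluster with, say, 4 data disks per node
(an assumed count -- the disk layout wasn't given) caps out somewhere near
6 x 4 x 125 = 3000 disk-bound reads/sec before any caching helps.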

On Oct 6, 2009 11:13 AM, "Andrew Purtell" <ap...@apache.org> wrote:

Hi Adam, thanks for writing in.

I suggest using Thrift or the native Java API instead of REST for
benchmarking performance. If you must use REST, for bulk throughput
benching, consider using Stargate (contrib/stargate/) and bulk transactions
-- scanners with 'batch' parameter set >= 100, or multi-puts with 100s or
1000s of mutations. We had a fellow up on the list some time ago who did
some localhost only benchmarking of the three API options. Java API got 22K
ops/sec, Thrift connector got 20K ops/sec, REST connector got 8K ops/sec.
The transactions were not batched in nature. Absolute numbers are not
important. Note the scale of the differences.

> Note that although we want to see where throughput maxes out, the workload
is random, rather than...
That's currently an impedance mismatch. HBase throughput with 0.20.0 is best
with scanners. MultiGet/Put/Delete is on deck but not ready yet:
https://issues.apache.org/jira/browse/HBASE-1845

  - Andy




________________________________
From: Adam Silberstein <si...@yahoo-inc.com>

To: hbase-user@hadoop.apache.org
Sent: Tuesday, October 6, 2009 8:59:30 AM
Subject: random read/write performance

Hi, Just wanted to give a quick update on our HBase benchmarking efforts at
Yahoo. The basic use ...

Re: random read/write performance

Posted by Andrew Purtell <ap...@apache.org>.
Hi Adam, thanks for writing in.

I suggest using Thrift or the native Java API instead of REST for
benchmarking performance.  If you must use REST, for bulk throughput
benching, consider using Stargate (contrib/stargate/) and bulk transactions
-- scanners with 'batch' parameter set >= 100, or multi-puts with 100s or
1000s of mutations.  We had a fellow up on the list some time ago who did
some localhost only benchmarking of the three API options.  Java API got
22K ops/sec, Thrift connector got 20K ops/sec, REST connector got 8K
ops/sec.  The transactions were not batched in nature.  Absolute numbers
are not important.  Note the scale of the differences.

> Note that although we want to see where throughput maxes out, the workload is random, rather than scan-oriented.

That's currently an impedance mismatch.  HBase throughput with 0.20.0 is
best with scanners.  MultiGet/Put/Delete is on deck but not ready yet:
https://issues.apache.org/jira/browse/HBASE-1845
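
For writes and scans, the 0.20 java client already does some client-side
batching; a rough sketch (table, family and row names are made up, and the
scanner caching call plays a role similar to the stargate 'batch'
parameter):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchingSample {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "usertable");
        table.setAutoFlush(false);              // buffer puts client-side
        table.setWriteBufferSize(1024 * 1024);  // send them in ~1MB batches

        List<Put> puts = new ArrayList<Put>();
        for (int i = 0; i < 1000; i++) {
          Put p = new Put(Bytes.toBytes("row" + i));
          p.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value" + i));
          puts.add(p);
        }
        table.put(puts);                        // many mutations handed over in one call
        table.flushCommits();                   // push anything still buffered

        Scan scan = new Scan();
        scan.setCaching(100);                   // fetch up to 100 rows per RPC while scanning
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
          // process r
        }
        scanner.close();
      }
    }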

   - Andy




________________________________
From: Adam Silberstein <si...@yahoo-inc.com>
To: hbase-user@hadoop.apache.org
Sent: Tuesday, October 6, 2009 8:59:30 AM
Subject: random read/write performance

Hi,

Just wanted to give a quick update on our HBase benchmarking efforts at
Yahoo.  The basic use case we're looking at is:

1K records

20GB of records per node (and 6GB of memory per node, so data is not
memory resident)

Workloads that do random reads/writes (e.g. 95% reads, 5% writes).

Multiple clients doing the reads/writes (i.e. 50-200)

Measure throughput vs. latency, and see how high we can push the
throughput.  

Note that although we want to see where throughput maxes out, the
workload is random, rather than scan-oriented.



I've been tweaking our HBase installation based on advice I've
read/gotten from a few people.  Currently, I'm running 0.20.0, have heap
size set to 6GB per server, and have iCMS off.  I'm still using the REST
server instead of the java client.  We're about to move our benchmarking
tool to java, so at that point we can use the java API.  At that point,
I want to turn off WAL as well.  If anyone has more suggestions for this
workload (either things to try while still using REST, or things to try
once I have a java client), please let me know. 



Given all that, I'm currently seeing maximal throughput of about 300
ops/sec/server.  Has anyone with a similar disk-resident and random
workload seen drastically different numbers, or guesses for what I can
expect with the java client?



Thanks!

Adam