Posted to user@hbase.apache.org by Akmal Abbasov <ak...@icloud.com> on 2016/03/08 17:28:38 UTC

HBase poor write performance

Hi,
I'm testing HBase to choose the right hardware configuration for a write-heavy use case. I'm testing using YCSB.
The cluster consists of 2 masters and 5 regionservers (4 cores, 14GB RAM, 4x512GB SSD).
I've created a new table in HBase, presplit into 50 regions. I'm running 3 clients, each with 50 threads, to insert data.
I'm using the default HBase settings. After running a few tests, I can see that the cluster is underutilized; in fact, memory usage is around 30%.
The main problem I see for now is compactions: compactionQueueLength is growing very fast, and the compaction process is always running.
I found that there are hbase.regionserver.thread.compaction.small and hbase.regionserver.thread.compaction.large, but couldn't find information regarding their default values.
I am also planning to increase the number of regions and the memstore size to increase utilization of the cluster and performance.
Which other settings should be tuned to improve both utilization and performance?
Thank you.

I'm using HBase 0.98.7 and regionserver heap size is 7GB.

Regards, Akmal

Re: HBase poor write performance

Posted by Vladimir Rodionov <vl...@gmail.com>.

HBase has many knobs to tune, and selecting the right ones depends on the
use case (continuous heavy writes, bursty heavy writes, heavy
writes/reads, etc.) and the available hardware.

General recommendations:

1. Try to load data in bulk
2. Presplit tables in advance and avoid splitting after that (set
DisabledRegionSplitPolicy or ConstantSizeRegionSplitPolicy)
3. Disable automatic major compactions; run them manually during off-peak hours
4. Have separate compaction settings for on-peak and off-peak times
5. (SSD) Set a limit on the maximum minor compaction size
("hbase.hstore.compaction.max.size"). It is worth upgrading to 0.98.latest
to have "hbase.hstore.compaction.max.size.offpeak". By setting this limit and
disabling automatic major compactions you will decrease compaction activity
significantly. Downside? The minimum number of store files per region is
going to be > (Region Size) / (Max Compaction Size)
6. Buy support from HW
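
To make the downside in item 5 concrete, here is a minimal Python sketch of that rule of thumb. The function name and the example sizes (a 10 GiB region, a 1 GiB compaction cap) are assumptions chosen for illustration; they are not values from this thread or HBase defaults.

```python
import math

GIB = 1024 ** 3


def min_store_files(region_size_bytes: int, max_compaction_size_bytes: int) -> int:
    """Approximate floor on store files per region, per the rule of thumb
    (Region Size) / (Max Compaction Size) from item 5.

    Files larger than the cap are skipped by minor compactions, so without
    major compactions a region cannot shrink to fewer files than this.
    """
    if region_size_bytes < 0 or max_compaction_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(region_size_bytes / max_compaction_size_bytes)


if __name__ == "__main__":
    # Hypothetical example: 10 GiB regions with minor compactions capped at 1 GiB.
    print(min_store_files(10 * GIB, 1 * GIB))
```

A lower cap means less compaction I/O but more files to consult on reads, so the cap is a trade-off rather than a free win.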

-Vlad



Re: HBase poor write performance

Posted by Heng Chen <he...@gmail.com>.

What is your HLog file count during the test? Is it always at the max number
(IIRC, the default is 34?).

How many DataNodes are in your HDFS?


RE: HBase poor write performance

Posted by Frank Luo <jl...@merkleinc.com>.

0.98

"Light" means not enough to trigger compactions while actively writing.


Re: HBase poor write performance

Posted by Stack <st...@duboce.net>.

On Tue, Mar 8, 2016 at 8:49 AM, Frank Luo <jl...@merkleinc.com> wrote:

> Akmal,
>
> We have been suffering the issue for two years now without a good
> solution. From what I learned, it is not really a good idea to do heavy
> online hbase puts. The first thing you encounter will be performance caused
> by compact no matter how you tune parameters. Then later on you will see
> job failures because hbase operation timeouts and/or region server crashes.
>
> Light writes, heavy reads are generally OK.
>
>
What version are you running, Frank?

Yes, bulk load is >>> Puts via the API, but I'd be interested in what
'light' means for you.

Thanks,
St.Ack




RE: HBase poor write performance

Posted by Frank Luo <jl...@merkleinc.com>.

Akmal,

We have been suffering from this issue for two years now without a good solution. From what I have learned, it is not really a good idea to do heavy online HBase puts. The first thing you encounter will be performance problems caused by compactions, no matter how you tune the parameters. Later on you will see job failures because of HBase operation timeouts and/or region server crashes.

Light writes, heavy reads are generally OK.

For heavy puts, the best practice is to prepare tables offline, then turn them on for reads.

If heavy online puts are unavoidable, you might get the best out of it if you manage compactions and splits yourself. That is, when the number of files per region reaches a certain threshold, stop writing, perform compactions and splits on large regions, then resume writing.
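
The decision step of that manual approach might be sketched as below. This is only an illustration: the function names, the region names, and the threshold of 10 files are hypothetical, and the store file counts are assumed to come from whatever monitoring you already have. The actual pause, compact, split, and resume calls are left out.

```python
from typing import Dict, List


def regions_needing_compaction(store_files_per_region: Dict[str, int],
                               max_files: int) -> List[str]:
    """Return the regions whose store file count has reached the threshold."""
    return sorted(
        name for name, count in store_files_per_region.items()
        if count >= max_files
    )


def next_action(store_files_per_region: Dict[str, int], max_files: int) -> str:
    """Decide whether writers should pause so compactions can catch up."""
    if regions_needing_compaction(store_files_per_region, max_files):
        return "pause_and_compact"
    return "keep_writing"


if __name__ == "__main__":
    # Hypothetical snapshot of store file counts per region.
    snapshot = {"region-a": 3, "region-b": 12, "region-c": 10}
    print(regions_needing_compaction(snapshot, max_files=10))
    print(next_action(snapshot, max_files=10))
```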

I hope it helps.

Frank Luo


This email and any attachments transmitted with it are intended for use by the intended recipient(s) only. If you have received this email in error, please notify the sender immediately and then delete it. If you are not the intended recipient, you must not keep, use, disclose, copy or distribute this email without the author’s prior permission. We take precautions to minimize the risk of transmitting software viruses, but we advise you to perform your own virus checks on any attachment to this message. We cannot accept liability for any loss or damage caused by software viruses. The information contained in this communication may be confidential and may be subject to the attorney-client privilege.

Re: HBase poor write performance

Posted by Sean Busbey <bu...@cloudera.com>.

Which YCSB version are you using?

Re: HBase poor write performance

Posted by Stack <st...@duboce.net>.

On Tue, Mar 8, 2016 at 8:28 AM, Akmal Abbasov <ak...@icloud.com>
wrote:

> Hi,
> I'm testing HBase to choose the right hardware configurations for a heavy
> write use case. I'm testing using YCSB.
> The cluster consists of 2 masters, and 5 regionservers (4 cores, 14GB ram,
> 4x512GB SSD).
> I've created a new table in HBase, presplit it to 50 regions. I'm running
> 3 clients each running 50 threads, to insert data.
>

What happens if you double the number of clients?

> I'm using the default HBase settings.
>

Our defaults are intentionally conservative, set for broad appeal. For now,
there is an expectation that configs are adjusted to suit the load.

> After running a few tests, I can see that the cluster is underutilized, in
> fact memory usage is around 30%.
>

Which memory? Java heap? Or the system's memory? If Java heap, we make
broad allocations for cache and write-time memstore. You might allocate
more to memstore if you are write-centric (later versions of HBase try to
do this ergonomically).
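
As a rough illustration of the heap point, the Python sketch below compares the heap share reserved for memstores with what the regions on one server can fill before flushing. The 7GB heap and the 50 regions over 5 regionservers come from this thread. The 0.4 memstore fraction and 128 MB flush size are assumptions that should be checked against your version's defaults, and the sketch assumes one column family per region.

```python
def memstore_budget(heap_gb: float,
                    global_memstore_fraction: float,
                    regions_per_server: int,
                    flush_size_mb: int) -> tuple:
    """Return (global memstore limit in MB, memstore demand in MB).

    The demand is the most the memstores on one server would normally hold
    if every region flushed exactly at flush_size_mb.
    """
    global_limit_mb = heap_gb * 1024 * global_memstore_fraction
    demand_mb = regions_per_server * flush_size_mb
    return global_limit_mb, demand_mb


if __name__ == "__main__":
    # 50 regions spread over 5 regionservers gives 10 regions per server.
    limit_mb, demand_mb = memstore_budget(
        heap_gb=7,
        global_memstore_fraction=0.4,  # assumed value, verify for your version
        regions_per_server=50 // 5,
        flush_size_mb=128,             # assumed value, verify for your version
    )
    print(f"global memstore limit: {limit_mb:.0f} MB, demand: {demand_mb} MB")
```

Under these assumptions the memstores would flush well before reaching the global limit, which would be consistent with low heap usage. More regions per server or a larger flush size would raise the demand side.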

> The main problem I see for now is compactions, compactionQueueLength is
> growing very fast, and compaction process is always running.
>

Compactions running all the time while you are under write load is
'normal'. How are you seeing this manifest as a 'problem'?

Show us a bit of your log from a regionserver and take a few thread dumps
while it is running so we can make suggestions.

> I found that there are hbase.regionserver.thread.compaction.small and hbase.regionserver.thread.compaction.large
> but couldn't find information regarding their default values.
>

Yes. This stuff is missing from the doc; it is in code only. Let's figure out
whether they need adjustment first.

> I am also planning to increase the number of regions and the memstore size to
> increase utilization of the cluster and performance.
> Which other settings should be tuned to improve both utilization and
> performance?
>

You've browsed this section http://hbase.apache.org/book.html#performance ?

> Thank you.
>
>

> I'm using HBase 0.98.7 and regionserver heap size is 7GB.
>

Can you use a later HBase? 1.2.0 just got released.
St.Ack
