Posted to user@hbase.apache.org by Amit Jain <ja...@gmail.com> on 2011/11/17 00:06:17 UTC

Help with continuous loading configuration

Hello,

We're doing a proof-of-concept study to see if HBase is a good fit for an
application we're planning to build.  The application will be recording a
continuous stream of sensor data throughout the day and the data needs to
be online immediately.  Our test cluster consists of 16 machines, each with
16 cores, 32GB of RAM, and 8TB of local storage, running CDH3u2.  We're using
the HBase client Put class, and have set the table "auto flush" to false
and the write buffer size to 12MB.  Here are the region server JVM options:

export HBASE_REGIONSERVER_OPTS="-Xmx28g -Xms28g -Xmn128m -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
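
For reference, here's roughly what our writer setup looks like (0.90-era
client API; the table name, column family, and qualifier below are
placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "sensor_data");   // placeholder table name
table.setAutoFlush(false);                        // buffer puts client-side
table.setWriteBufferSize(12 * 1024 * 1024);       // 12MB write buffer

byte[] payload = new byte[5 * 1024];              // stand-in for one sensor reading
Put put = new Put(rowKey);                        // rowKey: one of our fixed-width keys
put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), payload);  // placeholder family/qualifier
table.put(put);   // queued locally; flushed when the write buffer fills
// table.flushCommits() and table.close() when the writer shuts down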

And here are the property settings that we're using in the hbase-site.xml
file:

hbase.rootdir=hdfs://master:9000/hbase
hbase.regionserver.handler.count=20
hbase.cluster.distributed=true
hbase.zookeeper.quorum=zk01,zk02,zk03
hfile.block.cache.size=0
hbase.hregion.max.filesize=1073741824
hbase.regionserver.global.memstore.upperLimit=0.79
hbase.regionserver.global.memstore.lowerLimit=0.70
hbase.hregion.majorcompaction=0
hbase.hstore.compactionThreshold=15
hbase.hstore.blockingStoreFiles=20
hbase.rpc.timeout=0
zookeeper.session.timeout=3600000

It's taking about 24 hours to load 4TB of data, which isn't quite fast
enough for our application.  Is there a better configuration that we
can use to improve loading performance?

- Amit

Re: Help with continuous loading configuration

Posted by Amit Jain <ja...@gmail.com>.
We would prefer not to do this.  It's important that we have all of the
historical data without any loss.  But thanks for the suggestion.

- Amit

On Wed, Nov 16, 2011 at 4:30 PM, Matt Corgan <mc...@hotpads.com> wrote:

> You can set put.setWriteToWAL(false) to skip the write ahead logging which
> slows down puts significantly.  But, you will lose data if a regionserver
> crashes with data in its memstore.
>
>
> On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <ja...@gmail.com> wrote:
>
> > Hi Stack,
> >
> > Thanks for the feedback.  Comments inline ...
> >
> > On Wed, Nov 16, 2011 at 3:35 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <ja...@gmail.com> wrote:
> > > > Hi Lars,
> > > >
> > > > The keys are arriving in random order.  The HBase monitoring page shows
> > > > evenly distributed load across all of the region servers.
> > >
> > > What kind of ops rates are you seeing?  Are they running nice and
> > > smooth across all servers?  No stuttering?  What do your regionserver
> > > logs look like?
> > >
> > > Are you presplitting your table or just letting hbase run and do up the
> > > splits?
> > >
> >
> > As far as I can tell, the operations look smooth across all servers.  We're
> > not doing any pre-splitting, just letting HBase do the splits.
> >
> >
> > > >  I didn't see
> > > > anything weird in the gc logs, no mention of any failures.  I'm a little
> > > > unclear about what the optimal values for the following properties should
> > > > be:
> > > >
> > > > hbase.hstore.compactionThreshold
> > >
> > > Default is 3.  Look in the regionserver logs.  See how many files you
> > > have on average per region column family (you could also look in the
> > > filesystem).  Are we constantly rewriting them?  If the load is mostly
> > > write-only, you might raise this to put off compactions until more
> > > files are around (though judging by the regionserver logs, with a high
> > > write rate we might be having trouble keeping up with the default
> > > threshold anyway).
> > >
> >
> > Well, it looks like half of the regions are in the 25-32 file range and the
> > other half just have 1 or 2 files.  This was when we ran it with a
> > compactionThreshold of 15.
> >
> > How can I tell by looking at the region server logs if we're seeing a "high
> > write rate"?  We've got 48 clients sending load, 12 region servers total.
> > We're pushing the system pretty hard.
> >
> >
> > > > hbase.hstore.blockingStoreFiles
> > > >
> > >
> > > The higher this is, the bigger the price you'll pay if a server
> > > crashes, because it sets the upper bound on how many WAL logs we
> > > need to split for the server before its regions come back online.
> > > I'd say leave it at the default for now.
> > >
> >
> > Ok, we'll leave it default for now.
> >
> >
> > > > Is there some rule of thumb that I can use to determine good values for
> > > > these properties?
> > > >
> > >
> > > Have you checked out this section of the book?
> > > http://hbase.apache.org/book.html#performance
> > >
> > > Are you filling the machines?  Are they burning CPU, or are they IO-bound?
> > > If not, perhaps open the front gate wider by upping the number of
> > > concurrent handlers.
> > >
> >
> > I have read through that section of the HBase book.  There is plenty of CPU
> > available.  How do I up the number of concurrent handlers?  Increase
> > hbase.regionserver.handler.count?
> >
> > - Amit
> >
>

Re: Help with continuous loading configuration

Posted by Matt Corgan <mc...@hotpads.com>.
You can set put.setWriteToWAL(false) to skip the write ahead logging which
slows down puts significantly.  But, you will lose data if a regionserver
crashes with data in its memstore.
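
A minimal sketch (0.90-era API; row, family, qualifier, and value are
whatever you're already writing):

Put put = new Put(row);
put.add(family, qualifier, value);
put.setWriteToWAL(false);  // skip the WAL; unflushed memstore data is lost on a crash
table.put(put);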


On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <ja...@gmail.com> wrote:

> Hi Stack,
>
> Thanks for the feedback.  Comments inline ...
>
> On Wed, Nov 16, 2011 at 3:35 PM, Stack <st...@duboce.net> wrote:
>
> > On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <ja...@gmail.com> wrote:
> > > Hi Lars,
> > >
> > > The keys are arriving in random order.  The HBase monitoring page shows
> > > evenly distributed load across all of the region servers.
> >
> > What kind of ops rates are you seeing?  Are they running nice and
> > smooth across all servers?  No stuttering?  What do your regionserver
> > logs look like?
> >
> > Are you presplitting your table or just letting hbase run and do up the
> > splits?
> >
>
> As far as I can tell, the operations look smooth across all servers.  We're
> not doing any pre-splitting, just letting HBase do the splits.
>
>
> > >  I didn't see
> > > anything weird in the gc logs, no mention of any failures.  I'm a little
> > > unclear about what the optimal values for the following properties should
> > > be:
> > >
> > > hbase.hstore.compactionThreshold
> >
> > Default is 3.  Look in the regionserver logs.  See how many files you
> > have on average per region column family (you could also look in the
> > filesystem).  Are we constantly rewriting them?  If the load is mostly
> > write-only, you might raise this to put off compactions until more
> > files are around (though judging by the regionserver logs, with a high
> > write rate we might be having trouble keeping up with the default
> > threshold anyway).
> >
>
> Well, it looks like half of the regions are in the 25-32 file range and the
> other half just have 1 or 2 files.  This was when we ran it with a
> compactionThreshold of 15.
>
> How can I tell by looking at the region server logs if we're seeing a "high
> write rate"?  We've got 48 clients sending load, 12 region servers total.
> We're pushing the system pretty hard.
>
>
> > > hbase.hstore.blockingStoreFiles
> > >
> >
> > The higher this is, the bigger the price you'll pay if a server
> > crashes, because it sets the upper bound on how many WAL logs we
> > need to split for the server before its regions come back online.
> > I'd say leave it at the default for now.
> >
>
> Ok, we'll leave it default for now.
>
>
> > > Is there some rule of thumb that I can use to determine good values for
> > > these properties?
> > >
> >
> > Have you checked out this section of the book?
> > http://hbase.apache.org/book.html#performance
> >
> > Are you filling the machines?  Are they burning CPU, or are they IO-bound?
> > If not, perhaps open the front gate wider by upping the number of
> > concurrent handlers.
> >
>
> I have read through that section of the HBase book.  There is plenty of CPU
> available.  How do I up the number of concurrent handlers?  Increase
> hbase.regionserver.handler.count?
>
> - Amit
>

Re: Help with continuous loading configuration

Posted by Amit Jain <ja...@gmail.com>.
Hi Ram,

For this test, the data is synthetically generated and the keys are just
random fixed-width integers.  We're loading into a single table with one
column family.  The real data would be less uniform, but we just want to
get an idea of whether or not it is feasible.
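
Roughly, the key generator looks like this (a sketch; the exact width is
arbitrary):

import org.apache.hadoop.hbase.util.Bytes;
import java.util.Random;

Random rng = new Random();
// zero-padded so every key is the same width and sorts uniformly
byte[] rowKey = Bytes.toBytes(String.format("%010d", rng.nextInt(Integer.MAX_VALUE)));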

- Amit

On Wed, Nov 16, 2011 at 9:07 PM, Ramkrishna S Vasudevan <
ramkrishna.vasudevan@huawei.com> wrote:

>
> Hi Amit
>
> As you said, the regions may be distributed evenly across the RSs, but
> check whether the puts are reaching only one particular RS at any point
> in time; that would surely overload that RS.
>
> As Stack pointed out, what is your schema and how is your row key designed?
>
> Regards
> Ram
>
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Thursday, November 17, 2011 9:29 AM
> To: user@hbase.apache.org
> Cc: lars hofhansl
> Subject: Re: Help with continuous loading configuration
>
> On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <ja...@gmail.com> wrote:
> > On Wed, Nov 16, 2011 at 3:35 PM, Stack <st...@duboce.net> wrote:
> >
> >> On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <ja...@gmail.com> wrote:
> >> > Hi Lars,
> >> >
> >> > The keys are arriving in random order.  The HBase monitoring page shows
> >> > evenly distributed load across all of the region servers.
> >>
> >> What kind of ops rates are you seeing?  Are they running nice and
> >> smooth across all servers?  No stuttering?  What do your regionserver
> >> logs look like?
> >>
> >> Are you presplitting your table or just letting hbase run and do up the
> >> splits?
> >>
> >
> > As far as I can tell, the operations look smooth across all servers.  We're
> > not doing any pre-splitting, just letting HBase do the splits.
> >
>
> So, how many requests per second per server are you seeing?
>
> How many column families?  What size are the puts on average?
>
>
> > Well, it looks like half of the regions are in the 25-32 file range and the
> > other half just have 1 or 2 files.  This was when we ran it with a
> > compactionThreshold of 15.
> >
>
> So, is it still this count even after the load comes off?  Maybe
> compactions get a chance to cut in then and shrink them.
>
>
> > How can I tell by looking at the region server logs if we're seeing a "high
> > write rate"?
>
> Look at the UI for basic ops/second.
>
>
> > I have read through that section of the HBase book.  There is plenty of CPU
> > available.  How do I up the number of concurrent handlers?  Increase
> > hbase.regionserver.handler.count?
> >
>
> Yes.  You have it pretty low at the moment.
>
> What kind of performance are you looking for?
>
> Post your configs so we can look at them.  Post a bit of your
> regionserver log and your table schema.
> St.Ack
>
>

Re: Help with continuous loading configuration

Posted by Amit Jain <ja...@gmail.com>.
Hi Stack,

Right now we're just testing.  There's a single table with just one column
family and the size of each put is about 5KB.  We made some of the changes
that you suggested (including upping handler count to 50) and have
restarted the test.

Attached are the config files that we're using (hbase-env.sh,
hbase-site.xml, and hdfs-site.xml).  I've also included a screenshot of
the HBase admin console after about two hours of operation.  We're aiming
to load 10TB of data in about 48 hours.

- Amit

On Wed, Nov 16, 2011 at 7:58 PM, Stack <st...@duboce.net> wrote:

> On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <ja...@gmail.com> wrote:
> > On Wed, Nov 16, 2011 at 3:35 PM, Stack <st...@duboce.net> wrote:
> >
> >> On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <ja...@gmail.com> wrote:
> >> > Hi Lars,
> >> >
> >> > The keys are arriving in random order.  The HBase monitoring page shows
> >> > evenly distributed load across all of the region servers.
> >>
> >> What kind of ops rates are you seeing?  Are they running nice and
> >> smooth across all servers?  No stuttering?  What do your regionserver
> >> logs look like?
> >>
> >> Are you presplitting your table or just letting hbase run and do up the
> >> splits?
> >>
> >
> > As far as I can tell, the operations look smooth across all servers.  We're
> > not doing any pre-splitting, just letting HBase do the splits.
> >
>
> So, how many requests per second per server are you seeing?
>
> How many column families?  What size are the puts on average?
>
>
> > Well, it looks like half of the regions are in the 25-32 file range and the
> > other half just have 1 or 2 files.  This was when we ran it with a
> > compactionThreshold of 15.
> >
>
> So, is it still this count even after the load comes off?  Maybe
> compactions get a chance to cut in then and shrink them.
>
>
> > How can I tell by looking at the region server logs if we're seeing a "high
> > write rate"?
>
> Look at the UI for basic ops/second.
>
>
> > I have read through that section of the HBase book.  There is plenty of CPU
> > available.  How do I up the number of concurrent handlers?  Increase
> > hbase.regionserver.handler.count?
> >
>
> Yes.  You have it pretty low at the moment.
>
> What kind of performance are you looking for?
>
> Post your configs so we can look at them.  Post a bit of your
> regionserver log and your table schema.
> St.Ack
>

Re: Help with continuous loading configuration

Posted by Stack <st...@duboce.net>.
On Wed, Nov 16, 2011 at 4:09 PM, Amit Jain <ja...@gmail.com> wrote:
> On Wed, Nov 16, 2011 at 3:35 PM, Stack <st...@duboce.net> wrote:
>
>> On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <ja...@gmail.com> wrote:
>> > Hi Lars,
>> >
>> > The keys are arriving in random order.  The HBase monitoring page shows
>> > evenly distributed load across all of the region servers.
>>
>> What kind of ops rates are you seeing?  Are they running nice and
>> smooth across all servers?  No stuttering?  What do your regionserver
>> logs look like?
>>
>> Are you presplitting your table or just letting hbase run and do up the
>> splits?
>>
>
> As far as I can tell, the operations look smooth across all servers.  We're
> not doing any pre-splitting, just letting HBase do the splits.
>

So, how many requests per second per server are you seeing?

How many column families?  What size are the puts on average?


> Well, it looks like half of the regions are in the 25-32 file range and the
> other half just have 1 or 2 files.  This was when we ran it with a
> compactionThreshold of 15.
>

So, is it still this count even after the load comes off?  Maybe
compactions get a chance to cut in then and shrink them.


> How can I tell by looking at the region server logs if we're seeing a "high
> write rate" ?

Look at the UI for basic ops/second.


> I have read through that section of the HBase book.  There is plenty of CPU
> available.  How do I up the number of concurrent handlers?  Increase
> hbase.regionserver.handler.count?
>

Yes.  You have it pretty low at the moment.
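
E.g. in hbase-site.xml (50 is just a starting point to try):

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>50</value>
</property>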

What kind of performance are you looking for?

Post your configs so we can look at them.  Post a bit of your
regionserver log and your table schema.
St.Ack

Re: Help with continuous loading configuration

Posted by Amit Jain <ja...@gmail.com>.
Hi Stack,

Thanks for the feedback.  Comments inline ...

On Wed, Nov 16, 2011 at 3:35 PM, Stack <st...@duboce.net> wrote:

> On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <ja...@gmail.com> wrote:
> > Hi Lars,
> >
> > The keys are arriving in random order.  The HBase monitoring page shows
> > evenly distributed load across all of the region servers.
>
> What kind of ops rates are you seeing?  Are they running nice and
> smooth across all servers?  No stuttering?  What do your regionserver
> logs look like?
>
> Are you presplitting your table or just letting hbase run and do up the
> splits?
>

As far as I can tell, the operations look smooth across all servers.  We're
not doing any pre-splitting, just letting HBase do the splits.


> >  I didn't see
> > anything weird in the gc logs, no mention of any failures.  I'm a little
> > unclear about what the optimal values for the following properties should
> > be:
> >
> > hbase.hstore.compactionThreshold
>
> Default is 3.  Look in the regionserver logs.  See how many files you
> have on average per region column family (you could also look in the
> filesystem).  Are we constantly rewriting them?  If the load is mostly
> write-only, you might raise this to put off compactions until more
> files are around (though judging by the regionserver logs, with a high
> write rate we might be having trouble keeping up with the default
> threshold anyway).
>

Well, it looks like half of the regions are in the 25-32 file range and the
other half just have 1 or 2 files.  This was when we ran it with a
compactionThreshold of 15.

How can I tell by looking at the region server logs if we're seeing a "high
write rate" ?  We've got 48 clients sending load, 12 region servers total.
 We're pushing the system pretty hard.


> > hbase.hstore.blockingStoreFiles
> >
>
> The higher this is, the bigger the price you'll pay if a server
> crashes, because it sets the upper bound on how many WAL logs we
> need to split for the server before its regions come back online.
> I'd say leave it at the default for now.
>

Ok, we'll leave it default for now.


> > Is there some rule of thumb that I can use to determine good values for
> > these properties?
> >
>
> Have you checked out this section of the book?
> http://hbase.apache.org/book.html#performance
>
> Are you filling the machines?  Are they burning CPU, or are they IO-bound?
> If not, perhaps open the front gate wider by upping the number of
> concurrent handlers.
>

I have read through that section of the HBase book.  There is plenty of CPU
available.  How do I up the number of concurrent handlers?  Increase
hbase.regionserver.handler.count?

- Amit

Re: Help with continuous loading configuration

Posted by Stack <st...@duboce.net>.
On Wed, Nov 16, 2011 at 3:26 PM, Amit Jain <ja...@gmail.com> wrote:
> Hi Lars,
>
> The keys are arriving in random order.  The HBase monitoring page shows
> evenly distributed load across all of the region servers.

What kind of ops rates are you seeing?  Are they running nice and
smooth across all servers?  No stuttering?  What do your regionserver
logs look like?

Are you presplitting your table or just letting hbase run and do up the splits?
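
(If you want to try presplitting, here's a rough sketch with the 0.90
client API; the table name, family, and split points are placeholders
you'd derive from your own keyspace:)

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor desc = new HTableDescriptor("sensor_data");
desc.addFamily(new HColumnDescriptor("d"));
byte[][] splits = new byte[][] {      // placeholder split points
    Bytes.toBytes("2500000000"),
    Bytes.toBytes("5000000000"),
    Bytes.toBytes("7500000000"),
};
admin.createTable(desc, splits);      // starts with splits.length + 1 regions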


>  I didn't see
> anything weird in the gc logs, no mention of any failures.  I'm a little
> unclear about what the optimal values for the following properties should
> be:
>
> hbase.hstore.compactionThreshold

Default is 3.  Look in the regionserver logs.  See how many files you
have on average per region column family (you could also look in the
filesystem).  Are we constantly rewriting them?  If the load is mostly
write-only, you might raise this to put off compactions until more
files are around (though judging by the regionserver logs, with a high
write rate we might be having trouble keeping up with the default
threshold anyway).

> hbase.hstore.blockingStoreFiles
>

The higher this is, the bigger the price you'll pay if a server
crashes, because it sets the upper bound on how many WAL logs we
need to split for the server before its regions come back online.
I'd say leave it at the default for now.

> Is there some rule of thumb that I can use to determine good values for
> these properties?
>

Have you checked out this section of the book?
http://hbase.apache.org/book.html#performance

Are you filling the machines?  Are they burning CPU, or are they IO-bound?
If not, perhaps open the front gate wider by upping the number of
concurrent handlers.

St.Ack

Re: Help with continuous loading configuration

Posted by lars hofhansl <lh...@yahoo.com>.
hbase.hstore.blockingStoreFiles is the maximum number of store files HBase will allow before
it will block writes in order to catch up with compacting files. Default is 7.

If this is too low you'll see warnings about blocking writers in the logs. I found that for some test load I had, I needed to increase this to 20
along with changing hbase.hregion.memstore.block.multiplier to 4 (this allows the memstore to grow larger; be careful with this :) ).
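
That is, in hbase-site.xml (these are just the values that worked for my
test load; yours may differ):

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>20</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>4</value>
</property>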


hbase.hstore.compactionThreshold is the number of store files that will trigger a compaction. Changing this won't help with throughput...

But I'll let somebody else jump in with more operational experience.



________________________________
From: Amit Jain <ja...@gmail.com>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Sent: Wednesday, November 16, 2011 3:26 PM
Subject: Re: Help with continuous loading configuration

Hi Lars,

The keys are arriving in random order.  The HBase monitoring page shows
evenly distributed load across all of the region servers.  I didn't see
anything weird in the gc logs, no mention of any failures.  I'm a little
unclear about what the optimal values for the following properties should
be:

hbase.hstore.compactionThreshold
hbase.hstore.blockingStoreFiles

Is there some rule of thumb that I can use to determine good values for
these properties?

- Amit

On Wed, Nov 16, 2011 at 3:14 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Hi Amit,
>
> 12MB write buffer might be a bit high.
>
> How are you generating your keys? You might hot spot a single region
> server if (for example) you create
> monotonically increasing keys. When you look at the HBase monitoring page,
> do you see a single region server
> getting all the requests?
>
>
> Anything weird in the GC logs? Do they all log similar?
>
>
> -- Lars
>
>
>
> ________________________________
> From: Amit Jain <ja...@gmail.com>
> To: user@hbase.apache.org
> Sent: Wednesday, November 16, 2011 3:06 PM
> Subject: Help with continuous loading configuration
>
> Hello,
>
> We're doing a proof-of-concept study to see if HBase is a good fit for an
> application we're planning to build.  The application will be recording a
> continuous stream of sensor data throughout the day and the data needs to
> be online immediately.  Our test cluster consists of 16 machines, each with
> 16 cores, 32GB of RAM, and 8TB of local storage, running CDH3u2.  We're using
> the HBase client Put class, and have set the table "auto flush" to false
> and the write buffer size to 12MB.  Here are the region server JVM options:
>
> export HBASE_REGIONSERVER_OPTS="-Xmx28g -Xms28g -Xmn128m -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
>
> And here are the property settings that we're using in the hbase-site.xml
> file:
>
> hbase.rootdir=hdfs://master:9000/hbase
> hbase.regionserver.handler.count=20
> hbase.cluster.distributed=true
> hbase.zookeeper.quorum=zk01,zk02,zk03
> hfile.block.cache.size=0
> hbase.hregion.max.filesize=1073741824
> hbase.regionserver.global.memstore.upperLimit=0.79
> hbase.regionserver.global.memstore.lowerLimit=0.70
> hbase.hregion.majorcompaction=0
> hbase.hstore.compactionThreshold=15
> hbase.hstore.blockingStoreFiles=20
> hbase.rpc.timeout=0
> zookeeper.session.timeout=3600000
>
> It's taking about 24 hours to load 4TB of data, which isn't quite fast
> enough for our application.  Is there a better configuration that we
> can use to improve loading performance?
>
> - Amit
>

Re: Help with continuous loading configuration

Posted by Amit Jain <ja...@gmail.com>.
Hi Lars,

The keys are arriving in random order.  The HBase monitoring page shows
evenly distributed load across all of the region servers.  I didn't see
anything weird in the gc logs, no mention of any failures.  I'm a little
unclear about what the optimal values for the following properties should
be:

hbase.hstore.compactionThreshold
hbase.hstore.blockingStoreFiles

Is there some rule of thumb that I can use to determine good values for
these properties?

- Amit

On Wed, Nov 16, 2011 at 3:14 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Hi Amit,
>
> 12MB write buffer might be a bit high.
>
> How are you generating your keys? You might hot spot a single region
> server if (for example) you create
> monotonically increasing keys. When you look at the HBase monitoring page,
> do you see a single region server
> getting all the requests?
>
>
> Anything weird in the GC logs? Do they all log similar?
>
>
> -- Lars
>
>
>
> ________________________________
> From: Amit Jain <ja...@gmail.com>
> To: user@hbase.apache.org
> Sent: Wednesday, November 16, 2011 3:06 PM
> Subject: Help with continuous loading configuration
>
> Hello,
>
> We're doing a proof-of-concept study to see if HBase is a good fit for an
> application we're planning to build.  The application will be recording a
> continuous stream of sensor data throughout the day and the data needs to
> be online immediately.  Our test cluster consists of 16 machines, each with
> 16 cores, 32GB of RAM, and 8TB of local storage, running CDH3u2.  We're using
> the HBase client Put class, and have set the table "auto flush" to false
> and the write buffer size to 12MB.  Here are the region server JVM options:
>
> export HBASE_REGIONSERVER_OPTS="-Xmx28g -Xms28g -Xmn128m -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
>
> And here are the property settings that we're using in the hbase-site.xml
> file:
>
> hbase.rootdir=hdfs://master:9000/hbase
> hbase.regionserver.handler.count=20
> hbase.cluster.distributed=true
> hbase.zookeeper.quorum=zk01,zk02,zk03
> hfile.block.cache.size=0
> hbase.hregion.max.filesize=1073741824
> hbase.regionserver.global.memstore.upperLimit=0.79
> hbase.regionserver.global.memstore.lowerLimit=0.70
> hbase.hregion.majorcompaction=0
> hbase.hstore.compactionThreshold=15
> hbase.hstore.blockingStoreFiles=20
> hbase.rpc.timeout=0
> zookeeper.session.timeout=3600000
>
> It's taking about 24 hours to load 4TB of data, which isn't quite fast
> enough for our application.  Is there a better configuration that we
> can use to improve loading performance?
>
> - Amit
>

Re: Help with continuous loading configuration

Posted by lars hofhansl <lh...@yahoo.com>.
Hi Amit,

12MB write buffer might be a bit high.

How are you generating your keys? You might hot spot a single region server if (for example) you create
monotonically increasing keys. When you look at the HBase monitoring page, do you see a single region server
getting all the requests?
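
For example, a purely time-based key would keep every write on the tail
region (sketch):

// every new key sorts after all earlier keys, so all puts land on the
// region holding the end of the keyspace:
byte[] hotKey = org.apache.hadoop.hbase.util.Bytes.toBytes(System.currentTimeMillis());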


Anything weird in the GC logs? Do they all log similar?


-- Lars



________________________________
From: Amit Jain <ja...@gmail.com>
To: user@hbase.apache.org
Sent: Wednesday, November 16, 2011 3:06 PM
Subject: Help with continuous loading configuration

Hello,

We're doing a proof-of-concept study to see if HBase is a good fit for an
application we're planning to build.  The application will be recording a
continuous stream of sensor data throughout the day and the data needs to
be online immediately.  Our test cluster consists of 16 machines, each with
16 cores, 32GB of RAM, and 8TB of local storage, running CDH3u2.  We're using
the HBase client Put class, and have set the table "auto flush" to false
and the write buffer size to 12MB.  Here are the region server JVM options:

export HBASE_REGIONSERVER_OPTS="-Xmx28g -Xms28g -Xmn128m -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"

And here are the property settings that we're using in the hbase-site.xml
file:

hbase.rootdir=hdfs://master:9000/hbase
hbase.regionserver.handler.count=20
hbase.cluster.distributed=true
hbase.zookeeper.quorum=zk01,zk02,zk03
hfile.block.cache.size=0
hbase.hregion.max.filesize=1073741824
hbase.regionserver.global.memstore.upperLimit=0.79
hbase.regionserver.global.memstore.lowerLimit=0.70
hbase.hregion.majorcompaction=0
hbase.hstore.compactionThreshold=15
hbase.hstore.blockingStoreFiles=20
hbase.rpc.timeout=0
zookeeper.session.timeout=3600000

It's taking about 24 hours to load 4TB of data, which isn't quite fast
enough for our application.  Is there a better configuration that we
can use to improve loading performance?

- Amit