Posted to user@hbase.apache.org by Matt Corgan <mc...@hotpads.com> on 2010/09/28 01:26:12 UTC

region doesn't split after 32+ GB

I'm sequentially importing ~1 billion small rows (32 byte keys) into a table
called StatAreaModelLink.  I realize that sequential insertion isn't
efficient by design, but I'm not in a hurry so I let it run all weekend.
 It's been proceeding quickly except for ~20s stalls every minute or so.

I also noticed that one regionserver was getting all the load and just
figured that after each split the later region stayed on the current node.
 Turns out the last region stopped splitting altogether and now has a 33 GB
store file.

I started importing on 0.20.6, but switched to 0.89.20100726 today.  They
both seem to act similarly.  Using all default settings except VERSIONS=1.

That regionserver's logs constantly say "Compaction requested for region...
because regionserver60020.cacheFlusher"

http://pastebin.com/WJDs7ZbM

Am I doing something wrong, like not giving it enough time to flush/compact?
 There are 23 previous regions that look ok.

The region summary:

StatAreaModelLink
\x00\x00\x07\xD9\x00\x00\x00\x04\x00\x00\x00\x004\x12z\xCF\x00\x00\x00\x09\x00\x00\x00\x00\x00\x00\x00\x00=\xE9C
1285438365987.69034405
stores=1, storefiles=13, storefileSizeMB=34001, memstoreSizeMB=51,
storefileIndexSizeMB=47


Thanks,
Matt

Re: region doesn't split after 32+ GB

Posted by Andrew Purtell <ap...@apache.org>.
Matt,

Since you are using ZooKeeper already, conceivably you could keep a hosts file in ZooKeeper somewhere, use an update strategy similar to the one used for implementing locking to ensure a new slave reads and updates the latest version "atomically", and use Twitcher to trigger updates on each host: 
   http://github.com/twitter/twitcher

?
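Something along these lines, maybe (the znode path, local file handling, and class names below are made up for illustration; the Twitcher trigger itself is left out):

    import java.io.FileOutputStream;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class HostsZnode implements Watcher {
        private static final String ZNODE = "/cluster/hosts";   // hypothetical path
        private final ZooKeeper zk;

        public HostsZnode(String quorum) throws Exception {
            this.zk = new ZooKeeper(quorum, 30000, this);
        }

        // Watch fires when the znode changes; re-read it and rewrite the local hosts file.
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                try { sync(); } catch (Exception e) { e.printStackTrace(); }
            }
        }

        public void sync() throws Exception {
            byte[] data = zk.getData(ZNODE, this, new Stat());   // re-registers the watch
            FileOutputStream out = new FileOutputStream("/etc/hosts");
            try { out.write(data); } finally { out.close(); }
        }

        // A new slave appends its own line; passing the read version to setData
        // makes the read-modify-write fail if someone else updated in between,
        // so we just retry -- the "atomic" update mentioned above.
        public void register(String hostsLine) throws Exception {
            while (true) {
                Stat stat = new Stat();
                String current = new String(zk.getData(ZNODE, false, stat));
                try {
                    zk.setData(ZNODE, (current + hostsLine + "\n").getBytes(), stat.getVersion());
                    return;
                } catch (KeeperException.BadVersionException e) {
                    // lost the race; loop and retry against the new version
                }
            }
        }
    }

Twitcher (or a plain watcher like this one) would then be what actually pushes the new file out to each box.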

Best regards,

    - Andy


> From: Matt Corgan <mc...@hotpads.com>
> Subject: Re: region doesn't split after 32+ GB
> To: "user" <us...@hbase.apache.org>
> Date: Wednesday, September 29, 2010, 11:30 AM
> Thanks for your help again Stack... sorry I don't have the logs.  Will do a
> better job of saving them.  By the way, this time the insert job maintained
> about 22k rows/sec all night without any pauses, and even though it was
> sequential insertion, it did a nice job of rotating the active region around
> the cluster.
> 
> As for the hostnames, there are no problems in .89, and nothing is onerous
> by any means... we are just trying to come to some level of familiarity
> before putting any real data into hbase.
> 
> EC2/RightScale make it very easy to add/remove regionservers to the cluster
> with the click of a button, which is the reason that the hosts file can
> change more often than you'd want to modify it manually.  We're going to go
> the route of having each newly added regionserver append its name to the
> host file of every other server in our EC2 account (~30 servers).  The only
> downsides I see there are that it doesn't scale very elegantly, and that it
> gets complicated if you want to launch multiple regionservers or new clients
> at the same time.
> 
> For the sake of brainstorming, maybe it's possible to have the master always
> broadcast IP addresses and have all communication done via IP.  This may be
> more robust anyway.  Then the first time a new regionserver or client gets an
> unfamiliar IP address, it can try to figure out the hostname (the same way
> the master currently does this), and cache it somewhere.  The hostname could
> be added alongside the IP address or replace it in the logs for convenience.
> 
> Thanks again,
> Matt


Re: region doesn't split after 32+ GB

Posted by Matt Corgan <mc...@hotpads.com>.
Thanks for your help again Stack... sorry I don't have the logs.  Will do a
better job of saving them.  By the way, this time the insert job maintained
about 22k rows/sec all night without any pauses, and even though it was
sequential insertion, it did a nice job of rotating the active region around
the cluster.

As for the hostnames, there are no problems in .89, and nothing is onerous
by any means... we are just trying to come to some level of familiarity
before putting any real data into hbase.

EC2/RightScale make it very easy to add/remove regionservers to the cluster
with the click of a button, which is the reason that the hosts file can
change more often than you'd want to modify it manually.  We're going to go
the route of having each newly added regionserver append its name to the
host file of every other server in our EC2 account (~30 servers).  The only
downsides I see there are that it doesn't scale very elegantly, and that it
gets complicated if you want to launch multiple regionservers or new clients
at the same time.

For the sake of brainstorming, maybe it's possible to have the master always
broadcast IP addresses and have all communication done via IP.  This may be
more robust anyway.  Then the first time a new regionserver or client gets an
unfamiliar IP address, it can try to figure out the hostname (the same way
the master currently does this), and cache it somewhere.  The hostname could
be added alongside the IP address or replace it in the logs for convenience.
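
Just to make the lookup-and-cache part concrete (this isn't HBase code; the class and method names are invented):

    import java.net.InetAddress;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Resolve an unfamiliar IP to a hostname once, cache it, and use it only
    // for log readability -- all actual communication stays on the IP.
    public class HostnameCache {
        private static final Map<String, String> CACHE = new ConcurrentHashMap<String, String>();

        public static String label(String ip) {
            String cached = CACHE.get(ip);
            if (cached != null) {
                return cached;
            }
            String label;
            try {
                String host = InetAddress.getByName(ip).getCanonicalHostName();
                // getCanonicalHostName() returns the IP string itself if reverse
                // lookup fails, so the label degrades gracefully to the bare IP.
                label = host.equals(ip) ? ip : ip + " (" + host + ")";
            } catch (Exception e) {
                label = ip;
            }
            CACHE.put(ip, label);
            return label;
        }
    }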

Thanks again,
Matt

On Wed, Sep 29, 2010 at 12:53 PM, Stack <st...@duboce.net> wrote:

> On Wed, Sep 29, 2010 at 9:22 AM, Matt Corgan <mc...@hotpads.com> wrote:
> > Everything is working fine now.
> >
> > My best guess is that when we upgraded from 0.20.6 to 0.89.20100726 there
> > was a change in hostname resolution (either by hbase, hdfs, or us).
>
> Resolution is done differently in 0.89.
>
> RS checks into master.  Master tells it what it sees as its hostname
> and ever after the RS will use what the master told it when its
> talking tot the master.   Only the master's DNS setup needs make some
> bit of sense.
>
>
> In
> > 0.20.6, our regionservers looked each other up via IP address, but after
> the
> > upgrade it switched to hostname, and some of our servers were not aware
> of
> > each other's hostnames.  Then the CompactSplitThread did the compaction
> part
> > but failed to split because it got an unknown host exception.  Is that
> > plausible?
> >
>
> You have log from that time?
>
>
> > Is there a way to configure it so that regionservers are referenced by IP
> > addresses instead of hostnames?  When we add a regionserver to a running
> > cluster, it's pretty easy to automatically add it's name to the master's
> > hosts file, but it's less reliable to try to add it to all other
> > regionservers and client machines.  Maybe just not even populate the
> > master's hosts file?
> >
> > I guess the downside there is that we'd lose readability in the logs,
> etc..
> >
>
> Well, is there a problem w/ how 0.89 works?
>
> I suppose clients need to be in agreement w/ master regards hostnames.
>  Is that too onerous an expectation?
>
> If master can't resolve hostnames it'll just use IPs.  I suppose you
> could use this fact to force your cluster all IP and I suppose we
> could include a flag to go all IPs all over but I'd be interested in
> how 0.89 naming is failing you so can try fix.
>
> Thanks,
> St.Ack
>
>
> >
> > On Tue, Sep 28, 2010 at 3:15 PM, Matt Corgan <mc...@hotpads.com>
> wrote:
> >
> >> I'll try to reproduce it and capture some comprehensive log files, but
> >> we're testing on EC2 and had terminated some of the servers before
> noticing
> >> what was happening.
> >>
> >> I think it's been doing successful compactions all along because there
> are
> >> only 3 files in that directory.  Here's the hdfs files for that
> particular
> >> table (line 109): http://pastebin.com/8fsDmh6M
> >>
> >> If i stopped inserting to the cluster altogether to give it time to
> >> breathe, is the intended behaviour that the region should be split after
> >> compaction because it's size is greater than 256 MB?  I'll try again to
> >> reproduce, but I'm fairly certain it's just sitting there based on
> >> network/disk/cpu activity.
> >>
> >>
> >> On Tue, Sep 28, 2010 at 12:01 PM, Stack <st...@duboce.net> wrote:
> >>
> >>> On Mon, Sep 27, 2010 at 4:26 PM, Matt Corgan <mc...@hotpads.com>
> wrote:
> >>> > I'm sequentially importing ~1 billion small rows (32 byte keys) into
> a
> >>> table
> >>> > called StatAreaModelLink.  I realize that sequential insertion isn't
> >>> > efficient by design, but I'm not in a hurry so I let it run all
> weekend.
> >>> >  It's been proceeding quickly except for ~20s stalls every minute or
> so.
> >>> >
> >>> > I also noticed that one regionserver was getting all the load and
> just
> >>> > figured that after each split the later region stayed on the current
> >>> node.
> >>> >  Turns out the last region stopped splitting altogether and now has a
> >>> 33gb
> >>> > store file.
> >>> >
> >>>
> >>> Interesting.
> >>>
> >>>
> >>> > I started importing on 0.20.6, but switched to 0.89.20100726 today.
> >>>  They
> >>> > both seem to act similarly.  Using all default settings except
> >>> VERSIONS=1.
> >>> >
> >>> > That regionserver's logs constantly say "Compaction requested for
> >>> region...
> >>> > because regionserver60020.cacheFlusher"
> >>> >
> >>> > http://pastebin.com/WJDs7ZbM
> >>> >
> >>> > Am I doing something wrong, like not giving it enough time to
> >>> flush/compact?
> >>> >  There are 23 previous regions that look ok.
> >>> >
> >>>
> >>> I wonder if a compaction is running and its just taking a long time.
> >>> Grep for 'Starting compaction' in your logs.  See when last started?
> >>>
> >>> I see you continue to flush.  Try taking the load off.
> >>>
> >>> You might also do a:
> >>>
> >>> > bin/hadoop fs -lsr /hbase
> >>>
> >>> ... and pastbin it.  I'd be looking for a region with a bunch of files
> in
> >>> it.
> >>>
> >>> Finally, you've read about the bulk load [1] tool?
> >>>
> >>> St.Ack
> >>>
> >>> 1. http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html
> >>> St.Ack
> >>>
> >>
> >>
> >
>

Re: region doesn't split after 32+ GB

Posted by Stack <st...@duboce.net>.
On Wed, Sep 29, 2010 at 9:22 AM, Matt Corgan <mc...@hotpads.com> wrote:
> Everything is working fine now.
>
> My best guess is that when we upgraded from 0.20.6 to 0.89.20100726 there
> was a change in hostname resolution (either by hbase, hdfs, or us).

Resolution is done differently in 0.89.

RS checks into master.  Master tells it what it sees as its hostname,
and ever after the RS will use what the master told it when it's
talking to the master.  Only the master's DNS setup needs to make some
bit of sense.


> In 0.20.6, our regionservers looked each other up via IP address, but after the
> upgrade it switched to hostname, and some of our servers were not aware of
> each other's hostnames.  Then the CompactSplitThread did the compaction part
> but failed to split because it got an unknown host exception.  Is that
> plausible?
>

You have log from that time?


> Is there a way to configure it so that regionservers are referenced by IP
> addresses instead of hostnames?  When we add a regionserver to a running
> cluster, it's pretty easy to automatically add it's name to the master's
> hosts file, but it's less reliable to try to add it to all other
> regionservers and client machines.  Maybe just not even populate the
> master's hosts file?
>
> I guess the downside there is that we'd lose readability in the logs, etc..
>

Well, is there a problem w/ how 0.89 works?

I suppose clients need to be in agreement w/ the master as regards hostnames.
Is that too onerous an expectation?

If the master can't resolve hostnames it'll just use IPs.  I suppose you
could use this fact to force your cluster all-IP, and I suppose we
could include a flag to go all IPs everywhere, but I'd be interested in
how 0.89 naming is failing you so I can try to fix it.

Thanks,
St.Ack


>
> On Tue, Sep 28, 2010 at 3:15 PM, Matt Corgan <mc...@hotpads.com> wrote:
>
>> I'll try to reproduce it and capture some comprehensive log files, but
>> we're testing on EC2 and had terminated some of the servers before noticing
>> what was happening.
>>
>> I think it's been doing successful compactions all along because there are
>> only 3 files in that directory.  Here's the hdfs files for that particular
>> table (line 109): http://pastebin.com/8fsDmh6M
>>
>> If i stopped inserting to the cluster altogether to give it time to
>> breathe, is the intended behaviour that the region should be split after
>> compaction because it's size is greater than 256 MB?  I'll try again to
>> reproduce, but I'm fairly certain it's just sitting there based on
>> network/disk/cpu activity.
>>
>>
>> On Tue, Sep 28, 2010 at 12:01 PM, Stack <st...@duboce.net> wrote:
>>
>>> On Mon, Sep 27, 2010 at 4:26 PM, Matt Corgan <mc...@hotpads.com> wrote:
>>> > I'm sequentially importing ~1 billion small rows (32 byte keys) into a
>>> table
>>> > called StatAreaModelLink.  I realize that sequential insertion isn't
>>> > efficient by design, but I'm not in a hurry so I let it run all weekend.
>>> >  It's been proceeding quickly except for ~20s stalls every minute or so.
>>> >
>>> > I also noticed that one regionserver was getting all the load and just
>>> > figured that after each split the later region stayed on the current
>>> node.
>>> >  Turns out the last region stopped splitting altogether and now has a
>>> 33gb
>>> > store file.
>>> >
>>>
>>> Interesting.
>>>
>>>
>>> > I started importing on 0.20.6, but switched to 0.89.20100726 today.
>>>  They
>>> > both seem to act similarly.  Using all default settings except
>>> VERSIONS=1.
>>> >
>>> > That regionserver's logs constantly say "Compaction requested for
>>> region...
>>> > because regionserver60020.cacheFlusher"
>>> >
>>> > http://pastebin.com/WJDs7ZbM
>>> >
>>> > Am I doing something wrong, like not giving it enough time to
>>> flush/compact?
>>> >  There are 23 previous regions that look ok.
>>> >
>>>
>>> I wonder if a compaction is running and its just taking a long time.
>>> Grep for 'Starting compaction' in your logs.  See when last started?
>>>
>>> I see you continue to flush.  Try taking the load off.
>>>
>>> You might also do a:
>>>
>>> > bin/hadoop fs -lsr /hbase
>>>
>>> ... and pastbin it.  I'd be looking for a region with a bunch of files in
>>> it.
>>>
>>> Finally, you've read about the bulk load [1] tool?
>>>
>>> St.Ack
>>>
>>> 1. http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html
>>> St.Ack
>>>
>>
>>
>

Re: region doesn't split after 32+ GB

Posted by Matt Corgan <mc...@hotpads.com>.
Everything is working fine now.

My best guess is that when we upgraded from 0.20.6 to 0.89.20100726 there
was a change in hostname resolution (either by hbase, hdfs, or us).  In
0.20.6, our regionservers looked each other up via IP address, but after the
upgrade it switched to hostname, and some of our servers were not aware of
each other's hostnames.  Then the CompactSplitThread did the compaction part
but failed to split because it got an unknown host exception.  Is that
plausible?

Is there a way to configure it so that regionservers are referenced by IP
addresses instead of hostnames?  When we add a regionserver to a running
cluster, it's pretty easy to automatically add its name to the master's
hosts file, but it's less reliable to try to add it to all other
regionservers and client machines.  Maybe just not even populate the
master's hosts file?

I guess the downside there is that we'd lose readability in the logs, etc.


On Tue, Sep 28, 2010 at 3:15 PM, Matt Corgan <mc...@hotpads.com> wrote:

> I'll try to reproduce it and capture some comprehensive log files, but
> we're testing on EC2 and had terminated some of the servers before noticing
> what was happening.
>
> I think it's been doing successful compactions all along because there are
> only 3 files in that directory.  Here's the hdfs files for that particular
> table (line 109): http://pastebin.com/8fsDmh6M
>
> If i stopped inserting to the cluster altogether to give it time to
> breathe, is the intended behaviour that the region should be split after
> compaction because it's size is greater than 256 MB?  I'll try again to
> reproduce, but I'm fairly certain it's just sitting there based on
> network/disk/cpu activity.
>
>
> On Tue, Sep 28, 2010 at 12:01 PM, Stack <st...@duboce.net> wrote:
>
>> On Mon, Sep 27, 2010 at 4:26 PM, Matt Corgan <mc...@hotpads.com> wrote:
>> > I'm sequentially importing ~1 billion small rows (32 byte keys) into a
>> table
>> > called StatAreaModelLink.  I realize that sequential insertion isn't
>> > efficient by design, but I'm not in a hurry so I let it run all weekend.
>> >  It's been proceeding quickly except for ~20s stalls every minute or so.
>> >
>> > I also noticed that one regionserver was getting all the load and just
>> > figured that after each split the later region stayed on the current
>> node.
>> >  Turns out the last region stopped splitting altogether and now has a
>> 33gb
>> > store file.
>> >
>>
>> Interesting.
>>
>>
>> > I started importing on 0.20.6, but switched to 0.89.20100726 today.
>>  They
>> > both seem to act similarly.  Using all default settings except
>> VERSIONS=1.
>> >
>> > That regionserver's logs constantly say "Compaction requested for
>> region...
>> > because regionserver60020.cacheFlusher"
>> >
>> > http://pastebin.com/WJDs7ZbM
>> >
>> > Am I doing something wrong, like not giving it enough time to
>> flush/compact?
>> >  There are 23 previous regions that look ok.
>> >
>>
>> I wonder if a compaction is running and its just taking a long time.
>> Grep for 'Starting compaction' in your logs.  See when last started?
>>
>> I see you continue to flush.  Try taking the load off.
>>
>> You might also do a:
>>
>> > bin/hadoop fs -lsr /hbase
>>
>> ... and pastbin it.  I'd be looking for a region with a bunch of files in
>> it.
>>
>> Finally, you've read about the bulk load [1] tool?
>>
>> St.Ack
>>
>> 1. http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html
>> St.Ack
>>
>
>

Re: region doesn't split after 32+ GB

Posted by Matt Corgan <mc...@hotpads.com>.
I'll try to reproduce it and capture some comprehensive log files, but we're
testing on EC2 and had terminated some of the servers before noticing what
was happening.

I think it's been doing successful compactions all along because there are
only 3 files in that directory.  Here's the hdfs files for that particular
table (line 109): http://pastebin.com/8fsDmh6M

If I stopped inserting to the cluster altogether to give it time to breathe,
is the intended behaviour that the region should be split after compaction
because its size is greater than 256 MB?  I'll try again to reproduce, but
I'm fairly certain it's just sitting there based on network/disk/cpu
activity.
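
(For what it's worth, my understanding is that the split check compares the store size against the table's max file size -- hbase.hregion.max.filesize, which defaults to 256 MB in this era -- and that the same threshold can be pinned per table on the descriptor.  A rough sketch of where that lives in the client API, with a made-up column family name; exact constructors vary a bit across versions:)

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;

    public class StatAreaModelLinkDescriptor {
        // A region becomes a split candidate once a store crosses the max file
        // size (hbase.hregion.max.filesize, 256 MB by default).  "d" is a
        // made-up family name for illustration.
        public static HTableDescriptor build() {
            HTableDescriptor desc = new HTableDescriptor("StatAreaModelLink");
            desc.setMaxFileSize(256L * 1024 * 1024);
            HColumnDescriptor family = new HColumnDescriptor("d");
            family.setMaxVersions(1);   // the VERSIONS=1 setting mentioned earlier
            desc.addFamily(family);
            return desc;
        }
    }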


On Tue, Sep 28, 2010 at 12:01 PM, Stack <st...@duboce.net> wrote:

> On Mon, Sep 27, 2010 at 4:26 PM, Matt Corgan <mc...@hotpads.com> wrote:
> > I'm sequentially importing ~1 billion small rows (32 byte keys) into a
> table
> > called StatAreaModelLink.  I realize that sequential insertion isn't
> > efficient by design, but I'm not in a hurry so I let it run all weekend.
> >  It's been proceeding quickly except for ~20s stalls every minute or so.
> >
> > I also noticed that one regionserver was getting all the load and just
> > figured that after each split the later region stayed on the current
> node.
> >  Turns out the last region stopped splitting altogether and now has a
> 33gb
> > store file.
> >
>
> Interesting.
>
>
> > I started importing on 0.20.6, but switched to 0.89.20100726 today.  They
> > both seem to act similarly.  Using all default settings except
> VERSIONS=1.
> >
> > That regionserver's logs constantly say "Compaction requested for
> region...
> > because regionserver60020.cacheFlusher"
> >
> > http://pastebin.com/WJDs7ZbM
> >
> > Am I doing something wrong, like not giving it enough time to
> flush/compact?
> >  There are 23 previous regions that look ok.
> >
>
> I wonder if a compaction is running and its just taking a long time.
> Grep for 'Starting compaction' in your logs.  See when last started?
>
> I see you continue to flush.  Try taking the load off.
>
> You might also do a:
>
> > bin/hadoop fs -lsr /hbase
>
> ... and pastbin it.  I'd be looking for a region with a bunch of files in
> it.
>
> Finally, you've read about the bulk load [1] tool?
>
> St.Ack
>
> 1. http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html
> St.Ack
>

Re: region doesn't split after 32+ GB

Posted by Stack <st...@duboce.net>.
On Mon, Sep 27, 2010 at 4:26 PM, Matt Corgan <mc...@hotpads.com> wrote:
> I'm sequentially importing ~1 billion small rows (32 byte keys) into a table
> called StatAreaModelLink.  I realize that sequential insertion isn't
> efficient by design, but I'm not in a hurry so I let it run all weekend.
>  It's been proceeding quickly except for ~20s stalls every minute or so.
>
> I also noticed that one regionserver was getting all the load and just
> figured that after each split the later region stayed on the current node.
>  Turns out the last region stopped splitting altogether and now has a 33gb
> store file.
>

Interesting.


> I started importing on 0.20.6, but switched to 0.89.20100726 today.  They
> both seem to act similarly.  Using all default settings except VERSIONS=1.
>
> That regionserver's logs constantly say "Compaction requested for region...
> because regionserver60020.cacheFlusher"
>
> http://pastebin.com/WJDs7ZbM
>
> Am I doing something wrong, like not giving it enough time to flush/compact?
>  There are 23 previous regions that look ok.
>

I wonder if a compaction is running and it's just taking a long time.
Grep for 'Starting compaction' in your logs.  See when the last one started?

I see you continue to flush.  Try taking the load off.

You might also do a:

> bin/hadoop fs -lsr /hbase

... and pastebin it.  I'd be looking for a region with a bunch of files in it.

Finally, you've read about the bulk load [1] tool?
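
(The gist is a MapReduce job that writes HFiles and then loads them into the table.  A rough skeleton along those lines is below -- the input parsing and the family/qualifier names are invented, the class names should be checked against the doc for your version, and the total-order partitioning and the actual load step that the doc walks through are left out.)

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class StatAreaModelLinkHFileJob {

        // Parses a hypothetical "rowkey<TAB>value" text input into KeyValues.
        static class LineMapper
                extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] parts = line.toString().split("\t", 2);
                byte[] row = Bytes.toBytes(parts[0]);
                KeyValue kv = new KeyValue(row, Bytes.toBytes("d"),     // made-up family
                        Bytes.toBytes("v"), Bytes.toBytes(parts[1]));   // made-up qualifier
                ctx.write(new ImmutableBytesWritable(row), kv);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "StatAreaModelLink HFile writer");
            job.setJarByClass(StatAreaModelLinkHFileJob.class);
            job.setMapperClass(LineMapper.class);
            job.setOutputKeyClass(ImmutableBytesWritable.class);
            job.setOutputValueClass(KeyValue.class);
            job.setOutputFormatClass(HFileOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // The doc covers getting output totally ordered by row key and then
            // loading the resulting HFiles into the table; both are omitted here.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }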

St.Ack

1. http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html
St.Ack