
Posted to user@hbase.apache.org by Rohit Kelkar <ro...@gmail.com> on 2014/02/15 00:52:31 UTC

uneven region distribution

I am using hbase version 0.92.4 on a 5 node cluster. I am seeing that a
particular region server often crashes. A status 'simple' on hbase shell
gives the following stats


HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012

status 'simple'
4 live servers
    server7:60020 1392017875910
        requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
    server4:60020 1392300859332
        requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
    server3:60020 1391583646998
        requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
    server6:60020 1391583647588
        requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
1 dead servers
    server5,60020,1392108515637
Aggregate load: 1272, regions: 2417

The dead region server has 2417 regions as opposed to 419, 379, 653, 966
regions on other servers. Am I right in attributing the region server crash
to the disproportionately high number of regions on that server?

If I invoke the balancer on hbase shell using the "balancer" command it
returns true. But it does not change the status of the assignments.

- R

Re: uneven region distribution

Posted by divye sheth <di...@gmail.com>.

Typo in my previous reply: 2417 is not the total load on the machine but on
the whole HBase cluster.
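
A quick arithmetic check supports this reading. Using only the per-server
figures from the status output in the original post, the online-region counts
of the four live servers add up to exactly 2417, and their request rates add up
to the reported aggregate load of 1272. A minimal sketch (the dictionary layout
and variable names are illustrative, not anything HBase produces):

```python
# Per-server figures copied from the `status 'simple'` output in the original post.
live_servers = {
    "server7": {"requestsPerSecond": 0, "numberOfOnlineRegions": 419},
    "server4": {"requestsPerSecond": 843, "numberOfOnlineRegions": 379},
    "server3": {"requestsPerSecond": 429, "numberOfOnlineRegions": 653},
    "server6": {"requestsPerSecond": 0, "numberOfOnlineRegions": 966},
}

# Sum each field over the live servers only; the dead server is not included.
total_regions = sum(s["numberOfOnlineRegions"] for s in live_servers.values())
total_load = sum(s["requestsPerSecond"] for s in live_servers.values())

print(total_regions)  # 2417, matches "regions: 2417"
print(total_load)     # 1272, matches "Aggregate load: 1272"
```

So the "regions: 2417" figure is consistent with a cluster-wide total rather
than a count for server5.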

Thanks
D

Re: uneven region distribution

Posted by divye sheth <di...@gmail.com>.

The 2417 is the total load on the machine. When the regionserver crashes
the master autobalances the regions.

Also, when you run the balancer externally, note that it balances each table
separately across the region servers. So if a table has 20 regions in total,
then in your case the mean would be 4 per server. Check in the HBase UI
whether every table has a per-server region count within (average +- 1).
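
The check described above can be sketched in a few lines. This follows the
"average +- 1" rule of thumb from this reply only; it is not the balancer's
actual algorithm, and the function names and the example table are
hypothetical.

```python
import math


def per_server_bounds(total_regions, num_servers):
    """Return (low, high) per-server region counts allowed by the
    'average +- 1' rule of thumb (an assumption taken from the reply)."""
    mean = total_regions / num_servers
    return math.floor(mean) - 1, math.ceil(mean) + 1


def out_of_band(regions_by_server):
    """Return the servers whose region count for ONE table is outside the band."""
    total = sum(regions_by_server.values())
    low, high = per_server_bounds(total, len(regions_by_server))
    return {s: n for s, n in regions_by_server.items() if not low <= n <= high}


# Hypothetical table with 20 regions spread over 5 servers: the mean is 4.
example = {"server3": 4, "server4": 3, "server5": 9, "server6": 2, "server7": 2}
print(per_server_bounds(20, 5))  # (3, 5)
print(out_of_band(example))
```

The per-table region counts would have to be read off the master UI by hand or
collected some other way; that part is left out here.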

Thanks
D

Re: uneven region distribution

Posted by Ted Yu <yu...@gmail.com>.

Please take a look at http://hbase.apache.org/book.html#hbase_metrics.

You should pay attention to callQueueLength, compactionQueueLength,
readRequestsCount and writeRequestsCount.
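
Once samples of these four metrics are available, comparing them over time is
straightforward. The sketch below assumes (this is not stated in the thread)
that readRequestsCount and writeRequestsCount are cumulative counters and that
the two queue lengths are point-in-time values. How the samples are collected
is left out, and the numbers are made up for illustration.

```python
COUNTERS = ("readRequestsCount", "writeRequestsCount")   # assumed cumulative
GAUGES = ("callQueueLength", "compactionQueueLength")    # assumed point-in-time


def request_rates(earlier, later, interval_s):
    """Per-second request rates between two samples of the counters."""
    return {name: (later[name] - earlier[name]) / interval_s for name in COUNTERS}


def peak_queue_lengths(samples):
    """Largest observed value of each queue length across the samples."""
    return {name: max(s[name] for s in samples) for name in GAUGES}


# Made-up samples from one region server, taken 60 seconds apart.
t0 = {"callQueueLength": 0, "compactionQueueLength": 3,
      "readRequestsCount": 1000, "writeRequestsCount": 5000}
t1 = {"callQueueLength": 12, "compactionQueueLength": 40,
      "readRequestsCount": 1600, "writeRequestsCount": 11000}

print(request_rates(t0, t1, 60.0))
print(peak_queue_lengths([t0, t1]))
```

Running the same comparison for every region server makes it easier to see
whether one server is taking a disproportionate share of the requests.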

Cheers



Re: uneven region distribution

Posted by Rohit Kelkar <ro...@gmail.com>.

It could have been under load because I am not salting the keys. If I were
in a position to replicate this issue, what metrics should I capture to
find out whether it was under load?
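
For readers unfamiliar with the term, salting means prefixing each row key
with a small value derived from the key, so that keys which would otherwise
sort next to each other land in different key ranges. A minimal sketch,
assuming string row keys; the bucket count, the separator, and the function
name are all hypothetical choices:

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical; pick to suit the cluster


def salted_key(row_key, buckets=NUM_BUCKETS):
    """Prefix row_key with a stable bucket number derived from its hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets  # same key always maps to the same bucket
    return f"{bucket:02d}-{row_key}"


# Monotonically increasing keys no longer share a common prefix.
for key in ("20140214-000001", "20140214-000002", "20140214-000003"):
    print(salted_key(key))
```

The trade-off is that a range scan over the original keys has to be issued
once per bucket and the results merged.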

- R


Re: uneven region distribution

Posted by Ted Yu <yu...@gmail.com>.

From the region server log: was server5 under heavy load?


   2014-02-14 16:06:05,700 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 99984ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
   ...
   2014-02-14 16:06:05,783 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server server5,60020,1392355987269: Unhandled exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing server5,60020,1392355987269 as dead server
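
To see how often and how badly this happens, the Sleeper warnings can be
pulled out of the region server log and the size of each pause computed. A
minimal sketch, assuming the message keeps the wording shown above; the
function name is made up for illustration:

```python
import re

# Matches the "We slept <N>ms instead of <M>ms" part of the Sleeper warning.
SLEEPER_RE = re.compile(r"We\s+slept\s+(\d+)ms\s+instead\s+of\s+(\d+)ms")


def sleeper_overshoot_ms(log_text):
    """Return (slept_ms, expected_ms, overshoot_ms) for the first Sleeper
    warning found in log_text, or None if there is none."""
    m = SLEEPER_RE.search(log_text)
    if m is None:
        return None
    slept, expected = int(m.group(1)), int(m.group(2))
    return slept, expected, slept - expected


line = ("2014-02-14 16:06:05,700 WARN org.apache.hadoop.hbase.util.Sleeper: "
        "We slept 99984ms instead of 3000ms")
print(sleeper_overshoot_ms(line))  # (99984, 3000, 96984)
```

Here the process was unresponsive for roughly 97 seconds longer than intended,
and the FATAL entry follows within the same second.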




Re: uneven region distribution

Posted by Rohit Kelkar <ro...@gmail.com>.

Thanks for your inputs,
I am sharing the master log - http://pastebin.com/Xi9P6Ykr
and the region server log of the failed region server -
http://pastebin.com/1munghDv

- R



Re: uneven region distribution

Posted by Ted Yu <yu...@gmail.com>.

Looking at the bug fixes since 0.94.2, I wonder if you are experiencing the
following, which went into 0.94.10:
HBASE-8432 a table with unbalanced regions will balance indefinitely

Master log would tell us more.



Re: uneven region distribution

Posted by Rohit Kelkar <ro...@gmail.com>.

Sorry, I misstated the version; it's 0.94.2.

- R



Re: uneven region distribution

Posted by Ted Yu <yu...@gmail.com>.

bq.  it does not change the status of the assignments.

Can you check (or pastebin) the master log to see what caused the balancing
to stop?

bq. attributing the region server crash to the disproportionately high
number of regions on that server?

Checking the region server log on server5 should give us more clues.

bq. 0.92.4

please consider upgrading :-)

