Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/09/25 18:56:16 UTC

[jira] Created: (HBASE-1868) Spew about rebalancing but none done....

Spew about rebalancing but none done....
----------------------------------------

                 Key: HBASE-1868
                 URL: https://issues.apache.org/jira/browse/HBASE-1868
             Project: Hadoop HBase
          Issue Type: Bug
         Environment: 0.20.0 RC2
            Reporter: stack


I'm seeing loads of this in logs:

{code}
2009-09-24 21:27:22,130 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server XX.XX.XX.100,20020,1253219583523 will be unloaded for balance. Server load: 5 avg: 3.78, regions can be moved: 4
{code}

It's like the balancer is coming up with the wrong answer to the question... I don't see any subsequent activity. It does this over and over for hours.

Then a split comes in and it seems to shake things up. I see it do a bunch of assigning.

{code}
2009-09-24 21:41:02,784 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_SPLIT: locations,,1253657949707: Daughters; locations,,1253828460677, locations,http:\x2F\x2Fen.wikipedia.org\x2Fwiki\x2FLarry_Lucchino,1253828460677 from aa0-009-2.u.powerset.com,20020,1253219584971; 1 of 3
2009-09-24 21:41:02,784 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for address: XX.XX.XX.6:20020, startcode: 1253219584971, load: (requests=5213, regions=3, usedHeap=114, maxHeap=2031): total nregions to assign=2, nregions to reach balance=4, isMetaAssign=false
2009-09-24 21:41:02,820 DEBUG org.apache.hadoop.hbase.master.RegionManager: Assigning for address: XX.XX.XX.96:20020, startcode: 1253219584175, load: (requests=12, regions=4, usedHeap=404, maxHeap=2031): total nregions to assign=2, nregions to reach balance=4, isMetaAssign=false
...
{code}

Then it's back to the 'will be unloaded...' message.


A new split comes in and then the assigning gets triggered again... a few regions are opened but not enough.

Eventually it goes back to 'normal' (average load went to 3.85 from 3.8?)



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760828#action_12760828 ] 

stack commented on HBASE-1868:
------------------------------

Spent some time trying to manufacture the condition, to no avail. I'm thinking of committing this patch as is. It flags when we hit the 'condition'.

[jira] Commented: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760756#action_12760756 ] 

stack commented on HBASE-1868:
------------------------------

Cheddar, working with me over on IRC, figured out what was up here. Down in assignRegionsToMultipleServers, if nRegions == nRegionsToAssign, we don't assign.

It also looks like we have the same issue if nRegions > nRegionsToAssign:

{code}
22:28 < cheddar> LOL, St^Ack, the plot thickens, my master is now in a loop with
22:28 < cheddar> 09/09/29 22:28:35 DEBUG master.RegionManager: Assigning for address: 10.18.52.61:60020, startcode: 1254261590467, load: (requests=0,
                 regions=37, usedHeap=134, maxHeap=2009): total nregions to assign=2, nregions to reach balance=6, isMetaAssign=false
22:29 < cheddar> so, it's not just when nRegionsToAssign == nregions, but also when nregions > nRegionsToAssign
{code}
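To make the failing condition concrete, here is a minimal Java sketch that plugs the logged numbers into it. It is a hypothetical model, not the actual RegionManager code: the class and method names are made up, and it captures only the reported outcome (no region is handed out when nRegions, the "nregions to reach balance" value, is at least nRegionsToAssign, the "total nregions to assign" value).

```java
/**
 * Hypothetical model of the condition found in assignRegionsToMultipleServers
 * (HBASE-1868). Illustrative names only; this is not the RegionManager source.
 */
final class Hbase1868Condition {

  /**
   * Returns true when, per the analysis above, no region would be assigned.
   *
   * @param nRegionsToAssign the logged "total nregions to assign"
   * @param nRegions         the logged "nregions to reach balance"
   */
  static boolean assignsNothing(int nRegionsToAssign, int nRegions) {
    // Reported first: nRegions == nRegionsToAssign means no assignment.
    // Reported on IRC: nRegions > nRegionsToAssign behaves the same way.
    return nRegions >= nRegionsToAssign;
  }

  public static void main(String[] args) {
    // {total to assign, to reach balance} pairs copied from the debug logs.
    int[][] logged = {{2, 4}, {2, 6}};
    for (int[] pair : logged) {
      System.out.printf("to assign=%d, to reach balance=%d, stuck: %b%n",
          pair[0], pair[1], assignsNothing(pair[0], pair[1]));
    }
  }
}
```

Every logged pair (2 vs 4 in the original report, 2 vs 6 in Cheddar's loop) lands in the stuck case, which fits the "spew but no rebalancing" symptom.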

[jira] Commented: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759628#action_12759628 ] 

stack commented on HBASE-1868:
------------------------------

Looking... but this seems to be at least part of why things were off:

{code}
2009-09-25 00:46:26,187 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 10 on 20020, call getClosestRowBefore([B@1c5fbf52, [B@5209af0b, [B@79c32540) from XX.XX.XX.126:39465: error: java.io.IOException: Could not obtain block: blk_-4524474748365854543_61342811 file=/hbase/coral_hbase/.META./1028785192/info/1652111193973935565
java.io.IOException: Could not obtain block: blk_-4524474748365854543_61342811 file=/hbase/coral_hbase/.META./1028785192/info/1652111193973935565
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1757)  
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1585)  
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1712)
        at java.io.DataInputStream.read(Unknown Source)
        at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:99)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:979)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:936)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1258)
        at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1141)  
        at org.apache.hadoop.hbase.regionserver.Store.seekToScanner(Store.java:1134)   
        at org.apache.hadoop.hbase.regionserver.Store.rowAtOrBeforeFromStoreFile(Store.java:1101)
        at org.apache.hadoop.hbase.regionserver.Store.getRowKeyAtOrBefore(Store.java:1063)
        at org.apache.hadoop.hbase.regionserver.HRegion.getClosestRowBefore(HRegion.java:1036)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1755)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source) 
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
2009-09-25 00:46:26,224 DEBUG org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of info; new storefile is hdfs://coral-dfs.cluster.powerset.com:10000/hbase/coral_hbase/.META./1028785192/info/5419371614376079003; store size is 746.2k
{code}

It was complaining for a long while... then got 'fixed' by the major compaction.

[jira] Updated: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1868:
-------------------------

    Fix Version/s: 0.20.1

[jira] Updated: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1868:
-------------------------

    Attachment: 1868.patch

An ugly patch that notices the condition where the number of regions to assign is <= the number needed to reach balance. When it sees this condition, it just hands out a single region and returns.
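A minimal Java sketch of the decision described above, for illustration only. The class and method names are hypothetical, and the final branch (giving the server enough regions to reach balance when there is a surplus) is an assumed placeholder for the unchanged normal path, not code from 1868.patch.

```java
/**
 * Hypothetical sketch of the compromise described for 1868.patch: when the
 * regions to assign are <= the regions needed to reach balance, hand out a
 * single region and return. Names and the normal-path branch are assumptions.
 */
final class Hbase1868PatchSketch {

  static int regionsToHandOut(int nRegionsToAssign, int nRegions) {
    if (nRegionsToAssign <= 0) {
      return 0; // nothing is waiting to be assigned
    }
    if (nRegionsToAssign <= nRegions) {
      // The degenerate case the patch notices: give out one region so the
      // master keeps making progress instead of looping on the same message.
      return 1;
    }
    // Placeholder for the normal path (assumption): enough to reach balance.
    return nRegions;
  }

  public static void main(String[] args) {
    System.out.println(regionsToHandOut(2, 4));  // prints 1 (values from the debug log)
    System.out.println(regionsToHandOut(10, 4)); // prints 4
  }
}
```

Handing out one region per pass is deliberately conservative: it breaks the loop without trying to reason about how many regions the server "should" get, which matches the stated goal of a small bug-fix-release change.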

This code is beyond my powers of comprehension. Normally, when I see this kind of stuff, I rip it out and start over with test code to verify its operation. I don't want to do that in a 0.20.1 bug-fix context, hence this compromise patch.

Trying to test that it actually does the right thing before committing.

[jira] Commented: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759720#action_12759720 ] 

stack commented on HBASE-1868:
------------------------------

So, just before the first instance of the above IOE, I see an instance of the NPE from HBASE-1809. After that NPE (thrown because gets were running without a read lock), I see a few 'java.io.IOException: Stream closed' errors. I wonder which file these were going against: the old file that was throwing the NPE, or the newly opened compaction file, 1652111193973935565 (the file made by the compaction around the time of the NPE)? These two strikes may have been made against the 1652111193973935565 file. Without HDFS-127 in place, they could have hosed the DFSClient for this file, which may be why this file went bad. Looking...

So, reads from .META. were failing while this issue was in place.
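The "strikes" reasoning can be illustrated with a toy model. This is a hypothetical Java sketch, not DFSClient code: it assumes, per our reading of HDFS-127, that an open input stream keeps a block-acquire failure counter that is never reset without the fix, so earlier transient errors use up the budget and a later, otherwise recoverable hiccup ends in "Could not obtain block". The class name, `maxFailures`, and the `resetOnEachRead` switch are illustrative.

```java
/**
 * Toy model of an input stream with a cumulative failure budget.
 * Hypothetical: illustrates the HDFS-127 failure mode, not real DFSClient code.
 */
final class StrikeCounterSketch {

  private final int maxFailures;
  private final boolean resetOnEachRead; // true models a stream with the fix
  private int failures = 0;

  StrikeCounterSketch(int maxFailures, boolean resetOnEachRead) {
    this.maxFailures = maxFailures;
    this.resetOnEachRead = resetOnEachRead;
  }

  /**
   * Simulates one read() call in which the block fetch fails
   * transientErrors times before succeeding. Returns false if the stream
   * gives up ("Could not obtain block") before the fetch succeeds.
   */
  boolean read(int transientErrors) {
    if (resetOnEachRead) {
      failures = 0; // each read starts with a fresh budget
    }
    for (int attempt = 0; attempt < transientErrors; attempt++) {
      failures++;
      if (failures >= maxFailures) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    StrikeCounterSketch unpatched = new StrikeCounterSketch(3, false);
    StrikeCounterSketch patched = new StrikeCounterSketch(3, true);
    // Two earlier hiccups ("strikes"), then one more transient error.
    for (int i = 0; i < 3; i++) {
      System.out.println("read " + i + ": unpatched=" + unpatched.read(1)
          + ", patched=" + patched.read(1));
    }
  }
}
```

With a budget of 3, the unpatched stream fails on the third single-error read while the reset variant keeps succeeding, which is the kind of "hosed for this file" behavior suspected above.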

[jira] Updated: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1868:
-------------------------

    Status: Patch Available  (was: Open)

[jira] Commented: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760829#action_12760829 ] 

stack commented on HBASE-1868:
------------------------------

Just committing it. Won't close it until we're sure we don't see this issue anymore.

[jira] Commented: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759722#action_12759722 ] 

stack commented on HBASE-1868:
------------------------------

It looks like the 'will be unloaded' messages started when a new RS came online:

{code}
2009-09-22 22:37:23,466 INFO org.apache.hadoop.hbase.master.ServerManager: Received start message from: XX.XX.XX.XX,20020,1253659043416
2009-09-22 22:37:23,468 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode /hbase/rs/1253659043416 with data 208.76.44.55:20020
2009-09-22 22:37:23,547 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server XX.XX.XX-2.u.powerset.com,20020,1253219585276 will be unloaded for balance. Server load: 5 avg: 3.8181818181818183, regions can be moved: 2
2009-09-22 22:37:23,640 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server XX.XX.XX-17.u.powerset.com,20020,1253219582748 will be unloaded for balance. Server load: 5 avg: 3.8181818181818183, regions can be moved: 2
2009-09-22 22:37:23,843 DEBUG org.apache.hadoop.hbase.master.RegionManager: Server XX.XX.XX-10.u.powerset.com,20020,1253219585277 will be unloaded for balance. Server load: 5 avg: 3.8181818181818183, regions can be moved: 2
...
{code}
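The logged average can be sanity-checked with a little arithmetic. Assuming (this is our assumption, not something the log confirms) that the average is simply total regions divided by live region servers, 3.8181818181818183 is exactly what Java prints for 42 regions over 11 servers. Those counts are one consistent guess, not observed values. Under that guess, the same regions over 10 servers average 4.2, so an empty RS joining lowers the average; a shift like that could push servers at 5 regions past whatever overload threshold the balancer applies.

```java
/**
 * Checks that the logged average is consistent with a simple
 * total-regions / live-servers computation. Counts are hypothetical.
 */
final class AverageLoadSketch {

  static double averageLoad(int totalRegions, int servers) {
    return (double) totalRegions / servers;
  }

  public static void main(String[] args) {
    // Hypothetical: 42 regions on 10 servers, then an empty 11th RS joins.
    System.out.println(averageLoad(42, 10)); // prints 4.2
    System.out.println(averageLoad(42, 11)); // prints 3.8181818181818183
  }
}
```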

[jira] Updated: (HBASE-1868) Spew about rebalancing but none done....

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1868:
-------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committing. Reopen, or open a new issue, if we see this again.