You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hbase.apache.org by ji...@apache.org on 2008/10/11 21:04:10 UTC

svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt src/java/org/apache/hadoop/hbase/RegionHistorian.java src/java/org/apache/hadoop/hbase/master/RegionManager.java

Author: jimk
Date: Sat Oct 11 12:04:09 2008
New Revision: 703711

URL: http://svn.apache.org/viewvc?rev=703711&view=rev
Log:
HBASE-918   Region balancing during startup makes cluster unstable

Modified:
    hadoop/hbase/trunk/CHANGES.txt
    hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
    hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/master/RegionManager.java

Modified: hadoop/hbase/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/CHANGES.txt?rev=703711&r1=703710&r2=703711&view=diff
==============================================================================
--- hadoop/hbase/trunk/CHANGES.txt (original)
+++ hadoop/hbase/trunk/CHANGES.txt Sat Oct 11 12:04:09 2008
@@ -20,6 +20,7 @@
    HBASE-837   Add unit tests for ThriftServer.HBaseHandler (Izaak Rubin via Stack)
    HBASE-913   Classes using log4j directly
    HBASE-914   MSG_REPORT_CLOSE has a byte array for a message
+   HBASE-918   Region balancing during startup makes cluster unstable
 
   IMPROVEMENTS
    HBASE-901   Add a limit to key length, check key and value length on client side

Modified: hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java?rev=703711&r1=703710&r2=703711&view=diff
==============================================================================
--- hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java (original)
+++ hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java Sat Oct 11 12:04:09 2008
@@ -86,7 +86,7 @@
    * Get the RegionHistorian Singleton instance.
    * @return The region historian
    */
-  public static RegionHistorian getInstance() {
+  public synchronized static RegionHistorian getInstance() {
     if (historian == null) {
       historian = new RegionHistorian();
     }

Modified: hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/master/RegionManager.java
URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/master/RegionManager.java?rev=703711&r1=703710&r2=703711&view=diff
==============================================================================
--- hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/master/RegionManager.java (original)
+++ hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/master/RegionManager.java Sat Oct 11 12:04:09 2008
@@ -178,16 +178,19 @@
       // worked on elsewhere.
       Set<HRegionInfo> regionsToAssign = regionsAwaitingAssignment();
       if (regionsToAssign.size() == 0) {
-        // There are no regions waiting to be assigned. This is an opportunity
-        // for us to check if this server is overloaded. 
-        double avgLoad = master.serverManager.getAverageLoad();
-        if (avgLoad > 2.0 && thisServersLoad.getNumberOfRegions() > avgLoad) {
-          if (LOG.isDebugEnabled()) {
-            LOG.debug("Server " + serverName + " is overloaded. Server load: " + 
-              thisServersLoad.getNumberOfRegions() + " avg: " + avgLoad);
+        // There are no regions waiting to be assigned.
+        if (allRegionsAssigned()) {
+          // We only do load balancing once all regions are assigned.
+          // This prevents churn while the cluster is starting up.
+          double avgLoad = master.serverManager.getAverageLoad();
+          if (avgLoad > 2.0 && thisServersLoad.getNumberOfRegions() > avgLoad) {
+            if (LOG.isDebugEnabled()) {
+              LOG.debug("Server " + serverName + " is overloaded. Server load: " + 
+                  thisServersLoad.getNumberOfRegions() + " avg: " + avgLoad);
+            }
+            unassignSomeRegions(thisServersLoad, avgLoad, mostLoadedRegions, 
+                returnMsgs);
           }
-          unassignSomeRegions(thisServersLoad, avgLoad, mostLoadedRegions, 
-            returnMsgs);
         }
       } else {
         // if there's only one server, just give it all the regions
@@ -843,6 +846,15 @@
   }
   
   /** 
+   * @return true if the initial meta scan is complete and there are no
+   * unassigned or pending regions
+   */
+  public boolean allRegionsAssigned() {
+    return isInitialMetaScanComplete() && unassignedRegions.size() == 0 &&
+      pendingRegions.size() == 0;
+  }
+  
+  /** 
    * Get the root region location.
    * @return HServerAddress describing root region server.
    */



RE: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt src/java/org/apache/hadoop/hbase/RegionHistorian.java src/java/org/apache/hadoop/hbase/master/RegionManager.java

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
919 was not fixed in the patch for 918 ultimately. 918 only
addresses not doing load balancing until all regions are
assigned.

919 is more related to "safe mode" in which the master does
not return a root region address until all regions are
assigned. This behavior prevented the cluster from starting
since the servers make use of HTable rather than the
HRegionInterface API.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: stack [mailto:stack@duboce.net]
> Sent: Sunday, October 12, 2008 5:33 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt
> src/java/org/apache/hadoop/hbase/RegionHistorian.java
> src/java/org/apache/hadoop/hbase/master/RegionManager.java
>
> I was just interested in what in the patch addressed HBASE-919? (Am I
> wrong thinking that 919 got in the way of the fixing of this issue?)
> Thanks,
> St.Ack
>
> Jim Kellerman (POWERSET) wrote:
> > No, the change to RegionHistorian was only to make the instantiation
> > of the singleton thread safe, which it was not before.
> >
> > ---
> > Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
> >
> >
> >
> >> -----Original Message-----
> >> From: stack [mailto:stack@duboce.net]
> >> Sent: Saturday, October 11, 2008 4:02 PM
> >> To: hbase-dev@hadoop.apache.org
> >> Subject: Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt
> >> src/java/org/apache/hadoop/hbase/RegionHistorian.java
> >> src/java/org/apache/hadoop/hbase/master/RegionManager.java
> >>
> >> jimk@apache.org wrote:
> >>
> >>> +   HBASE-918   Region balancing during startup makes cluster unstable
> >>>  ...
> >>> URL:
> >>>
> >>
> http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop
> >> /hbase/RegionHistorian.java?rev=703711&r1=703710&r2=703711&view=diff
> >>
> >>
> ==========================================================================
> >> ====
> >>
> >>> ---
> >>>
> >>
> hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
> >> (original)
> >>
> >>> +++
> >>>
> >>
> hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
> >> Sat Oct 11 12:04:09 2008
> >>
> >>> @@ -86,7 +86,7 @@
> >>>     * Get the RegionHistorian Singleton instance.
> >>>     * @return The region historian
> >>>     */
> >>> -  public static RegionHistorian getInstance() {
> >>> +  public synchronized static RegionHistorian getInstance() {
> >>>
> >>>
> >> Was it adding the synchronized here that made it so regionhistorian
> >> could continue to use HTable?  i.e. fixes HBASE-919?
> >> St.Ack
> >>
> >>
> >
> >
>


Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt src/java/org/apache/hadoop/hbase/RegionHistorian.java src/java/org/apache/hadoop/hbase/master/RegionManager.java

Posted by stack <st...@duboce.net>.
I was just interested in what in the patch addressed HBASE-919? (Am I 
wrong thinking that 919 got in the way of the fixing of this issue?)
Thanks,
St.Ack

Jim Kellerman (POWERSET) wrote:
> No, the change to RegionHistorian was only to make the instantiation
> of the singleton thread safe, which it was not before.
>
> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>
>
>   
>> -----Original Message-----
>> From: stack [mailto:stack@duboce.net]
>> Sent: Saturday, October 11, 2008 4:02 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt
>> src/java/org/apache/hadoop/hbase/RegionHistorian.java
>> src/java/org/apache/hadoop/hbase/master/RegionManager.java
>>
>> jimk@apache.org wrote:
>>     
>>> +   HBASE-918   Region balancing during startup makes cluster unstable
>>>  ...
>>> URL:
>>>       
>> http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop
>> /hbase/RegionHistorian.java?rev=703711&r1=703710&r2=703711&view=diff
>>     
>> ==========================================================================
>> ====
>>     
>>> ---
>>>       
>> hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
>> (original)
>>     
>>> +++
>>>       
>> hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
>> Sat Oct 11 12:04:09 2008
>>     
>>> @@ -86,7 +86,7 @@
>>>     * Get the RegionHistorian Singleton instance.
>>>     * @return The region historian
>>>     */
>>> -  public static RegionHistorian getInstance() {
>>> +  public synchronized static RegionHistorian getInstance() {
>>>
>>>       
>> Was it adding the synchronized here that made it so regionhistorian
>> could continue to use HTable?  i.e. fixes HBASE-919?
>> St.Ack
>>
>>     
>
>   


Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt src/java/org/apache/hadoop/hbase/RegionHistorian.java src/java/org/apache/hadoop/hbase/master/RegionManager.java

Posted by stack <st...@duboce.net>.
Doğacan Güney wrote:
> On Mon, Oct 13, 2008 at 12:43 AM, Jim Kellerman (POWERSET)
> <Ji...@microsoft.com> wrote:
>   
>> No, the change to RegionHistorian was only to make the instantiation
>> of the singleton thread safe, which it was not before.
>>
>>     
>
> IIRC what I read about java memory model, you do not need synchronized
> there if you make a few changes:
>
>   /** Singleton reference */
>   private static RegionHistorian historian = new RegionHistorian();
>
> I think if you do this, Java guarantees that historian won't be
> initialized until it is accessed
> (because static members are only initialized when a class is loaded)
> and it will be thread-safe.
> So you can get rid of "synchronized" in getInstance.

Agreed.  The above is more the idiom but for some reason, the 
RegionHistorian needs to be lazily instantiated (I looked into this 
before but don't remember why -- smile).  I tried changing it yesterday 
and ran unit tests.  They failed.   It seems the lazy-instantiation is 
still necessary.

St.Ack

Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt src/java/org/apache/hadoop/hbase/RegionHistorian.java src/java/org/apache/hadoop/hbase/master/RegionManager.java

Posted by Doğacan Güney <do...@gmail.com>.
On Mon, Oct 13, 2008 at 12:43 AM, Jim Kellerman (POWERSET)
<Ji...@microsoft.com> wrote:
> No, the change to RegionHistorian was only to make the instantiation
> of the singleton thread safe, which it was not before.
>

IIRC what I read about java memory model, you do not need synchronized
there if you make a few changes:

  /** Singleton reference */
  private static RegionHistorian historian = new RegionHistorian();

I think if you do this, Java guarantees that historian won't be
initialized until it is accessed
(because static members are only initialized when a class is loaded)
and it will be thread-safe.
So you can get rid of "synchronized" in getInstance.


> ---
> Jim Kellerman, Powerset (Live Search, Microsoft Corporation)
>
>
>> -----Original Message-----
>> From: stack [mailto:stack@duboce.net]
>> Sent: Saturday, October 11, 2008 4:02 PM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt
>> src/java/org/apache/hadoop/hbase/RegionHistorian.java
>> src/java/org/apache/hadoop/hbase/master/RegionManager.java
>>
>> jimk@apache.org wrote:
>> > +   HBASE-918   Region balancing during startup makes cluster unstable
>> >  ...
>> > URL:
>> http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop
>> /hbase/RegionHistorian.java?rev=703711&r1=703710&r2=703711&view=diff
>> >
>> ==========================================================================
>> ====
>> > ---
>> hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
>> (original)
>> > +++
>> hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
>> Sat Oct 11 12:04:09 2008
>> > @@ -86,7 +86,7 @@
>> >     * Get the RegionHistorian Singleton instance.
>> >     * @return The region historian
>> >     */
>> > -  public static RegionHistorian getInstance() {
>> > +  public synchronized static RegionHistorian getInstance() {
>> >
>>
>> Was it adding the synchronized here that made it so regionhistorian
>> could continue to use HTable?  i.e. fixes HBASE-919?
>> St.Ack
>>
>
>



-- 
Doğacan Güney

RE: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt src/java/org/apache/hadoop/hbase/RegionHistorian.java src/java/org/apache/hadoop/hbase/master/RegionManager.java

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
No, the change to RegionHistorian was only to make the instantiation
of the singleton thread safe, which it was not before.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)


> -----Original Message-----
> From: stack [mailto:stack@duboce.net]
> Sent: Saturday, October 11, 2008 4:02 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt
> src/java/org/apache/hadoop/hbase/RegionHistorian.java
> src/java/org/apache/hadoop/hbase/master/RegionManager.java
>
> jimk@apache.org wrote:
> > +   HBASE-918   Region balancing during startup makes cluster unstable
> >  ...
> > URL:
> http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop
> /hbase/RegionHistorian.java?rev=703711&r1=703710&r2=703711&view=diff
> >
> ==========================================================================
> ====
> > ---
> hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
> (original)
> > +++
> hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java
> Sat Oct 11 12:04:09 2008
> > @@ -86,7 +86,7 @@
> >     * Get the RegionHistorian Singleton instance.
> >     * @return The region historian
> >     */
> > -  public static RegionHistorian getInstance() {
> > +  public synchronized static RegionHistorian getInstance() {
> >
>
> Was it adding the synchronized here that made it so regionhistorian
> could continue to use HTable?  i.e. fixes HBASE-919?
> St.Ack
>


Re: svn commit: r703711 - in /hadoop/hbase/trunk: CHANGES.txt src/java/org/apache/hadoop/hbase/RegionHistorian.java src/java/org/apache/hadoop/hbase/master/RegionManager.java

Posted by stack <st...@duboce.net>.
jimk@apache.org wrote:
> +   HBASE-918   Region balancing during startup makes cluster unstable
>  ...
> URL: http://svn.apache.org/viewvc/hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java?rev=703711&r1=703710&r2=703711&view=diff
> ==============================================================================
> --- hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java (original)
> +++ hadoop/hbase/trunk/src/java/org/apache/hadoop/hbase/RegionHistorian.java Sat Oct 11 12:04:09 2008
> @@ -86,7 +86,7 @@
>     * Get the RegionHistorian Singleton instance.
>     * @return The region historian
>     */
> -  public static RegionHistorian getInstance() {
> +  public synchronized static RegionHistorian getInstance() {
>   

Was it adding the synchronized here that made it so regionhistorian 
could continue to use HTable?  i.e. fixes HBASE-919?
St.Ack