Posted to user@hbase.apache.org by Weiwei Xiong <xi...@gmail.com> on 2011/04/18 03:01:58 UTC

Region server OME

Hi,


My HBase deployment had been working great, but recently the region servers
keep running out of memory. The symptom is this: the region servers failed
with an OOME, so I restarted them. One of the region servers then used up the
maximum memory I configured for HBase and failed with a memory dump. The
region servers on the other nodes then used up the system memory in turn and
failed with OOME one by one, until all of them were down. This happens every
time I restart the region servers or the whole of HBase.

I am using 0.90.1 on a two-node cluster; each node has 32GB of memory.
Initially I configured HBase with 16GB of memory. After this issue came up
I kept increasing HBase's memory, but it doesn't seem to help.

I have been loading millions of rows into HBase tables over the past few
days. I have stopped all of those loads now, but every time I restart HBase
the region servers still fail with OOME. Has anyone run into this problem
before?

Thanks,
-- Weiwei

Re: Region server OME

Posted by Ted Yu <yu...@gmail.com>.

In 0.90.2, this option is off by default:

    <property>
      <name>hbase.hregion.memstore.mslab.enabled</name>
      <value>false</value>
    </property>

On Mon, Apr 18, 2011 at 4:54 PM, Weiwei Xiong <xi...@gmail.com> wrote:

> I didn't modify this option. So I guess by default it should be true.

Re: A question about Hmaster restarted.

Posted by Jean-Daniel Cryans <jd...@apache.org>.

See HBASE-3744, createTable shouldn't be using the startup bulk assigner.

J-D

On Mon, Apr 18, 2011 at 8:43 PM, Gaojinchao <ga...@huawei.com> wrote:
> I created a table with multiple regions, and the HMaster crashed because
> one region server crashed.
>
> I dug into the code, and I think this may be a bug. Both startup and
> table creation use this code. At startup, a failed assignment should make
> the master shut itself down, but for table creation it should reassign the
> regions instead.

A question about Hmaster restarted.

Posted by Gaojinchao <ga...@huawei.com>.

I created a table with multiple regions, and the HMaster crashed because
one region server crashed.

I dug into the code, and I think this may be a bug. Both startup and
table creation use the code below. At startup, a failed assignment should
make the master shut itself down, but for table creation it should reassign
the regions instead.

    // Excerpt from AssignmentManager.assign(); the enclosing method is omitted.
    try {
      long maxWaitTime = System.currentTimeMillis() +
        this.master.getConfiguration().getLong("hbase.regionserver.rpc.startup.waittime", 60000);
      while (!this.master.isStopped()) {
        try {
          this.serverManager.sendRegionOpen(destination, regions);
          break;
        } catch (org.apache.hadoop.hbase.ipc.ServerNotRunningException e) {
          // This is the one exception to retry.  For all else we should just fail
          // the startup.
          long now = System.currentTimeMillis();
          if (now > maxWaitTime) throw e;
          LOG.debug("Server is not yet up; waiting up to " +
              (maxWaitTime - now) + "ms", e);
          Thread.sleep(1000);
        }
      }
    } catch (Throwable t) {
      // Any other failure aborts the HMaster: right at startup, but not for createTable.
      this.master.abort("Failed assignment of regions to " + destination +
        "; bulk assign FAILED", t);
      return;
    }

HMaster logs:
2011-04-18 20:21:37,144 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
2011-04-18 20:21:47,012 FATAL org.apache.hadoop.hbase.master.HMaster: Failed assignment of regions to serverName=t5,60020,1303129296185, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)

        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy6.openRegions(Unknown Source)
        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:566)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:776)
        at org.apache.hadoop.hbase.master.AssignmentManager$SingleServerBulkAssigner.run(AssignmentManager.java:1310)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2011-04-18 20:21:47,012 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-04-18 20:21:47,012 FATAL org.apache.hadoop.hbase.master.HMaster: Failed assignment of regions to serverName=t5,60020,1303129296185, load=(requests=0, regions=0, usedHeap=0, maxHeap=0)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running yet
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
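In the log above, the exception logged with the FATAL abort is an org.apache.hadoop.ipc.RemoteException whose message names ServerNotRunningException. In Java, a catch clause for ServerNotRunningException does not match such a wrapper unless it is unwrapped first. The following is a minimal, self-contained sketch of a deadline-bounded retry that unwraps before deciding whether to retry. It also lets the caller pick the failure action: abort at startup, or reassign on table creation, as the post suggests. This is not HBase code. RemoteWrapper, ServerNotRunningYet, OpenCall and OnFailure are hypothetical stand-ins for the RPC wrapper, the server-side exception, the open-regions call and the failure policy, and maxWaitMs plays the role of hbase.regionserver.rpc.startup.waittime.

```java
/** Sketch only, not HBase code. All types here are hypothetical stand-ins. */
public class BulkAssignSketch {

    /** Hypothetical stand-in for ServerNotRunningException. */
    static class ServerNotRunningYet extends Exception {
        ServerNotRunningYet(String msg) { super(msg); }
    }

    /** Hypothetical stand-in for an RPC wrapper such as RemoteException. */
    static class RemoteWrapper extends Exception {
        RemoteWrapper(Throwable cause) { super(cause); }
    }

    /** What the caller wants when assignment finally fails. */
    enum OnFailure { ABORT_MASTER, REASSIGN }

    /** Hypothetical stand-in for the "open these regions on this server" RPC. */
    interface OpenCall { void run() throws Exception; }

    /** Returns the wrapped cause for a RemoteWrapper, otherwise t itself. */
    static Throwable unwrap(Throwable t) {
        if (t instanceof RemoteWrapper && t.getCause() != null) {
            return t.getCause();
        }
        return t;
    }

    /**
     * Retries only while the unwrapped failure says the server is not up yet
     * and the deadline has not passed. Returns true on success.
     */
    static boolean openWithRetry(OpenCall call, long maxWaitMs, long sleepMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (true) {
            try {
                call.run();
                return true;
            } catch (Exception e) {
                if (!(unwrap(e) instanceof ServerNotRunningYet)) {
                    return false; // any other failure is not retried
                }
                if (System.currentTimeMillis() > deadline) {
                    return false; // server never came up in time
                }
            }
            Thread.sleep(sleepMs);
        }
    }

    /** Applies the caller's failure policy; returns a description of the outcome. */
    static String assign(OpenCall call, long maxWaitMs, OnFailure policy)
            throws InterruptedException {
        if (openWithRetry(call, maxWaitMs, 10)) {
            return "assigned";
        }
        return policy == OnFailure.ABORT_MASTER ? "abort master" : "reassign";
    }

    public static void main(String[] args) throws InterruptedException {
        int[] attempts = {0};
        // Fails twice with a wrapped "not running yet", then succeeds.
        OpenCall flaky = () -> {
            if (attempts[0]++ < 2) {
                throw new RemoteWrapper(new ServerNotRunningYet("not running yet"));
            }
        };
        System.out.println(assign(flaky, 1000, OnFailure.ABORT_MASTER)); // assigned

        // Fails with something other than "not running yet": never retried.
        OpenCall broken = () -> {
            throw new RemoteWrapper(new IllegalStateException("boom"));
        };
        System.out.println(assign(broken, 1000, OnFailure.REASSIGN));     // reassign
        System.out.println(assign(broken, 1000, OnFailure.ABORT_MASTER)); // abort master
    }
}
```

Passing the policy in, rather than hard-coding the abort, is one way to let startup and table creation share the retry loop while failing differently.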

Re: Region server OME

Posted by Stack <st...@duboce.net>.

So, what's in the regionserver log at that time? My guess is that it's
trying to replay edits from a recovered.edits file and one of them is
causing the OOME. What's in the log just before the OOME on each of the
servers?
St.Ack

On Mon, Apr 18, 2011 at 4:54 PM, Weiwei Xiong <xi...@gmail.com> wrote:
> I didn't modify this option. So I guess by default it should be true.
>
> But I don't understand how this option could be related to the region server
> OOME. I thought this option was useful when the system is under a heavy
> write load, but in my case the region servers still go OOME on their own,
> even when I restart HBase and do nothing.
>
> Thanks,
> -- Weiwei

Re: Region server OME

Posted by Weiwei Xiong <xi...@gmail.com>.

I didn't modify this option. So I guess by default it should be true.

But I don't understand how this option could be related to the region server
OOME. I thought this option was useful when the system is under a heavy
write load, but in my case the region servers still go OOME on their own,
even when I restart HBase and do nothing.

Thanks,
-- Weiwei

On Sun, Apr 17, 2011 at 8:28 PM, Ted Dunning <td...@maprtech.com> wrote:

> Did you turn on the mslab option?

Re: Region server OME

Posted by Ted Dunning <td...@maprtech.com>.

Did you turn on the mslab option?

On Sun, Apr 17, 2011 at 6:01 PM, Weiwei Xiong <xi...@gmail.com> wrote:
> I am using 0.90.1 on a two-node cluster; each node has 32GB of memory.
> Initially I configured HBase with 16GB of memory. After this issue came
> up I kept increasing HBase's memory, but it doesn't seem to help.

Re: Region server OME

Posted by 陈加俊 <cj...@gmail.com>.

Yes, I have this problem too. I would like to know how HBase allocates
memory. Why does it hit an OOME? How can I limit the heap space HBase uses?
Or does HBase not account for its memory usage at all?
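On the question of how the heap is divided, a rough budget may help. The sketch below is a back-of-the-envelope calculation, not HBase code. It assumes hbase.regionserver.global.memstore.upperLimit is 0.4 and hfile.block.cache.size is 0.2; both values are assumptions to check against hbase-default.xml for your version. It applies them to the 16GB heap from the original post. The method names are hypothetical.

```java
/**
 * Back-of-the-envelope split of a region server heap. Illustrative only:
 * the fractions below are assumed defaults, not values read from a cluster.
 */
public class HeapBudgetSketch {
    // Assumed value of hbase.regionserver.global.memstore.upperLimit.
    static final double MEMSTORE_UPPER_LIMIT = 0.4;
    // Assumed value of hfile.block.cache.size.
    static final double BLOCK_CACHE_FRACTION = 0.2;

    /** Heap all memstores may fill before updates are blocked and flushes forced. */
    static double memstoreMb(double heapMb) {
        return heapMb * MEMSTORE_UPPER_LIMIT;
    }

    /** Heap set aside for the HFile block cache. */
    static double blockCacheMb(double heapMb) {
        return heapMb * BLOCK_CACHE_FRACTION;
    }

    /** What remains for everything else the region server keeps on the heap. */
    static double remainderMb(double heapMb) {
        return heapMb - memstoreMb(heapMb) - blockCacheMb(heapMb);
    }

    public static void main(String[] args) {
        double heapMb = 16 * 1024; // the 16GB heap mentioned in the original post
        System.out.printf("memstores <= %.0f MB, block cache = %.0f MB, remainder = %.0f MB%n",
                memstoreMb(heapMb), blockCacheMb(heapMb), remainderMb(heapMb));
    }
}
```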

-- 
Thanks & Best regards
jiajun