Posted to dev@hbase.apache.org by Zhoushuaifeng <zh...@huawei.com> on 2011/04/19 14:10:01 UTC

Disable table causing regionserver shutdown

Hi,
I ran the disable table command, and after a while two RegionServers shut down.
In the log I see that a compaction was running on one region while that region was being closed.
I checked the code: when closing a region, it first sets writestate.writesEnabled to false. If a compaction is still running, this can interrupt it and make it throw InterruptedIOException; when HRegion catches this exception, the compaction fails. Is this the cause of the RegionServer going down? If so, this may be a problem.
    // Excerpt 1: check made during the store compaction
    if (!this.region.areWritesEnabled()) {
      writer.close();
      fs.delete(writer.getPath(), false);
      throw new InterruptedIOException(
          "Aborting compaction of store " + this +
          " in region " + this.region +
          " because user requested stop.");
    }

    // Excerpt 2: HRegion, where the abort is caught and logged
    try {
      // ... compaction call elided ...
    } catch (InterruptedIOException iioe) {
      LOG.info("compaction interrupted by user: ", iioe);
    } finally {
      long now = EnvironmentEdgeManager.currentTimeMillis();
      LOG.info(((completed) ? "completed" : "aborted")
          + " compaction on region " + this
          + " after " + StringUtils.formatTimeDiff(now, startTime));
    }
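The abort path described above is a cooperative-cancellation pattern: closing the region clears a flag, the compaction checks that flag and bails out with InterruptedIOException, and the caller logs the exception at INFO level instead of rethrowing it. The sketch below models only that pattern. RegionModel, CompactionModel and the closeAfter parameter are hypothetical names (closeAfter simulates a concurrent close deterministically); this is not HBase's Store or HRegion code.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the flag-based compaction abort; not HBase classes.
class RegionModel {
  // Stands in for writestate.writesEnabled; closing the region clears it.
  private final AtomicBoolean writesEnabled = new AtomicBoolean(true);

  boolean areWritesEnabled() {
    return writesEnabled.get();
  }

  void close() {
    writesEnabled.set(false);
  }
}

class CompactionModel {
  private final RegionModel region;

  CompactionModel(RegionModel region) {
    this.region = region;
  }

  // Copies `cells` entries, checking the flag before each write.
  // `closeAfter` (assumption) simulates a concurrent close at that index; -1 means never.
  int compactStore(int cells, int closeAfter) throws InterruptedIOException {
    int written = 0;
    for (int i = 0; i < cells; i++) {
      if (i == closeAfter) {
        region.close();
      }
      if (!region.areWritesEnabled()) {
        throw new InterruptedIOException("Aborting compaction because user requested stop.");
      }
      written++;
    }
    return written;
  }

  // Same shape as the HRegion excerpt: the abort is caught and reported, not rethrown.
  String compact(int cells, int closeAfter) {
    boolean completed = false;
    try {
      compactStore(cells, closeAfter);
      completed = true;
    } catch (InterruptedIOException iioe) {
      // The quoted code logs this with LOG.info; the model just falls through.
    }
    return completed ? "completed" : "aborted";
  }
}
```

In this model compact(10, -1) returns "completed", while compact(10, 3) returns "aborted" and leaves the region's flag cleared. Because the exception is swallowed after logging, an aborted compaction on its own does not propagate an error, which is consistent with the INFO-level "compaction interrupted by user" entry in the logs.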

Some logs:

2011-04-18 14:00:56,468 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for ufdr,1000286138199982#0129000,1302767272113.80928bc54c94a029b76098ce04c22572. because Region has too many store files; priority=6, compaction queue size=0
2011-04-18 14:01:06,569 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of ufdr,1000286138199982#0129000,1302767272113.80928bc54c94a029b76098ce04c22572.
2011-04-18 14:01:06,569 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing ufdr,1000286138199982#0129000,1302767272113.80928bc54c94a029b76098ce04c22572.: disabling compactions & flushes
2011-04-18 14:01:06,569 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: waiting for compaction to complete for region ufdr,1000286138199982#0129000,1302767272113.80928bc54c94a029b76098ce04c22572.
2011-04-18 14:01:06,714 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction interrupted by user:
java.io.InterruptedIOException: Aborting compaction of store value in region ufdr,1000286138199982#0129000,1302767272113.80928bc54c94a029b76098ce04c22572. because user requested stop.
2011-04-18 14:01:06,714 INFO org.apache.hadoop.hbase.regionserver.HRegion: aborted compaction on region ufdr,1000286138199982#0129000,1302767272113.80928bc54c94a029b76098ce04c22572. after 10sec
2011-04-18 14:01:06,714 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver60020.compactor exiting
2011-04-18 14:01:07,532 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing leases
2011-04-18 14:01:07,532 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed leases
2011-04-18 14:01:07,600 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting


Zhou Shuaifeng(Frank)


-------------------------------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from HUAWEI, which
is intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!


Re: Disable table causing regionserver shutdown

Posted by Jean-Daniel Cryans <jd...@gmail.com>.
I'm not sure which version you are on, but
https://issues.apache.org/jira/browse/HBASE-3741 might be the cause.

J-D

On Tue, Apr 19, 2011 at 7:30 PM, Zhoushuaifeng <zh...@huawei.com> wrote:
> Hi J-D,
> You are right, there is another reason causing the server shutdown:
>
> org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/c7d346485bb00d6f905985c5d6b47b5e
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>        at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
>        at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:708)
>        at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:698)
>        at org.apache.hadoop.hbase.zookeeper.ZKAssign.retransitionNodeOpening(ZKAssign.java:585)
>        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tickleOpening(OpenRegionHandler.java:322)
>        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:97)
>        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> 2011-04-18 14:01:03,444 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=361, stores=722, storefiles=294, storefileIndexSize=115, memstoreSize=0, compactionQueueSize=3, flushQueueSize=0, usedHeap=5881, maxHeap=8165, blockCacheSize=1385421952, blockCacheFree=326902656, blockCacheCount=20786, blockCacheHitCount=283716555, blockCacheMissCount=68379692, blockCacheEvictedCount=9159612, blockCacheHitRatio=80, blockCacheHitCachingRatio=96
> 2011-04-18 14:01:03,444 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exception refreshing OPENING; region=c7d346485bb00d6f905985c5d6b47b5e, context=post_region_open
>
> Zhou Shuaifeng(Frank)
>

Re: Disable table causing regionserver shutdown

Posted by Zhoushuaifeng <zh...@huawei.com>.
Hi J-D,
You are right, there is another reason causing the server shutdown:

org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/c7d346485bb00d6f905985c5d6b47b5e
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:106)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1038)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:708)
	at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:698)
	at org.apache.hadoop.hbase.zookeeper.ZKAssign.retransitionNodeOpening(ZKAssign.java:585)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.tickleOpening(OpenRegionHandler.java:322)
	at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:97)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
2011-04-18 14:01:03,444 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=361, stores=722, storefiles=294, storefileIndexSize=115, memstoreSize=0, compactionQueueSize=3, flushQueueSize=0, usedHeap=5881, maxHeap=8165, blockCacheSize=1385421952, blockCacheFree=326902656, blockCacheCount=20786, blockCacheHitCount=283716555, blockCacheMissCount=68379692, blockCacheEvictedCount=9159612, blockCacheHitRatio=80, blockCacheHitCachingRatio=96
2011-04-18 14:01:03,444 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Exception refreshing OPENING; region=c7d346485bb00d6f905985c5d6b47b5e, context=post_region_open
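The BadVersionException comes from ZooKeeper's conditional update: setData is given an expected version and fails if the znode has been modified since that version was read (an expected version of -1 matches any version). The stack trace shows retransitionNodeOpening going through that versioned setData. The sketch below is a hypothetical in-memory model of the compare-and-set rule; VersionedNode and its BadVersionException are stand-ins, not the ZooKeeper client API, and which writer changed /hbase/unassigned/c7d346485bb00d6f905985c5d6b47b5e in this incident is not established in the thread.

```java
// Hypothetical in-memory model of ZooKeeper's versioned setData; not the ZooKeeper client API.
class BadVersionException extends Exception {
  BadVersionException(String message) {
    super(message);
  }
}

class VersionedNode {
  private String data;
  private int version = 0; // like a znode's data version, starts at 0

  VersionedNode(String data) {
    this.data = data;
  }

  synchronized int getVersion() {
    return version;
  }

  synchronized String getData() {
    return data;
  }

  // Compare-and-set: the write succeeds only if expectedVersion matches the
  // current version (-1 matches any version). Returns the new version.
  synchronized int setData(String newData, int expectedVersion) throws BadVersionException {
    if (expectedVersion != -1 && expectedVersion != version) {
      throw new BadVersionException(
          "expected version " + expectedVersion + " but node is at " + version);
    }
    data = newData;
    version++;
    return version;
  }
}
```

In this model, a handler that read version 0 and successfully retransitioned to version 1 is rejected on its next attempt if another writer has moved the node to version 2 in between; the handler's stale version is what triggers the exception.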

Zhou Shuaifeng(Frank)


-----Original Message-----
From: Zhoushuaifeng [mailto:zhoushuaifeng@huawei.com] 
Sent: April 20, 2011 8:56
To: Jean-Daniel Cryans; dev@hbase.apache.org
Cc: Yanlijun
Subject: RE: Disable table causing regionserver shutdown

Nobody asked the region server to shut down. I only disabled tables, but the region server shut down.
I have 8 region servers; 2 of them shut down for the same reason.
Please check again whether there is a problem.

Zhou Shuaifeng(Frank)
-------------------------------------------------------------------------------------------------------------------------------------

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: April 20, 2011 2:45
To: dev@hbase.apache.org
Cc: Zhoushuaifeng; Yanlijun
Subject: Re: Disable table causing regionserver shutdown

That's the expected behavior when the region server is asked to
shut down and there's a compaction running; take a closer look at the
log before those lines to find the reason.

J-D


RE: Disable table causing regionserver shutdown

Posted by Zhoushuaifeng <zh...@huawei.com>.
Nobody asked the region server to shut down. I only disabled tables, but the region server shut down.
I have 8 region servers; 2 of them shut down for the same reason.
Please check again whether there is a problem.

Zhou Shuaifeng(Frank)

Re: Disable table causing regionserver shutdown

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's the expected behavior when the region server is asked to
shut down and there's a compaction running; take a closer look at the
log before those lines to find the reason.
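Looking at the log before those lines amounts to putting the events in timestamp order. Assuming both of Zhou's excerpts come from the same region server (the thread does not say so explicitly), the STOPPED line at 14:01:03,444 precedes the region close at 14:01:06,569 and the aborted compaction at 14:01:06,714, which fits the reading that the abort was a consequence of the shutdown rather than its cause. LogTimeline is a hypothetical helper written for this illustration; it only assumes the "yyyy-MM-dd HH:mm:ss,SSS" prefix visible in the quoted logs.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: orders log lines by their leading timestamp.
class LogTimeline {
  private static final DateTimeFormatter TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  // The timestamp is the first 23 characters, e.g. "2011-04-18 14:01:03,444".
  static LocalDateTime timestampOf(String line) {
    return LocalDateTime.parse(line.substring(0, 23), TS);
  }

  // Returns a sorted copy; List.sort is stable, so lines with identical
  // timestamps keep their original relative order.
  static List<String> ordered(List<String> lines) {
    List<String> copy = new ArrayList<>(lines);
    copy.sort(Comparator.comparing(LogTimeline::timestampOf));
    return copy;
  }
}
```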

J-D
