You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Mike Drob <md...@apache.org> on 2018/01/03 21:37:17 UTC

Hadoop Metrics stopping and starting

Hi folks,

I've been seeing on one of my testbeds intermittently that HBase will fail
to stop, I suspect that this somehow involves the Hadoop Metrics system
having a thread that isn't getting cleaned up. This is on a fork of
branch-2 testing against hadoop-3 (although not with the RC specifically).
Here's a log excerpt from the shutdown sequence:


2017-12-31 01:44:07,308 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server
bwiskf-2,22101,1514712293690; zookeeper connection closed.
2017-12-31 01:44:07,308 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer:
regionserver/bwiskf-2/x.x.x.x:22101 exiting
2017-12-31 01:44:07,309 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase.regionserver.Leases:
regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closing leases
2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase.regionserver.Leases:
regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closed leases
2017-12-31 01:44:09,290 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase metrics
system...
2017-12-31 01:44:09,290 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
stopped.
2017-12-31 01:44:09,792 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
loaded properties from hadoop-metrics2.properties
2017-12-31 01:44:09,798 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric
snapshot period at 10 second(s).
2017-12-31 01:44:09,798 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
started
2017-12-31 01:45:21,905 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase metrics
system...
2017-12-31 01:45:21,906 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
stopped.
2017-12-31 01:45:22,408 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
loaded properties from hadoop-metrics2.properties
2017-12-31 01:45:22,409 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric
snapshot period at 10 second(s).
2017-12-31 01:45:22,409 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
started
2017-12-31 01:47:12,137 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting;
hbase.shutdown.hook=true;
fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@336880df
2017-12-31 01:47:12,138 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown
hook thread.
2017-12-31 01:47:12,139 INFO
org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.

The first shutdown was a graceful shutdown request that started around
1:43, and then 1:47 was a shutdown via process kill request. The metrics
system looks like it starts up again every time it is stopped. I'm digging
through the internals of HRegionServer trying to figure out where we
interface with it but not having much luck finding the cleanup steps.

Mike

Re: Hadoop Metrics stopping and starting

Posted by Mike Drob <md...@apache.org>.
It happens about 10% of the time for me. I'm going to try to grab a jstack
next time I see it happen and try to note what is preventing the graceful
shutdown.

On Fri, Jan 5, 2018 at 12:18 PM, Ted Yu <yu...@gmail.com> wrote:

> Mike:
> Do you still see the hang from region server ?
>
> I recently successfully shutdown a 5 node cluster running on top of hadoop
> 3 (after completing LoadTestTool) based on this commit:
>
> HBASE-19667 Get rid of MasterEnvironment#supportGroupCPs
>
> FYI
>
> On Wed, Jan 3, 2018 at 3:23 PM, Sergey Soldatov <se...@gmail.com>
> wrote:
>
> > It may be something else that prevents HBase from the proper shutdown.
> > Metric system is automatically stopped and started by JmxCacheBuster and
> > may be irrelevant to your problem.
> >
> > Thanks,
> > Sergey
> >
> > On Wed, Jan 3, 2018 at 1:37 PM, Mike Drob <md...@apache.org> wrote:
> >
> > > Hi folks,
> > >
> > > I've been seeing on one of my testbeds intermittently that HBase will
> > fail
> > > to stop, I suspect that this somehow involves the Hadoop Metrics system
> > > having a thread that isn't getting cleaned up. This is on a fork of
> > > branch-2 testing against hadoop-3 (although not with the RC
> > specifically).
> > > Here's a log excerpt from the shutdown sequence:
> > >
> > >
> > > 2017-12-31 01:44:07,308 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server
> > > bwiskf-2,22101,1514712293690; zookeeper connection closed.
> > > 2017-12-31 01:44:07,308 INFO
> > > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > regionserver/bwiskf-2/x.x.x.x:22101 exiting
> > > 2017-12-31 01:44:07,309 INFO org.apache.zookeeper.ClientCnxn:
> > EventThread
> > > shut down
> > > 2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase.
> > regionserver.Leases:
> > > regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closing leases
> > > 2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase.
> > regionserver.Leases:
> > > regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closed leases
> > > 2017-12-31 01:44:09,290 INFO
> > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase
> > metrics
> > > system...
> > > 2017-12-31 01:44:09,290 INFO
> > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics
> system
> > > stopped.
> > > 2017-12-31 01:44:09,792 INFO org.apache.hadoop.metrics2.
> > > impl.MetricsConfig:
> > > loaded properties from hadoop-metrics2.properties
> > > 2017-12-31 01:44:09,798 INFO
> > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric
> > > snapshot period at 10 second(s).
> > > 2017-12-31 01:44:09,798 INFO
> > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics
> system
> > > started
> > > 2017-12-31 01:45:21,905 INFO
> > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase
> > metrics
> > > system...
> > > 2017-12-31 01:45:21,906 INFO
> > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics
> system
> > > stopped.
> > > 2017-12-31 01:45:22,408 INFO org.apache.hadoop.metrics2.
> > > impl.MetricsConfig:
> > > loaded properties from hadoop-metrics2.properties
> > > 2017-12-31 01:45:22,409 INFO
> > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric
> > > snapshot period at 10 second(s).
> > > 2017-12-31 01:45:22,409 INFO
> > > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics
> system
> > > started
> > > 2017-12-31 01:47:12,137 INFO
> > > org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
> > starting;
> > > hbase.shutdown.hook=true;
> > > fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$
> > > ClientFinalizer@336880df
> > > 2017-12-31 01:47:12,138 INFO
> > > org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs
> shutdown
> > > hook thread.
> > > 2017-12-31 01:47:12,139 INFO
> > > org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
> > finished.
> > >
> > > The first shutdown was a graceful shutdown request that started around
> > > 1:43, and then 1:47 was a shutdown via process kill request. The
> metrics
> > > system looks like it starts up again every time it is stopped. I'm
> > digging
> > > through the internals of HRegionServer trying to figure out where we
> > > interface with it but not having much luck finding the cleanup steps.
> > >
> > > Mike
> > >
> >
>

Re: Hadoop Metrics stopping and starting

Posted by Ted Yu <yu...@gmail.com>.
Mike:
Do you still see the hang from region server ?

I recently successfully shutdown a 5 node cluster running on top of hadoop
3 (after completing LoadTestTool) based on this commit:

HBASE-19667 Get rid of MasterEnvironment#supportGroupCPs

FYI

On Wed, Jan 3, 2018 at 3:23 PM, Sergey Soldatov <se...@gmail.com>
wrote:

> It may be something else that prevents HBase from the proper shutdown.
> Metric system is automatically stopped and started by JmxCacheBuster and
> may be irrelevant to your problem.
>
> Thanks,
> Sergey
>
> On Wed, Jan 3, 2018 at 1:37 PM, Mike Drob <md...@apache.org> wrote:
>
> > Hi folks,
> >
> > I've been seeing on one of my testbeds intermittently that HBase will
> fail
> > to stop, I suspect that this somehow involves the Hadoop Metrics system
> > having a thread that isn't getting cleaned up. This is on a fork of
> > branch-2 testing against hadoop-3 (although not with the RC
> specifically).
> > Here's a log excerpt from the shutdown sequence:
> >
> >
> > 2017-12-31 01:44:07,308 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server
> > bwiskf-2,22101,1514712293690; zookeeper connection closed.
> > 2017-12-31 01:44:07,308 INFO
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > regionserver/bwiskf-2/x.x.x.x:22101 exiting
> > 2017-12-31 01:44:07,309 INFO org.apache.zookeeper.ClientCnxn:
> EventThread
> > shut down
> > 2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase.
> regionserver.Leases:
> > regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closing leases
> > 2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase.
> regionserver.Leases:
> > regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closed leases
> > 2017-12-31 01:44:09,290 INFO
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase
> metrics
> > system...
> > 2017-12-31 01:44:09,290 INFO
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> > stopped.
> > 2017-12-31 01:44:09,792 INFO org.apache.hadoop.metrics2.
> > impl.MetricsConfig:
> > loaded properties from hadoop-metrics2.properties
> > 2017-12-31 01:44:09,798 INFO
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric
> > snapshot period at 10 second(s).
> > 2017-12-31 01:44:09,798 INFO
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> > started
> > 2017-12-31 01:45:21,905 INFO
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase
> metrics
> > system...
> > 2017-12-31 01:45:21,906 INFO
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> > stopped.
> > 2017-12-31 01:45:22,408 INFO org.apache.hadoop.metrics2.
> > impl.MetricsConfig:
> > loaded properties from hadoop-metrics2.properties
> > 2017-12-31 01:45:22,409 INFO
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric
> > snapshot period at 10 second(s).
> > 2017-12-31 01:45:22,409 INFO
> > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> > started
> > 2017-12-31 01:47:12,137 INFO
> > org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
> starting;
> > hbase.shutdown.hook=true;
> > fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$
> > ClientFinalizer@336880df
> > 2017-12-31 01:47:12,138 INFO
> > org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown
> > hook thread.
> > 2017-12-31 01:47:12,139 INFO
> > org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
> finished.
> >
> > The first shutdown was a graceful shutdown request that started around
> > 1:43, and then 1:47 was a shutdown via process kill request. The metrics
> > system looks like it starts up again every time it is stopped. I'm
> digging
> > through the internals of HRegionServer trying to figure out where we
> > interface with it but not having much luck finding the cleanup steps.
> >
> > Mike
> >
>

Re: Hadoop Metrics stopping and starting

Posted by Sergey Soldatov <se...@gmail.com>.
It may be something else that prevents HBase from the proper shutdown.
Metric system is automatically stopped and started by JmxCacheBuster and
may be irrelevant to your problem.

Thanks,
Sergey

On Wed, Jan 3, 2018 at 1:37 PM, Mike Drob <md...@apache.org> wrote:

> Hi folks,
>
> I've been seeing on one of my testbeds intermittently that HBase will fail
> to stop, I suspect that this somehow involves the Hadoop Metrics system
> having a thread that isn't getting cleaned up. This is on a fork of
> branch-2 testing against hadoop-3 (although not with the RC specifically).
> Here's a log excerpt from the shutdown sequence:
>
>
> 2017-12-31 01:44:07,308 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server
> bwiskf-2,22101,1514712293690; zookeeper connection closed.
> 2017-12-31 01:44:07,308 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer:
> regionserver/bwiskf-2/x.x.x.x:22101 exiting
> 2017-12-31 01:44:07,309 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase.regionserver.Leases:
> regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closing leases
> 2017-12-31 01:44:08,521 INFO org.apache.hadoop.hbase.regionserver.Leases:
> regionserver/bwiskf-2/x.x.x.x:22101.leaseChecker closed leases
> 2017-12-31 01:44:09,290 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase metrics
> system...
> 2017-12-31 01:44:09,290 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> stopped.
> 2017-12-31 01:44:09,792 INFO org.apache.hadoop.metrics2.
> impl.MetricsConfig:
> loaded properties from hadoop-metrics2.properties
> 2017-12-31 01:44:09,798 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric
> snapshot period at 10 second(s).
> 2017-12-31 01:44:09,798 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> started
> 2017-12-31 01:45:21,905 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping HBase metrics
> system...
> 2017-12-31 01:45:21,906 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> stopped.
> 2017-12-31 01:45:22,408 INFO org.apache.hadoop.metrics2.
> impl.MetricsConfig:
> loaded properties from hadoop-metrics2.properties
> 2017-12-31 01:45:22,409 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric
> snapshot period at 10 second(s).
> 2017-12-31 01:45:22,409 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HBase metrics system
> started
> 2017-12-31 01:47:12,137 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting;
> hbase.shutdown.hook=true;
> fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$
> ClientFinalizer@336880df
> 2017-12-31 01:47:12,138 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown
> hook thread.
> 2017-12-31 01:47:12,139 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
>
> The first shutdown was a graceful shutdown request that started around
> 1:43, and then 1:47 was a shutdown via process kill request. The metrics
> system looks like it starts up again every time it is stopped. I'm digging
> through the internals of HRegionServer trying to figure out where we
> interface with it but not having much luck finding the cleanup steps.
>
> Mike
>