Posted to user@hbase.apache.org by sunweiwei <su...@asiainfo-linkage.com> on 2014/05/05 10:14:06 UTC
meta server hangs?
Hi
I'm using HBase 0.96.0.
I found that clients suddenly couldn't put data and the HMaster hung. Then I shut down the HMaster and started a new one, and the clients returned to normal.
I found these logs in the new HMaster. It seems the meta server hung, and the HMaster then stopped it.
2014-04-29 15:32:21,530 INFO [master:hadoop1:60000] catalog.CatalogTracker: Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117 remote=hadoop77/192.168.1.87:60020]
2014-04-29 15:32:21,532 INFO [master:hadoop1:60000] master.HMaster: Forcing expire of hadoop77,60020,1396606457005
I can't find why the meta server hung. I found this in the meta server log:
2014-04-29 13:53:55,637 INFO [regionserver60020.leaseChecker] regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on region hbase:meta,,1.1588230740
2014-04-29 13:53:56,632 INFO [regionserver60020.leaseChecker] regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on region hbase:meta,,1.1588230740
2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker] regionserver.HRegionServer: Scanner 516152687416913803 lease expired on region hbase:meta,,1.1588230740
2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker] regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on region hbase:meta,,1.1588230740
Any suggestion will be appreciated. Thanks.
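For reference, the "60000 millis timeout" in the exception above is HBase's default RPC timeout, and the scanner lease expiries in the regionserver log are governed by the client scanner timeout period. These are the two knobs involved, shown at what I believe are their 0.96 defaults in hbase-site.xml (a sketch for orientation only: raising them would mask a stalled regionserver, not fix it):

```xml
<!-- hbase-site.xml: the two timeouts visible in the logs above (0.96 defaults). -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>60000</value> <!-- the "60000 millis timeout" in the SocketTimeoutException -->
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>60000</value> <!-- scanner lease period; expiries show up in the RS log -->
</property>
```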
Re: Re: Re: Re: meta server hangs?
Posted by Ted Yu <yu...@gmail.com>.
You can add the following to the JVM parameters:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Cheers
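A sketch of how these collector flags might be combined with the HBASE_REGIONSERVER_OPTS sunweiwei quotes below in conf/hbase-env.sh (heap sizes are the poster's; -XX:+UseCMSInitiatingOccupancyOnly is an extra, commonly paired flag and an assumption on my part, not something suggested in this thread):

```shell
# hbase-env.sh sketch: CMS/ParNew collectors plus the poster's existing flags.
# UseCMSInitiatingOccupancyOnly makes CMS honor the 70% trigger instead of
# its own heuristics (an assumption, not from the thread).
export HBASE_REGIONSERVER_OPTS="-Xms16384m -Xmx16384m -Xmn512m \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```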
On Fri, May 16, 2014 at 4:32 AM, sunweiwei <su...@asiainfo-linkage.com> wrote:
> Hi
> Sorry, I just saw this mail. I set GC parameters like this:
>
> export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:CMSInitiatingOccupancyFraction=70 -Xms16384m -Xmx16384m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
>
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: 2014-05-11 19:11
> To: user@hbase.apache.org
> Cc: <us...@hbase.apache.org>
> Subject: Re: Re: Re: meta server hangs?
>
> What GC parameters did you specify for the JVM?
>
> Thanks
>
> On May 7, 2014, at 6:27 PM, "sunweiwei" <su...@asiainfo-linkage.com> wrote:
>
> > I find lots of these in gc.log. It seems CMS GC runs many times, but the old generation is always large.
> > I'm confused.
> > Any suggestion will be appreciated. Thanks.
> >
> > 2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
> > 2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
> > 2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]
> >
> > 2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]
> >
> > 2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
> >
> > -----Original Message-----
> > From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> > Sent: 2014-05-06 9:27
> > To: user@hbase.apache.org
> > Subject: Re: Re: meta server hangs?
> >
> > Hi Samir,
> > I think the master declared hadoop77/192.168.1.87:60020 a dead server because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException".
> > I pasted the master log in the first mail.
> >
> > I'm not sure, but here is the whole sequence:
> > At 2014-04-29 13:53:57,271 a client threw a SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout, and other clients hung.
> > At 2014-04-29 15:30:** I visited the HBase web UI and found the HMaster hung, then stopped it and started a new HMaster.
> > At 2014-04-29 15:32:21,530 the new HMaster logged "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException".
> > At 2014-04-29 15:32:28,364 the meta server received the HMaster's message and shut itself down.
> >
> > After this, the clients came back to normal.
> >
> > -----Original Message-----
> > From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> > Sent: 2014-05-05 19:25
> > To: user@hbase.apache.org
> > Subject: Re: Re: meta server hangs?
> >
> > There should be an exception in the regionserver log on hadoop77/192.168.1.87:60020 above this one:
> >
> > *********
> > 2014-04-29 15:32:28,364 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hadoop77,60020,1396606457005 as dead server
> >         at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
> > *********
> >
> > Can you find it and paste it? That exception should explain why the master declared hadoop77/192.168.1.87:60020 a dead server.
> >
> > Regards
> > Samir
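One way to dig out the context Samir is asking for; the log path below is hypothetical, so adjust it to your install:

```shell
# Print the ~40 lines preceding the abort: the root-cause exception usually
# sits just above the YouAreDeadException. The path is a guess.
LOG=/var/log/hbase/hbase-hadoop-regionserver-hadoop77.log
if [ -f "$LOG" ]; then
  grep -B 40 "ABORTING region server" "$LOG"
fi
```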
> >
> >
> > On Mon, May 5, 2014 at 11:39 AM, sunweiwei <sunww@asiainfo-linkage.com> wrote:
> >
> >> And this is the client log:
> >>
> >> 2014-04-29 13:53:57,271 WARN [main] org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already closed
> >> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473 remote=hadoop77/192.168.1.87:60020]
> >>         at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> >>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> >>         at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> >>         at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> >>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
> >>         at org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
> >>         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
> >>         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
> >>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
> >>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
> >>         at org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
> >>         at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
> >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
> >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
> >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
> >>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
> >>         at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
> >>         at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> >>         at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
> >>         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
> >>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
> >>
> >> -----Original Message-----
> >> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> >> Sent: 2014-05-05 17:23
> >> To: user@hbase.apache.org
> >> Subject: Re: meta server hangs?
> >>
> >> Thank you for the reply.
> >> I found these logs on hadoop77/192.168.1.87. It seems the meta regionserver received the HMaster's message and shut itself down:
> >> 2014-04-29 15:32:28,364 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hadoop77,60020,1396606457005 as dead server
> >>         at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
> >>
> >>
> >> and this is the gc log:
> >> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew: 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K), 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> >> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew: 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K), 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> >> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew: 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K), 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> >> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew: 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K), 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> >> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew: 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K), 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> >> Heap
> >>  par new generation   total 471872K, used 335578K [0x00000003fae00000, 0x000000041ae00000, 0x000000041ae00000)
> >>   eden space 419456K, 78% used [0x00000003fae00000, 0x000000040f0f41c8, 0x00000004147a0000)
> >>   from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0, 0x000000041ae00000)
> >>   to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000, 0x0000000417ad0000)
> >>  concurrent mark-sweep generation total 16252928K, used 11162086K [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
> >>  concurrent-mark-sweep perm gen total 81072K, used 48660K [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
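One detail worth pulling out of the heap summary above: the CMS old generation is sitting essentially at the CMSInitiatingOccupancyFraction=70 trigger configured in HBASE_REGIONSERVER_OPTS, which would explain CMS cycles firing back to back while reclaiming little. A quick check with the numbers copied from the dump:

```shell
# Old-generation figures from the "Heap" dump above, vs the 70% CMS trigger.
awk 'BEGIN {
  total_k = 16252928   # concurrent mark-sweep generation total
  used_k  = 11162086   # concurrent mark-sweep generation used
  printf "CMS old gen occupancy: %.1f%%\n", used_k / total_k * 100
}'
# prints "CMS old gen occupancy: 68.7%"
```

If ~11 GB of that is genuinely live data, CMS has little to reclaim and the trigger keeps re-firing; the fix would be more heap or a smaller live set, not more frequent GC.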
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> >> Sent: 2014-05-05 16:50
> >> To: user@hbase.apache.org
> >> Cc: sunweiwei
> >> Subject: Re: meta server hangs?
> >>
> >> Hi,
> >> This exception:
> >> ****
> >> exception=java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117 remote=hadoop77/192.168.1.87:60020]
> >> *****
> >> shows that there is a connection timeout between the master and the
> >> regionserver (hadoop77/192.168.1.87:60020) that is hosting the meta table.
> >> The real question is what is causing this timeout. In my experience a few
> >> different things can cause this type of timeout. I would suggest you check
> >> garbage collection, memory, network, CPU, and disks on hadoop77/192.168.1.87,
> >> and I'm sure you will find the cause.
> >> You can use diagnostic tools like vmstat, sar, and iostat to check your
> >> system, and jstat to check GC and other JVM statistics.
> >>
> >> Regards
> >> Samir
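A starting point for those checks might look like this; each host tool is guarded so the snippet degrades gracefully where a tool is missing, and the jps-based pid lookup is a sketch:

```shell
# Host-level checks on hadoop77: CPU/swap, disks, network.
command -v vmstat >/dev/null && vmstat 1 2        # run queue, swap-in/out, CPU
command -v iostat >/dev/null && iostat -x 1 2     # per-disk utilization, await
command -v sar    >/dev/null && sar -n DEV 1 2    # per-NIC throughput

# JVM-level check: old-gen occupancy and GC counts for the regionserver.
RS_PID=$(jps 2>/dev/null | awk '/HRegionServer/ {print $1}')
if [ -n "$RS_PID" ]; then
  jstat -gcutil "$RS_PID" 1000 5
fi
```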
> >
> > On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com> wrote:
Re: Re: Re: meta server hangs?
Posted by sunweiwei <su...@asiainfo-linkage.com>.
Hi
Sorry, I just saw this mail. I set GC parameters like this:
export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:CMSInitiatingOccupancyFraction=70 -Xms16384m -Xmx16384m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
Re: Re: Re: meta server hangs?
Posted by Ted Yu <yu...@gmail.com>.
What GC parameters did you specify for the JVM?
Thanks
On May 7, 2014, at 6:27 PM, "sunweiwei" <su...@asiainfo-linkage.com> wrote:
> I find lots of these in gc.log. It seems like CMS gc run many times but old Generation is always large.
> I'm confused.
> Any suggestion will be appreciated. Thanks.
>
> 2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
> 2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
> 2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
> 2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]
>
> 2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
> 2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]
>
> 2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
> 2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
>
> -----邮件原件-----
> 发件人: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> 发送时间: 2014年5月6日 9:27
> 收件人: user@hbase.apache.org
> 主题: 答复: 答复: meta server hungs ?
>
> HI Samir
> I think master declared hadoop77/192.168.1.87:60020 as dead server, because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
> I have paste the master log in the first mail.
>
> I'm not sure, here is the whole process:
> at 2014-04-29 13:53:57,271 client throw a SocketTimeoutException : Call to hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException: 60000 millis timeout and other clients hung.
> at 2014-04-29 15:30:** I visit hbase web and found hmaster hung , then i stop it and start a new hmaster.
> at 2014-04-29 15:32:21,530 the new hmaster logs "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException:
> Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
> at 2014-04-29 15:32:28,364 the meta server received hmaster's message and shutdown itself.
>
> after these, clients come back to normal
>
> -----邮件原件-----
> 发件人: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> 发送时间: 2014年5月5日 19:25
> 收件人: user@hbase.apache.org
> 主题: Re: 答复: meta server hungs ?
>
> There should be exception in regionserver log on hadoop77/
> 192.168.1.87:60020 above this one:
>
> *********
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
> at org.apache.hadoop.hbase.master.ServerManager.
> checkIsDead(ServerManager.java:339)
> *********
>
> Can you find it and past it. That exception should explain why
> master declared hadoop77/192.168.1.87:60020 as dead server.
>
> Regards
> Samir
>
>
> On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:
>
>> And this is client log.
>>
>> 2014-04-29 13:53:57,271 WARN [main]
>> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
>> closed
>> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException: 60000 millis timeout while
>> waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473remote=hadoop77/
>> 192.168.1.87:60020]
>> at
>> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>> at
>> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>> at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
>> at
>> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
>> at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
>> at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>> at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>> at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>> at
>> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
>> at
>> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
>> at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
>> at
>> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
>> at
>> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>> at
>> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
>> at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
>> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>>
>> -----Original Message-----
>> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
>> Sent: May 5, 2014 17:23
>> To: user@hbase.apache.org
>> Subject: Re: meta server hungs ?
>>
>> Thank you for the reply.
>> I found these logs on hadoop77/192.168.1.87. It seems the meta
>> regionserver received the hmaster's message and shut down itself.
>> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
>> regionserver.HRegionServer: ABORTING region server
>> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
>> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
>> as dead server
>> at
>> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>>
>>
>> and this is gc log:
>> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
>> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
>> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
>> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
>> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
>> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
>> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
>> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
>> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
>> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
>> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
>> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
>> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
>> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
>> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
>> Heap
>> par new generation total 471872K, used 335578K [0x00000003fae00000,
>> 0x000000041ae00000, 0x000000041ae00000)
>> eden space 419456K, 78% used [0x00000003fae00000, 0x000000040f0f41c8,
>> 0x00000004147a0000)
>> from space 52416K, 9% used [0x0000000417ad0000, 0x0000000417f928e0,
>> 0x000000041ae00000)
>> to space 52416K, 0% used [0x00000004147a0000, 0x00000004147a0000,
>> 0x0000000417ad0000)
>> concurrent mark-sweep generation total 16252928K, used 11162086K
>> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
>> concurrent-mark-sweep perm gen total 81072K, used 48660K
>> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>>
>>
>>
>> -----Original Message-----
>> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
>> Sent: May 5, 2014 16:50
>> To: user@hbase.apache.org
>> Cc: sunweiwei
>> Subject: Re: meta server hungs ?
>>
>> Hi,
>> This exception:
>> ****
>> exception=java.net.SocketTimeoutException: Call to
>> hadoop77/192.168.1.87:60020 failed because
>> java.net.SocketTimeoutException:
>> 60000 millis timeout while waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
>> remote=hadoop77/192.168.1.87:60020]
>> *****
>> shows that there is a connection timeout between the master server and the
>> regionserver (hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
>> The real question is what is causing this timeout? In my experience a few
>> things can cause this type of timeout. I would suggest that you check
>> hadoop77/192.168.1.87 <http://192.168.1.87:60020/> for garbage collection,
>> memory, network, CPU and disk issues, and I'm sure you will find the cause.
>> You can use diagnostic tools like vmstat, sar, and iostat to check your
>> system, and you can use jstat to check GC and other JVM stats.
>>
>> Regards
>> Samir
>>
>>
>>
>>
>> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
>>> wrote:
>>
>>> Hi
>>>
>>> I'm using HBase 0.96.0.
>>>
>>> I found that clients suddenly couldn't put data and the hmaster hung. Then I
>>> shut down the hmaster and started a new hmaster, and the clients went back to normal.
>>>
>>>
>>>
>>> I found these logs in the new hmaster. It seems the meta server hung and the
>>> hmaster stopped the meta server.
>>>
>>> 2014-04-29 15:32:21,530 INFO [master:hadoop1:60000]
>>> catalog.CatalogTracker:
>>> Failed verification of hbase:meta,,1 at
>>> address=hadoop77,60020,1396606457005,
>>> exception=java.net.SocketTimeoutException: Call to
>>> hadoop77/192.168.1.87:60020 failed because
>>> java.net.SocketTimeoutException:
>>> 60000 millis timeout while waiting for channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
>>> remote=hadoop77/192.168.1.87:60020]
>>>
>>> 2014-04-29 15:32:21,532 INFO [master:hadoop1:60000] master.HMaster:
>>> Forcing
>>> expire of hadoop77,60020,1396606457005
>>>
>>>
>>>
>>> I can't find why the meta server hung. I found this in the meta server log:
>>>
>>> 2014-04-29 13:53:55,637 INFO [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
>>> region hbase:meta,,1.1588230740
>>>
>>> 2014-04-29 13:53:56,632 INFO [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
>>> region hbase:meta,,1.1588230740
>>>
>>> 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
>>> region hbase:meta,,1.1588230740
>>>
>>> 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
>>> region hbase:meta,,1.1588230740
>>>
>>>
>>>
>>>
>>>
>>> any suggestion will be appreciated. Thanks.
>
Re: Re: meta server hungs ?
Posted by sunweiwei <su...@asiainfo-linkage.com>.
I find lots of entries like these in gc.log. It seems the CMS GC runs many times, but the old generation is always large.
I'm confused.
Any suggestion will be appreciated. Thanks.
2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]
2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]
2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
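The heap figures in these ParNew lines already contain the answer: each line reports whole-heap used before->after(total), and subtracting the surviving young-gen usage leaves the CMS old generation. A small Python sketch that does this arithmetic for a line like the ones above (the regex and the sample line are my own illustration, not any HBase tooling):

```python
import re

# A ParNew log line has the shape:
# [ParNew: young_before->young_after(young_total), t secs] heap_before->heap_after(heap_total)
PAT = re.compile(r"\[ParNew: (\d+)K->(\d+)K\((\d+)K\).*?\] (\d+)K->(\d+)K\((\d+)K\)")

def old_gen_occupancy(line):
    """Return (used, total) of the old generation in KB after a ParNew GC.

    The whole-heap figure includes the young gen, so subtracting the
    post-GC young usage leaves the old (CMS) generation."""
    m = PAT.search(line)
    if not m:
        return None
    y_after, y_total = int(m.group(2)), int(m.group(3))
    h_after, h_total = int(m.group(5)), int(m.group(6))
    return h_after - y_after, h_total - y_total

line = ("2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: "
        "[ParNew: 471872K->52416K(471872K), 0.0587370 secs] "
        "11893986K->11506108K(16724800K), 0.0590390 secs]")
used, total = old_gen_occupancy(line)
print(f"old gen: {used}K / {total}K = {100.0 * used / total:.1f}%")
# → old gen: 11453692K / 16252928K = 70.5%
```

Running this over the pasted lines shows the old gen sitting around 70% of its 16 GB capacity even right after CMS sweeps, which is consistent with the "CMS runs many times but the old generation stays large" observation.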
-----Original Message-----
From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
Sent: May 6, 2014 9:27
To: user@hbase.apache.org
Subject: Re: Re: meta server hungs ?
Hi Samir
I think the master declared hadoop77/192.168.1.87:60020 a dead server because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
I have pasted the master log in the first mail.
I'm not sure, but here is the whole process:
At 2014-04-29 13:53:57,271 a client threw a SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout, and other clients hung.
At 2014-04-29 15:30:** I visited the hbase web UI, found the hmaster hung, then stopped it and started a new hmaster.
At 2014-04-29 15:32:21,530 the new hmaster logged "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException:
Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
At 2014-04-29 15:32:28,364 the meta server received the hmaster's message and shut down itself.
After these events, clients came back to normal.
-----Original Message-----
From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
Sent: May 5, 2014 19:25
To: user@hbase.apache.org
Subject: Re: Re: meta server hungs ?
There should be an exception in the regionserver log on
hadoop77/192.168.1.87:60020 above this one:
*********
2014-04-29 15:32:28,364 FATAL [regionserver60020]
regionserver.HRegionServer: ABORTING region server
hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing hadoop77,60020,1396606457005
as dead server
at org.apache.hadoop.hbase.master.ServerManager.
checkIsDead(ServerManager.java:339)
*********
Can you find it and paste it? That exception should explain why the
master declared hadoop77/192.168.1.87:60020 a dead server.
Regards
Samir
On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com> wrote:
> And this is client log.
>
> 2014-04-29 13:53:57,271 WARN [main]
> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
> closed
> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473 remote=hadoop77/
> 192.168.1.87:60020]
> at
> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> at
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> at
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
> at
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
> at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
> at
> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
> at
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> at
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
> at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>
> -----Original Message-----
> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> Sent: May 5, 2014 17:23
> To: user@hbase.apache.org
> Subject: Re: meta server hungs ?
>
> Thank you for the reply.
> I found these logs on hadoop77/192.168.1.87. It seems the meta
> regionserver received the hmaster's message and shut down itself.
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
> at
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>
>
> and this is gc log:
> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> Heap
> par new generation total 471872K, used 335578K [0x00000003fae00000,
> 0x000000041ae00000, 0x000000041ae00000)
> eden space 419456K, 78% used [0x00000003fae00000, 0x000000040f0f41c8,
> 0x00000004147a0000)
> from space 52416K, 9% used [0x0000000417ad0000, 0x0000000417f928e0,
> 0x000000041ae00000)
> to space 52416K, 0% used [0x00000004147a0000, 0x00000004147a0000,
> 0x0000000417ad0000)
> concurrent mark-sweep generation total 16252928K, used 11162086K
> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
> concurrent-mark-sweep perm gen total 81072K, used 48660K
> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>
>
>
> -----Original Message-----
> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> Sent: May 5, 2014 16:50
> To: user@hbase.apache.org
> Cc: sunweiwei
> Subject: Re: meta server hungs ?
>
> Hi,
> This exception:
> ****
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
> *****
> shows that there is a connection timeout between the master server and the
> regionserver (hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
> The real question is what is causing this timeout? In my experience a few
> things can cause this type of timeout. I would suggest that you check
> hadoop77/192.168.1.87 <http://192.168.1.87:60020/> for garbage collection,
> memory, network, CPU and disk issues, and I'm sure you will find the cause.
> You can use diagnostic tools like vmstat, sar, and iostat to check your
> system, and you can use jstat to check GC and other JVM stats.
>
> Regards
> Samir
>
>
>
>
> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
> > wrote:
>
> > Hi
> >
> > I'm using HBase 0.96.0.
> >
> > I found that clients suddenly couldn't put data and the hmaster hung. Then I
> > shut down the hmaster and started a new hmaster, and the clients went back to normal.
> >
> >
> >
> > I found these logs in the new hmaster. It seems the meta server hung and the
> > hmaster stopped the meta server.
> >
> > 2014-04-29 15:32:21,530 INFO [master:hadoop1:60000]
> > catalog.CatalogTracker:
> > Failed verification of hbase:meta,,1 at
> > address=hadoop77,60020,1396606457005,
> > exception=java.net.SocketTimeoutException: Call to
> > hadoop77/192.168.1.87:60020 failed because
> > java.net.SocketTimeoutException:
> > 60000 millis timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> > remote=hadoop77/192.168.1.87:60020]
> >
> > 2014-04-29 15:32:21,532 INFO [master:hadoop1:60000] master.HMaster:
> > Forcing
> > expire of hadoop77,60020,1396606457005
> >
> >
> >
> > I can't find why the meta server hung. I found this in the meta server log:
> >
> > 2014-04-29 13:53:55,637 INFO [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,632 INFO [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> > region hbase:meta,,1.1588230740
> >
> >
> >
> >
> >
> > any suggestion will be appreciated. Thanks.
> >
> >
>
>
Re: Re: meta server hungs ?
Posted by sunweiwei <su...@asiainfo-linkage.com>.
Hi Samir
I think the master declared hadoop77/192.168.1.87:60020 a dead server because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
I have pasted the master log in the first mail.
I'm not sure, but here is the whole process:
At 2014-04-29 13:53:57,271 a client threw a SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout, and other clients hung.
At 2014-04-29 15:30:** I visited the hbase web UI, found the hmaster hung, then stopped it and started a new hmaster.
At 2014-04-29 15:32:21,530 the new hmaster logged "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException:
Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
At 2014-04-29 15:32:28,364 the meta server received the hmaster's message and shut down itself.
After these events, clients came back to normal.
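For context, the 60000 millis in these timeouts matches HBase's default RPC call timeout, hbase.rpc.timeout. If long GC pauses on the meta regionserver are the root cause, raising the timeout only masks the symptom, but it can buy headroom while GC is being tuned. A sketch for hbase-site.xml (the 120000 value is an arbitrary example, not a recommendation; verify the property's semantics against the 0.96 documentation):

```xml
<!-- hbase-site.xml: RPC call timeout; 60000 ms is the default that -->
<!-- produced the SocketTimeoutException above. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value> <!-- example: 120 s; masks, does not fix, GC pauses -->
</property>
```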
-----Original Message-----
From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
Sent: May 5, 2014 19:25
To: user@hbase.apache.org
Subject: Re: Re: meta server hungs ?
There should be an exception in the regionserver log on
hadoop77/192.168.1.87:60020 above this one:
*********
2014-04-29 15:32:28,364 FATAL [regionserver60020]
regionserver.HRegionServer: ABORTING region server
hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing hadoop77,60020,1396606457005
as dead server
at org.apache.hadoop.hbase.master.ServerManager.
checkIsDead(ServerManager.java:339)
*********
Can you find it and paste it? That exception should explain why the
master declared hadoop77/192.168.1.87:60020 a dead server.
Regards
Samir
On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com> wrote:
> And this is client log.
>
> 2014-04-29 13:53:57,271 WARN [main]
> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
> closed
> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473 remote=hadoop77/
> 192.168.1.87:60020]
> at
> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> at
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> at
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
> at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
> at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
> at
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
> at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
> at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
> at
> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
> at
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> at
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
> at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
> at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>
> -----邮件原件-----
> 发件人: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> 发送时间: 2014年5月5日 17:23
> 收件人: user@hbase.apache.org
> 主题: 答复: meta server hungs ?
>
> Thank you for reply.
> I find this logs in hadoop77/192.168.1.87. It seems like meta
> regionserver receive hmaster's message and shutdown itself.
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
> at
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>
>
> and this is gc log:
> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> Heap
> par new generation total 471872K, used 335578K [0x00000003fae00000,
> 0x000000041ae00000, 0x000000041ae00000)
> eden space 419456K, 78% used [0x00000003fae00000, 0x000000040f0f41c8,
> 0x00000004147a0000)
> from space 52416K, 9% used [0x0000000417ad0000, 0x0000000417f928e0,
> 0x000000041ae00000)
> to space 52416K, 0% used [0x00000004147a0000, 0x00000004147a0000,
> 0x0000000417ad0000)
> concurrent mark-sweep generation total 16252928K, used 11162086K
> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
> concurrent-mark-sweep perm gen total 81072K, used 48660K
> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>
>
>
> -----邮件原件-----
> 发件人: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> 发送时间: 2014年5月5日 16:50
> 收件人: user@hbase.apache.org
> 抄送: sunweiwei
> 主题: Re: meta server hungs ?
>
> Hi,
> This exception:
> ****
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
> *****
> shows that there is connection timeout between master server and
> regionserver (hadoop77/192.168.1.87:60020) that is hosting 'meta' table.
> Real question is what is causing this timeout? In my experience it can be
> by few things causing this type of timeout. I would suggest that you check
> hadoop77/192.168.1.87 <http://192.168.1.87:60020/> Garbage Collection,
> memory, network, CPU disks and i'm sure you will find cause of timeout.
> You can us some diagnostic tools like vmstat, sar, iostat to check your
> sistem and you can use jstat to check GC and some other JVM stuff.
>
> Regards
> Samir
>
>
>
>
> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
> >wrote:
>
> > Hi
> >
> > I'm using hbase0.96.0.
> >
> > I found client can't put data suddenly and hmaster hungs. Then I
> shutdown
> > the hmaster and start a new hmaster, then the client back to normal.
> >
> >
> >
> > I found this logs in the new hmaster . It seem like meta server hungs and
> > hmaster stop the meta server.
> >
> > 2014-04-29 15:32:21,530 INFO [master:hadoop1:60000]
> > catalog.CatalogTracker:
> > Failed verification of hbase:meta,,1 at
> > address=hadoop77,60020,1396606457005,
> > exception=java.net.SocketTimeoutException: Call to
> > hadoop77/192.168.1.87:60020 failed because
> > java.net.SocketTimeoutException:
> > 60000 millis timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> > remote=hadoop77/192.168.1.87:60020]
> >
> > 2014-04-29 15:32:21,532 INFO [master:hadoop1:60000] master.HMaster:
> > Forcing
> > expire of hadoop77,60020,1396606457005
> >
> >
> >
> > I can't find why meta server hungs .I found this in meta server log
> >
> > 2014-04-29 13:53:55,637 INFO [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,632 INFO [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> > region hbase:meta,,1.1588230740
> >
> >
> >
> >
> >
> > any suggestion will be appreciated. Thanks.
> >
> >
>
>
Re: meta server hangs ?
Posted by Samir Ahmic <ah...@gmail.com>.
There should be an exception in the regionserver log on hadoop77/192.168.1.87:60020 above this one:
*********
2014-04-29 15:32:28,364 FATAL [regionserver60020]
regionserver.HRegionServer: ABORTING region server
hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing hadoop77,60020,1396606457005
as dead server
at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
*********
Can you find it and paste it? That exception should explain why the
master declared hadoop77/192.168.1.87:60020 a dead server.
Regards
Samir
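Samir's suggestion above — scan the regionserver log for whatever precedes the YouAreDeadException line — can be automated. A minimal sketch (the helper name and the toy log are mine, not from the thread; it is roughly `grep -B` in Python):

```python
def context_before(lines, needle, n=20):
    """Return up to n log lines preceding the first line containing needle."""
    for i, line in enumerate(lines):
        if needle in line:
            return lines[max(0, i - n):i]
    return []

# Toy log for illustration; real input would be the regionserver log lines.
log = [
    "INFO  something routine",
    "WARN  responseTooSlow on scan of hbase:meta",
    "ERROR lost connection to master",
    "FATAL ABORTING region server hadoop77,60020,1396606457005",
]
print(context_before(log, "ABORTING", n=2))
```

Against a real log, passing the FATAL message as the needle would print the lines Samir is asking for.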
Re: meta server hangs ?
Posted by sunweiwei <su...@asiainfo-linkage.com>.
And this is the client log.
2014-04-29 13:53:57,271 WARN [main] org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already closed
java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473 remote=hadoop77/192.168.1.87:60020]
at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
at org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
at org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
Re: meta server hangs ?
Posted by sunweiwei <su...@asiainfo-linkage.com>.
Thank you for the reply.
I found these logs on hadoop77/192.168.1.87. It seems the meta regionserver received the HMaster's message and shut itself down.
2014-04-29 15:32:28,364 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hadoop77,60020,1396606457005 as dead server
at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
and this is the GC log:
2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew: 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K), 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew: 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K), 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew: 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K), 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew: 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K), 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew: 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K), 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
Heap
par new generation total 471872K, used 335578K [0x00000003fae00000, 0x000000041ae00000, 0x000000041ae00000)
eden space 419456K, 78% used [0x00000003fae00000, 0x000000040f0f41c8, 0x00000004147a0000)
from space 52416K, 9% used [0x0000000417ad0000, 0x0000000417f928e0, 0x000000041ae00000)
to space 52416K, 0% used [0x00000004147a0000, 0x00000004147a0000, 0x0000000417ad0000)
concurrent mark-sweep generation total 16252928K, used 11162086K [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
concurrent-mark-sweep perm gen total 81072K, used 48660K [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
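For what it's worth, every ParNew entry above reports a sub-100 ms real pause, so these lines alone do not explain a 60-second timeout; a long stop-the-world pause elsewhere in the log (e.g. a CMS concurrent-mode failure) would. A small sketch for flagging such pauses (the regex and threshold are my own choices, and the second sample line is fabricated for illustration):

```python
import re

# Matches the trailing timing summary of a HotSpot GC log entry,
# e.g. "[Times: user=0.00 sys=0.00, real=0.04 secs]".
PAUSE_RE = re.compile(r"\[Times: user=[\d.]+ sys=[\d.]+, real=([\d.]+) secs\]")

def long_pauses(lines, threshold_secs=1.0):
    """Yield (real_seconds, line) for entries whose wall-clock pause exceeds the threshold."""
    for line in lines:
        m = PAUSE_RE.search(line)
        if m and float(m.group(1)) > threshold_secs:
            yield float(m.group(1)), line

gc_log = [
    "2014-04-29T15:32:27.159+0800: ... [Times: user=0.00 sys=0.00, real=0.04 secs]",
    "2014-04-29T15:32:55.000+0800: ... [Times: user=70.10 sys=0.30, real=71.20 secs]",
]
for secs, line in long_pauses(gc_log):
    print(secs, line)
```

Run over the full GC log around 15:32, this would show in seconds whether the JVM was actually paused long enough to miss the master's 60 s deadline.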
Re: meta server hangs ?
Posted by Samir Ahmic <ah...@gmail.com>.
Hi,
This exception:
****
exception=java.net.SocketTimeoutException: Call to
hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException:
60000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
remote=hadoop77/192.168.1.87:60020]
*****
shows that there is a connection timeout between the master and the
regionserver (hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
The real question is what is causing this timeout. In my experience a few
things can cause this type of timeout. I would suggest that you check
garbage collection, memory, network, CPU and disks on
hadoop77/192.168.1.87 <http://192.168.1.87:60020/>, and I'm sure you will find the cause.
You can use diagnostic tools like vmstat, sar and iostat to check your
system, and jstat to check GC and other JVM metrics.
Regards
Samir
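One concrete number to look at when checking GC as suggested here: how full the CMS old generation is, since a concurrent-mode failure in a nearly full old generation can stall the JVM well past a 60-second RPC timeout. A trivial sketch (the helper name is mine) using the figures from the heap summary posted elsewhere in this thread:

```python
def old_gen_occupancy_pct(used_kb, total_kb):
    """Percentage of the CMS old generation currently occupied."""
    return 100.0 * used_kb / total_kb

# Figures from the regionserver's heap summary in this thread:
# "concurrent mark-sweep generation total 16252928K, used 11162086K"
print(round(old_gen_occupancy_pct(11162086, 16252928), 1))  # about 68.7
```

An old generation persistently this full is one of the conditions under which CMS can fall back to a long stop-the-world collection.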
On Mon, May 5, 2014 at 10:14 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:
> Hi
>
> I'm using HBase 0.96.0.
>
> I found that the client suddenly couldn't put data and the HMaster hung. Then I
> shut down the HMaster and started a new one, and the client went back to normal.
>
>
>
> I found these logs in the new HMaster. It seems the meta server hung and
> the HMaster stopped the meta server.
>
> 2014-04-29 15:32:21,530 INFO [master:hadoop1:60000]
> catalog.CatalogTracker:
> Failed verification of hbase:meta,,1 at
> address=hadoop77,60020,1396606457005,
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
>
> 2014-04-29 15:32:21,532 INFO [master:hadoop1:60000] master.HMaster:
> Forcing
> expire of hadoop77,60020,1396606457005
>
>
>
> I can't find why the meta server hung. I found this in the meta server log:
>
> 2014-04-29 13:53:55,637 INFO [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,632 INFO [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,733 INFO [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> region hbase:meta,,1.1588230740
>
>
>
>
>
> Any suggestions would be appreciated. Thanks.
>
>