Posted to user@hbase.apache.org by sunweiwei <su...@asiainfo-linkage.com> on 2014/05/05 10:14:06 UTC

meta server hangs?

Hi

I'm using HBase 0.96.0.

I found that clients suddenly couldn't put data and the HMaster hung. I shut down
the HMaster and started a new one, and the clients went back to normal.

 

I found these logs in the new HMaster. It seems the meta server hung and the
HMaster stopped the meta server.

2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000] catalog.CatalogTracker:
Failed verification of hbase:meta,,1 at
address=hadoop77,60020,1396606457005,
exception=java.net.SocketTimeoutException: Call to
hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException:
60000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
remote=hadoop77/192.168.1.87:60020]

2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster: Forcing
expire of hadoop77,60020,1396606457005

 

I can't find why the meta server hung. I found this in the meta server log:

2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
region hbase:meta,,1.1588230740

2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
region hbase:meta,,1.1588230740

2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
region hbase:meta,,1.1588230740

2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
region hbase:meta,,1.1588230740

 

 

Any suggestions will be appreciated. Thanks.


Re: Re: Re: Re: meta server hangs?

Posted by Ted Yu <yu...@gmail.com>.
You can add the following to JVM parameters:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
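
For example, folded into the options already quoted below (just a sketch, assuming the line lives in conf/hbase-env.sh as in your current setup), the combined setting might look like:

# conf/hbase-env.sh (assumed location): existing options plus the two collector flags
export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -Xms16384m -Xmx16384m \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

The two flags only make the ParNew + CMS collector choice explicit; the heap sizing and GC logging options stay as they are.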

Cheers


On Fri, May 16, 2014 at 4:32 AM, sunweiwei <su...@asiainfo-linkage.com> wrote:

> HI
>  Sorry, I just saw this mail. I set Gc parameters like this:
>
> export HBASE_REGIONSERVER_OPTS="-Xmn512m
> -XX:CMSInitiatingOccupancyFraction=70  -Xms16384m -Xmx16384m -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps "
>
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: May 11, 2014 19:11
> To: user@hbase.apache.org
> Cc: <us...@hbase.apache.org>
> Subject: Re: Re: Re: meta server hangs?
>
> What GC parameters did you specify for JVM ?
>
> Thanks
>
> On May 7, 2014, at 6:27 PM, "sunweiwei" <su...@asiainfo-linkage.com>
> wrote:
>
> > I find lots of  these  in gc.log. It seems like CMS gc run many times
> but old Generation is always large.
> > I'm confused.
> > Any suggestion will be appreciated. Thanks.
> >
> > 2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew:
> 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K),
> 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
> > 2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew:
> 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K),
> 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
> > 2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep:
> 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]
> >
> > 2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep:
> 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]
> >
> > 2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
> > 2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep:
> 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
> >
> > -----Original Message-----
> > From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> > Sent: May 6, 2014 9:27
> > To: user@hbase.apache.org
> > Subject: Re: Re: meta server hangs?
> >
> > HI Samir
> >    I think master declared  hadoop77/192.168.1.87:60020 as dead server,
>  because of "Failed verification of hbase:meta,,1 at
> address=hadoop77,60020,1396606457005
> exception=java.net.SocketTimeoutException".
> >    I have paste the master log in the first mail.
> >
> >    I'm not sure,  here is the whole process:
> >    at 2014-04-29 13:53:57,271    client throw a SocketTimeoutException :
> Call to  hadoop77/192.168.1.87:60020failed because
> java.net.SocketTimeoutException: 60000 millis timeout and other clients
> hung.
> >    at 2014-04-29 15:30:**        I visit hbase web and found hmaster
> hung , then i stop it and start a new  hmaster.
> >    at 2014-04-29 15:32:21,530    the new hmaster logs "Failed
> verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005,
> exception=java.net.SocketTimeoutException:
> >                                  Call to hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException"
> >    at 2014-04-29 15:32:28,364    the meta server received hmaster's
> message and shutdown itself.
> >
> >    after these, clients come back to normal
> >
> > -----Original Message-----
> > From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> > Sent: May 5, 2014 19:25
> > To: user@hbase.apache.org
> > Subject: Re: Re: meta server hangs?
> >
> > There should be exception in regionserver log on  hadoop77/
> > 192.168.1.87:60020 above  this one:
> >
> > *********
> > 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> > regionserver.HRegionServer: ABORTING region server
> > hadoop77,60020,1396606457005:
> org.apache.hadoop.hbase.YouAreDeadException:
> > Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> > as dead server
> >        at org.apache.hadoop.hbase.master.ServerManager.
> > checkIsDead(ServerManager.java:339)
> > *********
> >
> > Can you find it and past it. That exception should explain why
> > master declared  hadoop77/192.168.1.87:60020 as dead server.
> >
> > Regards
> > Samir
> >
> >
> > On Mon, May 5, 2014 at 11:39 AM, sunweiwei <sunww@asiainfo-linkage.com
> >wrote:
> >
> >> And  this is client log.
> >>
> >> 2014-04-29 13:53:57,271 WARN [main]
> >> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
> >> closed
> >> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020failed
> because java.net.SocketTimeoutException: 60000 millis timeout while
> >> waiting for channel to be ready for read. ch :
> >> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473
> remote=hadoop77/
> >> 192.168.1.87:60020]
> >>        at
> >> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> >>        at
> org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> >>        at
> >>
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> >>        at
> >>
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> >>        at
> >>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
> >>        at
> >>
> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
> >>        at
> >>
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
> >>        at
> >>
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
> >>        at
> >>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
> >>        at
> >>
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
> >>        at
> >>
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
> >>        at
> >>
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
> >>        at
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
> >>        at
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
> >>        at
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
> >>        at
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
> >>        at
> >>
> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
> >>        at
> >>
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
> >>        at
> >>
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
> >>        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
> >>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
> >>
> >> -----Original Message-----
> >> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> >> Sent: May 5, 2014 17:23
> >> To: user@hbase.apache.org
> >> Subject: Re: meta server hangs?
> >>
> >> Thank you for reply.
> >> I find this logs in hadoop77/192.168.1.87. It seems like meta
> >> regionserver receive hmaster's message and shutdown itself.
> >> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> >> regionserver.HRegionServer: ABORTING region server
> >> hadoop77,60020,1396606457005:
> org.apache.hadoop.hbase.YouAreDeadException:
> >> Server REPORT rejected; currently processing
> hadoop77,60020,1396606457005
> >> as dead server
> >>        at
> >>
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
> >>
> >>
> >> and  this is  gc  log:
> >> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
> >> 449091K->52416K(471872K), 0.0411300 secs]
> 11582287K->11199419K(16724800K),
> >> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> >> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
> >> 471859K->19313K(471872K), 0.0222250 secs]
> 11618863K->11175232K(16724800K),
> >> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> >> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
> >> 438769K->38887K(471872K), 0.0242330 secs]
> 11594688K->11194807K(16724800K),
> >> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> >> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
> >> 458343K->18757K(471872K), 0.0242790 secs]
> 11614263K->11180844K(16724800K),
> >> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> >> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
> >> 438213K->4874K(471872K), 0.0221520 secs]
> 11600300K->11166960K(16724800K),
> >> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> >> Heap
> >> par new generation   total 471872K, used 335578K [0x00000003fae00000,
> >> 0x000000041ae00000, 0x000000041ae00000)
> >>  eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8,
> >> 0x00000004147a0000)
> >>  from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0,
> >> 0x000000041ae00000)
> >>  to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000,
> >> 0x0000000417ad0000)
> >> concurrent mark-sweep generation total 16252928K, used 11162086K
> >> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
> >> concurrent-mark-sweep perm gen total 81072K, used 48660K
> >> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> >> Sent: May 5, 2014 16:50
> >> To: user@hbase.apache.org
> >> Cc: sunweiwei
> >> Subject: Re: meta server hangs?
> >>
> >> Hi,
> >> This exception:
> >> ****
> >> exception=java.net.SocketTimeoutException: Call to
> >> hadoop77/192.168.1.87:60020 failed because
> >> java.net.SocketTimeoutException:
> >> 60000 millis timeout while waiting for channel to be ready for read. ch
> :
> >> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> >> remote=hadoop77/192.168.1.87:60020]
> >> *****
> >> shows that there is connection timeout between master server and
> >> regionserver (hadoop77/192.168.1.87:60020) that is hosting 'meta'
> table.
> >> Real question is what is causing this timeout?  In my experience it can
> be
> >> by few things causing this type of timeout. I would suggest that you
> check
> >> hadoop77/192.168.1.87 <http://192.168.1.87:60020/> Garbage Collection,
> >> memory,  network, CPU disks and i'm sure you will find cause of timeout.
> >> You can us some diagnostic tools like vmstat, sar, iostat to check your
> >> sistem and you can use jstat to check GC and some other JVM stuff.
> >>
> >> Regards
> >> Samir
> >>
> >>
> >>
> >>
> >> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
> >>> wrote:
> >>
> >>> Hi
> >>>
> >>> I'm using hbase0.96.0.
> >>>
> >>> I found client can't put data suddenly  and  hmaster hungs. Then I
> >> shutdown
> >>> the hmaster and start a new hmaster, then  the client back to normal.
> >>>
> >>>
> >>>
> >>> I found this logs in the new hmaster . It seem like meta server hungs
> and
> >>> hmaster stop the meta server.
> >>>
> >>> 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
> >>> catalog.CatalogTracker:
> >>> Failed verification of hbase:meta,,1 at
> >>> address=hadoop77,60020,1396606457005,
> >>> exception=java.net.SocketTimeoutException: Call to
> >>> hadoop77/192.168.1.87:60020 failed because
> >>> java.net.SocketTimeoutException:
> >>> 60000 millis timeout while waiting for channel to be ready for read.
> ch :
> >>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> >>> remote=hadoop77/192.168.1.87:60020]
> >>>
> >>> 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
> >>> Forcing
> >>> expire of hadoop77,60020,1396606457005
> >>>
> >>>
> >>>
> >>> I can't find why meta server hungs .I found this in meta server log
> >>>
> >>> 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
> >>> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired
> on
> >>> region hbase:meta,,1.1588230740
> >>>
> >>> 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
> >>> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired
> on
> >>> region hbase:meta,,1.1588230740
> >>>
> >>> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> >>> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> >>> region hbase:meta,,1.1588230740
> >>>
> >>> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> >>> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired
> on
> >>> region hbase:meta,,1.1588230740
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> any suggestion will be appreciated. Thanks.
> >
>
>

Re: Re: Re: meta server hangs?

Posted by sunweiwei <su...@asiainfo-linkage.com>.
Hi
 Sorry, I just saw this mail. I set the GC parameters like this:
 
export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:CMSInitiatingOccupancyFraction=70  -Xms16384m -Xmx16384m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps "

 
-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: May 11, 2014 19:11
To: user@hbase.apache.org
Cc: <us...@hbase.apache.org>
Subject: Re: Re: Re: meta server hangs?

What GC parameters did you specify for JVM ?

Thanks

On May 7, 2014, at 6:27 PM, "sunweiwei" <su...@asiainfo-linkage.com> wrote:

> I find lots of  these  in gc.log. It seems like CMS gc run many times but old Generation is always large. 
> I'm confused. 
> Any suggestion will be appreciated. Thanks.
> 
> 2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
> 2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
> 2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
> 2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]
> 
> 2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
> 2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]
> 
> 2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
> 2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
> 
> -----Original Message-----
> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> Sent: May 6, 2014 9:27
> To: user@hbase.apache.org
> Subject: Re: Re: meta server hangs?
> 
> HI Samir
>    I think master declared  hadoop77/192.168.1.87:60020 as dead server,  because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
>    I have paste the master log in the first mail.
> 
>    I'm not sure,  here is the whole process:
>    at 2014-04-29 13:53:57,271    client throw a SocketTimeoutException : Call to  hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException: 60000 millis timeout and other clients hung.
>    at 2014-04-29 15:30:**        I visit hbase web and found hmaster hung , then i stop it and start a new  hmaster.
>    at 2014-04-29 15:32:21,530    the new hmaster logs "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException: 
>                                  Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
>    at 2014-04-29 15:32:28,364    the meta server received hmaster's message and shutdown itself.
> 
>    after these, clients come back to normal
> 
> -----Original Message-----
> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> Sent: May 5, 2014 19:25
> To: user@hbase.apache.org
> Subject: Re: Re: meta server hangs?
> 
> There should be exception in regionserver log on  hadoop77/
> 192.168.1.87:60020 above  this one:
> 
> *********
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
>        at org.apache.hadoop.hbase.master.ServerManager.
> checkIsDead(ServerManager.java:339)
> *********
> 
> Can you find it and past it. That exception should explain why
> master declared  hadoop77/192.168.1.87:60020 as dead server.
> 
> Regards
> Samir
> 
> 
> On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:
> 
>> And  this is client log.
>> 
>> 2014-04-29 13:53:57,271 WARN [main]
>> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
>> closed
>> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException: 60000 millis timeout while
>> waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473remote=hadoop77/
>> 192.168.1.87:60020]
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>>        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>>        at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>>        at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>>        at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>>        at
>> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
>>        at
>> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
>>        at
>> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
>>        at
>> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>>        at
>> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
>>        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
>>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>> 
>> -----Original Message-----
>> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
>> Sent: May 5, 2014 17:23
>> To: user@hbase.apache.org
>> Subject: Re: meta server hangs?
>> 
>> Thank you for reply.
>> I find this logs in hadoop77/192.168.1.87. It seems like meta
>> regionserver receive hmaster's message and shutdown itself.
>> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
>> regionserver.HRegionServer: ABORTING region server
>> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
>> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
>> as dead server
>>        at
>> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>> 
>> 
>> and  this is  gc  log:
>> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
>> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
>> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
>> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
>> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
>> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
>> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
>> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
>> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
>> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
>> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
>> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
>> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
>> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
>> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
>> Heap
>> par new generation   total 471872K, used 335578K [0x00000003fae00000,
>> 0x000000041ae00000, 0x000000041ae00000)
>>  eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8,
>> 0x00000004147a0000)
>>  from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0,
>> 0x000000041ae00000)
>>  to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000,
>> 0x0000000417ad0000)
>> concurrent mark-sweep generation total 16252928K, used 11162086K
>> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
>> concurrent-mark-sweep perm gen total 81072K, used 48660K
>> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>> 
>> 
>> 
>> -----Original Message-----
>> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
>> Sent: May 5, 2014 16:50
>> To: user@hbase.apache.org
>> Cc: sunweiwei
>> Subject: Re: meta server hangs?
>> 
>> Hi,
>> This exception:
>> ****
>> exception=java.net.SocketTimeoutException: Call to
>> hadoop77/192.168.1.87:60020 failed because
>> java.net.SocketTimeoutException:
>> 60000 millis timeout while waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
>> remote=hadoop77/192.168.1.87:60020]
>> *****
>> shows that there is connection timeout between master server and
>> regionserver (hadoop77/192.168.1.87:60020) that is hosting 'meta' table.
>> Real question is what is causing this timeout?  In my experience it can be
>> by few things causing this type of timeout. I would suggest that you check
>> hadoop77/192.168.1.87 <http://192.168.1.87:60020/> Garbage Collection,
>> memory,  network, CPU disks and i'm sure you will find cause of timeout.
>> You can us some diagnostic tools like vmstat, sar, iostat to check your
>> sistem and you can use jstat to check GC and some other JVM stuff.
>> 
>> Regards
>> Samir
>> 
>> 
>> 
>> 
>> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
>>> wrote:
>> 
>>> Hi
>>> 
>>> I'm using hbase0.96.0.
>>> 
>>> I found client can't put data suddenly  and  hmaster hungs. Then I
>> shutdown
>>> the hmaster and start a new hmaster, then  the client back to normal.
>>> 
>>> 
>>> 
>>> I found this logs in the new hmaster . It seem like meta server hungs and
>>> hmaster stop the meta server.
>>> 
>>> 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
>>> catalog.CatalogTracker:
>>> Failed verification of hbase:meta,,1 at
>>> address=hadoop77,60020,1396606457005,
>>> exception=java.net.SocketTimeoutException: Call to
>>> hadoop77/192.168.1.87:60020 failed because
>>> java.net.SocketTimeoutException:
>>> 60000 millis timeout while waiting for channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
>>> remote=hadoop77/192.168.1.87:60020]
>>> 
>>> 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
>>> Forcing
>>> expire of hadoop77,60020,1396606457005
>>> 
>>> 
>>> 
>>> I can't find why meta server hungs .I found this in meta server log
>>> 
>>> 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 
>>> 
>>> 
>>> 
>>> any suggestion will be appreciated. Thanks.
> 


Re: Re: Re: meta server hangs?

Posted by Ted Yu <yu...@gmail.com>.
What GC parameters did you specify for JVM ?

Thanks

On May 7, 2014, at 6:27 PM, "sunweiwei" <su...@asiainfo-linkage.com> wrote:

> I find lots of  these  in gc.log. It seems like CMS gc run many times but old Generation is always large. 
> I'm confused. 
> Any suggestion will be appreciated. Thanks.
> 
> 2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
> 2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
> 2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
> 2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]
> 
> 2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
> 2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]
> 
> 2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
> 2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
> 
> -----Original Message-----
> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> Sent: May 6, 2014 9:27
> To: user@hbase.apache.org
> Subject: Re: Re: meta server hangs?
> 
> HI Samir
>    I think master declared  hadoop77/192.168.1.87:60020 as dead server,  because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
>    I have paste the master log in the first mail.
> 
>    I'm not sure,  here is the whole process:
>    at 2014-04-29 13:53:57,271    client throw a SocketTimeoutException : Call to  hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException: 60000 millis timeout and other clients hung.
>    at 2014-04-29 15:30:**        I visit hbase web and found hmaster hung , then i stop it and start a new  hmaster.
>    at 2014-04-29 15:32:21,530    the new hmaster logs "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException: 
>                                  Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
>    at 2014-04-29 15:32:28,364    the meta server received hmaster's message and shutdown itself.
> 
>    after these, clients come back to normal
> 
> -----Original Message-----
> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> Sent: May 5, 2014 19:25
> To: user@hbase.apache.org
> Subject: Re: Re: meta server hangs?
> 
> There should be exception in regionserver log on  hadoop77/
> 192.168.1.87:60020 above  this one:
> 
> *********
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
>        at org.apache.hadoop.hbase.master.ServerManager.
> checkIsDead(ServerManager.java:339)
> *********
> 
> Can you find it and past it. That exception should explain why
> master declared  hadoop77/192.168.1.87:60020 as dead server.
> 
> Regards
> Samir
> 
> 
> On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:
> 
>> And  this is client log.
>> 
>> 2014-04-29 13:53:57,271 WARN [main]
>> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
>> closed
>> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException: 60000 millis timeout while
>> waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473remote=hadoop77/
>> 192.168.1.87:60020]
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>>        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>>        at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>>        at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>>        at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>>        at
>> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
>>        at
>> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
>>        at
>> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
>>        at
>> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>>        at
>> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
>>        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
>>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>> 
>> -----Original Message-----
>> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
>> Sent: May 5, 2014 17:23
>> To: user@hbase.apache.org
>> Subject: Re: meta server hangs?
>> 
>> Thank you for reply.
>> I find this logs in hadoop77/192.168.1.87. It seems like meta
>> regionserver receive hmaster's message and shutdown itself.
>> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
>> regionserver.HRegionServer: ABORTING region server
>> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
>> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
>> as dead server
>>        at
>> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>> 
>> 
>> and  this is  gc  log:
>> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
>> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
>> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
>> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
>> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
>> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
>> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
>> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
>> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
>> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
>> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
>> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
>> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
>> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
>> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
>> Heap
>> par new generation   total 471872K, used 335578K [0x00000003fae00000,
>> 0x000000041ae00000, 0x000000041ae00000)
>>  eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8,
>> 0x00000004147a0000)
>>  from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0,
>> 0x000000041ae00000)
>>  to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000,
>> 0x0000000417ad0000)
>> concurrent mark-sweep generation total 16252928K, used 11162086K
>> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
>> concurrent-mark-sweep perm gen total 81072K, used 48660K
>> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>> 
>> 
>> 
>> -----Original Message-----
>> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
>> Sent: May 5, 2014 16:50
>> To: user@hbase.apache.org
>> Cc: sunweiwei
>> Subject: Re: meta server hangs?
>> 
>> Hi,
>> This exception:
>> ****
>> exception=java.net.SocketTimeoutException: Call to
>> hadoop77/192.168.1.87:60020 failed because
>> java.net.SocketTimeoutException:
>> 60000 millis timeout while waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
>> remote=hadoop77/192.168.1.87:60020]
>> *****
>> shows that there is connection timeout between master server and
>> regionserver (hadoop77/192.168.1.87:60020) that is hosting 'meta' table.
>> Real question is what is causing this timeout?  In my experience it can be
>> by few things causing this type of timeout. I would suggest that you check
>> hadoop77/192.168.1.87 <http://192.168.1.87:60020/> Garbage Collection,
>> memory,  network, CPU disks and i'm sure you will find cause of timeout.
>> You can us some diagnostic tools like vmstat, sar, iostat to check your
>> sistem and you can use jstat to check GC and some other JVM stuff.
>> 
>> Regards
>> Samir
>> 
>> 
>> 
>> 
>> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
>>> wrote:
>> 
>>> Hi
>>> 
>>> I'm using hbase0.96.0.
>>> 
>>> I found client can't put data suddenly  and  hmaster hungs. Then I
>> shutdown
>>> the hmaster and start a new hmaster, then  the client back to normal.
>>> 
>>> 
>>> 
>>> I found this logs in the new hmaster . It seem like meta server hungs and
>>> hmaster stop the meta server.
>>> 
>>> 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
>>> catalog.CatalogTracker:
>>> Failed verification of hbase:meta,,1 at
>>> address=hadoop77,60020,1396606457005,
>>> exception=java.net.SocketTimeoutException: Call to
>>> hadoop77/192.168.1.87:60020 failed because
>>> java.net.SocketTimeoutException:
>>> 60000 millis timeout while waiting for channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
>>> remote=hadoop77/192.168.1.87:60020]
>>> 
>>> 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
>>> Forcing
>>> expire of hadoop77,60020,1396606457005
>>> 
>>> 
>>> 
>>> I can't find why meta server hungs .I found this in meta server log
>>> 
>>> 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 
>>> 
>>> 
>>> 
>>> any suggestion will be appreciated. Thanks.
> 

Re: Re: meta server hangs?

Posted by sunweiwei <su...@asiainfo-linkage.com>.
I find lots of these in gc.log. It seems CMS GC runs many times, but the old generation is always large.
I'm confused.
Any suggestions will be appreciated. Thanks.

2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]

2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]

2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
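
A rough reading of the numbers above and of the Heap summary in my earlier mail: the old generation totals 16252928K and its usage stays up around 11,200,000K, roughly 69-70% of capacity, i.e. right at the -XX:CMSInitiatingOccupancyFraction=70 trigger, which would explain why CMS cycles keep starting back to back without freeing much. One way to watch this live (just a sketch, using jstat as Samir suggested; <rs-pid> is a placeholder for the regionserver process id):

# O column = old generation occupancy in percent; sample every 5 seconds
jstat -gcutil <rs-pid> 5000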

-----Original Message-----
From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
Sent: May 6, 2014 9:27
To: user@hbase.apache.org
Subject: Re: Re: meta server hangs?

Hi Samir
    I think the master declared hadoop77/192.168.1.87:60020 a dead server because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
    I have pasted the master log in the first mail.

    I'm not sure, but here is the whole sequence of events:
    at 2014-04-29 13:53:57,271    a client threw a SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout, and other clients hung.
    at 2014-04-29 15:30:**        I visited the HBase web UI, found the HMaster hung, then stopped it and started a new HMaster.
    at 2014-04-29 15:32:21,530    the new HMaster logged "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException:
                                  Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
    at 2014-04-29 15:32:28,364    the meta server received the HMaster's message and shut itself down.

    After these, the clients came back to normal.

-----Original Message-----
From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
Sent: May 5, 2014 19:25
To: user@hbase.apache.org
Subject: Re: Re: meta server hangs?

There should be an exception in the regionserver log on hadoop77/
192.168.1.87:60020 above this one:

*********
2014-04-29 15:32:28,364 FATAL [regionserver60020]
regionserver.HRegionServer: ABORTING region server
hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing hadoop77,60020,1396606457005
as dead server
        at org.apache.hadoop.hbase.master.ServerManager.
checkIsDead(ServerManager.java:339)
*********

Can you find it and paste it? That exception should explain why the
master declared hadoop77/192.168.1.87:60020 as a dead server.

Regards
Samir


On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:

> And  this is client log.
>
> 2014-04-29 13:53:57,271 WARN [main]
> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
> closed
> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473remote=hadoop77/
> 192.168.1.87:60020]
>         at
> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>         at
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>         at
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>         at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>         at
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
>         at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
>         at
> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
>         at
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>         at
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
>         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>
> -----Original Message-----
> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> Sent: May 5, 2014 17:23
> To: user@hbase.apache.org
> Subject: Re: meta server hangs?
>
> Thank you for reply.
> I find this logs in hadoop77/192.168.1.87. It seems like meta
> regionserver receive hmaster's message and shutdown itself.
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
>         at
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>
>
> and  this is  gc  log:
> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> Heap
>  par new generation   total 471872K, used 335578K [0x00000003fae00000,
> 0x000000041ae00000, 0x000000041ae00000)
>   eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8,
> 0x00000004147a0000)
>   from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0,
> 0x000000041ae00000)
>   to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000,
> 0x0000000417ad0000)
>  concurrent mark-sweep generation total 16252928K, used 11162086K
> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 81072K, used 48660K
> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>
>
>
> -----Original Message-----
> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> Sent: May 5, 2014 16:50
> To: user@hbase.apache.org
> Cc: sunweiwei
> Subject: Re: meta server hangs?
>
> Hi,
> This exception:
> ****
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
> *****
> shows that there is connection timeout between master server and
> regionserver (hadoop77/192.168.1.87:60020) that is hosting 'meta' table.
> Real question is what is causing this timeout?  In my experience it can be
> by few things causing this type of timeout. I would suggest that you check
> hadoop77/192.168.1.87 <http://192.168.1.87:60020/> Garbage Collection,
> memory,  network, CPU disks and i'm sure you will find cause of timeout.
> You can us some diagnostic tools like vmstat, sar, iostat to check your
> sistem and you can use jstat to check GC and some other JVM stuff.
>
> Regards
> Samir
>
>
>
>
> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
> >wrote:
>
> > Hi
> >
> > I'm using hbase0.96.0.
> >
> > I found client can't put data suddenly  and  hmaster hungs. Then I
> shutdown
> > the hmaster and start a new hmaster, then  the client back to normal.
> >
> >
> >
> > I found this logs in the new hmaster . It seem like meta server hungs and
> > hmaster stop the meta server.
> >
> > 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
> > catalog.CatalogTracker:
> > Failed verification of hbase:meta,,1 at
> > address=hadoop77,60020,1396606457005,
> > exception=java.net.SocketTimeoutException: Call to
> > hadoop77/192.168.1.87:60020 failed because
> > java.net.SocketTimeoutException:
> > 60000 millis timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> > remote=hadoop77/192.168.1.87:60020]
> >
> > 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
> > Forcing
> > expire of hadoop77,60020,1396606457005
> >
> >
> >
> > I can't find why meta server hungs .I found this in meta server log
> >
> > 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> > region hbase:meta,,1.1588230740
> >
> >
> >
> >
> >
> > any suggestion will be appreciated. Thanks.
> >
> >
>
>


Re: Re: meta server hangs?

Posted by sunweiwei <su...@asiainfo-linkage.com>.
I find lots of these in gc.log. It seems CMS GC runs many times, but the old generation is always large.
I'm confused. Any suggestions? Thanks.

2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]

2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]

2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]


-----Original Message-----
From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
Sent: May 6, 2014 9:27
To: user@hbase.apache.org
Subject: Re: Re: meta server hangs?

Hi Samir
    I think the master declared hadoop77/192.168.1.87:60020 a dead server because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
    I have pasted the master log in the first mail.

    I'm not sure, but here is the whole sequence of events:
    at 2014-04-29 13:53:57,271    a client threw a SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout, and other clients hung.
    at 2014-04-29 15:30:**        I visited the HBase web UI, found the HMaster hung, then stopped it and started a new HMaster.
    at 2014-04-29 15:32:21,530    the new HMaster logged "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException:
                                  Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
    at 2014-04-29 15:32:28,364    the meta server received the HMaster's message and shut itself down.

    After these, the clients came back to normal.

-----Original Message-----
From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
Sent: May 5, 2014 19:25
To: user@hbase.apache.org
Subject: Re: Re: meta server hangs?

There should be an exception in the regionserver log on hadoop77/
192.168.1.87:60020 above this one:

*********
2014-04-29 15:32:28,364 FATAL [regionserver60020]
regionserver.HRegionServer: ABORTING region server
hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing hadoop77,60020,1396606457005
as dead server
        at org.apache.hadoop.hbase.master.ServerManager.
checkIsDead(ServerManager.java:339)
*********

Can you find it and paste it? That exception should explain why the
master declared hadoop77/192.168.1.87:60020 as a dead server.

Regards
Samir


On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:

> And  this is client log.
>
> 2014-04-29 13:53:57,271 WARN [main]
> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
> closed
> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020failed because java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473remote=hadoop77/
> 192.168.1.87:60020]
>         at
> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>         at
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>         at
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>         at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>         at
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
>         at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
>         at
> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
>         at
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>         at
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
>         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>
> -----Original Message-----
> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> Sent: May 5, 2014 17:23
> To: user@hbase.apache.org
> Subject: Re: meta server hangs?
>
> Thank you for reply.
> I find this logs in hadoop77/192.168.1.87. It seems like meta
> regionserver receive hmaster's message and shutdown itself.
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
>         at
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>
>
> and  this is  gc  log:
> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> Heap
>  par new generation   total 471872K, used 335578K [0x00000003fae00000,
> 0x000000041ae00000, 0x000000041ae00000)
>   eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8,
> 0x00000004147a0000)
>   from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0,
> 0x000000041ae00000)
>   to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000,
> 0x0000000417ad0000)
>  concurrent mark-sweep generation total 16252928K, used 11162086K
> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 81072K, used 48660K
> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>
>
>
> -----Original Message-----
> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> Sent: May 5, 2014 16:50
> To: user@hbase.apache.org
> Cc: sunweiwei
> Subject: Re: meta server hungs ?
>
> Hi,
> This exception:
> ****
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
> *****
> shows that there is a connection timeout between the master and the regionserver
> (hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
> The real question is: what is causing this timeout? In my experience a few different
> things can cause this type of timeout. I would suggest that you check Garbage
> Collection, memory, network, CPU and disks on hadoop77/192.168.1.87, and I'm sure
> you will find the cause of the timeout.
> You can use diagnostic tools like vmstat, sar and iostat to check your system,
> and you can use jstat to check GC and other JVM statistics.
>
> Regards
> Samir
>
>
>
>
> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
> >wrote:
>
> > Hi
> >
> > I'm using hbase0.96.0.
> >
> > I found client can't put data suddenly  and  hmaster hungs. Then I
> shutdown
> > the hmaster and start a new hmaster, then  the client back to normal.
> >
> >
> >
> > I found this logs in the new hmaster . It seem like meta server hungs and
> > hmaster stop the meta server.
> >
> > 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
> > catalog.CatalogTracker:
> > Failed verification of hbase:meta,,1 at
> > address=hadoop77,60020,1396606457005,
> > exception=java.net.SocketTimeoutException: Call to
> > hadoop77/192.168.1.87:60020 failed because
> > java.net.SocketTimeoutException:
> > 60000 millis timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> > remote=hadoop77/192.168.1.87:60020]
> >
> > 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
> > Forcing
> > expire of hadoop77,60020,1396606457005
> >
> >
> >
> > I can't find why meta server hungs .I found this in meta server log
> >
> > 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> > region hbase:meta,,1.1588230740
> >
> >
> >
> >
> >
> > any suggestion will be appreciated. Thanks.
> >
> >
>
>


Re: Re: meta server hungs ?

Posted by sunweiwei <su...@asiainfo-linkage.com>.
Hi Samir
    I think the master declared hadoop77/192.168.1.87:60020 a dead server because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
    I pasted the master log in the first mail.
    
    I'm not sure, but here is the whole sequence of events:
    at 2014-04-29 13:53:57,271    a client threw a SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout, and other clients hung.
    at 2014-04-29 15:30:**        I visited the HBase web UI and found the HMaster hung, then I stopped it and started a new HMaster.
    at 2014-04-29 15:32:21,530    the new HMaster logged "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException: 
                                  Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
    at 2014-04-29 15:32:28,364    the meta regionserver received the HMaster's message and shut itself down.
    
    after this, the clients came back to normal
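
    One thing I could try next time before restarting (the PIDs below are only placeholders) is to capture a thread dump of the hung HMaster and of the meta regionserver, to see where they are stuck, for example:

    jstack <hmaster-pid> > /tmp/hmaster-threads.txt              # hypothetical PID
    jstack <regionserver-pid> > /tmp/regionserver-threads.txt    # hypothetical PID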

-----Original Message-----
From: Samir Ahmic [mailto:ahmic.samir@gmail.com] 
Sent: May 5, 2014 19:25
To: user@hbase.apache.org
Subject: Re: Re: meta server hungs ?

There should be an exception in the regionserver log on hadoop77/192.168.1.87:60020 above this one:

*********
2014-04-29 15:32:28,364 FATAL [regionserver60020]
regionserver.HRegionServer: ABORTING region server
hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing hadoop77,60020,1396606457005
as dead server
        at org.apache.hadoop.hbase.master.ServerManager.
checkIsDead(ServerManager.java:339)
*********

Can you find it and paste it? That exception should explain why the master
declared hadoop77/192.168.1.87:60020 a dead server.

Regards
Samir


On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:

> And  this is client log.
>
> 2014-04-29 13:53:57,271 WARN [main]
> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
> closed
> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473 remote=hadoop77/192.168.1.87:60020]
>         at
> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>         at
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>         at
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>         at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>         at
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
>         at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
>         at
> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
>         at
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>         at
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
>         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>
> -----Original Message-----
> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> Sent: May 5, 2014 17:23
> To: user@hbase.apache.org
> Subject: Re: meta server hungs ?
>
> Thank you for reply.
> I find this logs in hadoop77/192.168.1.87. It seems like meta
> regionserver receive hmaster's message and shutdown itself.
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
>         at
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>
>
> and  this is  gc  log:
> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> Heap
>  par new generation   total 471872K, used 335578K [0x00000003fae00000,
> 0x000000041ae00000, 0x000000041ae00000)
>   eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8,
> 0x00000004147a0000)
>   from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0,
> 0x000000041ae00000)
>   to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000,
> 0x0000000417ad0000)
>  concurrent mark-sweep generation total 16252928K, used 11162086K
> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 81072K, used 48660K
> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>
>
>
> -----Original Message-----
> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> Sent: May 5, 2014 16:50
> To: user@hbase.apache.org
> Cc: sunweiwei
> Subject: Re: meta server hungs ?
>
> Hi,
> This exception:
> ****
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
> *****
> shows that there is a connection timeout between the master and the regionserver
> (hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
> The real question is: what is causing this timeout? In my experience a few different
> things can cause this type of timeout. I would suggest that you check Garbage
> Collection, memory, network, CPU and disks on hadoop77/192.168.1.87, and I'm sure
> you will find the cause of the timeout.
> You can use diagnostic tools like vmstat, sar and iostat to check your system,
> and you can use jstat to check GC and other JVM statistics.
>
> Regards
> Samir
>
>
>
>
> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
> >wrote:
>
> > Hi
> >
> > I'm using hbase0.96.0.
> >
> > I found client can't put data suddenly  and  hmaster hungs. Then I
> shutdown
> > the hmaster and start a new hmaster, then  the client back to normal.
> >
> >
> >
> > I found this logs in the new hmaster . It seem like meta server hungs and
> > hmaster stop the meta server.
> >
> > 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
> > catalog.CatalogTracker:
> > Failed verification of hbase:meta,,1 at
> > address=hadoop77,60020,1396606457005,
> > exception=java.net.SocketTimeoutException: Call to
> > hadoop77/192.168.1.87:60020 failed because
> > java.net.SocketTimeoutException:
> > 60000 millis timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> > remote=hadoop77/192.168.1.87:60020]
> >
> > 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
> > Forcing
> > expire of hadoop77,60020,1396606457005
> >
> >
> >
> > I can't find why meta server hungs .I found this in meta server log
> >
> > 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> > region hbase:meta,,1.1588230740
> >
> >
> >
> >
> >
> > any suggestion will be appreciated. Thanks.
> >
> >
>
>


Re: Re: meta server hungs ?

Posted by Samir Ahmic <ah...@gmail.com>.
There should be an exception in the regionserver log on hadoop77/192.168.1.87:60020 above this one:

*********
2014-04-29 15:32:28,364 FATAL [regionserver60020]
regionserver.HRegionServer: ABORTING region server
hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
Server REPORT rejected; currently processing hadoop77,60020,1396606457005
as dead server
        at org.apache.hadoop.hbase.master.ServerManager.
checkIsDead(ServerManager.java:339)
*********

Can you find it and paste it? That exception should explain why the master
declared hadoop77/192.168.1.87:60020 a dead server.
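
For example (the log file path below is just a guess, adjust it to your layout), something like this should print the lines leading up to the abort:

grep -B 40 "YouAreDeadException" /var/log/hbase/hbase-*-regionserver-hadoop77*.log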

Regards
Samir


On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:

> And  this is client log.
>
> 2014-04-29 13:53:57,271 WARN [main]
> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
> closed
> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473 remote=hadoop77/192.168.1.87:60020]
>         at
> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>         at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>         at
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>         at
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>         at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
>         at
> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>         at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>         at
> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
>         at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
>         at
> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
>         at
> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>         at
> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
>         at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>
> -----Original Message-----
> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> Sent: May 5, 2014 17:23
> To: user@hbase.apache.org
> Subject: Re: meta server hungs ?
>
> Thank you for reply.
> I find this logs in hadoop77/192.168.1.87. It seems like meta
> regionserver receive hmaster's message and shutdown itself.
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
>         at
> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>
>
> and  this is  gc  log:
> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
> Heap
>  par new generation   total 471872K, used 335578K [0x00000003fae00000,
> 0x000000041ae00000, 0x000000041ae00000)
>   eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8,
> 0x00000004147a0000)
>   from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0,
> 0x000000041ae00000)
>   to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000,
> 0x0000000417ad0000)
>  concurrent mark-sweep generation total 16252928K, used 11162086K
> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 81072K, used 48660K
> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>
>
>
> -----Original Message-----
> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> Sent: May 5, 2014 16:50
> To: user@hbase.apache.org
> Cc: sunweiwei
> Subject: Re: meta server hungs ?
>
> Hi,
> This exception:
> ****
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
> *****
> shows that there is a connection timeout between the master and the regionserver
> (hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
> The real question is: what is causing this timeout? In my experience a few different
> things can cause this type of timeout. I would suggest that you check Garbage
> Collection, memory, network, CPU and disks on hadoop77/192.168.1.87, and I'm sure
> you will find the cause of the timeout.
> You can use diagnostic tools like vmstat, sar and iostat to check your system,
> and you can use jstat to check GC and other JVM statistics.
>
> Regards
> Samir
>
>
>
>
> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com
> >wrote:
>
> > Hi
> >
> > I'm using hbase0.96.0.
> >
> > I found client can't put data suddenly  and  hmaster hungs. Then I
> shutdown
> > the hmaster and start a new hmaster, then  the client back to normal.
> >
> >
> >
> > I found this logs in the new hmaster . It seem like meta server hungs and
> > hmaster stop the meta server.
> >
> > 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
> > catalog.CatalogTracker:
> > Failed verification of hbase:meta,,1 at
> > address=hadoop77,60020,1396606457005,
> > exception=java.net.SocketTimeoutException: Call to
> > hadoop77/192.168.1.87:60020 failed because
> > java.net.SocketTimeoutException:
> > 60000 millis timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> > remote=hadoop77/192.168.1.87:60020]
> >
> > 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
> > Forcing
> > expire of hadoop77,60020,1396606457005
> >
> >
> >
> > I can't find why meta server hungs .I found this in meta server log
> >
> > 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> > region hbase:meta,,1.1588230740
> >
> > 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> > regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> > region hbase:meta,,1.1588230740
> >
> >
> >
> >
> >
> > any suggestion will be appreciated. Thanks.
> >
> >
>
>

Re: meta server hungs ?

Posted by sunweiwei <su...@asiainfo-linkage.com>.
And this is the client log.

2014-04-29 13:53:57,271 WARN [main] org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already closed
java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473 remote=hadoop77/192.168.1.87:60020]
	at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
	at org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
	at org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
	at org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
	at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
	at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)

-----Original Message-----
From: sunweiwei [mailto:sunww@asiainfo-linkage.com] 
Sent: May 5, 2014 17:23
To: user@hbase.apache.org
Subject: Re: meta server hungs ?

Thank you for the reply.
I found these logs on hadoop77/192.168.1.87. It seems like the meta regionserver received the HMaster's message and shut itself down.
2014-04-29 15:32:28,364 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hadoop77,60020,1396606457005 as dead server
        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)


And this is the GC log:
2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew: 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K), 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew: 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K), 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew: 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K), 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew: 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K), 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew: 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K), 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
Heap
 par new generation   total 471872K, used 335578K [0x00000003fae00000, 0x000000041ae00000, 0x000000041ae00000)
  eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8, 0x00000004147a0000)
  from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0, 0x000000041ae00000)
  to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000, 0x0000000417ad0000)
 concurrent mark-sweep generation total 16252928K, used 11162086K [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 81072K, used 48660K [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)



-----Original Message-----
From: Samir Ahmic [mailto:ahmic.samir@gmail.com] 
Sent: May 5, 2014 16:50
To: user@hbase.apache.org
Cc: sunweiwei
Subject: Re: meta server hungs ?

Hi,
This exception:
****
exception=java.net.SocketTimeoutException: Call to
hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException:
60000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
remote=hadoop77/192.168.1.87:60020]
*****
shows that there is a connection timeout between the master and the regionserver
(hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
The real question is: what is causing this timeout? In my experience a few different
things can cause this type of timeout. I would suggest that you check Garbage
Collection, memory, network, CPU and disks on hadoop77/192.168.1.87, and I'm sure
you will find the cause of the timeout.
You can use diagnostic tools like vmstat, sar and iostat to check your system,
and you can use jstat to check GC and other JVM statistics.

Regards
Samir




On Mon, May 5, 2014 at 10:14 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:

> Hi
>
> I'm using hbase0.96.0.
>
> I found client can't put data suddenly  and  hmaster hungs. Then I shutdown
> the hmaster and start a new hmaster, then  the client back to normal.
>
>
>
> I found this logs in the new hmaster . It seem like meta server hungs and
> hmaster stop the meta server.
>
> 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
> catalog.CatalogTracker:
> Failed verification of hbase:meta,,1 at
> address=hadoop77,60020,1396606457005,
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
>
> 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
> Forcing
> expire of hadoop77,60020,1396606457005
>
>
>
> I can't find why meta server hungs .I found this in meta server log
>
> 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> region hbase:meta,,1.1588230740
>
>
>
>
>
> any suggestion will be appreciated. Thanks.
>
>


Re: meta server hungs ?

Posted by sunweiwei <su...@asiainfo-linkage.com>.
Thank you for the reply.
I found these logs on hadoop77/192.168.1.87. It seems like the meta regionserver received the HMaster's message and shut itself down.
2014-04-29 15:32:28,364 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing hadoop77,60020,1396606457005 as dead server
        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)


And this is the GC log:
2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew: 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K), 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew: 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K), 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew: 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K), 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew: 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K), 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew: 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K), 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
Heap
 par new generation   total 471872K, used 335578K [0x00000003fae00000, 0x000000041ae00000, 0x000000041ae00000)
  eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8, 0x00000004147a0000)
  from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0, 0x000000041ae00000)
  to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000, 0x0000000417ad0000)
 concurrent mark-sweep generation total 16252928K, used 11162086K [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 81072K, used 48660K [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)



-----Original Message-----
From: Samir Ahmic [mailto:ahmic.samir@gmail.com] 
Sent: May 5, 2014 16:50
To: user@hbase.apache.org
Cc: sunweiwei
Subject: Re: meta server hungs ?

Hi,
This exception:
****
exception=java.net.SocketTimeoutException: Call to
hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException:
60000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
remote=hadoop77/192.168.1.87:60020]
*****
shows that there is a connection timeout between the master and the regionserver
(hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
The real question is: what is causing this timeout? In my experience a few different
things can cause this type of timeout. I would suggest that you check Garbage
Collection, memory, network, CPU and disks on hadoop77/192.168.1.87, and I'm sure
you will find the cause of the timeout.
You can use diagnostic tools like vmstat, sar and iostat to check your system,
and you can use jstat to check GC and other JVM statistics.

Regards
Samir




On Mon, May 5, 2014 at 10:14 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:

> Hi
>
> I'm using hbase0.96.0.
>
> I found client can't put data suddenly  and  hmaster hungs. Then I shutdown
> the hmaster and start a new hmaster, then  the client back to normal.
>
>
>
> I found this logs in the new hmaster . It seem like meta server hungs and
> hmaster stop the meta server.
>
> 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
> catalog.CatalogTracker:
> Failed verification of hbase:meta,,1 at
> address=hadoop77,60020,1396606457005,
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
>
> 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
> Forcing
> expire of hadoop77,60020,1396606457005
>
>
>
> I can't find why meta server hungs .I found this in meta server log
>
> 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> region hbase:meta,,1.1588230740
>
>
>
>
>
> any suggestion will be appreciated. Thanks.
>
>


Re: meta server hungs ?

Posted by Samir Ahmic <ah...@gmail.com>.
Hi,
This exception:
****
exception=java.net.SocketTimeoutException: Call to
hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException:
60000 millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
remote=hadoop77/192.168.1.87:60020]
*****
shows that there is a connection timeout between the master and the regionserver
(hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
The real question is: what is causing this timeout? In my experience a few different
things can cause this type of timeout. I would suggest that you check Garbage
Collection, memory, network, CPU and disks on hadoop77/192.168.1.87, and I'm sure
you will find the cause of the timeout.
You can use diagnostic tools like vmstat, sar and iostat to check your system,
and you can use jstat to check GC and other JVM statistics.
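
For example (the PID and sampling intervals below are only placeholders):

jstat -gcutil <regionserver-pid> 5000   # watch old gen occupancy (O) and full GC count (FGC)
vmstat 5                                # CPU, run queue and swap activity
iostat -x 5                             # per-disk utilization and await times
sar -n DEV 5                            # per-interface network throughput

If the old generation stays nearly full or FGC keeps climbing around the time of the timeout, a long GC pause on hadoop77 would be the most likely cause.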

Regards
Samir




On Mon, May 5, 2014 at 10:14 AM, sunweiwei <su...@asiainfo-linkage.com>wrote:

> Hi
>
> I'm using hbase0.96.0.
>
> I found client can't put data suddenly  and  hmaster hungs. Then I shutdown
> the hmaster and start a new hmaster, then  the client back to normal.
>
>
>
> I found this logs in the new hmaster . It seem like meta server hungs and
> hmaster stop the meta server.
>
> 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
> catalog.CatalogTracker:
> Failed verification of hbase:meta,,1 at
> address=hadoop77,60020,1396606457005,
> exception=java.net.SocketTimeoutException: Call to
> hadoop77/192.168.1.87:60020 failed because
> java.net.SocketTimeoutException:
> 60000 millis timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
> remote=hadoop77/192.168.1.87:60020]
>
> 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
> Forcing
> expire of hadoop77,60020,1396606457005
>
>
>
> I can't find why meta server hungs .I found this in meta server log
>
> 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
> region hbase:meta,,1.1588230740
>
> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
> region hbase:meta,,1.1588230740
>
>
>
>
>
> any suggestion will be appreciated. Thanks.
>
>