Posted to user@hbase.apache.org by sunweiwei <su...@asiainfo-linkage.com> on 2014/05/16 13:32:20 UTC

Re: Re: Re: meta server hangs?

Hi
 Sorry, I just saw this mail. I set the GC parameters like this:
 
export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:CMSInitiatingOccupancyFraction=70  -Xms16384m -Xmx16384m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps "

 
-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: May 11, 2014 19:11
To: user@hbase.apache.org
Cc: <us...@hbase.apache.org>
Subject: Re: Re: Re: meta server hangs?

What GC parameters did you specify for JVM ?

Thanks

On May 7, 2014, at 6:27 PM, "sunweiwei" <su...@asiainfo-linkage.com> wrote:

> I find lots of these in gc.log. It seems like the CMS GC ran many times, but the old generation is always large.
> I'm confused.
> Any suggestion will be appreciated. Thanks.
> 
> 2014-04-29T13:40:36.081+0800: 2143586.787: [CMS-concurrent-sweep-start]
> 2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
> 2014-04-29T13:40:37.382+0800: 2143588.089: [GC 2143588.089: [ParNew: 471872K->52416K(471872K), 0.0805690 secs] 11812475K->11439145K(16724800K), 0.0807940 secs] [Times: user=0.00 sys=0.00, real=0.08 secs]
> 2014-04-29T13:40:37.660+0800: 2143588.367: [CMS-concurrent-sweep: 1.435/1.579 secs] [Times: user=0.00 sys=0.00, real=1.58 secs]
> 
> 2014-04-29T13:56:39.780+0800: 2144550.486: [CMS-concurrent-sweep-start]
> 2014-04-29T13:56:41.007+0800: 2144551.714: [CMS-concurrent-sweep: 1.228/1.228 secs] [Times: user=0.00 sys=0.00, real=1.23 secs]
> 
> 2014-04-29T13:56:48.231+0800: 2144558.938: [CMS-concurrent-sweep-start]
> 2014-04-29T13:56:49.490+0800: 2144560.196: [CMS-concurrent-sweep: 1.258/1.258 secs] [Times: user=0.00 sys=0.00, real=1.26 secs]
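A quick way to check the pattern described above (the old generation staying large after each collection) is to pull the post-GC heap occupancy out of ParNew lines like these. A rough shell sketch, assuming exactly the log format shown; the sample line is copied from the excerpt above, and the /tmp path is illustrative:

```shell
# Extract the "heap after collection" figures from a ParNew log line and
# report total-heap occupancy. Sample line copied from the gc.log above.
cat > /tmp/gc_sample.log <<'EOF'
2014-04-29T13:40:36.447+0800: 2143587.154: [GC 2143587.154: [ParNew: 471872K->52416K(471872K), 0.0587370 secs] 11893986K->11506108K(16724800K), 0.0590390 secs] [Times: user=0.00 sys=0.00, real=0.06 secs]
EOF
result=$(grep ParNew /tmp/gc_sample.log \
  | sed -n 's/.*secs] \([0-9][0-9]*\)K->\([0-9][0-9]*\)K(\([0-9][0-9]*\)K).*/\2 \3/p' \
  | awk '{printf "after=%sK of %sK (%.0f%%)", $1, $2, 100 * $1 / $2}')
echo "$result"
```

On the sample line this reports roughly 69% of the 16 GB heap still occupied after the young collection, right around the 70% CMS initiating threshold, which would be consistent with CMS cycling continuously.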
> 
> -----Original Message-----
> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
> Sent: May 6, 2014 9:27
> To: user@hbase.apache.org
> Subject: Re: Re: meta server hangs?
> 
> Hi Samir
>    I think the master declared hadoop77/192.168.1.87:60020 as a dead server because of "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005 exception=java.net.SocketTimeoutException".
>    I have pasted the master log in the first mail.
> 
>    I'm not sure, but here is the whole process:
>    at 2014-04-29 13:53:57,271    a client threw a SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException: 60000 millis timeout, and other clients hung.
>    at 2014-04-29 15:30:**        I visited the HBase web UI and found the HMaster hung, then I stopped it and started a new HMaster.
>    at 2014-04-29 15:32:21,530    the new HMaster logged "Failed verification of hbase:meta,,1 at address=hadoop77,60020,1396606457005, exception=java.net.SocketTimeoutException: 
>                                  Call to hadoop77/192.168.1.87:60020 failed because java.net.SocketTimeoutException"
>    at 2014-04-29 15:32:28,364    the meta server received the HMaster's message and shut itself down.
> 
>    After this, the clients came back to normal.
> 
> -----Original Message-----
> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
> Sent: May 5, 2014 19:25
> To: user@hbase.apache.org
> Subject: Re: Re: meta server hangs?
> 
> There should be an exception in the regionserver log on
> hadoop77/192.168.1.87:60020 above this one:
> 
> *********
> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
> as dead server
>        at org.apache.hadoop.hbase.master.ServerManager.
> checkIsDead(ServerManager.java:339)
> *********
> 
> Can you find it and paste it? That exception should explain why the
> master declared hadoop77/192.168.1.87:60020 as a dead server.
> 
> Regards
> Samir
> 
> 
On Mon, May 5, 2014 at 11:39 AM, sunweiwei <su...@asiainfo-linkage.com> wrote:
> 
>> And  this is client log.
>> 
>> 2014-04-29 13:53:57,271 WARN [main]
>> org.apache.hadoop.hbase.client.ScannerCallable: Ignore, probably already
>> closed
>> java.net.SocketTimeoutException: Call to hadoop77/192.168.1.87:60020 failed
>> because java.net.SocketTimeoutException: 60000 millis timeout while
>> waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.102:56473
>> remote=hadoop77/192.168.1.87:60020]
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>>        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>>        at
>> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>>        at
>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:27332)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:284)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:152)
>>        at
>> org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:57)
>>        at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:116)
>>        at
>> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:94)
>>        at
>> org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:462)
>>        at
>> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:1095)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1155)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1047)
>>        at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
>>        at
>> org.apache.hadoop.hbase.client.AsyncProcess.findDestLocation(AsyncProcess.java:330)
>>        at
>> org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:281)
>>        at
>> org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:917)
>>        at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:901)
>>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:863)
>> 
>> -----Original Message-----
>> From: sunweiwei [mailto:sunww@asiainfo-linkage.com]
>> Sent: May 5, 2014 17:23
>> To: user@hbase.apache.org
>> Subject: Re: meta server hangs?
>> 
>> Thank you for the reply.
>> I found these logs on hadoop77/192.168.1.87. It seems like the meta
>> regionserver received the HMaster's message and shut itself down.
>> 2014-04-29 15:32:28,364 FATAL [regionserver60020]
>> regionserver.HRegionServer: ABORTING region server
>> hadoop77,60020,1396606457005: org.apache.hadoop.hbase.YouAreDeadException:
>> Server REPORT rejected; currently processing hadoop77,60020,1396606457005
>> as dead server
>>        at
>> org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:339)
>> 
>> 
>> and  this is  gc  log:
>> 2014-04-29T15:32:27.159+0800: 2150297.866: [GC 2150297.866: [ParNew:
>> 449091K->52416K(471872K), 0.0411300 secs] 11582287K->11199419K(16724800K),
>> 0.0413430 secs] [Times: user=0.00 sys=0.00, real=0.04 secs]
>> 2014-04-29T15:32:28.160+0800: 2150298.867: [GC 2150298.867: [ParNew:
>> 471859K->19313K(471872K), 0.0222250 secs] 11618863K->11175232K(16724800K),
>> 0.0224050 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
>> 2014-04-29T15:32:29.063+0800: 2150299.769: [GC 2150299.769: [ParNew:
>> 438769K->38887K(471872K), 0.0242330 secs] 11594688K->11194807K(16724800K),
>> 0.0243580 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
>> 2014-04-29T15:32:29.861+0800: 2150300.568: [GC 2150300.568: [ParNew:
>> 458343K->18757K(471872K), 0.0242790 secs] 11614263K->11180844K(16724800K),
>> 0.0244340 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
>> 2014-04-29T15:32:31.608+0800: 2150302.314: [GC 2150302.314: [ParNew:
>> 438213K->4874K(471872K), 0.0221520 secs] 11600300K->11166960K(16724800K),
>> 0.0222970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
>> Heap
>> par new generation   total 471872K, used 335578K [0x00000003fae00000,
>> 0x000000041ae00000, 0x000000041ae00000)
>>  eden space 419456K,  78% used [0x00000003fae00000, 0x000000040f0f41c8,
>> 0x00000004147a0000)
>>  from space 52416K,   9% used [0x0000000417ad0000, 0x0000000417f928e0,
>> 0x000000041ae00000)
>>  to   space 52416K,   0% used [0x00000004147a0000, 0x00000004147a0000,
>> 0x0000000417ad0000)
>> concurrent mark-sweep generation total 16252928K, used 11162086K
>> [0x000000041ae00000, 0x00000007fae00000, 0x00000007fae00000)
>> concurrent-mark-sweep perm gen total 81072K, used 48660K
>> [0x00000007fae00000, 0x00000007ffd2c000, 0x0000000800000000)
>> 
>> 
>> 
>> -----Original Message-----
>> From: Samir Ahmic [mailto:ahmic.samir@gmail.com]
>> Sent: May 5, 2014 16:50
>> To: user@hbase.apache.org
>> Cc: sunweiwei
>> Subject: Re: meta server hangs?
>> 
>> Hi,
>> This exception:
>> ****
>> exception=java.net.SocketTimeoutException: Call to
>> hadoop77/192.168.1.87:60020 failed because
>> java.net.SocketTimeoutException:
>> 60000 millis timeout while waiting for channel to be ready for read. ch :
>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
>> remote=hadoop77/192.168.1.87:60020]
>> *****
>> shows that there is a connection timeout between the master server and the
>> regionserver (hadoop77/192.168.1.87:60020) that is hosting the 'meta' table.
>> The real question is what is causing this timeout. In my experience a few
>> things can cause this type of timeout. I would suggest that you check
>> garbage collection, memory, network, CPU and disks on hadoop77/192.168.1.87,
>> and I'm sure you will find the cause of the timeout.
>> You can use diagnostic tools like vmstat, sar, and iostat to check your
>> system, and you can use jstat to check GC and other JVM statistics.
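The checklist in this reply can be sketched as a small dry-run script. The pid value and the sampling intervals below are illustrative placeholders, not values from the thread:

```shell
# Dry-run sketch of the suggested diagnostics for the suspect host.
# RS_PID stands in for the regionserver's JVM pid (find it with jps).
RS_PID=12345
for cmd in \
    "vmstat 5 3" \
    "iostat -x 5 3" \
    "sar -n DEV 5 3" \
    "jstat -gcutil $RS_PID 5000 6"
do
    # On hadoop77 you would actually run each command; here we only list them.
    echo "would run: $cmd"
done
```

Of these, `jstat -gcutil` is the quickest way to test the GC theory: it samples old-generation occupancy (the O column) every few seconds, so a value pinned near 100% would confirm the heap-pressure diagnosis.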
>> 
>> Regards
>> Samir
>> 
>> 
>> 
>> 
>> On Mon, May 5, 2014 at 10:14 AM, sunweiwei <sunww@asiainfo-linkage.com> wrote:
>> 
>>> Hi
>>> 
>>> I'm using hbase0.96.0.
>>> 
>>> I found that clients suddenly couldn't put data and the HMaster hung. Then I
>>> shut down the HMaster and started a new one, and the clients went back to normal.
>>> 
>>> 
>>> 
>>> I found these logs in the new HMaster. It seems like the meta server hung and
>>> the HMaster stopped the meta server.
>>> 
>>> 2014-04-29 15:32:21,530 INFO  [master:hadoop1:60000]
>>> catalog.CatalogTracker:
>>> Failed verification of hbase:meta,,1 at
>>> address=hadoop77,60020,1396606457005,
>>> exception=java.net.SocketTimeoutException: Call to
>>> hadoop77/192.168.1.87:60020 failed because
>>> java.net.SocketTimeoutException:
>>> 60000 millis timeout while waiting for channel to be ready for read. ch :
>>> java.nio.channels.SocketChannel[connected local=/192.168.1.123:33117
>>> remote=hadoop77/192.168.1.87:60020]
>>> 
>>> 2014-04-29 15:32:21,532 INFO  [master:hadoop1:60000] master.HMaster:
>>> Forcing
>>> expire of hadoop77,60020,1396606457005
>>> 
>>> 
>>> 
>>> I can't find why the meta server hung. I found this in the meta server log:
>>> 
>>> 2014-04-29 13:53:55,637 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 8206938292079629452 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,632 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 1111451530521284267 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner 516152687416913803 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 2014-04-29 13:53:56,733 INFO  [regionserver60020.leaseChecker]
>>> regionserver.HRegionServer: Scanner -2651411216936596082 lease expired on
>>> region hbase:meta,,1.1588230740
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Any suggestion will be appreciated. Thanks.
> 


Re: Re: Re: Re: meta server hangs?

Posted by Ted Yu <yu...@gmail.com>.
You can add the following to JVM parameters:
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
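Combined with the settings posted earlier in the thread, the resulting conf/hbase-env.sh line might look like the following; a sketch only, reusing the poster's heap sizes and adding the two collector flags:

```shell
# Sketch: hbase-env.sh fragment combining the poster's sizes with an
# explicit CMS + ParNew collector choice (the flags suggested above).
export HBASE_REGIONSERVER_OPTS="-Xms16384m -Xmx16384m -Xmn512m \
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
-XX:CMSInitiatingOccupancyFraction=70 \
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```

On HotSpot JVMs of that era, enabling CMS already pairs it with ParNew for the young generation, so -XX:+UseParNewGC mostly serves as explicit documentation.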

Cheers


On Fri, May 16, 2014 at 4:32 AM, sunweiwei <su...@asiainfo-linkage.com> wrote:

> Hi
>  Sorry, I just saw this mail. I set the GC parameters like this:
>
> export HBASE_REGIONSERVER_OPTS="-Xmn512m
> -XX:CMSInitiatingOccupancyFraction=70  -Xms16384m -Xmx16384m -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps "