You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by 谢良 <xi...@xiaomi.com> on 2014/07/08 08:28:55 UTC

答复: Region server not accept connections intermittently

Coud you try with "-XX:+PrintGCApplicationStoppedTime" vm parameter ?
the hung from vm side was not caused by GC always

Thanks,
________________________________________
发件人: Rural Hunter [ruralhunter@gmail.com]
发送时间: 2014年7月8日 14:06
收件人: user@hbase.apache.org
主题: Region server not accept connections intermittently

Hi,

I'm using hbase-0.96.2. I saw sometimes my region servers don't accept
connections from clients. this could last like 10 minutes to half hour.
I was not able to connect to the 60020 port even with telnet command
when it happened. After a while, the problem disappeared and the region
server behaves normally. My region server is configured with 8G heap and
the usage always kepes around at 3-5G. I checked the log and don't see
error log. The log seems normal:

2014-07-08 10:13:26,192 INFO  [MemStoreFlusher.0]
regionserver.DefaultStoreFlusher: Flushed, sequenceid=1450565470,
memsize=937.9 K, hasBloomFilter=true, into tmp file
hdfs://master:9000/hbase/data/default/tbdoc/f7635647bcae9291971843131c4add0b/.tmp/d1a4e8fa945444958141dcdaa07320ab
2014-07-08 10:13:26,208 INFO  [MemStoreFlusher.0] regionserver.HStore:
Added
hdfs://master:9000/hbase/data/default/tbdoc/f7635647bcae9291971843131c4add0b/cf1/d1a4e8fa945444958141dcdaa07320ab,
entries=4494, sequenceid=1450565470, filesize=146.8 K
2014-07-08 10:13:26,209 INFO  [MemStoreFlusher.0] regionserver.HRegion:
Finished memstore flush of ~957.3 K/980232, currentsize=0/0 for region
tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b. in 40ms,
sequenceid=1450565470, compaction requested=true
2014-07-08 10:13:26,209 INFO
[regionserver60020-smallCompactions-1397552664633] regionserver.HRegion:
Starting compaction on cf1 in region
tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b.
2014-07-08 10:13:26,209 INFO
[regionserver60020-smallCompactions-1397552664633] regionserver.HStore:
Starting compaction of 3 file(s) in cf1 of
tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b. into
tmpdir=hdfs://master:9000/hbase/data/default/tbdoc/f7635647bcae9291971843131c4add0b/.tmp,
totalSize=705.6 K
2014-07-08 10:13:26,390 INFO
[regionserver60020-smallCompactions-1397552664633] regionserver.HStore:
Completed compaction of 3 file(s) in cf1 of
tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b. into
69b7c19f59994dffbb236faa03dc1090(size=693.8 K), total size for store is
1.1 G. This selection was in queue for 0sec, and took 0sec to execute.
2014-07-08 10:13:26,390 INFO
[regionserver60020-smallCompactions-1397552664633]
regionserver.CompactSplitThread: Completed compaction: Request =
regionName=tbdoc,,1390820890834.f7635647bcae9291971843131c4add0b.,
storeName=cf1, fileCount=3, fileSize=705.6 K, priority=6,
time=47043081604802310; duration=0sec
2014-07-08 10:29:06,383 INFO  [regionserver60020.periodicFlusher]
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
flush for region
tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
after a delay of 10113
2014-07-08 10:29:16,383 INFO  [regionserver60020.periodicFlusher]
regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
flush for region
tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
after a delay of 3456
2014-07-08 10:29:16,537 INFO  [MemStoreFlusher.0]
regionserver.DefaultStoreFlusher: Flushed, sequenceid=1450574203,
memsize=425.0 K, hasBloomFilter=true, into tmp file
hdfs://master:9000/hbase/data/default/tbguid/ded050ffeacd83613a0e3db7baaee4f0/.tmp/cabc231951cb473695c3bac045456cee
2014-07-08 10:29:16,549 INFO  [MemStoreFlusher.0] regionserver.HStore:
Added
hdfs://master:9000/hbase/data/default/tbguid/ded050ffeacd83613a0e3db7baaee4f0/cf1/cabc231951cb473695c3bac045456cee,
entries=1984, sequenceid=1450574203, filesize=83.0 K
2014-07-08 10:29:16,549 INFO  [MemStoreFlusher.0] regionserver.HRegion:
Finished memstore flush of ~429 K/439296, currentsize=0/0 for region
tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
in 53ms, sequenceid=1450574203, compaction requested=true
2014-07-08 10:29:16,550 INFO
[regionserver60020-smallCompactions-1397552664633] regionserver.HRegion:
Starting compaction on cf1 in region
tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
2014-07-08 10:29:16,550 INFO
[regionserver60020-smallCompactions-1397552664633] regionserver.HStore:
Starting compaction of 3 file(s) in cf1 of
tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
into
tmpdir=hdfs://master:9000/hbase/data/default/tbguid/ded050ffeacd83613a0e3db7baaee4f0/.tmp,
totalSize=9.0 M
2014-07-08 10:29:17,053 INFO
[regionserver60020-smallCompactions-1397552664633] regionserver.HStore:
Completed compaction of 3 file(s) in cf1 of
tbguid,781e82be88ce94de7bb2dee9f3436e97,1392692827380.ded050ffeacd83613a0e3db7baaee4f0.
into e923e5ad17c54bf4934ef07e3d0b5496(size=9.0 M), total size for store
is 582.3 M. This selection was in queue for 0sec, and took 0sec to execute.

I suspect it's the gc problem so I checked the gc log and found it kept
print logs like this(both when the problem happens or not):
2014-07-08T10:26:45.273+0800: 7233757.674: [GC 7233757.674: [ParNew:
112969K->8702K(118016K), 0.0652040 secs] 5262052K->5160850K(8375552K),
0.0653760 secs] [Times: user=0.00 sys=0.00, real=0.07 secs]
2014-07-08T10:26:46.836+0800: 7233759.237: [GC 7233759.237: [ParNew:
113662K->6949K(118016K), 0.0342530 secs] 5265810K->5163641K(8375552K),
0.0343870 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-07-08T10:26:48.422+0800: 7233760.823: [GC 7233760.823: [ParNew:
111909K->4855K(118016K), 0.0269810 secs] 5268601K->5165318K(8375552K),
0.0271120 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-07-08T10:26:49.896+0800: 7233762.296: [GC 7233762.296: [ParNew:
109815K->7721K(118016K), 0.0251810 secs] 5270278K->5168184K(8375552K),
0.0253300 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:26:51.334+0800: 7233763.734: [GC 7233763.735: [ParNew:
112681K->8670K(118016K), 0.0233590 secs] 5273144K->5171935K(8375552K),
0.0237200 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-07-08T10:26:52.701+0800: 7233765.102: [GC 7233765.102: [ParNew:
113390K->7880K(118016K), 0.0266700 secs] 5276656K->5175725K(8375552K),
0.0268250 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-07-08T10:26:54.166+0800: 7233766.566: [GC 7233766.566: [ParNew:
112840K->7022K(118016K), 0.0249410 secs] 5280685K->5178575K(8375552K),
0.0250850 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:26:55.586+0800: 7233767.986: [GC 7233767.987: [ParNew:
111982K->6807K(118016K), 0.0317690 secs] 5283535K->5182134K(8375552K),
0.0319390 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-07-08T10:26:57.006+0800: 7233769.407: [GC 7233769.407: [ParNew:
111767K->7417K(118016K), 0.0199740 secs] 5287094K->5185610K(8375552K),
0.0201160 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:26:58.520+0800: 7233770.921: [GC 7233770.921: [ParNew:
112377K->6285K(118016K), 0.0219640 secs] 5290570K->5188051K(8375552K),
0.0220970 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:00.077+0800: 7233772.478: [GC 7233772.478: [ParNew:
111245K->8191K(118016K), 0.0215280 secs] 5293011K->5189957K(8375552K),
0.0216740 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:01.571+0800: 7233773.971: [GC 7233773.972: [ParNew:
113151K->8510K(118016K), 0.0185730 secs] 5294917K->5193731K(8375552K),
0.0187220 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:02.958+0800: 7233775.359: [GC 7233775.359: [ParNew:
113470K->9384K(118016K), 0.0252470 secs] 5298691K->5197072K(8375552K),
0.0253900 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:04.416+0800: 7233776.817: [GC 7233776.817: [ParNew:
114344K->6721K(118016K), 0.0203040 secs] 5302032K->5200076K(8375552K),
0.0204510 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:05.891+0800: 7233778.291: [GC 7233778.291: [ParNew:
111681K->6341K(118016K), 0.0434460 secs] 5305036K->5203021K(8375552K),
0.0436020 secs] [Times: user=0.00 sys=0.00, real=0.05 secs]
2014-07-08T10:27:07.497+0800: 7233779.898: [GC 7233779.898: [ParNew:
111301K->9838K(118016K), 0.0177230 secs] 5307981K->5206518K(8375552K),
0.0178770 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:08.904+0800: 7233781.305: [GC 7233781.305: [ParNew:
114788K->7575K(118016K), 0.0231210 secs] 5311469K->5209993K(8375552K),
0.0233670 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:10.305+0800: 7233782.706: [GC 7233782.706: [ParNew:
112535K->7616K(118016K), 0.0226150 secs] 5314953K->5213487K(8375552K),
0.0227620 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:11.942+0800: 7233784.343: [GC 7233784.343: [ParNew:
112576K->7835K(118016K), 0.0244290 secs] 5318447K->5217156K(8375552K),
0.0245700 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]
2014-07-08T10:27:13.565+0800: 7233785.966: [GC 7233785.966: [ParNew:
112795K->7254K(118016K), 0.0206670 secs] 5322116K->5220093K(8375552K),
0.0207760 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:15.107+0800: 7233787.507: [GC 7233787.508: [ParNew:
112101K->5551K(118016K), 0.0213290 secs] 5324940K->5222345K(8375552K),
0.0214360 secs] [Times: user=0.00 sys=0.00, real=0.02 secs]
2014-07-08T10:27:16.818+0800: 7233789.218: [GC 7233789.218: [ParNew:
110511K->8559K(118016K), 0.0326800 secs] 5327305K->5225353K(8375552K),
0.0328540 secs] [Times: user=0.00 sys=0.00, real=0.03 secs]

I also anaylzed the gc log and found the max pause of gc is only around
2 seconds. So I don't see any gc problem. Can anyone point out the
possible cause of this problem?


Re: 答复: Region server not accept connections intermittently

Posted by Rural Hunter <ru...@gmail.com>.
I checked the parameter and it seems also a gc parameter to print the
total time of "stop the world". So will it help to get the info about
hung "was not caused by GC "?

于 2014/7/8 14:28, 谢良 写道:
> Coud you try with "-XX:+PrintGCApplicationStoppedTime" vm parameter ?
> the hung from vm side was not caused by GC always
>
> Thanks
>