You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hbase.apache.org by 谢良 <xi...@xiaomi.com> on 2013/01/25 03:07:19 UTC

答复: GC pause issues

Hi Varun,

Please note if you try to increase new generation size, then the "ParNew" time will be up accordingly, and CMS YGC is also a STW.
could you have a try to reduce memstore size to a smaller value, e.g. 128m or 256m ?

Regards,
Liang
________________________________________
发件人: Varun Sharma [varun@pinterest.com]
发送时间: 2013年1月25日 1:40
收件人: user@hbase.apache.org
主题: GC pause issues

Hi,

I have a region server which has the following logs. As you can see from
the log, ParNew is sufficiently big (450M) and there are heavy writes going
in. I am seeing 200ms pauses which eventually build up and there is a
promotion failure. There is a parnew collection every 2-3 seconds so it
fills up real fast. My memstore size is bigger 512m for flushes and 4
regions per server. (overall size is 3G for all memstores) - I have mslab
enabled

2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew:
471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K),
0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew:
461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K),
0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew:
442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K),
0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark:
8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02
sys=0.00, real=0.04 secs]
2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew:
471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K),
0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268
secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start]
2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean:
0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
2013-01-24T13:08:26.597+0000: 63543.691:
[CMS-concurrent-abortable-preclean-start]
2013-01-24T13:08:27.401+0000: 63544.495:
[CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
sys=0.16, real=0.80 secs]
2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K (471872
K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06
secs]
2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew:
471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K),
0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew:
445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K),
0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew:
471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K),
0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew (promotion
failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
[CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]

What might be advised here - should I up the size to 1G for the new
generation ?

Thanks
Varun

Re: 答复: GC pause issues

Posted by Jack Levin <ma...@gmail.com>.

How much memory are you flushing?  Can you paste the log here?  The
large the chunk of the flush the longer  your GC is going to be.

-Jack

On Fri, Jan 25, 2013 at 9:55 AM, Varun Sharma <va...@pinterest.com> wrote:
> My total heap size is 12G and my young gen is 1G. I am getting 200ms pauses
> every few seconds - the machine has 4 cores (its m1.xlarge on ec2), if I
> set the young gen low. The one thing I noted today was that the pauses are
> spaced @10 seconds until a minor compaction kicks in - and then then the
> pauses happen every 3-4 seconds. I found this to be highly correlated on a
> region server. These minor compactions are not huge - they are typically
> compacting 3 files - upto 200M in total size. They are lasting for 20-30
> seconds. Once they are over the GC pauses recover back to the 10 second
> frequency.
>
> I have the following settings enabled (index.cacheonwrite and
> bloom.cacheonwrite) to cache the index/bloom blocks upon hfile writes
> though i find it unlikely that they could be impacting my setup.
>
> On Thu, Jan 24, 2013 at 11:46 PM, Jack Levin <ma...@gmail.com> wrote:
>
>> Generally, the larger the flush to harder the GC will work. Flush more
>> often to avoid this. What is your total heap size set at?
>> On Jan 24, 2013 9:02 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>>
>> > I do have significant block cache churn and this issue is typical
>> > correlated with a huge increase in read latencies - could that be the
>> > reason for this - mslab should be taking care of the memstore related
>> heap
>> > fragmentation ? Has anyone seen issues with block cache churn ?
>> >
>> > On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <va...@pinterest.com>
>> wrote:
>> >
>> > > I am curious how reducing the memstore size would help - I have 6
>> regions
>> > > and 3G total for memstore - so I would like max out on that by having a
>> > > bigger flush size per region. Are you asking me to have more regions
>> and
>> > > smaller memstore flush size instead ? How is that likely to help
>> > >
>> > >
>> > > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <xi...@xiaomi.com> wrote:
>> > >
>> > >> Hi Varun,
>> > >>
>> > >> Please note if you try to increase new generation size, then the
>> > "ParNew"
>> > >> time will be up accordingly, and CMS YGC is also a STW.
>> > >> could you have a try to reduce memstore size to a smaller value, e.g.
>> > >> 128m or 256m ?
>> > >>
>> > >> Regards,
>> > >> Liang
>> > >> ________________________________________
>> > >> 发件人: Varun Sharma [varun@pinterest.com]
>> > >> 发送时间: 2013年1月25日 1:40
>> > >> 收件人: user@hbase.apache.org
>> > >> 主题: GC pause issues
>> > >>
>> > >> Hi,
>> > >>
>> > >> I have a region server which has the following logs. As you can see
>> from
>> > >> the log, ParNew is sufficiently big (450M) and there are heavy writes
>> > >> going
>> > >> in. I am seeing 200ms pauses which eventually build up and there is a
>> > >> promotion failure. There is a parnew collection every 2-3 seconds so
>> it
>> > >> fills up real fast. My memstore size is bigger 512m for flushes and 4
>> > >> regions per server. (overall size is 3G for all memstores) - I have
>> > mslab
>> > >> enabled
>> > >>
>> > >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew:
>> > >> 471841K->52416K(471872K), 0.2251880 secs]
>> 8733008K->8445039K(12727104K),
>> > >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
>> > >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew:
>> > >> 461593K->52416K(471872K), 0.2812690 secs]
>> 8854216K->8557572K(12727104K),
>> > >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
>> > >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew:
>> > >> 442836K->52416K(471872K), 0.2781490 secs]
>> 8947992K->8705355K(12727104K),
>> > >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
>> > >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark:
>> > >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
>> > >> user=0.02
>> > >> sys=0.00, real=0.04 secs]
>> > >> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
>> > >> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew:
>> > >> 471872K->52251K(471872K), 0.1611970 secs]
>> 9124811K->8831437K(12727104K),
>> > >> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
>> > >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark:
>> > 4.105/4.268
>> > >> secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
>> > >> 2013-01-24T13:08:26.434+0000: 63543.527:
>> [CMS-concurrent-preclean-start]
>> > >> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean:
>> > >> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
>> > >> 2013-01-24T13:08:26.597+0000: 63543.691:
>> > >> [CMS-concurrent-abortable-preclean-start]
>> > >> 2013-01-24T13:08:27.401+0000: 63544.495:
>> > >> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times:
>> user=1.46
>> > >> sys=0.16, real=0.80 secs]
>> > >> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K
>> > >> (471872
>> > >> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak
>> refs
>> > >> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
>> > >> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01,
>> > real=0.06
>> > >> secs]
>> > >> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
>> > >> 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew:
>> > >> 471707K->44566K(471872K), 0.1371770 secs]
>> 9044714K->8701862K(12727104K),
>> > >> 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
>> > >> 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew:
>> > >> 445714K->52416K(471872K), 0.5626120 secs]
>> 8648805K->8410223K(12727104K),
>> > >> 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
>> > >> 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew:
>> > >> 471872K->52416K(471872K), 0.2300560 secs]
>> 8247804K->7976043K(12727104K),
>> > >> 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
>> > >> 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew
>> > (promotion
>> > >> failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
>> > >> [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
>> > >> 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
>> > >>
>> > >> What might be advised here - should I up the size to 1G for the new
>> > >> generation ?
>> > >>
>> > >> Thanks
>> > >> Varun
>> > >>
>> > >
>> > >
>> >
>>

Re: 答复: GC pause issues

Posted by Varun Sharma <va...@pinterest.com>.

My total heap size is 12G and my young gen is 1G. I am getting 200ms pauses
every few seconds - the machine has 4 cores (its m1.xlarge on ec2), if I
set the young gen low. The one thing I noted today was that the pauses are
spaced @10 seconds until a minor compaction kicks in - and then then the
pauses happen every 3-4 seconds. I found this to be highly correlated on a
region server. These minor compactions are not huge - they are typically
compacting 3 files - upto 200M in total size. They are lasting for 20-30
seconds. Once they are over the GC pauses recover back to the 10 second
frequency.

I have the following settings enabled (index.cacheonwrite and
bloom.cacheonwrite) to cache the index/bloom blocks upon hfile writes
though i find it unlikely that they could be impacting my setup.

On Thu, Jan 24, 2013 at 11:46 PM, Jack Levin <ma...@gmail.com> wrote:

> Generally, the larger the flush to harder the GC will work. Flush more
> often to avoid this. What is your total heap size set at?
> On Jan 24, 2013 9:02 PM, "Varun Sharma" <va...@pinterest.com> wrote:
>
> > I do have significant block cache churn and this issue is typical
> > correlated with a huge increase in read latencies - could that be the
> > reason for this - mslab should be taking care of the memstore related
> heap
> > fragmentation ? Has anyone seen issues with block cache churn ?
> >
> > On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <va...@pinterest.com>
> wrote:
> >
> > > I am curious how reducing the memstore size would help - I have 6
> regions
> > > and 3G total for memstore - so I would like max out on that by having a
> > > bigger flush size per region. Are you asking me to have more regions
> and
> > > smaller memstore flush size instead ? How is that likely to help
> > >
> > >
> > > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <xi...@xiaomi.com> wrote:
> > >
> > >> Hi Varun,
> > >>
> > >> Please note if you try to increase new generation size, then the
> > "ParNew"
> > >> time will be up accordingly, and CMS YGC is also a STW.
> > >> could you have a try to reduce memstore size to a smaller value, e.g.
> > >> 128m or 256m ?
> > >>
> > >> Regards,
> > >> Liang
> > >> ________________________________________
> > >> 发件人: Varun Sharma [varun@pinterest.com]
> > >> 发送时间: 2013年1月25日 1:40
> > >> 收件人: user@hbase.apache.org
> > >> 主题: GC pause issues
> > >>
> > >> Hi,
> > >>
> > >> I have a region server which has the following logs. As you can see
> from
> > >> the log, ParNew is sufficiently big (450M) and there are heavy writes
> > >> going
> > >> in. I am seeing 200ms pauses which eventually build up and there is a
> > >> promotion failure. There is a parnew collection every 2-3 seconds so
> it
> > >> fills up real fast. My memstore size is bigger 512m for flushes and 4
> > >> regions per server. (overall size is 3G for all memstores) - I have
> > mslab
> > >> enabled
> > >>
> > >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew:
> > >> 471841K->52416K(471872K), 0.2251880 secs]
> 8733008K->8445039K(12727104K),
> > >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
> > >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew:
> > >> 461593K->52416K(471872K), 0.2812690 secs]
> 8854216K->8557572K(12727104K),
> > >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
> > >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew:
> > >> 442836K->52416K(471872K), 0.2781490 secs]
> 8947992K->8705355K(12727104K),
> > >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
> > >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark:
> > >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
> > >> user=0.02
> > >> sys=0.00, real=0.04 secs]
> > >> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
> > >> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew:
> > >> 471872K->52251K(471872K), 0.1611970 secs]
> 9124811K->8831437K(12727104K),
> > >> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
> > >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark:
> > 4.105/4.268
> > >> secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
> > >> 2013-01-24T13:08:26.434+0000: 63543.527:
> [CMS-concurrent-preclean-start]
> > >> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean:
> > >> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
> > >> 2013-01-24T13:08:26.597+0000: 63543.691:
> > >> [CMS-concurrent-abortable-preclean-start]
> > >> 2013-01-24T13:08:27.401+0000: 63544.495:
> > >> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times:
> user=1.46
> > >> sys=0.16, real=0.80 secs]
> > >> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K
> > >> (471872
> > >> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak
> refs
> > >> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
> > >> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01,
> > real=0.06
> > >> secs]
> > >> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
> > >> 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew:
> > >> 471707K->44566K(471872K), 0.1371770 secs]
> 9044714K->8701862K(12727104K),
> > >> 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
> > >> 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew:
> > >> 445714K->52416K(471872K), 0.5626120 secs]
> 8648805K->8410223K(12727104K),
> > >> 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
> > >> 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew:
> > >> 471872K->52416K(471872K), 0.2300560 secs]
> 8247804K->7976043K(12727104K),
> > >> 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
> > >> 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew
> > (promotion
> > >> failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
> > >> [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
> > >> 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
> > >>
> > >> What might be advised here - should I up the size to 1G for the new
> > >> generation ?
> > >>
> > >> Thanks
> > >> Varun
> > >>
> > >
> > >
> >
>

Re: 答复: GC pause issues

Posted by Jack Levin <ma...@gmail.com>.

Generally, the larger the flush to harder the GC will work. Flush more
often to avoid this. What is your total heap size set at?
On Jan 24, 2013 9:02 PM, "Varun Sharma" <va...@pinterest.com> wrote:

> I do have significant block cache churn and this issue is typical
> correlated with a huge increase in read latencies - could that be the
> reason for this - mslab should be taking care of the memstore related heap
> fragmentation ? Has anyone seen issues with block cache churn ?
>
> On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <va...@pinterest.com> wrote:
>
> > I am curious how reducing the memstore size would help - I have 6 regions
> > and 3G total for memstore - so I would like max out on that by having a
> > bigger flush size per region. Are you asking me to have more regions and
> > smaller memstore flush size instead ? How is that likely to help
> >
> >
> > On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <xi...@xiaomi.com> wrote:
> >
> >> Hi Varun,
> >>
> >> Please note if you try to increase new generation size, then the
> "ParNew"
> >> time will be up accordingly, and CMS YGC is also a STW.
> >> could you have a try to reduce memstore size to a smaller value, e.g.
> >> 128m or 256m ?
> >>
> >> Regards,
> >> Liang
> >> ________________________________________
> >> 发件人: Varun Sharma [varun@pinterest.com]
> >> 发送时间: 2013年1月25日 1:40
> >> 收件人: user@hbase.apache.org
> >> 主题: GC pause issues
> >>
> >> Hi,
> >>
> >> I have a region server which has the following logs. As you can see from
> >> the log, ParNew is sufficiently big (450M) and there are heavy writes
> >> going
> >> in. I am seeing 200ms pauses which eventually build up and there is a
> >> promotion failure. There is a parnew collection every 2-3 seconds so it
> >> fills up real fast. My memstore size is bigger 512m for flushes and 4
> >> regions per server. (overall size is 3G for all memstores) - I have
> mslab
> >> enabled
> >>
> >> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew:
> >> 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K),
> >> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
> >> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew:
> >> 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K),
> >> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
> >> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew:
> >> 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K),
> >> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
> >> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark:
> >> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
> >> user=0.02
> >> sys=0.00, real=0.04 secs]
> >> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
> >> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew:
> >> 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K),
> >> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
> >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark:
> 4.105/4.268
> >> secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
> >> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start]
> >> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean:
> >> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
> >> 2013-01-24T13:08:26.597+0000: 63543.691:
> >> [CMS-concurrent-abortable-preclean-start]
> >> 2013-01-24T13:08:27.401+0000: 63544.495:
> >> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
> >> sys=0.16, real=0.80 secs]
> >> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K
> >> (471872
> >> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
> >> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
> >> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01,
> real=0.06
> >> secs]
> >> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
> >> 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew:
> >> 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K),
> >> 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
> >> 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew:
> >> 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K),
> >> 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
> >> 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew:
> >> 471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K),
> >> 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
> >> 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew
> (promotion
> >> failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
> >> [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
> >> 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
> >>
> >> What might be advised here - should I up the size to 1G for the new
> >> generation ?
> >>
> >> Thanks
> >> Varun
> >>
> >
> >
>

Re: 答复: GC pause issues

Posted by Varun Sharma <va...@pinterest.com>.

I do have significant block cache churn and this issue is typical
correlated with a huge increase in read latencies - could that be the
reason for this - mslab should be taking care of the memstore related heap
fragmentation ? Has anyone seen issues with block cache churn ?

On Thu, Jan 24, 2013 at 6:30 PM, Varun Sharma <va...@pinterest.com> wrote:

> I am curious how reducing the memstore size would help - I have 6 regions
> and 3G total for memstore - so I would like max out on that by having a
> bigger flush size per region. Are you asking me to have more regions and
> smaller memstore flush size instead ? How is that likely to help
>
>
> On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <xi...@xiaomi.com> wrote:
>
>> Hi Varun,
>>
>> Please note if you try to increase new generation size, then the "ParNew"
>> time will be up accordingly, and CMS YGC is also a STW.
>> could you have a try to reduce memstore size to a smaller value, e.g.
>> 128m or 256m ?
>>
>> Regards,
>> Liang
>> ________________________________________
>> 发件人: Varun Sharma [varun@pinterest.com]
>> 发送时间: 2013年1月25日 1:40
>> 收件人: user@hbase.apache.org
>> 主题: GC pause issues
>>
>> Hi,
>>
>> I have a region server which has the following logs. As you can see from
>> the log, ParNew is sufficiently big (450M) and there are heavy writes
>> going
>> in. I am seeing 200ms pauses which eventually build up and there is a
>> promotion failure. There is a parnew collection every 2-3 seconds so it
>> fills up real fast. My memstore size is bigger 512m for flushes and 4
>> regions per server. (overall size is 3G for all memstores) - I have mslab
>> enabled
>>
>> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew:
>> 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K),
>> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
>> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew:
>> 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K),
>> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
>> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew:
>> 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K),
>> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
>> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark:
>> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times:
>> user=0.02
>> sys=0.00, real=0.04 secs]
>> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
>> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew:
>> 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K),
>> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
>> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268
>> secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
>> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start]
>> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean:
>> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
>> 2013-01-24T13:08:26.597+0000: 63543.691:
>> [CMS-concurrent-abortable-preclean-start]
>> 2013-01-24T13:08:27.401+0000: 63544.495:
>> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
>> sys=0.16, real=0.80 secs]
>> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K
>> (471872
>> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
>> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
>> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06
>> secs]
>> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
>> 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew:
>> 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K),
>> 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
>> 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew:
>> 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K),
>> 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
>> 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew:
>> 471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K),
>> 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
>> 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew (promotion
>> failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
>> [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
>> 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
>>
>> What might be advised here - should I up the size to 1G for the new
>> generation ?
>>
>> Thanks
>> Varun
>>
>
>

Re: 答复: GC pause issues

Posted by Varun Sharma <va...@pinterest.com>.

I am curious how reducing the memstore size would help - I have 6 regions
and 3G total for memstore - so I would like max out on that by having a
bigger flush size per region. Are you asking me to have more regions and
smaller memstore flush size instead ? How is that likely to help

On Thu, Jan 24, 2013 at 6:07 PM, 谢良 <xi...@xiaomi.com> wrote:

> Hi Varun,
>
> Please note if you try to increase new generation size, then the "ParNew"
> time will be up accordingly, and CMS YGC is also a STW.
> could you have a try to reduce memstore size to a smaller value, e.g. 128m
> or 256m ?
>
> Regards,
> Liang
> ________________________________________
> 发件人: Varun Sharma [varun@pinterest.com]
> 发送时间: 2013年1月25日 1:40
> 收件人: user@hbase.apache.org
> 主题: GC pause issues
>
> Hi,
>
> I have a region server which has the following logs. As you can see from
> the log, ParNew is sufficiently big (450M) and there are heavy writes going
> in. I am seeing 200ms pauses which eventually build up and there is a
> promotion failure. There is a parnew collection every 2-3 seconds so it
> fills up real fast. My memstore size is bigger 512m for flushes and 4
> regions per server. (overall size is 3G for all memstores) - I have mslab
> enabled
>
> 2013-01-24T13:08:16.870+0000: 63533.964: [GC 63533.964: [ParNew:
> 471841K->52416K(471872K), 0.2251880 secs] 8733008K->8445039K(12727104K),
> 0.2254100 secs] [Times: user=0.50 sys=0.18, real=0.22 secs]
> 2013-01-24T13:08:19.546+0000: 63536.639: [GC 63536.639: [ParNew:
> 461593K->52416K(471872K), 0.2812690 secs] 8854216K->8557572K(12727104K),
> 0.2814870 secs] [Times: user=0.66 sys=0.09, real=0.29 secs]
> 013-01-24T13:08:21.824+0000: 63538.917: [GC 63538.918: [ParNew:
> 442836K->52416K(471872K), 0.2781490 secs] 8947992K->8705355K(12727104K),
> 0.2783810 secs] [Times: user=0.58 sys=0.14, real=0.28 secs]
> 2013-01-24T13:08:22.122+0000: 63539.216: [GC [1 CMS-initial-mark:
> 8652939K(12255232K)] 8752914K(12727104K), 0.0365000 secs] [Times: user=0.02
> sys=0.00, real=0.04 secs]
> 2013-01-24T13:08:22.159+0000: 63539.253: [CMS-concurrent-mark-start]
> 2013-01-24T13:08:24.953+0000: 63542.047: [GC 63542.047: [ParNew:
> 471872K->52251K(471872K), 0.1611970 secs] 9124811K->8831437K(12727104K),
> 0.1614180 secs] [Times: user=0.37 sys=0.19, real=0.16 secs]
> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-mark: 4.105/4.268
> secs] [Times: user=5.36 sys=0.34, real=4.27 secs]
> 2013-01-24T13:08:26.434+0000: 63543.527: [CMS-concurrent-preclean-start]
> 2013-01-24T13:08:26.597+0000: 63543.691: [CMS-concurrent-preclean:
> 0.133/0.163 secs] [Times: user=0.16 sys=0.05, real=0.17 secs]
> 2013-01-24T13:08:26.597+0000: 63543.691:
> [CMS-concurrent-abortable-preclean-start]
> 2013-01-24T13:08:27.401+0000: 63544.495:
> [CMS-concurrent-abortable-preclean: 0.792/0.804 secs] [Times: user=1.46
> sys=0.16, real=0.80 secs]
> 2013-01-24T13:08:27.403+0000: 63544.496: [GC[YG occupancy: 274458 K (471872
> K)]63544.496: [Rescan (parallel) , 0.0540730 secs]63544.551: [weak refs
> processing, 0.0001700 secs] [1 CMS-remark: 8779186K(12255232K)]
> 9053645K(12727104K), 0.0544410 secs] [Times: user=0.20 sys=0.01, real=0.06
> secs]
> 2013-01-24T13:08:27.458+0000: 63544.551: [CMS-concurrent-sweep-start]
> 2013-01-24T13:08:27.955+0000: 63545.048: [GC 63545.049: [ParNew:
> 471707K->44566K(471872K), 0.1371770 secs] 9044714K->8701862K(12727104K),
> 0.1374060 secs] [Times: user=0.35 sys=0.12, real=0.14 secs]
> 2013-01-24T13:08:29.285+0000: 63546.378: [GC 63546.478: [ParNew:
> 445714K->52416K(471872K), 0.5626120 secs] 8648805K->8410223K(12727104K),
> 0.5628610 secs] [Times: user=0.91 sys=0.08, real=0.66 secs]
> 2013-01-24T13:08:32.308+0000: 63549.401: [GC 63549.402: [ParNew:
> 471872K->52416K(471872K), 0.2300560 secs] 8247804K->7976043K(12727104K),
> 0.2302900 secs] [Times: user=0.62 sys=0.17, real=0.23 secs]
> 2013-01-24T13:08:34.844+0000: 63551.938: [GC 63551.938: [ParNew (promotion
> failed): 471872K->471872K(471872K), 0.2788500 secs]63552.217:
> [CMS2013-01-24T13:08:37.256+0000: 63554.349: [CMS-concurrent-sweep:
> 8.473/9.798 secs] [*Times: user=15.26 sys=1.11, real=9.80 secs*]
>
> What might be advised here - should I up the size to 1G for the new
> generation ?
>
> Thanks
> Varun
>