
Posted to user@hbase.apache.org by 谢良 <xi...@xiaomi.com> on 2012/12/07 10:58:14 UTC

RE: Multiple regionservers on a single node

Hmm, have you tried tuning your GC in depth? Please provide the exact VM options, JDK version, and GC logs.
In our test cluster this week, I managed to reduce the longest stop-the-world (STW) pause from 22+ seconds (-Xmx20G) to 1.1 s (-Xmx48G) under a very heavy, long-running YCSB stress test.

Also, it would be better to ask for help on the hotspot-gc-use/hotspot-gc-dev mailing lists :)
And G1GC in JDK 7u4+ is a potential solution for large-heap scenarios as well :)
________________________________________
> On Mon, Dec 3, 2012 at 3:39 PM, Ishan Chhabra <ichhabra@rocketfuel.com> wrote:
>
> > Hi,
> > Has anybody tried to run multiple RegionServers on a single physical
> > node? Are there deep technical issues or minor impediments that would
> > hinder this?
> >
> > We are trying to do this because we are facing a lot of GC pauses on the
> > large heap sizes (~70G) that we are using, which lead to a lot of
> > timeouts in our latency-critical application. More processes with smaller
> > heaps would help mitigate this issue.
> >
> > Any experience or thoughts on this would help.
> > Thanks!
> >
> > --
> > Ishan Chhabra | Rocket Scientist | Rocketfuel Inc. | m 650 556 6803
> >
>
>
>
> --
>
> Robert Dyer
> rdyer@iastate.edu
>

Re: RE: Re:Re: RE: Multiple regionservers on a single node

Posted by Ishan Chhabra <ic...@rocketfuel.com>.

Hi Xieliang,
You have put together an interesting set of GC optimizations, similar to what I
concluded after extensive GC tuning recently. For latency-critical
applications running on modern servers with large amounts of RAM and multicore
CPUs, the key seems to be minimizing the stop-the-world pauses caused by young
GC, CMS initial mark, and CMS remark. Your GC options seem to capture that very
well. Thanks for sharing!


On Tue, Dec 11, 2012 at 12:42 AM, 谢良 <xi...@xiaomi.com> wrote:

-- 
Ishan Chhabra | Rocket Scientist | +91-9988263562 m

RE: Re:Re: RE: Multiple regionservers on a single node

Posted by 谢良 <xi...@xiaomi.com>.

Sure, here it is:
-Xmx49152m -Xms49152m -Xmn1024m -Xss256k -XX:MaxDirectMemorySize=1024m
-XX:MaxPermSize=512m -XX:PermSize=512m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/work/log/hbase/ggsrv-miliao/regionserver
-XX:+PrintGCApplicationStoppedTime -XX:+UseConcMarkSweepGC -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:/home/work/log/hbase/ggsrv-miliao/regionserver/regionserver_gc.log
-XX:SurvivorRatio=1 -XX:+UseCMSCompactAtFullCollection
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSParallelRemarkEnabled -XX:+UseNUMA -XX:+CMSClassUnloadingEnabled
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
-XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution
-XX:CMSMaxAbortablePrecleanTime=10000 -XX:MaxGCPauseMillis=2000
-XX:TargetSurvivorRatio=80 -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m -XX:CMSWaitDuration=2000
-XX:+CMSScavengeBeforeRemark -XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC -XX:+PrintPromotionFailure
-XX:ConcGCThreads=8 -XX:ParallelGCThreads=8 -XX:PretenureSizeThreshold=4m
-XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent

I'm not a VM developer, so this setting is certainly suboptimal. Please do not apply it directly to a production environment without testing it against your own data model. Any comments are welcome :)
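As a rough illustration of what some of these numbers imply, here is a small Python sketch (not part of the original post) that reads a subset of the options above and works out the young-generation split, the old-generation size, the occupancy at which CMS starts, and the disk space the rotated GC logs can take. It assumes the standard HotSpot meanings: SurvivorRatio is the ratio of eden to one survivor space, and CMSInitiatingOccupancyFraction is a percentage of old-generation occupancy. The helper names (to_bytes, parse_flags, layout) are hypothetical, and HotSpot aligns generation sizes, so the figures reported by -XX:+PrintHeapAtGC may differ slightly.

```python
import re

# Subset of the options posted above; only these flags are read below.
FLAGS = (
    "-Xmx49152m -Xms49152m -Xmn1024m -XX:SurvivorRatio=1 "
    "-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly "
    "-XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m"
)

MIB = 1024 ** 2
_UNITS = {"": 1, "k": 1024, "m": MIB, "g": 1024 ** 3}


def to_bytes(size):
    """Convert a HotSpot size string such as '1024m' or '48g' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", size)
    if match is None:
        raise ValueError(f"not a size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def parse_flags(flags):
    """Map -Xmx/-Xms/-Xmn to bytes and -XX:Name=value options to strings."""
    opts = {}
    for token in flags.split():
        if token[:4] in ("-Xmx", "-Xms", "-Xmn"):
            opts[token[1:4]] = to_bytes(token[4:])
        elif token.startswith("-XX:") and "=" in token:
            name, value = token[4:].split("=", 1)
            opts[name] = value
    return opts


def layout(flags):
    """Derive generation sizes, the CMS start point and max GC log size in MiB."""
    opts = parse_flags(flags)
    young = opts["Xmn"]
    ratio = int(opts["SurvivorRatio"])   # eden size / one survivor size
    survivor = young // (ratio + 2)      # young = eden + 2 * survivor
    eden = young - 2 * survivor
    old = opts["Xmx"] - young            # perm gen is outside -Xmx on JDK 7
    cms_start = old * int(opts["CMSInitiatingOccupancyFraction"]) // 100
    max_logs = int(opts["NumberOfGCLogFiles"]) * to_bytes(opts["GCLogFileSize"])
    return {
        "eden_mib": eden / MIB,
        "survivor_mib": survivor / MIB,
        "old_mib": old / MIB,
        "cms_start_mib": cms_start / MIB,
        "max_gc_log_mib": max_logs / MIB,
    }


print(layout(FLAGS))
```

For the options above this gives roughly 341 MiB each for eden and the two survivor spaces, a 48,128 MiB old generation, a CMS start threshold of 36,096 MiB (about 35.25 GiB), and up to 12,800 MiB of rotated GC logs.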

Liang,
________________________________________
From: Azury [ziqidonglai1979@126.com]
Sent: 2012-12-11 14:47
To: user@hbase.apache.org
Subject: Re:Re: RE: Multiple regionservers on a single node


Re:Re: RE: Multiple regionservers on a single node

Posted by Azury <zi...@126.com>.

Can you share your GC command options here?

On 2012-12-11 06:21:08, "Adrien Mogenet" <ad...@gmail.com> wrote:

RE: RE: Multiple regionservers on a single node

Posted by 谢良 <xi...@xiaomi.com>.

I am just an HBase & HotSpot VM newbie :)

1) Before looking into GC details, turn on the tracing flags, e.g. -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:xxxx -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:+PrintClassHistogramAfterFullGC -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintPromotionFailure ...

2) Dive into the GC log for each run: find the root cause of the longest STW pause, and compute statistics such as total GC time and total GC count. Common safepoint causes include GC, biased lock revocation, Deoptimize, FindDeadlocks, PrintJNI, etc.

3) If ParNew (young GC) costs too much, reduce -Xmn and adjust SurvivorRatio/TargetSurvivorRatio/PretenureSizeThreshold...

4) If CMS initial mark and remark are expensive, look at: UseCMSCompactAtFullCollection, CMSInitiatingOccupancyFraction + UseCMSInitiatingOccupancyOnly, CMSParallelRemarkEnabled, CMSClassUnloadingEnabled, CMSMaxAbortablePrecleanTime, CMSWaitDuration, CMSScavengeBeforeRemark.

5) Multi-threaded concurrent collection is key as well when running on modern hardware, e.g. CMSConcurrentMTEnabled/ParallelGCThreads/ConcGCThreads/...
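Step 2 can be partly automated. The following is a minimal Python sketch, an assumption rather than a tool mentioned in the thread, that scans a GC log written with -XX:+PrintGCApplicationStoppedTime and reports the number of stops, their total duration, and the longest one. It assumes each stop is logged on its own line as "Total time for which application threads were stopped: N seconds", possibly with a date-stamp prefix or extra text after it; the names stw_stats and report are hypothetical. These lines cover every safepoint, not only GC, which is why the longest stop still has to be traced back to its cause.

```python
import re

# One line per safepoint when -XX:+PrintGCApplicationStoppedTime is enabled, e.g.
#   Total time for which application threads were stopped: 0.0123450 seconds
# search() tolerates a date-stamp prefix and any trailing text on the line.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def stw_stats(lines):
    """Return (count, total_seconds, longest_seconds) over all logged stops."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pauses.append(float(match.group(1)))
    if not pauses:
        return 0, 0.0, 0.0
    return len(pauses), sum(pauses), max(pauses)


def report(path):
    """Print a one-line summary for a GC log file at the given path."""
    with open(path, encoding="utf-8", errors="replace") as log:
        count, total, longest = stw_stats(log)
    print(f"{count} stops, {total:.3f} s stopped in total, longest {longest:.3f} s")
```

Calling report() on a log such as the regionserver_gc.log file from the options posted in this thread gives a quick baseline to compare between tuning runs; the timestamp of the longest stop can then be matched against the surrounding GC and safepoint-statistics output.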

Finally, reading the code (RTFC) of the HotSpot VM version you run, or asking for help on the hotspot-gc mailing lists, is the best choice for GC issues.

Hope it's helpful for you,
Liang
________________________________________
发件人: Adrien Mogenet [adrien.mogenet@gmail.com]
发送时间: 2012年12月11日 6:21
收件人: user@hbase.apache.org
主题: Re: 答复: Multiple regionservers on a single node


Re: RE: Multiple regionservers on a single node

Posted by Adrien Mogenet <ad...@gmail.com>.

On Fri, Dec 7, 2012 at 10:58 AM, 谢良 <xi...@xiaomi.com> wrote:

Do you have any further explanation of your specific case? Looks
interesting :-)


-- 
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me