
Posted to user@hbase.apache.org by zheng wang <18...@qq.com> on 2019/09/29 08:02:24 UTC

a problem of long STW because of GC ref-proc

Hi,


My live cluster environment is configured as follows:
HBase version: CDH 6.0.1 (Apache HBase 2.0.0)
HBase config: BucketCache (70 GB), block size (16 KB)


Java version: 1.8.0_51
Java config: heap (32 GB), -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+ParallelRefProcEnabled


About once every 1-2 days, a RegionServer hits an old-generation GC that spends 1-2 s in the remark phase:


2019-09-29T01:55:45.186+0800: 365222.053: 
[GC remark 
	2019-09-29T01:55:45.186+0800: 365222.053: 
	[Finalize Marking, 0.0016327 secs] 
	2019-09-29T01:55:45.188+0800: 365222.054: 
	[GC ref-proc
		2019-09-29T01:55:45.188+0800: 365222.054: [SoftReference, 1264586 refs, 0.3151392 secs]
		2019-09-29T01:55:45.503+0800: 365222.370: [WeakReference, 4317 refs, 0.0024381 secs]
		2019-09-29T01:55:45.505+0800: 365222.372: [FinalReference, 9791 refs, 0.0037445 secs]
		2019-09-29T01:55:45.509+0800: 365222.376: [PhantomReference, 0 refs, 1963 refs, 0.0018941 secs]
		2019-09-29T01:55:45.511+0800: 365222.378: [JNI Weak Reference, 0.0001156 secs]
	, 1.4554361 secs] 
	2019-09-29T01:55:46.643+0800: 365223.510: 
	[Unloading, 0.0211370 secs]
, 1.4851728 secs]

The SoftReferences seem to be used by offsetLock in BucketCache. I have two questions:
1: SoftReference processing cost 0.31 s, so why did GC ref-proc cost 1.45 s in total?
2: Is SoftReference a good choice here?
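
To make question 1 concrete, here is a minimal sketch that adds up the per-type reference times from the log above and compares the sum with the ref-proc total. The class and method names are hypothetical, and the regular expression only assumes the log layout shown in this message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the log above; not part of HBase or the JDK.
final class RefProcGap {
    // Matches per-type entries such as "[SoftReference, 1264586 refs, 0.3151392 secs]"
    // and captures the trailing duration in seconds.
    private static final Pattern SUB_PHASE = Pattern.compile(
        "\\[(?:Soft|Weak|Final|Phantom|JNI Weak )Reference,[^\\]]*?([0-9]+\\.[0-9]+) secs\\]");

    /** Sums the durations of all per-type reference entries found in the log text. */
    static double sumSubPhases(String log) {
        double total = 0.0;
        Matcher m = SUB_PHASE.matcher(log);
        while (m.find()) {
            total += Double.parseDouble(m.group(1));
        }
        return total;
    }

    /** Time inside ref-proc that no per-type entry accounts for. */
    static double unaccounted(String log, double refProcTotalSecs) {
        return refProcTotalSecs - sumSubPhases(log);
    }

    public static void main(String[] args) {
        // Per-type entries copied from the 2019-09-29 remark log (timestamps dropped).
        String log =
              "[SoftReference, 1264586 refs, 0.3151392 secs]\n"
            + "[WeakReference, 4317 refs, 0.0024381 secs]\n"
            + "[FinalReference, 9791 refs, 0.0037445 secs]\n"
            + "[PhantomReference, 0 refs, 1963 refs, 0.0018941 secs]\n"
            + "[JNI Weak Reference, 0.0001156 secs]\n";
        double refProcTotal = 1.4554361; // the ", 1.4554361 secs]" closing GC ref-proc
        System.out.printf("per-type sum: %.7f s%n", sumSubPhases(log));
        System.out.printf("unaccounted:  %.7f s%n", unaccounted(log, refProcTotal));
    }
}
```

The per-type entries add up to roughly 0.32 s, which leaves about 1.13 s of the 1.46 s ref-proc time without a matching log entry. The timestamps agree: the last per-type entry is logged at 365222.378 and Unloading only starts at 365223.510.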

Re: a problem of long STW because of GC ref-proc

Posted by ramkrishna vasudevan <ra...@gmail.com>.

Hi

Thanks Zheng for pinging here.
I have not delved deeply into this offset lock and its soft reference.
I think that after Zheng's suggestion the STW came down a lot once the
block size was set to 64 KB, because the number of blocks drops and so
does the number of soft references. But the pause still seems too long
for the user. I think it is now worth checking the impact of this,
particularly since we suggest bigger bucket caches. Will be back.

Regards
Ram



Re: a problem of long STW because of GC ref-proc

Posted by OpenInx <op...@gmail.com>.

OK, so the huge number of SoftReferences from the offsetLock for each block is
still the main problem.
I'm not sure whether there is a G1 option that can help reduce the long
STW.
One solution I can imagine for now: limit the BucketCache size for a single
RS. For example, the 70 GB BucketCache may need to be split across two RSs.

As far as I know, Anoop and Ram have good experience running huge
bucket caches. Pinging Anoop and Ramkrishna:
any thoughts about this GC issue?



Re: a problem of long STW because of GC ref-proc

Posted by zheng wang <18...@qq.com>.

Even with the block size set to 64 KB, there are still more than 1,000,000 soft references, and the pause is still too long.


This "GC ref-proc" processed about 500,000 soft references and cost about 700 ms:


2019-09-18T03:16:42.088+0800: 125161.477: 
[GC remark 
	2019-09-18T03:16:42.088+0800: 125161.477: 
	[Finalize Marking, 0.0018076 secs] 
	2019-09-18T03:16:42.089+0800: 125161.479: 
	[GC ref-proc
		2019-09-18T03:16:42.089+0800: 125161.479: [SoftReference, 499278 refs, 0.1382086 secs]
		2019-09-18T03:16:42.228+0800: 125161.617: [WeakReference, 3750 refs, 0.0049171 secs]
		2019-09-18T03:16:42.233+0800: 125161.622: [FinalReference, 1040 refs, 0.0009375 secs]
		2019-09-18T03:16:42.234+0800: 125161.623: [PhantomReference, 0 refs, 21921 refs, 0.0058014 secs]
		2019-09-18T03:16:42.239+0800: 125161.629: [JNI Weak Reference, 0.0001070 secs]
	, 0.6667733 secs] 
	2019-09-18T03:16:42.756+0800: 125162.146: 
	[Unloading, 0.0224078 secs]
, 0.6987032 secs] 



Re: a problem of long STW because of GC ref-proc

Posted by OpenInx <op...@gmail.com>.

I don't think a 100% get workload is the right reason for choosing 16 KB, because when you
read a block, there is a good chance that you will also
read adjacent cells in the same block. I think caching a 16 KB
block versus a 64 KB block in the BucketCache won't
make a big difference. (But if your cells are quite small, a 64 KB block
will hold a great many cells,
and a smaller block size will be better because we search the cells in
a block one by one, which is O(N).)
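
As a rough illustration of the O(N) point, the sketch below estimates how many cells a linear search would examine per block. The class name, the uniform-target assumption, and the 64-byte and 1024-byte cell sizes are all made up for illustration; real blocks also carry per-cell and per-block overhead that is ignored here.

```java
// Hypothetical back-of-the-envelope helper; not part of HBase.
final class BlockScanCost {
    /** Approximate number of cells in one block, ignoring all encoding overhead. */
    static long cellsPerBlock(long blockBytes, long cellBytes) {
        if (blockBytes <= 0 || cellBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return Math.max(1, blockBytes / cellBytes);
    }

    /** Expected cells examined by a linear scan when the target position is uniformly random. */
    static double expectedScan(long cells) {
        return (cells + 1) / 2.0;
    }

    public static void main(String[] args) {
        long[] blockSizes = {16 * 1024, 64 * 1024};
        long[] cellSizes = {64, 1024}; // illustrative cell sizes only
        for (long cell : cellSizes) {
            for (long block : blockSizes) {
                long cells = cellsPerBlock(block, cell);
                System.out.printf("cell=%d B, block=%d KB: ~%d cells, ~%.1f examined per get%n",
                    cell, block / 1024, cells, expectedScan(cells));
            }
        }
    }
}
```

Under these assumptions the per-get scan cost grows in proportion to the block size, so the gap between 16 KB and 64 KB blocks matters most when cells are small.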



Re: a problem of long STW because of GC ref-proc

Posted by zheng wang <18...@qq.com>.

Yes, your advice would mitigate it, but our workload consists only of get requests, so 16 KB is better for us.
IMO the offset locks will always be in use, so would a strong reference be a better choice?
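
To show what a strong-reference alternative could look like, here is a minimal sketch of a fixed pool of locks striped by block offset. This is a hypothetical design, not how HBase implements its offset lock: the class name, the stripe count, and the hash-mixing constant are all assumptions made for illustration.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a fixed number of strongly referenced locks shared by all offsets.
final class StripedOffsetLock {
    private final ReentrantReadWriteLock[] stripes;

    StripedOffsetLock(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new ReentrantReadWriteLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    /** Returns the lock guarding the given offset; the same offset always maps to the same lock. */
    ReentrantReadWriteLock getLock(long offset) {
        // Offsets are typically aligned, so mix the bits before picking a stripe.
        long mixed = offset * 0x9E3779B97F4A7C15L;
        int index = (int) ((mixed >>> 32) % stripes.length);
        return stripes[index];
    }

    public static void main(String[] args) {
        StripedOffsetLock locks = new StripedOffsetLock(1024);
        ReentrantReadWriteLock lock = locks.getLock(16 * 1024L);
        lock.readLock().lock();
        try {
            System.out.println("reading block at offset 16384 under a striped lock");
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The trade-off in this sketch is that the number of lock objects stays constant no matter how many blocks are cached, and the GC has no Reference objects to process for them, but two different offsets can share a stripe and contend with each other.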





Re: a problem of long STW because of GC ref-proc

Posted by OpenInx <op...@gmail.com>.

It seems your block size is very small (16 KB), so there will be
70*1024*1024/16 = 4587520 blocks (at most) in your BucketCache.
For each block, the RS maintains a soft-reference idLock and a
BucketEntry in its bucket cache. So maybe you can try
enlarging the block size?
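
The arithmetic above can be checked with a small sketch. The class name is hypothetical, and the result is an upper bound that ignores any per-bucket overhead.

```java
// Hypothetical helper, not part of HBase: upper bound on the number of cached blocks.
final class BucketCacheBlockEstimate {
    /** Upper bound on blocks that fit in the cache, ignoring bucket overhead. */
    static long maxBlocks(long cacheBytes, long blockBytes) {
        if (blockBytes <= 0) {
            throw new IllegalArgumentException("blockBytes must be positive");
        }
        return cacheBytes / blockBytes;
    }

    public static void main(String[] args) {
        long cacheBytes = 70L * 1024 * 1024 * 1024; // the 70 GB BucketCache from this thread
        System.out.println("16 KB blocks: " + maxBlocks(cacheBytes, 16 * 1024)); // 4587520
        System.out.println("64 KB blocks: " + maxBlocks(cacheBytes, 64 * 1024)); // 1146880
    }
}
```

With one soft reference per block, moving from 16 KB to 64 KB cuts the upper bound from about 4.6 million to about 1.1 million references.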
