Posted to dev@hbase.apache.org by Ramkrishna S Vasudevan <ra...@huawei.com> on 2011/12/01 03:51:19 UTC

RE: Suspected memory leak

Adding dev list to get some suggestions.

Regards
Ram


-----Original Message-----
From: Shrijeet Paliwal [mailto:shrijeet@rocketfuel.com] 
Sent: Thursday, December 01, 2011 8:08 AM
To: user@hbase.apache.org
Cc: Gaojinchao; Chenjian
Subject: Re: Suspected memory leak

Jieshan,
We backported https://issues.apache.org/jira/browse/HBASE-2937 to 0.90.3

-Shrijeet


2011/11/30 bijieshan <bi...@huawei.com>

> Hi Shrijeet,
>
> I think that JIRA is relevant to trunk, but not to 0.90.x, since there's no
> timeout mechanism in 0.90.x. Right?
> We found this problem in 0.90.x.
>
> Thanks,
>
> Jieshan.
>
> -----Original Message-----
> From: Shrijeet Paliwal [mailto:shrijeet@rocketfuel.com]
> Sent: December 1, 2011 10:26
> To: user@hbase.apache.org
> Cc: Gaojinchao; Chenjian
> Subject: Re: Suspected memory leak
>
> Gaojinchao,
>
> I had filed this some time ago,
> https://issues.apache.org/jira/browse/HBASE-4633
> But after some recent insights into our application code, I am inclined to
> think the leak (or memory 'hold') is in our application. Still, it would be
> good to check either way.
> I need to update the JIRA with my findings. See if the description of the
> issue I posted there matches yours. If not, maybe you can add your story
> in detail.
>
> -Shrijeet
>
> 2011/11/30 Gaojinchao <ga...@huawei.com>
>
> > I have noticed some memory leak problems in my HBase client.
> > RES has increased to 27g
> > PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 12676 root      20   0 30.8g  27g 5092 S    2 57.5 587:57.76
> > /opt/java/jre/bin/java -Djava.library.path=lib/.
> >
> > But I am not sure whether the leak comes from the HBase client jar itself
> > or just from our client code.
> >
> > These are some of the JVM parameters:
> > :-Xms15g -Xmn12g -Xmx15g -XX:PermSize=64m -XX:+UseParNewGC
> > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65
> > -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=1
> > -XX:+CMSParallelRemarkEnabled
> >
> > Who has experience with this case? I need to continue digging :)
> >
> >
> >
> > From: Gaojinchao
> > Sent: November 30, 2011 11:02
> > To: user@hbase.apache.org
> > Subject: Suspected memory leak
> >
> > In the HBaseClient process, I found that the heap keeps increasing.
> > I used the command 'cat smaps' to get the heap size.
> > It seems that when the thread pool in HTable has released its idle
> > threads, and you then use the put(List) API to put data again, the memory
> > increases.
> >
> > Who has experience in this case?
> >
> > Below is the heap of the HBase client:
> > C3S31:/proc/18769 # cat smaps
> > 4010a000-4709d000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             114252 kB
> > Rss:              114044 kB
> > Pss:              114044 kB
> >
> > 4010a000-4709d000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             114252 kB
> > Rss:              114044 kB
> > Pss:              114044 kB
> >
> > 4010a000-48374000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             133544 kB
> > Rss:              133336 kB
> > Pss:              133336 kB
> >
> > 4010a000-49f20000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             161880 kB
> > Rss:              161672 kB
> > Pss:              161672 kB
> >
> > 4010a000-4c5de000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             201552 kB
> > Rss:              201344 kB
> > Pss:              201344 kB
> >
>
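
For reference, here is a minimal sketch of the kind of client write loop described in that last message: repeated HTable.put(List<Put>) batches with idle pauses in between, so the HTable thread pool can release its idle threads. It uses the 0.90-era client API; the table and column names are made up for illustration, and this is not the original reproduction code.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutListLoop {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "test_table");   // hypothetical table
        byte[] cf = Bytes.toBytes("cf");                 // hypothetical column family
        for (int round = 0; round < 100; round++) {
          List<Put> batch = new ArrayList<Put>();
          for (int i = 0; i < 10000; i++) {
            Put p = new Put(Bytes.toBytes("row-" + round + "-" + i));
            p.add(cf, Bytes.toBytes("q"), new byte[1024]);
            batch.add(p);
          }
          table.put(batch);       // the put(List) API mentioned above
          table.flushCommits();
          Thread.sleep(60000);    // idle long enough for pooled threads to be released
        }
        table.close();
      }
    }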


Re: FeedbackRe: Suspected memory leak

Posted by Gaojinchao <ga...@huawei.com>.
Some information has been updated in HBASE-4633.


-----Original Message-----
From: Gaojinchao [mailto:gaojinchao@huawei.com]
Sent: December 5, 2011 8:45
To: dev@hbase.apache.org; lars hofhansl
Subject: Re: FeedbackRe: Suspected memory leak

I have attached the stack in https://issues.apache.org/jira/browse/HBASE-4633.
I will update our story.


-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: December 5, 2011 7:37
To: dev@hbase.apache.org; lars hofhansl
Subject: Re: FeedbackRe: Suspected memory leak

I looked through TRUNK and 0.90 code but didn't find
HBaseClient.Connection.setParam().
The method should be sendParam().

When I was in China I tried to access Jonathan's post but wasn't able to.

If Jinchao's stack trace resonates with the one Jonathan posted, we should
consider using netty for HBaseClient.

Cheers

On Sun, Dec 4, 2011 at 1:12 PM, lars hofhansl <lh...@yahoo.com> wrote:

> I think HBASE-4508 is unrelated.
> The "connections" I referring to are HBaseClient.Connection objects (not
> HConnections).
> It turns out that HBaseClient.Connection.setParam is actually called
> directly by the client threads, which means we can get
> an unlimited amount of DirectByteBuffers (until we get a full GC).
>
> The JDK will cache 3 per thread with a size necessary to serve the IO. So
> sending some large requests from many threads
> will lead to OOM.
>
> I think that was a related thread that Stack forwarded a while back from
> the asynchbase mailing lists.
>
> Jinchao, could you add a text version (not a png image, please :-) ) of
> this to the jira?
>
>
> -- Lars
>
>
>
> ----- Original Message -----
> From: Ted Yu <yu...@gmail.com>
> To: dev@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
> Cc: Gaojinchao <ga...@huawei.com>; Chenjian <je...@huawei.com>;
> wenzaohua <we...@huawei.com>
> Sent: Sunday, December 4, 2011 12:43 PM
> Subject: Re: FeedbackRe: Suspected memory leak
>
> I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because
> 0.90.5 hasn't been released.
> Assuming the NIO consumption is related to the number of connections from
> the client side, it would help to perform benchmarking on 0.90.5.
>
> Jinchao:
> Please attach stack trace to HBASE-4633 so that we can verify our
> assumptions.
>
> Thanks
>
> On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <lh...@yahoo.com>
> wrote:
>
> > Thanks. Now the question is: How many connection threads do we have?
> >
> > I think there is one per regionserver, which would indeed be a problem.
> > Need to look at the code again (I'm only partially familiar with the
> > client code).
> >
> > Either the client should chunk (like the server does), or there should be
> > a limited number of threads that
> > perform IO on behalf of the client (or both).
> >
> > -- Lars
> >
> >
> > ----- Original Message -----
> > From: Gaojinchao <ga...@huawei.com>
> > To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
> > lhofhansl@yahoo.com>
> > Cc: Chenjian <je...@huawei.com>; wenzaohua <wenzaohua@huawei.com
> >
> > Sent: Saturday, December 3, 2011 11:22 PM
> > Subject: Re: FeedbackRe: Suspected memory leak
> >
> > This is dump stack.
> >
> >
> > -----邮件原件-----
> > 发件人: lars hofhansl [mailto:lhofhansl@yahoo.com]
> > 发送时间: 2011年12月4日 14:15
> > 收件人: dev@hbase.apache.org
> > 抄送: Chenjian; wenzaohua
> > 主题: Re: FeedbackRe: Suspected memory leak
> >
> > Dropping user list.
> >
> > Could you (or somebody) point me to where the client is using NIO?
> > I'm looking at HBaseClient and I do not see references to NIO; it also
> > seems that all work is handed off to
> > separate threads (HBaseClient.Connection), and the JDK will not cache more
> > than 3 direct buffers per thread.
> >
> > It's possible (likely?) that I missed something in the code.
> >
> > Thanks.
> >
> > -- Lars
> >
> > ________________________________
> > From: Gaojinchao <ga...@huawei.com>
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>; "
> dev@hbase.apache.org"
> > <de...@hbase.apache.org>
> > Cc: Chenjian <je...@huawei.com>; wenzaohua <wenzaohua@huawei.com
> >
> > Sent: Saturday, December 3, 2011 7:57 PM
> > Subject: FeedbackRe: Suspected memory leak
> >
> > Thank you for your help.
> >
> > This issue appears to be a configuration problem:
> > 1. The HBase client uses the NIO (socket) API, which uses direct memory.
> > 2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so
> > if there is no full GC, the direct memory cannot be reclaimed.
> > Unfortunately, the GC configuration parameters of our client do not produce
> > any full GC.
> >
> > This is only a preliminary result; all tests are still running. If we have
> > any further results, we will feed them back.
> > Finally, I will update our story in
> > https://issues.apache.org/jira/browse/HBASE-4633.
> >
> > If our digging is correct, should we set a default value for
> > "-XX:MaxDirectMemorySize" to prevent this situation?
> >
> >
> > Thanks
> >
> > -----Original Message-----
> > From: bijieshan [mailto:bijieshan@huawei.com]
> > Sent: December 2, 2011 15:37
> > To: dev@hbase.apache.org; user@hbase.apache.org
> > Cc: Chenjian; wenzaohua
> > Subject: Re: Suspected memory leak
> >
> > Thank you all.
> > I think it's the same problem as in the link provided by Stack, because the
> > heap size has stabilized while the non-heap size keeps growing. So I don't
> > think it is the CMS GC bug.
> > And we have examined the content of the problem memory section; all the
> > records contain info like the following:
> >
> >
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
> > "BBZHtable_UFDR_058,048342220093168-02570"
> > ........
> >
> > Jieshan.
> >
> > -----Original Message-----
> > From: Kihwal Lee [mailto:kihwal@yahoo-inc.com]
> > Sent: December 2, 2011 4:20
> > To: dev@hbase.apache.org
> > Cc: Ramakrishna s vasudevan; user@hbase.apache.org
> > Subject: Re: Suspected memory leak
> >
> > Adding to the excellent write-up by Jonathan:
> > Since finalizer is involved, it takes two GC cycles to collect them.  Due
> > to a bug/bugs in the CMS GC, collection may not happen and the heap can
> > grow really big.  See
> > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
> >
> > Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the
> socket
> > related objects were being collected properly. This option forces the
> > concurrent marker to be one thread. This was for HDFS, but I think the
> same
> > applies here.
> >
> > Kihwal
> >
> > On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:
> >
> > Make sure it's not the issue that Jonathan Payne identified a while
> > back:
> >
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
> > St.Ack
> >
> >
>
>

Re: FeedbackRe: Suspected memory leak

Posted by Gaojinchao <ga...@huawei.com>.
OK. Does anyone have a better solution? Do we need to cover this in the book?



Re: FeedbackRe: Suspected memory leak

Posted by Ted Yu <yu...@gmail.com>.
Jinchao:
Since we found the workaround, can you summarize the following statistics
on HBASE-4633?

Thanks

2011/12/4 Gaojinchao <ga...@huawei.com>

> Yes, I have tested; the system is fine.
> Nearly every hour, a full GC is triggered.
> 10022.210: [Full GC (System) 10022.210: [Tenured:
> 577566K->257349K(1048576K), 1.7515610 secs] 9651924K->257349K(14260672K),
> [Perm : 19161K->19161K(65536K)], 1.7518350 secs] [Times: user=1.75
> sys=0.00, real=1.75 secs]
> .........
>
> .........
> 13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K),
> 0.1414370 secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times:
> user=1.90 sys=0.01, real=0.14 secs]
> 13624.630: [Full GC (System) 13624.630: [Tenured:
> 310202K->175378K(1048576K), 1.9529280 secs] 11581276K->175378K(14260672K),
> [Perm : 19225K->19225K(65536K)], 1.9531660 secs]
>           [Times: user=1.94 sys=0.00, real=1.96 secs]
>
> 7543 root      20   0 17.0g  15g 9892 S    0 32.9   1184:34 java
> 7543 root      20   0 17.0g  15g 9892 S    1 32.9   1184:34 java
>

Re: FeedbackRe: Suspected memory leak

Posted by Gaojinchao <ga...@huawei.com>.
Yes, I have tested; the system is fine.
Nearly every hour, a full GC is triggered.
10022.210: [Full GC (System) 10022.210: [Tenured: 577566K->257349K(1048576K), 1.7515610 secs] 9651924K->257349K(14260672K), [Perm : 19161K->19161K(65536K)], 1.7518350 secs] [Times: user=1.75 sys=0.00, real=1.75 secs]
.........

.........
13532.930: [GC 13532.931: [ParNew: 12801558K->981626K(13212096K), 0.1414370 secs] 13111752K->1291828K(14260672K), 0.1416880 secs] [Times: user=1.90 sys=0.01, real=0.14 secs]
13624.630: [Full GC (System) 13624.630: [Tenured: 310202K->175378K(1048576K), 1.9529280 secs] 11581276K->175378K(14260672K), [Perm : 19225K->19225K(65536K)], 1.9531660 secs] 
           [Times: user=1.94 sys=0.00, real=1.96 secs]

7543 root      20   0 17.0g  15g 9892 S    0 32.9   1184:34 java
7543 root      20   0 17.0g  15g 9892 S    1 32.9   1184:34 java


Re: FeedbackRe: Suspected memory leak

Posted by Ted Yu <yu...@gmail.com>.
Can you try specifying -XX:MaxDirectMemorySize with a moderate value and see
if the leak gets under control?

Thanks
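
For illustration, such a cap could be added to the client JVM options quoted earlier in this thread; the 256m value below is an arbitrary placeholder to be tuned by testing, not a recommendation:

    -Xms15g -Xmn12g -Xmx15g -XX:MaxDirectMemorySize=256m -XX:PermSize=64m
    -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65
    -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=1
    -XX:+CMSParallelRemarkEnabled

On a Java 7+ JVM, direct-buffer usage can then be watched via the "direct" BufferPoolMXBean (java.lang.management) to confirm whether the cap holds.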


Re: FeedbackRe: Suspected memory leak

Posted by Gaojinchao <ga...@huawei.com>.
I have attached the stack in https://issues.apache.org/jira/browse/HBASE-4633.
I will update our story.



Re: FeedbackRe: Suspected memory leak

Posted by Ted Yu <yu...@gmail.com>.
I looked through TRUNK and 0.90 code but didn't find
HBaseClient.Connection.setParam().
The method should be sendParam().

When I was in China I tried to access Jonathan's post but wasn't able to.

If Jinchao's stack trace resonates with the one Jonathan posted, we should
consider using netty for HBaseClient.

Cheers


Re: FeedbackRe: Suspected memory leak

Posted by lars hofhansl <lh...@yahoo.com>.
I think HBASE-4508 is unrelated.
The "connections" I referring to are HBaseClient.Connection objects (not HConnections).
It turns out that HBaseClient.Connection.setParam is actually called directly by the client threads, which means we can get
an unlimited amount of DirectByteBuffers (until we get a full GC).

The JDK will cache 3 per thread with a size necessary to serve the IO. So sending some large requests from many thread
will lead to OOM.
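
Below is a minimal, self-contained sketch of that JDK behaviour (hypothetical demo code, not HBase client code; the thread count, request size, and class name are made up for illustration). Each writer thread hands a large heap ByteBuffer to an NIO channel, and the JDK copies it into a cached per-thread direct buffer of the same size:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical demo, not HBase code: writing a *heap* ByteBuffer to an NIO
// channel makes the JDK copy it into a cached per-thread *direct* buffer,
// sized to the write. Many writer threads x large requests ~= that much
// direct memory held until a full GC.
public class DirectBufferGrowthDemo {
  public static void main(String[] args) throws Exception {
    ServerSocketChannel server =
        ServerSocketChannel.open().bind(new InetSocketAddress("127.0.0.1", 0));
    int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

    // Acceptor/reader side: drain whatever the writers send.
    Thread acceptor = new Thread(() -> {
      try {
        while (true) {
          SocketChannel c = server.accept();
          Thread reader = new Thread(() -> {
            ByteBuffer sink = ByteBuffer.allocate(1 << 20);
            try { while (c.read(sink) >= 0) sink.clear(); } catch (IOException ignored) { }
          });
          reader.setDaemon(true);
          reader.start();
        }
      } catch (IOException ignored) { }
    });
    acceptor.setDaemon(true);
    acceptor.start();

    final int writers = 16;            // e.g. one writer per regionserver being written to
    final int requestSize = 32 << 20;  // 32 MB heap buffer per writer thread
    for (int i = 0; i < writers; i++) {
      new Thread(() -> {
        try (SocketChannel ch =
                 SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
          ByteBuffer heapBuf = ByteBuffer.wrap(new byte[requestSize]);
          // The JDK copies heapBuf into a per-thread direct buffer of
          // requestSize bytes before the actual socket write, and caches it.
          while (heapBuf.hasRemaining()) {
            ch.write(heapBuf);
          }
          Thread.sleep(60_000);        // keep the thread (and its cached buffer) alive
        } catch (IOException | InterruptedException ignored) { }
      }).start();
    }
  }
}

Run with something like -Xmx1g and watch the process RES/direct memory grow, or add -XX:MaxDirectMemorySize=64m, which typically makes the later writers fail with "OutOfMemoryError: Direct buffer memory" instead of growing unbounded.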

I think that was a related thread that Stack forwarded a while back from the asynchbase mailing lists.

Jinchao, could you add a text version (not a png image, please :-) ) of this to the jira?


-- Lars



----- Original Message -----
From: Ted Yu <yu...@gmail.com>
To: dev@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Cc: Gaojinchao <ga...@huawei.com>; Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com>
Sent: Sunday, December 4, 2011 12:43 PM
Subject: Re: FeedbackRe: Suspected memory leak

I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because
0.90.5 hasn't been released.
Assuming the NIO consumption is related to the number of connections from
client side, it would help to perform benchmarking on 0.90.5

Jinchao:
Please attach stack trace to HBASE-4633 so that we can verify our
assumptions.

Thanks

On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Thanks. Now the question is: How many connection threads do we have?
>
> I think there is one per regionserver, which would indeed be a problem.
> Need to look at the code again (I'm only partially familiar with the
> client code).
>
> Either the client should chunk (like the server does), or there should be
> a limited number of thread that
> perform IO on behalf of the client (or both).
>
> -- Lars
>
>
> ----- Original Message -----
> From: Gaojinchao <ga...@huawei.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
> lhofhansl@yahoo.com>
> Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com>
> Sent: Saturday, December 3, 2011 11:22 PM
> Subject: Re: FeedbackRe: Suspected memory leak
>
> This is dump stack.
>
>
> -----邮件原件-----
> 发件人: lars hofhansl [mailto:lhofhansl@yahoo.com]
> 发送时间: 2011年12月4日 14:15
> 收件人: dev@hbase.apache.org
> 抄送: Chenjian; wenzaohua
> 主题: Re: FeedbackRe: Suspected memory leak
>
> Dropping user list.
>
> Could you (or somebody) point me to where the client is using NIO?
> I'm looking at HBaseClient and I do not see references to NIO, also it
> seems that all work is handed off to
> separate threads: HBaseClient.Connection, and the JDK will not cache more
> than 3 direct buffers per thread.
>
> It's possible (likely?) that I missed something in the code.
>
> Thanks.
>
> -- Lars
>
> ________________________________
> From: Gaojinchao <ga...@huawei.com>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>; "dev@hbase.apache.org"
> <de...@hbase.apache.org>
> Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com>
> Sent: Saturday, December 3, 2011 7:57 PM
> Subject: FeedbackRe: Suspected memory leak
>
> Thank you for your help.
>
> This issue appears to be a configuration problem:
> 1. HBase client uses NIO(socket) API that uses the direct memory.
> 2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if
> there doesn't have "full gc", all direct memory can't reclaim.
> Unfortunately, using GC confiugre parameter of our client doesn't produce
> any "full gc".
>
> This is only a preliminary result,  All tests is running, If have any
> further results , we will be fed back.
> Finally , I will update our story to issue
> https://issues.apache.org/jira/browse/HBASE-4633.
>
> If our digging is crrect, whether we should set a default value for the
> "-XXMaxDirectMemorySize" to prevent this situation?
>
>
> Thanks
>
> -----邮件原件-----
> 发件人: bijieshan [mailto:bijieshan@huawei.com]
> 发送时间: 2011年12月2日 15:37
> 收件人: dev@hbase.apache.org; user@hbase.apache.org
> 抄送: Chenjian; wenzaohua
> 主题: Re: Suspected memory leak
>
> Thank you all.
> I think it's the same problem with the link provided by Stack. Because the
> heap-size is stabilized, but the non-heap size keep growing. So I think not
> the problem of the CMS GC bug.
> And we have known the content of the problem memory section, all the
> records contains the info like below:
>
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
> "BBZHtable_UFDR_058,048342220093168-02570"
> ........
>
> Jieshan.
>
> -----邮件原件-----
> 发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com]
> 发送时间: 2011年12月2日 4:20
> 收件人: dev@hbase.apache.org
> 抄送: Ramakrishna s vasudevan; user@hbase.apache.org
> 主题: Re: Suspected memory leak
>
> Adding to the excellent write-up by Jonathan:
> Since finalizer is involved, it takes two GC cycles to collect them.  Due
> to a bug/bugs in the CMS GC, collection may not happen and the heap can
> grow really big.  See
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
>
> Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket
> related objects were being collected properly. This option forces the
> concurrent marker to be one thread. This was for HDFS, but I think the same
> applies here.
>
> Kihwal
>
> On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:
>
> Make sure its not the issue that Jonathan Payne identifiied a while
> back:
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
> St.Ack
>
>


Re: FeedbackRe: Suspected memory leak

Posted by Ted Yu <yu...@gmail.com>.
I think Jinchao wasn't using HBASE-4508 in his 0.90 distribution because
0.90.5 hasn't been released.
Assuming the NIO consumption is related to the number of connections from
the client side, it would help to benchmark on 0.90.5.

Jinchao:
Please attach stack trace to HBASE-4633 so that we can verify our
assumptions.

Thanks

On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Thanks. Now the question is: How many connection threads do we have?
>
> I think there is one per regionserver, which would indeed be a problem.
> Need to look at the code again (I'm only partially familiar with the
> client code).
>
> Either the client should chunk (like the server does), or there should be
> a limited number of thread that
> perform IO on behalf of the client (or both).
>
> -- Lars
>
>
> ----- Original Message -----
> From: Gaojinchao <ga...@huawei.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <
> lhofhansl@yahoo.com>
> Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com>
> Sent: Saturday, December 3, 2011 11:22 PM
> Subject: Re: FeedbackRe: Suspected memory leak
>
> This is dump stack.
>
>
> -----邮件原件-----
> 发件人: lars hofhansl [mailto:lhofhansl@yahoo.com]
> 发送时间: 2011年12月4日 14:15
> 收件人: dev@hbase.apache.org
> 抄送: Chenjian; wenzaohua
> 主题: Re: FeedbackRe: Suspected memory leak
>
> Dropping user list.
>
> Could you (or somebody) point me to where the client is using NIO?
> I'm looking at HBaseClient and I do not see references to NIO, also it
> seems that all work is handed off to
> separate threads: HBaseClient.Connection, and the JDK will not cache more
> than 3 direct buffers per thread.
>
> It's possible (likely?) that I missed something in the code.
>
> Thanks.
>
> -- Lars
>
> ________________________________
> From: Gaojinchao <ga...@huawei.com>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>; "dev@hbase.apache.org"
> <de...@hbase.apache.org>
> Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com>
> Sent: Saturday, December 3, 2011 7:57 PM
> Subject: FeedbackRe: Suspected memory leak
>
> Thank you for your help.
>
> This issue appears to be a configuration problem:
> 1. HBase client uses NIO(socket) API that uses the direct memory.
> 2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if
> there doesn't have "full gc", all direct memory can't reclaim.
> Unfortunately, using GC confiugre parameter of our client doesn't produce
> any "full gc".
>
> This is only a preliminary result,  All tests is running, If have any
> further results , we will be fed back.
> Finally , I will update our story to issue
> https://issues.apache.org/jira/browse/HBASE-4633.
>
> If our digging is crrect, whether we should set a default value for the
> "-XXMaxDirectMemorySize" to prevent this situation?
>
>
> Thanks
>
> -----邮件原件-----
> 发件人: bijieshan [mailto:bijieshan@huawei.com]
> 发送时间: 2011年12月2日 15:37
> 收件人: dev@hbase.apache.org; user@hbase.apache.org
> 抄送: Chenjian; wenzaohua
> 主题: Re: Suspected memory leak
>
> Thank you all.
> I think it's the same problem with the link provided by Stack. Because the
> heap-size is stabilized, but the non-heap size keep growing. So I think not
> the problem of the CMS GC bug.
> And we have known the content of the problem memory section, all the
> records contains the info like below:
>
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
> "BBZHtable_UFDR_058,048342220093168-02570"
> ........
>
> Jieshan.
>
> -----邮件原件-----
> 发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com]
> 发送时间: 2011年12月2日 4:20
> 收件人: dev@hbase.apache.org
> 抄送: Ramakrishna s vasudevan; user@hbase.apache.org
> 主题: Re: Suspected memory leak
>
> Adding to the excellent write-up by Jonathan:
> Since finalizer is involved, it takes two GC cycles to collect them.  Due
> to a bug/bugs in the CMS GC, collection may not happen and the heap can
> grow really big.  See
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
>
> Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket
> related objects were being collected properly. This option forces the
> concurrent marker to be one thread. This was for HDFS, but I think the same
> applies here.
>
> Kihwal
>
> On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:
>
> Make sure its not the issue that Jonathan Payne identifiied a while
> back:
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
> St.Ack
>
>

Re: FeedbackRe: Suspected memory leak

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Did Gaojinchao attach the stack dump, and did people receive it (Lars?)?
Could someone, or Gaojinchao, attach it to the jira?

-Shrijeet

On Sun, Dec 4, 2011 at 12:24 PM, lars hofhansl <lh...@yahoo.com> wrote:
>
> Thanks. Now the question is: How many connection threads do we have?
>
> I think there is one per regionserver, which would indeed be a problem.
> Need to look at the code again (I'm only partially familiar with the client code).
>
> Either the client should chunk (like the server does), or there should be a limited number of thread that
> perform IO on behalf of the client (or both).
>
> -- Lars
>
>
> ----- Original Message -----
> From: Gaojinchao <ga...@huawei.com>
> To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>
> Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com>
> Sent: Saturday, December 3, 2011 11:22 PM
> Subject: Re: FeedbackRe: Suspected memory leak
>
> This is dump stack.
>
>
> -----邮件原件-----
> 发件人: lars hofhansl [mailto:lhofhansl@yahoo.com]
> 发送时间: 2011年12月4日 14:15
> 收件人: dev@hbase.apache.org
> 抄送: Chenjian; wenzaohua
> 主题: Re: FeedbackRe: Suspected memory leak
>
> Dropping user list.
>
> Could you (or somebody) point me to where the client is using NIO?
> I'm looking at HBaseClient and I do not see references to NIO, also it seems that all work is handed off to
> separate threads: HBaseClient.Connection, and the JDK will not cache more than 3 direct buffers per thread.
>
> It's possible (likely?) that I missed something in the code.
>
> Thanks.
>
> -- Lars
>
> ________________________________
> From: Gaojinchao <ga...@huawei.com>
> To: "user@hbase.apache.org" <us...@hbase.apache.org>; "dev@hbase.apache.org" <de...@hbase.apache.org>
> Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com>
> Sent: Saturday, December 3, 2011 7:57 PM
> Subject: FeedbackRe: Suspected memory leak
>
> Thank you for your help.
>
> This issue appears to be a configuration problem:
> 1. HBase client uses NIO(socket) API that uses the direct memory.
> 2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if there doesn't have "full gc", all direct memory can't reclaim. Unfortunately, using GC confiugre parameter of our client doesn't produce any "full gc".
>
> This is only a preliminary result,  All tests is running, If have any further results , we will be fed back.
> Finally , I will update our story to issue https://issues.apache.org/jira/browse/HBASE-4633.
>
> If our digging is crrect, whether we should set a default value for the "-XXMaxDirectMemorySize" to prevent this situation?
>
>
> Thanks
>
> -----邮件原件-----
> 发件人: bijieshan [mailto:bijieshan@huawei.com]
> 发送时间: 2011年12月2日 15:37
> 收件人: dev@hbase.apache.org; user@hbase.apache.org
> 抄送: Chenjian; wenzaohua
> 主题: Re: Suspected memory leak
>
> Thank you all.
> I think it's the same problem with the link provided by Stack. Because the heap-size is stabilized, but the non-heap size keep growing. So I think not the problem of the CMS GC bug.
> And we have known the content of the problem memory section, all the records contains the info like below:
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
> "BBZHtable_UFDR_058,048342220093168-02570"
> ........
>
> Jieshan.
>
> -----邮件原件-----
> 发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com]
> 发送时间: 2011年12月2日 4:20
> 收件人: dev@hbase.apache.org
> 抄送: Ramakrishna s vasudevan; user@hbase.apache.org
> 主题: Re: Suspected memory leak
>
> Adding to the excellent write-up by Jonathan:
> Since finalizer is involved, it takes two GC cycles to collect them.  Due to a bug/bugs in the CMS GC, collection may not happen and the heap can grow really big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
>
> Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket related objects were being collected properly. This option forces the concurrent marker to be one thread. This was for HDFS, but I think the same applies here.
>
> Kihwal
>
> On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:
>
> Make sure its not the issue that Jonathan Payne identifiied a while
> back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
> St.Ack
>

Re: FeedbackRe: Suspected memory leak

Posted by lars hofhansl <lh...@yahoo.com>.
Thanks. Now the question is: How many connection threads do we have?

I think there is one per regionserver, which would indeed be a problem.
Need to look at the code again (I'm only partially familiar with the client code).

Either the client should chunk (like the server does), or there should be a limited number of threads that
perform IO on behalf of the client (or both).

-- Lars


----- Original Message -----
From: Gaojinchao <ga...@huawei.com>
To: "dev@hbase.apache.org" <de...@hbase.apache.org>; lars hofhansl <lh...@yahoo.com>
Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com>
Sent: Saturday, December 3, 2011 11:22 PM
Subject: Re: FeedbackRe: Suspected memory leak

This is dump stack.


-----邮件原件-----
发件人: lars hofhansl [mailto:lhofhansl@yahoo.com] 
发送时间: 2011年12月4日 14:15
收件人: dev@hbase.apache.org
抄送: Chenjian; wenzaohua
主题: Re: FeedbackRe: Suspected memory leak

Dropping user list.

Could you (or somebody) point me to where the client is using NIO?
I'm looking at HBaseClient and I do not see references to NIO, also it seems that all work is handed off to
separate threads: HBaseClient.Connection, and the JDK will not cache more than 3 direct buffers per thread.

It's possible (likely?) that I missed something in the code.

Thanks.

-- Lars

________________________________
From: Gaojinchao <ga...@huawei.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; "dev@hbase.apache.org" <de...@hbase.apache.org> 
Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com> 
Sent: Saturday, December 3, 2011 7:57 PM
Subject: FeedbackRe: Suspected memory leak

Thank you for your help.

This issue appears to be a configuration problem:
1. HBase client uses NIO(socket) API that uses the direct memory.
2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if there doesn't have "full gc", all direct memory can't reclaim. Unfortunately, using GC confiugre parameter of our client doesn't produce any "full gc".

This is only a preliminary result,  All tests is running, If have any further results , we will be fed back.
Finally , I will update our story to issue https://issues.apache.org/jira/browse/HBASE-4633. 

If our digging is crrect, whether we should set a default value for the "-XXMaxDirectMemorySize" to prevent this situation?


Thanks

-----邮件原件-----
发件人: bijieshan [mailto:bijieshan@huawei.com] 
发送时间: 2011年12月2日 15:37
收件人: dev@hbase.apache.org; user@hbase.apache.org
抄送: Chenjian; wenzaohua
主题: Re: Suspected memory leak

Thank you all. 
I think it's the same problem with the link provided by Stack. Because the heap-size is stabilized, but the non-heap size keep growing. So I think not the problem of the CMS GC bug. 
And we have known the content of the problem memory section, all the records contains the info like below:
"|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
"BBZHtable_UFDR_058,048342220093168-02570"
........

Jieshan.

-----邮件原件-----
发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com] 
发送时间: 2011年12月2日 4:20
收件人: dev@hbase.apache.org
抄送: Ramakrishna s vasudevan; user@hbase.apache.org
主题: Re: Suspected memory leak

Adding to the excellent write-up by Jonathan:
Since finalizer is involved, it takes two GC cycles to collect them.  Due to a bug/bugs in the CMS GC, collection may not happen and the heap can grow really big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket related objects were being collected properly. This option forces the concurrent marker to be one thread. This was for HDFS, but I think the same applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:

Make sure its not the issue that Jonathan Payne identifiied a while
back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357# 
St.Ack  


Re: FeedbackRe: Suspected memory leak

Posted by Gaojinchao <ga...@huawei.com>.
This is the stack dump.


-----邮件原件-----
发件人: lars hofhansl [mailto:lhofhansl@yahoo.com] 
发送时间: 2011年12月4日 14:15
收件人: dev@hbase.apache.org
抄送: Chenjian; wenzaohua
主题: Re: FeedbackRe: Suspected memory leak

Dropping user list.

Could you (or somebody) point me to where the client is using NIO?
I'm looking at HBaseClient and I do not see references to NIO, also it seems that all work is handed off to
separate threads: HBaseClient.Connection, and the JDK will not cache more than 3 direct buffers per thread.

It's possible (likely?) that I missed something in the code.

Thanks.

-- Lars

________________________________
From: Gaojinchao <ga...@huawei.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; "dev@hbase.apache.org" <de...@hbase.apache.org> 
Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com> 
Sent: Saturday, December 3, 2011 7:57 PM
Subject: FeedbackRe: Suspected memory leak

Thank you for your help.

This issue appears to be a configuration problem:
1. HBase client uses NIO(socket) API that uses the direct memory.
2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if there doesn't have "full gc", all direct memory can't reclaim. Unfortunately, using GC confiugre parameter of our client doesn't produce any "full gc".

This is only a preliminary result,  All tests is running, If have any further results , we will be fed back.
Finally , I will update our story to issue https://issues.apache.org/jira/browse/HBASE-4633. 

If our digging is crrect, whether we should set a default value for the "-XXMaxDirectMemorySize" to prevent this situation?


Thanks

-----邮件原件-----
发件人: bijieshan [mailto:bijieshan@huawei.com] 
发送时间: 2011年12月2日 15:37
收件人: dev@hbase.apache.org; user@hbase.apache.org
抄送: Chenjian; wenzaohua
主题: Re: Suspected memory leak

Thank you all. 
I think it's the same problem with the link provided by Stack. Because the heap-size is stabilized, but the non-heap size keep growing. So I think not the problem of the CMS GC bug. 
And we have known the content of the problem memory section, all the records contains the info like below:
"|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
"BBZHtable_UFDR_058,048342220093168-02570"
........

Jieshan.

-----邮件原件-----
发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com] 
发送时间: 2011年12月2日 4:20
收件人: dev@hbase.apache.org
抄送: Ramakrishna s vasudevan; user@hbase.apache.org
主题: Re: Suspected memory leak

Adding to the excellent write-up by Jonathan:
Since finalizer is involved, it takes two GC cycles to collect them.  Due to a bug/bugs in the CMS GC, collection may not happen and the heap can grow really big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket related objects were being collected properly. This option forces the concurrent marker to be one thread. This was for HDFS, but I think the same applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:

Make sure its not the issue that Jonathan Payne identifiied a while
back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack  

Re: FeedbackRe: Suspected memory leak

Posted by lars hofhansl <lh...@yahoo.com>.
Dropping user list.

Could you (or somebody) point me to where the client is using NIO?
I'm looking at HBaseClient and I do not see references to NIO; it also seems that all work is handed off to
separate threads (HBaseClient.Connection), and the JDK will not cache more than 3 direct buffers per thread.

It's possible (likely?) that I missed something in the code.

Thanks.

-- Lars

________________________________
From: Gaojinchao <ga...@huawei.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; "dev@hbase.apache.org" <de...@hbase.apache.org> 
Cc: Chenjian <je...@huawei.com>; wenzaohua <we...@huawei.com> 
Sent: Saturday, December 3, 2011 7:57 PM
Subject: FeedbackRe: Suspected memory leak

Thank you for your help.

This issue appears to be a configuration problem:
1. HBase client uses NIO(socket) API that uses the direct memory.
2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if there doesn't have "full gc", all direct memory can't reclaim. Unfortunately, using GC confiugre parameter of our client doesn't produce any "full gc".

This is only a preliminary result,  All tests is running, If have any further results , we will be fed back.
Finally , I will update our story to issue https://issues.apache.org/jira/browse/HBASE-4633. 

If our digging is crrect, whether we should set a default value for the "-XXMaxDirectMemorySize" to prevent this situation?


Thanks

-----邮件原件-----
发件人: bijieshan [mailto:bijieshan@huawei.com] 
发送时间: 2011年12月2日 15:37
收件人: dev@hbase.apache.org; user@hbase.apache.org
抄送: Chenjian; wenzaohua
主题: Re: Suspected memory leak

Thank you all. 
I think it's the same problem with the link provided by Stack. Because the heap-size is stabilized, but the non-heap size keep growing. So I think not the problem of the CMS GC bug. 
And we have known the content of the problem memory section, all the records contains the info like below:
"|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||";
"BBZHtable_UFDR_058,048342220093168-02570"
........

Jieshan.

-----邮件原件-----
发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com] 
发送时间: 2011年12月2日 4:20
收件人: dev@hbase.apache.org
抄送: Ramakrishna s vasudevan; user@hbase.apache.org
主题: Re: Suspected memory leak

Adding to the excellent write-up by Jonathan:
Since finalizer is involved, it takes two GC cycles to collect them.  Due to a bug/bugs in the CMS GC, collection may not happen and the heap can grow really big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket related objects were being collected properly. This option forces the concurrent marker to be one thread. This was for HDFS, but I think the same applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:

Make sure its not the issue that Jonathan Payne identifiied a while
back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack  

RE: FeedbackRe: Suspected memory leak

Posted by Sandy Pratt <pr...@adobe.com>.
Gaojinchao,

I'm not certain, but this looks a lot like some of the issues I've been dealing with lately (namely, non-Java-heap memory leakage).

First, -XX:MaxDirectMemorySize doesn't seem to be a solution.  This flag is poorly documented, and moreover the problem appears to be related to releasing/reclaiming resources rather than over-allocating them.  See http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=ae283c11508fb97ede5fe27a1554b?bug_id=4469299

Second, you may wish to experiment with "-XX:+UseParallelGC -XX:+UseParallelOldGC" rather than CMS GC.  I have been trying this recently on some of my app servers and hadoop servers, and it certainly does fix the problem of non-Java heap growth.  The concern with parallel GC is that full GCs (which are the solution to the non-heap memory problem, it would seem) take too long.  Personally, I consider this reasoning fallacious, since full GC is bound to occur sooner or later, and when using the CMS GC with this bug in effect, they can be fatal (and even without this bug, CMS uses a single thread for a full GC AFAIK).  The numbers for parallel GC on a 2G heap are not terrible, even without tuning, even with old processors (max pause 2.8 sec, avg pause 1 sec for a full GC, with minor collections outnumbering the major at least 3:1, total overhead 1.3%).  If your application can tolerate a second or two of latency once in a while, you can switch to parallelOldGC and call it a day.  

The fact that some installations are trying to deal with ~24GB heaps sounds like a design issue to me; HBase and Hadoop are already designed to scale horizontally, and this emphasis on scaling vertically just because the hardware comes in a certain size sounds misguided.  But not having that hardware, I might be missing something.

Finally, you might look at changing the vm.swappiness parameter in the Linux kernel (I think it's in sysctl.conf).  I have set swappiness to 0 for my servers, and I'm happy with it.  I don't know the exact mechanism, but it certainly appears that there's a memory pressure feedback of some sort going on between the kernel and the JVM.  Perhaps it has to do with the total commit charge appearing lower (just physical instead of physical + swap) when swappiness is low.  I'd love to hear from someone with a deep understanding of OS memory allocation about this.
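
As a concrete sketch of that last suggestion (standard Linux sysctl usage; 0 is just the value mentioned above, tune to taste):

  # /etc/sysctl.conf
  vm.swappiness = 0

  # apply without a reboot (or temporarily: sysctl -w vm.swappiness=0)
  sysctl -p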

Hope this helps,
Sandy


> -----Original Message-----
> From: Gaojinchao [mailto:gaojinchao@huawei.com]
> Sent: Saturday, December 03, 2011 19:58
> To: user@hbase.apache.org; dev@hbase.apache.org
> Cc: Chenjian; wenzaohua
> Subject: FeedbackRe: Suspected memory leak
> 
> Thank you for your help.
> 
> This issue appears to be a configuration problem:
> 1. HBase client uses NIO(socket) API that uses the direct memory.
> 2. Default -XXMaxDirectMemorySize value is equal to -Xmx value, So if there
> doesn't have "full gc", all direct memory can't reclaim. Unfortunately, using
> GC confiugre parameter of our client doesn't produce any "full gc".
> 
> This is only a preliminary result,  All tests is running, If have any further results
> , we will be fed back.
> Finally , I will update our story to issue
> https://issues.apache.org/jira/browse/HBASE-4633.
> 
> If our digging is crrect, whether we should set a default value for the "-
> XXMaxDirectMemorySize" to prevent this situation?
> 
> 
> Thanks
> 
> -----邮件原件-----
> 发件人: bijieshan [mailto:bijieshan@huawei.com]
> 发送时间: 2011年12月2日 15:37
> 收件人: dev@hbase.apache.org; user@hbase.apache.org
> 抄送: Chenjian; wenzaohua
> 主题: Re: Suspected memory leak
> 
> Thank you all.
> I think it's the same problem with the link provided by Stack. Because the
> heap-size is stabilized, but the non-heap size keep growing. So I think not the
> problem of the CMS GC bug.
> And we have known the content of the problem memory section, all the
> records contains the info like below:
> "|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydi
> ywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||"
> "BBZHtable_UFDR_058,048342220093168-02570"
> ........
> 
> Jieshan.
> 
> -----邮件原件-----
> 发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com]
> 发送时间: 2011年12月2日 4:20
> 收件人: dev@hbase.apache.org
> 抄送: Ramakrishna s vasudevan; user@hbase.apache.org
> 主题: Re: Suspected memory leak
> 
> Adding to the excellent write-up by Jonathan:
> Since finalizer is involved, it takes two GC cycles to collect them.  Due to a
> bug/bugs in the CMS GC, collection may not happen and the heap can grow
> really big.  See
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for
> details.
> 
> Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket
> related objects were being collected properly. This option forces the
> concurrent marker to be one thread. This was for HDFS, but I think the same
> applies here.
> 
> Kihwal
> 
> On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:
> 
> Make sure its not the issue that Jonathan Payne identifiied a while
> back:
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45b
> c7ba788b2357#
> St.Ack


FeedbackRe: Suspected memory leak

Posted by Gaojinchao <ga...@huawei.com>.
Thank you for your help.

This issue appears to be a configuration problem:
1. The HBase client uses the NIO (socket) API, which uses direct memory.
2. The default -XX:MaxDirectMemorySize value is equal to the -Xmx value, so if no full GC occurs, the direct memory is never reclaimed. Unfortunately, with our client's GC configuration parameters no full GC is ever produced.
   
This is only a preliminary result; all tests are still running, and we will feed back any further results.
Finally, I will add our story to https://issues.apache.org/jira/browse/HBASE-4633.

If our digging is correct, should we set a default value for "-XX:MaxDirectMemorySize" to prevent this situation?
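
For reference, a minimal sketch of what setting the cap explicitly might look like (the 256m value and the MyClientMain class name are only illustrations, not recommendations; a client launched through the hbase scripts can use HBASE_OPTS in hbase-env.sh, while a standalone client passes the flag on its own java command line):

  export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=256m"

  java -Xms15g -Xmx15g -XX:MaxDirectMemorySize=256m ... MyClientMain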


Thanks

-----邮件原件-----
发件人: bijieshan [mailto:bijieshan@huawei.com] 
发送时间: 2011年12月2日 15:37
收件人: dev@hbase.apache.org; user@hbase.apache.org
抄送: Chenjian; wenzaohua
主题: Re: Suspected memory leak

Thank you all. 
I think it's the same problem with the link provided by Stack. Because the heap-size is stabilized, but the non-heap size keep growing. So I think not the problem of the CMS GC bug. 
And we have known the content of the problem memory section, all the records contains the info like below:
"|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||"
"BBZHtable_UFDR_058,048342220093168-02570"
........

Jieshan.

-----邮件原件-----
发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com] 
发送时间: 2011年12月2日 4:20
收件人: dev@hbase.apache.org
抄送: Ramakrishna s vasudevan; user@hbase.apache.org
主题: Re: Suspected memory leak

Adding to the excellent write-up by Jonathan:
Since finalizer is involved, it takes two GC cycles to collect them.  Due to a bug/bugs in the CMS GC, collection may not happen and the heap can grow really big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket related objects were being collected properly. This option forces the concurrent marker to be one thread. This was for HDFS, but I think the same applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:

Make sure its not the issue that Jonathan Payne identifiied a while
back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack


Re: Suspected memory leak

Posted by bijieshan <bi...@huawei.com>.
Thank you all. 
I think it's the same problem as in the link provided by Stack, because the heap size has stabilized while the non-heap size keeps growing. So I don't think it is the CMS GC bug.
We have also examined the content of the problem memory section; all the records contain info like the following:
"|www.hostname00000000000002087075.comlhggmdjapwpfvkqvxgnskzzydiywoacjnpljkarlehrnzzbpbxc||||||460|||||||||||Agent||||"
"BBZHtable_UFDR_058,048342220093168-02570"
........

Jieshan.

-----邮件原件-----
发件人: Kihwal Lee [mailto:kihwal@yahoo-inc.com] 
发送时间: 2011年12月2日 4:20
收件人: dev@hbase.apache.org
抄送: Ramakrishna s vasudevan; user@hbase.apache.org
主题: Re: Suspected memory leak

Adding to the excellent write-up by Jonathan:
Since finalizer is involved, it takes two GC cycles to collect them.  Due to a bug/bugs in the CMS GC, collection may not happen and the heap can grow really big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket related objects were being collected properly. This option forces the concurrent marker to be one thread. This was for HDFS, but I think the same applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:

Make sure its not the issue that Jonathan Payne identifiied a while
back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack


Re: Suspected memory leak

Posted by Kihwal Lee <ki...@yahoo-inc.com>.
Adding to the excellent write-up by Jonathan:
Since a finalizer is involved, it takes two GC cycles to collect these objects.  Due to a bug (or bugs) in the CMS GC, collection may not happen and the heap can grow really big.  See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7112034 for details.
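
To make the two-cycle point concrete, here is a small, self-contained sketch (hypothetical demo, not HBase/HDFS code; System.gc() is only a hint to the JVM, so the exact timing and output can vary):

import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;

// Hypothetical demo of the two-GC-cycle life of a finalizable object.
public class TwoGcCyclesDemo {
  static volatile boolean finalized = false;

  // Stand-in for a socket / direct-buffer holder that has a finalizer.
  static class Holder {
    byte[] payload = new byte[8 << 20];
    @Override protected void finalize() { finalized = true; }
  }

  public static void main(String[] args) throws Exception {
    ReferenceQueue<Holder> queue = new ReferenceQueue<>();
    Holder h = new Holder();
    PhantomReference<Holder> ref = new PhantomReference<>(h, queue);
    h = null;                 // drop the only strong reference

    System.gc();              // cycle 1: object found unreachable, queued for finalization
    Thread.sleep(200);        // give the Finalizer thread time to run finalize()
    System.out.println("after GC #1: finalized=" + finalized
        + ", reclaimable=" + (queue.poll() != null));   // typically: true, false

    System.gc();              // cycle 2: finalize() has run, memory can now be reclaimed
    Thread.sleep(200);
    System.out.println("after GC #2: finalized=" + finalized
        + ", reclaimable=" + (queue.poll() != null));   // typically: true, true

    System.out.println(ref); // keep the phantom reference strongly reachable to the end
  }
}

If the second cycle never comes (as with the CMS bug linked above), the dead objects and whatever they reference simply pile up.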

Koji tried "-XX:-CMSConcurrentMTEnabled" and confirmed that all the socket related objects were being collected properly. This option forces the concurrent marker to be one thread. This was for HDFS, but I think the same applies here.

Kihwal

On 12/1/11 1:26 PM, "Stack" <st...@duboce.net> wrote:

Make sure its not the issue that Jonathan Payne identifiied a while
back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack


Re: Suspected memory leak

Posted by Stack <st...@duboce.net>.
Make sure it's not the issue that Jonathan Payne identified a while
back: https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357#
St.Ack

RE: Suspected memory leak

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
You can create several heap dumps of the JVM process in question and compare the heap allocations.
To create a heap dump:

jmap -dump:live,format=b,file=<heap.hprof> <pid>

To analyze it:
1. jhat
2. visualvm
3. any commercial profiler
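
For example, against the client process from the earlier top output (pid 12676; the dump file name is arbitrary), a typical sequence would be:

  jmap -dump:live,format=b,file=hbase-client.hprof 12676
  jhat -port 7000 hbase-client.hprof     # then browse http://localhost:7000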

One note: -Xmn12g??? How long are your minor-collection GC pauses?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Ramkrishna S Vasudevan [ramkrishna.vasudevan@huawei.com]
Sent: Wednesday, November 30, 2011 6:51 PM
To: user@hbase.apache.org; dev@hbase.apache.org
Subject: RE: Suspected memory leak

Adding dev list to get some suggestions.

Regards
Ram


-----Original Message-----
From: Shrijeet Paliwal [mailto:shrijeet@rocketfuel.com]
Sent: Thursday, December 01, 2011 8:08 AM
To: user@hbase.apache.org
Cc: Gaojinchao; Chenjian
Subject: Re: Suspected memory leak

Jieshan,
We backported https://issues.apache.org/jira/browse/HBASE-2937 to 0.90.3

-Shrijeet


2011/11/30 bijieshan <bi...@huawei.com>

> Hi Shrijeet,
>
> I think that's jira relevant to trunk, but not for 90.X. For there's no
> timeout mechanism in 90.X. Right?
> We found this problem in 90.x.
>
> Thanks,
>
> Jieshan.
>
> -----邮件原件-----
> 发件人: Shrijeet Paliwal [mailto:shrijeet@rocketfuel.com]
> 发送时间: 2011年12月1日 10:26
> 收件人: user@hbase.apache.org
> 抄送: Gaojinchao; Chenjian
> 主题: Re: Suspected memory leak
>
> Gaojinchao,
>
> I had filed this some time ago,
> https://issues.apache.org/jira/browse/HBASE-4633
> But after some recent insights on our application code, I am inclined to
> think leak (or memory 'hold') is in our application. But it will be good
to
> check out either way.
> I need to update the jira with my saga. See if the description of issue I
> posted there, matches yours. If not, may be you can update with your story
> in detail.
>
> -Shrijeet
>
> 2011/11/30 Gaojinchao <ga...@huawei.com>
>
> > I have noticed some memory leak problems in my HBase client.
> > RES has increased to 27g
> > PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 12676 root      20   0 30.8g  27g 5092 S    2 57.5 587:57.76
> > /opt/java/jre/bin/java -Djava.library.path=lib/.
> >
> > But I am not sure the leak comes from HBase Client jar itself or just
our
> > client code.
> >
> > This is some parameters of jvm.
> > :-Xms15g -Xmn12g -Xmx15g -XX:PermSize=64m -XX:+UseParNewGC
> > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=65
> > -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=1
> > -XX:+CMSParallelRemarkEnabled
> >
> > Who has experience in this case? , I need continue to dig :)
> >
> >
> >
> > 发件人: Gaojinchao
> > 发送时间: 2011年11月30日 11:02
> > 收件人: user@hbase.apache.org
> > 主题: Suspected memory leak
> >
> > In HBaseClient proceess, I found heap has been increased.
> > I used command ’cat smaps’ to get the heap size.
> > It seems in case when the threads pool in HTable has released the no
> using
> > thread, if you use putlist api to put data again, the memory is
> increased.
> >
> > Who has experience in this case?
> >
> > Below is the heap of Hbase client:
> > C3S31:/proc/18769 # cat smaps
> > 4010a000-4709d000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             114252 kB
> > Rss:              114044 kB
> > Pss:              114044 kB
> >
> > 4010a000-4709d000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             114252 kB
> > Rss:              114044 kB
> > Pss:              114044 kB
> >
> > 4010a000-48374000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             133544 kB
> > Rss:              133336 kB
> > Pss:              133336 kB
> >
> > 4010a000-49f20000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             161880 kB
> > Rss:              161672 kB
> > Pss:              161672 kB
> >
> > 4010a000-4c5de000 rwxp 00000000 00:00 0
> >  [heap]
> > Size:             201552 kB
> > Rss:              201344 kB
> > Pss:              201344 kB
> >
>


