Posted to dev@hbase.apache.org by Shrijeet Paliwal <sh...@rocketfuel.com> on 2011/10/14 21:12:18 UTC

Potential memory leak in client RPC timeout mechanism

Hi All,

HBase version: 0.90.3 + Patches
Hadoop version: CDH3u0
Relevant Jiras: https://issues.apache.org/jira/browse/HBASE-2937,
https://issues.apache.org/jira/browse/HBASE-4003

We have been using the 'hbase.client.operation.timeout' knob
introduced in HBASE-2937 for quite some time now. It helps us enforce
our SLAs. We have two HBase clusters and two HBase client clusters;
one of them is much busier than the other.
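
For reference, this is roughly how the knob is set on the client side
(the timeout value and table name below are made up for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class TimeoutExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // 5000 ms is an illustrative value; use whatever your SLA requires.
        conf.setInt("hbase.client.operation.timeout", 5000);
        // "mytable" is a made-up table name.
        HTable table = new HTable(conf, "mytable");
      }
    }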

We have seen deterministic behavior from clients running in the busy
cluster: their memory footprint increases steadily after they have
been up for roughly 24 hours, to almost double its usual value (usual
case == RPC timeout disabled). After much investigation nothing
concrete came out, and we had to put in a hack which keeps the heap
size under control even when the RPC timeout is enabled. Also please
note that the same behavior is not observed in the 'not so busy'
cluster.

The patch is here: https://gist.github.com/1288023
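
To illustrate the kind of pattern we suspect (a simplified sketch, not
the actual HBase client code; all names below are made up), picture a
connection that registers every outstanding call in a map but only
cleans that map up on the success path:

    import java.net.SocketTimeoutException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    // Simplified sketch of the suspected leak, NOT the real client code.
    public class LeakySketch {
      static class Call {
        volatile byte[] response;  // set by a reader thread on success
      }

      private final ConcurrentHashMap<Integer, Call> calls =
          new ConcurrentHashMap<Integer, Call>();
      private final AtomicInteger counter = new AtomicInteger();

      public byte[] invoke(long timeoutMs) throws Exception {
        int id = counter.incrementAndGet();
        Call call = new Call();
        calls.put(id, call);
        synchronized (call) {
          call.wait(timeoutMs);  // give up once the RPC timeout elapses
        }
        if (call.response == null) {
          // BUG: the entry for 'id' is never removed on this path,
          // so every timed-out call stays pinned in 'calls'.
          throw new SocketTimeoutException("call " + id + " timed out");
        }
        calls.remove(id);  // only the success path cleans up
        return call.response;
      }
    }

Under heavy load with timeouts enabled such a map only grows, which
would match the footprint we are seeing.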

Could someone who is also running RPC timeouts in production under
fair load please share their experience?

-Shrijeet

Re: Potential memory leak in client RPC timeout mechanism

Posted by Jean-Daniel Cryans <jd...@apache.org>.
We aren't running with those patches, but would it be possible for you to
heap dump one client? At least we would see exactly what's eating all the
memory.
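
If you can attach to the process, something like the stock JDK jmap
should be enough:

    jmap -dump:live,format=b,file=client-heap.hprof <pid>

(client-heap.hprof is just an example file name.) The .hprof file can
then be opened in MAT or jhat.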

Thx,

J-D

On Fri, Oct 14, 2011 at 12:12 PM, Shrijeet Paliwal
<sh...@rocketfuel.com> wrote:

> [original message snipped]

Re: Potential memory leak in client RPC timeout mechanism

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Yeah, it does that if the heap is big... mmm, a dev dump could be
useful, but it's a guess; whatever is eating all that memory might be
too small there and buried under the rest.

J-D

On Wed, Oct 19, 2011 at 7:00 PM, Shrijeet Paliwal
<sh...@rocketfuel.com> wrote:

> Stack, I have created https://issues.apache.org/jira/browse/HBASE-4633
> J-D, I had tried in the past to get a dump in production (the only
> environment where this is reproducible) but failed; the application
> freezes if any profiling activity is attempted.
> I can try to get you a dump from the dev environment. Thanks for
> writing.
>
> -Shrijeet
>
> > [earlier messages snipped]
>

Re: Potential memory leak in client RPC timeout mechanism

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
Stack, I have created https://issues.apache.org/jira/browse/HBASE-4633
J-D, I had tried in the past to get a dump in production (the only
environment where this is reproducible) but failed; the application
freezes if any profiling activity is attempted.
I can try to get you a dump from the dev environment. Thanks for
writing.
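
If the dev dump also comes up empty, one low-overhead thing we may try
is having the JVM write the dump itself, by starting the client with
the standard HotSpot flags

    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp

(/tmp is just an example path). That would capture a dump at OOM
without attaching a profiler.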

-Shrijeet

On Wed, Oct 19, 2011 at 4:23 PM, Stack <st...@duboce.net> wrote:
> And file an issue please Shrijeet so we don't forget about it.
> Thanks boss,
> St.Ack
>
> [original message snipped]
>

Re: Potential memory leak in client RPC timeout mechanism

Posted by Stack <st...@duboce.net>.
And file an issue please Shrijeet so we don't forget about it.
Thanks boss,
St.Ack

On Fri, Oct 14, 2011 at 12:12 PM, Shrijeet Paliwal
<sh...@rocketfuel.com> wrote:
> [original message snipped]