Posted to dev@hbase.apache.org by Nitay <ni...@gmail.com> on 2009/04/08 21:39:50 UTC

Preventing SessionExpired events

Hey guys,

We've recently replaced a few pieces of HBase's cluster management and
coordination with ZooKeeper. One of our guys, Andrew Purtell, has a cluster that
he throws a lot of load at. Andrew's cluster was getting a lot of
SessionExpired events which were causing some havoc. After some discussion
on the hbase list and additional testing by Andrew (tweaking things like the
session timeout, quorum size, and GC used), we suspect the problem is that
the Java GC is starving the ZooKeeper heartbeat thread from executing.

There is a JIRA open on the matter where Joey suggests a solution that has
worked for him:

https://issues.apache.org/jira/browse/HBASE-1316

We wanted to loop you guys in to see if you have any thoughts/suggestions on
the matter.

Thanks,
-n

Re: Preventing SessionExpired events

Posted by Patrick Hunt <ph...@apache.org>.

This is good to know. It will allow us to try and replicate the
situation, which we haven't been able to do.

I'm hoping we can come up with something that we can proactively do to 
address this...

Patrick

Nitay wrote:
> Also, I should mention that some of the errors Andrew was seeing are related
> to ZOOKEEPER-344:
> 
> 
> I see this kind of stuff:
> 
> 2009-04-07 17:58:13,344 - WARN  [NIOServerCxn.Factory:2181:
> NIOServerCnxn@417] - Exception
> causing close of session 0x2208296c38e0000 due to
> java.io.IOException: Read error
> 
> and bye bye HRS ephemeral znodes, which triggers
> (currently) HBASE-1314.
> 
> This I think is ZOOKEEPER-344
> 
>    https://issues.apache.org/jira/browse/ZOOKEEPER-344
> 
>   - Andy
> 
> 
> On Wed, Apr 8, 2009 at 12:39 PM, Nitay <ni...@gmail.com> wrote:
> 
>> Hey guys,
>>
>> We've recently replaced a few pieces of HBase's cluster management and
>> coordination with ZooKeeper. One of our guys, Andrew Purtell, has a cluster that
>> he throws a lot of load at. Andrew's cluster was getting a lot of
>> SessionExpired events which were causing some havoc. After some discussion
>> on the hbase list and additional testing by Andrew (tweaking things like the
>> session timeout, quorum size, and GC used), we suspect the problem is that
>> the Java GC is starving the ZooKeeper heartbeat thread from executing.
>>
>> There is a JIRA open on the matter where Joey suggests a solution that has
>> worked for him:
>>
>> https://issues.apache.org/jira/browse/HBASE-1316
>>
>> We wanted to loop you guys in to see if you have any thoughts/suggestions
>> on the matter.
>>
>> Thanks,
>> -n
>>
>

Re: Preventing SessionExpired events

Posted by Nitay <ni...@gmail.com>.

Also, I should mention that some of the errors Andrew was seeing are related
to ZOOKEEPER-344:


I see this kind of stuff:

2009-04-07 17:58:13,344 - WARN  [NIOServerCxn.Factory:2181:
NIOServerCnxn@417] - Exception
causing close of session 0x2208296c38e0000 due to
java.io.IOException: Read error

and bye bye HRS ephemeral znodes, which triggers
(currently) HBASE-1314.

This I think is ZOOKEEPER-344

   https://issues.apache.org/jira/browse/ZOOKEEPER-344

  - Andy


On Wed, Apr 8, 2009 at 12:39 PM, Nitay <ni...@gmail.com> wrote:

> Hey guys,
>
> We've recently replaced a few pieces of HBase's cluster management and
> coordination with ZooKeeper. One of our guys, Andrew Purtell, has a cluster that
> he throws a lot of load at. Andrew's cluster was getting a lot of
> SessionExpired events which were causing some havoc. After some discussion
> on the hbase list and additional testing by Andrew (tweaking things like the
> session timeout, quorum size, and GC used), we suspect the problem is that
> the Java GC is starving the ZooKeeper heartbeat thread from executing.
>
> There is a JIRA open on the matter where Joey suggests a solution that has
> worked for him:
>
> https://issues.apache.org/jira/browse/HBASE-1316
>
> We wanted to loop you guys in to see if you have any thoughts/suggestions
> on the matter.
>
> Thanks,
> -n
>

RE: Preventing SessionExpired events

Posted by Benjamin Reed <br...@yahoo-inc.com>.

I'm curious about your session scenario. If your servers can hang for X seconds without a problem and Y is your session timeout, why would you set Y < X? In your case, if you set your session timeout to 5 secs, for example, but your server can hang for 20 seconds doing GC, your clients cannot expect a response in less than 20 seconds, so why don't you set your session timeout to 20 seconds?

ben
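
A minimal sketch of the X-versus-Y reasoning above, in Java. The class
SessionTimeoutCheck, its method names, and the ping cadence of one third of
the session timeout used in main are hypothetical and chosen for
illustration; they are not taken from the ZooKeeper or HBase code. The model
assumes a stall begins just before a ping was due, so the server sees a
silence of roughly one ping interval plus the stall, and it ignores network
latency and server-side timing granularity.

```java
/** Hypothetical helper for illustration; not part of ZooKeeper or HBase. */
final class SessionTimeoutCheck {
    private SessionTimeoutCheck() {}

    /**
     * Longest silence the server would see if the client stalls just
     * before its next ping was due: one ping interval plus the stall.
     */
    static long worstCaseGapMs(long pingIntervalMs, long stallMs) {
        return pingIntervalMs + stallMs;
    }

    /** True if the worst-case silence stays below the session timeout. */
    static boolean survives(long sessionTimeoutMs, long pingIntervalMs, long stallMs) {
        return worstCaseGapMs(pingIntervalMs, stallMs) < sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long stallMs = 20_000L; // the 20-second GC hang from the example above
        for (long timeoutMs : new long[] {5_000L, 10_000L, 30_000L, 60_000L}) {
            // Assumed ping cadence, for illustration only.
            long pingMs = timeoutMs / 3;
            System.out.println(timeoutMs + " ms timeout survives a "
                    + stallMs + " ms stall: " + survives(timeoutMs, pingMs, stallMs));
        }
    }
}
```

Under this simple model the timeout needs some headroom above the longest
expected stall, because the ping interval adds to the silence the server
observes.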

________________________________________
From: Joey Echeverria [joey42@gmail.com]
Sent: Wednesday, April 08, 2009 1:47 PM
To: hbase-dev@hadoop.apache.org
Cc: zookeeper-dev@hadoop.apache.org
Subject: Re: Preventing SessionExpired events

Nitay is correct about the native threads. Using the pure Java API,
the garbage collector will occasionally pause other Java threads to do
a full mark and sweep. Even switching to the concurrent collector only
delays the problem. The issue is mixing a high throughput application
(HBase) with a low latency library (Zookeeper). Systems like HBase
live on relatively large numbers of short-lived objects. You only keep
keys and values long enough for the Memcache to get full then you
write all the data to HDFS and throw away the objects.

You can patch around the issue with object pools, but ultimately you
need to insulate zk from the GC pauses. In our experience, the best
way to do that was a jni wrapper around the zk C api. Since the C api
uses its own posix threads, it's protected from the GC. In the system
we wrote, we ended up using the Java api with a large session timeout
for most everything, and used the jni code just for creating ephemeral
nodes.

-Joey

On Wed, Apr 8, 2009 at 9:35 PM, Nitay <ni...@gmail.com> wrote:
> The default session timeout in HBase is currently 10 seconds. Bumping it up
> to 30 and 60 reduced SessionExpired exceptions, according to Andrew. I
> believe Andrew did run it under jconsole. He was also tuning GC parameters.
> He mentioned running using incremental garbage collector
> (-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode). He can provide more
> details on all of this.
>
> My understanding with HBASE-1316 is that it solves the problem because the
> ZooKeeper IO/heartbeat thread becomes an OS level thread which is not managed
> by Java. Hence, the GC does not starve it. Joey can comment here as he
> developed the solution.
>
> The three main components that use ZooKeeper in HBase are: client,
> regionserver, and master.
>
> The client does not have ephemeral nodes so having something like
> ZOOKEEPER-321 for it would be nice. It is currently read only. For now
> recovering it by reinitializing the ZooKeeper handle is not a big deal.
>
> The bigger issue is with the master and regionserver, which do use ephemeral
> nodes. Recovering them is a bit tougher, and we'd like to prevent getting
> SessionExpired as much as possible.
>
> On Wed, Apr 8, 2009 at 1:17 PM, Patrick Hunt <ph...@apache.org> wrote:
>
>> What are you running for a session timeout on your clients?
>>
>> Can you run with something like jvisualvm or jconsole, and watch the gc
>> activity when the session timeouts occur? Might give you some insight.
>> Have you tried one of the alternative GC's available in the VM?
>>
>> http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx
>> ie "Flags for Latency Applications"
>>
>> We are also working on the following jira:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-321
>> which will eliminate session expirations for clients w/o ephemerals. (is
>> this the case for you?)
>>
>> Try turning on debug in your client, the client will spit out:
>>   LOG.debug("Got ping response for sessionid:0x"
>> If you turn on trace logging in the server you should see session updates
>> there as well (c->server, which control session expiration).
>>
>> re HBASE-1316 - how does the jni c wrapper fix this? Isn't the code still
>> running w/in the same (vm) process?
>>
>>
>> Unfortunately I can't think of anything else if it is the GC. Basically
>> you'd have to increase the timeout or try another gc with lower latency.
>>
>> Perhaps Mahadev/Ben/Flavio might have insight...
>>
>> Patrick
>>
>>
>> Nitay wrote:
>>
>>> Hey guys,
>>>
>>> We've recently replaced a few pieces of HBase's cluster management and
>>> coordination with ZooKeeper. One of our guys, Andrew Purtell, has a cluster
>>> that
>>> he throws a lot of load at. Andrew's cluster was getting a lot of
>>> SessionExpired events which were causing some havoc. After some discussion
>>> on the hbase list and additional testing by Andrew (tweaking things like
>>> the
>>> session timeout, quorum size, and GC used), we suspect the problem is that
>>> the Java GC is starving the ZooKeeper heartbeat thread from executing.
>>>
>>> There is a JIRA open on the matter where Joey suggests a solution that has
>>> worked for him:
>>>
>>> https://issues.apache.org/jira/browse/HBASE-1316
>>>
>>> We wanted to loop you guys in to see if you have any thoughts/suggestions
>>> on
>>> the matter.
>>>
>>> Thanks,
>>> -n
>>>
>>>
>

Re: Preventing SessionExpired events

Posted by Patrick Hunt <ph...@apache.org>.

I see what you are saying, that's interesting. As Mahadev mentioned on 
another thread, we'd be interested to look at the JNI you've done. 
Perhaps we will need to include this as an option (and document where 
it's necessary, etc...) for other users who might run into this.

Thanks much for bringing this to our attention! Providing a fix: 
priceless. ;-)

Regards,

Patrick

Joey Echeverria wrote:
> Nitay is correct about the native threads. Using the pure Java API,
> the garbage collector will occasionally pause other Java threads to do
> a full mark and sweep. Even switching to the concurrent collector only
> delays the problem. The issue is mixing a high throughput application
> (HBase) with a low latency library (Zookeeper). Systems like HBase
> live on relatively large numbers of short-lived objects. You only keep
> keys and values long enough for the Memcache to get full then you
> write all the data to HDFS and throw away the objects.
> 
> You can patch around the issue with object pools, but ultimately you
> need to insulate zk from the GC pauses. In our experience, the best
> way to do that was a jni wrapper around the zk C api. Since the C api
> uses its own posix threads, it's protected from the GC. In the system
> we wrote, we ended up using the Java api with a large session timeout
> for most everything, and used the jni code just for creating ephemeral
> nodes.
> 
> -Joey
> 
> On Wed, Apr 8, 2009 at 9:35 PM, Nitay <ni...@gmail.com> wrote:
>> The default session timeout in HBase is currently 10 seconds. Bumping it up
>> to 30 and 60 reduced SessionExpired exceptions, according to Andrew. I
>> believe Andrew did run it under jconsole. He was also tuning GC parameters.
>> He mentioned running using incremental garbage collector
>> (-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode). He can provide more
>> details on all of this.
>>
>> My understanding with HBASE-1316 is that it solves the problem because the
>> ZooKeeper IO/heartbeat thread becomes an OS level thread which is not managed
>> by Java. Hence, the GC does not starve it. Joey can comment here as he
>> developed the solution.
>>
>> The three main components that use ZooKeeper in HBase are: client,
>> regionserver, and master.
>>
>> The client does not have ephemeral nodes so having something like
>> ZOOKEEPER-321 for it would be nice. It is currently read only. For now
>> recovering it by reinitializing the ZooKeeper handle is not a big deal.
>>
>> The bigger issue is with the master and regionserver, which do use ephemeral
>> nodes. Recovering them is a bit tougher, and we'd like to prevent getting
>> SessionExpired as much as possible.
>>
>> On Wed, Apr 8, 2009 at 1:17 PM, Patrick Hunt <ph...@apache.org> wrote:
>>
>>> What are you running for a session timeout on your clients?
>>>
>>> Can you run with something like jvisualvm or jconsole, and watch the gc
>>> activity when the session timeouts occur? Might give you some insight.
>>> Have you tried one of the alternative GC's available in the VM?
>>>
>>> http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx
>>> ie "Flags for Latency Applications"
>>>
>>> We are also working on the following jira:
>>> https://issues.apache.org/jira/browse/ZOOKEEPER-321
>>> which will eliminate session expirations for clients w/o ephemerals. (is
>>> this the case for you?)
>>>
>>> Try turning on debug in your client, the client will spit out:
>>>   LOG.debug("Got ping response for sessionid:0x"
>>> If you turn on trace logging in the server you should see session updates
>>> there as well (c->server, which control session expiration).
>>>
>>> re HBASE-1316 - how does the jni c wrapper fix this? Isn't the code still
>>> running w/in the same (vm) process?
>>>
>>>
>>> Unfortunately I can't think of anything else if it is the GC. Basically
>>> you'd have to increase the timeout or try another gc with lower latency.
>>>
>>> Perhaps Mahadev/Ben/Flavio might have insight...
>>>
>>> Patrick
>>>
>>>
>>> Nitay wrote:
>>>
>>>> Hey guys,
>>>>
>>>> We've recently replaced a few pieces of HBase's cluster management and
>>>> coordination with ZooKeeper. One of our guys, Andrew Purtell, has a cluster
>>>> that
>>>> he throws a lot of load at. Andrew's cluster was getting a lot of
>>>> SessionExpired events which were causing some havoc. After some discussion
>>>> on the hbase list and additional testing by Andrew (tweaking things like
>>>> the
>>>> session timeout, quorum size, and GC used), we suspect the problem is that
>>>> the Java GC is starving the ZooKeeper heartbeat thread from executing.
>>>>
>>>> There is a JIRA open on the matter where Joey suggests a solution that has
>>>> worked for him:
>>>>
>>>> https://issues.apache.org/jira/browse/HBASE-1316
>>>>
>>>> We wanted to loop you guys in to see if you have any thoughts/suggestions
>>>> on
>>>> the matter.
>>>>
>>>> Thanks,
>>>> -n
>>>>
>>>>

Re: Preventing SessionExpired events

Posted by Joey Echeverria <jo...@gmail.com>.

Nitay is correct about the native threads. Using the pure Java API,
the garbage collector will occasionally pause other Java threads to do
a full mark and sweep. Even switching to the concurrent collector only
delays the problem. The issue is mixing a high throughput application
(HBase) with a low latency library (Zookeeper). Systems like HBase
live on relatively large numbers of short-lived objects. You only keep
keys and values long enough for the Memcache to get full then you
write all the data to HDFS and throw away the objects.

You can patch around the issue with object pools, but ultimately you
need to insulate zk from the GC pauses. In our experience, the best
way to do that was a jni wrapper around the zk C api. Since the C api
uses its own posix threads, it's protected from the GC. In the system
we wrote, we ended up using the Java api with a large session timeout
for most everything, and used the jni code just for creating ephemeral
nodes.

-Joey
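
One way to check whether Java threads are really being stalled for long
enough to miss heartbeats is to measure the stalls from inside the JVM. The
sketch below is a hypothetical helper (the PauseMonitor class and its
methods are not part of HBase or ZooKeeper): a thread sleeps for a fixed
interval and records how much later than requested it wakes up. A
stop-the-world collection delays this thread the same way it would delay a
Java heartbeat thread, although overshoot can also come from OS scheduling
or swapping, so the numbers are only an indication.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper for illustration; not part of HBase or ZooKeeper. */
final class PauseMonitor implements Runnable {
    private final long intervalMs;
    private final long warnThresholdMs;
    private final AtomicLong maxPauseMs = new AtomicLong();
    private volatile boolean running = true;

    PauseMonitor(long intervalMs, long warnThresholdMs) {
        this.intervalMs = intervalMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Extra delay beyond the requested sleep, never negative. */
    static long overshootMs(long startNanos, long endNanos, long intervalMs) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0L, elapsedMs - intervalMs);
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long pause = overshootMs(start, System.nanoTime(), intervalMs);
            // Remember the longest stall seen so far.
            maxPauseMs.accumulateAndGet(pause, Math::max);
            if (pause >= warnThresholdMs) {
                System.err.println("Stalled for about " + pause
                        + " ms beyond the " + intervalMs + " ms sleep");
            }
        }
    }

    long maxPauseMs() {
        return maxPauseMs.get();
    }

    void stop() {
        running = false;
    }
}
```

To use it, start the monitor on a daemon thread next to the workload, for
example with new Thread(new PauseMonitor(500, 1000)), and compare the
largest reported stall against the configured session timeout.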

On Wed, Apr 8, 2009 at 9:35 PM, Nitay <ni...@gmail.com> wrote:
> The default session timeout in HBase is currently 10 seconds. Bumping it up
> to 30 and 60 reduced SessionExpired exceptions, according to Andrew. I
> believe Andrew did run it under jconsole. He was also tuning GC parameters.
> He mentioned running using incremental garbage collector
> (-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode). He can provide more
> details on all of this.
>
> My understanding with HBASE-1316 is that it solves the problem because the
> ZooKeeper IO/heartbeat thread becomes an OS level thread which is not managed
> by Java. Hence, the GC does not starve it. Joey can comment here as he
> developed the solution.
>
> The three main components that use ZooKeeper in HBase are: client,
> regionserver, and master.
>
> The client does not have ephemeral nodes so having something like
> ZOOKEEPER-321 for it would be nice. It is currently read only. For now
> recovering it by reinitializing the ZooKeeper handle is not a big deal.
>
> The bigger issue is with the master and regionserver, which do use ephemeral
> nodes. Recovering them is a bit tougher, and we'd like to prevent getting
> SessionExpired as much as possible.
>
> On Wed, Apr 8, 2009 at 1:17 PM, Patrick Hunt <ph...@apache.org> wrote:
>
>> What are you running for a session timeout on your clients?
>>
>> Can you run with something like jvisualvm or jconsole, and watch the gc
>> activity when the session timeouts occur? Might give you some insight.
>> Have you tried one of the alternative GC's available in the VM?
>>
>> http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx
>> ie "Flags for Latency Applications"
>>
>> We are also working on the following jira:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-321
>> which will eliminate session expirations for clients w/o ephemerals. (is
>> this the case for you?)
>>
>> Try turning on debug in your client, the client will spit out:
>>   LOG.debug("Got ping response for sessionid:0x"
>> If you turn on trace logging in the server you should see session updates
>> there as well (c->server, which control session expiration).
>>
>> re HBASE-1316 - how does the jni c wrapper fix this? Isn't the code still
>> running w/in the same (vm) process?
>>
>>
>> Unfortunately I can't think of anything else if it is the GC. Basically
>> you'd have to increase the timeout or try another gc with lower latency.
>>
>> Perhaps Mahadev/Ben/Flavio might have insight...
>>
>> Patrick
>>
>>
>> Nitay wrote:
>>
>>> Hey guys,
>>>
>>> We've recently replaced a few pieces of HBase's cluster management and
>>> coordination with ZooKeeper. One of our guys, Andrew Purtell, has a cluster
>>> that
>>> he throws a lot of load at. Andrew's cluster was getting a lot of
>>> SessionExpired events which were causing some havoc. After some discussion
>>> on the hbase list and additional testing by Andrew (tweaking things like
>>> the
>>> session timeout, quorum size, and GC used), we suspect the problem is that
>>> the Java GC is starving the ZooKeeper heartbeat thread from executing.
>>>
>>> There is a JIRA open on the matter where Joey suggests a solution that has
>>> worked for him:
>>>
>>> https://issues.apache.org/jira/browse/HBASE-1316
>>>
>>> We wanted to loop you guys in to see if you have any thoughts/suggestions
>>> on
>>> the matter.
>>>
>>> Thanks,
>>> -n
>>>
>>>
>

Re: Preventing SessionExpired events

Posted by Joey Echeverria <jo...@gmail.com>.

Nitay is correct about the native threads. Using the pure Java API,
the garbage collector will occasionally pause other Java threads to do
a full mark and sweep. Even switching to the concurrent collector only
delays the problem. The issue is mixing a high-throughput application
(HBase) with a low-latency library (ZooKeeper). Systems like HBase
live on relatively large numbers of short-lived objects. You only keep
keys and values long enough for the Memcache to fill up, then you
write all the data to HDFS and throw away the objects.

You can patch around the issue with object pools, but ultimately you
need to insulate zk from the GC pauses. In our experience, the best
way to do that was a JNI wrapper around the zk C API. Since the C API
uses its own POSIX threads, it's protected from the GC. In the system
we wrote, we ended up using the Java API with a large session timeout
for most everything, and used the JNI code just for creating ephemeral
nodes.
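
To see how long Java threads actually get stalled on a loaded node, a
minimal sketch like the one below can help. It is not part of HBase or
ZooKeeper; the class and method names are hypothetical. A thread sleeps
for a fixed interval and records how far past that interval it woke up.
Any large overshoot is time during which a Java heartbeat thread in the
same JVM could not have run either (the cause may be GC, swapping, or
CPU starvation; this sketch cannot tell them apart).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of HBase or ZooKeeper. */
final class PauseDetector {
    private PauseDetector() {}

    /**
     * Sleeps for intervalMillis, `rounds` times, and returns the largest
     * amount (in ms) by which a wake-up was late. Returns early if the
     * calling thread is interrupted.
     */
    static long worstOvershootMillis(long intervalMillis, int rounds) {
        long worst = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
            long elapsedMillis =
                TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            // Lateness beyond the requested sleep is time this thread was stalled.
            worst = Math.max(worst, elapsedMillis - intervalMillis);
        }
        return worst;
    }

    public static void main(String[] args) {
        // 50 rounds of 100 ms: roughly five seconds of observation.
        long worst = worstOvershootMillis(100, 50);
        System.out.println("worst observed stall: " + worst + " ms");
    }
}
```

Running this inside the loaded process (rather than as a separate JVM)
is what makes the number meaningful, since the stalls of interest are
per-JVM.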

-Joey

On Wed, Apr 8, 2009 at 9:35 PM, Nitay <ni...@gmail.com> wrote:
> The default session timeout in HBase is currently 10 seconds. Bumping it up
> to 30 and 60 reduced SessionExpired exceptions, according to Andrew. I
> believe Andrew did run it under jconsole. He was also tuning GC parameters.
> He mentioned running using incremental garbage collector
> (-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode). He can provide more
> details on all of this.
>
> My understanding of HBASE-1316 is that it solves the problem because the
> ZooKeeper IO/heartbeat thread becomes an OS-level thread which is not managed
> by Java. Hence, the GC does not starve it. Joey can comment here as he
> developed the solution.
>
> The three main components that use ZooKeeper in HBase are: client,
> regionserver, and master.
>
> The client does not have ephemeral nodes so having something like
> ZOOKEEPER-321 for it would be nice. It is currently read only. For now
> recovering it by reinitializing the ZooKeeper handle is not a big deal.
>
> The bigger issue is with the master and regionserver, which do use ephemeral
> nodes. Recovering them is a bit tougher, and we'd like to prevent getting
> SessionExpired as much as possible.
>
> On Wed, Apr 8, 2009 at 1:17 PM, Patrick Hunt <ph...@apache.org> wrote:
>
>> What are you running for a session timeout on your clients?
>>
>> Can you run with something like jvisualvm or jconsole, and watch the gc
>> activity when the session timeouts occur? Might give you some insight.
>> Have you tried one of the alternative GC's available in the VM?
>>
>> http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx
>> ie "Flags for Latency Applications"
>>
>> We are also working on the following jira:
>> https://issues.apache.org/jira/browse/ZOOKEEPER-321
>> which will eliminate session expirations for clients w/o ephemerals. (is
>> this the case for you?)
>>
>> Try turning on debug in your client, the client will spit out:
>>   LOG.debug("Got ping response for sessionid:0x"
>> If you turn on trace logging in the server you should see session updates
>> there as well (c->server, which control session expiration).
>>
>> re HBASE-1316 - how does the jni c wrapper fix this? Isn't the code still
>> running w/in the same (vm) process?
>>
>>
>> Unfortunately I can't think of anything else if it is the GC. Basically
>> you'd have to increase the timeout or try another gc with lower latency.
>>
>> Perhaps Mahadev/Ben/Flavio might have insight...
>>
>> Patrick
>>
>>
>> Nitay wrote:
>>
>>> Hey guys,
>>>
>>> We've recently replaced a few pieces of HBase's cluster management and
>>> coordination with ZooKeeper. One of guys, Andrew Purtell, has a cluster
>>> that
>>> he throws a lot of load at. Andrew's cluster was getting a lot of
>>> SessionExpired events which were causing some havoc. After some discussion
>>> on the hbase list and additional testing by Andrew (tweaking things like
>>> the
>>> session timeout, quorum size, and GC used), we suspect the problem is that
>>> the Java GC is starving the ZooKeeper hearbeat thread from executing.
>>>
>>> There is a JIRA open on the matter where Joey suggests a solution that has
>>> worked for him:
>>>
>>> https://issues.apache.org/jira/browse/HBASE-1316
>>>
>>> We wanted to loop you guys in to see if you have any thoughts/suggestions
>>> on
>>> the matter.
>>>
>>> Thanks,
>>> -n
>>>
>>>
>

Re: Preventing SessionExpired events

Posted by Nitay <ni...@gmail.com>.

The default session timeout in HBase is currently 10 seconds. Bumping it up
to 30 and then 60 seconds reduced SessionExpired exceptions, according to
Andrew. I believe Andrew did run it under jconsole. He was also tuning GC
parameters. He mentioned running with the incremental garbage collector
(-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode). He can provide more
details on all of this.
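
As a rough aid for reasoning about these timeout values, here is a small
sketch of the arithmetic. It rests on two assumptions that should be
checked against the ZooKeeper version actually deployed: that the server
bounds the negotiated session timeout to between 2 and 20 times its
tickTime, and that the Java client pings roughly every third of the
negotiated timeout when idle. The class name, method names, and the
2000 ms tickTime are illustrative only.

```java
/** Hypothetical helper for illustration; not part of HBase or ZooKeeper. */
final class SessionTimeoutMath {
    private SessionTimeoutMath() {}

    /** Assumed server-side bound: timeout clamped to [2, 20] * tickTime. */
    static int negotiatedTimeoutMillis(int requestedMillis, int tickTimeMillis) {
        int min = 2 * tickTimeMillis;
        int max = 20 * tickTimeMillis;
        return Math.max(min, Math.min(max, requestedMillis));
    }

    /** Assumed client behaviour: an idle client pings about every timeout/3. */
    static int approxPingIntervalMillis(int negotiatedMillis) {
        return negotiatedMillis / 3;
    }

    /**
     * Rough worst case: the pause begins just before a ping was due, so
     * only (timeout - ping interval) remains before the session can expire.
     * Ignores network latency and how the server rounds expiry times.
     */
    static int worstCaseSafePauseMillis(int negotiatedMillis) {
        return negotiatedMillis - approxPingIntervalMillis(negotiatedMillis);
    }

    public static void main(String[] args) {
        int tickTimeMillis = 2000; // illustrative value, not taken from the thread
        for (int requested : new int[] {10_000, 30_000, 60_000}) {
            int negotiated = negotiatedTimeoutMillis(requested, tickTimeMillis);
            System.out.println("requested=" + requested
                + " negotiated=" + negotiated
                + " ping~=" + approxPingIntervalMillis(negotiated)
                + " safePause~=" + worstCaseSafePauseMillis(negotiated));
        }
    }
}
```

Under these assumptions, a requested 60 second timeout would not take
full effect unless tickTime is at least 3 seconds, which is worth
verifying before concluding that a larger timeout did or did not help.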

My understanding of HBASE-1316 is that it solves the problem because the
ZooKeeper IO/heartbeat thread becomes an OS-level thread which is not managed
by Java. Hence, the GC does not starve it. Joey can comment here as he
developed the solution.

The three main components that use ZooKeeper in HBase are: client,
regionserver, and master.

The client does not have ephemeral nodes so having something like
ZOOKEEPER-321 for it would be nice. It is currently read only. For now
recovering it by reinitializing the ZooKeeper handle is not a big deal.

The bigger issue is with the master and regionserver, which do use ephemeral
nodes. Recovering them is a bit tougher, and we'd like to prevent getting
SessionExpired as much as possible.

On Wed, Apr 8, 2009 at 1:17 PM, Patrick Hunt <ph...@apache.org> wrote:

> What are you running for a session timeout on your clients?
>
> Can you run with something like jvisualvm or jconsole, and watch the gc
> activity when the session timeouts occur? Might give you some insight.
> Have you tried one of the alternative GC's available in the VM?
>
> http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx
> ie "Flags for Latency Applications"
>
> We are also working on the following jira:
> https://issues.apache.org/jira/browse/ZOOKEEPER-321
> which will eliminate session expirations for clients w/o ephemerals. (is
> this the case for you?)
>
> Try turning on debug in your client, the client will spit out:
>   LOG.debug("Got ping response for sessionid:0x"
> If you turn on trace logging in the server you should see session updates
> there as well (c->server, which control session expiration).
>
> re HBASE-1316 - how does the jni c wrapper fix this? Isn't the code still
> running w/in the same (vm) process?
>
>
> Unfortunately I can't think of anything else if it is the GC. Basically
> you'd have to increase the timeout or try another gc with lower latency.
>
> Perhaps Mahadev/Ben/Flavio might have insight...
>
> Patrick
>
>
> Nitay wrote:
>
>> Hey guys,
>>
>> We've recently replaced a few pieces of HBase's cluster management and
>> coordination with ZooKeeper. One of guys, Andrew Purtell, has a cluster
>> that
>> he throws a lot of load at. Andrew's cluster was getting a lot of
>> SessionExpired events which were causing some havoc. After some discussion
>> on the hbase list and additional testing by Andrew (tweaking things like
>> the
>> session timeout, quorum size, and GC used), we suspect the problem is that
>> the Java GC is starving the ZooKeeper hearbeat thread from executing.
>>
>> There is a JIRA open on the matter where Joey suggests a solution that has
>> worked for him:
>>
>> https://issues.apache.org/jira/browse/HBASE-1316
>>
>> We wanted to loop you guys in to see if you have any thoughts/suggestions
>> on
>> the matter.
>>
>> Thanks,
>> -n
>>
>>

Re: Preventing SessionExpired events

Posted by Andrew Purtell <ap...@apache.org>.

Hello Mahadev,

Thanks for the tip on GC parameters. I will try this.
I have increased the HBase default 10 second session 
timeout to 60 seconds, but will try the GC config
suggested here and begin reducing the timeout to
smaller values to determine its effect on the issues
I have been encountering.

Thanks again,

   - Andy

> From: Mahadev Konar
> Subject: Re: Preventing SessionExpired events
> Date: Wednesday, April 8, 2009, 1:40 PM
> Nitay,
> Thanks for sending us the info.
> 
> We have experienced such gc problem in our HDFS (hadoop
> file system) setups. The gc had been quite a problem
> for us with the Namenode (hadoop hdfs) process. We have
> seen the namenode just stalling for minutes doing
> garbage collection. We currently run the namenode with
> the following gc options
> 
> -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
> 
> And have avoided getting into trouble with the gc.

Re: Preventing SessionExpired events

Posted by Mahadev Konar <ma...@yahoo-inc.com>.

Nitay,
Thanks for sending us the info.

We have experienced such GC problems in our HDFS (Hadoop file system) setups.
The GC had been quite a problem for us with the Namenode (Hadoop HDFS)
process. We have seen the namenode just stalling for minutes doing garbage
collection. We currently run the namenode with the following GC options

-XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC

And have avoided getting into trouble with the GC.
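
One way to confirm that options like these actually reached the process
is to ask the JVM itself. The sketch below uses only the standard
java.lang.management API; the class name is hypothetical, and the
collector names it prints vary by JVM vendor, version, and flags, so
treat the output as something to inspect rather than match exactly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of HBase or ZooKeeper. */
final class GcFlagCheck {
    private GcFlagCheck() {}

    /** Names of the collectors this JVM is actually running with. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Arguments passed to the JVM itself (not to the main method). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Active collectors: " + collectorNames());
    }
}
```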

What options are you guys using with the Java process that embeds the Java
ZooKeeper client? Maybe the above GC options could help.

Also, we would be interested in having such JNI wrappers around the C library
in case more people want it. Is the wrapper posted somewhere for us to take
a look at?

Thanks
mahadev

On 4/8/09 1:17 PM, "Patrick Hunt" <ph...@apache.org> wrote:

> What are you running for a session timeout on your clients?
> 
> Can you run with something like jvisualvm or jconsole, and watch the gc
> activity when the session timeouts occur? Might give you some insight.
> Have you tried one of the alternative GC's available in the VM?
> http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx
> ie "Flags for Latency Applications"
> 
> We are also working on the following jira:
> https://issues.apache.org/jira/browse/ZOOKEEPER-321
> which will eliminate session expirations for clients w/o ephemerals. (is
> this the case for you?)
> 
> Try turning on debug in your client, the client will spit out:
>     LOG.debug("Got ping response for sessionid:0x"
> If you turn on trace logging in the server you should see session
> updates there as well (c->server, which control session expiration).
> 
> re HBASE-1316 - how does the jni c wrapper fix this? Isn't the code
> still running w/in the same (vm) process?
> 
> 
> Unfortunately I can't think of anything else if it is the GC. Basically
> you'd have to increase the timeout or try another gc with lower latency.
> 
> Perhaps Mahadev/Ben/Flavio might have insight...
> 
> Patrick
> 
> Nitay wrote:
>> Hey guys,
>> 
>> We've recently replaced a few pieces of HBase's cluster management and
>> coordination with ZooKeeper. One of guys, Andrew Purtell, has a cluster that
>> he throws a lot of load at. Andrew's cluster was getting a lot of
>> SessionExpired events which were causing some havoc. After some discussion
>> on the hbase list and additional testing by Andrew (tweaking things like the
>> session timeout, quorum size, and GC used), we suspect the problem is that
>> the Java GC is starving the ZooKeeper hearbeat thread from executing.
>> 
>> There is a JIRA open on the matter where Joey suggests a solution that has
>> worked for him:
>> 
>> https://issues.apache.org/jira/browse/HBASE-1316
>> 
>> We wanted to loop you guys in to see if you have any thoughts/suggestions on
>> the matter.
>> 
>> Thanks,
>> -n
>>

Re: Preventing SessionExpired events

Posted by Patrick Hunt <ph...@apache.org>.

What are you running for a session timeout on your clients?

Can you run with something like jvisualvm or jconsole, and watch the GC
activity when the session timeouts occur? Might give you some insight.
Have you tried one of the alternative GCs available in the VM?
http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx
i.e. "Flags for Latency Applications"
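
Where attaching jconsole is not practical, a similar view can be logged
from inside the process. This is a minimal sketch using the standard
GarbageCollectorMXBean API; the class name is hypothetical. Note that
the reported collection time is not the same as pause time: for a
concurrent collector it can include work done while application threads
were still running, so it is only a hint for correlating with session
expirations.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper for illustration; not part of HBase or ZooKeeper. */
final class GcActivity {
    private GcActivity() {}

    /** Cumulative time (ms) spent in collection across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if not supported
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Cumulative number of collections across all collectors. */
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if not supported
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long lastMillis = totalCollectionMillis();
        long lastCount = totalCollectionCount();
        // Sample once per second for five seconds and print the deltas.
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long nowMillis = totalCollectionMillis();
            long nowCount = totalCollectionCount();
            System.out.println("collections=" + (nowCount - lastCount)
                + " gcTimeMs=" + (nowMillis - lastMillis));
            lastMillis = nowMillis;
            lastCount = nowCount;
        }
    }
}
```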

We are also working on the following jira:
https://issues.apache.org/jira/browse/ZOOKEEPER-321
which will eliminate session expirations for clients w/o ephemerals. (is 
this the case for you?)

Try turning on debug in your client; the client will spit out:
    LOG.debug("Got ping response for sessionid:0x"
If you turn on trace logging in the server you should see session 
updates there as well (c->server, which control session expiration).

re HBASE-1316 - how does the jni c wrapper fix this? Isn't the code 
still running w/in the same (vm) process?

Unfortunately I can't think of anything else if it is the GC. Basically 
you'd have to increase the timeout or try another gc with lower latency.

Perhaps Mahadev/Ben/Flavio might have insight...

Patrick

Nitay wrote:
> Hey guys,
> 
> We've recently replaced a few pieces of HBase's cluster management and
> coordination with ZooKeeper. One of guys, Andrew Purtell, has a cluster that
> he throws a lot of load at. Andrew's cluster was getting a lot of
> SessionExpired events which were causing some havoc. After some discussion
> on the hbase list and additional testing by Andrew (tweaking things like the
> session timeout, quorum size, and GC used), we suspect the problem is that
> the Java GC is starving the ZooKeeper hearbeat thread from executing.
> 
> There is a JIRA open on the matter where Joey suggests a solution that has
> worked for him:
> 
> https://issues.apache.org/jira/browse/HBASE-1316
> 
> We wanted to loop you guys in to see if you have any thoughts/suggestions on
> the matter.
> 
> Thanks,
> -n
>
