
Posted to user@zookeeper.apache.org by Joey Echeverria <jo...@gmail.com> on 2009/02/22 12:09:41 UTC

Recommended session timeout

Is there a recommended session timeout? Does it change based on the
ensemble size?

Thanks,

-Joey

Re: Recommended session timeout

Posted by juacamar <jc...@cecropiasolutions.com>.

Hi, I am using ZooKeeper in an app.

I need the zk client not to close its session. Is there a way to do that?

thanks




Re: Recommended session timeout

Posted by Joey Echeverria <jo...@gmail.com>.

I doubt we're swapping. The machine we used to test has 12 GB of RAM
and little else active at the time. I think the main problem is the
heap being too small, so the GC has to run for longer trying to track
down objects to collect.

On Thu, Feb 26, 2009 at 10:28 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:
> just a quick sanity check. are you sure your memory is not overcommitted? in other words you aren't swapping. since the gc does a bunch of random memory accesses if you swap at all things will go very slow.
>
> ben
> ________________________________________
> From: Joey Echeverria [joey42@gmail.com]
> Sent: Thursday, February 26, 2009 1:31 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: Recommended session timeout
>
> I've answered the questions you asked previously below, but I thought
> I would open with the actual culprit now that we found it. When I said
> loading data before, what I was talking about was sending data via
> Thrift to the machine that was getting disconnected from zookeeper.
> This turned out to be the problem. Too much data was being sent in
> short span of time and this caused memory pressure on the heap. This
> increased the fraction of the time that the GC had to run to keep up.
> During a 143 second test, the GC was running for 33 seconds.
>
> We found this by running tcpdump on both the machine running the
> ensemble server and the machine connecting to zookeeper as a client.
> We deduced it wasn't a network (lost packet) issue, as we never saw
> unmatched packets in our tests. What did see were "long" 2-7 second
> pauses with no packets being sent. We first attempted to up the
> priority of the zookeeper threads to see if that would help. When it
> didn't, we started monitoring the GC time. We don't have a work around
> yet, other than sending data in smaller batches and  using a longer
> sessionTimeout.
>
> Thanks for all your help!
>
> -Joey
>
>> As an experiment try increasing the timeout to say 30 seconds and re-run
>> your tests. Any change?
>
> 30 seconds and higher works fine.
>
>> "loading data" - could you explain a bit more about what you mean by this?
>> If you are able to provide enough information for us to replicate we could
>> try it out (also provide info on your ensemble configuration as Mahadev
>> suggested)
>
> The ensemble config file looks as follows:
>
> tickTime=2000
> dataDir=/data/zk
> clientPort=2181
> initLimit=5
> syncLimit=2
> skipACL=true
>
> server.1=<server>1:2888:3888
> ...
> server.7=<server>7:2888:3888
>
>> You are referring to startConnect in SendThread?
>>
>> We randomly sleep up to 1 second to ensure that the clients don't all storm
>> the server(s) after a bounce.
>
> That makes some sense, but it might be worth tweaking that parameter
> based on sessionTimeout since 1 second can easily be 10-20% of
> sessionTimeout.
>
>> 1) configure your test client to connect to 1 server in the ensemble
>> 2) run the srst command on that server
>> 3) run your client test
>> 4) run the stat command on that server
>> 5) if the test takes some time, run the stat a few times during the test
>>  to get more data points
>
> The problem doesn't appear to be on the server end as max latency
> never went above 5ms. Also, no messages are shown as queued.
>

RE: Recommended session timeout

Posted by Benjamin Reed <br...@yahoo-inc.com>.

just a quick sanity check. are you sure your memory is not overcommitted? in other words, are you sure you aren't swapping? since the gc does a bunch of random memory accesses, things will go very slow if you swap at all.

ben
________________________________________
From: Joey Echeverria [joey42@gmail.com]
Sent: Thursday, February 26, 2009 1:31 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Recommended session timeout

I've answered the questions you asked previously below, but I thought
I would open with the actual culprit now that we found it. When I said
loading data before, what I was talking about was sending data via
Thrift to the machine that was getting disconnected from zookeeper.
This turned out to be the problem. Too much data was being sent in a
short span of time, and this caused memory pressure on the heap. This
increased the fraction of the time that the GC had to run to keep up.
During a 143 second test, the GC was running for 33 seconds.

We found this by running tcpdump on both the machine running the
ensemble server and the machine connecting to zookeeper as a client.
We deduced it wasn't a network (lost packet) issue, as we never saw
unmatched packets in our tests. What we did see were "long" 2-7 second
pauses with no packets being sent. We first attempted to up the
priority of the zookeeper threads to see if that would help. When it
didn't, we started monitoring the GC time. We don't have a workaround
yet, other than sending data in smaller batches and using a longer
sessionTimeout.

Thanks for all your help!

-Joey

> As an experiment try increasing the timeout to say 30 seconds and re-run
> your tests. Any change?

30 seconds and higher works fine.

> "loading data" - could you explain a bit more about what you mean by this?
> If you are able to provide enough information for us to replicate we could
> try it out (also provide info on your ensemble configuration as Mahadev
> suggested)

The ensemble config file looks as follows:

tickTime=2000
dataDir=/data/zk
clientPort=2181
initLimit=5
syncLimit=2
skipACL=true

server.1=<server>1:2888:3888
...
server.7=<server>7:2888:3888

> You are referring to startConnect in SendThread?
>
> We randomly sleep up to 1 second to ensure that the clients don't all storm
> the server(s) after a bounce.

That makes some sense, but it might be worth tweaking that parameter
based on sessionTimeout since 1 second can easily be 10-20% of
sessionTimeout.

> 1) configure your test client to connect to 1 server in the ensemble
> 2) run the srst command on that server
> 3) run your client test
> 4) run the stat command on that server
> 5) if the test takes some time, run the stat a few times during the test
>  to get more data points

The problem doesn't appear to be on the server end as max latency
never went above 5ms. Also, no messages are shown as queued.

Re: Recommended session timeout

Posted by Patrick Hunt <ph...@apache.org>.

Those are very interesting results, and good job sleuthing. You might try
the concurrent collector?
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#available_collectors.selecting

specifically item 4, "-XX:+UseConcMarkSweepGC"

I've never used this before myself but it's supposed to reduce the gc 
pauses to less than a second. Might require some tuning though...

Patrick


Joey Echeverria wrote:
> I've answered the questions you asked previously below, but I thought
> I would open with the actual culprit now that we found it. When I said
> loading data before, what I was talking about was sending data via
> Thrift to the machine that was getting disconnected from zookeeper.
> This turned out to be the problem. Too much data was being sent in
> short span of time and this caused memory pressure on the heap. This
> increased the fraction of the time that the GC had to run to keep up.
> During a 143 second test, the GC was running for 33 seconds.
> 
> We found this by running tcpdump on both the machine running the
> ensemble server and the machine connecting to zookeeper as a client.
> We deduced it wasn't a network (lost packet) issue, as we never saw
> unmatched packets in our tests. What did see were "long" 2-7 second
> pauses with no packets being sent. We first attempted to up the
> priority of the zookeeper threads to see if that would help. When it
> didn't, we started monitoring the GC time. We don't have a work around
> yet, other than sending data in smaller batches and  using a longer
> sessionTimeout.
> 
> Thanks for all your help!
> 
> -Joey
> 
>> As an experiment try increasing the timeout to say 30 seconds and re-run
>> your tests. Any change?
> 
> 30 seconds and higher works fine.
> 
>> "loading data" - could you explain a bit more about what you mean by this?
>> If you are able to provide enough information for us to replicate we could
>> try it out (also provide info on your ensemble configuration as Mahadev
>> suggested)
> 
> The ensemble config file looks as follows:
> 
> tickTime=2000
> dataDir=/data/zk
> clientPort=2181
> initLimit=5
> syncLimit=2
> skipACL=true
> 
> server.1=<server>1:2888:3888
> ...
> server.7=<server>7:2888:3888
> 
>> You are referring to startConnect in SendThread?
>>
>> We randomly sleep up to 1 second to ensure that the clients don't all storm
>> the server(s) after a bounce.
> 
> That makes some sense, but it might be worth tweaking that parameter
> based on sessionTimeout since 1 second can easily be 10-20% of
> sessionTimeout.
> 
>> 1) configure your test client to connect to 1 server in the ensemble
>> 2) run the srst command on that server
>> 3) run your client test
>> 4) run the stat command on that server
>> 5) if the test takes some time, run the stat a few times during the test
>>  to get more data points
> 
> The problem doesn't appear to be on the server end as max latency
> never went above 5ms. Also, no messages are shown as queued.

Re: Recommended session timeout

Posted by Joey Echeverria <jo...@gmail.com>.

I've answered the questions you asked previously below, but I thought
I would open with the actual culprit now that we found it. When I said
loading data before, what I was talking about was sending data via
Thrift to the machine that was getting disconnected from zookeeper.
This turned out to be the problem. Too much data was being sent in a
short span of time, and this caused memory pressure on the heap. This
increased the fraction of the time that the GC had to run to keep up.
During a 143 second test, the GC was running for 33 seconds.

We found this by running tcpdump on both the machine running the
ensemble server and the machine connecting to zookeeper as a client.
We deduced it wasn't a network (lost packet) issue, as we never saw
unmatched packets in our tests. What we did see were "long" 2-7 second
pauses with no packets being sent. We first attempted to up the
priority of the zookeeper threads to see if that would help. When it
didn't, we started monitoring the GC time. We don't have a workaround
yet, other than sending data in smaller batches and using a longer
sessionTimeout.
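
For anyone wanting to reproduce the GC measurement described above, here is a minimal sketch using the standard java.lang.management beans. The thread does not say how the GC time was actually measured, so this is only one possible approach; GcOverhead and its method names are hypothetical and not part of ZooKeeper.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not ZooKeeper code): reports how much of a test run
// was spent in garbage collection.
final class GcOverhead {
    // Sum of the accumulated collection time (ms) over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Fraction of elapsed wall-clock time that was spent in GC.
    static double fraction(long gcMillis, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (double) gcMillis / elapsedMillis;
    }
}
```

Sampling totalGcMillis() before and after a test run and dividing the difference by the elapsed time gives the overhead. With the figures quoted in this message, fraction(33_000, 143_000) is roughly 0.23. Note that a total like this says nothing about the length of individual pauses, which is what matters against a 5 second session timeout.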

Thanks for all your help!

-Joey

> As an experiment try increasing the timeout to say 30 seconds and re-run
> your tests. Any change?

30 seconds and higher works fine.

> "loading data" - could you explain a bit more about what you mean by this?
> If you are able to provide enough information for us to replicate we could
> try it out (also provide info on your ensemble configuration as Mahadev
> suggested)

The ensemble config file looks as follows:

tickTime=2000
dataDir=/data/zk
clientPort=2181
initLimit=5
syncLimit=2
skipACL=true

server.1=<server>1:2888:3888
...
server.7=<server>7:2888:3888
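
As a reading aid for the config above: tickTime is in milliseconds, while initLimit and syncLimit are counted in ticks. The sketch below only illustrates that arithmetic; TickMath is a hypothetical name, and ZooKeeper parses its configuration with its own code, not like this.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustrative only: converts tick-based limits from zoo.cfg-style text to milliseconds.
final class TickMath {
    // Reads one integer setting from the given config text.
    static int intSetting(String cfg, String key) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(cfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String v = p.getProperty(key);
        if (v == null) {
            throw new IllegalArgumentException("missing setting: " + key);
        }
        return Integer.parseInt(v.trim());
    }

    // Converts a limit expressed in ticks (e.g. initLimit, syncLimit) to milliseconds.
    static long limitMillis(String cfg, String limitKey) {
        return (long) intSetting(cfg, "tickTime") * intSetting(cfg, limitKey);
    }
}
```

For the values shown here, initLimit=5 works out to 10000 ms and syncLimit=2 to 4000 ms.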

> You are referring to startConnect in SendThread?
>
> We randomly sleep up to 1 second to ensure that the clients don't all storm
> the server(s) after a bounce.

That makes some sense, but it might be worth tweaking that parameter
based on sessionTimeout since 1 second can easily be 10-20% of
sessionTimeout.

> 1) configure your test client to connect to 1 server in the ensemble
> 2) run the srst command on that server
> 3) run your client test
> 4) run the stat command on that server
> 5) if the test takes some time, run the stat a few times during the test
>  to get more data points

The problem doesn't appear to be on the server end as max latency
never went above 5ms. Also, no messages are shown as queued.

Re: Recommended session timeout

Posted by Joey Echeverria <jo...@gmail.com>.

Thanks for all the info. I'll have to run the tests tomorrow as I
don't have access to my cluster right now.

One piece of clarification: the data loading isn't data being loaded
into zk. The data is being loaded into the system we're developing
which uses zk to support distributed operations. I mentioned the data
loading as it increased the burden on the network.

I did run some tests with a longer sessionTimeout (60s) and I didn't
see any disconnects.

I'll let you know how my testing goes tomorrow.

Thanks,

-Joey

On Tue, Feb 24, 2009 at 6:14 PM, Patrick Hunt <ph...@apache.org> wrote:
> Joey Echeverria wrote:
>>
>> Thanks for the link to the documentation. I've been running tests with
>> a 5 second session timeout and disconnect events appear frequent. The
>> network they're operating on is generally quite, but the disconnects
>> to correlate with an increase in activity (e.g. loading data into the
>> system).
>
> As an experiment try increasing the timeout to say 30 seconds and re-run
> your tests. Any change?
>
> "loading data" - could you explain a bit more about what you mean by this?
> If you are able to provide enough information for us to replicate we could
> try it out (also provide info on your ensemble configuration as Mahadev
> suggested)
>
>> Does this seem normal to you or does it imply a potential
>> configuration problem on my network?
>
> Not enough info at this time to speculate. Can you provide the configs for
> at least 1 server in the ensemble (I'm assuming they are all pretty much the
> same)
>
>> On a related topic, I was reading the 3.1 client source code,
>> particularly the reconnect source, and noticed that the client sleeps
>> for up to 1 second before trying to reconnect. This seems excessive
>> and with a 5 second session timeout leads to more frequent session
>> expirations. Almost every time it sleeps for more than about 800 ms, a
>> disconnect is followed by an expiration.
>
> You are referring to startConnect in SendThread?
>
> We randomly sleep up to 1 second to ensure that the clients don't all storm
> the server(s) after a bounce.
>
>
> I suspect that the following is happening:
>
> Your client(s) is sending information to the server, the server has 1 or
> more outstanding requests from the client. You mentioned "loading data", at
> some point the server flushes the data to disk, it could be that this flush
> takes a significant amount of time. As there is communication btw the server
> and client (client sent a request, server is responding) there will be no
> heatbeating server->client going on while the request is outstanding (the
> server will not send a heartbeat because of the request being in progress).
> As a result the client doesn't see a response for a potentially long period
> of time (because of the flush).
>
> Try using the stat & srst commands detailed here:
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
>
> 1) configure your test client to connect to 1 server in the ensemble
> 2) run the srst command on that server
> 3) run your client test
> 4) run the stat command on that server
> 5) if the test takes some time, run the stat a few times during the test
>  to get more data points
>
> The stat command will give you min/avg/max latency for requests to the
> server. If max latency goes above your timeout then you will see the
> disconnect on the client. This indicates that the server is probably to
> blame (vs say networking issues, which we see alot). Let us know the results
> of this test.
>
> Btw, latency can be effected by a number of factors:
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems
>
> 1) make sure the server JDK is not swapping (etc...) as this will kill
> latency
> 2) is the host(s) running the ZK server on dedicated devices (cpu/mem/disk)
> or sharing resources with other applications?
> 3) are you using a dedicated transaction log device (drive)? This is
> critical for low-latency & high throughput of the ensemble.
>
> Patrick
>
>> Is this a bug, or desirable behavior?
>>
>> Thanks,
>>
>> -Joey
>>
>> On Mon, Feb 23, 2009 at 10:37 PM, Patrick Hunt <ph...@apache.org> wrote:
>>>
>>> The latest docs (3.1.0 has some updates to that section) can be found
>>> here:
>>>
>>> http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperProgrammers.html#ch_zkSessions
>>>
>>> Patrick
>>>
>>> Mahadev Konar wrote:
>>>>
>>>> Hi Joey,
>>>>  here is a link to information on session timeouts.
>>>>
>>>>
>>>> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html#ch_
>>>> zkSessions
>>>>  The session timeouts depends on how sensitive you want your application
>>>> to
>>>> be. A very low session timeout like (1-2 seconds) might lead to your
>>>> application being very sensitive to events like minor network problems
>>>> etc.,
>>>> a higher values of say (30 seconds) on the other hand might lead to slow
>>>> detection of client failures -- example one of the zookeeper client
>>>> which
>>>> has ephemeral node goes down, in this case the ephemeral nodes will only
>>>> go
>>>> away after session timeout.
>>>>
>>>> I have seen some users using 10-15 seconds of session timeout, but you
>>>> should use as per your application requirements.
>>>>
>>>> Hope this helps.
>>>> mahadev
>>>>
>>>>
>>>> On 2/22/09 3:09 AM, "Joey Echeverria" <jo...@gmail.com> wrote:
>>>>
>>>>> Is there a recommended session timeout? Does it change based on the
>>>>> ensemble size?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Joey
>

Re: Recommended session timeout

Posted by Patrick Hunt <ph...@apache.org>.

Joey Echeverria wrote:
> Thanks for the link to the documentation. I've been running tests with
> a 5 second session timeout and disconnect events appear frequent. The
> network they're operating on is generally quite, but the disconnects
> to correlate with an increase in activity (e.g. loading data into the
> system).

As an experiment try increasing the timeout to say 30 seconds and re-run 
your tests. Any change?

"loading data" - could you explain a bit more about what you mean by 
this? If you are able to provide enough information for us to replicate 
we could try it out (also provide info on your ensemble configuration as 
Mahadev suggested)

> Does this seem normal to you or does it imply a potential
> configuration problem on my network?

Not enough info at this time to speculate. Can you provide the configs
for at least 1 server in the ensemble? (I'm assuming they are all pretty
much the same.)

> On a related topic, I was reading the 3.1 client source code,
> particularly the reconnect source, and noticed that the client sleeps
> for up to 1 second before trying to reconnect. This seems excessive
> and with a 5 second session timeout leads to more frequent session
> expirations. Almost every time it sleeps for more than about 800 ms, a
> disconnect is followed by an expiration.

You are referring to startConnect in SendThread?

We randomly sleep up to 1 second to ensure that the clients don't all 
storm the server(s) after a bounce.

I suspect that the following is happening:

Your client(s) is sending information to the server, and the server has 1
or more outstanding requests from the client. You mentioned "loading
data"; at some point the server flushes the data to disk, and it could be
that this flush takes a significant amount of time. As there is
communication between the server and client (client sent a request,
server is responding), there will be no heartbeating server->client going
on while the request is outstanding (the server will not send a heartbeat
because of the request being in progress). As a result the client doesn't
see a response for a potentially long period of time (because of the
flush).

Try using the stat & srst commands detailed here:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands

1) configure your test client to connect to 1 server in the ensemble
2) run the srst command on that server
3) run your client test
4) run the stat command on that server
5) if the test takes some time, run the stat a few times during the test
   to get more data points
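
The steps above can be scripted. The sketch below sends a four-letter command to the client port over a plain socket and pulls the max latency out of the reply. FourLetter is a hypothetical helper, and the "Latency min/avg/max:" line format it parses is an assumption about what stat prints; check it against the output of your own server. Newer releases may also require these commands to be enabled in the server config.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not ZooKeeper code) for the stat/srst steps.
final class FourLetter {
    // Sends a command such as "srst" or "stat" and returns whatever the server replies.
    static String send(String host, int port, String cmd) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), 3000);
            s.setSoTimeout(3000);
            s.getOutputStream().write(cmd.getBytes(StandardCharsets.US_ASCII));
            s.getOutputStream().flush();
            // Read until the server closes the connection.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            InputStream in = s.getInputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    // Returns the max latency in ms from a line assumed to look like
    // "Latency min/avg/max: 0/1/5", or -1 if no such line is present.
    static long maxLatencyMillis(String statOutput) {
        for (String line : statOutput.split("\n")) {
            line = line.trim();
            if (line.startsWith("Latency min/avg/max:")) {
                String[] parts = line.substring(line.indexOf(':') + 1).trim().split("/");
                return Long.parseLong(parts[2].trim());
            }
        }
        return -1;
    }
}
```

A test run would call send(host, 2181, "srst") once, start the client test, then call maxLatencyMillis(send(host, 2181, "stat")) a few times and compare the result with the session timeout.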

The stat command will give you min/avg/max latency for requests to the
server. If max latency goes above your timeout then you will see the
disconnect on the client. This indicates that the server is probably to
blame (vs say networking issues, which we see a lot). Let us know the
results of this test.

Btw, latency can be affected by a number of factors:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_commonProblems

1) make sure the server JDK is not swapping (etc...) as this will kill 
latency
2) is the host(s) running the ZK server on dedicated devices 
(cpu/mem/disk) or sharing resources with other applications?
3) are you using a dedicated transaction log device (drive)? This is 
critical for low-latency & high throughput of the ensemble.

Patrick

> Is this a bug, or desirable behavior?
> 
> Thanks,
> 
> -Joey
> 
> On Mon, Feb 23, 2009 at 10:37 PM, Patrick Hunt <ph...@apache.org> wrote:
>> The latest docs (3.1.0 has some updates to that section) can be found here:
>> http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperProgrammers.html#ch_zkSessions
>>
>> Patrick
>>
>> Mahadev Konar wrote:
>>> Hi Joey,
>>>  here is a link to information on session timeouts.
>>>
>>> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html#ch_
>>> zkSessions
>>>  The session timeouts depends on how sensitive you want your application
>>> to
>>> be. A very low session timeout like (1-2 seconds) might lead to your
>>> application being very sensitive to events like minor network problems
>>> etc.,
>>> a higher values of say (30 seconds) on the other hand might lead to slow
>>> detection of client failures -- example one of the zookeeper client which
>>> has ephemeral node goes down, in this case the ephemeral nodes will only
>>> go
>>> away after session timeout.
>>>
>>> I have seen some users using 10-15 seconds of session timeout, but you
>>> should use as per your application requirements.
>>>
>>> Hope this helps.
>>> mahadev
>>>
>>>
>>> On 2/22/09 3:09 AM, "Joey Echeverria" <jo...@gmail.com> wrote:
>>>
>>>> Is there a recommended session timeout? Does it change based on the
>>>> ensemble size?
>>>>
>>>> Thanks,
>>>>
>>>> -Joey

Re: Recommended session timeout

Posted by Mahadev Konar <ma...@yahoo-inc.com>.

On 2/23/09 11:37 PM, "Joey Echeverria" <jo...@gmail.com> wrote:

> Thanks for the link to the documentation. I've been running tests with
> a 5 second session timeout and disconnect events appear frequent. The
> network they're operating on is generally quite, but the disconnects
> to correlate with an increase in activity (e.g. loading data into the
> system).
> 
> Does this seem normal to you or does it imply a potential
> configuration problem on my network?
How many zookeeper quorum servers are you running? What is the config for
the zookeeper servers?

> 
> On a related topic, I was reading the 3.1 client source code,
> particularly the reconnect source, and noticed that the client sleeps
> for up to 1 second before trying to reconnect. This seems excessive
> and with a 5 second session timeout leads to more frequent session
> expirations. Almost every time it sleeps for more than about 800 ms, a
> disconnect is followed by an expiration.
Can you point me to the code which you think does this? A client is supposed
to disconnect itself from a server if it does not hear a response to its
pings within 1/3 of the session timeout. It should then reconnect to the
other servers. Session expiration happening so frequently does indicate a
problem. More information on your setup will help.
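
To put rough numbers on the exchange above, here is a sketch that takes the 1/3 figure from this reply and the up-to-1-second sleep from the quoted question at face value. The model (time left = timeout minus detection delay minus sleep) is a deliberate simplification for illustration, and ReconnectBudget is a hypothetical name, not client code.

```java
// Hypothetical back-of-the-envelope model, not taken from the ZooKeeper client.
final class ReconnectBudget {
    // Delay before the client gives up on a silent server:
    // 1/3 of the session timeout, per the reply above.
    static long detectMillis(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }

    // Milliseconds left to reconnect before the session would expire,
    // after the detection delay and a sleep before reconnecting.
    static long remainingMillis(long sessionTimeoutMs, long sleepMs) {
        return sessionTimeoutMs - detectMillis(sessionTimeoutMs) - sleepMs;
    }
}
```

Under these assumptions a 5 second timeout with a full 1 second sleep leaves 2334 ms to reconnect, while a 30 second timeout leaves 19000 ms, which is at least consistent with the sleep mattering far more at short timeouts.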

Thanks
mahadev

> 
> Is this a bug, or desirable behavior?
> 
> Thanks,
> 
> -Joey
> 
> On Mon, Feb 23, 2009 at 10:37 PM, Patrick Hunt <ph...@apache.org> wrote:
>> The latest docs (3.1.0 has some updates to that section) can be found here:
>> http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperProgrammers.html#ch_z
>> kSessions
>> 
>> Patrick
>> 
>> Mahadev Konar wrote:
>>> 
>>> Hi Joey,
>>>  here is a link to information on session timeouts.
>>> 
>>> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html#ch_
>>> zkSessions
>>>  The session timeouts depends on how sensitive you want your application
>>> to
>>> be. A very low session timeout like (1-2 seconds) might lead to your
>>> application being very sensitive to events like minor network problems
>>> etc.,
>>> a higher values of say (30 seconds) on the other hand might lead to slow
>>> detection of client failures -- example one of the zookeeper client which
>>> has ephemeral node goes down, in this case the ephemeral nodes will only
>>> go
>>> away after session timeout.
>>> 
>>> I have seen some users using 10-15 seconds of session timeout, but you
>>> should use as per your application requirements.
>>> 
>>> Hope this helps.
>>> mahadev
>>> 
>>> 
>>> On 2/22/09 3:09 AM, "Joey Echeverria" <jo...@gmail.com> wrote:
>>> 
>>>> Is there a recommended session timeout? Does it change based on the
>>>> ensemble size?
>>>> 
>>>> Thanks,
>>>> 
>>>> -Joey
>>> 
>>

Re: Recommended session timeout

Posted by Joey Echeverria <jo...@gmail.com>.

Thanks for the link to the documentation. I've been running tests with
a 5 second session timeout and disconnect events appear frequently. The
network they're operating on is generally quiet, but the disconnects
do correlate with an increase in activity (e.g. loading data into the
system).

Does this seem normal to you or does it imply a potential
configuration problem on my network?

On a related topic, I was reading the 3.1 client source code,
particularly the reconnect source, and noticed that the client sleeps
for up to 1 second before trying to reconnect. This seems excessive
and with a 5 second session timeout leads to more frequent session
expirations. Almost every time it sleeps for more than about 800 ms, a
disconnect is followed by an expiration.

Is this a bug, or desirable behavior?

Thanks,

-Joey

On Mon, Feb 23, 2009 at 10:37 PM, Patrick Hunt <ph...@apache.org> wrote:
> The latest docs (3.1.0 has some updates to that section) can be found here:
> http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperProgrammers.html#ch_zkSessions
>
> Patrick
>
> Mahadev Konar wrote:
>>
>> Hi Joey,
>>  here is a link to information on session timeouts.
>>
>> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html#ch_
>> zkSessions
>>  The session timeouts depends on how sensitive you want your application
>> to
>> be. A very low session timeout like (1-2 seconds) might lead to your
>> application being very sensitive to events like minor network problems
>> etc.,
>> a higher values of say (30 seconds) on the other hand might lead to slow
>> detection of client failures -- example one of the zookeeper client which
>> has ephemeral node goes down, in this case the ephemeral nodes will only
>> go
>> away after session timeout.
>>
>> I have seen some users using 10-15 seconds of session timeout, but you
>> should use as per your application requirements.
>>
>> Hope this helps.
>> mahadev
>>
>>
>> On 2/22/09 3:09 AM, "Joey Echeverria" <jo...@gmail.com> wrote:
>>
>>> Is there a recommended session timeout? Does it change based on the
>>> ensemble size?
>>>
>>> Thanks,
>>>
>>> -Joey
>>
>

Re: Recommended session timeout

Posted by Patrick Hunt <ph...@apache.org>.

The latest docs (3.1.0 has some updates to that section) can be found here:
http://hadoop.apache.org/zookeeper/docs/r3.1.0/zookeeperProgrammers.html#ch_zkSessions

Patrick

Mahadev Konar wrote:
> Hi Joey,
>  here is a link to information on session timeouts.
> http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html#ch_
> zkSessions
>   
> The session timeouts depends on how sensitive you want your application to
> be. A very low session timeout like (1-2 seconds) might lead to your
> application being very sensitive to events like minor network problems etc.,
> a higher values of say (30 seconds) on the other hand might lead to slow
> detection of client failures -- example one of the zookeeper client which
> has ephemeral node goes down, in this case the ephemeral nodes will only go
> away after session timeout.
> 
> I have seen some users using 10-15 seconds of session timeout, but you
> should use as per your application requirements.
> 
> Hope this helps.
> mahadev
> 
> 
> On 2/22/09 3:09 AM, "Joey Echeverria" <jo...@gmail.com> wrote:
> 
>> Is there a recommended session timeout? Does it change based on the
>> ensemble size?
>>
>> Thanks,
>>
>> -Joey
>

Re: Recommended session timeout

Posted by Mahadev Konar <ma...@yahoo-inc.com>.

Hi Joey,
 here is a link to information on session timeouts.
http://hadoop.apache.org/zookeeper/docs/r3.0.1/zookeeperProgrammers.html#ch_zkSessions

The session timeout depends on how sensitive you want your application to
be. A very low session timeout (like 1-2 seconds) might lead to your
application being very sensitive to events like minor network problems.
A higher value (say 30 seconds), on the other hand, might lead to slow
detection of client failures. For example, if a zookeeper client that
holds an ephemeral node goes down, the ephemeral node will only go away
after the session timeout.

I have seen some users using 10-15 seconds of session timeout, but you
should choose a value as per your application requirements.
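
One related constraint when picking a value: the ZooKeeper documentation describes the timeout a client ends up with as bounded by the server, by default to between 2 and 20 times tickTime. The sketch below shows that clamping under the assumption that those default multipliers apply; whether the 3.x release discussed in this thread behaves identically is not confirmed here, and NegotiatedTimeout is a hypothetical name.

```java
// Hypothetical illustration of timeout bounding; the 2x and 20x multipliers
// are the documented defaults in later releases and are assumed here.
final class NegotiatedTimeout {
    // Returns the timeout (ms) a client would end up with after the server
    // clamps the requested value to [2 * tickTime, 20 * tickTime].
    static int negotiate(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }
}
```

With tickTime=2000 (the value in the ensemble config posted in this thread), a request for 1 second would be raised to 4 seconds, 15 seconds would pass through unchanged, and 60 seconds would be cut to 40 seconds, assuming the defaults above.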

Hope this helps.
mahadev

On 2/22/09 3:09 AM, "Joey Echeverria" <jo...@gmail.com> wrote:

> Is there a recommended session timeout? Does it change based on the
> ensemble size?
> 
> Thanks,
> 
> -Joey