Posted to user@zookeeper.apache.org by Thiago Borges <th...@gmail.com> on 2009/12/16 19:33:42 UTC

Share Zookeeper instance and Connection Limits

I read the documentation at the ZooKeeper site and can't find any text 
about sharing or limiting ZooKeeper client connections.

I only see the parameter in the .conf file for the maximum number of 
connections per client.

Can someone point me to documentation about sharing ZooKeeper 
connections? Can I share a connection among different threads?

And what about client connection limits: how much does throughput 
decrease as the number of connections increases?

Thanks,

-- 
Thiago Borges

Re: Share Zookeeper instance and Connection Limits

Posted by Ted Dunning <te...@gmail.com>.
ZK does a good enough job of avoiding seeks that if you give it some
dedicated disks, you may not see much speedup with SSDs.

On Fri, Dec 18, 2009 at 9:13 AM, Thiago Borges <th...@gmail.com> wrote:

> Maybe in some specific environments, as described in ZOOKEEPER-546. But
> yes, I agree, it's only an idea. The world is moving to SSDs too!




-- 
Ted Dunning, CTO
DeepDyve

Re: Share Zookeeper instance and Connection Limits

Posted by Thiago Borges <th...@gmail.com>.
On 16/12/2009 20:06, Benjamin Reed wrote:
> I agree with Ted; it doesn't seem like a good idea in practice.

Maybe in some specific environments, as described in ZOOKEEPER-546. 
But yes, I agree, it's only an idea. The world is moving to SSDs too!

>
> 1) use tmpfs

My memory will be split in two, right?

> 2) you can set forceSync to "no" in the configuration file to disable 
> syncing to disk before acknowledging responses

Good.

> 3) if you really want to make the disk write go away, you can modify 
> the SyncRequestProcessor in the code
>

Even better. Thanks for pointing out the path!

-- 
Thiago Borges

Re: Share Zookeeper instance and Connection Limits

Posted by Benjamin Reed <br...@yahoo-inc.com>.
I agree with Ted; it doesn't seem like a good idea in practice. 
However, you do have a couple of options if you are just testing things:

1) use tmpfs
2) you can set forceSync to "no" in the configuration file to disable 
syncing to disk before acknowledging responses
3) if you really want to make the disk write go away, you can modify the 
SyncRequestProcessor in the code
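
A minimal sketch of what options 1 and 2 might look like in zoo.cfg 
(the mount point, size, and paths are illustrative, not from this thread):

```properties
# zoo.cfg sketch: put the snapshot and transaction log directories on a
# tmpfs mount (e.g. "mount -t tmpfs -o size=512m tmpfs /mnt/zk-ram") so
# no write ever reaches a physical disk. Contents are lost on reboot --
# testing only.
dataDir=/mnt/zk-ram/data
dataLogDir=/mnt/zk-ram/log

# Option 2: do not fsync the transaction log before acknowledging;
# trades durability for latency.
forceSync=no
```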

ben

Ted Dunning wrote:
> I think that this would be a very bad idea because of restart issues.  As it
> stands, ZK reads from disk snapshots on startup to avoid moving as much data
> from other members of the cluster.
>
> You might consider putting the snapshots and log on a tmpfs file system if
> you really, really want this.
>
> On Wed, Dec 16, 2009 at 1:08 PM, Thiago Borges <th...@gmail.com> wrote:
>
>   
>> Can a ZooKeeper ensemble run only in memory rather than writing to both
>> memory and disk? That makes sense if I have a highly reliable system,
>> right? (Of course at some point we need a "dump" to shut down and restart
>> the entire system.)
>>
>> Well, which limits throughput first, disk I/O or the network?
>>
>> Thanks for your quick response. I'm studying ZooKeeper for my master's
>> thesis, on coordinating distributed index structures.


Re: Share Zookeeper instance and Connection Limits

Posted by Ted Dunning <te...@gmail.com>.
I think that this would be a very bad idea because of restart issues.  As it
stands, ZK reads from disk snapshots on startup to avoid moving as much data
from other members of the cluster.

You might consider putting the snapshots and log on a tmpfs file system if
you really, really want this.

On Wed, Dec 16, 2009 at 1:08 PM, Thiago Borges <th...@gmail.com> wrote:

> Can a ZooKeeper ensemble run only in memory rather than writing to both
> memory and disk? That makes sense if I have a highly reliable system,
> right? (Of course at some point we need a "dump" to shut down and restart
> the entire system.)
>
> Well, which limits throughput first, disk I/O or the network?
>
> Thanks for your quick response. I'm studying ZooKeeper for my master's
> thesis, on coordinating distributed index structures.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Share Zookeeper instance and Connection Limits

Posted by Patrick Hunt <ph...@apache.org>.
If you do test large (and by large I mean millions of znodes and tens 
of millions of watches), be sure to allocate enough memory, get the 
latest JVM (1.6.0_17), and turn on incremental/CMS GC in the Sun JVM.
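
As a sketch, assuming the stock zkServer.sh (which passes JVMFLAGS 
through to the server JVM), the tuning might look like this; the heap 
size is illustrative and should be scaled to your znode/watch count:

```shell
# Illustrative only: a large fixed heap plus incremental CMS GC on the
# Sun 1.6 JVM, as suggested above.
export JVMFLAGS="-Xmx4g -Xms4g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
./zkServer.sh start
```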

You may find this helpful as well for tracking progress in real time: 
http://bit.ly/1iMZdg

Patrick

Thiago Borges wrote:
> On 16/12/2009 19:15, Patrick Hunt wrote:
>> Right, this was with 1 GbE. No, I don't know anyone who has done this.
>> But it should be easy enough for you to test. Limit the amount of data
>> you are storing in znodes and it shouldn't be too terrible.
> 
> OK. I will run experiments in a lab with 40 Core 2 Duo machines, 2 GB 
> of RAM, and a common 5400 RPM disk. Believe it or not ;) The data 
> inserted into znodes is quite small (<1 KB, 2 KB), but the number of 
> znodes and watches is large.
> 
>> Not currently, this feature is looking for someone interested enough 
>> to provide some patches ;-)
>> https://issues.apache.org/jira/browse/ZOOKEEPER-546
> 
> Maybe in the near future! ;)
> 

Re: Share Zookeeper instance and Connection Limits

Posted by Thiago Borges <th...@gmail.com>.
On 16/12/2009 19:15, Patrick Hunt wrote:
> Right, this was with 1 GbE. No, I don't know anyone who has done this.
> But it should be easy enough for you to test. Limit the amount of data
> you are storing in znodes and it shouldn't be too terrible.

OK. I will run experiments in a lab with 40 Core 2 Duo machines, 2 GB 
of RAM, and a common 5400 RPM disk. Believe it or not ;) The data 
inserted into znodes is quite small (<1 KB, 2 KB), but the number of 
znodes and watches is large.

> Not currently, this feature is looking for someone interested enough 
> to provide some patches ;-)
> https://issues.apache.org/jira/browse/ZOOKEEPER-546

Maybe in the near future! ;)

-- 
Thiago Borges

Re: Share Zookeeper instance and Connection Limits

Posted by Patrick Hunt <ph...@apache.org>.
Thiago Borges wrote:
> On 16/12/2009 16:45, Patrick Hunt wrote:
>> This test has 910 clients (sessions) involved:
>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance 
>>
>>
>> We have users with 10k sessions accessing a single 5 node ZK ensemble. 
>> That's the largest I know about that's in production. I've personally 
>> tested up to 20k sessions attaching to a 3 node ensemble with 10 
>> second session timeout and it was fine (although I didn't do much 
>> other than test session establishment and teardown).
>>
>> Also see this: http://bit.ly/4ekN8G
> 
> The network in this test was gigabit Ethernet, right? Do you know 
> anyone who has run ensembles on 100 Mbit/s Ethernet?

Right, this was with 1 GbE. No, I don't know anyone who has done this. 
But it should be easy enough for you to test. Limit the amount of data 
you are storing in znodes and it shouldn't be too terrible.

> Can a ZooKeeper ensemble run only in memory rather than writing to 
> both memory and disk? That makes sense if I have a highly reliable 
> system, right? (Of course at some point we need a "dump" to shut down 
> and restart the entire system.)

Not currently, this feature is looking for someone interested enough to 
provide some patches ;-)
https://issues.apache.org/jira/browse/ZOOKEEPER-546

> 
> Well, which limits throughput first, disk I/O or the network?

I believe the current limitation is CPU, in the ack processor (given 
that you have a dedicated txlog device). So neither, AFAIK.

> Thanks for your quick response. I'm studying ZooKeeper for my master's 
> thesis, on coordinating distributed index structures.

NP. Enjoy.

Patrick


Re: Share Zookeeper instance and Connection Limits

Posted by Thiago Borges <th...@gmail.com>.
On 16/12/2009 16:45, Patrick Hunt wrote:
> This test has 910 clients (sessions) involved:
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance 
>
>
> We have users with 10k sessions accessing a single 5 node ZK ensemble. 
> That's the largest I know about that's in production. I've personally 
> tested up to 20k sessions attaching to a 3 node ensemble with 10 
> second session timeout and it was fine (although I didn't do much 
> other than test session establishment and teardown).
>
> Also see this: http://bit.ly/4ekN8G

The network in this test was gigabit Ethernet, right? Do you know 
anyone who has run ensembles on 100 Mbit/s Ethernet?

Can a ZooKeeper ensemble run only in memory rather than writing to both 
memory and disk? That makes sense if I have a highly reliable system, 
right? (Of course at some point we need a "dump" to shut down and 
restart the entire system.)

Well, which limits throughput first, disk I/O or the network?

Thanks for your quick response. I'm studying ZooKeeper for my master's 
thesis, on coordinating distributed index structures.

-- 
Thiago Borges

Re: Share Zookeeper instance and Connection Limits

Posted by Patrick Hunt <ph...@apache.org>.
Thiago Borges wrote:
> I read the documentation at the ZooKeeper site and can't find any text 
> about sharing or limiting ZooKeeper client connections.

No limits particular to ZK itself (given enough memory) - usually the 
limitations are due to the max number of file descriptors the host OS 
allows. Often this is on the order of 1-8k; check your ulimit.

> I only see the parameter in the .conf file for the maximum number of 
> connections per client.

This is to limit "DoS" attacks - it was added after we saw issues with 
buggy client implementations that would create unbounded numbers of 
sessions with the ZK service, eventually running into the FD limit 
problem I mentioned.
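
For reference, a sketch of how that per-client cap appears in the 
server configuration (the value shown is only an example):

```properties
# zoo.cfg: maximum concurrent connections a single client, identified
# by IP address, may open to one server. 0 disables the check.
maxClientCnxns=60
```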

> Can someone point me to documentation about sharing ZooKeeper 
> connections? Can I share a connection among different threads?

The API docs have the details:
http://hadoop.apache.org/zookeeper/docs/current/api/index.html
Generally, though, the client interface is thread safe.
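
To illustrate the sharing pattern only (the `Client` class below is a 
hypothetical stand-in, not the real org.apache.zookeeper.ZooKeeper API): 
create one handle for the whole process and let every worker thread use 
it concurrently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedHandleDemo {
    // Hypothetical stand-in for a thread-safe client handle such as the
    // ZooKeeper client. Internally synchronized, so many threads may use
    // a single instance concurrently.
    static class Client {
        private final AtomicInteger requests = new AtomicInteger();
        void request() { requests.incrementAndGet(); }
        int requestCount() { return requests.get(); }
    }

    public static void main(String[] args) throws InterruptedException {
        Client shared = new Client();            // one handle per process
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 100; i++) {
            pool.submit(shared::request);        // all threads share it
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(shared.requestCount()); // prints 100
    }
}
```

Note that sharing one session also means the threads share watches and 
are all affected together by session expiry, which is usually what you 
want for coordination.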

> And what about client connection limits: how much does throughput 
> decrease as the number of connections increases?

This test has 910 clients (sessions) involved:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance

We have users with 10k sessions accessing a single 5 node ZK ensemble. 
That's the largest I know about that's in production. I've personally 
tested up to 20k sessions attaching to a 3 node ensemble with 10 second 
session timeout and it was fine (although I didn't do much other than 
test session establishment and teardown).

Also see this: http://bit.ly/4ekN8G

Patrick