
Posted to dev@zookeeper.apache.org by Ted Dunning <te...@gmail.com> on 2011/09/07 21:00:43 UTC

file descriptor leak in client code?

One of our engineers has built a pretty convincing manual test that
demonstrates that the ZooKeeper client leaks a few file descriptors every few
seconds if the attempt to connect fails with a network unreachable error.

If the max file descriptor limit is not reached, the client recovers when
the network comes back.

If the max file descriptor limit is reached, then the client never recovers
even when the network comes back.

Is this a known issue?

I am building a test to demonstrate the problem and experiment across
versions, but if somebody has broken this trail before, I would love to know
about it.

On the topic of testing this, I am also all ears if somebody has any ideas
for how to build a nice unit test for this.  Right now something like
mocking the network connection seems required.  That doesn't sound fun.
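
One way to watch for the leak from inside the JVM, without mocking the network, is to sample the process's open file descriptor count before and after the suspect operation. The sketch below is only an illustration: the class and method names (FdCounter, openFileDescriptors, fdGrowth) are hypothetical, it relies on the com.sun.management.UnixOperatingSystemMXBean extension that is only available on Unix-like JVMs, and the action in main is a stand-in that deliberately holds five sockets open rather than a real ZooKeeper connect attempt.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

public class FdCounter {
    /** Returns the number of open file descriptors, or -1 if the JVM cannot report it. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1; // not a Unix-like JVM; no count available
    }

    /**
     * Runs the action and returns how many descriptors were gained.
     * The result is only meaningful when openFileDescriptors() is supported.
     */
    static long fdGrowth(Runnable action) {
        long before = openFileDescriptors();
        action.run();
        long after = openFileDescriptors();
        return after - before;
    }

    public static void main(String[] args) {
        // Stand-in for the suspect code path: open sockets and keep them referenced.
        List<SocketChannel> held = new ArrayList<>();
        long growth = fdGrowth(() -> {
            for (int i = 0; i < 5; i++) {
                try {
                    held.add(SocketChannel.open());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        });
        System.out.println("open fds now: " + openFileDescriptors());
        System.out.println("growth during action: " + growth);
    }
}
```

In a real test the action would presumably be a loop of connect attempts against an unreachable address, with the growth compared against a small tolerance, since class loading and other JVM activity can also open descriptors.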

RE: file descriptor leak in client code?

Posted by "Fournier, Camille F." <Ca...@gs.com>.

FWIW, I pored over this and the NIO code a bit yesterday and couldn't find anything obviously wrong, but NIO is a tricky beast. Is it possible that, because the channel never gets connected, we never call select, so the selector never cleans up its cancelledKeys and therefore hangs on to the fd?
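
The deferred cleanup behind this hypothesis can be seen in plain java.nio, independent of ZooKeeper. The sketch below is an illustration only (the class and method names are hypothetical, and it does not reproduce the ZooKeeper client's code path): it registers an unconnected channel with a selector, closes the channel, and compares the selector's key set before and after a selection operation. The Selector documentation states that cancelled keys are only deregistered during the next selection operation; whether the underlying fd is also held until then is platform and JDK dependent, so that part remains an assumption.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class CancelledKeyDemo {
    /** Returns {keys registered after close, keys registered after selectNow}. */
    static int[] keyCountsAroundSelect() throws IOException {
        try (Selector selector = Selector.open()) {
            SocketChannel channel = SocketChannel.open();
            channel.configureBlocking(false);
            // Register without ever connecting, as in the failed-connect scenario.
            channel.register(selector, SelectionKey.OP_CONNECT);

            // Closing the channel cancels its key, but the selector has not
            // processed the cancellation yet.
            channel.close();
            int beforeSelect = selector.keys().size();

            // A selection operation flushes the cancelled-key set.
            selector.selectNow();
            int afterSelect = selector.keys().size();

            return new int[] {beforeSelect, afterSelect};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = keyCountsAroundSelect();
        System.out.println("keys after close:     " + counts[0]);
        System.out.println("keys after selectNow: " + counts[1]);
    }
}
```

If the client loop closes channels after a failed connect but never reaches a select call, keys like this would accumulate, which is consistent with the question above but does not prove it.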

-----Original Message-----
From: Patrick Hunt [mailto:phunt@apache.org] 
Sent: Thursday, September 08, 2011 2:21 PM
To: dev@zookeeper.apache.org
Subject: Re: file descriptor leak in client code?

I don't think it's a known issue; please file a JIRA. We have had
one or two of these in the past, but we've resolved them.

I would suggest AspectJ. I've used it quite successfully in the past
to find networking and filesystem issues in ZooKeeper. I'm not sure how
easy it would be to create a unit test, though (I've always verified it
manually).

Patrick

Re: file descriptor leak in client code?

Posted by Patrick Hunt <ph...@apache.org>.

I don't think it's a known issue; please file a JIRA. We have had
one or two of these in the past, but we've resolved them.

I would suggest AspectJ. I've used it quite successfully in the past
to find networking and filesystem issues in ZooKeeper. I'm not sure how
easy it would be to create a unit test, though (I've always verified it
manually).

Patrick
