Posted to user@zookeeper.apache.org by Michael Bauland <Mi...@knipp.de> on 2010/02/01 13:30:53 UTC

question regarding connectionloss

Hello,

I've got a question regarding the connectionloss exception thrown by the Java client.
I've got an ensemble running with three zk servers. If one of the three
servers is not running, the whole ensemble should still work (and it
does, so that's fine). But in this situation I experience quite often a
connectionloss exception and I'm wondering if I'm doing something wrong
or if that's to be expected.

My code is rather simple:
I create a new connection to my ensemble using

ZooKeeper zk = new ZooKeeper (connectString, timeOut, new MyWatcher ());

where connectString contains all three servers. Then I use the ZooKeeper
to read data from a certain path:

zk.getData (path, false, null);

This call quite often returns an exception like

org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for /125/170/test

But according to your documentation, the connectionloss exception should
only occur in the following two cases:

>    1. The application calls an operation on a session that is no longer alive/valid

This should not be the case, since I only just created the session.

>    2. The ZooKeeper client disconnects from a server when there are pending operations to that server, i.e., there is a pending asynchronous call.

This should also not be the case. I was just doing a read request and no
other client was accessing the ensemble.


My only idea is that maybe the connection call first tried to connect to
the ZooKeeper server that was not running (remember, only two of the
three servers are running), and before it had a chance to try one of
the other servers, my getData call was made and failed with
connectionloss. Could that be the reason?
But I thought the connection handling was automatic and if a connection
failed the client would automatically try any of the other listed
servers without the user noticing!?
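Michael's hypothesis points at a common pitfall: the ZooKeeper constructor returns before the session is actually connected, so a request issued right away can be queued while the client is still trying an unreachable server, and a failed connection attempt can fail such pending requests with ConnectionLoss. A common pattern is to block until the Watcher reports SyncConnected before issuing requests. The sketch below uses only the JDK; the class ConnectionGate and its methods are hypothetical names, not part of the ZooKeeper API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of the ZooKeeper API): lets the thread that
// created the ZooKeeper handle block until the watcher reports a connection.
class ConnectionGate {
    // Starts at 1 and drops to 0 the first time onConnected() is called.
    private final CountDownLatch connected = new CountDownLatch(1);

    // Call this from the watcher when the session reaches SyncConnected.
    void onConnected() {
        connected.countDown();
    }

    // Waits up to the timeout; returns true once connected, false on
    // timeout or interruption (the interrupt flag is restored).
    boolean awaitConnected(long timeout, TimeUnit unit) {
        try {
            return connected.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

To wire it up, the watcher's process(WatchedEvent event) method would call gate.onConnected() when event.getState() == Watcher.Event.KeeperState.SyncConnected, and the caller would invoke gate.awaitConnected(...) after new ZooKeeper(connectString, timeOut, watcher) and only then call zk.getData(path, false, null). This only covers startup: a later disconnect can still surface ConnectionLoss on in-flight requests, so callers still need to handle it.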

Thanks for any help.

Cheers,

Michael


-- 
Michael Bauland
michael.bauland@knipp.de
bauland.tel

Re: question regarding connectionloss

Posted by Ted Dunning <te...@gmail.com>.
I have found that ZK is an excellent diagnostic tool for misconfigured
systems. Every time I have seen excessive connection loss rates, the cause
has not been ZooKeeper itself; instead, it pointed to problems on the
client side.

On Tue, Feb 2, 2010 at 11:15 AM, Patrick Hunt <ph...@apache.org> wrote:

> For example "Hardware misconfiguration - NIC" caused one system to
> basically work, but with huge numbers of connection losses, especially
> whenever there was load (and I've seen this particular issue twice now).
>



-- 
Ted Dunning, CTO
DeepDyve

Re: question regarding connectionloss

Posted by Patrick Hunt <ph...@apache.org>.
You should never see connection loss except where there is a network
partition or some other issue that disrupts communication between the
client and server (client swapping? server swapping, or either one having
GC pause issues? etc.). Are you monitoring your hosts, network, JVMs,
etc.? Is there "over-virtualization" of the cluster hosts?

Take a look at your client and server logs and see if you can determine
what the issue is. You might also try network-level tools like ping/ssh
to verify connectivity between server and client. See this page for
issues people have had in the past:
http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting
For example, "Hardware misconfiguration - NIC" caused one system to
basically work, but with huge numbers of connection losses, especially
whenever there was load (and I've seen this particular issue twice now).
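To go with the ping/ssh suggestion, a quick check that each server in the connect string accepts TCP connections on its client port might look like the sketch below. The class and method names are hypothetical. Defaulting entries without a port to 2181 and stripping a chroot suffix such as "/app" are assumptions meant to mirror the Java client's connect-string conventions, and IPv6 literals are not handled. A successful connect only shows the port is reachable; it won't expose intermittent problems like a NIC misconfiguration that only shows up under load.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks TCP reachability of each server in a connect string.
final class ConnectivityCheck {
    private ConnectivityCheck() {}

    // Splits "host1:2181,host2:2181[/chroot]" into unresolved host:port
    // addresses. Assumes port 2181 when an entry has no port.
    // IPv6 literals are not handled.
    static List<InetSocketAddress> parse(String connectString) {
        int slash = connectString.indexOf('/');            // optional chroot suffix
        String hosts = slash >= 0 ? connectString.substring(0, slash) : connectString;
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : hosts.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue;
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
            int port = colon >= 0 ? Integer.parseInt(hostPort.substring(colon + 1)) : 2181;
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    // Returns true if a TCP connection to addr succeeds within timeoutMillis.
    // Name resolution happens here, so DNS failures also count as unreachable.
    static boolean canConnect(InetSocketAddress addr, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(addr.getHostString(), addr.getPort()), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Looping over parse(connectString) from the client host and printing canConnect(addr, 2000) for each entry would, in Michael's setup, be expected to report the stopped server as unreachable and the other two as reachable; anything else points at the network rather than ZooKeeper.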

See

Patrick


Re: question regarding connectionloss

Posted by Michael Bauland <Mi...@knipp.de>.
Hi Ted,

thanks for your reply.

> This page about ZooKeeper error handling may help:
> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling

I had actually read this page before. You may have misunderstood my
question. I know how to recover from the connectionloss exception; I was
just curious why it occurred so often in the scenario I described. I
would have assumed that it shouldn't occur at all in that scenario, but
almost half of the requests returned with a connectionloss.

Cheers,

Michael




--
Michael Bauland
michael.bauland@knipp.de
bauland.tel

Re: question regarding connectionloss

Posted by Ted Dunning <te...@gmail.com>.
This page about ZooKeeper error handling may help:
http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
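ConnectionLoss is generally treated as a recoverable error, and the usual response is to retry. A minimal sketch of a retry helper follows, assuming the operation is safe to repeat; the Retry class, its parameters, and the linear backoff are illustrative choices, not part of the ZooKeeper client library.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical retry helper; not part of the ZooKeeper client library.
final class Retry {
    private Retry() {}

    // Runs op up to maxAttempts times. Exceptions for which isRetryable
    // returns false are rethrown immediately; retryable ones are rethrown
    // only after the last attempt. Sleeps backoffMillis * attempt between
    // attempts (a simple linear backoff).
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMillis,
                             Predicate<Exception> isRetryable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isRetryable.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(backoffMillis * attempt);
            }
        }
    }
}
```

For Michael's read, a call might look like Retry.withRetries(() -> zk.getData(path, false, null), 3, 100, e -> e instanceof KeeperException.ConnectionLossException). Writes need more care: a create() that failed with ConnectionLoss may in fact have succeeded on the server, so a blind retry can run into NodeExists.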




-- 
Ted Dunning, CTO
DeepDyve