Posted to user@zookeeper.apache.org by "Krizansky, Jan" <jk...@netsuite.com> on 2016/08/12 14:39:50 UTC

How to investigate these error codes

Hi ZK group,

I'm reaching out because we couldn't find any satisfactory information online.
We've recently started seeing some errors in our cluster. The prevailing one is ZSESSIONEXPIRED, but there is sometimes also a ZCONNECTIONLOSS error.
We couldn't find any documentation about possible causes of these issues. Can you recommend where we should investigate and what might be causing them?

The ZCONNECTIONLOSS error is fairly rare, but ZSESSIONEXPIRED is very common, happening on almost every other hit.

Thank you,

Jan Krizansky


NOTICE: This email and any attachments may contain confidential and proprietary information of NetSuite Inc. and is for the sole use of the intended recipient for the stated purpose. Any improper use or distribution is prohibited. If you are not the intended recipient, please notify the sender; do not review, copy or distribute; and promptly delete or destroy all transmitted information. Please note that all communications and information transmitted through this email system may be monitored by NetSuite or its agents and that all incoming email is automatically scanned by a third party spam and filtering service

Re: How to investigate these error codes

Posted by Patrick Hunt <ph...@apache.org>.
Which version of the C client are you using: multi-threaded or
single-threaded? If multi-threaded, the library (using pthreads) takes care
of the periodic heartbeats for you. If single-threaded, you might be
starving the event processing, which includes the heartbeat loop.
See the THREADED sections of cli.c for an example.
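
To make the starvation point concrete, here is a minimal sketch in plain C.
It does not call the ZooKeeper library; loop_profile, heartbeat_budget_ms and
loop_starves_heartbeats are hypothetical names, and the "one third of the
session timeout" budget is an illustrative assumption rather than a quoted
library constant. The idea is only that, in the single-threaded build, the
application has to call into the client's event processing (for example via
zookeeper_interest and zookeeper_process) often enough for heartbeats to go
out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not part of the ZooKeeper API. */
typedef struct {
    long session_timeout_ms; /* timeout negotiated with the server */
    long max_loop_gap_ms;    /* longest pause between two calls into the
                                client's event processing */
} loop_profile;

/* Assumption of this sketch: a heartbeat should go out within one third
 * of the session timeout, leaving room for retries and reconnects. */
static long heartbeat_budget_ms(long session_timeout_ms) {
    assert(session_timeout_ms > 0);
    return session_timeout_ms / 3;
}

/* True when the application loop pauses longer than the heartbeat budget,
 * i.e. when the event processing is likely being starved. */
static bool loop_starves_heartbeats(const loop_profile *p) {
    return p->max_loop_gap_ms > heartbeat_budget_ms(p->session_timeout_ms);
}
```

Under these assumptions, a loop that blocks for 15 seconds with a 30-second
session timeout would be flagged, while one that pauses for 2 seconds would
not. Measuring the real gap between calls in your own loop is probably the
quickest way to confirm or rule out this cause.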

Patrick

On Mon, Aug 15, 2016 at 12:01 AM, Krizansky, Jan <jk...@netsuite.com>
wrote:

> Thank you, Flavio, for the swift answer.
> Yes, we're using the C client, but we don't seem to have any network or
> load issues (in fact, the setup is still in development mode, so there is
> little to no traffic going through it).
> We have also set a fairly high session timeout of 1,800,000 and a tickTime
> of 900,000. Yet we're getting the SESSIONEXPIRED error as often as 2-3
> times a minute. Are there any investigation steps you could recommend to
> pinpoint the problem?
>
> Thank you,
> Jan

RE: How to investigate these error codes

Posted by "Krizansky, Jan" <jk...@netsuite.com>.
Thank you, Flavio, for the swift answer.
Yes, we're using the C client, but we don't seem to have any network or load issues (in fact, the setup is still in development mode, so there is little to no traffic going through it).
We have also set a fairly high session timeout of 1,800,000 and a tickTime of 900,000. Yet we're getting the SESSIONEXPIRED error as often as 2-3 times a minute. Are there any investigation steps you could recommend to pinpoint the problem?

Thank you,
Jan

Re: How to investigate these error codes

Posted by Michael Han <ha...@cloudera.com>.
On top of what Flavio pointed out:

The liveness of a session is maintained by regular heartbeats between
client and server, and heartbeats could fail due to a couple of reasons:

- Network: increased latency or network errors.
- Server overload: IO contention or swapping; long server GC pauses; too
many connected clients; a server running in a multi-tenant environment.
- Client overload.
- Configuration issues: the configured tickTime / minSessionTimeout /
maxSessionTimeout is too low for the specific environment; a shared dataDir
and dataLogDir (which can cause IO contention in some cases).
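
On the configuration point, it may help to remember that the session timeout
a client asks for is not necessarily the one it gets: the server clamps it
between minSessionTimeout and maxSessionTimeout, which default to 2 and 20
times tickTime. The sketch below illustrates that clamp in plain C. The
function name negotiated_timeout_ms is hypothetical, and treating a
non-positive bound as "not configured" is an assumption of this sketch, not
the server's actual code.

```c
#include <assert.h>

/* Sketch of the clamp applied to a requested session timeout.
 * All values are in milliseconds. A non-positive min_ms or max_ms is
 * treated here as "not configured" (an assumption of this sketch), in
 * which case the documented defaults of 2x and 20x tickTime are used. */
static long negotiated_timeout_ms(long requested_ms, long tick_ms,
                                  long min_ms, long max_ms) {
    assert(tick_ms > 0);
    if (min_ms <= 0) min_ms = 2 * tick_ms;
    if (max_ms <= 0) max_ms = 20 * tick_ms;
    if (requested_ms < min_ms) return min_ms; /* raised to the minimum */
    if (requested_ms > max_ms) return max_ms; /* lowered to the maximum */
    return requested_ms;                      /* accepted as requested */
}
```

With a tickTime of 2000 and no explicit bounds, a request for 1000 ms would
be raised to 4000 ms and a request for 100000 ms lowered to 40000 ms. In the
C client, the value actually in effect can be read back with
zoo_recv_timeout() once the session is established, which is worth logging
before concluding that the timeout is "high".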

I think it's hard to tell exactly what's going on in the cluster based on
the information posted here, given how many reasons could cause the issue.
It seems that the cluster was running fine previously, so identifying
what has changed that correlates with the above points might be a good start.

On Fri, Aug 12, 2016 at 9:05 AM, Flavio Junqueira <fp...@apache.org> wrote:

> Hi Jan,
>
> Connection loss means that the client has disconnected from the server it
> was connected to and it will try to connect to another server to avoid
> session expiration.
>
> Session expired means that your session has expired. :-)
>
> Session expiration is important because if you have ephemerals associated
> to that session, they will be gone, so it might trigger some recovery path
> in your application.
>
> You're using the C client? If so, then it is not going to be garbage
> collection on the client side causing your clients to disconnect, which is
> a pretty common cause for applications using the Java client. You may want
> to investigate if you're having some network issues or if perhaps your
> servers are overwhelmed with something. If you're sharing the disk devices
> and other applications are inducing a good number of IOs, then you may end
> up affecting the performance of the server.
>
> -Flavio


-- 
Cheers
Michael.

Re: How to investigate these error codes

Posted by Flavio Junqueira <fp...@apache.org>.
Hi Jan,

Connection loss means that the client has disconnected from the server it was connected to and it will try to connect to another server to avoid session expiration.

Session expired means that your session has expired. :-)

Session expiration is important because if you have ephemerals associated to that session, they will be gone, so it might trigger some recovery path in your application.

You're using the C client? If so, then it is not going to be garbage collection on the client side causing your clients to disconnect, which is a pretty common cause for applications using the Java client. You may want to investigate if you're having some network issues or if perhaps your servers are overwhelmed with something. If you're sharing the disk devices and other applications are inducing a good number of IOs, then you may end up affecting the performance of the server.
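
The difference between the two errors can be sketched as a small
classification step in plain C. The numeric values mirror the
ZCONNECTIONLOSS and ZSESSIONEXPIRED constants in zookeeper.h and are
redefined here only so the sketch compiles without the client library;
recovery_action and classify_error are hypothetical names, and the suggested
actions are one reasonable reading of the explanation above, not a
prescribed recovery procedure.

```c
#include <assert.h>

/* Local copies of two error codes from zookeeper.h, so that this sketch
 * does not depend on the client library. */
enum {
    SKETCH_ZCONNECTIONLOSS = -4,
    SKETCH_ZSESSIONEXPIRED = -112
};

/* Hypothetical recovery decisions for an application. */
typedef enum {
    ACTION_OTHER,              /* not a session-level problem */
    ACTION_RETRY_SAME_SESSION, /* session may still be alive: wait for the
                                  client to reconnect, then retry */
    ACTION_NEW_SESSION         /* session is gone: open a new handle and
                                  re-create ephemerals and watches */
} recovery_action;

static recovery_action classify_error(int rc) {
    switch (rc) {
    case SKETCH_ZCONNECTIONLOSS:
        return ACTION_RETRY_SAME_SESSION;
    case SKETCH_ZSESSIONEXPIRED:
        return ACTION_NEW_SESSION;
    default:
        return ACTION_OTHER;
    }
}
```

The design point is that connection loss is usually recoverable within the
same session, whereas session expiration means the ephemerals are already
gone and the application has to take its recovery path.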

-Flavio 

