Posted to user@river.apache.org by Sergio Aguilera Cazorla <sa...@gmail.com> on 2012/06/13 21:13:17 UTC

Client timeouts and remote calls

Hello,

I have a question regarding client-side timeouts in Jini / Apache River. I
am finishing a program where a certain number of clients can obtain a proxy
and set / get properties (values) from an exported class in a server. Each
client becomes a RemoteEventListener of the server, so each time a property
is changed, the server calls notify() on ALL clients to make them aware
that a property has changed (and all clients update their data).

This architecture performs great if client programs finish in a "graceful"
way, because I have a register / unregister mechanism that makes the server
have an updated list of "alive" clients. However, if client machines "die
suddenly", the server will be unaware and will try to call notify() next
time that call is needed. Example (setSomething is a remote method on the
Server):

 public void setSomething(String param) {
     // <do the Stuff>
     RemoteEvent ev = <proper RemoteEvent object>;
     // Iterate over a snapshot: removing from the list inside a for-each
     // over the list itself would throw ConcurrentModificationException.
     for (RemoteEventListener l : new ArrayList<RemoteEventListener>(listeners)) {
         try {
             l.notify(ev);
         } catch (Exception e) {
             listeners.remove(l);
         }
     }
 }

I'm sure you see where I want to go: if some clients in the list died
suddenly, notify() will still be called on them. A ConnectException is
thrown and the client is removed properly but... it takes a long time for
the exception to be thrown! Do you know how to control this situation?

Thanks in advance!

ADDITIONAL DATA:
I have tried setting the following RMI system properties, but they didn't work:
System.setProperty("sun.rmi.transport.tcp.responseTimeout","2000");
System.setProperty("sun.rmi.transport.tcp.handshakeTimeout","2000");
System.setProperty("sun.rmi.transport.tcp.readTimeout","2000");
System.setProperty("sun.rmi.transport.connectionTimeout","2000");
System.setProperty("sun.rmi.transport.proxy.connectTimeout ","2000");

At the moment, under Windows XP for both client and server, the
ConnectException takes exactly *21 seconds* to be thrown. Do you know the
reason for this value?

-- 
*Sergio Aguilera*

Re: Client timeouts and remote calls

Posted by Gregg Wonderly <ge...@cox.net>.
On Jun 16, 2012, at 2:16 AM, Dan Creswell wrote:

> <snip>
>
> To summarise:
> 
> (1) Build a model that can eventually deduce there has been a failure of
> some sort.
> (2) Build a recovery model that, given a failure, can restore whatever
> state is required and continue to make progress.

I've found this to be the most reliable and simple way to do complex system design.  Think about it from the perspective of keeping your APIs stateless while allowing the service to understand its state, and how to reconfigure itself based on API calls into it.

What's the term I'm searching for?  The APIs should always be "successful" (unless the data is wrong), and the arguments should fully specify what is needed.  RESTful web services do this well.  I've had countless arguments about RPC being RESTful, but I can't seem to get anyone to agree that "invoke" is the operation.  They always say that the method name represents the operation.  I assert the method name is part of the data...

Gregg Wonderly

Re: Client timeouts and remote calls

Posted by Dan Creswell <da...@gmail.com>.
On 14 June 2012 15:37, Gregg Wonderly <ge...@cox.net> wrote:

> <snip>
>
> Given all the possible forms of partial failure that can occur in a
> distributed system, you can't rely on detached functionality, such as
> leases, as the "only" way to know that something is working on the other
> end.
>

Indeed, options are ultimately limited by the fact that one cannot tell the
difference between genuine machine failure and slowness due to excessive
load or packet loss or network breakage (there is a proof for this; I
think it's due to Lynch, but...).

One often tackles this sort of problem with a Failure Detector (
http://www.cs.cornell.edu/home/sam/FDpapers.html). Leases are somewhat
related in that they help form a view that something is wrong; what they
don't (and can't) tell you is _what_ is wrong. They essentially rely on a
form of active ping (the extension of the lease) to detect failure. Most
importantly the Lease forms a contract between client and server such that
_both_ can make an independent assumption about failure/loss after a period
of time.
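
For illustration only (none of this is River API), a minimal
heartbeat-style detector might look like the sketch below; note that
"suspected" deliberately conflates dead, slow and partitioned, exactly as
described above.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class HeartbeatFailureDetector {
        private final Map<String, Long> lastSeen =
                new ConcurrentHashMap<String, Long>();
        private final long timeoutMillis;

        HeartbeatFailureDetector(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        // Call whenever any message (or lease renewal) arrives from a peer.
        void heartbeat(String peerId) {
            lastSeen.put(peerId, Long.valueOf(System.currentTimeMillis()));
        }

        // "Suspected", not "dead": a crash is indistinguishable from
        // slowness or a partition, so this only says something seems wrong.
        boolean isSuspected(String peerId) {
            Long t = lastSeen.get(peerId);
            return t == null
                    || System.currentTimeMillis() - t.longValue() > timeoutMillis;
        }
    }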

When one detects a failure, one can attempt to diagnose more accurately
what is broken but it's tricky. Let's say we want to connect to a server
using a TCP-based protocol. When connecting we can fail for several reasons
including packet loss, excessive server load or simply because the
connection queue is too big. Deducing which of those is the culprit is much
more a debugging exercise than something one attempts to deal with in the
system code.

To summarise:

(1) Build a model that can eventually deduce there has been a failure of
some sort.
(2) Build a recovery model that, given a failure, can restore whatever
state is required and continue to make progress.

In many cases, one can solve much of the problem by pushing state over to
the client (e.g. recent innovations in browsers), leaving the server
stateless, but there are applications where that isn't viable. In those
cases, it gets harder, requiring disk replication and the like.

Certainly there is a generic model for notify as provided in the spec and
implemented in JavaSpaces, LUS etc. That may be useful. In the particular
case of this problem:

"At present moment, under Windows XP for both client and server, the
ConnectException takes exactly* 21 seconds* to be thrown. Do you know the
reason for this value?"

The implication is that client and server already have an established
connection from the perspective of TCP. I would suspect that the settings
for the Windows network stack apply various timeouts that end up giving a
total of 21 seconds before efforts to interact are terminated. (For what
it's worth, 21 seconds matches the classic Windows XP TCP connect
schedule: an unanswered SYN is retransmitted after 3 seconds and again 6
seconds later, and the attempt is abandoned 12 seconds after that, so
3 + 6 + 12 = 21.)

Also:

"System.setProperty("sun.rmi.transport.tcp.responseTimeout","2000");
System.setProperty("sun.rmi.transport.tcp.handshakeTimeout","2000");
System.setProperty("sun.rmi.transport.tcp.readTimeout","2000");
System.setProperty("sun.rmi.transport.connectionTimeout","2000");
System.setProperty("sun.rmi.transport.proxy.connectTimeout ","2000");
"

Have you configured a JRMP (i.e. native JDK RMI) transport or are you using
JERI? I can't recall just how many of those settings are honoured by JERI
so they may be having no effect at all.
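
If it is JERI, the supported way to bound connection time is an
invocation constraint on the proxy rather than system properties. A rough
sketch, assuming the listener proxy implements RemoteMethodControl (JERI
proxies do); note that ConnectionRelativeTime bounds only connection
establishment, not a call already waiting on a response:

    import net.jini.constraint.BasicMethodConstraints;
    import net.jini.core.constraint.ConnectionRelativeTime;
    import net.jini.core.constraint.InvocationConstraints;
    import net.jini.core.constraint.MethodConstraints;
    import net.jini.core.constraint.RemoteMethodControl;
    import net.jini.core.event.RemoteEventListener;

    class ListenerConstraints {
        // Returns a copy of the listener proxy whose calls must establish
        // a connection within two seconds.
        static RemoteEventListener withConnectTimeout(RemoteEventListener l) {
            MethodConstraints timeouts = new BasicMethodConstraints(
                    new InvocationConstraints(
                            new ConnectionRelativeTime(2000), null));
            // setConstraints returns a new proxy with the constraints attached.
            return (RemoteEventListener)
                    ((RemoteMethodControl) l).setConstraints(timeouts);
        }
    }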

Cheers,

Dan.

<snip>



RE: Client timeouts and remote calls

Posted by Christopher Dolan <ch...@avid.com>.
Very true, well said.

-----Original Message-----
From: Gregg Wonderly [mailto:gergg@cox.net] 
Sent: Thursday, June 14, 2012 9:37 AM
To: user@river.apache.org
Subject: Re: Client timeouts and remote calls

<snip>


Re: Client timeouts and remote calls

Posted by Gregg Wonderly <ge...@cox.net>.
If you use a smart proxy, put the lease renewal call inside the smart proxy, and register a listener, you can see the renewal fail.  But you still have to know what that means based on how the service and the lease interact.  To get a legitimate, two-way liveness test, you really have to have a conversation with the server, from the client, and have a view of the endpoint activities on the server.

There are lots of ways to engineer this, and both leasing and transactions can be part of the solution.  But in the end, you must decide what you need to know, and then think through what you are expecting vs what is actually achievable using the facilities you can deploy.

Most of the time, the true test is merely to be able to use the endpoint(s) end to end by making a call from the client to the service for that liveness test.
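
As an illustrative sketch of that end-to-end test (the interface and
names here are made up, not part of River):

    import java.rmi.Remote;
    import java.rmi.RemoteException;

    // The liveness test is a real remote call over the same endpoint the
    // service normally uses.
    interface Pingable extends Remote {
        void ping() throws RemoteException;
    }

    class LivenessCheck {
        // Any RemoteException just means "not usable right now"; it cannot
        // say whether the service crashed, is slow, or the network broke.
        static boolean isUsable(Pingable service) {
            try {
                service.ping();
                return true;
            } catch (RemoteException e) {
                return false;
            }
        }
    }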

Given all the possible forms of partial failure that can occur in a distributed system, you can't rely on detached functionality, such as leases, as the "only" way to know that something is working on the other end.

Gregg Wonderly

On Jun 14, 2012, at 7:37 AM, Christopher Dolan wrote:

> Very true about the possible client/server asymmetry of leases. However, I've found that in most cases, the server that hosts the lease manager is the same one that hosts the events. So if it crashes and the client tries to renew the lease, you find out via the lease renewal failure.
> 
> Chris
> 
> <snip>


RE: Client timeouts and remote calls

Posted by Christopher Dolan <ch...@avid.com>.
Very true about the possible client/server asymmetry of leases. However, I've found that in most cases, the server that hosts the lease manager is the same one that hosts the events. So if it crashes and the client tries to renew the lease, you find out via the lease renewal failure.
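
A minimal client-side sketch of that pattern using the net.jini.lease
helpers ('resubscribe' is a hypothetical stand-in for whatever
re-registration logic applies):

    import net.jini.core.lease.Lease;
    import net.jini.lease.LeaseListener;
    import net.jini.lease.LeaseRenewalEvent;
    import net.jini.lease.LeaseRenewalManager;

    class RenewalWatcher {
        private final LeaseRenewalManager lrm = new LeaseRenewalManager();

        // 'eventLease' is the lease returned with the event registration.
        void watch(Lease eventLease) {
            lrm.renewUntil(eventLease, Lease.FOREVER, new LeaseListener() {
                public void notify(LeaseRenewalEvent e) {
                    // Renewal failed: the server is likely down or
                    // unreachable, so re-discover and register again.
                    resubscribe();
                }
            });
        }

        private void resubscribe() { /* hypothetical re-registration logic */ }
    }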

Chris

-----Original Message-----
From: Greg Trasuk [mailto:trasukg@stratuscom.com] 
Sent: Thursday, June 14, 2012 3:21 AM
To: user@river.apache.org
Subject: RE: Client timeouts and remote calls


<snip>


RE: Client timeouts and remote calls

Posted by Greg Trasuk <tr...@stratuscom.com>.
Look carefully at what Gregg said:
> ...If you turn on DGC, or use a Lease on the client received endpoint,
> then you might be able to know that a client is actually gone, rather
> than just temporarily unreachable.

And then look at what Chris said:
> ...Then the client also knows if the server dies and can re-subscribe.

See the difference?  Gregg is talking about the server knowing the
client is gone, and Chris is talking about the client knowing the server
is gone.

A common misconception about Jini Leases is that they let the client
know when the service dies.  Not true, unfortunately.  Leases are there
so that the _Service_ can clean up any state that's held on behalf of
the client, when the client fails to renew the lease.

The fact that you can renew a lease in no way guarantees that the
service is working.  The service could very well have handed off the
lease to some other landlord implementation.  Even if the lease is
handled by the service itself, it's not necessarily linked to the actual
event delivery (or whatever). In any case, the lease duration is
probably much longer than the time boundary you'd like to have on
detecting a failure.

Unfortunately, the only way the client knows if event delivery has
failed is if events stop showing up. In the case of a server that is
just holding data for you (like a lease on a transaction or a JavaSpace
entry), you only know the server has failed when you go to call it and
can't reach it.  Leases tell you nothing from the client end.  Look at
it this way:  Your apartment might be leased for another six months, but
that doesn't rule out the possibility that it burns down.

Other than that, I really like Chris's argument for a three-thread
executor for event delivery.

Cheers,

Greg.

On Wed, 2012-06-13 at 22:31, Christopher Dolan wrote:
> <snip>


RE: Client timeouts and remote calls

Posted by Christopher Dolan <ch...@avid.com>.
I agree about using a Lease. Then the client also knows if the server dies and can re-subscribe.

If the latency of the timeout is a concern, then my solution has been to use a 3-thread executor to send the updates. I chose 3 as a good number because of the philosophy of "once is an event, twice is a coincidence and thrice is conspiracy." That is, with 1 thread you should expect to be blocked; with 2 threads you should rarely be blocked, but you could be blocked by bad luck; with 3 threads you'll only be blocked if there's a noteworthy outage, which probably has a cause outside of your control, so more threads won't help.
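
A rough server-side sketch of that scheme (class and field names are
illustrative; a CopyOnWriteArrayList makes removal during delivery safe):

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import net.jini.core.event.RemoteEvent;
    import net.jini.core.event.RemoteEventListener;

    class EventFanout {
        private final List<RemoteEventListener> listeners =
                new CopyOnWriteArrayList<RemoteEventListener>();
        private final ExecutorService notifier = Executors.newFixedThreadPool(3);

        void fireEvent(final RemoteEvent ev) {
            for (final RemoteEventListener l : listeners) {
                notifier.submit(new Runnable() {
                    public void run() {
                        try {
                            l.notify(ev);
                        } catch (Exception e) {
                            // A dead client ties up at most one pool thread
                            // for the duration of its timeout.
                            listeners.remove(l);
                        }
                    }
                });
            }
        }
    }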

Chris

________________________________________
From: Gregg Wonderly [gergg@cox.net]
Sent: Wednesday, June 13, 2012 3:53 PM
To: user@river.apache.org
Subject: Re: Client timeouts and remote calls

<snip>


Re: Client timeouts and remote calls

Posted by Gregg Wonderly <ge...@cox.net>.
There are timeouts that you can change in your Configuration to control how long the waits occur.  If it's important that everyone agree on the values being changed, you could include the use of a transaction so that if one client dies in the middle, then everyone can revert, and you can retry to get things to a sane state.

This is important if the data the clients receive controls how they interact with the service.  But you can otherwise just do what you are doing, without a transaction.  If you turn on DGC, or use a Lease on the client-received endpoint, then you might be able to know that a client is actually gone, rather than just temporarily unreachable.

Gregg

On Jun 13, 2012, at 2:13 PM, Sergio Aguilera Cazorla wrote:

> <snip>