Posted to users@qpid.apache.org by Daniel Pocock <da...@pocock.pro> on 2017/04/03 09:32:36 UTC

Re: best practices for qpid-proton error handling and transport reconnection?


On 30/03/17 19:03, Alan Conway wrote:
> On Thu, 2017-03-30 at 11:46 +0200, Daniel Pocock wrote:
>> I built a C++ receiver based on the examples.
>>
>> I notice that from time to time my process was stopping with
>> exceptions
>> about the transport going away.  I was able to reproduce the error by
>> manually restarting the broker, although it is not clear why the
>> connections were closing at random, the rabbitmq server log just says
>> it
>> closed.  Is any additional code needed in the example or the broker
>> config to enable heartbeats or other mechanisms to ensure long
>> running
>> connections stay up?
>>
>> I tried adding an implementation of
>> on_receiver_close(proton::receiver
>> &r) and I found it was not being invoked at all.
>>
>> Then I added an implementation of
>> on_transport_error(proton::transport
>> &t) and I found it was being called.  I tried various ways of
>> bringing
>> up the connection again, eventually this appeared to work:
>>
>> void
>> my_handler::on_transport_error(proton::transport &t)
>> {
>>    logger.warn("transport closed unexpectedly, trying to re-establish
>> connection");
>>    t.connection().container().open_receiver(mUrl);
>> }
>>
> 
> Pretty much, but see other discussions on the list on the subject of
> reconnect.
> 
>> Is that the only thing that is necessary to ensure reliable operation
>> for long-running processes?  Is any additional effort needed in
>> coding
>> either a sender or receiver to ensure messages are not lost or
>> delivered
>> twice on reconnection?
>>
>> I also noticed that some of the examples wrap the
>> proton::default_container::run() method in a try/catch block but
>> apart
>> from logging an error, they don't make any attempt to restart the
>> container.  Can anybody provide a more comprehensive example showing
>> everything that should be done to try and keep the container running?
> 
> My preferred coding style would be to never throw exceptions out of a
> handler unless you want to stop the process. You could catch the
> exception out of run() and recover/restart but at that point you have
> lost most of the context for the problem so IMO it would be difficult
> and error-prone. I wouldn't do it.
> 

I agree it is best to do this in the handlers.

Many of the examples don't override most of the handlers.  In some cases
the default handlers throw exceptions that propagate to the code that
invoked run().

Is it possible to specify exactly which of the handler methods must be
overridden to ensure all foreseeable error conditions are handled within
an application?


> If you are writing a client, then stopping the process on an exception
> from run() is perfectly reasonable. If you are a server then you should
> *always* respond to an error with an AMQP protocol response - set an
> error condition and close link/connection/transport or whatever. Once
> you've done that the problem is resolved and there's no reason to throw
> out of run() (unless there's some server-fatal condition)
> 
>> I also assume it is good practice to wrap everything in the
>> on_message
>> method in a try/catch block in case an individual message contains
>> unexpected content.  Are there any other places where try/catch
>> blocks
>> are needed with the proton API to avoid applications bailing out
>> unexpectedly?
> 
> :) As above: on a server I would use try/catch or nothrow() in all
> handler functions, on a client I would probably let unexpected
> exceptions thru and exit with error message in the catch block for
> run()

In this case the client is actually a daemon (the reSIProcate
registration-agent[1]) and the AMQP queue is being used to send commands
to it.  Eventually we would also like to have the SIP proxy accepting
commands through a queue; it already has support for commands submitted
through a socket (repro.config / CommandBindAddress[2]).  While these
processes are clients in the AMQP model, they are meant to be long
running and stateful processes and so I need to have a comprehensive
approach to ensure they don't get brought down by an exception.
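
As a rough illustration of the on_message guarding discussed above, here
is a minimal sketch that does not depend on Proton at all.  The names
outcome, execute_command and handle_message are hypothetical and exist
only for this example; in a real handler the two outcomes would
presumably map to settling the delivery (accepting or rejecting it)
rather than returning an enum.  The point is only that a malformed
command is turned into a rejection and never escapes as an exception.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical result type for this sketch; not part of the Proton API.
enum class outcome { accepted, rejected };

// Hypothetical command executor for the daemon. It throws on anything it
// does not recognise, the way real parsing code might.
void execute_command(const std::string& body) {
    if (body == "refresh" || body == "reload") {
        return;  // pretend the command was carried out
    }
    throw std::invalid_argument("unknown command: " + body);
}

// What the body of an on_message-style callback could look like: every
// exception is caught and converted into a "rejected" outcome, so nothing
// propagates up to the code that called run().
outcome handle_message(const std::string& body) noexcept {
    try {
        execute_command(body);
        return outcome::accepted;
    } catch (const std::exception& e) {
        std::cerr << "rejecting message: " << e.what() << '\n';
    } catch (...) {
        std::cerr << "rejecting message: unknown exception\n";
    }
    return outcome::rejected;
}
```

Calling handle_message("reload") yields outcome::accepted, while an
unrecognised body such as "shutdown-everything" is logged and yields
outcome::rejected without unwinding the caller.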

Regards,

Daniel

1. https://github.com/resiprocate/registration-agent
2. https://github.com/resiprocate/resiprocate/blob/master/repro/repro.config#L269

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org



Posted by Alan Conway <ac...@redhat.com>.
On Mon, 2017-04-03 at 11:32 +0200, Daniel Pocock wrote:
> 
> On 30/03/17 19:03, Alan Conway wrote:
> > 
> > My preferred coding style would be to never throw exceptions out of
> > a
> > handler unless you want to stop the process. You could catch the
> > exception out of run() and recover/restart but at that point you
> > have
> > lost most of the context for the problem so IMO it would be
> > difficult
> > and error-prone. I wouldn't do it.
> > 
> 
> I agree it is best to do this in the handlers.
> 
> Many of the examples don't override most of the handlers.  In some
> cases
> the default handlers throw exceptions that propagate to the code that
> invoked run()
> 
> Is it possible to specify exactly which of the handler methods must
> be
> overridden to ensure all foreseeable error conditions are handled
> within
> an application?
> 

Excellent point. At the very least that should be clearly documented.

Since you can't compile the docs, you can write a boilerplate base
handler that implements every on_foo() function as:

    void on_foo(...) {
        try { proton::messaging_handler::on_foo(...); }
        catch (const std::exception& e) { default_error_handling(e); }
    }

Annoying, but a once-off task. You can probably template it in C++>=11,
if you dare.
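
For what it's worth, one way the templated version might look is
sketched below.  To keep it compilable without Proton installed, it uses
a stand-in base_handler with only two callbacks that take strings; the
real proton::messaging_handler has many more on_*() functions with
different argument types, so treat base_handler, guarded_handler, guard
and errors_seen as hypothetical names for illustration only.  The
default_error_handling hook is the one from the boilerplate above.

```cpp
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-in for the library's handler base class (an assumption made so the
// sketch is self-contained). Its defaults throw, like the defaults
// described in this thread.
struct base_handler {
    virtual ~base_handler() = default;
    virtual void on_message(const std::string& body) {
        if (body.empty()) throw std::runtime_error("empty message body");
    }
    virtual void on_transport_error(const std::string& what) {
        throw std::runtime_error("transport error: " + what);
    }
};

// Boilerplate base: every callback forwards to the default implementation
// inside a single try/catch helper, so no exception leaves a handler.
class guarded_handler : public base_handler {
public:
    int errors_seen = 0;  // counter kept only so the sketch is observable

    void on_message(const std::string& body) override {
        guard([&] { base_handler::on_message(body); });
    }
    void on_transport_error(const std::string& what) override {
        guard([&] { base_handler::on_transport_error(what); });
    }

protected:
    // Applications would override this to log, reconnect, close a link, etc.
    virtual void default_error_handling(const std::exception& e) {
        ++errors_seen;
        std::cerr << "handler error: " << e.what() << '\n';
    }

private:
    // The templated part: run any callable, route exceptions to the hook.
    template <class F>
    void guard(F&& f) {
        try {
            std::forward<F>(f)();
        } catch (const std::exception& e) {
            default_error_handling(e);
        }
    }
};
```

An application handler would then derive from guarded_handler instead of
the library base class and override only default_error_handling plus the
callbacks it actually cares about.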

astitcher: we could extend the C++ API to expose a per-thread run loop
like the C proactor. Max flexibility and avoids the boilerplate, but
exposes more mechanics and provides more options for user to shoot self
in foot. Just a thought.

> In this case the client is actually a daemon (the reSIProcate
> registration-agent[1]) and the AMQP queue is being used to send
> commands
> to it.  Eventually we would also like to have the SIP proxy accepting
> commands through a queue, it already has support for commands
> submitted
> through a socket (repro.config / CommandBindAddress[2]).  While these
> processes are clients in the AMQP model, they are meant to be long
> running and stateful processes and so I need to have a comprehensive
> approach to ensure they don't get brought down by an exception.

+1, when I said "client" I was thinking of a short-lived process that
can just bail on error - anything else needs the kind of exception
handling you describe.


