Posted to dev@qpid.apache.org by Tobias Duckworth <to...@duckworth.uk.com> on 2016/10/05 08:44:20 UTC

Re: Azure Service Bus receiver timing out with electron go binding

Hi, 

I'm experiencing the exact same problem using the connection_engine variant
of qpid-proton (version 0.12.2) - The non connection_engine build does not
exhibit the problem.

I'd like to fix this, could you explain what's involved please?

If I set the connection_option idle_timeout
(proton::connection_options().idle_timeout(proton::duration::FOREVER)), I
can see it come out in the output when I have PN_TRACE_FRM=1 :

[0x2b3904003f30]:0 -> @open(16)
[container-id="1104db2d-87ac-492f-89d2-0803dee95435",
hostname="TWDIoTHub.azure-devices.net:5671", channel-max=32767,
idle-time-out=2147483647]
[0x2b3904003f30]:  <- AMQP
[0x2b3904003f30]:0 <- @open(16)
[container-id="DeviceGateway_d56c48665111490aae2f99f08862d6b6",
hostname="10.0.4.51", max-frame-size=65536, channel-max=8191,
idle-time-out=240000]

And if I trace out the transport idle timeouts I get:

idle_timeout = 4294967295
remote_idle_timeout = 240000

Which seems to correlate.

However, after 315 seconds I consistently get the following:

 [0x2b3904006300]:0 <- @close(24) [error=@error(29)
[condition=:"amqp:connection:forced", description="The connection was
inactive for more than the allowed period of time."]]

Seems like I need to set the remote_idle_timeout to never time out, but how
is this possible?

Thanks,
Toby



--
View this message in context: http://qpid.2158936.n2.nabble.com/Azure-Service-Bus-receiver-timing-out-with-electron-go-binding-tp7651057p7651523.html
Sent from the Apache Qpid developers mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Tobias Duckworth <to...@duckworth.uk.com>.
I also have a Wireshark trace of the problem if required.





--
View this message in context: http://qpid.2158936.n2.nabble.com/Azure-Service-Bus-receiver-timing-out-with-electron-go-binding-tp7651057p7651525.html
Sent from the Apache Qpid developers mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Alan Conway <ac...@redhat.com>.
On Fri, 2016-10-07 at 03:13 -0700, Tobias Duckworth wrote:
> Thank you very much for the detailed explanation.
> 
> I managed to get this going quite simply:
> 
> 1) Added a pure virtual method on connection_engine,
> connection_engine::tick().
> 
> 2) Added a public function pn_timestamp_t
> connection_engine::tick_check(pn_timestamp_t now)
> 
> 3) Added some logic to connection_engine::dispatch that checks
> pn_event_type
> for whether it's a PN_TRANSPORT event, and if so sets a boolean to
> call the
> connection_engine::tick() method.
> 
> 4) Implemented the connection_engine::tick() method in my derived
> class,
> which gets the current time in milliseconds, then calls
> connection_engine::tick_check(now). If the returned value is greater
> than
> zero a timer is setup for the number of milliseconds returned, which
> on
> expiry calls the tick() method in the derived class.
> 
> With these four steps the underlying transport sends heartbeats at
> half of
> the remote_idle_timeout specified in the response to the connect
> packet.
> 
> Thanks again,
> Toby
> 

This will work *provided* you call dispatch() often enough. If your
application goes quiet for a period of time then you can miss
heartbeats unless you have some external timer set to poke the engine
periodically. If you do have that (e.g. a poll timeout or something)
then you should be set.
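
(As a concrete example of the "poll timeout" idea, here is a minimal C
sketch; the deadline is the value returned by pn_transport_tick() as
discussed elsewhere in this thread, and everything else is illustrative:)

#include <poll.h>
#include <proton/transport.h>
#include <proton/types.h>

/* Turn the deadline returned by the last pn_transport_tick() call into a
   poll() timeout in milliseconds (-1 means block indefinitely). */
static int poll_timeout(pn_timestamp_t deadline, pn_timestamp_t now) {
    if (deadline == 0) return -1;      /* no tick scheduled */
    if (deadline <= now) return 0;     /* the tick is already due */
    return (int)(deadline - now);
}

/* Schematic IO loop for a single connection:
     struct pollfd pfd = { .fd = sock, .events = POLLIN | POLLOUT };
     poll(&pfd, 1, poll_timeout(deadline, now_ms()));
     ...feed bytes between the socket and the engine, dispatch events, and
     call pn_transport_tick() again whenever the timeout expires or a
     PN_TRANSPORT event is seen, remembering its return value as the new
     deadline. */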

That's why I'm still not sure I want to put this logic directly into
the engine - setting up the external timer is platform-specific work
outside of the engine's remit. Also the system call to get the current
time is costly if done too frequently, so I'd rather drive ticks from a
timer than check the time on every IO event (it is workable but not
ideal.)

Just FYI: I'm working on an example driver to show using the connection
engine with libuv, it's very raw and doesn't have ticks yet (it will
soon) but if you're curious:

https://github.com/alanconway/qpid-proton/tree/libuv-driver

I think your work is probably ahead of this in features, but it might
be interesting. The goal is to organize a C framework so multi-threaded 
C applications (including the Proton C++ binding and Qpid Dispatch
router) can swap out the IO platform without big modifications.

> 
> 
> 
> --
> View this message in context: http://qpid.2158936.n2.nabble.com/Azure
> -Service-Bus-receiver-timing-out-with-electron-go-binding-
> tp7651057p7651625.html
> Sent from the Apache Qpid developers mailing list archive at
> Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> For additional commands, e-mail: dev-help@qpid.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Tobias Duckworth <to...@duckworth.uk.com>.
Thank you very much for the detailed explanation.

I managed to get this going quite simply:

1) Added a pure virtual method on connection_engine,
connection_engine::tick().

2) Added a public function pn_timestamp_t
connection_engine::tick_check(pn_timestamp_t now)

3) Added some logic to connection_engine::dispatch that checks pn_event_type
for whether it's a PN_TRANSPORT event, and if so sets a boolean to call the
connection_engine::tick() method.

4) Implemented the connection_engine::tick() method in my derived class,
which gets the current time in milliseconds, then calls
connection_engine::tick_check(now). If the returned value is greater than
zero a timer is setup for the number of milliseconds returned, which on
expiry calls the tick() method in the derived class.

With these four steps the underlying transport sends heartbeats at half of
the remote_idle_timeout specified in the response to the connect packet.
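
(For reference, a rough C-level sketch of steps 2 and 4 above, assuming
the C connection-engine calls mentioned elsewhere in this thread; the
timer itself is whatever your platform provides:)

#include <proton/connection_engine.h> /* header name may differ across proton versions */
#include <proton/transport.h>

/* Step 2: ask the transport for its next tick deadline, given "now" in ms. */
pn_timestamp_t tick_check(pn_connection_engine_t *eng, pn_timestamp_t now) {
    return pn_transport_tick(pn_connection_engine_transport(eng), now);
}

/* Step 4 (schematic): if tick_check() returns a non-zero deadline, arm a
   one-shot timer for (deadline - now) milliseconds; when it fires, call
   tick_check() again with the new current time and re-arm as needed. */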

Thanks again,
Toby




--
View this message in context: http://qpid.2158936.n2.nabble.com/Azure-Service-Bus-receiver-timing-out-with-electron-go-binding-tp7651057p7651625.html
Sent from the Apache Qpid developers mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Alan Conway <ac...@redhat.com>.
On Thu, 2016-10-06 at 09:15 -0700, Tobias Duckworth wrote:
> Thanks for your response - I started another thread on the 'Users'
> list
> (http://qpid.2158936.n2.nabble.com/qpid-proton-0-12-2-connection-engi
> ne-remote-idle-timeout-td7651556.html)
> 
> What's not clear to me from your response is why it works on a non
> connection_engine build, but not on a connection_engine build. As a
> result
> of this I'm kind of expecting to find the calls I need to make within
> the
> CPP bindings, but have not found them so far.
> 
> I'll have a go at doing as you suggest and translating the Go code
> model
> into C, but in the meantime could you satisfy my curiosity by
> explaining why
> it works for non connection_engine builds, but not for
> connection_engine?

With pleasure - you may regret asking :)

connection_engine isolates the "pure AMQP" codec and protocol state.
This is "original proton" with a bytes-in/bytes-out API. Very embeddable,
but not so simple to use, because the user must provide IO, polling and
timers.

The deprecated pn_messenger_t and current pn_reactor_t both take care
of this for you. Ticking the transports is part of their poll loop. Our
C++, Python and Ruby APIs use the reactor internally. Go doesn't, so it
has to tick for itself.

The big problem with reactor and messenger is hard-coded assumptions of
a single thread, posix-style sockets, cyrus SASL and openSSL. Those
assumptions immediately broke for Windows and we had to re-write chunks
of the reactor for schannel and IOCP. They broke again for Go which has
its own concurrent connection management, TLS stack, and is definitely
not single threaded.

So the connection engine was born to allow concurrent processing of
separate connections and a clean separation of protocol engine from the
IO/threading environment that drives it. It is intended to be of use
for embedding/integration work and longer term to replace the reactor
in language bindings that can support concurrency and use language-
native types and libraries for connection management.

For C and C++ I am working on a multi-threaded driver framework. The
goal is to be able to build the same (multi-threaded) application
against different drivers for different platforms: libuv, posix and
iocp are the initial targets.

Meanwhile, here's the complete set of things a driver needs to provide;
the good news is you've already found most of them :)

Essentials
- initiating outbound connections
- accepting remote connections
- polling to feed bytes between IO and connection_engine
- managing timers for transport ticks

Security
- configuring "external" encryption and authentication

User tools:
- letting user code schedule timed events using the driver
- letting multi-threaded user code communicate between connection contexts


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Tobias Duckworth <to...@duckworth.uk.com>.
Thanks for your response - I started another thread on the 'Users' list
(http://qpid.2158936.n2.nabble.com/qpid-proton-0-12-2-connection-engine-remote-idle-timeout-td7651556.html)

What's not clear to me from your response is why it works on a non
connection_engine build, but not on a connection_engine build. As a result
of this I'm kind of expecting to find the calls I need to make within the
CPP bindings, but have not found them so far.

I'll have a go at doing as you suggest and translating the Go code model
into C, but in the meantime could you satisfy my curiosity by explaining why
it works for non connection_engine builds, but not for connection_engine?

Many thanks,
Toby




--
View this message in context: http://qpid.2158936.n2.nabble.com/Azure-Service-Bus-receiver-timing-out-with-electron-go-binding-tp7651057p7651581.html
Sent from the Apache Qpid developers mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Alan Conway <ac...@redhat.com>.
On Wed, 2016-10-05 at 04:46 -0700, Tobias Duckworth wrote:
> Thanks Robbie,
> 
> Didn't realise I am on the wrong list, sorry about that.
> 
> OK, so comparing what I see in a non connection_engine build to a
> connection_engine one.
> 
> From the Wireshark traces of both I can see that in the
> connection_engine
> build the Azure end is sending empty (presumably heartbeat) frames
> every 70
> seconds, but the qpid-proton end is never sending any (although it
> does
> respond to the Azure ones).
> 
> On the non connection_engine build both ends are sending periodic
> empty
> frames - The qpid-proton ones are every 120 seconds, which is half
> the
> idle_timeout specified in the connect packets coming from Azure.
> 
> So it looks like I need to convince the connection_engine to send out
> heartbeat frames and it will be fixed.

Correct. I just fixed this in the Go binding: you need to call
pn_transport_tick(pn_connection_engine_transport(eng)) periodically and
it will take care of heartbeats for you.

I am working on a C "driver" framework with a C example (using the
libuv library) to show everything that needs to be done in a complete
proton driver.

Meanwhile the Go code is a model you can translate to C. Basically you
need to call pn_transport_tick when there is a PN_TRANSPORT event (the
first one starts ticking, and many announce changes to the tick
schedule), and ticks also chain - each call to tick returns the time at
which it should be called again.

Here's an annotated explanation; notes are marked with XX:

// Let proton run timed activity and set up the next tick
func (eng *Engine) tick() {
	now := time.Now() // XX Gets current time.
        // XX pn_timestamp_t next = pn_transport_tick(pn_connection_engine_transport(eng), now);
	next := eng.Transport().Tick(now)
        // XX if (next != 0) {
	if !next.IsZero() {
                // Set some timer mechanism to fire at time `next`,
                // In Go we subtract now from next to get a relative time.
		eng.timer.Reset(next.Sub(now))
	}
}

func (eng *Engine) dispatch() bool {
	var needTick bool // Set if we need to tick the transport.
	for {
		cevent := C.pn_connection_engine_dispatch(&eng.engine)
		if cevent == nil {
			break
		}
		event := makeEvent(cevent, eng)
		if event.Type() == ETransport {
			needTick = true
		}
		for _, h := range eng.handlers {
			h.HandleEvent(event)
		}
	}
	if needTick {
		eng.tick()
	}
	return !bool(C.pn_connection_engine_finished(&eng.engine))
}

The Go IO loop monitors the timer as well as read/write IO and calls
dispatch() if it fires.

Should be easy to translate into C, but you will need to add some
scheduled wakeup to your poller or whatever you are using for IO.
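
(As a rough C transliteration of the two Go functions above: the
pn_connection_engine_* and pn_transport_tick calls are the ones used in
the Go code, while driver_t, arm_timer and now_ms are illustrative
stand-ins for your own IO and timer plumbing:)

#include <stdbool.h>
#include <stddef.h>
#include <proton/connection_engine.h> /* header name may differ across proton versions */
#include <proton/event.h>
#include <proton/transport.h>

/* Illustrative driver state: the engine plus a platform timer hook. */
typedef struct driver_t {
    pn_connection_engine_t engine;
    void (*arm_timer)(struct driver_t *d, pn_timestamp_t delay_ms);
} driver_t;

static pn_timestamp_t now_ms(void); /* e.g. clock_gettime() converted to ms */

/* Let proton run timed activity and schedule the next tick. */
static void driver_tick(driver_t *d) {
    pn_timestamp_t now = now_ms();
    pn_timestamp_t next =
        pn_transport_tick(pn_connection_engine_transport(&d->engine), now);
    if (next != 0) {
        d->arm_timer(d, next - now); /* call driver_tick() again at the deadline */
    }
}

/* Returns false when the engine is finished. */
static bool driver_dispatch(driver_t *d) {
    bool need_tick = false;
    pn_event_t *e;
    while ((e = pn_connection_engine_dispatch(&d->engine)) != NULL) {
        if (pn_event_type(e) == PN_TRANSPORT) need_tick = true;
        /* ...hand the event to your handlers here... */
    }
    if (need_tick) driver_tick(d);
    return !pn_connection_engine_finished(&d->engine);
}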


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Tobias Duckworth <to...@duckworth.uk.com>.
Thanks Robbie,

Didn't realise I am on the wrong list, sorry about that.

OK, so comparing what I see in a non connection_engine build to a
connection_engine one.

From the Wireshark traces of both I can see that in the connection_engine
build the Azure end is sending empty (presumably heartbeat) frames every 70
seconds, but the qpid-proton end is never sending any (although it does
respond to the Azure ones).

On the non connection_engine build both ends are sending periodic empty
frames - The qpid-proton ones are every 120 seconds, which is half the
idle_timeout specified in the connect packets coming from Azure.

So it looks like I need to convince the connection_engine to send out
heartbeat frames and it will be fixed.

Toby 



--
View this message in context: http://qpid.2158936.n2.nabble.com/Azure-Service-Bus-receiver-timing-out-with-electron-go-binding-tp7651057p7651534.html
Sent from the Apache Qpid developers mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Robbie Gemmell <ro...@gmail.com>.
Realised I missed covering a bit I meant to...

"Seems like i need to set the remote_idle_timeout to never time out,
but how is this possible?"

It isn't. In this scenario, ServiceBus has decided what its
requirements for idle behaviour are, and advertised a value in its
Open frame (the remote idle timeout) to the client to prompt
appropriate behaviour to satisfy them. If the client doesn't satisfy
its needs, ServiceBus then closes the connection. When you set the
idle-timeout configuration locally, that's basically the reverse case:
the client telling ServiceBus what the client's needs are in its Open
frame, and then if ServiceBus doesn't satisfy them the client end
will close the connection. In both cases, the client side mechanics
are handled via the transport 'tick' process I mentioned earlier.
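
(To make the two directions concrete, a small sketch against the C
transport API; pn_transport_set_idle_timeout and
pn_transport_get_remote_idle_timeout are standard proton calls, the
wrapper functions are illustrative only:)

#include <stdio.h>
#include <proton/transport.h>

/* Our requirement: the value advertised in our Open frame. If the peer
   stays silent longer than this, our next pn_transport_tick() call closes
   the connection from our side. */
void advertise_local_idle_timeout(pn_transport_t *t, pn_millis_t ms) {
    pn_transport_set_idle_timeout(t, ms);
}

/* The peer's requirement: the value it advertised in its Open frame.
   pn_transport_tick() uses it to emit empty heartbeat frames often enough
   (roughly every half of this value) to keep the peer from closing us. */
void report_remote_idle_timeout(pn_transport_t *t) {
    printf("peer requires activity within %u ms\n",
           (unsigned)pn_transport_get_remote_idle_timeout(t));
}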

Robbie

On 5 October 2016 at 11:37, Robbie Gemmell <ro...@gmail.com> wrote:
> Setting the 'idle timeout' config will influence what is
> advertised/requested in the Open frames sent to the peer, but that in
> itself does not do the work needed to send heartbeat frames or enforce
> that they haven't been received. There is a 'tick' method on the
> transport that is responsible for doing that, and identifying the next
> point it needs to be called to do any necessary work. The 'non
> connection_engine' bits are likely doing that process for you. I don't
> know if/how this was exposed in the 0.12.2 connection_engine stuff,
> and it may have changed since, but that perhaps may give you some
> place to look before folks with more of a clue on these bits are
> around later.
>
> Questions like these would generally be better suited on the users@
> list, and certainly would be more visible there.
>
> Robbie
>
> On 5 October 2016 at 09:44, Tobias Duckworth <to...@duckworth.uk.com> wrote:
>> Hi,
>>
>> I'm experiencing the exact same problem using the connection_engine variant
>> of qpid-proton (version 0.12.2) - The non connection_engine build does not
>> exhibit the problem.
>>
>> I'd like to fix this, could you explain what's involved please?
>>
>> If I set the connection_option idle_timeout
>> (proton::connection_options().idle_timeout(proton::duration::FOREVER)), I
>> can see it come out in the output when I have PN_TRACE_FRM=1 :
>>
>> [0x2b3904003f30]:0 -> @open(16)
>> [container-id="1104db2d-87ac-492f-89d2-0803dee95435",
>> hostname="TWDIoTHub.azure-devices.net:5671", channel-max=32767,
>> idle-time-out=2147483647]
>> [0x2b3904003f30]:  <- AMQP
>> [0x2b3904003f30]:0 <- @open(16)
>> [container-id="DeviceGateway_d56c48665111490aae2f99f08862d6b6",
>> hostname="10.0.4.51", max-frame-size=65536, channel-max=8191,
>> idle-time-out=240000]
>>
>> And if I trace out the transport idle timeouts I get:
>>
>> idle_timeout = 4294967295
>> remote_idle_timeout = 240000
>>
>> Which seems to correlate.
>>
>> However, after 315 seconds I consistently get the following:
>>
>>  [0x2b3904006300]:0 <- @close(24) [error=@error(29)
>> [condition=:"amqp:connection:forced", description="The connection was
>> inactive for more than the allowed period of time."]]
>>
>> Seems like I need to set the remote_idle_timeout to never time out, but how
>> is this possible?
>>
>> Thanks,
>> Toby
>>
>>
>>
>> --
>> View this message in context: http://qpid.2158936.n2.nabble.com/Azure-Service-Bus-receiver-timing-out-with-electron-go-binding-tp7651057p7651523.html
>> Sent from the Apache Qpid developers mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
>> For additional commands, e-mail: dev-help@qpid.apache.org
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org


Re: Azure Service Bus receiver timing out with electron go binding

Posted by Robbie Gemmell <ro...@gmail.com>.
Setting the 'idle timeout' config will influence what is
advertised/requested in the Open frames sent to the peer, but that in
itself does not do the work needed to send heartbeat frames or enforce
that they haven't been received. There is a 'tick' method on the
transport that is responsible for doing that, and identifying the next
point it needs to be called to do any necessary work. The 'non
connection_engine' bits are likely doing that process for you. I don't
know if/how this was exposed in the 0.12.2 connection_engine stuff,
and it may have changed since, but that perhaps may give you some
place to look before folks with more of a clue on these bits are
around later.
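
(The 'tick' method on the transport is pn_transport_tick in the C API;
a minimal sketch of its contract, with the time source left abstract:)

#include <proton/transport.h>

/* Run the transport's timed work (sending heartbeats, enforcing the local
   idle timeout). 'now' is the current time in ms; the return value is the
   next time tick must be called again, or 0 if nothing is scheduled. */
pn_timestamp_t tick_once(pn_transport_t *t, pn_timestamp_t now) {
    return pn_transport_tick(t, now);
}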

Questions like these would generally be better suited on the users@
list, and certainly would be more visible there.

Robbie

On 5 October 2016 at 09:44, Tobias Duckworth <to...@duckworth.uk.com> wrote:
> Hi,
>
> I'm experiencing the exact same problem using the connection_engine variant
> of qpid-proton (version 0.12.2) - The non connection_engine build does not
> exhibit the problem.
>
> I'd like to fix this, could you explain what's involved please?
>
> If I set the connection_option idle_timeout
> (proton::connection_options().idle_timeout(proton::duration::FOREVER)), I
> can see it come out in the output when I have PN_TRACE_FRM=1 :
>
> [0x2b3904003f30]:0 -> @open(16)
> [container-id="1104db2d-87ac-492f-89d2-0803dee95435",
> hostname="TWDIoTHub.azure-devices.net:5671", channel-max=32767,
> idle-time-out=2147483647]
> [0x2b3904003f30]:  <- AMQP
> [0x2b3904003f30]:0 <- @open(16)
> [container-id="DeviceGateway_d56c48665111490aae2f99f08862d6b6",
> hostname="10.0.4.51", max-frame-size=65536, channel-max=8191,
> idle-time-out=240000]
>
> And if I trace out the transport idle timeouts I get:
>
> idle_timeout = 4294967295
> remote_idle_timeout = 240000
>
> Which seems to correlate.
>
> However, after 315 seconds I consistently get the following:
>
>  [0x2b3904006300]:0 <- @close(24) [error=@error(29)
> [condition=:"amqp:connection:forced", description="The connection was
> inactive for more than the allowed period of time."]]
>
> Seems like I need to set the remote_idle_timeout to never time out, but how
> is this possible?
>
> Thanks,
> Toby
>
>
>
> --
> View this message in context: http://qpid.2158936.n2.nabble.com/Azure-Service-Bus-receiver-timing-out-with-electron-go-binding-tp7651057p7651523.html
> Sent from the Apache Qpid developers mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> For additional commands, e-mail: dev-help@qpid.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org