Posted to proton@qpid.apache.org by Rafael Schloming <rh...@alum.mit.edu> on 2015/04/01 01:14:01 UTC

Re: proton-j reactor implementation?

On Tue, Mar 31, 2015 at 2:48 PM, Alan Conway <ac...@redhat.com> wrote:

> On Tue, 2015-03-31 at 11:26 -0400, Rafael Schloming wrote:
> > On Tue, Mar 31, 2015 at 11:02 AM, Alan Conway <ac...@redhat.com> wrote:
> >
> > > On Mon, 2015-03-30 at 00:11 +0100, Adrian Preston wrote:
> > > > Hello all,
> > > >
> > > > I've been following the development of the reactor API and think that it
> > > > looks really neat. Is anyone working on a pure Java version? I'd be
> > > > interested in helping.
> > > >
> > > > Regards
> > > > - Adrian
> > > > Unless stated otherwise above:
> > > > IBM United Kingdom Limited - Registered in England and Wales with number
> > > > 741598.
> > > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> > > > 3AU
> > > >
> > >
> > > I'm currently working on a Go version which is not directly relevant,
> > > but I am porting directly from the python handlers.py and it is pretty
> > > straightforward. That's where I would start. For Go I also had to wrap a
> > > bunch of lower-level proton details but the task should be easier for
> > > Java since all that stuff already exists in Java.
> > >
> > > In Go I am not using the proton reactor to establish or select over
> > > connections, so I'm not using any of the reactor or selectable events. I
> > > have a goroutine per connection pumping a proton transport with separate
> > > event handling per-connection so we have connection concurrency but each
> > > proton engine is only used in a single thread.
> > >
> >
> > This sounds to me like it is based on a bit of a misunderstanding of how
> > the reactor works. The reactor doesn't actually establish or select over
> > connections in C/Python. It is actually just mediating the request for
> > connection creation between whatever piece of application code wants a new
> > connection, and whatever handler has been configured to deal with I/O
> > related events. This allows for a significant amount of flexibility since
> > you can have multiple I/O implementations without having to hard code your
> > app to work against a specific one. This is just as important a requirement
> > in Go or Java or any other language.
>
> Yep I think I've misstated the issue for Go and I agree with you on
> Java.
>
> The real reason I'm not using the reactor is because event-loop
> programming is counter to the whole design of Go. I want to minimize it
> to handling AMQP-related events.
>

What makes you think this? If anything I would argue that go actually
embraces the concept of event-loop programming. For example they have
elevated the select API into a first class language construct (See
https://gobyexample.com/select).

Also, perhaps more importantly, the most significant difference between go
and other languages is the emphasis on concurrency via ownership (i.e.
passing pointers across concurrent queues) rather than on using explicit
mutexes. Event loops are most popular exactly where they enable a
processing pipeline that defines clear ownership of program constructs such
that programmers can take advantage of some concurrency without having to
explicitly reason about locks (e.g. nodejs, vert.x, spring reactor, ...).

Case in point, using the reactor API (either in C or in Python), I can
quite trivially write a simple program that shuffles messages from a link
on one connection to a link on another connection. It's not much more than
this:

    def on_message(self, event):
        sender = lookup(event.receiver)
        sender.send(transform(event.message))

In fact this example would not only be how you'd write such a proxy, but it
would also be how you'd write a simple request/response processor. The
thing is, the only reason this works is because the reactor owns both of
these connections and so it is safe for the event handler to manipulate
both connections at the same time. In the model you've described below,
this code or any similar code would be fraught with read/modify/write
hazards because every connection is potentially owned by a distinct thread
at any given point. To do something like this you would either need to
start introducing explicit locks (which is frowned upon in go, and even so
would have some fairly fundamental deadlocking issues), or you would need
to somehow separate the interaction between the two connections, e.g.
basically inject code in the goroutine that actually owns the connection so
it can run safely. Once you have a generic way of injecting this code
though I claim you've rebuilt the reactor API since that is one of its
major purposes. (See Reactor.schedule)
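
To make the injection idea concrete, here is a minimal sketch using only the
Python standard library, not proton itself. The names OwnerLoop, inject and
forward_all are hypothetical and exist only for this illustration; it is not
how Reactor.schedule is implemented. It shows one thread owning some state
while other threads hand it small pieces of code to run, so no locks are
needed around the state:

```python
import queue
import threading


class OwnerLoop:
    """Hypothetical single-threaded loop that owns some mutable state."""

    def __init__(self):
        self._tasks = queue.Queue()   # thread-safe hand-off point
        self.sent = []                # only ever touched by the loop thread
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def inject(self, fn):
        """Callable from any thread: ask the owning thread to run fn(self)."""
        self._tasks.put(fn)

    def stop(self):
        """Queue a stop marker behind any pending tasks and wait for exit."""
        self._tasks.put(None)
        self._thread.join()

    def _run(self):
        while True:
            fn = self._tasks.get()
            if fn is None:
                break
            fn(self)                  # runs on the owning thread


def forward_all(messages):
    """Inject one 'send' per message from the calling thread."""
    loop = OwnerLoop()
    loop.start()
    for m in messages:
        # Bind m as a default argument so each task keeps its own message.
        loop.inject(lambda owner, m=m: owner.sent.append(m.upper()))
    loop.stop()
    return loop.sent
```

The caller never touches loop.sent while the loop is running; it only reads
the result after stop() has joined the owning thread.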

> The proton engine API (pn_transport, pn_connection et al.) already
> gives me an IO-neutral bytes in/bytes out interface. I run a goroutine
> per connection to pump a pn_transport and handle AMQP-related events. Go
> provides standard connection handling and IO abstractions that are more
> suited to Go than a mirror of the reactor API.
>
> I am doing an event-based API for AMQP events: it is a straight wrap of
> the proton C API + port of python MessageHandler API. I originally hoped
> to skip that wrapping and use the proton C API directly to implement a
> concurrent Go API (it is laughably easy to call C direct from Go.)
>
> However I decided to do the event wrapping for reasons:
>
> 1. I'm still thrashing on how the Go API should look, so I needed to make
> proton programming easy and fast while I experiment. C is never easy or
> fast ;)
>
> 2. Having an event layer that is a straight analog of the C/python
> reactive APIs will be valuable for cross-language development.
>
> 3. The first Go API I come up with will very likely need a lot of
> improving, again we want that to be easy and fast.
>
> So I will (shortly I hope) unveil the event based API, running an event
> loop per connection in its own goroutine (actually there are 3
> goroutines per connection - one reading, one writing and one processing
> proton events)
>
> I've done initial experiments that prove we can use channels to inject
> behavior into the event-loop goroutine from other goroutines, and
> extract results. There might actually be channels carrying messages and
> acknowledgements, but more likely they will be carrying co-ordination
> information, and say a Link object may be composed of several channels
> plus non-channel storage. I'm not there yet.
>

I think you are implicitly referring to several distinct exercises, and I
suspect it's important to keep them separate: (1) providing a basic go
binding to access the functionality that exists in C, (2) building a higher
level messaging API for go, and (3) exploring an application level
processing model that allows for more concurrency than you get with just a
single reactor.

I believe most of what you are going to run into in (2) is probably not
actually go specific since the go concurrency model is also very popular in
non-go languages. For example in Java you would have the exact same set of
issues if you tried to distribute your application level processing of
connection related events across distinct threads. If an event fired on one
thread needs to modify a connection that is currently owned by another
thread, you have exactly the same set of issues you would have in go.

> >
> > > I'm not sure what the right approach is for Java. Having a C-based
> > > reactor is useful in C and for some bindings (e.g. the python binding
> > > uses it) but in languages that have their own version of event
> > > loops/polling/selecting it may be better to go with the native
> > > approach.
> > >
> >
> > The reactor is pure code/data structure. I believe the correct approach for
> > Java would be a straightforward port,
>
> As long as it's possible to service connections concurrently I agree for
> Java (not having looked closely.) Java is in the same concurrency
> category (heavy threads) as all our other binding languages (python,
> ruby, C++) so the reactor is probably appropriate.
>
> > and the correct approach for Go would
> > be a simple binding, just like all the other pure code/data pieces
> > (connection, transport, etc). Thinking of the reactor as part of the I/O
> > subsystem is to misunderstand how it works. The reactor proper has been
> > carefully designed to not directly incorporate any I/O dependencies at all.
> >
> > In other words, don't think of the reactor as analogous to or a replacement
> > for the old Driver, think of the reactor as a (potentially)
> > multi-connection engine, or in UML terms:
> >
> >     Reactor <>---> Connection <>---> Session <>---> Link <>---> Delivery
> >
> > Please excuse the ascii art UML. The diamonds are supposed to imply
> > containment by composition.
>
> That's why it's a poor fit for Go. Handling multiple connections is much
> easier in Go than in C (or other non-concurrent language) and the
> natural concurrent way to do it does not fit under an event-loop
> interface.
>

The hard part of handling multiple connections is the concurrency model,
and this is really the same in any language, and arguably should be the
choice of the application, not the library. The reactor API allows this
since you can use a separate reactor per connection if you wish, but you
also have the flexibility of grouping together related connections that
want to be more tightly coupled than is permitted when connections are
forced to be in separate threads. For example if my application wants to
set up some sort of proxy where connections are tightly paired, then being
able to explicitly construct the reactors in that way is quite useful.

It's also worth noting that a good deal of what the reactor is providing is
a dispatch model, e.g. deciding exactly what handler processes a given
event. Duplicating this in another language is likely to lead to unwanted
inconsistencies. In general I think a pretty faithful binding of the C API
as-is should really be the first step of any binding (be it go or anything
else). We lose no generality by doing this, and we put ourselves in a much
better position to experiment with higher level APIs on top of that.
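
As a rough illustration of what "dispatch model" means here, the sketch below
routes an event to a handler method chosen by the event's type name, with a
fallback for events the handler does not define. This is a simplified
stand-in written for this thread, not the reactor's actual dispatch code; the
Recorder class, the dispatch function and the on_unhandled fallback are
assumptions made for the example:

```python
class Recorder:
    """Hypothetical handler: defines only the events it cares about."""

    def __init__(self):
        self.log = []

    def on_message(self, event):
        self.log.append(("message", event))

    def on_unhandled(self, name, event):
        self.log.append(("unhandled", name))


def dispatch(handler, event_type, event):
    """Route an event to handler.on_<event_type>, else to on_unhandled."""
    method = getattr(handler, "on_" + event_type, None)
    if method is not None:
        return method(event)
    fallback = getattr(handler, "on_unhandled", None)
    if fallback is not None:
        return fallback(event_type, event)
    return None   # handler ignores this event entirely


def demo():
    h = Recorder()
    dispatch(h, "message", "hello")     # handled by on_message
    dispatch(h, "link_opened", None)    # falls through to on_unhandled
    return h.log
```

Even a rule this small has choices baked in (naming convention, fallback
order), which is where independently written bindings could drift apart.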

On a slight (but related) tangent, it's worth noting that there are two
distinct usage scenarios here that tend to pull in two different
directions. Infrastructury programs like dispatch or the cpp broker tend to
have something of a focus on efficiency, which leads to architectures that
try to exploit fine-grained concurrency, i.e. the ability to have one giant
process that can make use of many cores while still having a shared memory
model. For most applications, I think it's fairly safe to say that the
common wisdom is that such fine-grained concurrency only leads to lots of
concurrency related bugs, and the big application level focus is on coarse
grained concurrency (multiple processes, cloud-based elastic architectures,
etc).

Given that the application level is probably the default we want to cater
to for bindings, I think the Reactor API makes even more sense as the
starting point for bindings. By default you have all your state owned by
one thread at a time, and you don't need to think about concurrency at all.
If you do want to exploit shared memory concurrency you can do so, since
you are free to create as many reactors as you like, e.g. one per core or
one per connection.
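
A minimal sketch of that choice, again using only the Python standard library
and not proton: each Group below is a hypothetical stand-in for one reactor,
backed by a single worker thread, and the application decides how many groups
to create and which connection goes to which group. The names Group and run,
and the modulo assignment of connections to groups, are assumptions made for
the example:

```python
from concurrent.futures import ThreadPoolExecutor


class Group:
    """Hypothetical stand-in for one reactor: one thread owns all its state."""

    def __init__(self, name):
        self.name = name
        self.counts = {}   # per-connection state, only touched by the worker
        self._worker = ThreadPoolExecutor(max_workers=1)

    def submit(self, conn_id):
        # Tasks for this group run one at a time on its single worker thread.
        return self._worker.submit(self._handle, conn_id)

    def _handle(self, conn_id):
        self.counts[conn_id] = self.counts.get(conn_id, 0) + 1

    def close(self):
        # Wait for all queued tasks to finish before returning.
        self._worker.shutdown(wait=True)


def run(events, n_groups):
    """Spread events over n_groups owners, keyed by connection id."""
    groups = [Group(f"group-{i}") for i in range(n_groups)]
    for conn_id in events:
        groups[conn_id % n_groups].submit(conn_id)
    for g in groups:
        g.close()
    return [g.counts for g in groups]
```

With n_groups=1 everything is owned by one thread and there is nothing to
reason about; raising n_groups adds concurrency between groups while each
group's state is still only ever touched by its own thread.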

--Rafael

Re: proton-j reactor implementation?

Posted by Adrian Preston <PR...@uk.ibm.com>.

Wow - thanks for the replies - that's quite a lot to take on board!

While trying to get my head around a proton-J port of the Reactor API, I've created a prototype implementation (https://github.com/prestona/qpid-proton/tree/reactor - warning: it's more TODOs than code...).  One of my starting points was taking the Python examples and trying to work out how they would look in Java (there's an incomplete set in qpid-proton/examples/java/reactor).  When I started scaffolding the code to back these examples, I ran into an interesting design choice: how to integrate with the existing Java Driver / Connector / Listener classes.  I think that this touches on Rafael's UML diagram...

My plan is to introduce Selector and Selectable classes (which I hope can end up being similar enough to their C counterparts that testing is common), refactor DriverImpl to extend Selector, and refactor ConnectorImpl and ListenerImpl to be Selectables.  The advantage of doing this is that there's only one I/O implementation in the codebase: the Java Reactor can be written in terms of Selectors and Selectables but, under the covers, make use of common code in DriverImpl, ConnectorImpl and ListenerImpl.  The disadvantage is complexity: DriverImpl (et al.) would then need to support the functionality required by the traditional APIs as well as the new functions required to fit in with the Reactor API.

How's this sound as a compromise? 
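
For anyone who wants to see the Selector / Selectable split in miniature, here is a rough sketch in Python (to match the snippet earlier in the thread) using the standard selectors module. It is not the proton-j DriverImpl code or the C selectable API; the Selectable class, run_once and demo are hypothetical names, and the only point is that the loop knows nothing about what each selectable does when it becomes readable:

```python
import selectors
import socket


class Selectable:
    """Hypothetical wrapper: a socket plus the callback to run when readable."""

    def __init__(self, sock, on_readable):
        self.sock = sock
        self.on_readable = on_readable


def run_once(selector, timeout=1.0):
    """Wait for ready selectables and hand each one to its own callback."""
    handled = 0
    for key, _events in selector.select(timeout):
        selectable = key.data
        selectable.on_readable(selectable)
        handled += 1
    return handled


def demo():
    received = []
    a, b = socket.socketpair()          # in-process pair, no network needed
    sel = selectors.DefaultSelector()
    item = Selectable(b, lambda s: received.append(s.sock.recv(1024)))
    sel.register(b, selectors.EVENT_READ, data=item)
    a.sendall(b"ping")                  # makes b readable
    run_once(sel)
    sel.unregister(b)
    sel.close()
    a.close()
    b.close()
    return received
```

In this shape a connector and a listener would just be two kinds of Selectable with different callbacks, which is roughly the refactoring described above.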

Regards
- Adrian
