Posted to dev@qpid.apache.org by Andrew Stitcher <as...@redhat.com> on 2010/10/12 19:22:32 UTC

Heads up re Rdma IO state transitions [Was: AsynchIO state transition]

For those interested in the Rdma implementation:

I've been doing a lot of stability work, stressing the rdma code in odd
corner cases (unexpected disconnects mostly). While on this trail I
realised I could simplify the Rdma::AsynchIO state machine drastically
by ensuring that all callbacks generated by this layer happen in the
"thread context" of the connection.

After an iteration to improve the performance which added a simple
version of the state machine back, we have a version that has very
similar throughput, but a little better latency as measured by me on my
development boxes.

I like this new code much better, but then I wrote it, so take a look
and see what you think, constructive comments welcome.

Most relevant files:
qpid/cpp/src/qpid/sys/rdma/RdmaIO.h
qpid/cpp/src/qpid/sys/rdma/RdmaIO.cpp

Andrew



---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:dev-subscribe@qpid.apache.org


Re: Heads up re Rdma IO state transitions [Was: AsynchIO state transition]

Posted by Aaron Fabbri <aj...@gmail.com>.
Figures.. found a missing edge right after sending this out.  This is better:

linux$ dot asynch_io_state_machine.dot -Tpng -o asynch_io_state_machine.png

---

digraph asynchio_fsm {

        node [shape = doublecircle] IDLE;
        node [fontsize=9] NOTIFY_PENDING;
        node [shape = circle, fontsize=12];
        edge [fontsize=9];
        IDLE -> NOTIFY_PENDING [label="notifyPendingWrite"];
        NOTIFY -> NOTIFY_PENDING [label="notifyPendingWrite"];
        NOTIFY_PENDING -> NOTIFY_PENDING [label="notifyPendingWrite"];
        STOPPED -> STOPPED [label="*"];

        IDLE -> NOTIFY_PENDING [label="dataEvent"];
        NOTIFY -> NOTIFY_PENDING [label="dataEvent"];
        NOTIFY_PENDING -> NOTIFY_PENDING [label="dataEvent"];

        IDLE -> NOTIFY [label="writeEvent"];
        NOTIFY -> NOTIFY [label="writeEvent"];
        NOTIFY_PENDING -> NOTIFY [label="writeEvent"];

        IDLE -> IDLE [label="writeEvent2"];
        NOTIFY -> IDLE [label="writeEvent2"];

        IDLE -> STOPPED [label="stop"];
        NOTIFY -> STOPPED [label="stop"];
        NOTIFY_PENDING -> STOPPED [label="stop"];
}
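
The edges in the graph above can also be read as a small transition
function. What follows is only a sketch in C++: the state and event names
are copied from the dot source, but the enum types and the nextState()
helper are assumptions made for illustration and are not taken from
RdmaIO.cpp. The graph has no edge for writeEvent2 in NOTIFY_PENDING, so
the sketch assumes the state is left unchanged in that case.

```cpp
#include <cassert>

// States and events as named in the dot source above.
// The enum and function names are hypothetical, not those used in RdmaIO.cpp.
enum class State { IDLE, NOTIFY, NOTIFY_PENDING, STOPPED };
enum class Event { NotifyPendingWrite, DataEvent, WriteEvent, WriteEvent2, Stop };

// Return the state reached from `s` on event `e`, following the graph edges.
inline State nextState(State s, Event e) {
    // STOPPED -> STOPPED [label="*"]: once stopped, every event is absorbed.
    if (s == State::STOPPED) return State::STOPPED;

    switch (e) {
    case Event::NotifyPendingWrite:  // IDLE, NOTIFY, NOTIFY_PENDING -> NOTIFY_PENDING
    case Event::DataEvent:           // same targets as notifyPendingWrite
        return State::NOTIFY_PENDING;
    case Event::WriteEvent:          // IDLE, NOTIFY, NOTIFY_PENDING -> NOTIFY
        return State::NOTIFY;
    case Event::WriteEvent2:         // IDLE -> IDLE, NOTIFY -> IDLE
        // Assumption: no edge is drawn for NOTIFY_PENDING, so stay put.
        return s == State::NOTIFY_PENDING ? s : State::IDLE;
    case Event::Stop:                // IDLE, NOTIFY, NOTIFY_PENDING -> STOPPED
        return State::STOPPED;
    }
    return s;  // not reached for valid events
}
```

Writing the graph this way makes it easy to see that only writeEvent2
depends on the current state; every other event has a single target.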


Re: Heads up re Rdma IO state transitions [Was: AsynchIO state transition]

Posted by Aaron Fabbri <aj...@gmail.com>.
I whipped up a quick state diagram for Andrew's latest RdmaIO stuff.
I'll paste the dot source here and attach the png.

To generate (assuming you have graphviz installed):

linux$  dot asynch_io_state_machine.dot -Tpng -o asynch_io_state_machine.png

Here's the source:

digraph asynchio_fsm {

        node [shape = doublecircle] IDLE;
        node [fontsize=9] NOTIFY_PENDING;
        node [shape = circle, fontsize=12];
        edge [fontsize=9];
        IDLE -> NOTIFY_PENDING [label="notifyPendingWrite"];
        NOTIFY_PENDING -> NOTIFY_PENDING [label="notifyPendingWrite"];
        STOPPED -> STOPPED [label="*"];

        IDLE -> NOTIFY_PENDING [label="dataEvent"];
        NOTIFY -> NOTIFY_PENDING [label="dataEvent"];
        NOTIFY_PENDING -> NOTIFY_PENDING [label="dataEvent"];

        IDLE -> NOTIFY [label="writeEvent"];
        NOTIFY -> NOTIFY [label="writeEvent"];
        NOTIFY_PENDING -> NOTIFY [label="writeEvent"];

        IDLE -> IDLE [label="writeEvent2"];
        NOTIFY -> IDLE [label="writeEvent2"];

        IDLE -> STOPPED [label="stop"];
        NOTIFY -> STOPPED [label="stop"];
        NOTIFY_PENDING -> STOPPED [label="stop"];
}


Re: Heads up re Rdma IO state transitions [Was: AsynchIO state transition]

Posted by Andrew Stitcher <as...@redhat.com>.
On Thu, 2010-10-14 at 15:03 -0700, Aaron Fabbri wrote:
> On Thu, Oct 14, 2010 at 6:28 AM, Andrew Stitcher <as...@redhat.com> wrote:
> > On Wed, 2010-10-13 at 21:59 -0700, Aaron Fabbri wrote:
> >> On Tue, Oct 12, 2010 at 10:22 AM, Andrew Stitcher <as...@redhat.com> wrote:
> >> > For those interested in the Rdma implementation:
> >> >
> >> > I've been doing a lot of stability work, stressing the rdma code in odd
> >> > corner cases (unexpected disconnects mostly). While on this trail I
> > >> > realised I could simplify the Rdma::AsynchIO state machine drastically
> >> > by ensuring that all callbacks generated by this layer happen in the
> >> > "thread context" of the connection.
> >>
> >> Thanks for the heads up.  I'm taking a quick look at the diffs.  By
> >> "thread context of the connection", do you mean always having these
> >> callbacks happen from the poller threads?
> >
> > yes.
> >
> >>
> >> Can you give some hints on how this simplified things?
> >
> > Look at the code, and you will see ;-)
> 
> Fair enough. I did look at the diffs and the new stuff is much
> cleaner.  I was fishing for some color or background on the previous
> complex state machine, and why restricting thread contexts simplifies
> it so much. I'm sure I could figure it out with more than 15 minutes
> staring at 1500 lines of diffs.

The state machine changes are all in the last 2 RDMA changes checked in
(far fewer than 1500 lines of changes!). I use a git mirror of the svn
repo for all my development and find gitk a great way to inspect tree
changes. I recommend it. 

The other changes don't change the state machine at all, but I think
they do ultimately tidy up some of the other code. Note that I try
fairly hard to make each changeset coherent in itself (I go back and
alter the changesets if necessary before finally pushing them up to svn),
so you should be able to look at each change in some isolation.

> ...
> It is curious that forcing a context switch in the write path
> (notifyPendingWrite now wakes up a poller which does the idle callback
> which enqueues the write) is OK performance-wise.  A major motivation
> of verbs/RDMA is to avoid context switches.

Yes, this is an interesting aspect of the code. What happens is that the
code actually avoids context switching when under load. In fact the
whole purpose of the new state machine is to do this: essentially, if you
get a notifyWrite event whilst processing incoming completions, then the
notifying thread just sets state so that the already processing thread
will go and do the notify callback.

The only time you get an actual context switch in the write path is
when there isn't a lot of load.
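
The hand-off described above might look roughly like the following C++
sketch. It is not the RdmaIO.cpp implementation: the WriteHandOff class,
its method names other than notifyPendingWrite(), and the two boolean
flags are all assumptions made here to show the idea. A notifying thread
only asks for a poller wake-up when nobody is servicing the connection;
otherwise it just records the request, and the thread already processing
completions picks it up.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical illustration of handing a write notification to a thread
// that is already servicing the connection. Not the actual qpid code.
class WriteHandOff {
public:
    // Called from any thread that wants the write callback to run.
    // Returns true if the caller must wake a poller thread (light load),
    // false if a wake-up is already outstanding or a poller thread is
    // currently processing and will see the request itself (under load).
    bool notifyPendingWrite() {
        std::lock_guard<std::mutex> guard(lock_);
        bool needWakeup = !processing_ && !pending_;
        pending_ = true;
        return needWakeup;
    }

    // Called by the poller thread when it starts handling completions.
    void beginProcessing() {
        std::lock_guard<std::mutex> guard(lock_);
        processing_ = true;
    }

    // Called by the poller thread when it has finished handling completions.
    // Returns true if a write notification is outstanding, in which case the
    // caller should run the write callback on this same thread.
    bool endProcessing() {
        std::lock_guard<std::mutex> guard(lock_);
        bool runCallback = pending_;
        pending_ = false;
        processing_ = false;
        return runCallback;
    }

private:
    std::mutex lock_;
    bool processing_ = false;  // a poller thread is servicing the connection
    bool pending_ = false;     // a write notification has not been handled yet
};
```

In this sketch the context switch shows up as the true return value from
notifyPendingWrite(), which only happens when the connection is idle.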

Actually this was the major purpose of the previous, difficult to
understand state machine too.

What seems to be the case from my rough measurements is that calling
back on whatever thread got the notifyPendingWrite() isn't in itself a
major performance gain over hijacking another thread already servicing
the connection.

Obviously proper measurement and experience will show in the end.

Andrew




Re: Heads up re Rdma IO state transitions [Was: AsynchIO state transition]

Posted by Aaron Fabbri <aj...@gmail.com>.
On Thu, Oct 14, 2010 at 6:28 AM, Andrew Stitcher <as...@redhat.com> wrote:
> On Wed, 2010-10-13 at 21:59 -0700, Aaron Fabbri wrote:
>> On Tue, Oct 12, 2010 at 10:22 AM, Andrew Stitcher <as...@redhat.com> wrote:
>> > For those interested in the Rdma implementation:
>> >
>> > I've been doing a lot of stability work, stressing the rdma code in odd
>> > corner cases (unexpected disconnects mostly). While on this trail I
>> > realised I could simplify the Rdma::AsynchIO state machine drastically
>> > by ensuring that all callbacks generated by this layer happen in the
>> > "thread context" of the connection.
>>
>> Thanks for the heads up.  I'm taking a quick look at the diffs.  By
>> "thread context of the connection", do you mean always having these
>> callbacks happen from the poller threads?
>
> yes.
>
>>
>> Can you give some hints on how this simplified things?
>
> Look at the code, and you will see ;-)

Fair enough. I did look at the diffs and the new stuff is much
cleaner.  I was fishing for some color or background on the previous
complex state machine, and why restricting thread contexts simplifies
it so much. I'm sure I could figure it out with more than 15 minutes
staring at 1500 lines of diffs.

> In a little more detail - the Rdma::AsynchIO code is a lot easier to
> understand IMO. Being sure that all the callbacks happen from an IO
> thread with the connection properly serialised makes it easier to not
> screw up changes too.
>
>>
>> >
>> > After an iteration to improve the performance which added a simple
>> > version of the state machine back, we have a version that has very
>> > similar throughput, but a little better latency as measured by me on my
>> > development boxes.
>>
>> What sort of latency improvement are you seeing?
>
> Irrelevant I think given I did no real tuning and this seems to be
> important to get good and reliable results. I was only testing in order
> to be sure there was no obvious regression.

It is curious that forcing a context switch in the write path
(notifyPendingWrite now wakes up a poller which does the idle callback
which enqueues the write) is OK performance-wise.  A major motivation
of verbs/RDMA is to avoid context switches.

That's the feedback that came to mind.  I know sending a patch would
be more constructive.  ;-)  I will give this latest stuff a go and let
you know if anything interesting happens.

Thanks,
Aaron



Re: Heads up re Rdma IO state transitions [Was: AsynchIO state transition]

Posted by Andrew Stitcher <as...@redhat.com>.
On Wed, 2010-10-13 at 21:59 -0700, Aaron Fabbri wrote:
> On Tue, Oct 12, 2010 at 10:22 AM, Andrew Stitcher <as...@redhat.com> wrote:
> > For those interested in the Rdma implementation:
> >
> > I've been doing a lot of stability work, stressing the rdma code in odd
> > corner cases (unexpected disconnects mostly). While on this trail I
> > realised I could simplify the Rdma::AsynchIO state machine drastically
> > by ensuring that all callbacks generated by this layer happen in the
> > "thread context" of the connection.
> 
> Thanks for the heads up.  I'm taking a quick look at the diffs.  By
> "thread context of the connection", do you mean always having these
> callbacks happen from the poller threads?

yes.

> 
> Can you give some hints on how this simplified things?

Look at the code, and you will see ;-)

In a little more detail - the Rdma::AsynchIO code is a lot easier to
understand IMO. Being sure that all the callbacks happen from an IO
thread with the connection properly serialised makes it easier to not
screw up changes too.

> 
> >
> > After an iteration to improve the performance which added a simple
> > version of the state machine back, we have a version that has very
> > similar throughput, but a little better latency as measured by me on my
> > development boxes.
> 
> What sort of latency improvement are you seeing?

Irrelevant I think given I did no real tuning and this seems to be
important to get good and reliable results. I was only testing in order
to be sure there was no obvious regression.

In other words, in my testing the minimum latencies returned by the
latency test were not very dissimilar, but the max and avg latencies
were improved.


Andrew





Re: Heads up re Rdma IO state transitions [Was: AsynchIO state transition]

Posted by Aaron Fabbri <aj...@gmail.com>.
On Tue, Oct 12, 2010 at 10:22 AM, Andrew Stitcher <as...@redhat.com> wrote:
> For those interested in the Rdma implementation:
>
> I've been doing a lot of stability work, stressing the rdma code in odd
> corner cases (unexpected disconnects mostly). While on this trail I
> realised I could simplify the Rdma::AsynchIO state machine drastically
> by ensuring that all callbacks generated by this layer happen in the
> "thread context" of the connection.

Thanks for the heads up.  I'm taking a quick look at the diffs.  By
"thread context of the connection", do you mean always having these
callbacks happen from the poller threads?

Can you give some hints on how this simplified things?

>
> After an iteration to improve the performance which added a simple
> version of the state machine back, we have a version that has very
> similar throughput, but a little better latency as measured by me on my
> development boxes.

What sort of latency improvement are you seeing?

> I like this new code much better, but then I wrote it, so take a look
> and see what you think, constructive comments welcome.
>
> Most relevant files:
> qpid/cpp/src/qpid/sys/rdma/RdmaIO.h
> qpid/cpp/src/qpid/sys/rdma/RdmaIO.cpp
>
> Andrew
