
Posted to dev@cassandra.apache.org by Tyler Hobbs <ty...@datastax.com> on 2016/08/17 19:08:16 UTC

CASSANDRA-10993 Approaches

In the spirit of the recent thread about discussing large changes on the
Dev ML, I'd like to talk about CASSANDRA-10993, the first step in the
"thread per core" work.

The goal of 10993 is to transform the read and write paths into an
event-driven model powered by event loops. This means that each request
can be handled on a single thread (although typically broken up into
multiple steps, depending on I/O and locking) and the old mutation and read
thread pools can be removed. So far, we've prototyped this with a couple
of approaches:

The first approach models each request as a state machine (or composition
of state machines). For example, a single write request is encapsulated in
a WriteTask object which moves through a series of states as portions of
the write complete (allocating a commitlog segment, syncing the commitlog,
receiving responses from remote replicas). These state transitions are
triggered by Events that are emitted by, e.g., the
CommitlogSegmentManager. The event loop that manages tasks, events,
timeouts, and scheduling is custom and is (currently) closely tied to a
Netty event loop. Here are a couple of example classes to take a look at:

WriteTask:
https://github.com/thobbs/cassandra/blob/CASSANDRA-10993-WIP/src/java/org/apache/cassandra/poc/WriteTask.java
EventLoop:
https://github.com/thobbs/cassandra/blob/CASSANDRA-10993-WIP/src/java/org/apache/cassandra/poc/EventLoop.java
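For readers who haven't opened the branch, here is a minimal, self-contained sketch of the idea. It assumes nothing about the prototype's actual classes: WriteTaskSketch, Task, EventKind and LocalState are hypothetical names, not the API of WriteTask.java. A write is modeled as two state machines advancing side by side. The local path goes from segment allocated to commitlog synced, and the remote path counts replica acks. Both are driven by events drained from a queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class WriteTaskSketch {
    /** Hypothetical events, e.g. emitted by a commitlog manager or the messaging layer. */
    enum EventKind { SEGMENT_ALLOCATED, COMMITLOG_SYNCED, REPLICA_ACK }

    /** States of the local sub-machine; the remote sub-machine is just an ack counter. */
    enum LocalState { AWAITING_SEGMENT, AWAITING_SYNC, LOCAL_DONE }

    /** A toy write request composed of two state machines that advance independently. */
    static final class Task {
        private final int requiredAcks;
        private LocalState local = LocalState.AWAITING_SEGMENT;
        private int acks;

        Task(int requiredAcks) {
            this.requiredAcks = requiredAcks;
        }

        /** Routes an event to the sub-machine it belongs to; out-of-order local events are ignored. */
        void handle(EventKind event) {
            switch (event) {
                case SEGMENT_ALLOCATED:
                    if (local == LocalState.AWAITING_SEGMENT) local = LocalState.AWAITING_SYNC;
                    break;
                case COMMITLOG_SYNCED:
                    if (local == LocalState.AWAITING_SYNC) local = LocalState.LOCAL_DONE;
                    break;
                case REPLICA_ACK:
                    acks++;
                    break;
            }
        }

        LocalState localState() {
            return local;
        }

        /** The write is complete only when both sub-machines have finished. */
        boolean isComplete() {
            return local == LocalState.LOCAL_DONE && acks >= requiredAcks;
        }
    }

    public static void main(String[] args) {
        Task task = new Task(2);
        // Events from the local and remote paths can arrive interleaved.
        Queue<EventKind> events = new ArrayDeque<>();
        events.add(EventKind.REPLICA_ACK);
        events.add(EventKind.SEGMENT_ALLOCATED);
        events.add(EventKind.REPLICA_ACK);
        events.add(EventKind.COMMITLOG_SYNCED);
        while (!events.isEmpty()) {
            task.handle(events.poll());
            System.out.println(task.localState() + " complete=" + task.isComplete());
        }
    }
}
```

Even in this toy form, the completion condition is spread across two independent pieces of state. Keeping that condition correct gets harder as more sub-machines are added.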

The second approach utilizes RxJava and the Observable pattern. Where we
would wait for emitted events in the state machine approach, we instead
depend on an Observable to "push" the data/result we're awaiting.
Scheduling is handled by an Rx scheduler (which is customizable). The code
changes required for this are, overall, less intrusive. Here's a quick
example of what this looks like for high-level operations:
https://github.com/thobbs/cassandra/blob/rxjava-rebase/src/java/org/apache/cassandra/service/StorageProxy.java#L1724-L1732
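As a rough illustration of the push style, here is a sketch that uses the JDK's CompletableFuture as a stand-in for RxJava, so it runs without extra dependencies. readLocal, readRemote and read are hypothetical helpers. In RxJava the same shape would typically be written with operators such as zip and map, with a Scheduler deciding where the work runs. The caller describes what should happen when each result arrives, instead of waiting for an event by hand.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PushStyleSketch {
    // Hypothetical asynchronous reads; each future "pushes" its value when ready.
    static CompletableFuture<String> readLocal(String key, ExecutorService loop) {
        return CompletableFuture.supplyAsync(() -> "local:" + key, loop);
    }

    static CompletableFuture<String> readRemote(String key, ExecutorService loop) {
        return CompletableFuture.supplyAsync(() -> "remote:" + key, loop);
    }

    /** Declares the whole read as a pipeline instead of handling events explicitly. */
    static CompletableFuture<String> read(String key, ExecutorService loop) {
        return readLocal(key, loop)
                .thenCombine(readRemote(key, loop), (local, remote) -> local + "+" + remote) // roughly zip
                .thenApply(String::toUpperCase);                                              // roughly map
    }

    public static void main(String[] args) {
        // A single-threaded executor stands in for one event loop / scheduler.
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            System.out.println(read("k1", loop).join());
        } finally {
            loop.shutdown();
        }
    }
}
```

The trade-off described below shows up even here. The code is short, but which thread runs the combining function depends on which future completes last, and that is not visible in the source.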

So far we've benchmarked both approaches on in-memory reads to get an idea
of the upper-bound performance of each. Throughput appears to be very
similar on both branches.

There are a few open considerations about which approach we should go
with, and I would appreciate input on them.

The first consideration is performance. There are concerns that going with
Rx (or something similar) may limit the peak performance we can eventually
attain, in two ways. First, we don't have as much control over the event
loop, scheduling, and chunking of tasks. With the state machine approach,
we're writing all of this, so it's entirely under our control. With Rx, many
things are customizable or already have decent tooling, but that may still
fall short in critical ways. Second, the overhead of the Observable
machinery may become significant as other bottlenecks are removed. Of
course, WriteTask et al. have their own overhead, but again, we have more
control there.
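To make "control over the event loop, scheduling, and chunking of tasks" concrete, here is a hypothetical single-threaded loop. It is not the prototype's EventLoop class. The loop's owner decides how many queued tasks run per iteration and when timers fire. It uses a logical tick counter instead of wall-clock time so the behavior is deterministic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

class TinyEventLoop {
    /** A callback due at a given logical tick. */
    private static final class Timer implements Comparable<Timer> {
        final long deadline;
        final Runnable action;

        Timer(long deadline, Runnable action) {
            this.deadline = deadline;
            this.action = action;
        }

        @Override
        public int compareTo(Timer other) {
            return Long.compare(deadline, other.deadline);
        }
    }

    private final Queue<Runnable> tasks = new ArrayDeque<>();
    private final PriorityQueue<Timer> timers = new PriorityQueue<>();
    private final int maxTasksPerTick; // chunking knob owned by the loop, not by a library
    private long now;                  // logical clock, advanced once per tick

    TinyEventLoop(int maxTasksPerTick) {
        this.maxTasksPerTick = maxTasksPerTick;
    }

    void submit(Runnable task) {
        tasks.add(task);
    }

    void schedule(long delayTicks, Runnable action) {
        timers.add(new Timer(now + delayTicks, action));
    }

    int pending() {
        return tasks.size();
    }

    /** One iteration: fire due timers, then run at most maxTasksPerTick tasks. Returns tasks run. */
    int tick() {
        now++;
        while (!timers.isEmpty() && timers.peek().deadline <= now) {
            timers.poll().action.run();
        }
        int ran = 0;
        while (ran < maxTasksPerTick && !tasks.isEmpty()) {
            tasks.poll().run();
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        TinyEventLoop loop = new TinyEventLoop(2);
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            int id = i;
            loop.submit(() -> log.add("task" + id));
        }
        loop.schedule(2, () -> log.add("timeout"));
        while (loop.pending() > 0) {
            loop.tick();
        }
        System.out.println(log);
    }
}
```

With maxTasksPerTick set to 2, main prints [task0, task1, timeout, task2, task3, task4]. The timer fires on the second tick, between the two chunks. In a hand-written loop, policies like this are a one-line change. With a library scheduler they depend on what the library exposes.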

The second consideration is code style and ease of understanding. I think
both of these approaches have downsides in different areas. The state
machines are very explicit (an upside), but also very verbose and somewhat
disjointed. Most of the complex operations in Cassandra can't cleanly be
represented as a single state machine, because they're logically multiple
state machines operating in parallel (e.g. the local write path and the
remote write path in WriteTask). After working on the prototypes, I've
found the state machines to be harder to logically follow than I had
hoped. Perhaps we could come up with better abstractions and patterns for
this, but that's the current state of things. On the Rx side, the downside
is that the behavior is much less explicit. Additionally, some find it
more difficult to mentally follow the flow of execution. Based on my past
work with a large Twisted Python codebase, I'll agree that it's tough to
get used to, but not unmanageable with experience and good coding patterns.

A third consideration is code reuse. A big advantage of Rx is that it
comes with many tools for transforming Observables, handling multiple
Observables, error handling, and tracing. With the state machine approach,
we would need to write equivalents for these from scratch. This is a
non-trivial amount of work that might make the project take significantly
longer to complete. Combined with the fact that the Rx approach would be
less invasive, this suggests we would have an easier time introducing
incremental changes to the code base rather than landing one big-bang commit.
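As a small example of the tooling that would otherwise have to be written from scratch, the sketch below combines a timeout with error handling. It uses the JDK 9+ CompletableFuture methods orTimeout and exceptionally; RxJava offers comparable operators such as timeout and onErrorReturn. awaitReplica and the TIMED_OUT/FAILED marker strings are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReuseSketch {
    /**
     * Waits for a (hypothetical) replica response, mapping a timeout or failure to a marker value.
     * Requires Java 9+ for orTimeout.
     */
    static CompletableFuture<String> awaitReplica(CompletableFuture<String> response, long timeoutMillis) {
        return response
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(err -> {
                    // Dependent stages may wrap the original cause in a CompletionException.
                    Throwable cause = (err instanceof CompletionException && err.getCause() != null)
                            ? err.getCause() : err;
                    return cause instanceof TimeoutException ? "TIMED_OUT" : "FAILED";
                });
    }

    public static void main(String[] args) {
        CompletableFuture<String> silent = new CompletableFuture<>(); // a replica that never answers
        System.out.println(awaitReplica(silent, 50).join());

        CompletableFuture<String> answered = CompletableFuture.completedFuture("ack");
        System.out.println(awaitReplica(answered, 50).join());
    }
}
```

orTimeout completes the passed-in future itself and returns it, rather than creating a copy. A hand-written equivalent would need its own timer wheel, cancellation and error propagation, which is the kind of work this paragraph is weighing.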

If I can boil these concerns down to one tradeoff, it's this: do we want to
expend more effort and have more explicit code and complete control, or do
we want to piggyback on the Rx work, give up some control, and (hopefully)
get to the next, deeper optimizations sooner?

Thanks for any input on this topic.

--
Tyler Hobbs
DataStax <http://datastax.com/>

Re: CASSANDRA-10993 Approaches

Posted by Stefan Podkowinski <sp...@gmail.com>.

From my perspective, one of the most important reasons for RxJava would be
the strategic option to integrate reactive streams [1] into the overall
Cassandra architecture at some point in the future. Reactive streams would
allow us to design back pressure in a fundamentally different way from what
we do in the current work-queue based execution model. Think about the
optimizations currently deployed to walk a thin line between throughput,
latency and GC pressure. Or think about the lack of coordination between
individual processes such as compactions, streaming and client requests
that affect each other, where we can only hope that clients back off due
to latency-aware policies, that streams will eventually time out, or that
compactions will get enough work done at some point. We even have to tell
people to tune batch sizes so as not to overwhelm nodes in the cluster.
Squeezing out n% during performance tests is nice, but IMO 10993 should
also address how to get more control over the use of system resources, and
a reactive stream based approach could help with that.
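A minimal sketch of demand-driven back pressure, assuming Java 9+, where java.util.concurrent.Flow mirrors the Reactive Streams interfaces. The subscriber requests one item at a time, and only after it has handled the previous one. SubmissionPublisher blocks the producer in submit() once the subscriber's small buffer is full. BackpressureSketch and OneAtATime are illustrative names only, not part of Cassandra or RxJava.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

class BackpressureSketch {
    /** Signals demand for exactly one item, and asks for the next only after handling it. */
    static final class OneAtATime implements Flow.Subscriber<Integer> {
        final List<Integer> received = Collections.synchronizedList(new ArrayList<>());
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1); // initial demand: one item
        }

        @Override
        public void onNext(Integer item) {
            received.add(item);      // "process" the item
            subscription.request(1); // only now ask for the next one
        }

        @Override
        public void onError(Throwable t) {
            done.countDown();
        }

        @Override
        public void onComplete() {
            done.countDown();
        }
    }

    /** Publishes 0..n-1 and returns what the subscriber received, in order. */
    static List<Integer> run(int n) {
        OneAtATime subscriber = new OneAtATime();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Small per-subscriber buffer: submit() blocks the producer while it is full.
            try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>(executor, 4)) {
                publisher.subscribe(subscriber);
                for (int i = 0; i < n; i++) {
                    publisher.submit(i);
                }
            } // close() signals completion once the already-submitted items are delivered
            subscriber.done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for completion", e);
        } finally {
            executor.shutdown();
        }
        return new ArrayList<>(subscriber.received);
    }

    public static void main(String[] args) {
        System.out.println(run(10));
    }
}
```

Unlike an unbounded work queue, the consumer's request() calls bound the amount of in-flight work. Here the producer cannot get more than one small buffer ahead of the subscriber.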

[1] https://github.com/ReactiveX/RxJava/wiki/Reactive-Streams


On Wed, Aug 17, 2016 at 9:54 PM, Jake Luciani <ja...@gmail.com> wrote:

> I think I outlined the tradeoffs I see between the roll our own vs use a
> reactive framework in
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-10528
>
> My view is we should try to utilize the existing before we start writing
> our own. And even if we do write our own keep it reactive since reactive
> APIs are going to be adopted in the Java 9 spec.  There is an entire
> community out there thinking about asynchronous programming that we can tap
> into.
>
> I don't buy the argument (yet) that Rx or other libraries lack the control
> we need. In fact these APIs are quite extensible.

Re: CASSANDRA-10993 Approaches

Posted by Jake Luciani <ja...@gmail.com>.

I think I outlined the tradeoffs I see between rolling our own and using a
reactive framework in
https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-10528

My view is we should try to utilize existing frameworks before we start
writing our own. And even if we do write our own, we should keep it
reactive, since reactive APIs are going to be adopted in the Java 9 spec.
There is an entire community out there thinking about asynchronous
programming that we can tap into.

I don't buy the argument (yet) that Rx or other libraries lack the control
we need. In fact these APIs are quite extensible.

Re: CASSANDRA-10993 Approaches

Posted by Eric Evans <jo...@gmail.com>.

On Wed, Aug 17, 2016 at 2:08 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> In the spirit of the recent thread about discussing large changes on the
> Dev ML, I'd like to talk about CASSANDRA-10993, the first step in the
> "thread per core" work.

I'm just lurking ATM; I don't have anything to add WRT
CASSANDRA-10993, but wanted to say that this is awesome.  Thanks for
this Tyler!

-- 
Eric Evans
john.eric.evans@gmail.com