Posted to dev@cassandra.apache.org by Ariel Weisberg <ar...@weisberg.ws> on 2018/08/27 17:29:47 UTC

Transient Replication 4.0 status update

Hi all,

I wanted to give everyone an update on how development of Transient Replication is going and where we are going to be as of 9/1. Blake Eggleston, Alex Petrov, Benedict Elliott Smith, and I have been working to get TR implemented for 4.0. Up to now we have avoided merging anything related to TR to trunk because we weren't 100% sure we were going to make the 9/1 deadline, and even minimal TR functionality requires significant changes (see CASSANDRA-14405).

We focused on getting a minimal set of deployable functionality working, and want to avoid overselling what's going to work in the first version. The feature is marked explicitly as experimental and has to be enabled via a feature flag in cassandra.yaml. The expected audience for TR in 4.0 is more experienced users who are ready to tackle deploying experimental functionality. As experienced users deploy it, and as we gain more confidence in it and remove caveats, the number of users it is appropriate for will expand.

For 4.0 it looks like we will be able to merge TR with support for normal reads and writes without monotonic reads. Monotonic reads require blocking read repair and blocking read repair with TR requires further changes that aren't feasible by 9/1.

Future TR support would look something like

4.0.next:
    * vnodes (https://issues.apache.org/jira/browse/CASSANDRA-14404)

4.next:
    * Monotonic reads (https://issues.apache.org/jira/browse/CASSANDRA-14665)
    * LWT (https://issues.apache.org/jira/browse/CASSANDRA-14547)
    * Batch log (https://issues.apache.org/jira/browse/CASSANDRA-14549)
    * Counters (https://issues.apache.org/jira/browse/CASSANDRA-14548)

Possibly never:
    * Materialized views
 
Probably never:
    * Secondary indexes

The most difficult changes to support Transient Replication should be behind us. LWT, Batch log, and counters shouldn't be that hard to make transient replication aware. Monotonic reads require some changes to the read path, but are at least conceptually not that hard to support. I am confident that by 4.next TR will have fewer tradeoffs.

If you want to take a peek, the current feature branch is https://github.com/aweisberg/cassandra/tree/14409-7, although we will be moving to 14409-8 to rebase onto trunk.

Regards,
Ariel

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Transient Replication 4.0 status update

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

There are no transient nodes. All nodes are the same. If you have transient replication enabled, each node will transiently replicate some ranges instead of fully replicating them.

Capacity requirements are reduced evenly across all nodes in the cluster.
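
As a rough illustration of the two points above, here is a toy ring in Python. It is only a sketch and not the actual Cassandra placement logic: the function name, the one-range-per-node layout, and the choice of the last replicas in ring order as the transient ones are all assumptions made for this example.

```python
from collections import Counter


def replica_roles(num_nodes, rf, transient):
    """Count how many ranges each node replicates fully vs. transiently.

    Assumptions (hypothetical, for illustration only): every node owns one
    token range, the replicas for the range owned by node i are nodes
    i, i+1, ..., i+rf-1 (mod num_nodes), and the last `transient` of those
    replicas are the transient ones.
    """
    if not 0 <= transient < rf <= num_nodes:
        raise ValueError("need 0 <= transient < rf <= num_nodes")
    roles = {node: Counter() for node in range(num_nodes)}
    for owner in range(num_nodes):
        for offset in range(rf):
            node = (owner + offset) % num_nodes
            kind = "transient" if offset >= rf - transient else "full"
            roles[node][kind] += 1
    return roles


# Six nodes, three replicas per range, one of them transient.
roles = replica_roles(num_nodes=6, rf=3, transient=1)
for node, counts in sorted(roles.items()):
    print(f"node {node}: full={counts['full']} transient={counts['transient']}")
```

In this toy layout every node ends up as a full replica for two ranges and a transient replica for one, so no node is special and the reduction is spread evenly.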

Nodes are not temporarily transient replicas during expansion. They need to stream data like a full replica for the transient range before they can serve reads. There is a pending state, similar to the pending state for full replicas. Transient replicas also always receive writes when they are pending. There may be some room to relax how that is handled, but for now we opt to send pending transient ranges a bit more data and avoid reading from them, even in cases where we perhaps could.

This doesn't change how expansion works with vnodes. The same restrictions still apply. We won't officially support vnodes until we have done more testing and really thought through the corner cases. It's quite possible we will relax the restriction on creating transient keyspaces with vnodes in 4.0.x.

Ariel

On Fri, Aug 31, 2018, at 2:07 PM, Carl Mueller wrote:
> I put these questions on the ticket too... Sorry if some of them are
> stupid.
> 
> So are these transient nodes basically serving as centralized
> hinted handoff caches rather than having the hinted handoffs cluttering up
> full replicas, especially nodes that have no concern for the token range
> involved? I understand that hinted handoffs aren't being replaced by this,
> but is that kind of the idea?
> 
> Are the transient nodes "sitting around"?
> 
> Will the transient nodes have cheaper/lower hardware requirements?
> 
> During cluster expansion, does the newly streaming node acquiring data
> function as a temporary transient node until it becomes a full replica?
> Likewise while shrinking, does a previously full replica function as a
> transient while it streams off data?
> 
> Can this help vnode expansion with multiple concurrent nodes? Admittedly
> I'm not familiar with how much work has gone into fixing cluster expansion
> with vnodes; it is my understanding that you typically expand only one node
> at a time or in multiples of the datacenter size.


Re: Transient Replication 4.0 status update

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
I see, so there are no dedicated transient nodes, just other nodes that
function as witnesses.

This is still very exciting.


On Fri, Aug 31, 2018 at 1:49 PM Ariel Weisberg <ar...@weisberg.ws> wrote:

> Hi,
>
> All nodes being the same (in terms of functionality) is something we
> wanted to stick with at least for now. I think we want a design that
> changes the operational, availability, and consistency story as little as
> possible when it's completed.
>
> Ariel

Re: Transient Replication 4.0 status update

Posted by Ariel Weisberg <ar...@weisberg.ws>.
Hi,

All nodes being the same (in terms of functionality) is something we wanted to stick with at least for now. I think we want a design that changes the operational, availability, and consistency story as little as possible when it's completed.

Ariel
On Fri, Aug 31, 2018, at 2:27 PM, Carl Mueller wrote:
> Sorry to spam this with two messages...
> 
> This ticket is also interesting because it is very close to what I imagined
> as a useful use case for RF4 / RF6: basically RF3 + hot spare, where you
> mark (in the case of RF4) three nodes as primary and the fourth as a hot
> standby, which may be equivalent, if I understand the paper/protocol, to
> RF3+1 transient.


Re: Transient Replication 4.0 status update

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
Sorry to spam this with two messages...

This ticket is also interesting because it is very close to what I imagined
as a useful use case for RF4 / RF6: basically RF3 + hot spare, where you
mark (in the case of RF4) three nodes as primary and the fourth as a hot
standby, which may be equivalent, if I understand the paper/protocol, to
RF3+1 transient.


Re: Transient Replication 4.0 status update

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
I put these questions on the ticket too... Sorry if some of them are
stupid.

So are these transient nodes basically serving as centralized
hinted handoff caches rather than having the hinted handoffs cluttering up
full replicas, especially nodes that have no concern for the token range
involved? I understand that hinted handoffs aren't being replaced by this,
but is that kind of the idea?

Are the transient nodes "sitting around"?

Will the transient nodes have cheaper/lower hardware requirements?

During cluster expansion, does the newly streaming node acquiring data
function as a temporary transient node until it becomes a full replica?
Likewise while shrinking, does a previously full replica function as a
transient while it streams off data?

Can this help vnode expansion with multiple concurrent nodes? Admittedly
I'm not familiar with how much work has gone into fixing cluster expansion
with vnodes; it is my understanding that you typically expand only one node
at a time or in multiples of the datacenter size.
