Posted to dev@cassandra.apache.org by Carl Mueller <ca...@smartthings.com.INVALID> on 2018/10/18 15:53:41 UTC

Built in trigger: double-write for app migration

tl;dr: a generic trigger on TABLES that will mirror all writes to
facilitate data migrations between clusters or systems. What is necessary
to ensure full write mirroring/coherency?

When cassandra clusters host several "apps" (i.e. keyspaces serving colocated
applications), and one app/keyspace's bandwidth and size demands begin
impacting the other keyspaces/apps, then one strategy is to migrate that
keyspace to its own dedicated cluster.

With backups/sstableloading, this will entail a delay and therefore a
"coherency" shortfall between the clusters. So typically one would employ a
"double write, read once" approach:

- all updates are mirrored to both clusters
- reads come from the currently most coherent cluster.

Often two sstable loads are done:

1) first load
2) turn on double writes/write mirroring
3) a second load is done to finalize coherency
4) switch the app to point to the new cluster now that it is coherent

The double write plus single read is the sticking point. We could do it at the
app layer, but if the app wasn't written with that in mind, it is a lot of
testing and customization specific to the framework.

We could theoretically do some sort of proxying of the java-driver somehow,
but all the async structures and complex interfaces/apis would be difficult
to proxy. Maybe there is a lower level in the java-driver where this would be
possible. This also would only apply to the java-driver, and not to the
python/go/javascript/other drivers.
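
For what it's worth, a driver-level shim may not need to proxy the whole API;
wrapping just the session's execute path might be enough. A minimal sketch,
assuming the DataStax Java driver 3.x API (contact points and class name below
are hypothetical, and the failure handling is deliberately naive):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;

/** Sends every write to both clusters; reads stay on the "most coherent" cluster. */
public class MirroringWriter {
    private final Session primary; // old cluster, still authoritative for reads
    private final Session mirror;  // new cluster, receiving mirrored writes

    public MirroringWriter(String primaryContact, String mirrorContact) {
        this.primary = Cluster.builder().addContactPoint(primaryContact).build().connect();
        this.mirror  = Cluster.builder().addContactPoint(mirrorContact).build().connect();
    }

    /** Double write: async to the mirror so primary latency is mostly unaffected. */
    public ResultSet write(Statement stmt) {
        ResultSetFuture mirrored = mirror.executeAsync(stmt);
        ResultSet result = primary.execute(stmt); // the result the app actually sees
        mirrored.getUninterruptibly();            // naive policy: a mirror failure fails the call
        return result;
    }

    /** Read once: only the primary serves reads until the new cluster is coherent. */
    public ResultSet read(Statement stmt) {
        return primary.execute(stmt);
    }
}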

Finally, I suppose we could do a trigger on the tables. It would be really
nice if we could add to the cassandra toolbox the basics of a write-mirroring
trigger that could be activated "fairly easily"... I know there are the
complexities of inter-cluster access, and the question of whether we are even
using cassandra as the target mirror system (for example, there is an article
on trigger-based write mirroring to kafka:
https://dzone.com/articles/cassandra-to-kafka-data-pipeline-part-1).
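
For reference, the rough shape of such a trigger against the 3.x trigger API
(ITrigger.augment(Partition)) might look like the sketch below; the class,
topic, and broker address are hypothetical, the 2.1/2.2 trigger interface is
different, and real row/cell serialization is elided:

import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.db.partitions.Partition;
import org.apache.cassandra.triggers.ITrigger;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Forwards the partition key of every mutation on the table to a Kafka topic. */
public class KafkaMirrorTrigger implements ITrigger {
    private static final KafkaProducer<byte[], byte[]> PRODUCER = buildProducer();

    @Override
    public Collection<Mutation> augment(Partition update) {
        String table = update.metadata().ksName + "." + update.metadata().cfName;
        ByteBuffer key = update.partitionKey().getKey();
        // A real mirror would also serialize the rows/cells in the update; elided here.
        PRODUCER.send(new ProducerRecord<>("cassandra-mirror",
                table.getBytes(), ByteBufferUtil.getArray(key)));
        return Collections.emptyList(); // no additional local mutations
    }

    private static KafkaProducer<byte[], byte[]> buildProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        return new KafkaProducer<>(props);
    }
}

It would be enabled by dropping the jar into the node's triggers directory and
running CREATE TRIGGER mirror ON ks.tbl USING 'KafkaMirrorTrigger'.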

And this starts to get into the complexities of hinted handoff as well. But
fundamentally this seems like something that would be a very nice feature
(especially when you NEED it) to have in the core of cassandra.

Finally, is the mutation hook in triggers sufficient to track all incoming
mutations (outside of, shudder, other triggers generating data)?

Re: Built in trigger: double-write for app migration

Posted by Rahul Singh <ra...@gmail.com>.
A trigger-based approach has worked for us in the past to get once-only output of what’s happened - pushing this to Kafka and using Kafka Connect then allowed us to direct the stream to other endpoints.

CDC-based streaming has the issue of duplicates, which are technically fine if you don’t care that much about repeat changes coming from replicas.
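
A consumer can usually cope with that by deduplicating on something stable,
e.g. partition key plus write timestamp. A rough, purely hypothetical sketch
of the idea (not tied to any particular CDC reader):

import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded LRU "seen" set keyed by (partition key, write timestamp). */
public class RecentlySeen {
    private static final int MAX_ENTRIES = 100_000;

    // Access-ordered LinkedHashMap doubling as a simple LRU cache.
    private final Map<String, Boolean> seen = new LinkedHashMap<String, Boolean>(1024, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > MAX_ENTRIES;
        }
    };

    /** True the first time a (key, timestamp) pair is observed; false for replica duplicates. */
    public synchronized boolean firstTime(String partitionKey, long writeTimestampMicros) {
        return seen.put(partitionKey + "@" + writeTimestampMicros, Boolean.TRUE) == null;
    }
}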

I agree with Ben. If the goal is just to move a keyspace from one cluster to another that is active and can’t go down, his method will work for sure.

Also, is there a specific reason you need to split the cluster? Why not just have another DC and keep it part of the cluster? Do you have more than a hundred tables?


Rahul Singh
Chief Executive Officer
m 202.905.2818

Anant Corporation
1010 Wisconsin Ave NW, Suite 250
Washington, D.C. 20007

We build and manage digital business technology platforms.

Re: Built in trigger: double-write for app migration

Posted by Ben Slater <be...@instaclustr.com>.
I might be missing something but we’ve done this operation on a few
occasions by:
1) Commission the new cluster and join it to the existing cluster as a 2nd
DC
2) Replicate just the keyspace that you want to move to the 2nd DC
3) Make app changes to read moved tables from 2nd DC
4) Change keyspace definition to remove moved keyspace from first DC
5) Split the 2 DCs into separate clusters (sever network connections, change
seeds)

If it’s just a table you’re moving and not a whole keyspace then you can skip
step 4 and drop the unneeded tables from either side after splitting. This
might mean the new cluster needs to be temporarily bigger than the
end-state during the migration process.
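
For concreteness, steps 2 and 4 come down to replication changes on the
keyspace. A rough sketch run through the Java driver (cqlsh works just as
well; keyspace, DC names and contact point below are placeholders), with a
nodetool rebuild on the new DC in between to stream the existing data:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class MoveKeyspaceToNewDc {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect()) {

            // Step 2: replicate the keyspace into the new DC, then run
            // "nodetool rebuild -- DC1" on each DC2 node to stream the existing data.
            session.execute("ALTER KEYSPACE app_ks WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3}");

            // Step 4 (once the app reads from DC2): drop the old DC from the keyspace.
            session.execute("ALTER KEYSPACE app_ks WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'DC2': 3}");
        }
    }
}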

Cheers
Ben


Re: Built in trigger: double-write for app migration

Posted by Jeff Jirsa <jj...@gmail.com>.
Could be done with CDC
Could be done with triggers
(Could be done with vtables — double writes or double reads — if they were extended to be user facing)

Would be very hard to generalize properly, especially handling failure cases (write succeeds in one cluster/table but not the other), which are often app-specific.
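
To illustrate why the failure cases are app-specific: even a simple
best-effort policy has to decide whether a failed mirror write blocks the
caller, gets queued for replay, or is ignored and repaired later by another
sstable load. A hypothetical sketch against the Java driver:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;

/** Best-effort mirroring: the caller only ever sees primary-cluster failures. */
public class BestEffortMirror {
    private final Session primary;
    private final Session mirror;
    private final Queue<Statement> replayLater = new ConcurrentLinkedQueue<>();

    public BestEffortMirror(Session primary, Session mirror) {
        this.primary = primary;
        this.mirror = mirror;
    }

    public void write(Statement stmt) {
        primary.execute(stmt);              // a primary failure propagates to the app
        ResultSetFuture f = mirror.executeAsync(stmt);
        try {
            f.getUninterruptibly();
        } catch (RuntimeException e) {
            replayLater.add(stmt);          // one possible policy; a later reload/repair is another
        }
    }
}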


-- 
Jeff Jirsa




Re: Built in trigger: double-write for app migration

Posted by Jonathan Ellis <jb...@gmail.com>.
Isn't this what CDC was designed for?

https://issues.apache.org/jira/browse/CASSANDRA-8844
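
For reference, CDC (available from 3.8, per CASSANDRA-8844) is a per-table
flag plus a node-level switch in cassandra.yaml; a consumer then reads the
archived commitlog segments out of the cdc_raw directory and has to dedupe
across replicas. A minimal sketch of just the enablement side (table name and
contact point are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class EnableCdc {
    public static void main(String[] args) {
        // Node side (cassandra.yaml): cdc_enabled: true
        // Segments for CDC-flagged tables are kept in the cdc_raw directory for an
        // external consumer to read (and to dedupe, since every replica produces one).
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect()) {
            session.execute("ALTER TABLE app_ks.events WITH cdc = true");
        }
    }
}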



-- 
Jonathan Ellis
co-founder, http://www.datastax.com
@spyced

Re: Built in trigger: double-write for app migration

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
Also we have 2.1.x and 2.2 clusters, so we can't use CDC since apparently
that is a 3.8 feature.

Virtual tables are very exciting, since we could do some collating stuff
(which I'd LOVE to do with our scheduling application, where we could split
tasks into near-term/most-frequent (hours to days), medium-term/less-common
(days to weeks), and long-term (months to years) buckets), with the aim of
avoiding compaction altogether and just truncating buckets as they "expire"
for a nice O(1) compaction process.
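
Roughly, the bucket idea: one physical table per scheduling horizon, writes
routed by how far out a task is, and an expired bucket dropped with TRUNCATE
instead of being compacted away. A hypothetical sketch (keyspace, table names
and routing thresholds are made up):

import com.datastax.driver.core.Session;

/** Routes tasks into per-horizon tables; expired buckets are truncated, not compacted. */
public class BucketedTaskStore {
    private final Session session;

    public BucketedTaskStore(Session session) {
        this.session = session;
    }

    /** Pick a physical table by how far out the task is scheduled. */
    private String tableFor(long delayMillis) {
        long days = delayMillis / (24L * 3600 * 1000);
        if (days <= 2)  return "tasks_near";    // hours to days
        if (days <= 30) return "tasks_medium";  // days to weeks
        return "tasks_long";                    // months to years
    }

    public void schedule(String taskId, long delayMillis, String payload) {
        session.execute("INSERT INTO sched." + tableFor(delayMillis)
                + " (task_id, payload) VALUES (?, ?)", taskId, payload);
    }

    /** O(1) cleanup: once everything in a bucket is past due, drop the whole bucket. */
    public void expireBucket(String table) {
        session.execute("TRUNCATE sched." + table);
    }
}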


Re: Built in trigger: double-write for app migration

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
A new DC and then a split is one way, but you have to wait for it to stream,
and then how do you know the DC coherence is good enough to switch the
targeted DC for LOCAL_QUORUM? And then once we split it we'd have downtime
to "change the name" and do other work to distinguish it from the
original cluster, from what I'm told by the people that do the DC /
cluster setup and AWS provisioning. It is a tool in the tool chest...
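
For what it's worth, the client-side half of that switch is just repointing
the driver at the new DC once you trust it; a sketch assuming the DataStax
Java driver 3.x (DC name and contact point are placeholders). Judging when the
new DC is actually coherent (rebuild plus repair finished) is the part the
code can't decide for you:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class PointAtNewDc {
    /** Builds a Cluster whose LOCAL_QUORUM reads/writes target the new DC. */
    public static Cluster build(String contactPoint) {
        return Cluster.builder()
                .addContactPoint(contactPoint)
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder().withLocalDc("DC2").build()))
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();
    }
}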

We might be able to get stats of the queries and updates impacting the
cluster in a centralized manner with a trigger too.

We will probably do a stream-to-kafka trigger, based on what is on the
intarweb and since we have kafka here already.

I will look at CDC.

Thank you everybody!



Re: Built in trigger: double-write for app migration

Posted by Antonis Papaioannou <pa...@ics.forth.gr>.
It reminds me of “shadow writes” described in [1].
During data migration the coordinator forwards a copy of any write request regarding tokens that are being transferred to the new node.

[1] Incremental Elasticity for NoSQL Data Stores, SRDS’17,  https://ieeexplore.ieee.org/document/8069080




Re: Built in trigger: double-write for app migration

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
Thanks. Well, at a minimum I'll probably start writing something soon for
trigger-based write mirroring; we will probably support kafka and another
cassandra cluster as targets, and if those seem to work I will contribute
them.


Re: Built in trigger: double-write for app migration

Posted by Jeff Jirsa <jj...@gmail.com>.
The write sampling is about adding an extra instance with the same schema to test things like yaml params or compaction without impacting reads or correctness - it’s different from what you describe.



-- 
Jeff Jirsa




Re: Built in trigger: double-write for app migration

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
I guess there is also write-survey-mode from cass 1.1:

https://issues.apache.org/jira/browse/CASSANDRA-3452

Were triggers intended to supersede this capability? I can't find a lot of
"user level" info on it.

