Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2017/04/13 14:25:42 UTC

[jira] [Commented] (CASSANDRA-13066) Fast streaming with materialized views

    [ https://issues.apache.org/jira/browse/CASSANDRA-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967642#comment-15967642 ] 

Paulo Motta commented on CASSANDRA-13066:
-----------------------------------------

While it may make sense to pursue this optimization, I'm not sure adding a {{mv_fast_stream}} option is the best way to expose this to general usage for the following reasons:
a) It has a limited scope requiring users to know streaming internals of MVs to enable it, so it's not very friendly.
b) It has significant foot-shooting potential when users enable it and then perform partial writes or updates to existing rows; users may enable it thinking fast=good without considering the consequences.

It basically boils down to Sylvain's comment on CASSANDRA-9779:

bq. It seems clear to me that this will add complexity from the user point of view (it's a new concept that will either have good footshooting potential (if we were to just trust the user to insert only without checking it) and be annoying to use (if we force all columns every time)), so it sounds to me like we would need to demonstrate fairly big performance benefits to be worth doing (keep in mind that once we add such thing, we can't easily remove it, even if the improvement become obsolete).

With that said, since this would only be applicable to append-only MVs, I'd be more in favor of providing the whole feature set of append-only MVs instead, which would include this and other optimizations (such as skipping read-before-write) and would also enforce the append-only contract defined at MV creation, making it much safer and giving users better-defined semantics.
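
To make the idea concrete, here is a minimal Python sketch of how an enforced append-only contract could gate both optimizations. It is only an illustrative model, not Cassandra code: {{TableModel}}, {{check_write}}, {{needs_read_before_write}} and {{stream_path}} are hypothetical names, and the {{append_only}} flag is an assumed table property that does not exist today. The sketch only rejects partial writes; it does not try to detect updates to existing rows.

```python
from dataclasses import dataclass


class AppendOnlyViolation(Exception):
    """Raised when a write to an append-only table omits a column."""


@dataclass(frozen=True)
class TableModel:
    # Hypothetical stand-in for a base table that has materialized views.
    name: str
    columns: frozenset
    append_only: bool = False  # assumed contract, declared at MV creation


def check_write(table, values):
    """Reject partial writes on append-only tables; other tables accept any write."""
    if table.append_only and not table.columns.issubset(values):
        missing = sorted(table.columns - set(values))
        raise AppendOnlyViolation(f"{table.name}: missing columns {missing}")


def needs_read_before_write(table):
    """Regular MV tables read the old row to compute the view updates."""
    return not table.append_only


def stream_path(table):
    """Streamed SSTables skip the write path only under the append-only contract."""
    return "direct" if table.append_only else "write_path"
```

In this model the operator never sets a separate streaming option: declaring the table append-only is what unlocks both the direct stream and the skipped read-before-write, and a partial write fails fast instead of silently breaking the view.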

> Fast streaming with materialized views
> --------------------------------------
>
>                 Key: CASSANDRA-13066
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13066
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benjamin Roth
>            Assignee: Benjamin Roth
>             Fix For: 4.0
>
>
> I propose adding a configuration option to send streams of tables with MVs not through the regular write path.
> This may be either a global option or better a CF option.
> Background:
> A repair of a CF with an MV that is much out of sync creates many streams. These streams all go through the regular write path to assert local consistency of the MV. This again causes a read before write for every single mutation which again puts a lot of pressure on the node - much more than simply streaming the SSTable down.
> In some cases this can be avoided. Instead of only repairing the base table, all base + mv tables would have to be repaired. But this can break eventual consistency between base table and MV. The proposed behaviour is always safe, when having append-only MVs. It also works when using CL_QUORUM writes but it cannot be absolutely guaranteed, that a quorum write is applied atomically, so this can also lead to inconsistencies, if a quorum write is started but one node dies in the middle of a request.
> So, this proposal can help a lot in some situations but also can break consistency in others. That's why it should be left upon the operator if that behaviour is appropriate for individual use cases.
> This issue came up here:
> https://issues.apache.org/jira/browse/CASSANDRA-12888?focusedCommentId=15736599&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15736599



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)