Posted to commits@cassandra.apache.org by "Tyler Hobbs (JIRA)" <ji...@apache.org> on 2015/09/09 20:46:47 UTC

[jira] [Commented] (CASSANDRA-10226) Support multiple non-PK cols in MV clustering key when partition key is shared

    [ https://issues.apache.org/jira/browse/CASSANDRA-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737393#comment-14737393 ] 

Tyler Hobbs commented on CASSANDRA-10226:
-----------------------------------------

I think we have another timestamp limitation around this, also due to reinserting view rows:

{code}
CREATE TABLE base (a int, b int, c int, d int, e int, PRIMARY KEY (a, b));

CREATE MATERIALIZED VIEW ... PRIMARY KEY (a, c, d, b);

INSERT INTO base (a, b, c, d, e) VALUES (0, 0, 0, 0, 0) USING TIMESTAMP 0;
-- MV row should be row(a=0, c=0, d=0, b=0, e=0)

-- update c to be null
UPDATE base USING TIMESTAMP 1 SET c = null WHERE a = 0 AND b = 0;
-- Base row is now row(a=0, b=0, c=null, d=0, e=0)
-- MV row gets deleted due to null column in PK; tombstone timestamp is 0 (due to d's timestamp of 0)

-- set c back to zero
UPDATE base USING TIMESTAMP 2 SET c = 0 WHERE a = 0 AND b = 0;
-- Base row is now row(a=0, b=0, c=0, d=0, e=0)
-- MV row gets re-inserted with timestamp 0 due to d's timestamp of 0, so the tombstone (also at timestamp 0) shadows it, since deletions win timestamp ties
{code}

To clarify, row timestamps in the view are picked as the minimum of the timestamps of columns in the base row that are in the view's primary key.  In this case, the min covers {{c}}, {{d}}, and the row-level timestamp.  Since {{d}} has a timestamp of 0 in the base row, that is chosen for the view row's timestamp.
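A minimal Python model of this rule may make the failure easier to follow. It is not Cassandra's code: {{view_row_timestamp}} and {{is_live}} are hypothetical names. It assumes that the tombstone written when {{c}} becomes null takes its timestamp from the same min over the updated base row (which gives 0 via {{d}}, matching the comment in the CQL above), and it encodes Cassandra's rule that a deletion wins a timestamp tie.

{code:python}
from typing import Dict, List, Optional


def view_row_timestamp(row_ts: int, cell_ts: Dict[str, int], view_pk_cols: List[str]) -> int:
    # Min of the base row-level timestamp and the timestamps of the base
    # (non-PK) cells that appear in the view's primary key.
    return min([row_ts] + [cell_ts[col] for col in view_pk_cols])


def is_live(row_ts: int, tombstone_ts: Optional[int]) -> bool:
    # A row survives a tombstone only if it is strictly newer: deletions win ties.
    return tombstone_ts is None or row_ts > tombstone_ts


row_ts = 0                  # INSERT ... USING TIMESTAMP 0
cell_ts = {"c": 0, "d": 0}  # timestamps of base cells c and d
view_cols = ["c", "d"]      # non-PK base columns in the view PK (a, c, d, b)

# UPDATE ... USING TIMESTAMP 1 SET c = null: the view row is deleted.
cell_ts["c"] = 1
tombstone_ts = view_row_timestamp(row_ts, cell_ts, view_cols)   # min(0, 1, 0) = 0

# UPDATE ... USING TIMESTAMP 2 SET c = 0: the view row is re-inserted.
cell_ts["c"] = 2
reinserted_ts = view_row_timestamp(row_ts, cell_ts, view_cols)  # min(0, 2, 0) = 0

print(tombstone_ts, reinserted_ts, is_live(reinserted_ts, tombstone_ts))  # 0 0 False
{code}

However recent the write to {{c}} is, the min stays pinned at 0 by {{d}} and the row-level timestamp, so the re-inserted row can never beat the tombstone.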

We can't avoid using the minimum timestamp for this.  If we used the max instead, updates to the oldest cell might not result in changes to the view.
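One way to see the problem with the max, under the same hypothetical model (a tombstone carries the deleted view row's computed timestamp, and deletions win ties): the max has the mirror-image flaw, because writes to a cell older than the newest one never move the view row's timestamp. Here {{c}} is assumed to have been written at timestamp 5, and {{d}} is then changed to 1 and back to 0.

{code:python}
# Hypothetical alternative rule: view row timestamp = max of the same timestamps.
def view_row_timestamp_max(row_ts: int, c_ts: int, d_ts: int) -> int:
    return max(row_ts, c_ts, d_ts)


row_ts, c_ts = 0, 5  # assume c was last written at ts 5; d is the oldest cell

# View row (c=0, d=0) while d still has timestamp 0.
original_ts = view_row_timestamp_max(row_ts, c_ts, 0)   # 5

# SET d = 1 at ts 1: view row (0, 0) is deleted with its own timestamp.
tombstone_00 = original_ts                               # 5

# SET d = 0 at ts 2: view row (0, 0) is re-inserted.
reinserted_ts = view_row_timestamp_max(row_ts, c_ts, 2)  # still 5

# Deletions win ties, so the re-inserted row stays hidden.
print(reinserted_ts > tombstone_00)  # False
{code}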

The changes in CASSANDRA-10261 also don't help in this case, because even a "weak" tombstone would shadow the reinserted row.

To me, it seems like we need to use a tuple of timestamps for conflict resolution in order to support this, something like {{(c_timestamp, d_timestamp)}}.  In that case, the view row tombstone would have a timestamp of {{(1, 0)}} and the re-inserted row would have a timestamp of {{(2, 0)}}.  An update to {{d}} at timestamp 1 would still update the view as expected, because the deletions/insertions would use a timestamp of {{(2, 1)}}, which is greater than {{(2, 0)}}.  The base row's PK timestamp probably also needs to be integrated into this, but I haven't sorted through the subtleties yet.  Regardless, it's looking like this is out of scope for 3.0.
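The tuple idea can be sketched with Python tuples, which compare lexicographically. That ordering is an assumption: the comment only requires that {{(2, 0)}} beat {{(1, 0)}} and that {{(2, 1)}} beat {{(2, 0)}}, which lexicographic order satisfies. {{ViewTs}} and {{is_live}} are hypothetical names, and the base row's PK timestamp is left out, as in the proposal.

{code:python}
from typing import Tuple

# Hypothetical composite view timestamp: (c_timestamp, d_timestamp),
# compared lexicographically (Python's built-in tuple ordering).
ViewTs = Tuple[int, int]


def is_live(row: ViewTs, tombstone: ViewTs) -> bool:
    # Keep "deletion wins ties", now applied to the whole tuple.
    return row > tombstone


tombstone = (1, 0)   # view row deleted when c was set to null at ts 1
reinserted = (2, 0)  # c set back to 0 at ts 2
print(is_live(reinserted, tombstone))  # True: no longer shadowed

# A later update to d at ts 1 writes its deletion/insertion at (2, 1),
# which is newer than the re-inserted row's (2, 0).
print((2, 1) > reinserted)  # True
{code}

With scalar min timestamps the same two writes collapse to 0 and 0, which is exactly the tie the tombstone wins.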

> Support multiple non-PK cols in MV clustering key when partition key is shared
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10226
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10226
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Tyler Hobbs
>              Labels: materializedviews
>             Fix For: 3.0.0 rc1
>
>
> This issue is similar to CASSANDRA-9928, but with one key limitation: the MV partition key must match the base table's partition key.  This limitation results in the base replica always pairing with itself as the MV replica.  Because of this pairing, if the base replica is lost, any MV rows that would otherwise be ambiguous are also lost.  This allows us to avoid the problem described in 9928 of not knowing which MV row to delete.
> Although this limitation has the potential to be a bit confusing for users, I believe this improvement is still worthwhile because:
> * The base table's partition key will often be a good choice for the MV partition key as well.  I expect it to be common for users to partition data the same way, but use a different clustering order to optimize for (or allow for) different queries.
> * It may take a long time to solve the problems presented in 9928 in general (if we can solve them at all).  On the other hand, this is straightforward and is a significant improvement to the usability of MVs.
> I have a minimal prototype of this that works well, so I should be able to upload a patch with thorough tests within the next few days.
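The restriction in the description reduces to a check at view-creation time. A hypothetical Python sketch follows: {{PrimaryKey}} and {{check_view_key}} are illustrative names, not Cassandra's schema code, and the check encodes only the rule stated here (more than one non-PK base column in the view's primary key is accepted only when the view's partition key equals the base table's).

{code:python}
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrimaryKey:
    partition: Tuple[str, ...]
    clustering: Tuple[str, ...]

    def columns(self) -> Tuple[str, ...]:
        return self.partition + self.clustering


def check_view_key(base: PrimaryKey, view: PrimaryKey) -> None:
    # Columns in the view's primary key that are not part of the base primary key.
    base_cols = set(base.columns())
    non_pk = [col for col in view.columns() if col not in base_cols]
    # Several such columns are only safe when the base replica always pairs
    # with itself as the view replica, i.e. the partition keys match.
    if len(non_pk) > 1 and view.partition != base.partition:
        raise ValueError(
            f"non-PK columns {non_pk} in view key require partition key {base.partition}"
        )


base = PrimaryKey(partition=("a",), clustering=("b",))
check_view_key(base, PrimaryKey(("a",), ("c", "d", "b")))  # accepted: shared partition key
try:
    check_view_key(base, PrimaryKey(("c",), ("a", "d", "b")))
except ValueError as err:
    print(err)  # rejected: c and d with a different partition key
{code}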



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)