Posted to commits@cassandra.apache.org by "Andres de la Peña (Jira)" <ji...@apache.org> on 2021/08/19 11:23:00 UTC

[jira] [Updated] (CASSANDRA-16868) Secondary indexes on primary key columns can miss some writes

     [ https://issues.apache.org/jira/browse/CASSANDRA-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andres de la Peña updated CASSANDRA-16868:
------------------------------------------
    Test and Documentation Plan: The PR contains unit tests for the problematic use cases
                         Status: Patch Available  (was: Open)

> Secondary indexes on primary key columns can miss some writes
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-16868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16868
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/2i Index
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
>
> Secondary indexes on primary key columns can miss some writes. For example, an update after a deletion won't create an index entry:
> {code:java}
> CREATE TABLE t (pk int, ck int, v int, PRIMARY KEY (pk, ck));
> CREATE INDEX ON t(ck);
> INSERT INTO t(pk, ck, v) VALUES (1, 2, 3); -- creates an index entry (right)
> DELETE FROM t WHERE pk = 1 AND ck = 2; -- deletes the previous index entry (right)
> UPDATE t SET v = 3 WHERE pk = 1 AND ck = 2; -- doesn't create a new index entry (wrong)
> SELECT * FROM t WHERE ck = 2; -- doesn't find the row (wrong)
> {code}
> This happens because the update uses the {{LivenessInfo}} of the previously deleted row (see [here|https://github.com/apache/cassandra/blob/cassandra-3.0.25/src/java/org/apache/cassandra/index/internal/CassandraIndex.java#L439]). The same happens when updating an expired row:
> {code:java}
> CREATE TABLE t (pk int, ck int, v int, PRIMARY KEY (pk, ck));
> CREATE INDEX ON t(ck);
> UPDATE t USING TTL 1 SET v = 3 WHERE pk = 1 AND ck = 2; -- creates a non-expiring index entry (right)
> -- wait for the expiration of the above row
> SELECT * FROM t WHERE ck = 2; -- deletes the index entry (right)
> UPDATE t SET v = 3 WHERE pk = 1 AND ck = 2; -- doesn't create an index entry (wrong)
> SELECT * FROM t WHERE ck = 2; -- doesn't find the row (wrong)
> {code}
> I think the fix for this is simply to use {{getPrimaryKeyIndexLiveness}} in {{updateRow}}, as {{insertRow}} already does.
> Another related problem is that {{getPrimaryKeyIndexLiveness}} uses [the most recent TTL in the columns contained in the indexed row fragment|https://github.com/apache/cassandra/blob/cassandra-3.0.25/src/java/org/apache/cassandra/index/internal/CassandraIndex.java#L519] as the TTL of the index entry. This produces an expiring index entry that ignores the columns without TTL that are already present in flushed sstables, so we can hit this other error when setting a TTL over flushed indexed data:
> {code:java}
> CREATE TABLE t(k1 int, k2 int, v int, PRIMARY KEY ((k1, k2)));
> CREATE INDEX idx ON t(k1);
> INSERT INTO t (k1, k2, v) VALUES (1, 2, 3);
> -- flush
> UPDATE t USING TTL 1 SET v=0 WHERE k1=1 AND k2=2; -- creates an index entry with TTL (wrong)
> -- wait for TTL expiration
> SELECT TTL(v) FROM t WHERE k1=1; -- doesn't find the row (wrong)
> {code}
> The straightforward fix is to ignore the TTL of the columns for indexes on primary key components, so we don't produce expiring index entries in that case. The index entries will eventually be deleted during index reads, once we are sure that they don't point to any live data.
>   
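To make the two problems above easier to follow, here is a minimal toy model of how the liveness chosen for an index entry affects the outcome. It is only a sketch: the class {{IndexLivenessSketch}} and all of its types and methods are hypothetical and are not taken from {{CassandraIndex}}. It also assumes, for illustration, that the row-level liveness available to the update path is empty after the deletion, and that an index entry is only written when its liveness is non-empty.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: every type and method here is hypothetical, not Cassandra's API.
final class IndexLivenessSketch {
    static final long NO_TIMESTAMP = Long.MIN_VALUE; // "no liveness info"
    static final int NO_TTL = 0;                     // "never expires"

    /** Timestamp and TTL (seconds) attached to a row or to an index entry. */
    static final class Liveness {
        final long timestamp;
        final int ttl;
        Liveness(long timestamp, int ttl) { this.timestamp = timestamp; this.ttl = ttl; }
        boolean isEmpty() { return timestamp == NO_TIMESTAMP; }
    }

    /** A regular column value written at some timestamp, optionally with a TTL. */
    static final class Cell {
        final long timestamp;
        final int ttl;
        Cell(long timestamp, int ttl) { this.timestamp = timestamp; this.ttl = ttl; }
    }

    /** The row fragment seen by the index: row-level liveness plus written cells. */
    static final class RowFragment {
        final Liveness rowLiveness;
        final List<Cell> cells;
        RowFragment(Liveness rowLiveness, List<Cell> cells) {
            this.rowLiveness = rowLiveness;
            this.cells = cells;
        }
    }

    /** First problem: consult only the row-level liveness, ignoring the cells. */
    static Liveness rowLevelOnly(RowFragment row) {
        return row.rowLiveness;
    }

    /** Second problem: take timestamp and TTL from the most recent write in the fragment. */
    static Liveness newestWriteWithTtl(RowFragment row) {
        long timestamp = row.rowLiveness.timestamp;
        int ttl = row.rowLiveness.ttl;
        for (Cell cell : row.cells) {
            if (cell.timestamp > timestamp) {
                timestamp = cell.timestamp;
                ttl = cell.ttl;
            }
        }
        return new Liveness(timestamp, ttl);
    }

    /** Proposed direction: keep the newest timestamp but never give the entry a TTL. */
    static Liveness newestWriteWithoutTtl(RowFragment row) {
        return new Liveness(newestWriteWithTtl(row).timestamp, NO_TTL);
    }

    /** In this model an index entry is written only for non-empty liveness. */
    static boolean writesIndexEntry(Liveness liveness) {
        return !liveness.isEmpty();
    }

    public static void main(String[] args) {
        Liveness none = new Liveness(NO_TIMESTAMP, NO_TTL);
        // UPDATE t SET v = 3 ... after a deletion: one cell, no usable row liveness.
        RowFragment update = new RowFragment(none, Arrays.asList(new Cell(10L, NO_TTL)));
        System.out.println(writesIndexEntry(rowLevelOnly(update)));
        System.out.println(writesIndexEntry(newestWriteWithoutTtl(update)));
        // UPDATE t USING TTL 1 SET v = 0 ...: the only cell in the fragment has a TTL.
        RowFragment ttlUpdate = new RowFragment(none, Arrays.asList(new Cell(20L, 1)));
        System.out.println(newestWriteWithTtl(ttlUpdate).ttl);
        System.out.println(newestWriteWithoutTtl(ttlUpdate).ttl);
    }
}
```

Running {{main}} prints {{false}}, {{true}}, {{1}} and {{0}}: in this model the row-level-only choice writes no index entry for the update, while the cell-aware choice does, and dropping the TTL yields a non-expiring entry even when the only cell in the fragment carries a TTL. The model says nothing about how stale entries are cleaned up; per the description above, that is left to index reads.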


