Posted to commits@cassandra.apache.org by "Andrés de la Peña (JIRA)" <ji...@apache.org> on 2017/03/24 13:06:41 UTC

[jira] [Commented] (CASSANDRA-13277) Duplicate results with secondary index on static column

    [ https://issues.apache.org/jira/browse/CASSANDRA-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940277#comment-15940277 ] 

Andrés de la Peña commented on CASSANDRA-13277:
-----------------------------------------------

The underlying problem can be reproduced with a single node:
{code}
CREATE KEYSPACE k WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE k.c (
	pk int, 
	ck int, 
	sc int static,
	primary key (pk, ck)
);

CREATE index ON k.c (sc);

INSERT INTO k.c (pk, ck, sc) values (1, 2, 3);
INSERT INTO k.c (pk, ck, sc) values (-1, 2, 3);

SELECT token(pk), pk, ck, sc FROM k.c where sc = 3 AND token(pk) > 0;
 system.token(pk)     | pk | ck | sc
----------------------+----+----+----
 -4069959284402364209 |  1 |  2 |  3
  7297452126230313552 | -1 |  2 |  3

SELECT token(pk), pk, ck, sc FROM k.c where sc = 3 AND token(pk) <= 0;
 system.token(pk)     | pk | ck | sc
----------------------+----+----+----
 -4069959284402364209 |  1 |  2 |  3
  7297452126230313552 | -1 |  2 |  3
{code}
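This happens because {{CompositesSearcher}} doesn't verify that index hits satisfy the command's key constraint when dealing with static columns, as it does for regular columns. Each query above should return only the row whose token falls in the requested range (pk = -1 for {{token(pk) > 0}}, pk = 1 for {{token(pk) <= 0}}), yet both return both rows.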

The examples in the issue description don't specify key restrictions, but they fail when RF is less than the number of nodes because they are internally split into subqueries directed to specific token ranges. Replicas ignore the token range restriction, so the coordinator receives duplicate rows from unexpected token ranges, as shown in the previous example.
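
To make the mechanism concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not Cassandra code: {{replica_search}}, {{coordinator_query}}, the {{check_range}} flag and the split of the ring into two subranges at token 0 are all hypothetical names and assumptions chosen for illustration. Only the two tokens are taken from the cqlsh output above.

```python
# Toy model of a coordinator splitting an index query into token subranges.
# Tokens are copied from the cqlsh output above; all names are hypothetical.
ROWS = [
    {"token": -4069959284402364209, "pk": 1, "ck": 2, "sc": 3},
    {"token": 7297452126230313552, "pk": -1, "ck": 2, "sc": 3},
]

# Assumed bounds of the token ring, simplified to the signed 64-bit range.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def replica_search(sc_value, token_range, check_range):
    """Return index hits for sc_value, optionally enforcing (start, end]."""
    start, end = token_range
    hits = [row for row in ROWS if row["sc"] == sc_value]
    if check_range:
        # The missing verification: drop hits outside the requested range.
        hits = [row for row in hits if start < row["token"] <= end]
    return hits


def coordinator_query(sc_value, subranges, check_range):
    """Concatenate the replica responses obtained for each subrange."""
    results = []
    for token_range in subranges:
        results.extend(replica_search(sc_value, token_range, check_range))
    return results


# Assumption: the ring is split into two subranges at token 0.
SUBRANGES = [(MIN_TOKEN, 0), (0, MAX_TOKEN)]

buggy = coordinator_query(3, SUBRANGES, check_range=False)
fixed = coordinator_query(3, SUBRANGES, check_range=True)

print(len(buggy))  # 4: each row is returned once per subrange
print(len(fixed))  # 2: each row is returned only for its own subrange
```

In this model, skipping the range check makes every subquery return every index hit, so the merged result contains each row once per subrange. Enforcing the range on the replica side removes the duplicates without any change on the coordinator.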

An initial version of the patch can be found here.
||[trunk|https://github.com/apache/cassandra/compare/trunk...adelapena:13277-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/adelapena/job/adelapena-13277-trunk-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/adelapena/job/adelapena-13277-trunk-dtest/]|
||[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...adelapena:13277-3.11]|[utests|http://cassci.datastax.com/view/Dev/view/adelapena/job/adelapena-13277-3.11-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/adelapena/job/adelapena-13277-3.11-dtest/]|

> Duplicate results with secondary index on static column
> -------------------------------------------------------
>
>                 Key: CASSANDRA-13277
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13277
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Romain Hardouin
>            Assignee: Andrés de la Peña
>              Labels: 2i
>
> As a follow up of http://www.mail-archive.com/user@cassandra.apache.org/msg50816.html 
> Duplicate results appear with a secondary index on a static column with RF > 1.
> The number of results varies depending on the consistency level.
> Here is a CCM session to reproduce the issue:
> {code}
> romain@debian:~$ ccm create 39 -n 3 -v 3.9 -s
> Current cluster is now: 39
> romain@debian:~$ ccm node1 cqlsh
> Connected to 39 at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
> Use HELP for help.
> cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
> cqlsh> CREATE TABLE test.idx_static (id text, id2 bigint static, added timestamp, source text static, dest text, primary key (id, added));
> cqlsh> CREATE index ON test.idx_static (id2);
> cqlsh> INSERT INTO test.idx_static (id, id2, added, source, dest) values ('id1', 22,'2017-01-28', 'src1', 'dst1');
> cqlsh> SELECT * FROM test.idx_static where id2=22;
>  id  | added                           | id2 | source | dest
> -----+---------------------------------+-----+--------+------
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
> (2 rows)
> cqlsh> CONSISTENCY ALL 
> Consistency level set to ALL.
> cqlsh> SELECT * FROM test.idx_static where id2=22;
>  id  | added                           | id2 | source | dest
> -----+---------------------------------+-----+--------+------
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
> (3 rows)
> {code}
> When RF matches the number of nodes, it works as expected.
> Example with RF=3 and 3 nodes:
> {code}
> romain@debian:~$ ccm create 39 -n 3 -v 3.9 -s
> Current cluster is now: 39
> romain@debian:~$ ccm node1 cqlsh
> Connected to 39 at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
> Use HELP for help.
> cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
> cqlsh> CREATE TABLE test.idx_static (id text, id2 bigint static, added timestamp, source text static, dest text, primary key (id, added));
> cqlsh> CREATE index ON test.idx_static (id2);
> cqlsh> INSERT INTO test.idx_static (id, id2, added, source, dest) values ('id1', 22,'2017-01-28', 'src1', 'dst1');
> cqlsh> SELECT * FROM test.idx_static where id2=22;
>  id  | added                           | id2 | source | dest
> -----+---------------------------------+-----+--------+------
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
> (1 rows)
> cqlsh> CONSISTENCY all
> Consistency level set to ALL.
> cqlsh> SELECT * FROM test.idx_static where id2=22;
>  id  | added                           | id2 | source | dest
> -----+---------------------------------+-----+--------+------
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
> (1 rows)
> {code}
> Example with RF = 2 and 2 nodes:
> {code}
> romain@debian:~$ ccm create 39 -n 2 -v 3.9 -s
> Current cluster is now: 39
> romain@debian:~$ ccm node1 cqlsh
> Connected to 39 at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
> Use HELP for help.
> cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};
> cqlsh> CREATE TABLE test.idx_static (id text, id2 bigint static, added timestamp, source text static, dest text, primary key (id, added));
> cqlsh> INSERT INTO test.idx_static (id, id2, added, source, dest) values ('id1', 22,'2017-01-28', 'src1', 'dst1');
> cqlsh> CREATE index ON test.idx_static (id2);
> cqlsh> INSERT INTO test.idx_static (id, id2, added, source, dest) values ('id1', 22,'2017-01-28', 'src1', 'dst1');
> cqlsh> SELECT * FROM test.idx_static where id2=22;
>  id  | added                           | id2 | source | dest
> -----+---------------------------------+-----+--------+------
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
> (1 rows)
> cqlsh> CONSISTENCY ALL 
> Consistency level set to ALL.
> cqlsh> SELECT * FROM test.idx_static where id2=22;
>  id  | added                           | id2 | source | dest
> -----+---------------------------------+-----+--------+------
>  id1 | 2017-01-27 23:00:00.000000+0000 |  22 |   src1 | dst1
> (1 rows)
> {code}


