Posted to commits@cassandra.apache.org by "Igor Novgorodov (JIRA)" <ji...@apache.org> on 2017/03/26 21:15:41 UTC
[jira] [Created] (CASSANDRA-13379) SASI index returns duplicate rows
Igor Novgorodov created CASSANDRA-13379:
-------------------------------------------
Summary: SASI index returns duplicate rows
Key: CASSANDRA-13379
URL: https://issues.apache.org/jira/browse/CASSANDRA-13379
Project: Cassandra
Issue Type: Bug
Components: sasi
Reporter: Igor Novgorodov
{code}
CREATE TABLE bulks_recipients (
bulk_id uuid,
recipient text,
bulk_id_idx uuid,
status int,
ts timestamp,
PRIMARY KEY ((bulk_id, recipient))
)
{code}
*bulk_id_idx* is just a copy of *bulk_id*, because SASI does not work on partition key components at all for some reason.
{code}
CREATE CUSTOM INDEX bulks_recipients_bulk_id ON bulks_recipients (bulk_id_idx) USING 'org.apache.cassandra.index.sasi.SASIIndex';
{code}
Then I insert 1 million rows with the same *bulk_id* and different *recipient* values. Then
{code}
> select count(*) from bulks_recipients ;
count
---------
1000000
(1 rows)
{code}
Ok, it's fine here. Now let's query by SASI:
{code}
> select count(*) from bulks_recipients where bulk_id_idx = fedd95ec-2cc8-4040-8619-baf69647700b;
count
---------
1010101
(1 rows)
{code}
Hmm, a very strange count: 10101 extra rows.
Ok, I've dumped the query result into a text file:
{code}
# cat sasi.txt | wc -l
1000200
{code}
Here we have 200 extra rows for some reason.
Let's check if these are duplicates:
{code}
# cat sasi.txt | sort | uniq | wc -l
1000000
{code}
Yep, looks like they are.
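To narrow this down further, one could list which rows are duplicated and how many times each one appears. The following is only a sketch, not something that was run for this report: it assumes sasi.txt holds exactly one result row per line, and the helper names (dup_report, dup_keys, extra_rows) are made up for illustration.

```shell
# Hypothetical helpers; each takes the path of a dump with one row per line.

# Print every duplicated line, prefixed with its number of occurrences.
dup_report() {
    sort "$1" | uniq -c | awk '$1 > 1'
}

# Print how many distinct lines occur more than once.
dup_keys() {
    sort "$1" | uniq -d | awk 'END { print NR }'
}

# Print how many surplus lines there are (total lines minus distinct lines).
extra_rows() {
    sort "$1" | uniq -c | awk '$1 > 1 { extra += $1 - 1 } END { print extra + 0 }'
}

# Example usage against the dump from above:
#   dup_report sasi.txt | head
#   dup_keys sasi.txt
#   extra_rows sasi.txt
```

If the dump is as described above, extra_rows should agree with the difference between the two wc -l results (200), and dup_report would show whether each affected row is returned exactly twice or more often.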
Recreating the index does not help.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)