You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Igor Novgorodov (JIRA)" <ji...@apache.org> on 2017/03/27 10:24:41 UTC
[jira] [Comment Edited] (CASSANDRA-13379) SASI index returns
duplicate rows
[ https://issues.apache.org/jira/browse/CASSANDRA-13379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943012#comment-15943012 ]
Igor Novgorodov edited comment on CASSANDRA-13379 at 3/27/17 10:23 AM:
-----------------------------------------------------------------------
Yep, it's paging:
{code}
> PAGING 2
Page size: 2
> select * from bulks_recipients where bulk_id_idx = fdeca186-e9aa-41a7-a086-eb377bb656a5;
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 4 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 1 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
---MORE---
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 1 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 5 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
---MORE---
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 5 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 3 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
---MORE---
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 3 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 2 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
---MORE---
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 2 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
(9 rows)
{code}
9 rows instead of 5:
{code}
> select * from bulks_recipients ;
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 4 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 1 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 5 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 3 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 2 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
(5 rows)
{code}
By the way, it does not explain {noformat}select count(*){noformat} error, looks like it's a separate issue, no?
Or count does paging internally too?
was (Author: blind_oracle):
Yep, it's paging:
{code}
> PAGING 2
Page size: 2
> select * from bulks_recipients where bulk_id_idx = fdeca186-e9aa-41a7-a086-eb377bb656a5;
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 4 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 1 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
---MORE---
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 1 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 5 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
---MORE---
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 5 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 3 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
---MORE---
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 3 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 2 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
---MORE---
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 2 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
(9 rows)
{code}
9 rows instead of 5:
{code}
> select * from bulks_recipients ;
bulk_id | recipient | bulk_id_idx | final_time | request_id | send_time | status
--------------------------------------+-----------+--------------------------------------+------------+------------+-----------+--------
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 4 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 1 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 5 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 3 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
fdeca186-e9aa-41a7-a086-eb377bb656a5 | 2 | fdeca186-e9aa-41a7-a086-eb377bb656a5 | null | null | null | null
(5 rows)
{code}
> SASI index returns duplicate rows
> ---------------------------------
>
> Key: CASSANDRA-13379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13379
> Project: Cassandra
> Issue Type: Bug
> Components: sasi
> Reporter: Igor Novgorodov
>
> {code}
> CREATE TABLE bulks_recipients (
> bulk_id uuid,
> recipient text,
> bulk_id_idx uuid,
> PRIMARY KEY ((bulk_id, recipient))
> )
> {code}
> *bulk_id_idx* is just a copy of *bulk_id* because SASI does not work on partition key component at all for some reason.
> {code}
> CREATE CUSTOM INDEX bulks_recipients_bulk_id ON bulks_recipients (bulk_id_idx) USING 'org.apache.cassandra.index.sasi.SASIIndex';
> {code}
> Then i insert 1 million rows with the same *bulk_id* and different *recipient*. Then
> {code}
> > select count(*) from bulks_recipients ;
> count
> ---------
> 1000000
> (1 rows)
> {code}
> Ok, it's fine here. Now let's query by SASI:
> {code}
> > select count(*) from bulks_recipients where bulk_id_idx = fedd95ec-2cc8-4040-8619-baf69647700b;
> count
> ---------
> 1010101
> (1 rows)
> {code}
> Hmm, very strange count - 10101 extra rows.
> Ok, i've dumped the query result into a text file:
> {code}
> # cat sasi.txt | wc -l
> 1000200
> {code}
> Here we have 200 extra rows for some reason.
> Let's check if these are duplicates:
> {code}
> # cat sasi.txt | sort | uniq | wc -l
> 1000000
> {code}
> Yep, looks like.
> Recreating index does not help. If i issue the very same query (against partition key *bulk_id*, not *bulk_id_idx*) - i get correct results.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)