Posted to commits@cassandra.apache.org by "Aleksey Yeschenko (JIRA)" <ji...@apache.org> on 2016/05/13 14:58:12 UTC

[jira] [Commented] (CASSANDRA-11680) Inconsistent data while paging through a table

    [ https://issues.apache.org/jira/browse/CASSANDRA-11680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282758#comment-15282758 ] 

Aleksey Yeschenko commented on CASSANDRA-11680:
-----------------------------------------------

What's the Lucene index in question? It's definitely not something we ship with C*, and thus not something we support. This doesn't seem to be a Cassandra problem, but if you manage to reproduce a similar issue with built-in indexes, feel free to reopen the JIRA.
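
For reference, a built-in secondary index on the table in question could be created and queried roughly as follows. This is only a sketch: it assumes the reporter's keyspace is the current one, the index name is made up here, and 'isbn' is a hypothetical property_name value.

CREATE INDEX IF NOT EXISTS book_properties_property_name_idx
    ON book_properties (property_name);

-- served by the built-in index, no custom index implementation involved
SELECT * FROM book_properties WHERE property_name = 'isbn';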

> Inconsistent data while paging through a table
> ----------------------------------------------
>
>                 Key: CASSANDRA-11680
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11680
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: siddharth verma
>
> We have the following table structure:
> CREATE TABLE keyspace.book_properties (
>     book_id text,
>     group_id bigint,
>     property_display_name text,
>     created timestamp,
>     property_name text,
>     property_uuid uuid,
>     property_value text,
>     updated timestamp,
>     PRIMARY KEY (book_id, group_id, property_display_name)
> ) WITH CLUSTERING ORDER BY (group_id ASC, property_display_name ASC);
> We have Lucene indexes on group_id, property_display_name, created, property_name, property_uuid, and updated.
> We run a full table scan using the code snippet below:
> boundStatement = new BoundStatement(session.prepare("select * from keyspace.book_properties"));
> boundStatement.setConsistencyLevel(ConsistencyLevel.ALL);
> boundStatement.setFetchSize(fetchSize);
> PagingState currentPageInfo = null;
> do {
>     try {
>         if (currentPageInfo != null) {
>             boundStatement.setPagingState(currentPageInfo);
>         }
>         ResultSet rs = session.execute(boundStatement);
>         processResultSet(rs);
>         // remember where this page ended so the next iteration resumes from there
>         currentPageInfo = rs.getExecutionInfo().getPagingState();
>     } catch (NoHostAvailableException e) {
>         // ignored; the loop retries with the last saved paging state
>     }
> } while (currentPageInfo != null);
> ......
> void processResultSet(ResultSet rs) {
>     int remaining = rs.getAvailableWithoutFetching();
>     if (remaining != 0) {
>         for (Row row : rs) {
>             processCassandraRow(row);
>             // stop at the page boundary so iterating never triggers a background fetch
>             if (--remaining == 0) {
>                 break;
>             }
>         }
>     }
> }
> Many times, we got corrupted data in this process:
> 1. property_uuid was returned as null in many cases, even though the actual data had a value for it.
> 2. The property_uuid returned by the table scan was different from the property_uuid seen from cqlsh.
> 3. The group_id returned by the table scan was different from the group_id seen from cqlsh.
> book_properties has around 140 million records.
> book_properties receives heavy read, write and update traffic while the paging job is in progress.
> Cassandra version: dsc 3.0.3
> Side Note:
> For one of the inconsistent columns, we specifically checked writetime(..) to make sure the data hadn't been changed while the job was in progress; it had not been.
> Query used to check case 2: select property_uuid, writetime(property_uuid) from book_properties where book_id = 'BOOK31263786';
> Edit1:
> -> When we run "select * from book_properties where book_id = 'BOOK31263786';" we get two records.
> -> When, during the pagination job, we match and print every Row where book_id = 'BOOK31263786', we get four records.
> Our speculation is that the other two records might have been deleted some time back (definitely not during the job). Again, this is only speculation; we are not sure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)