You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Jira)" <ji...@apache.org> on 2020/08/20 14:28:00 UTC

[jira] [Commented] (CASSANDRA-15962) Digest for some queries is different depending whether the data are retrieved from sstable or memtable

    [ https://issues.apache.org/jira/browse/CASSANDRA-15962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181221#comment-17181221 ] 

Sylvain Lebresne commented on CASSANDRA-15962:
----------------------------------------------

The approach lgtm. Just a few remarks:
* Could you include the {{DigestTest}} you wrote into the PR?
* Changes to {{SelectStatement}} (special casing of the filter for super columns on range queries, as is done for slice ones) looks legit, but unrelated and a problem on 3.0. We should either split to another ticket (probably ideal), or have a 3.0 branch with that fix. It would also be good to have a test (one that shows the problem that exists and is fixed by this change).
* Similarly, the change to {{AbstractReadExecutor}} also look legit but unrelated and exists in 3.0. That one is pretty minor since I don't think it has visible consequence, so not worth a ticket of its own, but would nice to include with whatever we do for my previous point so it gets into 3.0.
* BTreeRow#filter: we should reuse/extract the {{!queriedByUserTester.test(column)}} test from the {{isSkippable}} initializer as value for {{shouldSkipValue}} instead of redoing the work by calling {{!filter.fetchedColumnIsQueried(column)}}.
* ComplexColumnData#filter:
** I'd store the result of {{filter.fetchedColumnIsQueried(column))}} in a variable at the beginning of the function, to avoid potentially repeating the call multiple times. Mostly because it'll be imo more readable, but it also avoid bad cases where we repeat it for vary many cells.
** Nit: The {{path != null}} in the {{shouldSkipValue}} initializer is unecessary: cells of complex columns are guaranteed to have a non-null path (see assertion in {{BufferCell}} ctor).
** It would be nice to avoid the repetition of {{cellTest.fetchedCellIsQueried(path)}} as well, readability wise. I'd suggest something like:
   {noformat}
   CellPath path = cell.path();
   boolean isForDropped = ...;
   boolean isShardowed = ...;
   boolean isFetchedCell = cellTester == null || cellTest.fetches(path);
   boolean isQueriedCell = isQueriedColumn && isFetchedCell && (cellTester == null || cellTester.fetchedCellIsQueried(path));
   boolean isSkippableCell = !isFetchedCell || (!isQueriedCell && cell.timestamp() < rowLiveness.timestamp());
   if (isForDropped ||| isShadowed || isSkippable)
       return null;

   // If the cell is only fetched but not queried, we need the cell but never the value. So, when reading from sstables, we
   // "skip" the value of such cells as an optimiation (see Cell#deserialize). We _must_ thus do the same here to avoid
   // disrepancies between data coming from memtables or sstables, which would lead to digest mismatches.
   return isQueriedCell ? cell : cell.withSkippedValue();
   {noformat}


With those addressed, if you could set up both a 3.11 branch and a trunk one and run CI, this would be ideal.


> Digest for some queries is different depending whether the data are retrieved from sstable or memtable
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15962
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15962
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 4.0, 3.11.x
>
>         Attachments: DigestTest.java
>
>
> Not sure into which category should I assign this ticket.
>  
> Basically when reading using certain column filters, the digest is different depending whether we read from sstable and memtable. This happens on {{trunk}} and {{cassandra-3.11}} branches. However it works properly on {{cassandra-3.0}} branch.
>  
> I'm attaching a simple test for trunk to demonstrate what I mean. 
>  
> Please verify my test and my conclusions
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org