Posted to commits@cassandra.apache.org by "Aaron Morton (Created) (JIRA)" <ji...@apache.org> on 2011/11/21 08:16:52 UTC

[jira] [Created] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Incorrect query results due to invalid SSTable.maxTimestamp
-----------------------------------------------------------

                 Key: CASSANDRA-3510
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.3
            Reporter: Aaron Morton
            Priority: Critical



related to CASSANDRA-3446

(sorry this is so long, took me a bit to work through it all and there is a lot of new code :) )
 
h1. Summary

SSTable.maxTimestamp for files created before 1.0 defaults to Long.MIN_VALUE, and this means the wrong data is returned from queries. 
 
h2. Details 

Noticed on a cluster that was upgraded from 0.8.X to 1.X and then had trouble similar to CASSANDRA-3446. It was rolled back to 0.8 and then migrated to 1.0.3. 

4-node cluster, all files upgraded to the "hb" format. 

In a super CF there are situations where a get for a sub column returns a different value than a get for its parent super column, e.g.:

{noformat}
[default@XXX] get Users[ascii('username')]['meta']['password'];
=> (column=password, value=3130323130343130, timestamp=1307352647576000)

[default@XX] get Users[ascii('username')]['meta'];     
(snip)       
=> (column=password, value=3034323131303034, timestamp=1319563673493000)
{noformat}

The correct value is the second one. 

I added logging after line 109 in o.a.c.db.CollationController.collectTimeOrderedData() to log the sstable name and the file's max timestamp; this is what I got:

{code:java}
for (SSTableReader sstable : view.sstables)
{
    long currentMaxTs = sstable.getMaxTimestamp();
    // added for debugging: log each sstable and the maxTimestamp it reports
    logger.debug(String.format("Got sstable %s and max TS %d", sstable, currentMaxTs));
    reduceNameFilter(reducedFilter, container, currentMaxTs);
    // ... rest of the loop body unchanged
{code}

{noformat}
DEBUG 14:08:46,012 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12348-Data.db') and max TS 1321824847534000
DEBUG 14:08:47,231 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12346-Data.db') and max TS 1321813380793000
DEBUG 14:08:49,879 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12330-Data.db') and max TS -9223372036854775808
DEBUG 14:08:49,880 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12325-Data.db') and max TS -9223372036854775808
{noformat}

The key I was reading is present in files 12330 and 12325; the first contains the *old / wrong* value with timestamp 1307352647576000 above, the second contains the *new / correct* value with timestamp 1319563673493000.

**Updated:** Incorrect, it was a later file that had the correct value, see the first comment.

When CollationController.collectTimeOrderedData() processes the 12325 file (after processing the 12330 file) while looping over the sstables, the call to reduceNameFilter() removes the column from the filter: the column already read from the 12330 file has a timestamp of 1307352647576000, while the 12325 file incorrectly reports a max timestamp of -9223372036854775808.
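
To make the pruning step concrete, here is a simplified sketch of the idea behind reduceNameFilter() (illustrative only; the class, parameter types and names below are assumptions, not the actual Cassandra method):

{code:java}
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class NameFilterSketch
{
    // Sketch: once a column has been collected with a timestamp above an
    // sstable's maxTimestamp, that sstable cannot hold a newer version, so
    // the column is dropped from the name filter and the sstable is never
    // consulted for it. A bogus maxTimestamp of Long.MIN_VALUE makes every
    // already-seen column look "newer", so later files are skipped even when
    // they contain the real latest value.
    static void reduceNameFilter(Set<ByteBuffer> columnsToFetch,
                                 Map<ByteBuffer, Long> collectedTimestamps,
                                 long sstableMaxTimestamp)
    {
        for (Iterator<ByteBuffer> iter = columnsToFetch.iterator(); iter.hasNext();)
        {
            Long seen = collectedTimestamps.get(iter.next());
            if (seen != null && seen > sstableMaxTimestamp)
                iter.remove(); // nothing newer can exist in this sstable
        }
    }
}
{code}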

SSTableMetadata is reading the max timestamp from the stats file, but it is Long.MIN_VALUE. I think this happens because scrub creates the SSTableWriter using cfs.createCompactionWriter(), which sets the maxTimestamp in the metadata collector according to the maxTimestamp in the metadata of the file(s) that will be scrubbed / compacted. But for pre-1.0 format files the default in SSTableMetadata is Long.MIN_VALUE (see SSTableMetadata.deserialize() and the constructor). So scrubbing a pre-1.0 file will write a stats file that has maxTimestamp as Long.MIN_VALUE.
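
The deserialization path described above amounts to something like the sketch below (the class and flag are illustrative stand-ins, not the real SSTableMetadata API):

{code:java}
import java.io.DataInput;
import java.io.IOException;

final class StatsMetadataSketch
{
    final long maxTimestamp;

    StatsMetadataSketch(long maxTimestamp)
    {
        this.maxTimestamp = maxTimestamp;
    }

    // Pre-1.0 stats files have no serialized maxTimestamp, so the constructor
    // default of Long.MIN_VALUE survives deserialization. A scrub or compaction
    // that seeds the new file's metadata from this value then writes
    // Long.MIN_VALUE into the rewritten sstable's stats as well.
    static StatsMetadataSketch deserialize(DataInput in, boolean versionHasMaxTimestamp) throws IOException
    {
        long maxTimestamp = versionHasMaxTimestamp ? in.readLong() : Long.MIN_VALUE;
        return new StatsMetadataSketch(maxTimestamp);
    }
}
{code}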

During scrubbing the SSTableWriter does not update the maxTimestamp, because append(AbstractCompactedRow) is called, which expects that cfs.createCompactionWriter() was able to set the correct maxTimestamp on the metadata. Compaction also uses append(AbstractCompactedRow), so it may create an SSTable with an incorrect maxTimestamp if one of the input files started life as a pre-1.0 file and has a bad maxTimestamp. 

It looks like the only time the maxTimestamp is calculated is when the SSTable is originally written, so the error from the old files will be carried along. 

e.g. if files a, b and c have maxTimestamps of 10, 100 and Long.MIN_VALUE, compaction will write an SSTable with maxTimestamp 100. However, file c may actually contain columns with a timestamp > 100, and those columns will end up in the compacted file.

h1. Reproduce

1. Start a clean 0.8.7

2. Add a schema (details of the schema do not matter):
{noformat}
[default@unknown] create keyspace dev;   
5f834620-140b-11e1-0000-242d50cf1fdf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] 
[default@unknown] use dev;
Authenticated to keyspace: dev
[default@dev] 
[default@dev] create column family super_dev with column_type = 'Super' 
...	and key_validation_class = 'AsciiType' and comparator = 'AsciiType' and 
...	subcomparator = 'AsciiType' and default_validation_class = 'AsciiType';
60490720-140b-11e1-0000-242d50cf1fdf
Waiting for schema agreement...
... schemas agree across the cluster
{noformat}

3. Shutdown 0.8.7

4. Start 1.0.3 using the same data. Check the schema version loaded; the example below shows the wrong schema being loaded. I stepped through the code and the wrong value was read from Migration.getLastMigrationId() due to this bug. 

{noformat}
 INFO [main] 2011-11-21 19:39:08,546 DatabaseDescriptor.java (line 501) Loading schema version 5f834620-140b-11e1-0000-242d50cf1fdf
{noformat}

5. Check the schema using the 1.0.3 CLI 

{noformat}
[default@unknown] use dev;
Authenticated to keyspace: dev
[default@dev] describe;
Keyspace: dev:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [datacenter1:1]
  Column Families:
[default@dev] 
{noformat}

6. I then ran a 1.0.3 scrub and re-started. The correct schema version was read, but stepping through the code showed both Schema SSTables had Long.MIN_VALUE as the maxTimestamp, so I think it was only the random order of the files that made it work. 

{noformat}
DEBUG 19:52:30,744 Got sstable SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-4-Data.db') and max TS -9223372036854775808
DEBUG 19:52:30,744 Got sstable SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-3-Data.db') and max TS -9223372036854775808
{noformat}

h1. Fixes

Not sure (I wanted to get the ticket opened and find out if I was imagining things), so these are guesses...

Use Long.MIN_VALUE as a magic maxTimestamp that means the value is not known; see the sketch below. This would not fix cases where an incorrect maxTimestamp has already been carried into a compaction. 
 
I am also looking at making scrub re-calculate the maxTimestamp.

Also wondering if the maxTimestamp should default to Long.MAX_VALUE when read from a file format that does not store it?
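
A minimal sketch of the first idea, i.e. treating Long.MIN_VALUE as "maxTimestamp unknown" so the read path simply skips the pruning optimisation for such files (this shows the idea only, not necessarily how any attached patch implements it):

{code:java}
// In the CollationController.collectTimeOrderedData() loop: if the sstable
// reports the pre-1.0 default, its maxTimestamp cannot be trusted, so do not
// prune the name filter against it -- just read the file as usual.
long currentMaxTs = sstable.getMaxTimestamp();
if (currentMaxTs != Long.MIN_VALUE)
    reduceNameFilter(reducedFilter, container, currentMaxTs);
{code}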


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3510:
----------------------------------------

    Attachment: 3510.patch

The fix of the first patch looks good. For the second patch, even though this is an option, I think I'd rather make this part of compaction, because I don't like having a potential subtle bug that requires scrub to be run to fix it (without anything really telling you that you potentially have a problem, btw).

Attaching a patch that does this (making it work for compaction in general as long as we're not using an EchoedRow, which makes it work for scrub in particular). The patch also adds a few comments and a unit test. It includes the fix of the first patch. 
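
In other words, instead of trusting the input files' metadata, the rewritten sstable's maxTimestamp is recomputed from the columns that compaction actually deserializes. A simplified sketch of that bookkeeping (hypothetical helper, not the patch itself):

{code:java}
// Sketch: track the real maximum column timestamp while merging rows. Only
// rows that are fully deserialized contribute, which is why the EchoedRow
// fast path (a raw byte copy that never looks at the columns) has to be
// avoided when an input sstable's recorded maxTimestamp cannot be trusted.
final class MaxTimestampTracker
{
    private long observedMax = Long.MIN_VALUE;

    void onColumn(long columnTimestamp)
    {
        observedMax = Math.max(observedMax, columnTimestamp);
    }

    // Written into the new sstable's stats metadata when the writer closes,
    // instead of the value seeded from the (possibly bogus) input metadata.
    long maxTimestamp()
    {
        return observedMax;
    }
}
{code}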
                

        

[jira] [Issue Comment Edited] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Aaron Morton (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154111#comment-13154111 ] 

Aaron Morton edited comment on CASSANDRA-3510 at 11/21/11 10:48 AM:
--------------------------------------------------------------------

Patch 0001 is a proof-of-concept hack based on my first comment; it generated this output when using the extra logging:

{noformat}
DEBUG [ReadStage:1] 2011-11-21 23:12:28,578 CollationController.java (line 77) collectTimeOrderedData
DEBUG [ReadStage:1] 2011-11-21 23:12:28,578 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12348-Data.db') and max TS 1321824847534000
DEBUG [ReadStage:1] 2011-11-21 23:12:28,578 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12346-Data.db') and max TS 1321813380793000
DEBUG [ReadStage:1] 2011-11-21 23:12:28,583 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12330-Data.db') and max TS -9223372036854775808
DEBUG [ReadStage:1] 2011-11-21 23:12:28,584 CollationController.java (line 130) Add Sub Column password:false:8@1307352647576000
DEBUG [ReadStage:1] 2011-11-21 23:12:28,585 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12325-Data.db') and max TS -9223372036854775808
DEBUG [ReadStage:1] 2011-11-21 23:12:28,587 CollationController.java (line 130) Add Sub Column password:false:8@1307352647576000
DEBUG [ReadStage:1] 2011-11-21 23:12:28,588 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12327-Data.db') and max TS -9223372036854775808
DEBUG [ReadStage:1] 2011-11-21 23:12:28,590 CollationController.java (line 130) Add Sub Column password:false:8@1307352647576000
DEBUG [ReadStage:1] 2011-11-21 23:12:28,591 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12328-Data.db') and max TS -9223372036854775808
DEBUG [ReadStage:1] 2011-11-21 23:12:28,592 CollationController.java (line 130) Add Sub Column password:false:8@1319563673493000
DEBUG [ReadStage:1] 2011-11-21 23:12:28,593 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12326-Data.db') and max TS -9223372036854775808
DEBUG [ReadStage:1] 2011-11-21 23:12:28,595 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12331-Data.db') and max TS -9223372036854775808
DEBUG [ReadStage:1] 2011-11-21 23:12:28,596 CollationController.java (line 130) Add Sub Column password:false:8@1307352647576000
DEBUG [ReadStage:1] 2011-11-21 23:12:28,597 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hb-12332-Data.db') and max TS -9223372036854775808
DEBUG [pool-2-thread-1] 2011-11-21 23:12:28,604 StorageProxy.java (line 694) Read: 26 ms.
{noformat}


Patch 0002 makes scrub update the maxTimestamp; when I ran my test afterwards it produced this output:

**NOTE** patch 0002 has an incorrect file name; it modifies scrub, not repair.

{noformat}

DEBUG [ReadStage:33] 2011-11-21 23:20:32,032 CollationController.java (line 77) collectTimeOrderedData
DEBUG [ReadStage:33] 2011-11-21 23:20:32,033 CollationController.java (line 111) Got sstable SSTableReader(path='/private/var/lib/cassandra/data/fmm/Users-hc-12357-Data.db') and max TS 1321824847534000
DEBUG [ReadStage:33] 2011-11-21 23:20:32,033 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hc-12352-Data.db') and max TS 1321813380793000
DEBUG [ReadStage:33] 2011-11-21 23:20:32,063 CollationController.java (line 111) Got sstable SSTableReader(path='/private/var/lib/cassandra/data/fmm/Users-hc-12356-Data.db') and max TS 1321560509938000
DEBUG [ReadStage:33] 2011-11-21 23:20:32,064 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hc-12349-Data.db') and max TS 1319813295567000
DEBUG [ReadStage:33] 2011-11-21 23:20:32,105 CollationController.java (line 130) Add Sub Column password:false:8@1319563673493000
DEBUG [ReadStage:33] 2011-11-21 23:20:32,105 CollationController.java (line 111) Got sstable SSTableReader(path='/var/lib/cassandra/data/fmm/Users-hc-12350-Data.db') and max TS 1318523190222000
DEBUG [pool-2-thread-2] 2011-11-21 23:20:32,106 StorageProxy.java (line 694) Read: 74 ms.
{noformat}
                

        

[jira] [Commented] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154246#comment-13154246 ] 

Sylvain Lebresne commented on CASSANDRA-3510:
---------------------------------------------

bq. Where is the "don't use echoedrow for old sstables" logic?

In CompactionController.needDeserialize(), but I agree that the comment could be improved to point that out.
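
For context, the check being referred to might look roughly like this (an illustrative sketch only; the real CompactionController.needDeserialize() covers more cases and its exact fields and accessors may differ):

{code:java}
// Sketch: fall back to full deserialization (no EchoedRow fast path) when any
// input sstable is not on the latest on-disk format, so its columns are read
// and the output sstable's maxTimestamp is recomputed from them.
public boolean needDeserialize()
{
    for (SSTableReader sstable : sstables)
        if (!sstable.descriptor.isLatestVersion)
            return true;
    return false;
}
{code}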
                

        

[jira] [Updated] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Aaron Morton (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-3510:
------------------------------------

    Attachment: 0002-3510-update-maxTimestamp-during-repair.patch
                0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch


        

[jira] [Commented] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154252#comment-13154252 ] 

Jonathan Ellis commented on CASSANDRA-3510:
-------------------------------------------

Referring to the isLatestVersion check?
                
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Priority: Critical
>         Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch, 0002-3510-update-maxTimestamp-during-repair.patch, 3510.patch
>
>
> related to CASSANDRA-3446
> (sorry this is so long, took me a bit to work through it all and there is a lot of new code :) )
>  
> h1. Summary
> SSTable.maxTimestamp for files created before 1.0 defaults to Long.MIN_VALUE, and this means the wrong data is returned from queries. 
>  
> h2. Details 
> Noticed on a cluster that was upgraded from 0.8.X to 1.X, it then had trouble similar to CASSANDRA-3446. It was rolled back to 0.8 and the migrated to 1.0.3. 
> 4 Node cluster, all files upgraded to "hb" format. 
> In a super CF there are situations where a get for a sub columns returns a different value than a get for the column. .e.g. 
> {noformat}
> [default@XXX] get Users[ascii('username')]['meta']['password'];
> => (column=password, value=3130323130343130, timestamp=1307352647576000)
> [default@XX] get Users[ascii('username')]['meta'];     
> (snip)       
> => (column=password, value=3034323131303034, timestamp=1319563673493000)
> {noformat}
> The correct value is the second one. 
> I added logging after line 109 in o.a.c.db.CollectionController.collectTimeOrderedData() to log the sstable name and the file max timestamp, this is what I got:
> {code:java}
> for (SSTableReader sstable : view.sstables)
> {
>     long currentMaxTs = sstable.getMaxTimestamp();
>     logger.debug(String.format("Got sstable %s and max TS %d", sstable, currentMaxTs));
>     reduceNameFilter(reducedFilter, container, currentMaxTs);
> {code}
> {noformat}
> DEBUG 14:08:46,012 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12348-Data.db') and max TS 1321824847534000
> DEBUG 14:08:47,231 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12346-Data.db') and max TS 1321813380793000
> DEBUG 14:08:49,879 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12330-Data.db') and max TS -9223372036854775808
> DEBUG 14:08:49,880 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12325-Data.db') and max TS -9223372036854775808
> {noformat}
> The key I was reading is present in files 12330 and 12325, the first contains the *old / wrong* value with timestamp 1307352647576000 above. The second contains the *new / correct* value with timestamp 1319563673493000.
> **Updated:** Incorrect, it was a later file that had the correct value, see the first comment. 
> When CollationController.collectTimeOrderedData() processes the 12325 file (after the 12330 file) in the loop over the sstables, the call to reduceNameFilter() removes the column from the filter, because the column read from the 12330 file has a timestamp of 1307352647576000 and the 12325 file incorrectly reports a max timestamp of -9223372036854775808.
> SSTableMetadata is reading the max timestamp from the stats file, but it is Long.MIN_VALUE. I think this happens because scrub creates the SSTableWriter using cfs.createCompactionWriter(), which sets the maxTimestamp in the metadata collector according to the maxTimestamp in the metadata for the file(s) that will be scrubbed / compacted. But for pre-1.0 format files the default in SSTableMetadata is Long.MIN_VALUE (see SSTableMetadata.deserialize() and the ctor). So scrubbing a pre-1.0 file will write stats files that have maxTimestamp as Long.MIN_VALUE.
> During scrubbing the SSTableWriter does not update the maxTimestamp, because append(AbstractCompactedRow) is called, which expects that cfs.createCompactionWriter() was able to set the correct maxTimestamp on the metadata. Compaction also uses append(AbstractCompactedRow), so it may create an SSTable with an incorrect maxTimestamp if one of the input files started life as a pre-1.0 file and has a bad maxTimestamp.
> It looks like the only time the maxTimestamp is calculated is when the SSTable is originally written, so the error from the old files is carried forward.
> e.g. if files a, b and c have maxTimestamps 10, 100 and Long.MIN_VALUE, compaction will write an SSTable with maxTimestamp 100. However, file c may actually contain columns with a timestamp > 100, which will end up in the compacted file.
> h1. Reproduce
> 1. Start a clean 0.8.7
> 2. Add a schema (details of the schema do not matter):
> {noformat}
> [default@unknown] create keyspace dev;   
> 5f834620-140b-11e1-0000-242d50cf1fdf
> Waiting for schema agreement...
> ... schemas agree across the cluster
> [default@unknown] 
> [default@unknown] use dev;
> Authenticated to keyspace: dev
> [default@dev] 
> [default@dev] create column family super_dev with column_type = 'Super' 
> ...	and key_validation_class = 'AsciiType' and comparator = 'AsciiType' and 
> ...	subcomparator = 'AsciiType' and default_validation_class = 'AsciiType';
> 60490720-140b-11e1-0000-242d50cf1fdf
> Waiting for schema agreement...
> ... schemas agree across the cluster
> {noformat}
> 3. Shutdown 0.8.7
> 4. Start 1.0.3 using the same data. Check the schema version loaded; the example below shows the wrong schema version being loaded. I stepped through the code and the wrong value was read from Migration.getLastMigrationId() due to this bug.
> {noformat}
>  INFO [main] 2011-11-21 19:39:08,546 DatabaseDescriptor.java (line 501) Loading schema version 5f834620-140b-11e1-0000-242d50cf1fdf
> {noformat}
> 5. Check the schema using the 1.0.3 CLI 
> {noformat}
> [default@unknown] use dev;
> Authenticated to keyspace: dev
> [default@dev] describe;
> Keyspace: dev:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
>     Options: [datacenter1:1]
>   Column Families:
> [default@dev] 
> {noformat}
> 6. I then did a 1.0.3 scrub and re-started. The correct schema version was read, but stepping through the code showed that both Schema SSTables had Long.MIN_VALUE as the maxTimestamp, so I think it was only the order in which the files were read that made it work.
> {noformat}
> DEBUG 19:52:30,744 Got sstable SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-4-Data.db') and max TS -9223372036854775808
> DEBUG 19:52:30,744 Got sstable SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-3-Data.db') and max TS -9223372036854775808
> {noformat}
> h1. Fixes
> Not sure (wanted to get the ticket opened and find out if I was imagining things); some guesses:
> Use Long.MIN_VALUE as a magic maxTimestamp that means the value is not known. This would not fix issues where an incorrect maxTimestamp has already been included in a compaction.
>  
> Looking at making scrub re-calculate the maxTimestamp.
> Also wondering if the maxTimestamp should default to Long.MAX_VALUE if read from a file format that does not support maxTimestamp?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154228#comment-13154228 ] 

Jonathan Ellis commented on CASSANDRA-3510:
-------------------------------------------

bq. 

{noformat}
+         * However, for old sstables without timestamp, we still want to update the timestamp (and we know
+         * that in this case we will not use EchoedRow).
{noformat}

Where is the "don't use echoedrow for old sstables" logic?
                
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Priority: Critical
>         Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch, 0002-3510-update-maxTimestamp-during-repair.patch, 3510.patch
>
>


[jira] [Commented] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Aaron Morton (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154085#comment-13154085 ] 

Aaron Morton commented on CASSANDRA-3510:
-----------------------------------------

To be clear, I do not think this is an issue specific to super CFs: the Schema CF is a standard CF, and the problem just happened to be noticed on a super CF.

I added some more logging and removed the call that cuts the query short by checking maxTimestamp in CollationController.collectTimeOrderedData(). The query then returned the correct value (the password column with timestamp 1319563673493000), and this was the output:

{code:java}
for (SSTableReader sstable : view.sstables)
{
    long currentMaxTs = sstable.getMaxTimestamp();
    logger.debug(String.format("Got sstable %s and max TS %d", sstable, currentMaxTs));
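    // the reduceNameFilter() short-circuit is disabled below so that every sstable is read: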
//                reduceNameFilter(reducedFilter, container, currentMaxTs);
//                if (((NamesQueryFilter) reducedFilter.filter).columns.isEmpty())
//                    break;

    IColumnIterator iter = reducedFilter.getSSTableColumnIterator(sstable);
    iterators.add(iter);
    if (iter.getColumnFamily() != null)
    {
        container.delete(iter.getColumnFamily());
        sstablesIterated++;
        while (iter.hasNext())
        {
            IColumn col = iter.next();
            if (col instanceof SuperColumn)
            {
              for (IColumn subcol : ((SuperColumn)col).columns)
              {
                if (subcol.name().equals(ByteBufferUtil.bytes("password")))
                  logger.debug(String.format("Add Sub Column %s", subcol.getString(cfs.metadata.subcolumnComparator)));
              }
            }
            else
            {
              logger.debug(String.format("Add Column %s", col.getString(cfs.metadata.comparator)));
            }
            container.addColumn(col);
        }
{code}


{noformat}
DEBUG 22:05:20,748 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12348-Data.db') and max TS 1321824847534000
DEBUG 22:05:20,749 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12346-Data.db') and max TS 1321813380793000
DEBUG 22:05:20,753 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12330-Data.db') and max TS -9223372036854775808
DEBUG 22:05:20,754 Add Sub Column password:false:8@1307352647576000
DEBUG 22:05:20,755 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12325-Data.db') and max TS -9223372036854775808
DEBUG 22:05:20,757 Add Sub Column password:false:8@1307352647576000
DEBUG 22:05:20,758 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12327-Data.db') and max TS -9223372036854775808
DEBUG 22:05:20,760 Add Sub Column password:false:8@1307352647576000
DEBUG 22:05:20,761 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12328-Data.db') and max TS -9223372036854775808
DEBUG 22:05:20,762 Add Sub Column password:false:8@1319563673493000
DEBUG 22:05:20,763 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12326-Data.db') and max TS -9223372036854775808
DEBUG 22:05:20,765 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12331-Data.db') and max TS -9223372036854775808
DEBUG 22:05:20,767 Add Sub Column password:false:8@1307352647576000
DEBUG 22:05:20,768 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12332-Data.db') and max TS -9223372036854775808
DEBUG 22:05:20,774 Read: 27 ms.
{noformat}


                
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Priority: Critical
>


[jira] [Commented] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Aaron Morton (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154378#comment-13154378 ] 

Aaron Morton commented on CASSANDRA-3510:
-----------------------------------------

Thanks, will test shortly. 

                
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Priority: Critical
>         Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch, 0002-3510-update-maxTimestamp-during-repair.patch, 3510.patch
>
>


[jira] [Commented] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154294#comment-13154294 ] 

Sylvain Lebresne commented on CASSANDRA-3510:
---------------------------------------------

bq. Referring to the isLatestVersion check?

Yes.
                
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Priority: Critical
>         Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch, 0002-3510-update-maxTimestamp-during-repair.patch, 3510.patch
>
>


[jira] [Commented] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154433#comment-13154433 ] 

Jonathan Ellis commented on CASSANDRA-3510:
-------------------------------------------

+1
                
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Priority: Critical
>         Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch, 0002-3510-update-maxTimestamp-during-repair.patch, 3510.patch
>
>


[jira] [Updated] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Aaron Morton (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Morton updated CASSANDRA-3510:
------------------------------------

    Description: 
related to CASSANDRA-3446

(sorry this is so long, took me a bit to work through it all and there is a lot of new code :) )
 
h1. Summary

SSTable.maxTimestamp for files created before 1.0 defaults to Long.MIN_VALUE, and this means the wrong data is returned from queries. 
 
h2. Details 

Noticed on a cluster that was upgraded from 0.8.X to 1.X and then had trouble similar to CASSANDRA-3446. It was rolled back to 0.8 and then migrated to 1.0.3.

4 Node cluster, all files upgraded to "hb" format. 

In a super CF there are situations where a get for a sub column returns a different value than a get for the parent column, e.g.:

{noformat}
[default@XXX] get Users[ascii('username')]['meta']['password'];
=> (column=password, value=3130323130343130, timestamp=1307352647576000)

[default@XX] get Users[ascii('username')]['meta'];     
(snip)       
=> (column=password, value=3034323131303034, timestamp=1319563673493000)
{noformat}

The correct value is the second one. 

I added logging after line 109 in o.a.c.db.CollationController.collectTimeOrderedData() to log the sstable name and the file's max timestamp; this is what I got:

{code:java}
for (SSTableReader sstable : view.sstables)
{
    long currentMaxTs = sstable.getMaxTimestamp();
    logger.debug(String.format("Got sstable %s and max TS %d", sstable, currentMaxTs));
    reduceNameFilter(reducedFilter, container, currentMaxTs);
{code}

{noformat}
DEBUG 14:08:46,012 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12348-Data.db') and max TS 1321824847534000
DEBUG 14:08:47,231 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12346-Data.db') and max TS 1321813380793000
DEBUG 14:08:49,879 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12330-Data.db') and max TS -9223372036854775808
DEBUG 14:08:49,880 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12325-Data.db') and max TS -9223372036854775808
{noformat}

The key I was reading is present in files 12330 and 12325, the first contains the *old / wrong* value with timestamp 1307352647576000 above. The second contains the *new / correct* value with timestamp 1319563673493000.

**Updated:** Incorrect, it was a later file that had the correct value, see the first comment. 

When CollationController.collectTimeOrderedData() processes the 12325 file (after the 12330 file) in the loop over the sstables, the call to reduceNameFilter() removes the column from the filter, because the column read from the 12330 file has a timestamp of 1307352647576000 and the 12325 file incorrectly reports a max timestamp of -9223372036854775808.
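
To make that failure mode concrete, here is a minimal standalone sketch of the pruning decision described above. The class and variable names are invented for illustration; this is not the real reduceNameFilter(), only the comparison it performs:

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Standalone sketch: drop a column from the name filter when the value we
// already collected is newer than anything the next sstable could contain.
public class NameFilterPruneSketch
{
    public static void main(String[] args)
    {
        Set<String> filter = new HashSet<String>();
        filter.add("password");

        // timestamp of the value collected so far (from file 12330)
        Map<String, Long> collected = new HashMap<String, Long>();
        collected.put("password", 1307352647576000L);

        // bogus maxTimestamp carried by the scrubbed pre-1.0 file 12325
        long sstableMaxTimestamp = Long.MIN_VALUE;

        // the pruning decision: if the collected value is newer than the
        // sstable's max timestamp, stop looking for that column...
        for (Map.Entry<String, Long> e : collected.entrySet())
            if (e.getValue() > sstableMaxTimestamp)
                filter.remove(e.getKey());

        // ...so with a bogus MIN_VALUE the filter empties and the newer value
        // (1319563673493000, which lives in a later file) is never read.
        System.out.println("columns still wanted: " + filter); // prints []
    }
}
{code}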

SSTableMetadata is reading the max timestamp from the stats file, but it is Long.MIN_VALUE. I think this happens because scrub creates the SSTableWriter using cfs.createCompactionWriter(), which sets the maxTimestamp in the metadata collector according to the maxTimestamp in the metadata for the file(s) that will be scrubbed / compacted. But for pre-1.0 format files the default in SSTableMetadata is Long.MIN_VALUE (see SSTableMetadata.deserialize() and the ctor). So scrubbing a pre-1.0 file will write stats files that have maxTimestamp as Long.MIN_VALUE.

During scrubbing the SSTableWriter does not update the maxTimestamp, because append(AbstractCompactedRow) is called, which expects that cfs.createCompactionWriter() was able to set the correct maxTimestamp on the metadata. Compaction also uses append(AbstractCompactedRow), so it may create an SSTable with an incorrect maxTimestamp if one of the input files started life as a pre-1.0 file and has a bad maxTimestamp.

It looks like the only time the maxTimestamp is calculated is when the SSTable is originally written, so the error from the old files is carried forward.

e.g. if files a, b and c have maxTimestamps 10, 100 and Long.MIN_VALUE, compaction will write an SSTable with maxTimestamp 100. However, file c may actually contain columns with a timestamp > 100, which will end up in the compacted file.
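
A small sketch of that arithmetic, assuming the new file's maxTimestamp is simply seeded as the maximum of its inputs' metadata values (invented class name, not the actual compaction code):

{code:java}
// Sketch: seeding a compacted sstable's maxTimestamp from its inputs' metadata.
// Any pre-1.0 input that reports Long.MIN_VALUE silently understates the result.
public class MaxTimestampMergeSketch
{
    static long mergeMaxTimestamps(long... inputMaxTimestamps)
    {
        long max = Long.MIN_VALUE;
        for (long ts : inputMaxTimestamps)
            max = Math.max(max, ts);
        return max;
    }

    public static void main(String[] args)
    {
        // files a, b and c from the example above; c is a scrubbed pre-1.0 file
        long a = 10, b = 100, c = Long.MIN_VALUE;

        // the metadata says 100, but c may really hold columns newer than 100
        System.out.println(mergeMaxTimestamps(a, b, c)); // prints 100
    }
}
{code}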

h1. Reproduce

1. Start a clean 0.8.7

2. Add a schema (details of the schema do not matter):
{noformat}
[default@unknown] create keyspace dev;   
5f834620-140b-11e1-0000-242d50cf1fdf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] 
[default@unknown] use dev;
Authenticated to keyspace: dev
[default@dev] 
[default@dev] create column family super_dev with column_type = 'Super' 
...	and key_validation_class = 'AsciiType' and comparator = 'AsciiType' and 
...	subcomparator = 'AsciiType' and default_validation_class = 'AsciiType';
60490720-140b-11e1-0000-242d50cf1fdf
Waiting for schema agreement...
... schemas agree across the cluster
{noformat}

3. Shutdown 0.8.7

4. Start 1.0.3 using the same data. Check the schema version loaded; the example below shows the wrong schema version being loaded. I stepped through the code and the wrong value was read from Migration.getLastMigrationId() due to this bug.

{noformat}
 INFO [main] 2011-11-21 19:39:08,546 DatabaseDescriptor.java (line 501) Loading schema version 5f834620-140b-11e1-0000-242d50cf1fdf
{noformat}

5. Check the schema using the 1.0.3 CLI 

{noformat}
[default@unknown] use dev;
Authenticated to keyspace: dev
[default@dev] describe;
Keyspace: dev:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [datacenter1:1]
  Column Families:
[default@dev] 
{noformat}

6. I then did a 1.0.3 scrub and re-started. The correct schema version was read, but stepping through the code showed that both Schema SSTables had Long.MIN_VALUE as the maxTimestamp, so I think it was only the order in which the files were read that made it work.

{noformat}
DEBUG 19:52:30,744 Got sstable SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-4-Data.db') and max TS -9223372036854775808
DEBUG 19:52:30,744 Got sstable SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-3-Data.db') and max TS -9223372036854775808
{noformat}

h1. Fixes

Not sure (wanted to get the ticket opened and find out if I was imagining things); some guesses:

Use Long.MIN_VALUE as a magic maxTimestamp that means the value is not known. This would not fix issues where an incorrect maxTimestamp has already been included in a compaction.
 
Looking at making scrub re-calculate the maxTimestamp.

Also wondering if the maxTimestamp should default to Long.MAX_VALUE if read from a file format that does not support maxTimestamp?
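
If the first option were taken, the guard could be as small as the sketch below, which is roughly the idea suggested by the name of the attached 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch (the class and helper names here are invented; this is not the patch itself):

{code:java}
// Sketch of the "Long.MIN_VALUE means unknown" idea: only let an sstable's
// maxTimestamp prune the name filter when the value is actually known.
public class MaxTimestampGuardSketch
{
    static boolean canPrune(long sstableMaxTimestamp, long collectedTimestamp)
    {
        // an unknown maxTimestamp must never justify dropping a column
        if (sstableMaxTimestamp == Long.MIN_VALUE)
            return false;
        // safe to skip the sstable only if what we have is strictly newer
        return collectedTimestamp > sstableMaxTimestamp;
    }

    public static void main(String[] args)
    {
        System.out.println(canPrune(Long.MIN_VALUE, 1307352647576000L));    // false: keep reading
        System.out.println(canPrune(1321824847534000L, 1307352647576000L)); // false: sstable may hold newer data
        System.out.println(canPrune(1307352647576000L, 1321824847534000L)); // true: collected value already newer
    }
}
{code}

Correctness is preserved either way; the only cost of skipping the check is losing the short-circuit optimisation for files with unknown metadata.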


  was:

related to CASSANDRA-3446

(sorry this is so long, took me a bit to work through it all and there is a lot of new code :) )
 
h1. Summary

SSTable.maxTimestamp for files created before 1.0 defaults to Long.MIN_VALUE, and this means the wrong data is returned from queries. 
 
h2. Details 

Noticed on a cluster that was upgraded from 0.8.X to 1.X, it then had trouble similar to CASSANDRA-3446. It was rolled back to 0.8 and the migrated to 1.0.3. 

4 Node cluster, all files upgraded to "hb" format. 

In a super CF there are situations where a get for a sub columns returns a different value than a get for the column. .e.g. 

{noformat}
[default@XXX] get Users[ascii('username')]['meta']['password'];
=> (column=password, value=3130323130343130, timestamp=1307352647576000)

[default@XX] get Users[ascii('username')]['meta'];     
(snip)       
=> (column=password, value=3034323131303034, timestamp=1319563673493000)
{noformat}

The correct value is the second one. 

I added logging after line 109 in o.a.c.db.CollectionController.collectTimeOrderedData() to log the sstable name and the file max timestamp, this is what I got:

{code:java}
for (SSTableReader sstable : view.sstables)
{
    long currentMaxTs = sstable.getMaxTimestamp();
    logger.debug(String.format("Got sstable %s and max TS %d", sstable, currentMaxTs));
    reduceNameFilter(reducedFilter, container, currentMaxTs);
{code}

{noformat}
DEBUG 14:08:46,012 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12348-Data.db') and max TS 1321824847534000
DEBUG 14:08:47,231 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12346-Data.db') and max TS 1321813380793000
DEBUG 14:08:49,879 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12330-Data.db') and max TS -9223372036854775808
DEBUG 14:08:49,880 Got sstable SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12325-Data.db') and max TS -9223372036854775808
{noformat}

The key I was reading is present in files 12330 and 12325, the first contains the *old / wrong* value with timestamp 1307352647576000 above. The second contains the *new / correct* value with timestamp 1319563673493000.

When CollectionController.collectTimeOrderedData() processes the 12325 file (after processing the 12330 file) while looping over the sstables the call to reduceNameFilter() removes the column  from the filter because the column read from the 12330 file has a time stamp of 1307352647576000 and the 12325 file incorrectly has a max time stamp of -9223372036854775808 .

SSTableMetadata is reading the max time stamp from the stats file, but it is Long.MIN_VALUE. I think this happens because scrub creates the SSTableWriter using cfs.createCompactionWriter() which sets the maxTimestamp in the meta data collector according to the maxTimestamp in the meta data for the file(s) that will be scrubbed / compacted. But for pre 1.0 format files the default in SSTableMetadata is Long.MIN_VALUE, (see SSTableMetaData.deserialize() and the ctor). So scrubbing a pre 1.0 file will write stats files that have maxTimestamp as Long.MIN_VALUE.

During scrubbing the SSTableWriter does not update the maxTimestamp because append(AbstractCompactedRow) is called which expects the that cfs.createCompactionWriter() was able to set the correct maxTimestamp on the meta data. Compaction also uses append(AbstractCompactedRow) so may create an SSTable with an incorrect maxTimestamp if one of the input files started life as a pre 1.0 file and has a bad maxTimestamp. 

It looks like the only time the maxTimestamp is calculated is when the SSTable is originally written. So the error from the old files will be carried along. 

e.g. If the files a,b and c have the maxTimestamps 10, 100 and Long.MIN_VALUE compaction will write a SSTable with maxTimestamp 100. However file c may actually contain columns with a timestamp > 100 which will be in the compacted file.

h1. Reproduce

1. Start a clean 0.8.7

2. Add a schema (details of the schema do not matter):
{noformat}
[default@unknown] create keyspace dev;   
5f834620-140b-11e1-0000-242d50cf1fdf
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] 
[default@unknown] use dev;
Authenticated to keyspace: dev
[default@dev] 
[default@dev] create column family super_dev with column_type = 'Super' 
...	and key_validation_class = 'AsciiType' and comparator = 'AsciiType' and 
...	subcomparator = 'AsciiType' and default_validation_class = 'AsciiType';
60490720-140b-11e1-0000-242d50cf1fdf
Waiting for schema agreement...
... schemas agree across the cluster
{noformat}

3. Shut down 0.8.7

4. Start 1.0.3 using the same data and check which schema version is loaded. The example below shows the wrong version being loaded: it is the keyspace creation migration from step 2 rather than the later column family creation. I stepped through the code and the wrong value was read from Migration.getLastMigrationId() due to this bug.

{noformat}
 INFO [main] 2011-11-21 19:39:08,546 DatabaseDescriptor.java (line 501) Loading schema version 5f834620-140b-11e1-0000-242d50cf1fdf
{noformat}

5. Check the schema using the 1.0.3 CLI; note that the super_dev column family is missing:

{noformat}
[default@unknown] use dev;
Authenticated to keyspace: dev
[default@dev] describe;
Keyspace: dev:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [datacenter1:1]
  Column Families:
[default@dev] 
{noformat}

6. I then did a 1.0.3 scrub and re-started. The correct schema version was read, but when stepping through the code both Schema SSTables had Long.MIN_VALUE as the maxTimestamp, so I think it was only the order in which the files happened to be processed that made it work (see the sketch after the log below).

{noformat}
DEBUG 19:52:30,744 Got sstable SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-4-Data.db') and max TS -9223372036854775808
DEBUG 19:52:30,744 Got sstable SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-3-Data.db') and max TS -9223372036854775808
{noformat}
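
To show why this becomes order-dependent, here is a hypothetical, standalone demo (simplified types, not Cassandra code): the first sstable visited fills the container, the bogus maxTimestamp then empties the filter, and every later sstable is skipped, so whichever file happens to be read first wins.

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OrderDependentRead
{
    public static void main(String[] args)
    {
        // two "sstables", each holding one version of the same column
        Map<String, Long> newer = Collections.singletonMap("version", 4L);
        Map<String, Long> older = Collections.singletonMap("version", 3L);

        System.out.println(read(Arrays.asList(newer, older))); // {version=4} -> happens to be right
        System.out.println(read(Arrays.asList(older, newer))); // {version=3} -> stale answer
    }

    static Map<String, Long> read(List<Map<String, Long>> sstables)
    {
        Map<String, Long> container = new HashMap<String, Long>();
        Set<String> filter = new HashSet<String>(Collections.singleton("version"));

        for (Map<String, Long> sstable : sstables)
        {
            long maxTs = Long.MIN_VALUE; // every file carries the bad metadata

            // reduce the filter: anything already collected looks newer than maxTs
            for (Iterator<String> it = filter.iterator(); it.hasNext(); )
            {
                Long collectedTs = container.get(it.next());
                if (collectedTs != null && collectedTs > maxTs)
                    it.remove();
            }
            if (filter.isEmpty())
                break; // nothing left to look for; later files are skipped

            for (String name : filter)
                if (sstable.containsKey(name))
                    container.put(name, sstable.get(name));
        }
        return container;
    }
}
{code}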

h1. Fixes

Not sure (I wanted to get the ticket opened and find out if I was imagining things), so these are guesses...

Use Long.MIN_VALUE as a magic maxTimestamp that means the value is not known. This would not fix cases where an incorrect maxTimestamp has already been included in a compaction.
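
As a rough sketch of what that first idea could look like, in terms of the simplified reduceNameFilter() above (an illustration only, not the attached patches):

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class SkipUnknownMaxTimestamp
{
    static void reduceNameFilter(Set<String> filter, Map<String, Long> container, long sstableMaxTimestamp)
    {
        if (sstableMaxTimestamp == Long.MIN_VALUE)
            return; // max timestamp unknown: this sstable could hold anything, so keep looking

        for (Iterator<String> it = filter.iterator(); it.hasNext(); )
        {
            Long collectedTs = container.get(it.next());
            if (collectedTs != null && collectedTs > sstableMaxTimestamp)
                it.remove();
        }
    }

    public static void main(String[] args)
    {
        Map<String, Long> container = new HashMap<String, Long>();
        container.put("password", 1307352647576000L);

        Set<String> filter = new HashSet<String>(Collections.singleton("password"));
        reduceNameFilter(filter, container, Long.MIN_VALUE);

        System.out.println(filter); // [password] -> the sstable with the newer value is still read
    }
}
{code}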
 
Looking at making scrub re-calculate the maxTimestamp.

I am also wondering whether the maxTimestamp should default to Long.MAX_VALUE when read from a file format that does not support maxTimestamp?


    
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Priority: Critical
>         Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch, 0002-3510-update-maxTimestamp-during-repair.patch


[jira] [Commented] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Aaron Morton (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154425#comment-13154425 ] 

Aaron Morton commented on CASSANDRA-3510:
-----------------------------------------

Tested against the current 1.0 head with the test case I had, and the query works as expected.

I agree doing it in compaction is safer; putting it in repair just got me there faster last night.

Thanks for cleaning up the patch. 
+1
                
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Priority: Critical
>         Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch, 0002-3510-update-maxTimestamp-during-repair.patch, 3510.patch


[jira] [Resolved] (CASSANDRA-3510) Incorrect query results due to invalid SSTable.maxTimestamp

Posted by "Sylvain Lebresne (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-3510.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.4
         Reviewer: amorton
         Assignee: Sylvain Lebresne

Committed
                
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-3510
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>            Reporter: Aaron Morton
>            Assignee: Sylvain Lebresne
>            Priority: Critical
>             Fix For: 1.0.4
>
>         Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch, 0002-3510-update-maxTimestamp-during-repair.patch, 3510.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira