Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/02/22 00:28:38 UTC

[jira] Created: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Cleanup can create sstables whose contents do not match their advertised version
--------------------------------------------------------------------------------

                 Key: CASSANDRA-2211
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2211
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis


{code}
                    if (Range.isTokenInRanges(row.getKey().token, ranges))
                    {
                        writer = maybeCreateWriter(sstable, compactionFileLocation, expectedBloomFilterSize, writer);
                        writer.append(new EchoedRow(row));
                        totalkeysWritten++;
                    }
                    else
                    {
                        while (row.hasNext())
                        {
                            IColumn column = row.next();
                            if (indexedColumns.contains(column.name()))
                                Table.cleanupIndexEntry(cfs, row.getKey().key, column);
                        }
                    }
{code}

... that is, rows that haven't changed we copy to the new sstable without deserializing.  But, the new sstable is created with CURRENT_VERSION which may not be what the old data consisted of.

(This could cause symptoms similar to CASSANDRA-2195 but I do not think it is the cause of that bug; IIRC the cluster in question there was not upgraded from an older Cassandra.)
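
To illustrate why echoing bytes under the wrong version label is a problem, here is a minimal, self-contained sketch. It does not use Cassandra's real on-disk formats: the two "formats" (a 2-byte versus a 4-byte length prefix) and the names VersionMismatchSketch, serialize and deserialize are assumptions made up for this example. A reader that trusts the advertised version cannot parse bytes that were copied verbatim from an older format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class VersionMismatchSketch {
    // Hypothetical formats, NOT Cassandra's: the "old" format uses a
    // 2-byte length prefix, the "current" format uses a 4-byte prefix.
    static byte[] serialize(String value, boolean currentFormat) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate((currentFormat ? 4 : 2) + payload.length);
        if (currentFormat)
            buf.putInt(payload.length);
        else
            buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    // The reader trusts the advertised version. Returns null when the
    // bytes cannot be parsed under that version.
    static String deserialize(byte[] row, boolean advertisedCurrent) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        int prefixSize = advertisedCurrent ? 4 : 2;
        if (buf.remaining() < prefixSize)
            return null;
        int length = advertisedCurrent ? buf.getInt() : buf.getShort();
        if (length < 0 || length > buf.remaining())
            return null;
        byte[] payload = new byte[length];
        buf.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] oldRow = serialize("hello", false);

        // Read back with the version the bytes were actually written in.
        System.out.println("matching label:   " + deserialize(oldRow, false));

        // Same bytes echoed into a file that advertises the current version.
        System.out.println("mismatched label: " + deserialize(oldRow, true));
    }
}
```

In this toy model the mismatched read fails outright; with real formats the failure could be subtler, which is why the mismatch is worth preventing at write time.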

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2211:
--------------------------------------

    Attachment: 2211.txt

> Cleanup can create sstables whose contents do not match their advertised version
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2211
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2211
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.1
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 0.7.3
>
>         Attachments: 2211.txt
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Since cleanup switched to per-sstable operation (CASSANDRA-1916), the main loop looks like this:
> {code}
>                     if (Range.isTokenInRanges(row.getKey().token, ranges))
>                     {
>                         writer = maybeCreateWriter(sstable, compactionFileLocation, expectedBloomFilterSize, writer);
>                         writer.append(new EchoedRow(row));
>                         totalkeysWritten++;
>                     }
>                     else
>                     {
>                         while (row.hasNext())
>                         {
>                             IColumn column = row.next();
>                             if (indexedColumns.contains(column.name()))
>                                 Table.cleanupIndexEntry(cfs, row.getKey().key, column);
>                         }
>                     }
> {code}
> ... that is, rows that haven't changed we copy to the new sstable without deserializing, with EchoedRow.  But, the new sstable is created with CURRENT_VERSION which may not be what the old data consisted of.
> (This could cause symptoms similar to CASSANDRA-2195 but I do not think it is the cause of that bug; IIRC the cluster in question there was not upgraded from an older Cassandra.)


        

[jira] Updated: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2211:
--------------------------------------

    Attachment: 2211.txt



        

[jira] Commented: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997629#comment-12997629 ] 

Jonathan Ellis commented on CASSANDRA-2211:
-------------------------------------------

A better fix would be to have cleanup echo the row only if the data is already on the current version, and otherwise deserialize and rewrite it.  This would (a) be a better fit with our policy of not having to keep code around to write old versions and (b) allow a better upgrade path to version N + 1 (that doesn't support the old-version sstables) than major compaction. I'll see if I can do that tonight.
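
A minimal sketch of the echo-or-rewrite decision described above, under stated assumptions: it is not the attached patch, and Row, Reserializer, prepareForWrite and the version tag "f" are hypothetical names invented for this illustration. The idea is only that bytes are copied verbatim when their source version equals the version the new sstable will advertise, and are re-serialized otherwise.

```java
import java.nio.charset.StandardCharsets;

public class EchoOrRewriteSketch {
    // Hypothetical version tag for the sstable being written.
    static final String CURRENT_VERSION = "f";

    // Hypothetical stand-in for a row read from an existing sstable.
    static final class Row {
        final String key;
        final String sourceVersion;
        final byte[] bytes;

        Row(String key, String sourceVersion, byte[] bytes) {
            this.key = key;
            this.sourceVersion = sourceVersion;
            this.bytes = bytes;
        }
    }

    // Hypothetical hook that deserializes a row and writes it back out
    // in the current format.
    interface Reserializer {
        byte[] rewrite(Row row);
    }

    // Echo the raw bytes only when they are already in the current
    // format; otherwise go through the (slower) rewrite path.
    static byte[] prepareForWrite(Row row, Reserializer reserializer) {
        if (CURRENT_VERSION.equals(row.sourceVersion))
            return row.bytes;
        return reserializer.rewrite(row);
    }

    public static void main(String[] args) {
        // Placeholder rewrite: a real one would parse the old format.
        Reserializer reserializer =
            row -> ("rewritten:" + row.key).getBytes(StandardCharsets.UTF_8);

        Row current = new Row("k1", "f", new byte[] { 1, 2, 3 });
        Row old = new Row("k2", "e", new byte[] { 4, 5, 6 });

        System.out.println("k1 echoed: " + (prepareForWrite(current, reserializer) == current.bytes));
        System.out.println("k2 echoed: " + (prepareForWrite(old, reserializer) == old.bytes));
    }
}
```

Only the write path for the current format is needed here, which is the point of (a) above: no code has to be kept around to emit old versions.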



        

[jira] Updated: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2211:
--------------------------------------

           Description: 
Since cleanup switched to per-sstable operation (CASSANDRA-1916), the main loop looks like this:

{code}
                    if (Range.isTokenInRanges(row.getKey().token, ranges))
                    {
                        writer = maybeCreateWriter(sstable, compactionFileLocation, expectedBloomFilterSize, writer);
                        writer.append(new EchoedRow(row));
                        totalkeysWritten++;
                    }
                    else
                    {
                        while (row.hasNext())
                        {
                            IColumn column = row.next();
                            if (indexedColumns.contains(column.name()))
                                Table.cleanupIndexEntry(cfs, row.getKey().key, column);
                        }
                    }
{code}

... that is, rows that haven't changed we copy to the new sstable without deserializing, with EchoedRow.  But, the new sstable is created with CURRENT_VERSION which may not be what the old data consisted of.

(This could cause symptoms similar to CASSANDRA-2195 but I do not think it is the cause of that bug; IIRC the cluster in question there was not upgraded from an older Cassandra.)

  was:
{code}
                    if (Range.isTokenInRanges(row.getKey().token, ranges))
                    {
                        writer = maybeCreateWriter(sstable, compactionFileLocation, expectedBloomFilterSize, writer);
                        writer.append(new EchoedRow(row));
                        totalkeysWritten++;
                    }
                    else
                    {
                        while (row.hasNext())
                        {
                            IColumn column = row.next();
                            if (indexedColumns.contains(column.name()))
                                Table.cleanupIndexEntry(cfs, row.getKey().key, column);
                        }
                    }
{code}

... that is, rows that haven't changed we copy to the new sstable without deserializing.  But, the new sstable is created with CURRENT_VERSION which may not be what the old data consisted of.

(This could cause symptoms similar to CASSANDRA-2195 but I do not think it is the cause of that bug; IIRC the cluster in question there was not upgraded from an older Cassandra.)

    Remaining Estimate: 4h  (was: 2h)
     Original Estimate: 4h  (was: 2h)



        

[jira] Updated: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2211:
--------------------------------------

    Attachment:     (was: 2211.txt)



        

[jira] Updated: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2211:
----------------------------------------

    Attachment: 0001-2211-v3.patch

+1 on the patch. I'm just attaching a v3 that simply uses getDefaultGcBefore() throughout CompactionManager (to make things cleaner).

Sadly, this is not the only place where we echo data wrongfully; cf. CASSANDRA-2216.



        

[jira] Updated: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2211:
--------------------------------------

    Attachment: 2211-v2.txt

v2 as described above.



        

[jira] Commented: (CASSANDRA-2211) Cleanup can create sstables whose contents do not match their advertised version

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997845#comment-12997845 ] 

Hudson commented on CASSANDRA-2211:
-----------------------------------

Integrated in Cassandra-0.7 #303 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/303/])
    fix for cleanup writing old-format data into new-version sstable
patch by jbellis; reviewed by slebresne for CASSANDRA-2211


