Posted to commits@cassandra.apache.org by "Sammy Yu (JIRA)" <ji...@apache.org> on 2009/09/03 03:51:32 UTC

[jira] Created: (CASSANDRA-418) SSTable generation clash during compaction

SSTable generation clash during compaction
------------------------------------------

Key: CASSANDRA-418
URL: https://issues.apache.org/jira/browse/CASSANDRA-418
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.5
Reporter: Sammy Yu
Assignee: Sammy Yu
Fix For: 0.5

We found that one of our nodes started getting timeouts for get_slice. Looking further, we found that CFS.ssTables_ references an SSTable that doesn't exist on the file system.

Walking down the log, we see that the sstable in question, 6038, is being compacted onto itself (in terms of filename; file-wise it is written to -tmp):
system.log.2009-09-01: INFO [MINOR-COMPACTION-POOL:1] 2009-09-01 23:50:07,553 ColumnFamilyStore.java (line 1067) Compacting [/mnt/var/cassandra/data/Digg/FriendActions-6037-Data.db,/mnt/var/cassandra/data/Digg/FriendActions-6038-Data.db,/mnt/var/cassandra/data/Digg/FriendActions-6040-Data.db,/mnt/var/cassandra/data/Digg/FriendActions-6042-Data.db]
system.log.2009-09-01: INFO [MINOR-COMPACTION-POOL:1] 2009-09-01 23:51:43,727 ColumnFamilyStore.java (line 1209) Compacted to /mnt/var/cassandra/data/Digg/FriendActions-6038-Data.db. 0/1010269806 bytes for 9482/9373 keys read/written. Time: 96173ms.

It appears the generation number is generated by taking the lowest number in the list of files to be compacted and adding 1. In this scenario it is 6037+1=6038.
The code in CFS.doFileCompaction will remove the key, add the key back, and then remove the key again, hence the error we were seeing.

Should the generation number be generated another way, or should we update doFileCompaction to be smarter?
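
To make the suspected clash concrete, here is a minimal sketch of the "lowest input generation plus one" rule as the report describes it. It is an illustration in Python, not the Cassandra code; the function names are hypothetical, and the rule itself is the reporter's reading of the behavior ("it appears"), not something confirmed against the source.

```python
def old_compaction_generation(input_generations):
    """Hypothetical model of the rule described in the report: lowest input + 1."""
    return min(input_generations) + 1


def clashes(input_generations):
    """True when the output generation is also one of the inputs being compacted."""
    return old_compaction_generation(input_generations) in set(input_generations)


# Generations taken from the file names in the log excerpt.
inputs = [6037, 6038, 6040, 6042]
print(old_compaction_generation(inputs))  # 6038
print(clashes(inputs))                    # True
```

Under this model the output name collides with an input whenever the input set contains the lowest generation and its immediate successor, which is the 6037/6038 pair seen in the log.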

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750785#action_12750785 ] 

Jonathan Ellis commented on CASSANDRA-418:
------------------------------------------

the compaction code relies on the bucketizer to keep files of the same compaction-count together (e.g. a bucket of sstables that have been compacted twice, another of sstables that have been compacted 3 times) so that you are never compacting sstables of consecutive generations -- all will have even numbers, or all odd.  something has broken that invariant.
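
As a rough illustration of the invariant described above, the following Python sketch checks whether a set of generations is all even or all odd. The helper name is hypothetical and this is not how the bucketizer itself is written; it only shows that the compaction from the log mixes parities.

```python
def same_parity(generations):
    """True when every generation is even, or every generation is odd."""
    return len({g % 2 for g in generations}) <= 1


# The compaction from the log mixes odd and even generations.
print(same_parity([6037, 6038, 6040, 6042]))  # False
# Without 6037 the inputs would all be even.
print(same_parity([6038, 6040, 6042]))        # True
```

When the inputs share a parity, the lowest generation plus one has the opposite parity and so cannot be one of the inputs; that is why the old naming scheme depended on this property.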

rather than try to band-aid the bucketizer i think making the generation-generator more robust is the way to go.  this seems like a flimsy property to try to preserve.

my vote would be to simplify: just pick the next monotonically increasing int any time we need a new tmp sstable file, whether for flush, compaction, or bootstrap.  I.e. via CFS.getTempSSTableFileName, without the extra increment.
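
A minimal sketch of that proposal, written in Python for brevity and not taken from any patch: a single counter hands out the next integer for every new sstable, whatever the reason for creating it. The class name `GenerationCounter` and its `last_used` parameter are hypothetical stand-ins for the shared atomic counter; the lock is only there to keep the sketch safe if several threads ask at once.

```python
import itertools
import threading


class GenerationCounter:
    """Hands out strictly increasing generation numbers, one per request."""

    def __init__(self, last_used):
        # Start just past the highest generation already on disk (assumed known).
        self._next = itertools.count(last_used + 1)
        self._lock = threading.Lock()

    def next_generation(self):
        # Exactly one increment per call, shared by flush, compaction and bootstrap.
        with self._lock:
            return next(self._next)


counter = GenerationCounter(last_used=6042)
flush_generation = counter.next_generation()
compaction_generation = counter.next_generation()
print(flush_generation, compaction_generation)  # 6043 6044
```

Because every number handed out is larger than any generation already in use, a compaction's output can never share a generation with one of its inputs, regardless of what the bucketizer selected.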

the reason historically that FB tried to be fancy is, they were trying to optimize away reading older sstables at all if the data being queried was found in a newer one.  the "only new sstables get a number from the atomic int, and the compactions fit in between" was to preserve this.  (then you sort on the generation number and higher ones are always newer.)

but that can't work (see CASSANDRA-223) so we always do a full merge across all sstables now.  so we can simplify this safely.

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Sammy Yu
>            Assignee: Sammy Yu
>             Fix For: 0.5

[jira] Commented: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Sammy Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750795#action_12750795 ] 

Sammy Yu commented on CASSANDRA-418:
------------------------------------

Should we also change the getTempSSTableFileName to just increment fileIndexGenerator_ once?


> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Sammy Yu
>            Assignee: Chris Goffinet
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch

[jira] Assigned: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet reassigned CASSANDRA-418:
----------------------------------------

    Assignee: Chris Goffinet  (was: Sammy Yu)

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Sammy Yu
>            Assignee: Chris Goffinet
>             Fix For: 0.4

[jira] Commented: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752475#action_12752475 ] 

Hudson commented on CASSANDRA-418:
----------------------------------

Integrated in Cassandra #191 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/191/])
    clean up inaccurate comments; remaining double-increment code.
patch by jbellis; reviewed by Sammy Yu for 


> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sammy Yu
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 0002-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 418-2.patch

[jira] Resolved: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-418.
--------------------------------------

    Resolution: Fixed

committed to 0.4 and 0.5

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sammy Yu
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 0002-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 418-2.patch

[jira] Issue Comment Edited: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Sammy Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750795#action_12750795 ] 

Sammy Yu edited comment on CASSANDRA-418 at 9/2/09 7:58 PM:
------------------------------------------------------------

Should we also change CFS.getTempSSTableFileName to just increment fileIndexGenerator_ once?


      was (Author: sammy.yu):
    Should we also change the getTempSSTableFileName to just increment fileIndexGenerator_ once?

  
> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Sammy Yu
>            Assignee: Chris Goffinet
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch

[jira] Commented: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750992#action_12750992 ] 

Jonathan Ellis commented on CASSANDRA-418:
------------------------------------------

> Should we also change CFS.getTempSSTableFileName to just increment fileIndexGenerator_ once? 

right, that's what i meant by "w/o the extra increment."

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Sammy Yu
>            Assignee: Sammy Yu
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch

[jira] Commented: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Sammy Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751544#action_12751544 ] 

Sammy Yu commented on CASSANDRA-418:
------------------------------------

+1 looks good

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sammy Yu
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 0002-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 418-2.patch

[jira] Reopened: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reopened CASSANDRA-418:
--------------------------------------

      Assignee: Jonathan Ellis  (was: Sammy Yu)

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sammy Yu
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 0002-CASSANDRA-418-Use-monotonically-increasing-generatio.patch

[jira] Updated: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Sammy Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sammy Yu updated CASSANDRA-418:
-------------------------------

    Attachment: 0002-CASSANDRA-418-Use-monotonically-increasing-generatio.patch

Self-contained patch that now increments the generation number by one.


> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Sammy Yu
>            Assignee: Sammy Yu
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 0002-CASSANDRA-418-Use-monotonically-increasing-generatio.patch

[jira] Updated: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-418:
-------------------------------------

    Affects Version/s:     (was: 0.4)

(also affects version 0.3 for the record)

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sammy Yu
>            Assignee: Sammy Yu
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 0002-CASSANDRA-418-Use-monotonically-increasing-generatio.patch

[jira] Updated: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-418:
-------------------------------------

    Attachment: 418-2.patch

missed some code.  this cleans up inaccurate comments and remaining double-increment code

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Sammy Yu
>            Assignee: Jonathan Ellis
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 0002-CASSANDRA-418-Use-monotonically-increasing-generatio.patch, 418-2.patch
>
>
> We found that one of our nodes started getting timeouts for get_slice.  Looking further, we found that CFS.ssTables_ references an SSTable that doesn't exist on the file system.
> Walking down the log, we see that the sstable in question, 6038, is being compacted onto itself (in terms of filename; on disk it is written to a -tmp file):
> system.log.2009-09-01: INFO [MINOR-COMPACTION-POOL:1] 2009-09-01 23:50:07,553 ColumnFamilyStore.java (line 1067) Compacting
> [/mnt/var/cassandra/data/Digg/FriendActions-6037-Data.db,/mnt/var/cassandra/data/Digg/FriendActions-6038-Data.db,/mnt/var/cassandra/data/Digg/FriendActions-6040-Data.db,/mnt/var/cassandra/data/Digg/FriendActions-6042-Data.db]
> system.log.2009-09-01: INFO [MINOR-COMPACTION-POOL:1] 2009-09-01 23:51:43,727 ColumnFamilyStore.java (line 1209) Compacted to
> /mnt/var/cassandra/data/Digg/FriendActions-6038-Data.db.  0/1010269806 bytes for 9482/9373 keys read/written.  Time: 96173ms.
> It appears the generation number is generated by taking the lowest number in the list of files to be compacted and adding 1.  In this scenario it is 6037+1=6038.
> The code in CFS.doFileCompaction removes the key, adds it back, and then removes it again, hence the error we were seeing.
> Should the generation number be generated another way, or should we update doFileCompaction to be smarter?
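
To make the clash in the report concrete, here is a minimal sketch in Java. It assumes the output generation is chosen as the lowest input generation plus one, which is what the report infers from the logs rather than something confirmed against the source. The class and method names are hypothetical and are not taken from ColumnFamilyStore.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names come from the Cassandra code base.
final class GenerationClashDemo {
    // Scheme inferred in the report: lowest generation among the inputs, plus one.
    static int lowestPlusOne(List<Integer> inputGenerations) {
        return Collections.min(inputGenerations) + 1;
    }

    // True when the generation chosen for the output is also one of the inputs,
    // i.e. the compacted file would take the name of a file being compacted.
    static boolean clashes(List<Integer> inputGenerations) {
        return inputGenerations.contains(lowestPlusOne(inputGenerations));
    }

    public static void main(String[] args) {
        // Generations from the "Compacting" log line in the report.
        List<Integer> inputs = Arrays.asList(6037, 6038, 6040, 6042);
        System.out.println("output generation: " + lowestPlusOne(inputs));
        System.out.println("clashes with an input: " + clashes(inputs));
    }
}
```

With the four generations from the log, the output generation is 6038, which is also an input. Any input set containing two consecutive generations at the low end would collide the same way under this assumed scheme.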


[jira] Assigned: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet reassigned CASSANDRA-418:
----------------------------------------

    Assignee: Sammy Yu  (was: Chris Goffinet)

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Sammy Yu
>            Assignee: Sammy Yu
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch
>
>


[jira] Commented: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750793#action_12750793 ] 

Chris Goffinet commented on CASSANDRA-418:
------------------------------------------

A note on the ML might be needed. With this bug, it looks like we are going to have to dump our old data and re-import it, since we don't have a fully reliable way of figuring out what data is missing across the cluster.

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Sammy Yu
>            Assignee: Chris Goffinet
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch
>
>


[jira] Updated: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Chris Goffinet (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Goffinet updated CASSANDRA-418:
-------------------------------------

    Component/s: Core

> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5
>            Reporter: Sammy Yu
>            Assignee: Sammy Yu
>             Fix For: 0.5
>
>


[jira] Updated: (CASSANDRA-418) SSTable generation clash during compaction

Posted by "Sammy Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sammy Yu updated CASSANDRA-418:
-------------------------------

    Attachment: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch

Use a monotonically increasing generation number for the newly compacted sstable.
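
As a rough illustration of the idea (not the contents of the attached patch), a monotonically increasing generation can be sketched with a counter seeded from the highest generation already present. The class name, constructor, and seeding strategy below are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the real patch may structure this differently.
final class MonotonicGeneration {
    private final AtomicInteger current;

    // Seed the counter with the highest generation already in use (assumed
    // to be known from the existing sstable file names).
    MonotonicGeneration(List<Integer> existingGenerations) {
        int max = 0;
        for (int generation : existingGenerations) {
            max = Math.max(max, generation);
        }
        current = new AtomicInteger(max);
    }

    // Every call returns a value strictly greater than any existing or
    // previously issued generation, so an output can never reuse an input's number.
    int next() {
        return current.incrementAndGet();
    }

    public static void main(String[] args) {
        MonotonicGeneration generations =
                new MonotonicGeneration(Arrays.asList(6037, 6038, 6040, 6042));
        System.out.println("next generation: " + generations.next());
    }
}
```

For the generations in the report, the first value handed out would be 6043 instead of 6038, so the compacted file cannot share a name with any of its inputs. AtomicInteger is used here so that concurrent flushes and compactions would not be handed the same number.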


> SSTable generation clash during compaction
> ------------------------------------------
>
>                 Key: CASSANDRA-418
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-418
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.4
>            Reporter: Sammy Yu
>            Assignee: Chris Goffinet
>             Fix For: 0.4
>
>         Attachments: 0001-CASSANDRA-418-Use-monotonically-increasing-generatio.patch
>
>
