Posted to commits@cassandra.apache.org by "Stu Hood (JIRA)" <ji...@apache.org> on 2011/04/19 04:17:05 UTC

[jira] [Created] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Eagerly re-write data at read time ("superseding")
--------------------------------------------------

                 Key: CASSANDRA-2503
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Stu Hood


Once CASSANDRA-2498 is implemented, it will be possible to add an optimization that eagerly rewrites ("supersedes") data at read time. If a successful read had to hit more than a certain threshold of sstables, we can eagerly rewrite the result into a new sstable, and 2498 will allow later reads to access only that file. This basic approach would improve read performance considerably, but it would write a lot of duplicate data and would make compaction all the more necessary.
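
A minimal sketch of the read-time trigger described above, assuming a toy model in which each sstable holds a few columns of a single row. All class, field, and method names are hypothetical and are not Cassandra APIs. The early exit in the read loop is only a stand-in for the timestamp-based elimination of CASSANDRA-2498, which is not modelled, and the threshold value is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; none of these names are Cassandra APIs.
class SupersedeOnReadSketch {
    // Assumed tunable: supersede when a read consulted more than this many sstables.
    private final int threshold;
    // Immutable "sstables" for a single row, oldest first; each maps column name -> value.
    private final List<Map<String, String>> sstables = new ArrayList<>();
    int lastTouched = 0; // sstables consulted by the most recent read
    int rewrites = 0;    // number of superseding writes performed so far

    SupersedeOnReadSketch(int threshold) {
        this.threshold = threshold;
    }

    // Append a new sstable holding a copy of the given columns.
    void flush(Map<String, String> columns) {
        sstables.add(new HashMap<>(columns));
    }

    // Name-based read: walk sstables newest-first and stop once every requested
    // column has been found (a stand-in for CASSANDRA-2498, not the real logic).
    Map<String, String> read(Set<String> names) {
        Map<String, String> result = new HashMap<>();
        Set<String> missing = new HashSet<>(names);
        lastTouched = 0;
        for (int i = sstables.size() - 1; i >= 0 && !missing.isEmpty(); i--) {
            lastTouched++;
            for (Map.Entry<String, String> e : sstables.get(i).entrySet()) {
                if (missing.remove(e.getKey())) {
                    result.put(e.getKey(), e.getValue());
                }
            }
        }
        // Supersede: the read succeeded but was expensive, so write the merged
        // result as the newest sstable for later reads to hit first.
        if (!result.isEmpty() && lastTouched > threshold) {
            flush(result);
            rewrites++;
        }
        return result;
    }

    int sstableCount() {
        return sstables.size();
    }

    public static void main(String[] args) {
        SupersedeOnReadSketch row = new SupersedeOnReadSketch(2);
        row.flush(Map.of("a", "1"));
        row.flush(Map.of("b", "2"));
        row.flush(Map.of("c", "3"));
        Set<String> names = Set.of("a", "b", "c");
        row.read(names);
        System.out.println("first read touched " + row.lastTouched + " sstables");
        row.read(names);
        System.out.println("second read touched " + row.lastTouched + " sstables");
    }
}
```

In this model the first read pays for every fragment and triggers one rewrite; the second read is served from the superseding sstable alone. The extra sstable is the duplicate data that compaction later has to clean up.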

To augment the basic idea: if we marked data as superseded at the moment we superseded it, the next compaction that touched the old file could remove that data. Since our file format is immutable, the values that a particular sstable superseded could be recorded in a component of that sstable. If we always supersede at the "block" level (as defined by CASSANDRA-674 or CASSANDRA-47), then the list of superseded blocks could be represented using a generation number and a bitmap of block numbers. Since 2498 would already allow sstables to be eliminated based on timestamps, this information would probably only be used at compaction time (by loading all of the superseding information in the system for the sstables that are being compacted).
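
One possible in-memory shape for the "generation number plus bitmap of block numbers" record, as a rough sketch. The ticket does not define a format, so the class and method names below are hypothetical; `java.util.BitSet` is used for the bitmap.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical structure; the ticket does not specify an on-disk or in-memory format.
class SupersededBlocksSketch {
    // Generation of the superseded sstable -> bitmap of its superseded block numbers.
    private final Map<Integer, BitSet> byGeneration = new HashMap<>();

    // Record that a block of an older sstable has been superseded by this sstable.
    void markSuperseded(int generation, int blockNumber) {
        byGeneration.computeIfAbsent(generation, g -> new BitSet()).set(blockNumber);
    }

    // Compaction would ask this before copying a block forward.
    boolean isSuperseded(int generation, int blockNumber) {
        BitSet blocks = byGeneration.get(generation);
        return blocks != null && blocks.get(blockNumber);
    }

    // At compaction time, combine the superseding information from several sstables.
    void mergeFrom(SupersededBlocksSketch other) {
        for (Map.Entry<Integer, BitSet> e : other.byGeneration.entrySet()) {
            byGeneration.computeIfAbsent(e.getKey(), g -> new BitSet()).or(e.getValue());
        }
    }
}
```

A compaction over generation 7, for example, would merge the records of all sstables in the system and skip every block of generation 7 for which `isSuperseded(7, block)` returns true.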

Initially described on [1608|https://issues.apache.org/jira/secure/EditComment!default.jspa?id=12477095&commentId=12920353].

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137305#comment-13137305 ] 

Sylvain Lebresne commented on CASSANDRA-2503:
---------------------------------------------

+1 on the technical side

I'm still far from excited by this because I'm neither convinced that this will be very useful (especially with the max timestamp optimization) nor that it won't be counterproductive in some cases. But for the same reasons I'm not opposing it either if others are more convinced. 


                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503-v2.txt, 2503.txt

[jira] [Updated] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2503:
--------------------------------------

    Attachment: 2503-v2.txt

v2 attached w/ commitlog optimization
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503-v2.txt, 2503.txt

[jira] [Commented] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155917#comment-13155917 ] 

Hudson commented on CASSANDRA-2503:
-----------------------------------

Integrated in Cassandra #1219 (See [https://builds.apache.org/job/Cassandra/1219/])
    "defragment" rows for name-based queries under STCS, again
patch by jbellis; reviewed by slebresne for CASSANDRA-2503

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1205403
Files : 
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/db/CollationController.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Table.java

                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.1
>
>         Attachments: 2503-v2.txt, 2503-v3.txt, 2503.txt

[jira] [Updated] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2503:
--------------------------------------

    Attachment: 2503-v2.txt
    
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503-v2.txt, 2503.txt

[jira] [Commented] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133942#comment-13133942 ] 

Sylvain Lebresne commented on CASSANDRA-2503:
---------------------------------------------

On the technical side:
* we probably should skip the commit log (by using Table.apply(rm, false) directly).
* what is the reason for limiting this to SizeTieredCompaction?

On the idea itself, I won't hide that I'm less than enthusiastic. It feels to me like the wrong fix for the 'compaction is behind' problem. This will basically be triggered when compaction is behind, but it only solves the problem temporarily, and it does so by adding more pressure on compaction. I'd really like us to benchmark/evaluate this before adding it, because I fear there are scenarios where it will do more harm than good.
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503.txt

[jira] [Resolved] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2503.
---------------------------------------

    Resolution: Fixed

committed w/ both changes
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.1
>
>         Attachments: 2503-v2.txt, 2503-v3.txt, 2503.txt

[jira] [Updated] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding / defragmenting")

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2503:
--------------------------------------

    Summary: Eagerly re-write data at read time ("superseding / defragmenting")  (was: Eagerly re-write data at read time ("superseding"))
    
> Eagerly re-write data at read time ("superseding / defragmenting")
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.1.0
>
>         Attachments: 2503-v2.txt, 2503-v3.txt, 2503.txt

[jira] [Commented] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137309#comment-13137309 ] 

Jonathan Ellis commented on CASSANDRA-2503:
-------------------------------------------

It's a pretty simple piece of code, and it's trivially clear that if max timestamp makes it unnecessary, then it's simply a no-op. :)
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503-v2.txt, 2503.txt

[jira] [Updated] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2503:
--------------------------------------

    Attachment:     (was: 2503-v2.txt)
    
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503-v2.txt, 2503.txt

[jira] [Updated] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2503:
--------------------------------------

    Attachment: 2503.txt

Straightforward patch attached.
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503.txt

[jira] [Reopened] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reopened CASSANDRA-2503:
---------------------------------------

    
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.1
>
>         Attachments: 2503-v2.txt, 2503.txt

[jira] [Updated] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2503:
--------------------------------------

    Attachment: 2503-v3.txt

v3 adds "boolean updateIndexes" to Table.apply; this is safe to turn off for the defragment write, since we're updating with exactly the existing data, timestamp and all, so the existing index entries already match.

Also adds a check for {{cfs.getMinimumCompactionThreshold() > 0}}.
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.1
>
>         Attachments: 2503-v2.txt, 2503-v3.txt, 2503.txt

[jira] [Updated] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2503:
--------------------------------------

    Fix Version/s:     (was: 1.0.2)
                   1.1

Reopening for 1.1, since we reverted it out of 1.0.3 in CASSANDRA-3491.
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.1
>
>         Attachments: 2503-v2.txt, 2503.txt

[jira] [Commented] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155751#comment-13155751 ] 

Sylvain Lebresne commented on CASSANDRA-2503:
---------------------------------------------

Two small nits:
* I would prefer using {{sstablesIterated > cfs.getMinimumCompactionThreshold()}} rather than {{>=}}. I guess I'm afraid that this 'limit worst-case' behavior would get triggered too often. Typically, for minThreshold == 2, hoisting as soon as we hit more than 1 sstable feels a bit too much. Again, I have no big argument; it just feels a tad more reasonable with {{>}}.
* We probably should use {{!CFS.isCompactionDisabled()}} instead of {{cfs.getMinimumCompactionThreshold() > 0}}

But those minor nits apart, patch lgtm. +1 with or without the changes.
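
To make the difference between the two comparisons concrete, here is a minimal sketch. It is not the code from the attached patches: the class {{DefragmentCheck}}, both method names, and the {{compactionDisabled}} parameter are hypothetical stand-ins for the values discussed in the two nits above.

```java
// Hypothetical illustration only; names are not taken from the Cassandra patch.
final class DefragmentCheck {
    private DefragmentCheck() {}

    /** Strict variant: rewrite only when the read touched MORE sstables than the threshold. */
    static boolean shouldDefragment(int sstablesIterated,
                                    int minCompactionThreshold,
                                    boolean compactionDisabled) {
        return !compactionDisabled && sstablesIterated > minCompactionThreshold;
    }

    /** Inclusive variant: rewrite as soon as the read touched as many sstables as the threshold. */
    static boolean shouldDefragmentInclusive(int sstablesIterated,
                                             int minCompactionThreshold,
                                             boolean compactionDisabled) {
        return !compactionDisabled && sstablesIterated >= minCompactionThreshold;
    }
}
```

With a threshold of 2, the inclusive variant already fires for a read that touched 2 sstables, while the strict variant waits until 3. Both variants do nothing when compaction is disabled.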
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.1
>
>         Attachments: 2503-v2.txt, 2503-v3.txt, 2503.txt

[jira] [Commented] (CASSANDRA-2503) Eagerly re-write data at read time ("superseding")

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134095#comment-13134095 ] 

Jonathan Ellis commented on CASSANDRA-2503:
-------------------------------------------

Good point on skipping the commitlog.

The reason to limit to STC is that I don't think of this as a band-aid for compaction-is-behind (although I suppose it accomplishes that as well) so much as a limit on the worst-case behavior; even when STC is "fully" compacted (i.e. not major compacted but there is nothing left for the bucketing to do) you can have an arbitrary number of sstables contain columns for a given row.  Thus, I think using min_compaction_threshold as the cutoff here makes a lot of sense.
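
The "fully compacted" case can be illustrated with a simplified model of size-tiered bucketing. This is a sketch under assumptions, not Cassandra's implementation: the class {{SizeTieredSketch}} and its methods are hypothetical, sstables are represented only by their sizes in arbitrary units, and a file is assumed to join a bucket when its size is within 0.5x to 1.5x of that bucket's average.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of size-tiered bucketing; not Cassandra's code.
final class SizeTieredSketch {
    private SizeTieredSketch() {}

    /** Groups sstable sizes into buckets of similar size (assumed 0.5x to 1.5x of the bucket average). */
    static List<List<Long>> buckets(List<Long> sstableSizes) {
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Long> current = null;
        double average = 0;
        for (long size : sorted) {
            boolean similar = current != null
                    && size >= 0.5 * average
                    && size <= 1.5 * average;
            if (!similar) {
                // Too different from the current bucket: start a new one.
                current = new ArrayList<>();
                buckets.add(current);
            }
            current.add(size);
            // Recompute the running average of the bucket we just added to.
            long total = 0;
            for (long s : current) {
                total += s;
            }
            average = (double) total / current.size();
        }
        return buckets;
    }

    /** Counts buckets holding at least minThreshold sstables, i.e. buckets compaction would pick up. */
    static int compactableBuckets(List<Long> sstableSizes, int minThreshold) {
        int count = 0;
        for (List<Long> bucket : buckets(sstableSizes)) {
            if (bucket.size() >= minThreshold) {
                count++;
            }
        }
        return count;
    }
}
```

In this model, five sstables of sizes 1, 4, 16, 64 and 256 each land in their own bucket, so with a minimum threshold of 4 there is nothing left to compact. A row with columns in all five files would still need five sstable reads, which is the worst case the cutoff is meant to bound.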
                
> Eagerly re-write data at read time ("superseding")
> --------------------------------------------------
>
>                 Key: CASSANDRA-2503
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2503
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Jonathan Ellis
>              Labels: compaction, performance
>             Fix For: 1.0.1
>
>         Attachments: 2503.txt