Posted to commits@cassandra.apache.org by "Edward Capriolo (JIRA)" <ji...@apache.org> on 2012/08/22 01:46:38 UTC

[jira] [Created] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Edward Capriolo created CASSANDRA-4565:
------------------------------------------

             Summary: TTL columns with older then gcgrace do not need to flush
                 Key: CASSANDRA-4565
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4565
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Edward Capriolo
            Assignee: Edward Capriolo


With memcache many people are willing to sacrifice durability for performance. Cassandra has a TimeToLive feature that can be used in caching scenarios with low values for gc_grace_seconds. However, from a code dive it seems that Cassandra will always write TTL columns to disk, even those that are beyond gc_grace_seconds. If a user has very large memtables, a small TTL, and a low gc_grace, it is possible that writing memtables can be skipped entirely in some scenarios.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Reopened] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko reopened CASSANDRA-4565:
------------------------------------------

    
> TTL columns with older then gcgrace do not need to flush
> --------------------------------------------------------
>
>                 Key: CASSANDRA-4565
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4565
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Edward Capriolo
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.3
>
>         Attachments: cassandra-4565.patch.1.txt
>
>
> With memcache many people are willing to sacrifice durability for performance. Cassandra has a TimeToLive feature that can be used in caching scenarios with low values for gc_grace_seconds. However, from a code dive it seems that Cassandra will always write TTL columns to disk, even those that are beyond gc_grace_seconds. If a use case has very large memtables, a small TTL, and a small gc_grace, it is possible that flushing these columns to disk can be skipped entirely in some scenarios.


[jira] [Updated] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated CASSANDRA-4565:
---------------------------------------

    Fix Version/s: 1.3
    

        

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453637#comment-13453637 ] 

Edward Capriolo commented on CASSANDRA-4565:
--------------------------------------------

I was originally thinking that TTL columns would have less of a chance of a lost delete issue, since if they exist they will self-delete. Not sure how this will work if a TTL column with a shorter TTL shadows a TTL column with a longer TTL. Then lost update might be an issue.
                

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453697#comment-13453697 ] 

Aleksey Yeschenko commented on CASSANDRA-4565:
----------------------------------------------

bq. Not sure how this will work if a TTL column with a shorter TTL shadows a TTL column with a longer TTL. Then lost update might be an issue.

Or a column with a TTL shadowing a regular column with no TTL at all. This TTL change will be lost.
I'm certain that nothing can be done about it. Turning them into tombstones is an option though.
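
To make the shadowing concern concrete, here is a minimal sketch in Java. It assumes a toy last-write-wins model; SketchCell, ShadowingSketch, reconcile and read are hypothetical names for illustration and are not Cassandra's classes. It compares a read when the expired TTL column was flushed with a read when the flush skipped it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for one version of a column; not Cassandra's real Column class. */
final class SketchCell {
    final String value;    // payload of this version
    final long timestamp;  // write timestamp used for reconciliation
    final int expiresAt;   // time (seconds) at which the TTL ends; Integer.MAX_VALUE means no TTL

    SketchCell(String value, long timestamp, int expiresAt) {
        this.value = value;
        this.timestamp = timestamp;
        this.expiresAt = expiresAt;
    }

    boolean isLive(int nowSeconds) {
        return nowSeconds < expiresAt;
    }
}

final class ShadowingSketch {
    /** Last-write-wins: the version with the highest timestamp shadows the others. */
    static SketchCell reconcile(List<SketchCell> versions) {
        SketchCell winner = null;
        for (SketchCell c : versions) {
            if (winner == null || c.timestamp > winner.timestamp) {
                winner = c;
            }
        }
        return winner;
    }

    /** What a reader sees at nowSeconds: the winning value, or null if the winner has expired. */
    static String read(List<SketchCell> versions, int nowSeconds) {
        SketchCell winner = reconcile(versions);
        return (winner != null && winner.isLive(nowSeconds)) ? winner.value : null;
    }

    public static void main(String[] args) {
        SketchCell onDisk = new SketchCell("old", 1L, Integer.MAX_VALUE); // regular column already in an sstable
        SketchCell inMemtable = new SketchCell("new", 2L, 100);           // newer TTL column, expires at t=100
        int now = 200;                                                    // the TTL column has expired

        // Case 1: the expired column was flushed, so it still shadows the older value.
        List<SketchCell> flushed = new ArrayList<>();
        flushed.add(onDisk);
        flushed.add(inMemtable);
        System.out.println(read(flushed, now)); // prints "null"

        // Case 2: the flush skipped the expired column, so the older value reappears.
        List<SketchCell> skipped = new ArrayList<>();
        skipped.add(onDisk);
        System.out.println(read(skipped, now)); // prints "old"
    }
}
```

In the second case the overwrite is lost and the old value comes back, which is the lost update described above.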

[~jbellis] there is a solution for 4542, but only because we know that if a row is in the memtable it can't exist in any of the sstables. I can implement that at least.
                

[jira] [Updated] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko updated CASSANDRA-4565:
-----------------------------------------

    Reviewer: jbellis
    

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453550#comment-13453550 ] 

Aleksey Yeschenko commented on CASSANDRA-4565:
----------------------------------------------

Currently ExpiringColumns are replaced with DeletedColumns in ExpiringColumn#create method, which is called when deserializing columns during sstable reads.
So expired columns with ttl are being turned into tombstones during compaction, but NOT at memtable flush time.
We can turn them into tombstones at flush time too - this should save some space but will require additional logic.
But we can't just dismiss them completely no matter how small gcgs is - for the same reason that we still write row-level tombstones that are beyond gcgs period.
Which is that it can result in unexpected behaviour where deletes never make it to disk, as they are lost, and so cannot override existing column values in existing sstables.

Do you want to leave it as is or add [expired column with ttl -> column tombstone] conversion at memtable flush time?
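
For readers unfamiliar with the conversion described above, a rough sketch of the idea in Java follows. The class names, the simplified create signature, and the boundary condition are assumptions made for illustration; the real ExpiringColumn#create in the Cassandra code base takes different parameters.

```java
/** Hypothetical, simplified column model; not the classes in org.apache.cassandra.db. */
abstract class SketchColumn {
    final String name;
    final long timestamp;

    SketchColumn(String name, long timestamp) {
        this.name = name;
        this.timestamp = timestamp;
    }

    abstract boolean isTombstone();
}

/** Tombstone: keeps the name and timestamp so it can still shadow older values. */
final class SketchDeletedColumn extends SketchColumn {
    final int localDeletionTime;

    SketchDeletedColumn(String name, long timestamp, int localDeletionTime) {
        super(name, timestamp);
        this.localDeletionTime = localDeletionTime;
    }

    @Override
    boolean isTombstone() {
        return true;
    }
}

/** Column with a TTL that has not yet elapsed. */
final class SketchExpiringColumn extends SketchColumn {
    final String value;
    final int localExpirationTime;

    private SketchExpiringColumn(String name, String value, long timestamp, int localExpirationTime) {
        super(name, timestamp);
        this.value = value;
        this.localExpirationTime = localExpirationTime;
    }

    @Override
    boolean isTombstone() {
        return false;
    }

    /**
     * Factory used when a column is read back: a column whose expiration time is
     * before expireBefore is returned as a tombstone instead of a live column.
     */
    static SketchColumn create(String name, String value, long timestamp,
                               int localExpirationTime, int expireBefore) {
        if (localExpirationTime >= expireBefore) {
            return new SketchExpiringColumn(name, value, timestamp, localExpirationTime);
        }
        // The value is dropped, but the write timestamp survives in the tombstone.
        return new SketchDeletedColumn(name, timestamp, localExpirationTime);
    }

    public static void main(String[] args) {
        int expireBefore = 150;
        System.out.println(create("c", "v", 1L, 200, expireBefore).isTombstone()); // prints "false"
        System.out.println(create("c", "v", 1L, 100, expireBefore).isTombstone()); // prints "true"
    }
}
```

The point of the sketch is that the tombstone keeps the original write timestamp, so it can still override older values in existing sstables, which simply dropping the column would not do.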
                

[jira] [Assigned] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko reassigned CASSANDRA-4565:
--------------------------------------------

    Assignee: Aleksey Yeschenko  (was: Edward Capriolo)
    

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453725#comment-13453725 ] 

Jonathan Ellis commented on CASSANDRA-4565:
-------------------------------------------

You're right, can't really generalize this...  still worth special casing batchlog probably.
                

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439229#comment-13439229 ] 

Jonathan Ellis commented on CASSANDRA-4565:
-------------------------------------------

CASSANDRA-4542 calls for a generalization of this, btw.
                

        

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439190#comment-13439190 ] 

Edward Capriolo commented on CASSANDRA-4565:
--------------------------------------------

Nevermind, from a code dive I see cf.maybeResetDeletionTimes(gcBefore); converts ExpiringColumns to DeletedColumns before this method.
                

        

[jira] [Updated] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated CASSANDRA-4565:
---------------------------------------

    Attachment: cassandra-4565.patch.1.txt

First attempt at a patch. Test passes, but we will likely refine this later.
                

        

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443295#comment-13443295 ] 

Edward Capriolo commented on CASSANDRA-4565:
--------------------------------------------

I noticed this because the method I added had loggers that I never saw in the output. What I saw happening was that ExpiringColumns past their TTL never made it into the flush. After some tracing I came to the conclusion that maybeResetDeletionTimes takes the TTL into account and converts them to DeletedColumns, which do not get flushed.

Also, if you take the unit test I added, it passes without modification to the code base, so unless my test is not doing what I think it is, it seems like Cassandra is already handling this.

If you are finding this not to be the case we can re-open and I will take a look. 
                

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453842#comment-13453842 ] 

Aleksey Yeschenko commented on CASSANDRA-4565:
----------------------------------------------

bq. So we would need to do an iteration just for that purpose, and given that having expired columns during flush is a corner case, it would cost more than it would give us.

That's what I thought as well. Closing the issue then.
                

[jira] [Resolved] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Yeschenko resolved CASSANDRA-4565.
------------------------------------------

    Resolution: Not A Problem
    

[jira] [Updated] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated CASSANDRA-4565:
---------------------------------------

    Description: With memcache many people are willing to sacrifice durability for performance. Cassandra has a TimeToLive feature that can be used in caching scenarios with low values for gc_grace_seconds. However, from a code dive it seems that Cassandra will always write TTL columns to disk, even those that are beyond gc_grace_seconds. If a use case has very large memtables, a small TTL, and a small gc_grace, it is possible that flushing these columns to disk can be skipped entirely in some scenarios.   (was: With memcache many people are willing to sacrifice durability for performance. Cassandra has a TimeToLive feature that can be used in caching scenarios with low values for gc_grace_seconds. However, from a code dive it seems that Cassandra will always write TTL columns to disk, even those that are beyond gc_grace_seconds. If a user has very large memtables, a small TTL, and a low gc_grace, it is possible that writing memtables can be skipped entirely in some scenarios.)
    

        

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453838#comment-13453838 ] 

Sylvain Lebresne commented on CASSANDRA-4565:
---------------------------------------------

bq. Do you think expired ttl columns should be replaced with tombstones at memtable flush?

No, I'm even pretty sure it would be a bad idea. Currently the code does two iterations over a row to flush it: first it computes the row's serialized size (to write that at the beginning of the row), then it actually writes it. We should *not* transform expired columns into tombstones during the 2nd iteration because it would screw up the serialized size computation. And the first iteration is just ill-suited too, because doing that transformation in the serializedSize() method would be a big hack. So we would need to do an iteration just for that purpose, and given that having expired columns during flush is a corner case, it would cost more than it would give us.

If we remove the row serialized size (and column count) in the sstable format (which we may at some point), then we can revisit as it will be trivial then.
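
As an illustration of the size problem, the sketch below uses a toy row layout (an 8-byte size header followed by length-prefixed columns). The layout and the names TwoPassFlushSketch, serializedSize, writeRow and headerMatchesBody are assumptions for this example only, not the actual sstable format or Cassandra code. It shows the header going stale when columns are shrunk during the second pass.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class TwoPassFlushSketch {
    /** Pass 1: total bytes the columns will occupy, each with a 4-byte length prefix. */
    static long serializedSize(List<byte[]> columns) {
        long size = 0;
        for (byte[] column : columns) {
            size += 4 + column.length;
        }
        return size;
    }

    /**
     * Pass 2: write the size header, then the columns. When shrinkDuringWrite is true,
     * each column is replaced by an empty payload, standing in for "expired column
     * turned into a smaller tombstone" after the size was already computed.
     */
    static byte[] writeRow(List<byte[]> columns, boolean shrinkDuringWrite) {
        long declaredSize = serializedSize(columns);
        ByteBuffer buffer = ByteBuffer.allocate((int) (8 + declaredSize));
        buffer.putLong(declaredSize);
        for (byte[] column : columns) {
            byte[] toWrite = shrinkDuringWrite ? new byte[0] : column;
            buffer.putInt(toWrite.length);
            buffer.put(toWrite);
        }
        return Arrays.copyOf(buffer.array(), buffer.position());
    }

    /** True when the size header equals the number of bytes that follow it. */
    static boolean headerMatchesBody(byte[] row) {
        long declaredSize = ByteBuffer.wrap(row).getLong();
        return declaredSize == row.length - 8;
    }

    public static void main(String[] args) {
        List<byte[]> columns = Arrays.asList(
            "value-1".getBytes(StandardCharsets.UTF_8),
            "value-2".getBytes(StandardCharsets.UTF_8));

        System.out.println(headerMatchesBody(writeRow(columns, false))); // prints "true"
        System.out.println(headerMatchesBody(writeRow(columns, true)));  // prints "false"
    }
}
```

Keeping the header correct would require applying the same transformation before or inside the size computation, which is the extra iteration (or the hack in serializedSize()) discussed above.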
                

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Aleksey Yeschenko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453801#comment-13453801 ] 

Aleksey Yeschenko commented on CASSANDRA-4565:
----------------------------------------------

Do you think expired ttl columns should be replaced with tombstones at memtable flush? (I'm leaning towards "no"). If you agree then I'm closing this task.
                

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439604#comment-13439604 ] 

Jonathan Ellis commented on CASSANDRA-4565:
-------------------------------------------

bq. Nevermind

Does that mean "never mind, this patch doesn't work," or "never mind, this doesn't need refinement after all?" :)
                
> TTL columns with older then gcgrace do not need to flush
> --------------------------------------------------------
>
>                 Key: CASSANDRA-4565
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4565
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>             Fix For: 1.3
>
>         Attachments: cassandra-4565.patch.1.txt
>
>
> With memcache many people are willing to sacrifice durability for performance. Cassandra has a TimeToLive feature that can be used in caching scenarios with low values for gc_grace_seconds. However, from a code dive it seems that Cassandra will always write TTL columns to disk, even those that are beyond gc_grace_seconds. If a use case has very large memtables, a small TTL, and a small gc_grace, it is possible that flushing these columns to disk can be skipped entirely in some scenarios.


        

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439712#comment-13439712 ] 

Edward Capriolo commented on CASSANDRA-4565:
--------------------------------------------

Never mind means this does not need to be done.

A method called earlier in the flush process, cf.maybeResetDeletionTimes(gcBefore), converts ExpiredColumns past their grace period into DeletedColumns, and then they do not get flushed.

We could keep the test case to be sure this functionality keeps working.
Also, the condition
c.getLocalDeletionTime() < gcBefore
makes no sense to me: gcBefore is defined as Integer.MIN_VALUE, so this condition looks like it can never be met.
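
The condition quoted above can be checked in isolation. The sketch below is only an illustration: pastGrace() and gcBefore() are hypothetical helpers, and deriving gcBefore as "now minus gc_grace_seconds" is an assumption about the intended semantics, not a quote from the patch.

```java
public class GcBeforeSketch {

    /** Hypothetical helper: gcBefore as "now minus gc_grace_seconds" (an assumption). */
    static int gcBefore(int nowSeconds, int gcGraceSeconds) {
        return nowSeconds - gcGraceSeconds;
    }

    /** The comparison from the comment, lifted out of its context. */
    static boolean pastGrace(int localDeletionTime, int gcBefore) {
        return localDeletionTime < gcBefore;
    }

    public static void main(String[] args) {
        int now = 1_345_593_600;   // arbitrary epoch seconds
        int expiredAt = now - 120; // column expired two minutes ago

        // No int is smaller than Integer.MIN_VALUE, so both checks are false.
        System.out.println(pastGrace(expiredAt, Integer.MIN_VALUE));         // false
        System.out.println(pastGrace(Integer.MIN_VALUE, Integer.MIN_VALUE)); // false

        // With a gcBefore derived from a 60-second grace period, the column qualifies.
        System.out.println(pastGrace(expiredAt, gcBefore(now, 60)));         // true
    }
}
```

So if gcBefore really is Integer.MIN_VALUE on this code path, the comparison is false for every possible local deletion time and nothing would be treated as past its grace period.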
                

        

[jira] [Commented] (CASSANDRA-4565) TTL columns with older then gcgrace do not need to flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443214#comment-13443214 ] 

Jonathan Ellis commented on CASSANDRA-4565:
-------------------------------------------

bq. The method called earlier in the flush process. cf.maybeResetDeletionTimes(gcBefore);
converts ExpiredColumns past their grace into DeletedColumns and then they do not get flushed

Really?  We have a special case for {{if (cf.isMarkedForDelete())}} but I don't think we do this in general.
                