Posted to commits@cassandra.apache.org by "Jonathan Ellis (Created) (JIRA)" <ji...@apache.org> on 2012/02/28 16:45:46 UTC

[jira] [Created] (CASSANDRA-3974) Per-CF TTL

Per-CF TTL
----------

                 Key: CASSANDRA-3974
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Jonathan Ellis
            Priority: Minor


Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246621#comment-13246621 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

Thanks Kirk!

My comments:

- Looks like this only updates the CQL path?  We'd want to make the Thrift path cf-ttl-aware as well.  I *think* this just means updating RowMutation + CF addColumn methods.
- Nit: we could simplify getTTL a bit by adding assert ttl > 0.
- I got it backwards: we want max(cf ttl, column ttl) to be able to reason about the live-ness of CF data w/o looking at individual rows
- We can break the compaction optimizations into another ticket.  It really needs a separate compaction strategy; the idea is if we have an sstable A older than CF ttl, then all the data in the file is dead and we can just delete the file without looking at it row-by-row.  However, there's a lot of tension there with the goal of normal compaction, which wants to merge different versions of the same row, so we're going to churn a lot with a low chance of ever having an sstable last the full TTL without being merged, effectively restarting our timer.  So, I think we're best served by an ArchivingCompactionStrategy that doesn't merge sstables at all, just drops obsolete ones, and let people use that for append-only insert workloads.  Which is a common enough case that it's worth the trouble... probably. :)
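
The drop-only idea in the last point can be sketched roughly as follows. This is an illustration only, not Cassandra's compaction API: {{SSTableInfo}}, {{maxWriteTimeSeconds}} and {{droppable}} are hypothetical names, and the sketch assumes that no column is allowed to outlive the CF ttl, so a file whose newest write is older than the CF ttl holds only dead data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an sstable's metadata; not a Cassandra class.
final class SSTableInfo
{
    final String name;
    final long maxWriteTimeSeconds; // newest write in the file, in epoch seconds

    SSTableInfo(String name, long maxWriteTimeSeconds)
    {
        this.name = name;
        this.maxWriteTimeSeconds = maxWriteTimeSeconds;
    }
}

final class ExpiredSSTableSketch
{
    // Returns the sstables whose newest write is at least cfTtlSeconds old.
    // Assumes no column outlives the CF ttl, so such files hold only dead data.
    static List<SSTableInfo> droppable(List<SSTableInfo> sstables, int cfTtlSeconds, long nowSeconds)
    {
        List<SSTableInfo> result = new ArrayList<SSTableInfo>();
        if (cfTtlSeconds <= 0)
            return result; // no CF ttl: nothing can be dropped wholesale
        for (SSTableInfo sstable : sstables)
        {
            if (nowSeconds - sstable.maxWriteTimeSeconds >= cfTtlSeconds)
                result.add(sstable);
        }
        return result;
    }
}
```

Nothing is merged here: a file is either kept whole or dropped whole, which is why the approach only suits append-only workloads.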
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285300#comment-13285300 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

If we're leaving QF cleanup for another ticket, is this done / ready for review then?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427314#comment-13427314 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

I'm sorry I'm a little late to the discussion, but I'm not sure I'm a fan of using the metadata TTL to decide expiration, because:
# It means we use the column timestamp to decide expiration. However, we have been very careful so far not to use the column timestamp as a server-side timestamp. In particular, the patch assumes the timestamp is in milliseconds, while most clients and CQL actually use microseconds.
# Altering the default TTL is imo more confusing that way, because we are pretending that altering the TTL will apply to all existing CF and columns, which itself suggests that if you want to remove everything older than, say, 1h, you can switch the TTL to 1h and then change it back right away to some other much longer value (or 0). But that's not the case, because the new TTL will be applied to existing data only when compaction happens. And I really don't think that user-visible behaviors should depend in any way on the timing of internal operations.
# This requires passing the CFMetaData in lots of places in the code, which isn't really nice. In particular, we should call isColumnExpiredFromDefaultTTL pretty much every time DeletionInfo.isDeleted() is called (after all, having an expired column is exactly the same as having a deleted one), and the current patch is missing quite a few places.
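
To make the unit problem from the first point concrete, here is a small sketch. It is not the patch's code; {{isExpiredAssumingMillis}} is a hypothetical helper that adds a TTL to a client-supplied timestamp as if that timestamp were in milliseconds.

```java
final class TimestampUnitSketch
{
    // Expiry check that treats the client-supplied column timestamp as milliseconds.
    static boolean isExpiredAssumingMillis(long columnTimestamp, int ttlSeconds, long nowMillis)
    {
        return nowMillis >= columnTimestamp + ttlSeconds * 1000L;
    }
}
```

If a client writes a column two hours ago with a microsecond timestamp, the stored value is roughly 1000 times larger than the server's millisecond clock, so a check like this with a 1h TTL keeps reporting the column as live.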

So I think I do prefer the idea of having the CF TTL just being the default TTL applied to columns when inserted if they don't have one. 
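
A minimal sketch of that alternative, assuming a TTL of 0 means "no TTL" ({{ttlOnInsert}} is a hypothetical name, not a method from the patch):

```java
final class DefaultTtlSketch
{
    // Returns the TTL to store with a newly inserted column.
    // A value of 0 means "no TTL" for both arguments and for the result.
    static int ttlOnInsert(int columnTtlSeconds, int cfDefaultTtlSeconds)
    {
        return columnTtlSeconds > 0 ? columnTtlSeconds : cfDefaultTtlSeconds;
    }
}
```

With this shape the CF value is only consulted at write time, so the read and compaction paths see ordinary expiring columns and need no CFMetaData.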

                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258846#comment-13258846 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

I mean, a CF ttl of X is useful only if it lets us reason that an sstable written more than X seconds ago is entirely expired.  So... min? :)
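
As a rough illustration of why min gives that guarantee (hypothetical helper, with 0 taken to mean "no TTL"): if every column's effective TTL is capped at the CF ttl, nothing written more than X seconds ago can still be live.

```java
final class EffectiveTtlSketch
{
    // Combines the two TTLs so that no column outlives the CF ttl.
    // A value of 0 means "no TTL".
    static int effectiveTtl(int columnTtlSeconds, int cfTtlSeconds)
    {
        if (cfTtlSeconds <= 0)
            return columnTtlSeconds;
        if (columnTtlSeconds <= 0)
            return cfTtlSeconds;
        return Math.min(columnTtlSeconds, cfTtlSeconds);
    }
}
```

With max instead, a single column carrying a longer TTL would keep an old sstable partially live, and the whole-file reasoning would no longer hold.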

Is describe caching the schema as in CASSANDRA-4052?

Agreed that if we want to allow altering CF ttl, keeping it separate from column ttl until we need to check for expired-ness makes the most sense.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427399#comment-13427399 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

bq. I guess it would be a good thing to have for CQL though by the same reasoning as CASSANDRA-4448.

Agreed.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Comment Edited] (CASSANDRA-3974) Per-CF TTL

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501412#comment-13501412 ] 

Jeremy Hanna edited comment on CASSANDRA-3974 at 11/20/12 7:49 PM:
-------------------------------------------------------------------

So this isn't going into 1.2 because it didn't apply cleanly to trunk?  I'm confused why the status is set to open and the target version is now set to 1.3.
                
      was (Author: jeromatron):
    So this isn't going into 1.2 because it didn't apply cleanly to trunk?  I'm confused why the status is set to open and the target version is not set to 1.3.
                  
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: 3974-v8.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278939#comment-13278939 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

Hmm. I think you've found a bug...
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Issue Comment Edited] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251561#comment-13251561 ] 

Jonathan Ellis edited comment on CASSANDRA-3974 at 4/11/12 1:42 PM:
--------------------------------------------------------------------

bq. Part of the code I changed was in CFMetaData's toThrift and fromThrift methods

Let me back up.  I can see two main approaches towards respecting the per-CF ttl:

# Set the column TTL to the max(column, CF) ttl on insert; then the rest of the code doesn't have to know anything changed
# Take max(column, CF) ttl during operations like compaction, and leave column ttl (which is to say, ExpiringColumn objects) to specify *only* the column TTL
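
The difference between the two options can be sketched like this. The method names are hypothetical and max is used only because that is what this comment proposes; the sketch shows where the CF ttl gets consulted, not how the patch is written.

```java
final class TtlApproachSketch
{
    // Option 1: fold the CF ttl into the column at insert time.
    static int storedTtlOption1(int columnTtl, int cfTtlAtInsert)
    {
        return Math.max(columnTtl, cfTtlAtInsert);
    }

    // Option 2: store only the column ttl and consult the current CF ttl when checking.
    static int effectiveTtlOption2(int storedColumnTtl, int cfTtlNow)
    {
        return Math.max(storedColumnTtl, cfTtlNow);
    }
}
```

For a column inserted with ttl 60 while the CF ttl is 3600, option 1 stores 3600 and keeps it even if the CF ttl is later altered to 7200, whereas option 2 stores 60 and yields 7200 once the CF ttl changes.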

The code in UpdateStatement led me to believe you're going with option 1.  So what I meant by my comment was, you need to make a similar change for inserts done over Thrift RPC, as well.  (to/from Thrift methods are used for telling Thrift clients about the schema, but are not used for insert/update operations.)

Does that help?

bq. Sorry, I'm not sure to which part of the code you're referring

CFMetadata.getTimeToLive.  Sounds like you addressed this anyway.
                
      was (Author: jbellis):
    bq. Part of the code I changed was in CFMetaData's toThrift and fromThrift methods

Let me back up.  I can see two main approaches towards respecting the per-CF ttl:

# Set the column TTL to the max(column, CF) ttl on insert; then the rest of the code doesn't have to know anything changed
# Take max(column, CF) ttl during operations like compaction, and leave column ttl to specify *only* the column TTL

The code in UpdateStatement led me to believe you're going with option 1.  So what I meant by my comment was, you need to make a similar change for inserts done over Thrift RPC, as well.  (to/from Thrift methods are used for telling Thrift clients about the schema, but are not used for insert/update operations.)

Does that help?

bq. Sorry, I'm not sure to which part of the code you're referring

CFMetadata.getTimeToLive.  Sounds like you addressed this anyway.
                  
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286113#comment-13286113 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

I still need to implement unit tests. Do you have any suggestions as to an existing class into which they could be incorporated and/or good examples to copy?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Comment Edited] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278414#comment-13278414 ] 

Kirk True edited comment on CASSANDRA-3974 at 5/17/12 11:52 PM:
----------------------------------------------------------------

In my case (inserting data and then calling {{list my_cf}} from the CLI), it goes through the {{RangeSliceCommand}} path which doesn't end up calling {{ColumnFamilyStore.getColumnFamily}}. As such, the expired-and-thus-should-be-ignored columns are still showing up.
                
      was (Author: kirktrue):
    In my case (inserting data and then calling {{list my_cf}} from the CLI), it goes through the {{RangeSliceCommand}} path which doesn't end up calling {{ColumnFamilyStore.getColumnFamily}}. As such, the expired-and-thus-should-be-ignored rows are still showing up.
                  
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269956#comment-13269956 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

My understanding is that in order to reduce potential user confusion when updating the column family's default TTL, we need to keep the column family's default TTL value separate. That is, we probably *don't* want to make {{ExpiringColumn}}s for a column family that has a default TTL (using {{min(CF TTL, column TTL)}} as the TTL value). Instead, we keep the logic as is and keep the column family's default TTL value in {{CFMetaData}}.

That's all fine and good, but looking at the code I'm not quite sure as to when we'd check the column family default TTL. It would seem that we need to pass a {{CFMetaData}} instance in to {{Column}}'s {{isMarkedForDelete}} so that it can perform logic such as:

{noformat}
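    public boolean isMarkedForDelete(CFMetaData metadata)
    {
        if (metadata.getDefaultTimeToLive() > 0)
        {
            // CF-based TTL: expire relative to the column timestamp.
            // Note: this assumes the timestamp is in milliseconds.
            // Multiply by 1000L so the seconds-to-millis conversion is done in long arithmetic.
            return System.currentTimeMillis() >= (timestamp + (metadata.getDefaultTimeToLive() * 1000L));
        }
        else
        {
            // No CF TTL: fall back to the column's own local deletion time (in seconds).
            return (int) (System.currentTimeMillis() / 1000) >= getLocalDeletionTime();
        }
    }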
{noformat}

Is this the correct line of thought? If so, that changes a couple of dozen call sites which makes me wonder if I'm doing something wrong :)
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278344#comment-13278344 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

On IRC Jonathan suggested looking at {{ColumnFamilyStore.removeDeleted}} and {{PrecompactedRow.removeDeletedAndOldShards}}. However, it doesn't _appear_ that either of these is called during column reads, so I can't rely on those to filter out results sent back to the client.

The logic that I see for filtering out results sent to the client is in places such as {{CassandraServer.thriftifyColumns}} via the {{IColumn.isMarkedForDelete}} call. However, as stated previously, since an {{IColumn}} doesn't internally store a {{CFMetaData}} object, we'd have to pass one in. {{isMarkedForDelete}} is used in a lot of places, so it has a ripple effect that causes a lot of changes.

Please advise.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289007#comment-13289007 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

CASSANDRA-4299 ({{removeDeleted}} clean up) blocks this, as reads aren't presently calling {{removeDeleted}} and columns aren't being filtered out.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Dave Brosius (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226702#comment-13226702 ] 

Dave Brosius commented on CASSANDRA-3974:
-----------------------------------------

What happens if a column in a ttl'ed column family has a ttl that's longer than the cf's ttl? Would that be allowed?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3974:
--------------------------------------

    Fix Version/s: 1.2
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 1.2
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226772#comment-13226772 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

It would have to be min(cf ttl, column ttl) to be useful.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500420#comment-13500420 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

v8 lgtm.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: 3974-v8.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258685#comment-13258685 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

There are a few pieces missing yet:

# The ability to alter a column family to change the default TTL option. Because I made the change to use max(column TTL, CF TTL) at column mutate time, altering the column family default TTL value will be "lost" on such columns.
# I'm fighting with Python to understand why 'DESCRIBE COLUMNFAMILY FOO' doesn't show the new default TTL value. I made the change in the Python and Java layers to accept this new option, but the describe fails to display it.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286138#comment-13286138 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

Filed CASSANDRA-4299 to handle the "QF cleanup" and 'putting removeDeleted on every path'.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421558#comment-13421558 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

Sylvain - the main issue is that the code isn't structured in such a way that a CFMetaData object is available.

Neither the code for QueryFilter.isRelevant nor its callers have access to a CFMetaData. Can you think of a way to get the CFMetaData in there or a different way to structure the code in general?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Robert Coli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Coli updated CASSANDRA-3974:
-----------------------------------

    Attachment: cassandra.1.0.10.replaying.log.after.exception.during.drain.txt
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: cassandra.1.0.10.replaying.log.after.exception.during.drain.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Comment Edited] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278414#comment-13278414 ] 

Kirk True edited comment on CASSANDRA-3974 at 5/17/12 11:52 PM:
----------------------------------------------------------------

In my case (inserting data and then calling {{list my_cf}} from the CLI), it goes through the {{RangeSliceCommand}} path which doesn't end up calling {{ColumnFamilyStorage.getColumnFamily}}. As such, the expired-and-thus-should-be-ignored rows are still showing up.
                
      was (Author: kirktrue):
    In my case (inserting data and then calling {{list my_cf}} from the CLI}}), it goes through the {{RangeSliceCommand}} path which doesn't end up calling {{ColumnFamilyStorage.getColumnFamily}}. As such, the expired-and-thus-should-be-ignored rows are still showing up.
                  
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Aleksey Vorona (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243831#comment-13243831 ] 

Aleksey Vorona commented on CASSANDRA-3974:
-------------------------------------------

Copying real life use cases which need this feature from the older bug ( CASSANDRA-3077 ):

1. I want one of my CFs not to store any data older than two months. It is a "notifications" CF, which is of no interest to the user past a certain point in time.
Currently I set a TTL with each insert into the CF, but since it is a constant, it makes sense to me to configure it in the CF definition so that it applies automatically to all rows in the CF.

2. A default TTL would be very helpful in Map/Reduce scenarios where you don't have direct control of the TTL (e.g., Hive).
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated CASSANDRA-3974:
---------------------------------

    Attachment: trunk-3974v7.txt

Rebase against trunk.

Privatizing the constructors causes a lot of collateral changes and forces the creation of a factory method that is IMO not very intuitive to the caller.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427343#comment-13427343 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

Well, if the goal is just to be able to drop entire sstables when we know everything is expired, we could compute and keep in the metadata the min TTL of the sstable. 
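
One reading of this idea, as a minimal sketch: while an sstable is written, record the latest expiration time of any column in it, and treat the file as droppable once that time has passed. Tracking the latest expiration time (rather than a TTL as such) is an assumption of this sketch, as are all names below; it also ignores gc grace and whether the file's data shadows anything in other sstables.

```java
// Hypothetical sketch; not the actual sstable metadata format or API.
final class SSTableExpirySketch
{
    /** Marker for a column with no TTL; such a column never expires. */
    static final long NEVER = Long.MAX_VALUE;

    // Latest expiration time (in seconds) seen among the columns recorded so far.
    private long maxExpiresAtSeconds = Long.MIN_VALUE;

    /** Called once per column as the sstable is written. */
    void recordColumn(long expiresAtSeconds)
    {
        maxExpiresAtSeconds = Math.max(maxExpiresAtSeconds, expiresAtSeconds);
    }

    /** True when every recorded column has expired, so the whole file could be dropped. */
    boolean isFullyExpired(long nowSeconds)
    {
        // A single never-expiring column keeps the whole file alive.
        return maxExpiresAtSeconds != NEVER && nowSeconds >= maxExpiresAtSeconds;
    }
}
```

The check is a single comparison per sstable, so no row-by-row scan is needed to decide whether the file is dead.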
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288953#comment-13288953 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

bq. when the user explicitly provides a column TTL longer than the default column family TTL, I would think we'd either want to a) give an error, or b) provide a warning

I'd be in favor of (a).
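
A minimal sketch of what option (a) could look like, assuming that a TTL of 0 means "not set" and that the check happens when the write is validated. The method name and the exception type are placeholders, not the ones Cassandra uses.

```java
// Hypothetical sketch; the method name and exception type are placeholders.
final class TtlValidationSketch
{
    /**
     * Returns the TTL to apply to a column, rejecting an explicit column TTL
     * that exceeds the column family default. A value of 0 means "not set".
     */
    static int validateTtl(int columnTtl, int cfDefaultTtl)
    {
        if (columnTtl < 0)
            throw new IllegalArgumentException("TTL must not be negative: " + columnTtl);

        // No explicit TTL: fall back to the column family default (possibly 0).
        if (columnTtl == 0)
            return cfDefaultTtl;

        // Option (a): an explicit TTL longer than the CF default is an error.
        if (cfDefaultTtl > 0 && columnTtl > cfDefaultTtl)
            throw new IllegalArgumentException(
                "column TTL " + columnTtl + " exceeds column family default TTL " + cfDefaultTtl);

        return columnTtl;
    }
}
```

Rejecting the write keeps the invariant that no column outlives the CF default, which is what lets the liveness of CF data be reasoned about without looking at individual columns.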
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402025#comment-13402025 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

About the removeDeleted problem: I think that trying to force calls to removeDeleted (which forces an iteration over all columns) so that we can add the logic of this ticket is the wrong approach, because it's inefficient for no good reason and doesn't make the code easier to follow. Currently the code to ignore irrelevant columns is split between QueryFilter.isRelevant() and removeDeleted, depending on which code path is taken (basically, reads use isRelevant and compaction uses removeDeleted). So I see two main options:
# we find a way to refactor the code so that we only ever ignore irrelevant columns in one place. That would be great, but again it's unclear how to do that correctly.
# we put the logic for this patch in both removeDeleted and isRelevant.

I'm personally fine going with the second solution for the purpose of this ticket, and keeping the first option in mind for later as a way to improve the code base.
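
The second option can be sketched as one shared predicate called from both places, so the rule itself lives in a single method even though two code paths use it. Everything below is hypothetical: the real isRelevant and removeDeleted have different signatures and operate on columns, not on bare write times.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these names and signatures are not Cassandra's.
final class CfTtlRelevanceSketch
{
    /** The single rule both paths share: has this write outlived the CF default TTL? */
    static boolean isExpiredByCfTtl(long writtenAtSeconds, int cfDefaultTtlSeconds, long nowSeconds)
    {
        // A default TTL of 0 means "no CF-level expiry".
        return cfDefaultTtlSeconds > 0 && nowSeconds >= writtenAtSeconds + cfDefaultTtlSeconds;
    }

    /** Stand-in for the read-path check: decides for one column at a time. */
    static boolean isRelevant(long writtenAtSeconds, int cfDefaultTtlSeconds, long nowSeconds)
    {
        return !isExpiredByCfTtl(writtenAtSeconds, cfDefaultTtlSeconds, nowSeconds);
    }

    /** Stand-in for the compaction-path check: filters a whole list of write times. */
    static List<Long> removeDeleted(List<Long> writeTimesSeconds, int cfDefaultTtlSeconds, long nowSeconds)
    {
        List<Long> live = new ArrayList<>();
        for (long writtenAt : writeTimesSeconds)
        {
            if (!isExpiredByCfTtl(writtenAt, cfDefaultTtlSeconds, nowSeconds))
                live.add(writtenAt);
        }
        return live;
    }
}
```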
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278414#comment-13278414 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

In my case (inserting data and then calling {{list my_cf}} from the CLI), it goes through the {{RangeSliceCommand}} path which doesn't end up calling {{ColumnFamilyStorage.getColumnFamily}}. As such, the expired-and-thus-should-be-ignored rows are still showing up.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278360#comment-13278360 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

CFS.removeDeleted is the one that's called during column reads. E.g., SliceByNamesReadCommand.getRow -> Table.getRow -> CFS.getColumnFamily -> CFS.removeDeleted
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Leonardo Stern (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288962#comment-13288962 ] 

Leonardo Stern commented on CASSANDRA-3974:
-------------------------------------------

As a user, I prefer option (a).
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501891#comment-13501891 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

Set to open: because patch has been reviewed, no need for it to keep showing up in Patch Available until there is a new one.

Target version 1.3: because we missed the window to make 1.2.0rc1.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: 3974-v8.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427346#comment-13427346 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

Hmm.  Now that you mention it, Yuki already added that in CASSANDRA-3442...
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Issue Comment Edited] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251966#comment-13251966 ] 

Kirk True edited comment on CASSANDRA-3974 at 4/11/12 9:57 PM:
---------------------------------------------------------------

In the initial patch, I had made changes to both {{UpdateStatement.addToMutation}} and {{ColumnFamily.addColumn}} to use the larger of the column's TTL or the column family default TTL. I tested against the {{cassandra-cli}} and {{cqlsh}} tools and both show the default TTL being used if none is specified.

This is all to say that it _looks_ like both the Thrift and CQL paths are working as expected. Perhaps it's high time I found the unit tests and added some...
                
      was (Author: kirktrue):
    I made changes to both {{UpdateStatement.addToMutation}} and {{ColumnFamily.addColumn}} to use the larger of the column's TTL or the column family default TTL. I tested against the {{cassandra-cli}} and {{cqlsh}} tools and both show the default TTL being used if none is specified.

This is all to say that it _looks_ like both the Thrift and CQL paths are working as expected. Perhaps it's high time I found the unit tests and added some...
                  
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251196#comment-13251196 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

Jonathan, thanks for the feedback.

I need a bit of clarification for a newbie hacking on the code...

bq. Looks like this only updates the CQL path? We'd want to make the Thrift path cf-ttl-aware as well. I think this just means updating RowMutation + CF addColumn methods.

I actually thought the opposite. Part of the code I changed was in {{CFMetaData}}'s {{toThrift}} and {{fromThrift}} methods. Perhaps I'm reading too much into the method names?

I also took a look at {{ColumnFamily}}'s {{addColumn}} method, and it already performs the conditional based on the TTL value.

bq. Nit: we could simplify getTTL a bit by adding assert ttl > 0.

Sorry, I'm not sure to which part of the code you're referring :( Can you elaborate?

bq.    I got it backwards: we want max(cf ttl, column ttl) to be able to reason about the live-ness of CF data w/o looking at individual rows

I cleaned up the {{CFMetaData.getTimeToLive}} method, which is now simply:

{noformat}
public int getTimeToLive(int timeToLive)
{
    return Math.max(defaultTimeToLive, timeToLive);
}
{noformat}
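
For illustration, a self-contained sketch of what this max rule implies for three kinds of writes, assuming a TTL of 0 means "not set". The wrapper class is hypothetical and only mirrors the method quoted above.

```java
// Hypothetical wrapper around the max(default, column) rule quoted above.
final class DefaultTtlSketch
{
    private final int defaultTimeToLive;

    DefaultTtlSketch(int defaultTimeToLive)
    {
        this.defaultTimeToLive = defaultTimeToLive;
    }

    int getTimeToLive(int timeToLive)
    {
        return Math.max(defaultTimeToLive, timeToLive);
    }

    public static void main(String[] args)
    {
        DefaultTtlSketch cf = new DefaultTtlSketch(3600);
        System.out.println(cf.getTimeToLive(0));    // no column TTL given: the CF default applies
        System.out.println(cf.getTimeToLive(60));   // shorter column TTL: raised to the CF default
        System.out.println(cf.getTimeToLive(7200)); // longer column TTL: kept as-is
    }
}
```

Note that with max, an explicit column TTL shorter than the default is silently raised to the default.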

bq.    We can break the compaction optimizations into another ticket. It really needs a separate compaction Strategy; the idea is if we have an sstable A older than CF ttl, then all the data in the file is dead and we can just delete the file without looking at it row-by-row. However, there's a lot of tension there with the goal of normal compaction, which wants to merge different versions of the same row, so we're going to churn a lot with a low chance of ever having an sstable last the full TTL without being merged, effectively restarting our timer. So, I think we're best served by a ArchivingCompactionStrategy that doesn't merge sstables at all, just drops obsolete ones, and let people use that for append-only insert workloads. Which is a common enough case that it's worth the trouble... probably.

Either way is fine. Would love to contribute.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated CASSANDRA-3974:
---------------------------------

    Attachment: trunk-3974v2.txt
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467776#comment-13467776 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

Sorry, I forgot about this one, so the patch needs rebasing. But from a cursory inspection, v5 looks ok, except maybe for the M/R support Jeremy suggested above (I'm not sure where the best place to add that is).

Also, CASSANDRA-3442 adds info that should help us optimize things by dropping fully expired sstables, but I'm not sure the optimization itself is implemented yet. Do we want to do that in this ticket or should we move that to later (I'm good either way)?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Issue Comment Edited] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269956#comment-13269956 ] 

Kirk True edited comment on CASSANDRA-3974 at 5/7/12 8:08 PM:
--------------------------------------------------------------

My understanding is that in order to reduce potential user confusion when updating the column family's default TTL, we need to keep the column family's default TTL value separate. That is, we probably *don't* want to make {{ExpiringColumn}} instances for a column family that has a default TTL (using {{min(CF TTL, column TTL)}} as the TTL value). Instead, we keep the logic as is and keep the column family's default TTL value in {{CFMetaData}}.

That's all fine and good, but looking at the code I'm not quite sure as to when we'd check the column family default TTL. It would seem that we need to pass a {{CFMetaData}} instance in to {{Column}}'s {{isMarkedForDelete}} so that it can perform logic such as:

{noformat}
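    public boolean isMarkedForDelete(CFMetaData metadata)
    {
        int defaultTtl = metadata.getDefaultTimeToLive();
        if (defaultTtl > 0)
        {
            // CF-based TTL: expired once the write timestamp plus the default TTL has passed.
            // Assumes timestamp is in milliseconds; 1000L avoids int overflow for TTLs over ~24 days.
            return System.currentTimeMillis() >= (timestamp + (defaultTtl * 1000L));
        }
        else
        {
            // No CF default TTL: fall back to the column's own local deletion time (in seconds).
            return (int) (System.currentTimeMillis() / 1000) >= getLocalDeletionTime();
        }
    }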
{noformat}

Is this the correct line of thought? If so, that changes a couple of dozen call sites which makes me wonder if I'm doing something wrong :)
                
      was (Author: kirktrue):
    My understanding is that in order to reduce potential user confusion when updating the column family's default TTL, we need to keep the column family's default TTL value separate. That is, we probably *don't* want to make {{ExpiringColumn}}s for a column family that has a default TTL (using {{min(CF TTL, column TTL)}} as the TTL value). Instead, we keep the logic as is and keep the column family's default TTL value in {{CFMetaData}}.

That's all fine and good, but looking at the code I'm not quite sure as to when we'd check the column family default TTL. It would seem that we need to pass a {{CFMetaData}} instance in to {{Column}}'s {{isMarkedForDelete}} so that it can perform logic such as:

{noformat}
    public boolean isMarkedForDelete(CFMetaData metadata)
    {
        if (metadata.getDefaultTimeToLive() > 0)
        {
            // Check if we're using a CF-based TTL.
            return System.currentTimeMillis() >= (timestamp + (metadata.getDefaultTimeToLive() * 1000));
        }
        else
        {
            return (int) (System.currentTimeMillis() / 1000) >= getLocalDeletionTime();
        }
    }
{noformat}

Is this the correct line of thought? If so, that changes a couple of dozen call sites which makes me wonder if I'm doing something wrong :)
                  
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434497#comment-13434497 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

I think Sylvain is right that that makes more sense...  sorry about the wild goose chase!
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494091#comment-13494091 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

Pinging for feedback.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated CASSANDRA-3974:
---------------------------------

    Attachment: trunk-3974v5.txt
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507627#comment-13507627 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

committed, thanks!
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: 3974-v8.txt, 3974-v9.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251966#comment-13251966 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

I made changes to both {{UpdateStatement.addToMutation}} and {{ColumnFamily.addColumn}} to use the larger of the column's TTL or the column family default TTL. I tested against the {{cassandra-cli}} and {{cqlsh}} tools and both show the default TTL being used if none is specified.

This is all to say that it _looks_ like both the Thrift and CQL paths are working as expected. Perhaps it's high time I found the unit tests and added some...
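
The "larger of the two TTLs" rule described above can be sketched as follows. This is only an illustration, not the code from the attached patch: the class {{EffectiveTtl}} and the method {{resolve}} are hypothetical names, and the sketch assumes that a TTL of 0 means "no TTL".

```java
// Hypothetical helper for illustration; not the actual Cassandra patch code.
final class EffectiveTtl
{
    private EffectiveTtl() {}

    /**
     * @param columnTtlSeconds    TTL supplied with the write, or 0 if none was given
     * @param cfDefaultTtlSeconds column family default TTL, or 0 if none is configured
     * @return the TTL to store with the column; 0 means the column never expires
     */
    static int resolve(int columnTtlSeconds, int cfDefaultTtlSeconds)
    {
        if (columnTtlSeconds < 0 || cfDefaultTtlSeconds < 0)
            throw new IllegalArgumentException("TTL values must be non-negative");
        // With 0 meaning "unset", max() falls back to the CF default when the
        // write carries no TTL, and otherwise keeps whichever value is larger.
        return Math.max(columnTtlSeconds, cfDefaultTtlSeconds);
    }
}
```

Under these assumptions, a write with no TTL into a column family whose default is 3600 seconds would be stored with a TTL of 3600, which matches the behaviour observed in {{cassandra-cli}} and {{cqlsh}}.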
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Assigned] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True reassigned CASSANDRA-3974:
------------------------------------

    Assignee: Kirk True
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated CASSANDRA-3974:
---------------------------------

    Attachment: trunk-3974v4.txt
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated CASSANDRA-3974:
---------------------------------

    Attachment: 3974-v9.txt

Rebased against trunk.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: 3974-v8.txt, 3974-v9.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278489#comment-13278489 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

RSVH still goes through getColumnFamily.  It can either go through the index AbstractScanIterator which calls getCF at line 195 in KeysSearcher, or the seq scan iterator via CFS.filterColumnFamily (RowIteratorFactory line 111).

Remember that these only remove *expired* tombstones; non-expired ones need to be returned to the coordinator for read repair.
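
As a rough illustration of that distinction, the sketch below keeps every tombstone that is still inside the grace window and drops only the older ones. The names ({{TombstoneFilterSketch}}, {{Tombstone}}, {{dropExpired}}) are hypothetical, and the comparison against {{gcBefore}} is an assumption about how the real removal path decides what counts as expired.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the tombstone purge step; names are hypothetical.
final class TombstoneFilterSketch
{
    /** Minimal tombstone: a column name plus the local deletion time in seconds. */
    static final class Tombstone
    {
        final String name;
        final int localDeletionTime;

        Tombstone(String name, int localDeletionTime)
        {
            this.name = name;
            this.localDeletionTime = localDeletionTime;
        }
    }

    /**
     * Drops tombstones deleted before gcBefore (assumed to be "expired") and
     * keeps the rest so they can still travel to the coordinator for read repair.
     */
    static List<Tombstone> dropExpired(List<Tombstone> tombstones, int gcBefore)
    {
        List<Tombstone> kept = new ArrayList<>();
        for (Tombstone t : tombstones)
        {
            if (t.localDeletionTime >= gcBefore)
                kept.add(t);
        }
        return kept;
    }
}
```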
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501412#comment-13501412 ] 

Jeremy Hanna commented on CASSANDRA-3974:
-----------------------------------------

So this isn't going into 1.2 because it didn't apply cleanly to trunk?  I'm confused why the status is set to open and the target version is not set to 1.3.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: 3974-v8.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated CASSANDRA-3974:
---------------------------------

    Attachment: trunk-3974v6.txt

Version 6, sync'ed with trunk as of this morning.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated CASSANDRA-3974:
---------------------------------

    Attachment: trunk-3974.txt

This is a proof-of-concept patch for allowing optional default TTLs on a column family. 

I'm not sure how to implement the "compaction optimizations" the main JIRA description mentions.

Please provide feedback on what needs to be added/changed.

Thanks.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427397#comment-13427397 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

bq. If we're just going to have CF TTL being sugar for clients too lazy to apply what they want, then I'm not interested.

I guess it would be a good thing to have for CQL though by the same reasoning as CASSANDRA-4448.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427448#comment-13427448 ] 

Jeremy Hanna commented on CASSANDRA-3974:
-----------------------------------------

bq. If we're just going to have CF TTL being sugar for clients too lazy to apply what they want, then I'm not interested.

Also if that client happens to be Pig or Hive, there's not currently a way to set TTLs.  So in that case it's not laziness of the client.

A use case: I don't want to MapReduce over my giant archival column family. So when ingesting data, I'll write to the archival column family and also to a second column family with a default TTL (or however it's implemented), so that the second one holds only data from the last 30 days.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397644#comment-13397644 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

Pinging to have someone look at patch v2.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430612#comment-13430612 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

I can reintroduce the 'TTL as a default' approach from the first patch if that's how we want it to work.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Leonardo Stern (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239502#comment-13239502 ] 

Leonardo Stern commented on CASSANDRA-3974:
-------------------------------------------

This is related to CASSANDRA-3077. It would also be very helpful in map/reduce scenarios.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Priority: Minor
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13252522#comment-13252522 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

bq. In the initial patch, I had made changes to both UpdateStatement.addToMutation and ColumnFamily.addColumn to use the larger of the column's TTL or the column family default TTL

Oops, I totally missed the addColumn changes.  That's exactly what I had in mind.

It sounds like you have an updated patch, could you post that?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258738#comment-13258738 ] 

Dave Brosius commented on CASSANDRA-3974:
-----------------------------------------

JBellis... you mean min, right?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3974:
--------------------------------------

    Attachment: 3974-v8.txt
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: 3974-v8.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281036#comment-13281036 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

It does indeed seem that removeDeleted is not called on that path. I don't know if this shows up as a bug though: columns shadowed by a row tombstone are removed by QueryFilter.isRelevant, and column tombstones are removed before being returned to the client. Yet, it would probably be cleaner to put removeDeleted on every path, especially since I agree with Jonathan that it's probably the right place to put the CF-TTL check.

Actually I think that if we make sure to put removeDeleted on every path, we could probably make it the only method concerned with tombstones and remove QueryFilter.isRelevant for instance, which would clean things up. But we can probably leave that to another ticket.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258702#comment-13258702 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

What are the semantics of updating the CF TTL? Should updating the CF TTL affect existing columns? If so, we would not want to apply max(column TTL, CF TTL) _at column mutation time_; instead we would keep the two values separate and evaluate liveness dynamically at some later event (column retrieval and/or compaction).
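
The two options in this question can be contrasted with a small sketch. All names here ({{TtlSemantics}}, {{expiresAtFrozen}}, {{isLiveDynamic}}) are hypothetical, times are in seconds, and a TTL of 0 is assumed to mean "no TTL"; it is not taken from any of the attached patches.

```java
// Illustrative contrast between write-time and read-time TTL evaluation.
final class TtlSemantics
{
    private TtlSemantics() {}

    /**
     * Write-time semantics: the expiration is fixed when the column is written,
     * so a later change to the CF TTL does not affect existing columns.
     */
    static long expiresAtFrozen(long writeTimeSeconds, int columnTtl, int cfTtlAtWrite)
    {
        return writeTimeSeconds + Math.max(columnTtl, cfTtlAtWrite);
    }

    /**
     * Read-time semantics: liveness is re-evaluated against whatever CF TTL is
     * configured now, so updating the CF TTL changes the fate of old columns.
     */
    static boolean isLiveDynamic(long writeTimeSeconds, int currentCfTtl, long nowSeconds)
    {
        return currentCfTtl <= 0 || nowSeconds < writeTimeSeconds + currentCfTtl;
    }
}
```

With the dynamic variant, a column written at t=1000 under a 60-second CF TTL is dead at t=1060, but it becomes live again if the CF TTL is raised to 120 before the data is purged. That is the kind of behaviour the question is asking about.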
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427342#comment-13427342 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

If we're just going to have CF TTL being sugar for clients too lazy to apply what they want, then I'm not interested.

But if we use CF TTL to provide an upper bound on how long data can live, then we open the door for some interesting optimizations.
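
A minimal sketch of the kind of optimization an upper bound would allow: if no column can outlive the CF TTL, an sstable whose newest write is older than that TTL contains only dead data. The class and parameter names are hypothetical, and the sketch assumes the sstable records the time of its newest write in seconds; it ignores gc grace and any interaction with other sstables.

```java
// Hypothetical check; assumes CF TTL is a hard upper bound on column lifetime.
final class SSTableExpiry
{
    private SSTableExpiry() {}

    /**
     * @param maxWriteTimeSeconds newest write contained in the sstable
     * @param cfTtlSeconds        CF TTL; 0 or less means no upper bound is configured
     * @param nowSeconds          current time
     * @return true if every column in the sstable must already have expired
     */
    static boolean isFullyExpired(long maxWriteTimeSeconds, int cfTtlSeconds, long nowSeconds)
    {
        if (cfTtlSeconds <= 0)
            return false; // without a bound nothing can be inferred from the file's age
        return maxWriteTimeSeconds + cfTtlSeconds <= nowSeconds;
    }
}
```

The point of the bound is that this check needs only one number per file, so the file could be dropped without reading it row by row.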
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494329#comment-13494329 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

bq. it does feel a bit fragile that some future internal code could too easily add an ExpiringColumn through ColumnFamily.addColumn(IColumn)

Tracing and unit tests (via Util.expiringColumn) already use this method.

Should we make the constructors private and expose a factory method that requires the metadata?
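
A minimal sketch of what that could look like. {{CfMetaSketch}} and {{ColumnSketch}} are hypothetical stand-ins, not the real metadata or Column/ExpiringColumn classes, and the "ttl <= 0 means use the CF default" rule is an assumption made for the example.

```java
/** Hypothetical stand-in for the CF metadata; only the default TTL matters here. */
final class CfMetaSketch
{
    final int defaultTtlSeconds; // 0 means the CF has no default TTL

    CfMetaSketch(int defaultTtlSeconds)
    {
        this.defaultTtlSeconds = defaultTtlSeconds;
    }
}

/** Hypothetical column type used only for illustration. */
final class ColumnSketch
{
    final String name;
    final String value;
    final int ttlSeconds; // 0 means the column never expires

    // Private: no caller can build a column without going through create().
    private ColumnSketch(String name, String value, int ttlSeconds)
    {
        this.name = name;
        this.value = value;
        this.ttlSeconds = ttlSeconds;
    }

    /** The only way in; requiring the metadata means the CF default cannot be skipped by accident. */
    static ColumnSketch create(String name, String value, int requestedTtlSeconds, CfMetaSketch metadata)
    {
        // Assumption for this sketch: a requested TTL <= 0 means "use the CF default".
        int ttl = requestedTtlSeconds > 0 ? requestedTtlSeconds : metadata.defaultTtlSeconds;
        return new ColumnSketch(name, value, ttl);
    }

    boolean isExpiring()
    {
        return ttlSeconds > 0;
    }
}
```

With this shape, code that wants a column has to supply the metadata, so the decision about the default TTL lives in one place.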

                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494137#comment-13494137 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

I realize I'm the reviewer on this one. I seem to remember that [~jbellis] wanted to have a look at this, but maybe I misunderstood?

In any case, I had a look at the patch and it looks good to me overall. That being said, and this is not really a criticism of the patch, I was slightly surprised that we only need to modify {{ColumnFamily.addColumn(QueryPath, ...)}} and {{InsertStatement}} to make this work. Don't get me wrong, I do think this is correct, but it does feel a bit fragile that some future internal code could too easily add an ExpiringColumn through ColumnFamily.addColumn(IColumn) and skip the global CF setting. I don't really have a good solution to make it less fragile, however; I'm just thinking out loud. That remark aside, the patch does lgtm (aside from needing a rebase).
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496414#comment-13496414 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

Jonathan, do we want the ability for clients to explicitly *not* use the column family default TTL?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498111#comment-13498111 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

I guess I thought that was implied, but we can apply YAGNI here.

What about UpdateStatement/addColumn?
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirk True updated CASSANDRA-3974:
---------------------------------

    Attachment: trunk-3974v3.txt
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499413#comment-13499413 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

v8 attached; it adds a Column.create factory and uses it from addColumn and UpdateStatement.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: 3974-v8.txt, trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495293#comment-13495293 ] 

Sylvain Lebresne commented on CASSANDRA-3974:
---------------------------------------------

bq. Should we make the constructors private and expose a factory method that requires the metadata?

I like that idea.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Updated] (CASSANDRA-3974) Per-CF TTL

Posted by "Robert Coli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Coli updated CASSANDRA-3974:
-----------------------------------

    Attachment:     (was: cassandra.1.0.10.replaying.log.after.exception.during.drain.txt)
    
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278616#comment-13278616 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

In my test case, it does go through {{RowIteratorFactory}}, but it *doesn't* go through line 111. In {{getReduced}}, {{cached}} is always {{null}}, so it takes the {{filter.collateColumns}} path.

So I made this naive change:

{noformat}
if (cached == null)
{
    // not cached: collate
    filter.collateColumns(returnCF, colIters, gcBefore);
    // added line: call removeDeleted explicitly on the uncached path
    returnCF = ColumnFamilyStore.removeDeleted(returnCF, gcBefore);
}
else
{
    QueryFilter keyFilter = new QueryFilter(key, filter.path, filter.filter);
    returnCF = cfs.filterColumnFamily(cached, keyFilter, gcBefore);
}
{noformat}

By "manually" calling {{removeDeleted}} I was able to get my columns filtered out as expected.

I'm pretty sure this is incomplete or just plain wrong, but I wanted to get your thoughts.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496334#comment-13496334 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

If we use ttl <= 0 as a signal to use the default ttl in CF.addColumn, how do we override the default to be "no ttl at all?"  Should we treat Integer.MAX_VALUE as "don't use the default, just give me a non-expiring Column?"

Does UpdateStatement go through addColumn eventually?  If so we are duplicating code there.  If not that makes me a bigger fan of centralizing this in a factory method.  (Guess we can leave the other constructors alone.)
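
To make the sentinel question concrete, here is a minimal sketch of the three cases. The class, method and constant names are hypothetical, and treating Integer.MAX_VALUE as "no TTL at all" is only the option floated above, not settled behaviour.

```java
/** Hypothetical sketch of the TTL sentinel rules discussed above. */
final class TtlSentinelSketch
{
    /** Resolved value meaning "this column never expires". */
    static final int NO_TTL = 0;

    /**
     * @param requestedTtl TTL supplied by the client, in seconds
     * @param cfDefaultTtl default TTL of the column family, in seconds (0 if none)
     * @return the TTL to store with the column, or NO_TTL for a non-expiring column
     */
    static int resolve(int requestedTtl, int cfDefaultTtl)
    {
        if (requestedTtl == Integer.MAX_VALUE)
            return NO_TTL; // explicit opt-out: ignore the CF default

        if (requestedTtl <= 0)
            return Math.max(cfDefaultTtl, 0); // nothing requested: fall back to the CF default

        return requestedTtl; // explicit TTL: use it as given
    }
}
```

Keeping this in one method is what makes a single factory attractive: addColumn and UpdateStatement would both call it instead of each carrying a copy of the rule.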
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2.0 rc1
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, trunk-3974v4.txt, trunk-3974v5.txt, trunk-3974v6.txt, trunk-3974v7.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251561#comment-13251561 ] 

Jonathan Ellis commented on CASSANDRA-3974:
-------------------------------------------

bq. Part of the code I changed was in CFMetaData's toThrift and fromThrift methods

Let me back up.  I can see two main approaches towards respecting the per-CF ttl:

# Set the column TTL to the max(column, CF) ttl on insert; then the rest of the code doesn't have to know anything changed
# Take max(column, CF) ttl during operations like compaction, and leave column ttl to specify *only* the column TTL

The code in UpdateStatement led me to believe you're going with option 1.  So what I meant by my comment was, you need to make a similar change for inserts done over Thrift RPC, as well.  (to/from Thrift methods are used for telling Thrift clients about the schema, but are not used for insert/update operations.)
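
A minimal sketch of how the two options differ, using hypothetical names that are not part of the Cassandra code base. Both options combine the TTLs the same way; what changes is where the combination happens and what ends up stored with the column.

```java
/** Hypothetical sketch contrasting the two approaches listed above. */
final class TtlPlacementSketch
{
    /** The combination rule described above: max(column, CF). */
    static int combine(int columnTtl, int cfTtl)
    {
        return Math.max(columnTtl, cfTtl);
    }

    // Option 1: resolve once on insert and store the result.
    // Every insert path (CQL and Thrift alike) has to call this.
    static int storedTtlOption1(int columnTtl, int cfTtlAtInsert)
    {
        return combine(columnTtl, cfTtlAtInsert);
    }

    // Option 2: store the column TTL untouched...
    static int storedTtlOption2(int columnTtl)
    {
        return columnTtl;
    }

    // ...and resolve it whenever an operation such as compaction needs it.
    static int ttlSeenByCompactionOption2(int storedColumnTtl, int currentCfTtl)
    {
        return combine(storedColumnTtl, currentCfTtl);
    }
}
```

One consequence of the placement: under option 1 the stored value reflects the CF TTL at insert time, while under option 2 it reflects whatever the CF TTL is when the column is examined.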

Does that help?

bq. Sorry, I'm not sure to which part of the code you're referring

CFMetadata.getTimeToLive.  Sounds like you addressed this anyway.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Comment Edited] (CASSANDRA-3974) Per-CF TTL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278489#comment-13278489 ] 

Jonathan Ellis edited comment on CASSANDRA-3974 at 5/18/12 1:43 AM:
--------------------------------------------------------------------

RSVH still goes through removeDeleted.  It can either go through the index AbstractScanIterator which calls getCF at line 195 in KeysSearcher, or the seq scan iterator via CFS.filterColumnFamily (RowIteratorFactory line 111).

Remember that these only remove *expired* tombstones; non-expired ones need to be returned to the coordinator for read repair.
                
      was (Author: jbellis):
    RSVH still goes through getColumnFamily.  It can either go through the index AbstractScanIterator which calls getCF at line 195 in KeysSearcher, or the seq scan iterator via CFS.filterColumnFamily (RowIteratorFactory line 111).

Remember that these only remove *expired* tombstones; non-expired ones need to be returned to the coordinator for read repair.
                  
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Kirk True (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286115#comment-13286115 ] 

Kirk True commented on CASSANDRA-3974:
--------------------------------------

Also, given that the logic is {{min(CF TTL, column TTL)}}, when the user explicitly provides a column TTL longer than the default column family TTL, I would think we'd either want to a) give an error, or b) provide a warning. At this point, the larger value provided by the user is simply ignored.

Thoughts?  
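
To make the two choices concrete, here is a minimal sketch. All names are hypothetical, and it assumes both TTLs are positive. Option a) rejects the write, option b) caps the TTL and hands back a warning for the caller to log or return.

```java
/** Hypothetical sketch of the min(CF TTL, column TTL) rule with the two handling options. */
final class TtlCapSketch
{
    /** Outcome of capping: the TTL to store, plus a warning if the user's value was reduced. */
    static final class Result
    {
        final int ttl;
        final String warning; // null when the requested TTL was within the limit

        Result(int ttl, String warning)
        {
            this.ttl = ttl;
            this.warning = warning;
        }
    }

    /**
     * @param columnTtl       TTL explicitly requested for the column, in seconds (assumed > 0)
     * @param cfTtl           column family TTL, in seconds (assumed > 0)
     * @param rejectOverLimit true for option a) error, false for option b) warning
     */
    static Result cap(int columnTtl, int cfTtl, boolean rejectOverLimit)
    {
        if (columnTtl <= cfTtl)
            return new Result(columnTtl, null);

        String message = "column TTL " + columnTtl + "s exceeds column family TTL " + cfTtl + "s";
        if (rejectOverLimit)
            throw new IllegalArgumentException(message); // option a)

        return new Result(cfTtl, message); // option b): cap and warn
    }
}
```

Either way the user finds out that the larger value was not honoured, which is the gap in the current behaviour.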
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.


        

[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

Posted by "Robert Coli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417710#comment-13417710 ] 

Robert Coli commented on CASSANDRA-3974:
----------------------------------------

Sorry for the erroneous attachment. Somehow JIRA produced a link on creation of https://issues.apache.org/jira/browse/CASSANDRA-4446 which directed me here, and I attached the file before I noticed it was the wrong ticket.
                
> Per-CF TTL
> ----------
>
>                 Key: CASSANDRA-3974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 1.2
>            Reporter: Jonathan Ellis
>            Assignee: Kirk True
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt
>
>
> Per-CF TTL would allow compaction optimizations ("drop an entire sstable's worth of expired data") that we can't do with per-column.
