Posted to commits@cassandra.apache.org by "Jonathan Ellis (Created) (JIRA)" <ji...@apache.org> on 2012/02/17 17:15:59 UTC

[jira] [Created] (CASSANDRA-3929) Support row size limits

Support row size limits
-----------------------

                 Key: CASSANDRA-3929
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Jonathan Ellis


We currently support expiring columns by time-to-live; we've also had requests for keeping the most recent N columns in a row.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449423#comment-13449423 ] 

Jonathan Ellis commented on CASSANDRA-3929:
-------------------------------------------

That's what Reversed comparator is for. :)

(Non-facetiously, that is what we'd recommend in that case since reading from start of row can skip index deserialization for a decent speedup.  Basically you only want to be reading from end of row if that's a once-in-a-while query.  If it's your main query, reverse it at the comparator level, not the query level.)
                
> Support row size limits
> -----------------------
>
>                 Key: CASSANDRA-3929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 1.2.1
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>            Priority: Minor
>              Labels: ponies
>             Fix For: 1.2.1
>
>         Attachments: 3929.txt
>
>
> We currently support expiring columns by time-to-live; we've also had requests for keeping the most recent N columns in a row.

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Drew Kutcharian (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211621#comment-13211621 ] 

Drew Kutcharian commented on CASSANDRA-3929:
--------------------------------------------

I agree with Sylvain. Most of the time all you care about is having a capped collection, for example to keep a history or an audit log for something.
                

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3929:
------------------------------------

    Attachment: 3929_c.txt
    

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Sylvain Lebresne (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210358#comment-13210358 ] 

Sylvain Lebresne commented on CASSANDRA-3929:
---------------------------------------------

Agreed, it's hard to do efficiently. What is easy to do is to write a compaction strategy (or add a strategy option) that keeps only the first N columns on each compaction. Of course, that doesn't guarantee that you will only get the N most recent columns, but in practice it would get rid of the excess data fairly efficiently, which I believe is mostly what people care about. Basically it would really just be "we'll discard everything we know is outside the first N columns". I suspect that in practice that may be a good trade-off, but given it's not perfect I've always thought that it probably makes more sense as an externally contributed compaction strategy.
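
The idea of keeping only the first N columns at compaction time can be sketched roughly as follows. This is a toy model, not Cassandra code: sstables are plain lists of (name, value) pairs, sorting by name stands in for comparator order, and compact_keep_first_n is a hypothetical helper.

```python
def compact_keep_first_n(sstables, n):
    """Merge the columns of one row from several sstables and keep the first n.

    Assumption: each sstable is a list of (column_name, value) pairs and
    later sstables win when the same column name appears twice.
    """
    merged = {}
    for table in sstables:
        for name, value in table:
            merged[name] = value
    # Sorting by name stands in for the column comparator.
    return sorted(merged.items())[:n]


if __name__ == "__main__":
    older = [("a", 1), ("c", 3)]
    newer = [("b", 2), ("d", 4)]
    print(compact_keep_first_n([older, newer], 3))
```

Note that the limit only holds for the sstables taking part in a given compaction; columns in sstables that have not been compacted together yet can still push the row past N, which is the imperfection mentioned above.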
                

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449412#comment-13449412 ] 

Jonathan Ellis commented on CASSANDRA-3929:
-------------------------------------------

I think we also need to limit

- on flush, since there's no need to knowingly save data we don't want
- on LCR as well as PCR, since enough small rows can still overflow to LCR mode

Also: if I'm understanding correctly, we're tombstoneing the beginning of the row here?  ISTM tombstoning the end of the row will be more in keeping with our advice that "querying from the start of the row in comparator order is fastest."
                

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Rick Branson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408165#comment-13408165 ] 

Rick Branson commented on CASSANDRA-3929:
-----------------------------------------

Would love to see this as well, as a way to keep data sizes for wide rows under control, for use cases where old data at the tail of the row becomes more or less useless and time is not a dependable dimension to use as a truncation method.

Clearly it doesn't have to be perfect as far as how much data it actually keeps around, but I'd like to see the CF configuration be a lower bound on the number of columns kept. Basically a way to communicate to Cassandra what your retention requirement is, and it takes care of meeting that target. An acceptable edge case (at least from my perspective) where this might "break" is if the user does their own deletion of some columns.
                

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Rick Branson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449420#comment-13449420 ] 

Rick Branson commented on CASSANDRA-3929:
-----------------------------------------

+1 for tombstoning the tail of the row and not the head.

If you want the most recent data at the head of the row, use a ReversedType(TimeUUIDType) comparator. Grabbing the tail on every query will kill performance.
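
A small sketch of why the reversed comparator helps, under simplifying assumptions: integers stand in for TimeUUID column names, and head_of_row is a hypothetical helper that models "read the first N columns in comparator order".

```python
def head_of_row(column_names, count, reversed_comparator=False):
    """Return the first `count` column names in comparator order.

    Assumption: larger integers are more recent, mimicking TimeUUID ordering.
    With a reversed comparator the newest columns sort to the head of the row.
    """
    ordered = sorted(column_names, reverse=reversed_comparator)
    return ordered[:count]


if __name__ == "__main__":
    names = [3, 1, 4, 2]
    print(head_of_row(names, 2))                            # oldest two
    print(head_of_row(names, 2, reversed_comparator=True))  # newest two
```

With the reversed ordering, "most recent N" becomes a read from the head of the row, and a retention limit that tombstones the tail discards the oldest columns.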
                

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487467#comment-13487467 ] 

Jonathan Ellis commented on CASSANDRA-3929:
-------------------------------------------

Good idea putting the code in the index Builder!

It looks, though, as if build() is only used when we can fit the row in memory; otherwise LazilyCompactedRow calls {{add}} directly (also called by {{build}}).  So I think you're going to need to move the retained row count into the Builder instance to maintain state across invocations.
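
The point about keeping state in the Builder can be illustrated with a rough sketch. The class below is hypothetical and is not the actual index Builder from the patch; it only shows why the counter has to live on the instance when add is called one column at a time.

```python
class Builder:
    """Toy stand-in for an index builder that enforces a retained-column limit."""

    def __init__(self, retained_limit):
        self.retained_limit = retained_limit
        self.retained = 0        # state kept across add() calls
        self.kept = []
        self.tombstoned = []

    def add(self, column):
        # Called directly when the row is streamed column by column.
        if self.retained < self.retained_limit:
            self.retained += 1
            self.kept.append(column)
        else:
            self.tombstoned.append(column)

    def build(self, columns):
        # In-memory path: simply delegates to add(), so both paths share the count.
        for column in columns:
            self.add(column)
        return self.kept


if __name__ == "__main__":
    streamed = Builder(retained_limit=2)
    for name in ["a", "b", "c"]:
        streamed.add(name)
    print(streamed.kept, streamed.tombstoned)
```

If the counter were a local variable inside build, the streaming path would never see it and the limit would silently not apply to large rows.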
                

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3929:
------------------------------------

    Attachment: 3929_b.txt

retain the columns at the front of the row. This patch needs to add tombstoning of columns on flush as well, as suggested by jbellis. (in progress)
                

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3929:
------------------------------------

    Attachment: 3929_c.txt
    

[jira] [Comment Edited] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449412#comment-13449412 ] 

Jonathan Ellis edited comment on CASSANDRA-3929 at 9/6/12 3:41 PM:
-------------------------------------------------------------------

I think we also need to limit

- on flush, since there's no need to knowingly save data we don't want
- on LCR as well as PCR, since enough small rows can still overflow to LCR mode

Also: if I'm understanding correctly, we're tombstoning the beginning of the row here?  ISTM tombstoning the end of the row will be more in keeping with our advice that "querying from the start of the row in comparator order is fastest."
                
      was (Author: jbellis):
    I think we also need to limit

- on flush, since there's no need to knowingly save data we don't want
- on LCR as well as PCR, since enough small rows can still overflow to LCR mode

Also: if I'm understanding correctly, we're tombstoneing the beginning of the row here?  ISTM tombstoning the end of the row will be more in keeping with our advice that "querying from the start of the row in comparator order is fastest."
                  

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Fabien Rousseau (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510583#comment-13510583 ] 

Fabien Rousseau commented on CASSANDRA-3929:
--------------------------------------------

Hum, the current patch works if no deletes are done...

Let's have an example with deletes :
Suppose that we want to keep 3 columns, and have standard comparator.
Let's insert 4 column names : E, F, G, H
Then flush (on the SSTable, we will have : E, F, G, tombstone(H) ).
Let's insert another 4 column names : A, B, C, D
Then delete column B.
Then flush (on the SSTable, we will have : A, tombstone(B), C, tombstone(D) )

With the current patch (which excludes tombstones in the count on the read path) :
reading the first 3 columns would return : A,C,E
By including the tombstones in the count in the read path :
reading the first 3 columns would return : A,C

I think returning A,C,E is incorrect because the last inserted columns were A,C,D.

So, to support delete, there is also something to do on the read path (include tombstones in columns count, so it never goes after "maxColumns").

I propose the patch 3929_e.txt.
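
The example above can be replayed with a small simulation. It is only a sketch of the counting rule, not the patch itself: columns are (name, is_tombstone) pairs, sorting by name stands in for the standard comparator, and read_first is a hypothetical helper.

```python
def read_first(columns, max_columns, count_tombstones):
    """Return live column names among the first `max_columns` counted columns.

    columns: list of (name, is_tombstone) pairs in comparator order.
    count_tombstones: whether tombstones consume part of the limit.
    """
    live = []
    counted = 0
    for name, is_tombstone in columns:
        if counted >= max_columns:
            break
        if is_tombstone:
            if count_tombstones:
                counted += 1
            continue
        live.append(name)
        counted += 1
    return live


# The two flushed sstables from the example (limit of 3 columns).
sstable_1 = [("E", False), ("F", False), ("G", False), ("H", True)]
sstable_2 = [("A", False), ("B", True), ("C", False), ("D", True)]
merged = sorted(sstable_1 + sstable_2)  # column names are unique here

if __name__ == "__main__":
    print(read_first(merged, 3, count_tombstones=False))  # ['A', 'C', 'E']
    print(read_first(merged, 3, count_tombstones=True))   # ['A', 'C']
```

Counting tombstones stops the read at the retention boundary, so columns from the older sstable (E here) are never pulled in to replace deleted ones.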



                

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3929:
--------------------------------------

    Affects Version/s:     (was: 1.2.1)
        Fix Version/s:     (was: 1.2.1)
                       1.3
    

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Ahmet AKYOL (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222406#comment-13222406 ] 

Ahmet AKYOL commented on CASSANDRA-3929:
----------------------------------------

Here are some example hypothetical column family storage parameters for this feature:

max_column_number_hint : 1000 // meaning: try to keep around 1000 columns. Since it's a hint, we (users) are OK with tombstones or an 800 - 1200 range

or

max_row_size_hint : 1MB
                

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3929:
------------------------------------

    Attachment:     (was: 3929_c.txt)
    

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Colin Taylor (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211725#comment-13211725 ] 

Colin Taylor commented on CASSANDRA-3929:
-----------------------------------------

Another for the compaction strategy. We're required to keep at least N days of logs so would like to bound our usage without needing precision.
                

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501467#comment-13501467 ] 

Jonathan Ellis commented on CASSANDRA-3929:
-------------------------------------------

The good news is, this looks good. (Nit: getRetainedColumnCount would be a bit cleaner as a method on CFMetaData.)

The bad news is, I think we need to scope creep -- the right unit of retention is the cql3 row.  For {{COMPACT STORAGE}} there is one row per cell, but otherwise it gets complicated... there's a "this row exists" marker cell, and collection columns become one cell per entry.  Dealing with partial (cql3) rows is not something we want to inflict on users, so we should enable column tombstoning only on cql3 row boundaries.

cfmetadata.cqlCfDef will have the information we need to do this, in particular {{isCompact}} and {{keys}}.  (See www.datastax.com/dev/blog/thrift-to-cql3.)

I suspect you're going to want a unit test or two here.  QueryProcessor.processInternal is probably the easiest way to do cql from a test.
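
A rough illustration of truncating only on cql3 row boundaries, under heavy simplification: each cell is modeled as a (clustering_key, cell_name) pair, an empty cell name stands in for the "this row exists" marker, and retain_cql3_rows is a hypothetical helper unrelated to the real cqlCfDef structures.

```python
from itertools import groupby


def retain_cql3_rows(cells, max_rows):
    """Split cells into (kept, dropped) so that no cql3 row is cut in half.

    cells: list of (clustering_key, cell_name) pairs in comparator order,
    so all cells of one cql3 row are adjacent.
    """
    kept, dropped = [], []
    for index, (_, group) in enumerate(groupby(cells, key=lambda cell: cell[0])):
        target = kept if index < max_rows else dropped
        target.extend(group)
    return kept, dropped


if __name__ == "__main__":
    cells = [
        (1, ""), (1, "body"), (1, "tags:a"),   # row 1: marker, column, collection entry
        (2, ""), (2, "body"),
        (3, ""), (3, "body"),
    ]
    kept, dropped = retain_cql3_rows(cells, max_rows=2)
    print(len(kept), len(dropped))
```

The limit counts rows rather than cells, so a row with many collection entries is either retained whole or tombstoned whole.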
                

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3929:
------------------------------------

    Attachment: 3929.txt
    

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Fabien Rousseau (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabien Rousseau updated CASSANDRA-3929:
---------------------------------------

    Attachment: 3929_e.txt
    

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3929:
------------------------------------

    Attachment: 3929_d.txt

store state in Builder, and push logic to add
                

[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3929:
------------------------------------

    Attachment:     (was: 3929.txt)
    
> Support row size limits
> -----------------------
>
>                 Key: CASSANDRA-3929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 1.2.1
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>            Priority: Minor
>              Labels: ponies
>             Fix For: 1.2.1
>
>         Attachments: 3929.txt
>
>
> We currently support expiring columns by time-to-live; we've also had requests for keeping the most recent N columns in a row.


[jira] [Issue Comment Edited] (CASSANDRA-3929) Support row size limits

Posted by "Colin Taylor (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13211725#comment-13211725 ] 

Colin Taylor edited comment on CASSANDRA-3929 at 2/20/12 9:09 AM:
------------------------------------------------------------------

Another vote for the compaction strategy. We're required to keep at least N days' worth of logs, so we would like to bound our usage without needing precision.
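
A rough sketch of what trimming at compaction time could look like, assuming "the compaction strategy" means enforcing the limit only while sstables are merged. The function name and the modeling of an sstable as a plain list of time-ordered column names are hypothetical; none of this is Cassandra's actual code.

```python
def compact_row(sstables, max_columns):
    """Merge one row's columns from several sstables, keeping the newest ones.

    Each sstable is modeled as an iterable of column names that sort by time
    (an assumption made for illustration only).
    """
    # Reconcile columns that appear in more than one sstable, then sort.
    merged = sorted(set().union(*sstables))
    if max_columns <= 0:
        return []
    # Newest columns sort last, so keep the tail.
    return merged[-max_columns:]


old_sstable = [1, 3, 5]
new_sstable = [2, 3, 6]
survivors = compact_row([old_sstable, new_sstable], max_columns=4)
```

Between compactions the row can hold more than max_columns columns, which is the imprecision the comment above is willing to accept.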
                
      was (Author: coltnz):
    Another for the compaction strategy. We're required to keep at least N days of logs so would like to bound our usage without needing precision.
                  
> Support row size limits
> -----------------------
>
>                 Key: CASSANDRA-3929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>              Labels: ponies
>
> We currently support expiring columns by time-to-live; we've also had requests for keeping the most recent N columns in a row.


[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3929:
--------------------------------------

    Reviewer: jbellis
    Assignee: Dave Brosius  (was: Dave Brosius)
    
> Support row size limits
> -----------------------
>
>                 Key: CASSANDRA-3929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 1.2.1
>            Reporter: Jonathan Ellis
>            Assignee: Dave Brosius
>            Priority: Minor
>              Labels: ponies
>             Fix For: 1.2.1
>
>         Attachments: 3929_b.txt, 3929_c.txt, 3929.txt
>
>
> We currently support expiring columns by time-to-live; we've also had requests for keeping the most recent N columns in a row.


[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3929:
--------------------------------------

    Priority: Minor  (was: Major)
    

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210354#comment-13210354 ] 

Jonathan Ellis commented on CASSANDRA-3929:
-------------------------------------------

This is difficult to do efficiently, since it implies checking the entire row's contents on each update.  (Skipping this, and only checking/deleting obsolete columns at read time, means you could blow your column budget by arbitrarily large amounts during write-intensive workloads.)  Even checking randomly on, say, 1% of writes could dramatically affect write performance for larger-than-memory datasets.
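
The trade-off can be illustrated with a toy in-memory model, offered as a sketch under stated assumptions rather than a measurement. The names trim and simulate are hypothetical, and "columns scanned" only stands in for the cost of reading a row; a real larger-than-memory dataset would pay that cost in disk reads.

```python
import random


def trim(row, limit):
    """Drop the oldest columns beyond `limit`; return how many columns were scanned."""
    scanned = len(row)
    if scanned > limit:
        del row[:scanned - limit]
    return scanned


def simulate(num_writes, limit, check_probability, seed=0):
    """Append one column per write and enforce the limit on a fraction of writes.

    Returns (largest row size observed, total columns scanned by checks).
    """
    rng = random.Random(seed)
    row = []
    max_size = 0
    columns_scanned = 0
    for timestamp in range(num_writes):
        row.append(timestamp)
        max_size = max(max_size, len(row))
        if rng.random() < check_probability:
            columns_scanned += trim(row, limit)
    return max_size, columns_scanned


always = simulate(1000, 10, 1.0)    # check on every write
sampled = simulate(1000, 10, 0.01)  # check on roughly 1% of writes
never = simulate(1000, 10, 0.0)     # defer all checking, e.g. to read time
```

Checking on every write keeps the row within one column of the limit but scans the row each time; never checking costs nothing on the write path and lets the row grow without bound; sampling lands in between.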
                
> Support row size limits
> -----------------------
>
>                 Key: CASSANDRA-3929
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3929
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>              Labels: ponies
>
> We currently support expiring columns by time-to-live; we've also had requests for keeping the most recent N columns in a row.


[jira] [Updated] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Brosius updated CASSANDRA-3929:
------------------------------------

    Attachment: 3929.txt
    

[jira] [Commented] (CASSANDRA-3929) Support row size limits

Posted by "Dave Brosius (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449417#comment-13449417 ] 

Dave Brosius commented on CASSANDRA-3929:
-----------------------------------------

{quote}
Also: if I'm understanding correctly, we're tombstoning the beginning of the row here? ISTM tombstoning the end of the row will be more in keeping with our advice that "querying from the start of the row in comparator order is fastest."
{quote}

It seems to me you would want this feature only when you have some sort of time-based column name scheme and want to keep only the most recent n samples, thus tossing out the old ones.
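
A minimal sketch of that selection, assuming column names are timestamps that sort oldest-first under the comparator. The function name is hypothetical and this is not taken from the attached patch.

```python
def columns_to_tombstone(column_names, max_columns):
    """Return the oldest column names that exceed the per-row limit.

    Assumes column names sort by time, oldest first.
    """
    ordered = sorted(column_names)
    excess = len(ordered) - max_columns
    return ordered[:excess] if excess > 0 else []


# Five samples with a limit of three: the two oldest would be tombstoned.
doomed = columns_to_tombstone([5, 1, 4, 2, 3], max_columns=3)
```

With this ordering the tombstones accumulate at the beginning of the row, which is the situation the quoted question is asking about.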
                

[jira] [Comment Edited] (CASSANDRA-3929) Support row size limits

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449423#comment-13449423 ] 

Jonathan Ellis edited comment on CASSANDRA-3929 at 9/6/12 4:06 PM:
-------------------------------------------------------------------

That's what Reversed comparator is for. :)

(Non-facetiously, that is what we'd recommend in that case since reading from start of row can skip index deserialization for a decent speedup.  Basically you only want to be reading from end of row if that's a once-in-a-while query.  If it's your main query, reverse it at the comparator level, not the query level.)

Edit: Rick typed faster than I did.
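
To make the ordering argument concrete, here is a small sketch that models a row stored newest-first, as a reversed comparator would arrange it. The helper names are hypothetical and the code only models the ordering; it does not use any Cassandra API.

```python
def newest_first(column_names):
    """Model a row whose comparator is reversed: newest column sorts first."""
    return sorted(column_names, reverse=True)


def read_most_recent(row, n):
    """The main query reads from the start of the row."""
    return row[:n]


def excess_columns(row, max_columns):
    """Columns beyond the limit now sit at the end of the row."""
    return row[max_columns:]


row = newest_first([1, 5, 3, 2, 4])
recent = read_most_recent(row, 2)
to_drop = excess_columns(row, 3)
```

Under this layout the frequent "most recent n" read starts at the beginning of the row, and the columns to be discarded fall at the end.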
                