Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2012/08/01 12:06:02 UTC

[jira] [Updated] (CASSANDRA-4478) Make index_interval be measured in kb (instead of number of keys)

     [ https://issues.apache.org/jira/browse/CASSANDRA-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-4478:
----------------------------------------

    Attachment: 4478-incomplete.txt

I'll note that changing IndexSummary to consider a byte size instead of number of keys is relatively straightforward. I'm attaching an incomplete patch that does that part.
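
To make the byte-based idea concrete, here is a minimal sketch of sampling by accumulated entry size rather than by key count. It is not taken from the attached patch: the class name, the method name and the long[] of entry sizes are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the code from 4478-incomplete.txt.
public class ByteIntervalSampler {
    /**
     * Returns the 0-based positions of the index entries that would be sampled.
     *
     * @param entrySizes    serialized size in bytes of each index entry (assumed input)
     * @param intervalBytes minimum number of bytes between two consecutive samples
     */
    public static List<Integer> samplePositions(long[] entrySizes, long intervalBytes) {
        List<Integer> samples = new ArrayList<>();
        long bytesSinceLastSample = 0;
        for (int i = 0; i < entrySizes.length; i++) {
            // Always sample the first entry, then sample again once at least
            // intervalBytes worth of entries have been passed since the last sample.
            if (i == 0 || bytesSinceLastSample >= intervalBytes) {
                samples.add(i);
                bytesSinceLastSample = 0;
            }
            bytesSinceLastSample += entrySizes[i];
        }
        return samples;
    }
}
```

With this scheme a few large entries trigger a sample quickly while many small entries share one, so the number of keys between samples is no longer constant.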

However, one problem is that we currently use the index summary for various estimates of the number of keys in the sstable. In particular, we need to estimate the number of keys in a given range of tokens, which means simply keeping the total number of keys in the sstable is not enough.

The simplest/cheapest solution I can see for that problem would be to add to the IndexSummary a new int[] recording how many keys each sample covers (since that number is no longer constant). That does mean breaking the format of the serialized IndexSummary, but that may be fine if we get this into 1.2 (since index summaries aren't saved before that). If someone feels like completing the attached patch with that idea, feel free to (I can find other ways to entertain myself).
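
A rough sketch of how such a per-sample count could answer the token-range estimate, under simplifying assumptions: tokens are modelled as plain longs (real partitioner tokens are not necessarily longs), and the class and method names are hypothetical rather than existing Cassandra APIs.

```java
// Hypothetical illustration only; names and the long-token model are assumptions.
public class SampledKeyEstimator {
    private final long[] sampleTokens;  // token of each sampled key, in sorted order
    private final int[] keysPerSample;  // number of keys covered by each sample

    public SampledKeyEstimator(long[] sampleTokens, int[] keysPerSample) {
        if (sampleTokens.length != keysPerSample.length) {
            throw new IllegalArgumentException("one key count is required per sample");
        }
        this.sampleTokens = sampleTokens;
        this.keysPerSample = keysPerSample;
    }

    /**
     * Estimates the number of keys whose token lies in [startInclusive, endExclusive)
     * by summing the counts of the samples whose token falls in that range.
     * The estimate has the granularity of a whole sample.
     */
    public long estimateKeys(long startInclusive, long endExclusive) {
        int from = lowerBound(startInclusive);
        int to = lowerBound(endExclusive);
        long total = 0;
        for (int i = from; i < to; i++) {
            total += keysPerSample[i];
        }
        return total;
    }

    // Index of the first sample whose token is >= target (sampleTokens.length if none).
    private int lowerBound(long target) {
        int lo = 0;
        int hi = sampleTokens.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sampleTokens[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

The cost is one extra int per sample, and keys near the range boundaries are attributed to whichever sample covers them, so the result is only an estimate.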
                
> Make index_interval be measured in kb (instead of number of keys)
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-4478
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4478
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 4478-incomplete.txt
>
>
> Currently, index_interval is measured in number of keys: how many keys before adding an entry to the index summary. After CASSANDRA-2319, each index entry also contains the columns index for the row, so index entries can be a bit bigger and of differing sizes. Measuring in number of keys is thus sub-optimal and difficult to tune, since you might want a different setting depending on whether your rows are big or small, but the setting is global.
> So we should move to measuring the interval in bytes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira