Posted to commits@cassandra.apache.org by "Radim Kolar (Created) (JIRA)" <ji...@apache.org> on 2012/02/26 16:04:54 UTC

[jira] [Created] (CASSANDRA-3961) Make index_interval configurable per column family

Make index_interval configurable per column family
--------------------------------------------------

                 Key: CASSANDRA-3961
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
             Project: Cassandra
          Issue Type: Improvement
    Affects Versions: 1.0.7
         Environment: Cassandra 1.0.7/unix
            Reporter: Radim Kolar
            Priority: Minor


After various experiments mixing OLTP and OLAP workloads on a single Cassandra cluster, I discovered that a lot of memory is wasted holding index samples for CFs that are rarely accessed, or for CFs where the index is not much used because they are read with slices over keys.
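
To make the memory argument concrete, here is a rough sketch of how the number of in-memory index samples scales with the interval. It assumes one sample is kept for every index_interval-th row key; the class name, method name and key count are hypothetical and are not taken from the Cassandra code base.

```java
// Hypothetical helper, not Cassandra code: estimates how many index
// samples are held in memory for a given number of row keys.
final class IndexSampleEstimate {
    // Assumes one sample per `indexInterval` keys (ceiling division).
    public static long sampleCount(long keys, int indexInterval) {
        if (keys < 0 || indexInterval <= 0) {
            throw new IllegalArgumentException("keys must be >= 0 and indexInterval > 0");
        }
        return (keys + indexInterval - 1) / indexInterval;
    }

    public static void main(String[] args) {
        long keys = 100_000_000L; // made-up key count for illustration
        for (int interval : new int[] {128, 512, 2048}) {
            System.out.println("index_interval=" + interval
                    + " samples=" + sampleCount(keys, interval));
        }
    }
}
```

Under this assumption, quadrupling the interval for a rarely read CF cuts its sample count to roughly a quarter, at the cost of scanning more index entries per lookup.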

There is already a per-column-family setting for configuring bloom filters: bloom_filter_fp_chance. Please make index_interval configurable per CF as well. If the per-CF setting is not set or is zero, the default from cassandra.yaml will be used.
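
The requested fallback could look roughly like the sketch below. The names are hypothetical and this is not the attached patch; rejecting negative values is an assumption, since the description only covers "not set" and zero.

```java
// Hypothetical sketch of the requested fallback, not the actual patch.
final class IndexIntervalResolver {
    // perCfInterval: the per-CF setting, null when not set.
    // yamlDefault:   the index_interval value from cassandra.yaml.
    public static int effectiveInterval(Integer perCfInterval, int yamlDefault) {
        if (yamlDefault <= 0) {
            throw new IllegalArgumentException("default index_interval must be positive");
        }
        if (perCfInterval == null || perCfInterval == 0) {
            return yamlDefault; // unset or zero: use the global default
        }
        if (perCfInterval < 0) {
            // Assumption: negative values are treated as a configuration error.
            throw new IllegalArgumentException("index_interval must not be negative");
        }
        return perCfInterval;
    }
}
```

For example, effectiveInterval(null, 128) and effectiveInterval(0, 128) both give 128, while effectiveInterval(512, 128) gives 512.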

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501464#comment-13501464 ] 

Jonathan Ellis commented on CASSANDRA-3961:
-------------------------------------------

Eyeballing this, it looks reasonable.  Can you rebase to trunk?
                
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 1.3
>
>         Attachments: cass-interval1.txt, cass-interval2.txt
>
>


[jira] [Commented] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507808#comment-13507808 ] 

Jonathan Ellis commented on CASSANDRA-3961:
-------------------------------------------

My mistake.  v6 lgtm; rebased and committed.
                
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 1.3
>
>         Attachments: cass-interval1.txt, cass-interval2.txt, cass-interval3.txt, cass-interval4.txt, cass-interval5.txt, cass-interval6.txt
>
>


[jira] [Commented] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Jiri Eichler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236627#comment-13236627 ] 

Jiri Eichler commented on CASSANDRA-3961:
-----------------------------------------

If there are column families of very different sizes, such an option will save a lot of memory, and since nothing in Cassandra's design technically prevents it, this option should be available.
                
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.7
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Priority: Minor
>


        

[jira] [Updated] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated CASSANDRA-3961:
-----------------------------------

    Attachment: cass-interval6.txt

Cleaned up some extra whitespace.
                
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 1.3
>
>         Attachments: cass-interval1.txt, cass-interval2.txt, cass-interval3.txt, cass-interval4.txt, cass-interval5.txt, cass-interval6.txt
>
>


[jira] [Commented] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502390#comment-13502390 ] 

Jonathan Ellis commented on CASSANDRA-3961:
-------------------------------------------

Comments on v4:

- Shouldn't need to touch Avro; that's only there for compatibility with 1.0
- Please observe http://wiki.apache.org/cassandra/CodeStyle
- If the index interval is changed, we should regenerate the index summary on restart instead of using the saved summary with the old interval
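
The last point could be sketched as follows. This is only an illustration with hypothetical names (Summary, build, loadOrRebuild), not the code under review; the real summary format and loading path are assumed away.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reuse a saved summary only if it was built with
// the currently configured interval, otherwise rebuild it.
final class SummaryReload {
    static final class Summary {
        final int interval;
        final List<String> sampledKeys;

        Summary(int interval, List<String> sampledKeys) {
            this.interval = interval;
            this.sampledKeys = sampledKeys;
        }
    }

    // Samples every `interval`-th key from the sorted key list.
    public static Summary build(List<String> sortedKeys, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<String> samples = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += interval) {
            samples.add(sortedKeys.get(i));
        }
        return new Summary(interval, samples);
    }

    // Returns the saved summary when its interval still matches,
    // otherwise regenerates it with the configured interval.
    public static Summary loadOrRebuild(Summary saved, List<String> sortedKeys,
                                        int configuredInterval) {
        if (saved != null && saved.interval == configuredInterval) {
            return saved;
        }
        return build(sortedKeys, configuredInterval);
    }
}
```

Storing the interval alongside the saved summary is what makes the staleness check possible; whether the patch does it this way is not stated in the thread.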

                
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 1.3
>
>         Attachments: cass-interval1.txt, cass-interval2.txt, cass-interval3.txt, cass-interval4.txt
>
>


[jira] [Commented] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502998#comment-13502998 ] 

Radim Kolar commented on CASSANDRA-3961:
----------------------------------------

Your review is wrong: #3 is already done by load(boolean).
                
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 1.3
>
>         Attachments: cass-interval1.txt, cass-interval2.txt, cass-interval3.txt, cass-interval4.txt, cass-interval5.txt, cass-interval6.txt
>
>


[jira] [Assigned] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar reassigned CASSANDRA-3961:
--------------------------------------

    Assignee: Radim Kolar
    
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.7
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>            Priority: Minor
>


[jira] [Updated] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated CASSANDRA-3961:
-----------------------------------

    Attachment: cass-interval2.txt
    
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.7
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>            Priority: Minor
>         Attachments: cass-interval1.txt, cass-interval2.txt
>
>


[jira] [Updated] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated CASSANDRA-3961:
-----------------------------------

    Attachment: cass-interval1.txt
    
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.0.7
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>            Priority: Minor
>         Attachments: cass-interval1.txt
>
>


[jira] [Updated] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated CASSANDRA-3961:
-----------------------------------

    Attachment: cass-interval3.txt
    
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 1.3
>
>         Attachments: cass-interval1.txt, cass-interval2.txt, cass-interval3.txt
>
>


[jira] [Updated] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated CASSANDRA-3961:
-----------------------------------

    Attachment: cass-interval5.txt
    
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 1.3
>
>         Attachments: cass-interval1.txt, cass-interval2.txt, cass-interval3.txt, cass-interval4.txt, cass-interval5.txt
>
>


[jira] [Updated] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated CASSANDRA-3961:
-----------------------------------

          Component/s: Core
             Priority: Major  (was: Minor)
    Affects Version/s:     (was: 1.0.7)
                       1.2.0 beta 1
    
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2.0 beta 1
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>         Attachments: cass-interval1.txt, cass-interval2.txt
>
>


[jira] [Updated] (CASSANDRA-3961) Make index_interval configurable per column family

Posted by "Radim Kolar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated CASSANDRA-3961:
-----------------------------------

    Attachment: cass-interval4.txt
    
> Make index_interval configurable per column family
> --------------------------------------------------
>
>                 Key: CASSANDRA-3961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3961
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 1.0.7/unix
>            Reporter: Radim Kolar
>            Assignee: Radim Kolar
>             Fix For: 1.3
>
>         Attachments: cass-interval1.txt, cass-interval2.txt, cass-interval3.txt, cass-interval4.txt
>
>
