Posted to commits@cassandra.apache.org by "Brandon Williams (Created) (JIRA)" <ji...@apache.org> on 2011/11/15 23:31:51 UTC

[jira] [Created] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

BloomFilter FP ratio should be configurable or size-restricted some other way
-----------------------------------------------------------------------------

                 Key: CASSANDRA-3497
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Brandon Williams


When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but end up being restricted by having the BloomFilters in memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.
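For context, the textbook sizing relation for a Bloom filter with n keys, m bits and an optimal number of hash functions k (a standard result, not something stated in this ticket) shows how the target FP ratio p drives memory:

```latex
\frac{m}{n} = -\frac{\ln p}{(\ln 2)^2}, \qquad k = \frac{m}{n}\,\ln 2
```

Worked through: p = 0.01 gives about 4.605 / 0.4805 ≈ 9.6 bits per key, while p = 0.1 gives about 2.303 / 0.4805 ≈ 4.8 bits per key. Since ln(0.01) = 2 ln(0.1), relaxing the target from 1% to 10% halves the filter's memory under this model.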

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175279#comment-13175279 ] 

Yuki Morishita commented on CASSANDRA-3497:
-------------------------------------------

Jonathan,

Your approach is what I tried first, but I ended up doing it in SSTR instead, and I think that is the best we can do for 1.0.x.
One thing to point out: it throws an NPE when fpChance is null and it tries to convert it to a double at SSTableWriter.java#403.
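The NPE described here is the usual Java auto-unboxing trap: converting a null Double to a primitive double throws. A minimal sketch of the failure and a null-safe fallback follows, in the spirit of the later 0001-give-default-val-to-fp_chance.patch attachment (whose contents are not shown here). The class, method names and DEFAULT_FP_CHANCE value are hypothetical placeholders, not Cassandra's code.

```java
// Hypothetical sketch: fpChance is assumed to be a nullable Double read from
// the schema; DEFAULT_FP_CHANCE is a placeholder, not Cassandra's real default.
class FpChanceUnboxing {
    static final double DEFAULT_FP_CHANCE = 0.01; // placeholder for illustration

    // Unsafe: auto-unboxing a null Double throws NullPointerException.
    static double unsafe(Double fpChance) {
        return fpChance; // NPE when fpChance == null
    }

    // Safe: fall back to a default before unboxing.
    static double orDefault(Double fpChance) {
        return fpChance == null ? DEFAULT_FP_CHANCE : fpChance;
    }

    public static void main(String[] args) {
        System.out.println(orDefault(null)); // prints 0.01
        System.out.println(orDefault(0.1));  // prints 0.1
        try {
            unsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE on unboxing null");
        }
    }
}
```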

                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Attachment:     (was: 3497-v3.txt)
    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174278#comment-13174278 ] 

Yuki Morishita commented on CASSANDRA-3497:
-------------------------------------------

The problem is that strategy_options for NTS is currently used entirely for replication settings, for example {DC1:2, DC2:2}.
We could do something like strategy_options={DC1:2, DC2:1, DC2:fp(0.5)} or strategy_options={DC1:2, DC2:1, fp(0.5)}, or something similar that preserves backward compatibility, but I think it's complicated.

Maybe the easiest fix is to have a node-wide setting for the fp ratio in cassandra.yaml (with a JMX interface exposed) and set different values for each datacenter?
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-3497:
--------------------------------------

    Attachment: 0001-Add-bloom_filter_fp_chance-to-cli.patch

Patch attached so that the cli's show schema and describe commands display bloom_filter_fp_chance if it is set.
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-Add-bloom_filter_fp_chance-to-cli.patch, 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Reopened] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reopened CASSANDRA-3497:
---------------------------------------

    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Radim Kolar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169512#comment-13169512 ] 

Radim Kolar commented on CASSANDRA-3497:
----------------------------------------

BF configuration needs to be per-CF, as in HBase. This would allow a CF used for logging to have a minimal BF if its rows are rarely read back.

See HBase for an example:
http://hbase.apache.org/book/blooms.html#d1161e4353
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Priority: Minor
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Attachment: 3497-v3.txt
    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174392#comment-13174392 ] 

Brandon Williams commented on CASSANDRA-3497:
---------------------------------------------

bq. Maybe the easiest fix is to have a node-wide setting for the fp ratio in cassandra.yaml (with a JMX interface exposed) and set different values for each datacenter?

Yes, I think that's good enough for the multi-datacenter scenario; however, as Radim mentioned, we also have a good use case for a per-CF threshold. We could do both, and then use whichever value is lower: the one in the CF schema or the one in the node's yaml.
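The "use whichever value is lower" rule can be sketched as a small resolver. This is a minimal illustration, assuming both settings are optional (null when unset) and that a hypothetical defaultFpChance applies when neither is given; the class and parameter names are not Cassandra's.

```java
// Hypothetical sketch of the "use whichever value is lower" rule.
// cfFpChance:   optional per-CF schema value (null if unset).
// nodeFpChance: optional node-wide cassandra.yaml value (null if unset).
// Names are illustrative only, not Cassandra's actual fields.
class EffectiveFpChance {
    static double resolve(Double cfFpChance, Double nodeFpChance, double defaultFpChance) {
        if (cfFpChance == null && nodeFpChance == null) {
            return defaultFpChance; // nothing configured anywhere
        }
        if (cfFpChance == null) {
            return nodeFpChance;    // only the node-wide value is set
        }
        if (nodeFpChance == null) {
            return cfFpChance;      // only the per-CF value is set
        }
        // Both set: take the lower FP chance, i.e. the larger, more accurate filter.
        return Math.min(cfFpChance, nodeFpChance);
    }
}
```

One consequence of taking the minimum: a relaxed node-wide value (say 0.5 on an analytics DC) cannot loosen a CF whose schema asks for a tighter filter; taking the maximum instead would let the node setting shrink filters.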


                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Resolved] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-3497.
---------------------------------------

    Resolution: Fixed

bq. Patch attached so that the cli's show schema and describe commands display bloom_filter_fp_chance if it is set.

committed
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-Add-bloom_filter_fp_chance-to-cli.patch, 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150851#comment-13150851 ] 

Brandon Williams commented on CASSANDRA-3497:
---------------------------------------------

Perhaps as a strategy_option?
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Priority: Minor
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Attachment: 3497-v3.txt
    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278356#comment-13278356 ] 

Brandon Williams commented on CASSANDRA-3497:
---------------------------------------------

Note for others trying to disable their BF: despite earlier discussion on this ticket, setting it to zero does NOT disable the filter but instead resets it to the default, since 0 false positives is invalid. You actually want to set it to 1 to get the smallest possible filter.
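Why 0 is invalid and 1 is the smallest filter follows from the textbook sizing formula m/n = -ln(p) / (ln 2)^2: as p approaches 0 the required bits per key grow without bound, and at p = 1 they drop to zero. The sketch below applies that formula and mirrors the behaviour described in the comment; the fallback for p <= 0 and the clamp above 1 are assumptions of this sketch, not a description of Cassandra's internal sizing code.

```java
// Hypothetical sketch using the textbook sizing formula; not Cassandra's code.
class BloomSizing {
    // Optimal bits per key for target false-positive probability p:
    // m/n = -ln(p) / (ln 2)^2. Math.max avoids returning -0.0 when p == 1.
    static double bitsPerKey(double p) {
        return Math.max(0.0, -Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // p <= 0 is treated as "unset" and falls back to a default (it would need
    // infinitely many bits); values above 1 are clamped to 1 for illustration.
    static double effectiveBitsPerKey(double fpChance, double defaultFpChance) {
        double p = fpChance <= 0 ? defaultFpChance : Math.min(fpChance, 1.0);
        return bitsPerKey(p);
    }

    public static void main(String[] args) {
        System.out.println(effectiveBitsPerKey(1.0, 0.01)); // prints 0.0: smallest filter
        System.out.println(effectiveBitsPerKey(0.0, 0.01)); // same as the default's size
    }
}
```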
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-Add-bloom_filter_fp_chance-to-cli.patch, 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Ophir Radnitz (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179439#comment-13179439 ] 

Ophir Radnitz commented on CASSANDRA-3497:
------------------------------------------

We tried this patch on 1.0.6 with an fp_ratio of 0.99 (if I understand correctly, after a major compaction leaves a single, albeit large, SSTable, the bloom filter has very little effect). We found that many records that had been inserted could not be fetched by a multiget_slice query. It seemed as if the bloom filters were producing *false negatives*.

By the way, the fix patch (0001-give-default-val-to-fp_chance.patch) works for the 1.1 branch but not for 1.0.
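A correctly built Bloom filter can return false positives but never false negatives, however small it is sized; every bit set for an inserted key stays set. So rows going missing at a high FP ratio point at how the filter was built or consulted, not at an inherent property of a high ratio. The minimal sketch below (a generic filter using double hashing over String.hashCode, not Cassandra's implementation) illustrates the invariant.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch (not Cassandra's implementation) showing that,
// however small the filter, a key that was added always tests positive.
class TinyBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    TinyBloom(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    // Double hashing: index_i = (h1 + i * h2) mod numBits.
    // h2 is derived from h1 here purely for illustration.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false; // a clear bit means the key was definitely never added
            }
        }
        return true; // key was added, or this is a false positive
    }
}
```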
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-3497.
---------------------------------------

    Resolution: Fixed

committed
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Radim Kolar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170529#comment-13170529 ] 

Radim Kolar commented on CASSANDRA-3497:
----------------------------------------

It would be good to have the ability to shrink bloom filters during loading: save only standard Cassandra bloom filters, but shrink them at load time according to the CF settings.
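One known way to shrink a saved filter at load time is folding, named here as a swapped-in technique rather than anything proposed or implemented on this ticket. It assumes each bit position is computed as h mod m with m even, so a key's position in a filter of m/2 bits is (h mod m) mod (m/2); OR-ing the upper half into the lower half keeps every inserted key's bits set, trading memory for a higher FP rate. The class and method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch of "shrink during load" via folding.
// Assumes bit positions were computed as floorMod(h, numBits) with numBits even,
// so positions in the halved filter are floorMod(h, numBits / 2).
// Not how Cassandra serializes or loads its filters.
class BloomFolding {
    static BitSet fold(BitSet bits, int numBits) {
        if (numBits % 2 != 0) {
            throw new IllegalArgumentException("numBits must be even");
        }
        int half = numBits / 2;
        BitSet folded = bits.get(0, half);      // lower half, bits [0, half)
        BitSet upper = bits.get(half, numBits); // upper half, re-indexed from 0
        folded.or(upper);                       // bit p >= half lands at p - half
        return folded;
    }
}
```

Folding can be repeated while the size stays even, halving memory each time, which fits the idea of storing one standard filter and deriving a smaller one per CF setting.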
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Priority: Minor
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-3497:
--------------------------------------

    Attachment: CASSANDRA-1.0-3497.txt

OK, in the attached patch I removed the filter_enabled option.
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Priority: Minor  (was: Major)

Hmm, that sounds messy.  How do you propose to distinguish BF configuration per-datacenter in the schema?
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Priority: Minor
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Reopened] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reopened CASSANDRA-3497:
---------------------------------------

    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175462#comment-13175462 ] 

Yuki Morishita commented on CASSANDRA-3497:
-------------------------------------------

+1
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Ophir Radnitz (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182226#comment-13182226 ] 

Ophir Radnitz commented on CASSANDRA-3497:
------------------------------------------

I actually applied the 'CASSANDRA-1.0-3497' patch, which I now see is not the most recent one. We'll probably revisit this once 1.0.7 is out.
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-Add-bloom_filter_fp_chance-to-cli.patch, 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179525#comment-13179525 ] 

Jonathan Ellis commented on CASSANDRA-3497:
-------------------------------------------

bq. the fix patch (0001-give-default-val-to-fp_chance.patch) works for the 1.1 branch but not for 1.0

It's already applied to both. (Note that we've switched to git; the old svn repo is no longer maintained.)
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3497:
----------------------------------------

    Description: When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.  (was: When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though so you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.)
    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-3497:
--------------------------------------

    Attachment:     (was: CASSANDRA-1.0-3497.txt)
    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175123#comment-13175123 ] 

Jonathan Ellis commented on CASSANDRA-3497:
-------------------------------------------

Can we do it with a single setting?

fp_ratio = null: use current 15-buckets-per-element filters
fp_ratio = 0: no filter
fp_ratio > 0: BF based on given FP probability

Further, I think we should split this up so that for 1.0 we only handle the null and positive cases; let's do a separate ticket for 1.1 about skipping the BF entirely.
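The three cases of the proposed single setting can be sketched as a dispatch on a nullable Double. FilterMode and modeFor are hypothetical names, and rejecting values at or above 1 is an added assumption; per the comment, only the null and positive branches would apply in 1.0.

```java
/**
 * Hypothetical sketch of the single-setting semantics proposed above.
 * FilterMode and modeFor() are illustrative names, not Cassandra code.
 */
public class FpRatioModes {
    public enum FilterMode { DEFAULT_15_BUCKETS, NO_FILTER, SIZED_BY_FP }

    public static FilterMode modeFor(Double fpRatio) {
        if (fpRatio == null) {
            return FilterMode.DEFAULT_15_BUCKETS;  // null: keep the current 15-buckets-per-element filters
        }
        if (fpRatio == 0.0) {
            return FilterMode.NO_FILTER;           // 0: build no filter (deferred to a 1.1 ticket)
        }
        if (fpRatio > 0.0 && fpRatio < 1.0) {
            return FilterMode.SIZED_BY_FP;         // > 0: size the filter from the target FP probability
        }
        // Assumption: negative values and probabilities >= 1 are configuration errors.
        throw new IllegalArgumentException("fp_ratio must be null or in [0, 1): " + fpRatio);
    }

    public static void main(String[] args) {
        Double[] settings = {null, 0.0, 0.01};
        for (Double s : settings) {
            System.out.println("fp_ratio=" + s + " -> " + modeFor(s));
        }
    }
}
```

Using a nullable Double lets "not set" stay distinct from an explicit 0, which is what makes a single option enough.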
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174396#comment-13174396 ] 

Jonathan Ellis commented on CASSANDRA-3497:
-------------------------------------------

Let's just go with a per-CF option. Brandon's right that ideally we'd configure it differently in analytical DCs (perhaps leaving the filters out entirely), but I don't want to invent a totally new concept in 1.0.x, and having it per-CF (which we get via schema) is more important than having it per-DC (which we get with strategy_options).
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-3497:
--------------------------------------

    Attachment: CASSANDRA-1.0-3497.txt

I added two new Bloom filter options to CFMetadata.

- filter_enabled
  If set to false, SSTableReader uses an EMPTY Bloom filter. Defaults to true.

- fp_ratio
  If the value is greater than 0, SSTableReader adjusts the Bloom filter based on the FP ratio and uses it. Defaults to 0.

The BloomFilter is created and saved as usual, but when an SSTableReader is opened, you get a filter based on the CF setting.

Note that the change only takes effect the next time an SSTableReader is opened, so for existing sstables you need to restart the node or compact/scrub them.
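Whatever the EMPTY filter looks like internally, a stand-in for a disabled filter has to answer "maybe present" for every key: a Bloom filter may report false positives but never false negatives, and a "no" makes the read skip that sstable. The sketch below contrasts a hypothetical always-present filter with a filter that has no bits set; the KeyFilter interface and both filters are illustrative assumptions, not Cassandra classes.

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sketch: what a stand-in for a disabled filter must return.
 * KeyFilter, ALWAYS_PRESENT and NO_BITS_SET are illustrative, not Cassandra classes.
 */
public class DisabledFilterSketch {
    public interface KeyFilter {
        /** false means "definitely absent", so the read skips this sstable. */
        boolean isPresent(String key);
    }

    /** Safe "disabled" filter: every key may be present, so every sstable is read. */
    public static final KeyFilter ALWAYS_PRESENT = key -> true;

    /** A 64-bit filter with no bits set: reports every key as absent. */
    public static final KeyFilter NO_BITS_SET = new KeyFilter() {
        private final BitSet bits = new BitSet(64);

        @Override
        public boolean isPresent(String key) {
            return bits.get(Math.floorMod(key.hashCode(), 64));
        }
    };

    /** Counts stored keys that the filter would wrongly report as absent. */
    public static int falseNegatives(KeyFilter filter, List<String> storedKeys) {
        int misses = 0;
        for (String key : storedKeys) {
            if (!filter.isPresent(key)) {
                misses++;
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        List<String> stored = List.of("row1", "row2", "row3");
        System.out.println("always-present false negatives: " + falseNegatives(ALWAYS_PRESENT, stored));
        System.out.println("no-bits-set false negatives:    " + falseNegatives(NO_BITS_SET, stored));
    }
}
```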
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Yuki Morishita (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-3497:
--------------------------------------

    Attachment: 0001-give-default-val-to-fp_chance.patch

Radim,

Thanks for the report. The problem is that the new bloom_filter_fp_chance field in the Avro interface definition does not have a proper default.
I've attached a patch to fix it.
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Attachment: 3497-v3.txt

Sorry, I didn't look closely enough the first time. The BloomFilter#modify approach won't work: when we change the BF parameters, we change which bits should be set, so there's no way to rebuild the filter with new parameters without re-inserting all the keys.
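To see why the bits can't simply be reused, here is a toy Bloom filter using double hashing over String.hashCode(). This is a hypothetical simplification, not Cassandra's hashing. The sketch keeps the old bit array but queries it with new parameters, then compares that with a filter rebuilt by re-inserting every key. The sizes (1500 bits and 10 hashes shrunk to 700 bits and 5 hashes) are arbitrary illustrative choices.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch: a toy Bloom filter (double hashing over String.hashCode(),
 * not Cassandra's hashing) showing why the bits cannot be reused under new parameters.
 */
public class BloomRebuildSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    private BloomRebuildSketch(BitSet bits, int numBits, int numHashes) {
        this.bits = bits;
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    public static BloomRebuildSketch create(int numBits, int numHashes) {
        return new BloomRebuildSketch(new BitSet(numBits), numBits, numHashes);
    }

    // i-th bucket = (h1 + i * h2) mod numBits, so positions depend on both parameters.
    private int bucket(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) | 1;  // odd second hash
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(String key) {
        for (int i = 0; i < numHashes; i++) bits.set(bucket(key, i));
    }

    public boolean isPresent(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(bucket(key, i))) return false;
        }
        return true;
    }

    /** The naive "modify": keep the old bit array but interpret it with new parameters. */
    public BloomRebuildSketch reinterpret(int newNumBits, int newNumHashes) {
        return new BloomRebuildSketch(bits, newNumBits, newNumHashes);
    }

    /** Returns {false negatives after reinterpreting, false negatives after rebuilding}. */
    public static int[] demo() {
        final int keys = 100;
        BloomRebuildSketch original = create(1500, 10);  // about 15 bits per key
        for (int k = 0; k < keys; k++) original.add("key" + k);

        BloomRebuildSketch reinterpreted = original.reinterpret(700, 5);

        // The correct option: a fresh filter with every key re-inserted,
        // as happens when the sstable is rewritten.
        BloomRebuildSketch rebuilt = create(700, 5);
        for (int k = 0; k < keys; k++) rebuilt.add("key" + k);

        int missesReinterpreted = 0;
        int missesRebuilt = 0;
        for (int k = 0; k < keys; k++) {
            if (!reinterpreted.isPresent("key" + k)) missesReinterpreted++;
            if (!rebuilt.isPresent("key" + k)) missesRebuilt++;
        }
        return new int[] {missesReinterpreted, missesRebuilt};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("false negatives after reinterpreting: " + r[0]);
        System.out.println("false negatives after rebuilding:     " + r[1]);
    }
}
```

The rebuilt filter can never miss a stored key, because it sets and checks the same positions. The reinterpreted one checks positions that were never set for most keys, so it produces false negatives.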

Attached v3, which just changes the BloomFilter constructor call in SSTableWriter. (So people will have to scrub to rebuild existing filters, but that's the best we can do.) I also renamed the setting to bloom_filter_fp_chance and updated the cli help.

How does that look to you?
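Sizing a filter from bloom_filter_fp_chance at construction time comes down to choosing bits per key and a hash count. The sketch below uses the textbook optimum, m/n = -ln p / (ln 2)^2 and k = (m/n) ln 2. Cassandra's own code may pick slightly different values, so treat the numbers as illustrative. FpChanceSizing and its methods are hypothetical names.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: textbook Bloom filter sizing from a target false-positive chance.
 * Not Cassandra's sizing code; names are illustrative.
 */
public class FpChanceSizing {
    private static final double LN2 = Math.log(2);

    /** Bits per key for target FP probability p: ceil(-ln p / (ln 2)^2). */
    public static int bitsPerElement(double fpChance) {
        if (!(fpChance > 0.0 && fpChance < 1.0)) {
            throw new IllegalArgumentException("fp chance must be in (0, 1): " + fpChance);
        }
        return (int) Math.ceil(-Math.log(fpChance) / (LN2 * LN2));
    }

    /** Hash functions for a given bits-per-key: round(bits * ln 2), at least 1. */
    public static int hashCount(int bitsPerElement) {
        return Math.max(1, (int) Math.round(bitsPerElement * LN2));
    }

    /** Standard estimate of the FP rate: (1 - e^(-k / (m/n)))^k. */
    public static double expectedFpRate(int bitsPerElement, int hashes) {
        return Math.pow(1 - Math.exp(-(double) hashes / bitsPerElement), hashes);
    }

    public static void main(String[] args) {
        long keys = 1_000_000L;
        for (double p : new double[] {0.0001, 0.01, 0.1}) {
            int b = bitsPerElement(p);
            int k = hashCount(b);
            double mib = b * keys / 8.0 / (1 << 20);  // filter size in MiB for this many keys
            System.out.println(String.format(Locale.ROOT,
                    "p=%.4f -> %d bits/key, %d hashes, ~%.1f MiB for %d keys, est. FP %.5f",
                    p, b, k, mib, keys, expectedFpRate(b, k)));
        }
    }
}
```

Because the size is fixed when the filter is built, existing sstables keep their old filters until they are rewritten. That is why the comment above says a scrub is needed.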
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180914#comment-13180914 ] 

Jonathan Ellis commented on CASSANDRA-3497:
-------------------------------------------

bq. We've found that many records that were inserted could not be fetched in a multiget_slice query. It seemed as if the bloom filters resulted in false negatives.

I have trouble understanding how this could be the case, because if our BF could cause false negatives then surely we'd see that even at today's low default FP rates.  This patch didn't change how the BF is used, only the parameters it's created with, nor does it try to retrofit the new BF parameters onto existing sstables.

You did apply the v4 patch and not an earlier one, right?
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-Add-bloom_filter_fp_chance-to-cli.patch, 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Attachment:     (was: 3497-v3.txt)
    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and purely analytical dc, in many situations you can have less nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Radim Kolar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175760#comment-13175760 ] 

Radim Kolar commented on CASSANDRA-3497:
----------------------------------------

I compiled jars with this patch, and Cassandra does not boot on an existing node:

 Opening /var/lib/cassandra/data/system/Migrations-hc-109 (757635 bytes)
 INFO [SSTableBatchOpen:1] 2011-12-24 18:26:47,326 SSTableReader.java (line 134) Opening /var/lib/cassandra/data/system/LocationInfo-hc-273 (647 bytes)
 INFO [SSTableBatchOpen:1] 2011-12-24 18:26:47,338 SSTableReader.java (line 134) Opening /var/lib/cassandra/data/system/HintsColumnFamily-hc-1 (275 bytes)
 INFO [SSTableBatchOpen:2] 2011-12-24 18:26:47,338 SSTableReader.java (line 134) Opening /var/lib/cassandra/data/system/HintsColumnFamily-hc-2 (85 bytes)
 INFO [main] 2011-12-24 18:26:47,396 DatabaseDescriptor.java (line 501) Loading schema version ad8d50b0-2cc3-11e1-0000-b1504fb874be
ERROR [main] 2011-12-24 18:26:47,555 AbstractCassandraDaemon.java (line 372) Exception encountered during startup
org.apache.avro.AvroTypeException: Found {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.db.migration.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":"boolean","default":false},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"key_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"row_cache_keys_to_save","type":["null","int"],"default":null},{"name":"merge_shards_chance","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS","CUSTOM"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]},{"name":"index_options","type":["null",{"type":"map","values":"string"}],"default":null}]}},"null"]},{"name":"row_cache_provider","type":["string","null"],"default":"org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider"},{"name":"key_alias","type":["null","bytes"],"default":null},{"name":"compaction_strategy","type":["null","string"],"default":null},{"name":"compaction_strategy_options","type":["null",{"type":"map","values":"string"}],"default":null},{"name":"compression_options","type":["null",{"type":"map","values":"string"}],"default":null}]},
expecting {"type":"record","name":"CfDef","namespace":"org.apache.cassandra.db.migration.avro","fields":[{"name":"keyspace","type":"string"},{"name":"name","type":"string"},{"name":"column_type","type":["string","null"]},{"name":"comparator_type","type":["string","null"]},{"name":"subcomparator_type","type":["string","null"]},{"name":"comment","type":["string","null"]},{"name":"row_cache_size","type":["double","null"]},{"name":"key_cache_size","type":["double","null"]},{"name":"read_repair_chance","type":["double","null"]},{"name":"replicate_on_write","type":"boolean","default":false},{"name":"gc_grace_seconds","type":["int","null"]},{"name":"default_validation_class","type":["null","string"],"default":null},{"name":"key_validation_class","type":["null","string"],"default":null},{"name":"min_compaction_threshold","type":["null","int"],"default":null},{"name":"max_compaction_threshold","type":["null","int"],"default":null},{"name":"row_cache_save_period_in_seconds","type":["int","null"],"default":0},{"name":"key_cache_save_period_in_seconds","type":["int","null"],"default":3600},{"name":"row_cache_keys_to_save","type":["null","int"],"default":null},{"name":"merge_shards_chance","type":["null","double"],"default":null},{"name":"id","type":["int","null"]},{"name":"column_metadata","type":[{"type":"array","items":{"type":"record","name":"ColumnDef","fields":[{"name":"name","type":"bytes"},{"name":"validation_class","type":"string"},{"name":"index_type","type":[{"type":"enum","name":"IndexType","symbols":["KEYS","CUSTOM"],"aliases":["org.apache.cassandra.config.avro.IndexType"]},"null"]},{"name":"index_name","type":["string","null"]},{"name":"index_options","type":["null",{"type":"map","values":"string"}],"default":null}],"aliases":["org.apache.cassandra.config.avro.ColumnDef"]}},"null"]},{"name":"row_cache_provider","type":["string","null"],"default":"org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider"},{"name":"key_alias","type":["null","bytes"],"default":null},{"name":"compaction_strategy","type":["null","string"],"default":null},{"name":"compaction_strategy_options","type":["null",{"type":"map","values":"string"}],"default":null},{"name":"compression_options","type":["null",{"type":"map","values":"string"}],"default":null},{"name":"bloom_filter_fp_chance","type":["double","null"]}],"aliases":["org.apache.cassandra.config.avro.CfDef"]}
        at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:212)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
        at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:138)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
        at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:192)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:116)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
        at org.apache.cassandra.io.SerDeUtils.deserialize(SerDeUtils.java:60)
        at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:98)
        at org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:502)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:179)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:355)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:107)
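The trace above is Avro refusing to resolve the stored schema definitions against the new CfDef schema. The expected (reader) schema ends with a bloom_filter_fp_chance field of type ["double","null"] that declares no default. The schema the stored definitions were written with stops at compression_options. Under the Avro specification, a reader field that is absent from the writer's schema is filled from its default, and resolution fails if it has none. The Java sketch below models only that rule. It is not the Avro API, and the class, method, and sample values are hypothetical. The later attachment 0001-give-default-val-to-fp_chance.patch suggests that adding a default was the direction taken, but its contents are not shown here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of Avro's schema-resolution rule for record fields that the
// reader expects but the writer never wrote. Illustration only, NOT the Avro API;
// all class, method, and sample names here are hypothetical.
public class AvroResolutionSketch {

    // A field in the reader's (expected) schema, with an optional default.
    public static final class ReaderField {
        final String name;
        final boolean hasDefault;
        final Object defaultValue;

        public ReaderField(String name) {                     // declared without a default
            this(name, false, null);
        }

        public ReaderField(String name, Object defaultValue) { // declared with a default
            this(name, true, defaultValue);
        }

        private ReaderField(String name, boolean hasDefault, Object defaultValue) {
            this.name = name;
            this.hasDefault = hasDefault;
            this.defaultValue = defaultValue;
        }
    }

    // Fill each reader field from the written data, else from its default.
    // A missing field with no default is an error, as in the stack trace above.
    public static Map<String, Object> resolve(Map<String, Object> written, List<ReaderField> readerFields) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        for (ReaderField field : readerFields) {
            if (written.containsKey(field.name)) {
                resolved.put(field.name, written.get(field.name));
            } else if (field.hasDefault) {
                resolved.put(field.name, field.defaultValue);
            } else {
                throw new IllegalStateException(
                    "field '" + field.name + "' is absent from the writer schema and has no default");
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // A (hypothetical) column family definition stored before bloom_filter_fp_chance existed.
        Map<String, Object> oldCfDef = new LinkedHashMap<>();
        oldCfDef.put("keyspace", "Keyspace1");
        oldCfDef.put("name", "Standard1");

        List<ReaderField> withoutDefault = List.of(
            new ReaderField("keyspace"), new ReaderField("name"),
            new ReaderField("bloom_filter_fp_chance"));
        try {
            resolve(oldCfDef, withoutDefault);
        } catch (IllegalStateException e) {
            System.out.println("resolution failed: " + e.getMessage());
        }

        // Hypothetical default value; for a ["double","null"] union it must be a double.
        List<ReaderField> withDefault = List.of(
            new ReaderField("keyspace"), new ReaderField("name"),
            new ReaderField("bloom_filter_fp_chance", 0.01));
        System.out.println(resolve(oldCfDef, withDefault));
    }
}
```

One detail worth keeping in mind: Avro requires a union field's default to match the first branch of the union, so with the type written as ["double","null"] the default has to be a double. A null default would only be legal if the union were reordered to ["null","double"].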

                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.

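To put the memory trade-off in numbers: for a classic Bloom filter holding n keys with a target false-positive probability p, the optimal size is m = -n ln p / (ln 2)^2 bits, using k = (m/n) ln 2 hash functions. The Java sketch below applies these textbook formulas. It is not Cassandra's BloomFilter code, and the 100-million key count is a hypothetical per-node figure.

```java
// Textbook Bloom filter sizing, showing how the target FP chance drives memory use.
// NOT Cassandra's BloomFilter implementation; the key count in main() is hypothetical.
public class BloomSizingSketch {

    // Optimal bits per key for a target false-positive probability p: -ln(p) / (ln 2)^2.
    public static double bitsPerKey(double fpChance) {
        if (!(fpChance > 0.0 && fpChance < 1.0)) {
            throw new IllegalArgumentException("fpChance must be in (0, 1): " + fpChance);
        }
        double ln2 = Math.log(2);
        return -Math.log(fpChance) / (ln2 * ln2);
    }

    // Optimal number of hash functions for a given bits-per-key ratio: (m/n) ln 2.
    public static int hashCount(double bitsPerKey) {
        return Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
    }

    // Total filter size in bytes for the given number of keys, rounded up.
    public static long filterBytes(long keys, double fpChance) {
        long bits = (long) Math.ceil(keys * bitsPerKey(fpChance));
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        long keys = 100_000_000L; // hypothetical number of keys held on one node
        for (double p : new double[] {0.001, 0.01, 0.1}) {
            double bpk = bitsPerKey(p);
            System.out.printf("fp=%.3f bits/key=%.2f hashes=%d size=%.1f MiB%n",
                p, bpk, hashCount(bpk), filterBytes(keys, p) / (1024.0 * 1024.0));
        }
    }
}
```

Because bits per key are proportional to -ln p, relaxing p from 0.01 to 0.1 halves the filter (about 9.6 down to about 4.8 bits per key), which is the kind of saving an analytical dc could use. Disabling the filter outright corresponds to p = 1, which the formula cannot express, so the sketch rejects values outside (0, 1).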

       

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Radim Kolar (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176313#comment-13176313 ] 

Radim Kolar commented on CASSANDRA-3497:
----------------------------------------

The FP ratio is not displayed in the CLI output of "show schema" or "describe".

                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Fix Version/s: 1.1
         Assignee: Yuki Morishita
    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.1
>
>
> When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Commented] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182267#comment-13182267 ] 

Jonathan Ellis commented on CASSANDRA-3497:
-------------------------------------------

Makes sense; that's what I was referring to when I reviewed that patch and said "the BloomFilter#modify approach won't work."  v4 / 1.0 branch should be fine.
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 0001-Add-bloom_filter_fp_chance-to-cli.patch, 0001-give-default-val-to-fp_chance.patch, 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Attachment: 3497-v3.txt
    
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.


        

[jira] [Updated] (CASSANDRA-3497) BloomFilter FP ratio should be configurable or size-restricted some other way

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3497:
--------------------------------------

    Attachment: 3497-v4.txt

v4 attached with unbox-of-null fixed.
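The "unbox-of-null" refers to the NPE Yuki reported earlier, where a null fpChance is converted to a primitive double (SSTableWriter.java#403). The following minimal Java sketch shows that failure mode and a null-safe alternative. It is not the actual v4 change: the method names and the fallback value are hypothetical.

```java
// Minimal illustration of an unbox-of-null bug: auto-unboxing a null Double into a
// primitive double throws NullPointerException. Method names and the fallback value
// are hypothetical, not Cassandra's SSTableWriter code.
public class UnboxNullSketch {

    // Hypothetical fallback used when no FP chance was configured.
    public static final double FALLBACK_FP_CHANCE = 0.01;

    // Buggy: returning a Double as a double auto-unboxes it, so null throws NPE.
    public static double fpChanceBuggy(Double configured) {
        return configured;
    }

    // Fixed: test for null first; configured is only unboxed when it is non-null.
    public static double fpChanceFixed(Double configured) {
        return configured == null ? FALLBACK_FP_CHANCE : configured;
    }

    public static void main(String[] args) {
        System.out.println(fpChanceFixed(null)); // prints 0.01
        System.out.println(fpChanceFixed(0.1));  // prints 0.1
        try {
            fpChanceBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("buggy version threw NullPointerException");
        }
    }
}
```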
                
> BloomFilter FP ratio should be configurable or size-restricted some other way
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3497
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3497
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.7
>
>         Attachments: 3497-v3.txt, 3497-v4.txt, CASSANDRA-1.0-3497.txt
>
>
> When you have a live dc and a purely analytical dc, in many situations you can have fewer nodes on the analytical side, but end up getting restricted by having the BloomFilters in-memory, even though you have absolutely no use for them.  It would be nice if you could reduce this memory requirement by tuning the desired FP ratio, or even just disabling them altogether.
