Posted to commits@cassandra.apache.org by "Jason Brown (JIRA)" <ji...@apache.org> on 2012/11/13 22:20:12 UTC

[jira] [Updated] (CASSANDRA-4885) Remove or rework per-row bloom filters

     [ https://issues.apache.org/jira/browse/CASSANDRA-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Brown updated CASSANDRA-4885:
-----------------------------------

    Attachment: 0001-CASSANRDA-4885-Remove-per-row-bloom-filter.patch

OK, so I've removed the bloom filter from the row header and the key index. For the row header, on reads we just skip over the BF section on disk (if the sstable version is older than the new 'ja') and never write it out when serializing. The index gets more or less the same treatment. I believe I've caught all the places in the logic that need to do the right thing (don't misread existing files, never write the BF into new files).
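
For anyone following along without the patch open, the read-path change is roughly the shape below. This is a sketch only: the helper name, the lexicographic 'ja' comparison, and the length-prefixed BF layout are illustrative stand-ins, not necessarily what the attached patch does.

import java.io.DataInput;
import java.io.IOException;

public class RowHeaderBloomFilterSkip
{
    // Pre-'ja' sstables still carry a serialized bloom filter in the row
    // header; 'ja' and later never write one, so there is nothing to skip.
    // (Assumes versions compare lexicographically and that the BF blob is
    // length-prefixed -- both assumptions made for this sketch.)
    public static void skipRowBloomFilterIfPresent(DataInput in, String sstableVersion) throws IOException
    {
        if (sstableVersion.compareTo("ja") < 0)
        {
            int size = in.readInt();
            while (size > 0)
            {
                int skipped = in.skipBytes(size);
                if (skipped <= 0)
                    throw new IOException("unexpected EOF while skipping row-level bloom filter");
                size -= skipped;
            }
        }
    }
}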

However, I did run into a problem with one of the unit tests. ScrubTest.testScrubFile() expects a corrupted file that it can then attempt to fix. The existing corrupt file (test/data/corrupt-sstables/Keyspace1-Super5-f-2-Data.db) has a corrupted BF that throws the exception the unit test expects. Now that we skip over the BF in the row header, though, the exception never gets thrown and the test fails because the file is 'no longer corrupt' :). I hacked up the code to trip a failure (by changing the column count), but can somebody recommend a good way to create a corrupt file? One idea is sketched below.
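
One way to regenerate a fixture would be to copy a healthy -Data.db into test/data/corrupt-sstables/ and clobber a few bytes inside a row body, so the damage lands on something the new read path still deserializes (a column count or column length) rather than in the now-skipped BF. Rough sketch, with a hypothetical fixture path:

import java.io.RandomAccessFile;

public class MakeCorruptFixture
{
    public static void main(String[] args) throws Exception
    {
        // Hypothetical path -- point this at a copy of a known-good file.
        String path = "test/data/corrupt-sstables/Keyspace1-Standard1-f-1-Data.db";
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw"))
        {
            // Overwrite four bytes mid-file so a column count or column
            // length becomes garbage; the exact offset just needs to fall
            // inside a row body, past the row header.
            raf.seek(raf.length() / 2);
            for (int i = 0; i < 4; i++)
                raf.write(0xFF);
        }
    }
}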
                
> Remove or rework per-row bloom filters
> --------------------------------------
>
>                 Key: CASSANDRA-4885
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4885
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>             Fix For: 1.3
>
>         Attachments: 0001-CASSANRDA-4885-Remove-per-row-bloom-filter.patch
>
>
> Per-row bloom filters may be a misfeature.
> On small rows we don't create them.
> On large rows we essentially only do slice queries that can't take advantage of it.
> And on very large rows if we ever did deserialize it, the performance hit of doing so would outweigh the benefit of skipping the actual read.
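
To put rough numbers on that last point (every figure below is an assumption for illustration, not a measurement): a filter sized at ~10 bits per column for a 10M-column row weighs in around 12 MB, and reading that off disk costs an order of magnitude more time than the single seek it might let us avoid.

public class BloomFilterTradeoff
{
    public static void main(String[] args)
    {
        long columns = 10_000_000L;         // a "very large" row (assumed)
        double bitsPerKey = 10.0;           // common BF sizing (assumed)
        long bfBytes = (long) (columns * bitsPerKey / 8);    // ~12.5 MB

        double diskBytesPerMs = 100.0 * 1024 * 1024 / 1000;  // 100 MB/s (assumed)
        double bfReadMs = bfBytes / diskBytesPerMs;          // ~120 ms to load the BF

        double seekSavedMs = 10.0;          // one avoided disk seek (assumed)

        System.out.printf("BF ~%d bytes: ~%.0f ms to deserialize vs ~%.0f ms saved%n",
                          bfBytes, bfReadMs, seekSavedMs);
    }
}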

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira