Posted to commits@cassandra.apache.org by "Michael Harris (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/03/09 04:38:03 UTC

[jira] [Issue Comment Edited] (CASSANDRA-4023) Batch reading BloomFilters on startup

    [ https://issues.apache.org/jira/browse/CASSANDRA-4023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225777#comment-13225777 ] 

Michael Harris edited comment on CASSANDRA-4023 at 3/9/12 3:37 AM:
-------------------------------------------------------------------

My $0.02 is that it may be helpful to batch reads.  I'm not sure whether the underlying stream used to read the bloom filters already reads a large chunk and caches it, but if it doesn't, it could help to read 64K or 1M or whatever you feel is appropriate (maybe configurable?) into a buffer and pull the longs out of that, instead of calling ois.readLong() for each one.  This doesn't completely fix the disk contention problem, but it might cause larger sequential reads to be submitted to the disk, which might then behave better.
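
A rough sketch of what I mean, in plain Java.  This is not Cassandra code: BatchLongReader, readLongs and the chunkBytes parameter are names made up for illustration, and it assumes the longs are stored big-endian, the way DataOutputStream.writeLong() writes them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical helper, not Cassandra code: reads `count` big-endian longs
// in chunks of up to `chunkBytes` bytes instead of one readLong() per word.
final class BatchLongReader {
    static long[] readLongs(DataInputStream in, int count, int chunkBytes) throws IOException {
        if (count < 0 || chunkBytes < Long.BYTES) {
            throw new IllegalArgumentException("count must be >= 0 and chunkBytes >= 8");
        }
        long[] words = new long[count];
        int longsPerChunk = chunkBytes / Long.BYTES;
        byte[] chunk = new byte[longsPerChunk * Long.BYTES];
        int done = 0;
        while (done < count) {
            int n = Math.min(longsPerChunk, count - done);
            // One bulk request per chunk rather than one request per long.
            in.readFully(chunk, 0, n * Long.BYTES);
            // ByteBuffer defaults to big-endian, matching writeLong().
            ByteBuffer.wrap(chunk, 0, n * Long.BYTES).asLongBuffer().get(words, done, n);
            done += n;
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Build some sample data in memory so the sketch runs on its own.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (long i = 0; i < 10_000; i++) {
            out.writeLong(i * 31);
        }
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        long[] words = readLongs(in, 10_000, 64 * 1024);
        System.out.println(words.length + " longs read");
    }
}
```

The chunk size is passed in as a parameter so that it could be made configurable, as suggested above.  Whether this actually helps on a real disk would need measuring.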

The specific example I'm thinking of: it looks like deserialization of LegacyBloomFilter (perhaps what 0.8 uses?) is just a single ois.readObject() for a BitSet, and that's it.  For BloomFilter (the new version?), deserialization is a tight loop of readLong() calls.  Same with serialization, FWIW.  Not that using Java serialization for LTS is necessarily a good idea, but it may be kinder to the disk.
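
One way to check whether a tight readLong() loop really turns into many small reads is to count the requests that reach the underlying stream.  The sketch below is a hypothetical experiment, not Cassandra code: ReadCallCounter and CountingInputStream are made-up names, DataInputStream stands in for whatever stream the bloom filter code actually uses, and an in-memory byte array stands in for the file on disk.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical experiment, not Cassandra code.
final class ReadCallCounter {
    // Counts how many read requests reach the wrapped stream.
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads `longCount` longs in a tight loop and reports how many requests
    // hit the underlying stream. bufferBytes <= 0 means no buffering layer.
    static int underlyingReads(byte[] data, int longCount, int bufferBytes) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream source = bufferBytes > 0 ? new BufferedInputStream(counter, bufferBytes) : counter;
        DataInputStream in = new DataInputStream(source);
        for (int i = 0; i < longCount; i++) {
            in.readLong();
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        int longCount = 10_000;
        byte[] data = new byte[longCount * Long.BYTES];
        System.out.println("unbuffered: " + underlyingReads(data, longCount, 0));
        System.out.println("64K buffer: " + underlyingReads(data, longCount, 64 * 1024));
    }
}
```

Without a buffering layer, every readLong() has to go to the underlying stream at least once; with a BufferedInputStream in between, the same loop should reach it only a handful of times.  If the real stream is already buffered, the tight loop may not be the bottleneck.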
                
> Batch reading BloomFilters on startup
> -------------------------------------
>
>                 Key: CASSANDRA-4023
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4023
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Joaquin Casares
>              Labels: datastax_qa
>
> Startup takes about 4x longer on a 1.0.7 cluster than on a 0.8.7 cluster with the same amount of data.
> It seems as though 1.0.7 loads the BloomFilter by reading a series of longs in a multithreaded process, while 0.8.7 reads the entire object.
> Perhaps we should update the new BloomFilter to read in batches as well?
