Posted to commits@cassandra.apache.org by "Marcus Eriksson (JIRA)" <ji...@apache.org> on 2015/03/28 20:38:53 UTC

[jira] [Assigned] (CASSANDRA-9060) Anticompaction hangs on bloom filter bitset serialization

     [ https://issues.apache.org/jira/browse/CASSANDRA-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson reassigned CASSANDRA-9060:
------------------------------------------

    Assignee: Marcus Eriksson

> Anticompaction hangs on bloom filter bitset serialization 
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-9060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9060
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Assignee: Marcus Eriksson
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: trunk-9060.patch
>
>
> I tried running an incremental repair against a 15-node vnode cluster with roughly 500GB of data running on 2.1.3-SNAPSHOT, without performing the suggested migration steps. I manually chose a small range for the repair (using --start/end-token). The actual repair part took almost no time at all, but the anticompactions took a long time (not surprisingly).
> Obviously, this might not be the ideal way to run incremental repairs, but I wanted to look into what made the whole process so slow. The results were rather surprising. The majority of the time was spent serializing bloom filters.
> The reason seemed to be twofold. First, the bloom filters generated were huge (probably because the original SSTables were large). With a proper migration to incremental repairs, I'm guessing this would not happen. Second, however, the bloom filters were being written to the output one byte at a time (with quite a few type conversions on the way) to transform the little-endian in-memory representation to the big-endian on-disk representation.
> I have implemented a solution where big-endian is used in-memory as well as on-disk, which obviously makes serialization and deserialization much, much faster. This introduces some slight overhead when checking the bloom filter, but I can't see how that would be problematic. An obvious alternative would be to keep the little-endian in-memory representation, perform the serialization/deserialization through a byte array, and do the byte-order swap there.
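
To illustrate the difference described above, here is a minimal Java sketch. It is not the code from trunk-9060.patch or from Cassandra itself; the class and method names are hypothetical, and the bitset is assumed to be a plain long[] rather than Cassandra's actual in-memory structure. It contrasts writing each 64-bit word one byte at a time with producing the same big-endian bytes in a single bulk copy through a ByteBuffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only; names and layout are assumptions, not Cassandra's API.
final class BitsetSerializationSketch {

    // Slow path: emit every 64-bit word one byte at a time, most significant byte first.
    static byte[] writeByteAtATime(long[] words) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(words.length * 8);
        for (long word : words) {
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (word >>> shift)); // only the low 8 bits are written
            }
        }
        return out.toByteArray();
    }

    // Bulk path: let a big-endian ByteBuffer lay out all words in one copy.
    static byte[] writeBulk(long[] words) {
        ByteBuffer buffer = ByteBuffer.allocate(words.length * 8).order(ByteOrder.BIG_ENDIAN);
        buffer.asLongBuffer().put(words);
        return buffer.array();
    }

    public static void main(String[] args) {
        long[] words = {0x0102030405060708L, -1L, 0L, 0x8000000000000001L};
        // Both paths should produce identical big-endian bytes.
        System.out.println(Arrays.equals(writeByteAtATime(words), writeBulk(words)));
    }
}
```

Both methods yield the same on-disk byte layout; the bulk version simply avoids the per-byte call overhead, which is the kind of cost the report attributes to the slow anticompactions.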



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)