Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2013/11/15 23:27:21 UTC

[jira] [Commented] (CASSANDRA-6356) Proposal: Statistics.db (SSTableMetadata) format change

    [ https://issues.apache.org/jira/browse/CASSANDRA-6356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824176#comment-13824176 ] 

Yuki Morishita commented on CASSANDRA-6356:
-------------------------------------------

I pushed a proposal to: https://github.com/yukim/cassandra/tree/6356-v1

The new format will have sections separated by type and size (https://github.com/yukim/cassandra/blob/6356-v1/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java).
Initially, I created three metadata sections (or components): Validation, Compaction, and Stats.

* ValidationMetadata: properties used only to validate an SSTable before opening it (partitioner name and bloom filter FP chance).
* CompactionMetadata: properties meant to be accessed only during compaction (ancestors).
* StatsMetadata: everything else, which is kept in memory.

Note that CompactionMetadata is loaded for SSTables that are being compacted. The tombstone drop time histogram and SSTable level are frequently used to determine compaction candidates, so those are kept in memory as part of StatsMetadata.
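
To illustrate the general idea of sections separated by type and size, here is a minimal sketch in Java. It is not the code in MetadataSerializer.java: the class name SectionedMetadataSketch, the Type enum, the write/read helpers, and the exact byte layout (section count, then type, size, and payload per section) are all assumptions made for illustration. It shows how a reader could pull out one component, such as the compaction section, and skip the rest by size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; names and layout are assumptions, not Cassandra's format.
class SectionedMetadataSketch {
    // Section types named after the three components in the proposal.
    enum Type { VALIDATION, COMPACTION, STATS }

    // Assumed layout: section count (int), then for each section:
    // type ordinal (int), payload size (int), payload bytes.
    static byte[] write(Map<Type, byte[]> sections) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(sections.size());
            for (Map.Entry<Type, byte[]> e : sections.entrySet()) {
                out.writeInt(e.getKey().ordinal());
                out.writeInt(e.getValue().length);
                out.write(e.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the payload of the wanted section, or null if it is absent.
    // Other sections are skipped using their recorded size, never decoded.
    static byte[] read(byte[] file, Type wanted) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int type = in.readInt();
                int size = in.readInt();
                if (type == wanted.ordinal()) {
                    byte[] payload = new byte[size];
                    in.readFully(payload);
                    return payload;
                }
                // skipBytes may skip fewer bytes than requested, so loop.
                int skipped = 0;
                while (skipped < size) {
                    int n = in.skipBytes(size - skipped);
                    if (n <= 0) throw new EOFException("truncated section");
                    skipped += n;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Type, byte[]> sections = new EnumMap<>(Type.class);
        sections.put(Type.VALIDATION, new byte[] {1, 2});
        sections.put(Type.COMPACTION, new byte[] {3, 4, 5});
        byte[] file = write(sections);
        // Load only the compaction section; the validation bytes are skipped.
        byte[] compaction = read(file, Type.COMPACTION);
        System.out.println("compaction section bytes: " + compaction.length);
    }
}
```

With a layout along these lines, a new kind of metadata could be added as another section type without changing how the existing sections are read.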


> Proposal: Statistics.db (SSTableMetadata) format change
> -------------------------------------------------------
>
>                 Key: CASSANDRA-6356
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6356
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 2.1
>
>
> We started to distinguish between what is loaded onto the heap from Statistics.db and what is not. For now, ancestors are loaded as they are needed.
> The current serialization format is so ad hoc that adding new metadata that is not permanently held in memory is somewhat difficult and messy. I propose changing the serialization format so that a group of stats can be loaded as needed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)