Posted to commits@cassandra.apache.org by "Stu Hood (Updated) (JIRA)" <ji...@apache.org> on 2012/04/15 20:25:17 UTC

[jira] [Updated] (CASSANDRA-3943) Too many small size sstables after loading data using sstableloader or BulkOutputFormat increases compaction time.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-3943:
--------------------------------

    Assignee:     (was: Stu Hood)

I talked to Peter after my initial comment, and he's right: we'd actually prefer to dump in many small sstables _if-and-only-if_ they can be correctly incorporated into the interval tree such that they don't trigger a ton of compaction. This would allow the bulk insert to warm up incrementally as it is loaded, rather than all at once, as it would with one large sstable.

But unfortunately, work commitments mean I won't be able to start investigating this anytime soon, so I'm un-assigning it.
                
> Too many small size sstables after loading data using sstableloader or BulkOutputFormat increases compaction time.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3943
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3943
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Hadoop, Tools
>    Affects Versions: 0.8.2, 1.1.0
>            Reporter: Samarth Gahire
>            Priority: Minor
>              Labels: bulkloader, hadoop, sstableloader, streaming, tools
>             Fix For: 1.2
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When we create sstables using SimpleUnsortedWriter or BulkOutputFormat, the size of each sstable created is around the buffer size provided.
> But after loading, the sstables created on the cluster nodes are of size around
> {code}( (sstable_size_before_loading) * replication_factor ) / No_Of_Nodes_In_Cluster{code}
> As the number of nodes in the cluster increases, the size of each sstable loaded onto a Cassandra node decreases. Such small sstables take much longer to compact (minor compaction) than relatively large sstables.
> One solution that we have tried is to increase the buffer size while generating sstables. But as we increase the buffer size, the time taken to generate sstables increases. Is there any solution to this in existing versions, or are you fixing this in a future version?
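
To make the size estimate in the description concrete, here is a rough sketch of the arithmetic in Python. The function name, the 64 MB buffer size, and the replication factor of 3 are illustrative assumptions, not values from the report, and the estimate presumes data is spread evenly across the nodes.

```python
def loaded_sstable_size_mb(size_before_loading_mb, replication_factor, node_count):
    """Estimate the per-node sstable size after loading, using the formula
    quoted in the issue: (size_before_loading * replication_factor) / nodes.

    Hypothetical helper for illustration only; assumes an even token
    distribution across the cluster.
    """
    if node_count <= 0 or replication_factor <= 0:
        raise ValueError("node_count and replication_factor must be positive")
    return size_before_loading_mb * replication_factor / node_count


# Assumed inputs: a 64 MB writer buffer and replication factor 3.
for nodes in (3, 6, 12, 48):
    size = loaded_sstable_size_mb(64, 3, nodes)
    print(f"{nodes:>2} nodes -> ~{size:.1f} MB per loaded sstable")
```

Under these assumed inputs the per-node sstable shrinks in inverse proportion to cluster size, which is the effect the reporter describes.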
