Posted to commits@cassandra.apache.org by "Russell Alexander Spitzer (JIRA)" <ji...@apache.org> on 2014/07/29 00:33:39 UTC

[jira] [Comment Edited] (CASSANDRA-7631) Allow Stress to write directly to SSTables

    [ https://issues.apache.org/jira/browse/CASSANDRA-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077020#comment-14077020 ] 

Russell Alexander Spitzer edited comment on CASSANDRA-7631 at 7/28/14 10:32 PM:
--------------------------------------------------------------------------------

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java wraps SSTableSimpleUnsortedWriter, so I think we are OK there. The main reason I would like this as part of stress is that we already have all the data generation code written for arbitrary schemas. Thanks, [~tjake]! This way we could much more quickly prepare for a test that writes a large amount of data and then runs a mixed workload.


was (Author: rspitzer):
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java wraps SSTableSimpleUnsorted Writer so I think we are ok there. The main reason I would like this as part of stress is that we already have all the data generation code backed in for arbitrary schemas, Thanks [~tjake]! This way we could prepare for a test that uses a large amount of data and a mixed workload much faster. 

> Allow Stress to write directly to SSTables
> ------------------------------------------
>
>                 Key: CASSANDRA-7631
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7631
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Russell Alexander Spitzer
>            Assignee: Russell Alexander Spitzer
>
> One common difficulty with benchmarking machines is the amount of time it takes to initially load data. For machines with a large amount of RAM this becomes especially onerous, because a very large amount of data needs to be placed on the machine before the page cache can be circumvented.
> To remedy this, I suggest we add a top-level flag to cassandra-stress that would cause the tool to write directly to SSTables rather than performing CQL inserts. Internally, this would use CQLSSTableWriter to write directly to SSTables while skipping any keys that are not owned by the node stress is running on. The same stress command run on each node in the cluster would then write unique SSTables containing only the data that node is responsible for. After this, no further network I/O would be required to distribute data, as it would all already be in place.
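
For illustration only, here is a rough sketch of the "skip keys not owned by this node" step described above. It is not stress or Cassandra code: the function names (token_for, build_ring, owner_of, keys_for_node) and the three-node ring are hypothetical, MD5 stands in for the Murmur3 hash used by the default partitioner, and only the primary replica is considered (replication is ignored). The idea is that every node generates the same key stream and keeps only the keys whose token falls in its own range.

```python
import bisect
import hashlib


def token_for(key):
    """Map a key (bytes) to a signed 64-bit token.

    Stand-in hash: MD5 is used only to stay inside the standard library;
    it is NOT the hash Cassandra's default partitioner uses.
    """
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_ring(tokens_by_node):
    """Turn {node_name: token} into a list of (token, node) sorted by token."""
    return sorted((token, node) for node, token in tokens_by_node.items())


def owner_of(token, ring):
    """Return the node owning `token`.

    A node owns the range (previous_token, its_token]; tokens above the
    highest ring token wrap around to the first node.
    """
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)  # first ring token >= token
    return ring[idx % len(ring)][1]


def keys_for_node(keys, ring, local_node):
    """Yield only the keys whose primary owner is `local_node`."""
    for key in keys:
        if owner_of(token_for(key), ring) == local_node:
            yield key


# Hypothetical three-node ring; every node would run this with its own name.
ring = build_ring({
    "n1": -6000000000000000000,
    "n2": 0,
    "n3": 6000000000000000000,
})
keys = [b"key%d" % i for i in range(1000)]
for node in ("n1", "n2", "n3"):
    local = list(keys_for_node(keys, ring, node))
    print(node, "would write", len(local), "keys locally")
```

In a real implementation the filtered rows would be handed to CQLSSTableWriter rather than printed, and the ring would come from the cluster's actual token assignments.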



--
This message was sent by Atlassian JIRA
(v6.2#6252)