Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2014/05/01 15:58:18 UTC
[jira] [Comment Edited] (CASSANDRA-6572) Workload recording / playback

    [ https://issues.apache.org/jira/browse/CASSANDRA-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986591#comment-13986591 ] 

Benedict edited comment on CASSANDRA-6572 at 5/1/14 1:57 PM:
-------------------------------------------------------------

Some comments on the in-progress patch:

* Don't create a string with the header and convert it to bytes - convert the string to bytes and write a normal byte-encoded header with timestamp + length as longs. This will also make encoding the prepared statement parameters much easier
* Encapsulate queryQue and logPosition into a single object, and use an AtomicInteger for the position - don't synchronise, just bump the position by however much you need, then write to the owned range. On flush, swap the object (use an AtomicReference to track the current buffer)
* On flush, append directly from the byte buffer, don't copy it. Create a FileOutputStream and call its appropriate write method with the range that is in use
* On the read path, you're now eagerly reading _all_ files, which is likely to blow up the heap; at least create an Iterator that reads only one whole file at a time (preferably read a chunk of a file at a time, with a BufferedInputStream)
* For replay timing, we want each query to run at the same delta from epoch as when it was recorded, not at the same delta from the prior query - this should help prevent massive timing drift
* Query frequency can be an int rather than an Integer to avoid unboxing
* I think it would be nice if we checked the actual CFMetaData for the keyspaces we're modifying in the CQLStatement, rather than doing a find within the whole string, but it's not too big a deal
* atomicCounterLock needs to be removed
* As a general rule, never copy array contents with a loop - always use System.arraycopy
* Still need to log the thread + session id as Jonathan mentioned
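
To make the header, buffer and flush points above concrete, here is a rough sketch rather than a proposed implementation. All names (QueryLogSketch, Segment, append, flush) are hypothetical and not taken from the patch. It assumes a byte[]-backed segment and a record layout of an 8-byte timestamp, an 8-byte length, then the UTF-8 query bytes. It bumps the position with a compare-and-set loop instead of an unconditional add, so the position never runs past the bytes actually reserved; that is a choice of the sketch, not something the patch is known to do.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical names throughout; this is not the patch's actual code.
final class QueryLogSketch {
    // One buffer plus its write position, swapped as a single unit on flush.
    static final class Segment {
        final byte[] data;
        final AtomicInteger position = new AtomicInteger();
        Segment(int capacity) { data = new byte[capacity]; }
    }

    // Assumed header layout: timestamp and length, both written as longs.
    static final int HEADER_BYTES = 2 * Long.BYTES;

    private final int capacity;
    private final AtomicReference<Segment> current;

    QueryLogSketch(int capacity) {
        this.capacity = capacity;
        this.current = new AtomicReference<>(new Segment(capacity));
    }

    /** Returns false when the record does not fit; the caller should flush and retry. */
    boolean append(long timestamp, String query) {
        byte[] body = query.getBytes(StandardCharsets.UTF_8);
        int size = HEADER_BYTES + body.length;
        Segment seg = current.get();
        int start;
        do {
            start = seg.position.get();
            if (size > seg.data.length - start) return false;
            // Reserve [start, start + size) without taking a lock.
        } while (!seg.position.compareAndSet(start, start + size));
        // Only this thread owns the reserved range, so the write needs no synchronisation.
        ByteBuffer.wrap(seg.data, start, size)
                  .putLong(timestamp)
                  .putLong(body.length)
                  .put(body);
        return true;
    }

    /**
     * Swaps in a fresh segment and writes the used range of the old one directly,
     * without copying it first. Pass e.g. new FileOutputStream(file, true) to append.
     * Not handled here: writers that reserved a range in the old segment but have
     * not finished writing it when the swap happens.
     */
    int flush(OutputStream out) throws IOException {
        Segment old = current.getAndSet(new Segment(capacity));
        int used = old.position.get();
        out.write(old.data, 0, used);
        return used;
    }
}
```

A real version would also need some way to wait for in-flight writers before flushing the swapped-out segment; the sketch leaves that open.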

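
For the read-path point above, a lazy Iterator over a BufferedInputStream could look roughly like this. The class names (RecordedQuery, QueryLogReader) are hypothetical, and the record layout (8-byte timestamp, 8-byte length, UTF-8 query bytes) is an assumption made for illustration. Only one record is held in memory at a time.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical record type: one logged query and the time it was recorded.
final class RecordedQuery {
    final long timestamp;
    final String query;
    RecordedQuery(long timestamp, String query) {
        this.timestamp = timestamp;
        this.query = query;
    }
}

// Hypothetical reader: decodes records on demand instead of loading whole files.
final class QueryLogReader implements Iterator<RecordedQuery> {
    private final DataInputStream in;
    private RecordedQuery next; // record read ahead by hasNext(), if any

    QueryLogReader(InputStream raw) {
        // BufferedInputStream pulls the file in chunks rather than all at once.
        this.in = new DataInputStream(new BufferedInputStream(raw));
    }

    @Override
    public boolean hasNext() {
        if (next != null) return true;
        try {
            long timestamp;
            try {
                timestamp = in.readLong();
            } catch (EOFException endOfLog) {
                return false; // no further complete header: treat as end of log
            }
            long length = in.readLong();
            byte[] body = new byte[Math.toIntExact(length)];
            in.readFully(body);
            next = new RecordedQuery(timestamp, new String(body, StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public RecordedQuery next() {
        if (!hasNext()) throw new NoSuchElementException();
        RecordedQuery result = next;
        next = null;
        return result;
    }
}
```

Chaining one such reader per log file would keep heap use bounded by the largest single record rather than by the total size of the logs.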

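
The replay-timing point above can be illustrated with a small calculation. This sketch reads "delta from epoch" as the offset from the first recorded query, which is an interpretation; ReplayClock and its fields are hypothetical names. Each query's target time is derived from the start of the replay, so a late query does not push back every query after it.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: schedules each query by its offset from the start of the recording.
final class ReplayClock {
    private final long firstRecordedMillis; // timestamp of the first recorded query
    private final long replayStartNanos;    // System.nanoTime() taken when replay began

    ReplayClock(long firstRecordedMillis, long replayStartNanos) {
        this.firstRecordedMillis = firstRecordedMillis;
        this.replayStartNanos = replayStartNanos;
    }

    /** Nanoseconds still to wait before running a query recorded at recordedMillis. */
    long delayNanos(long recordedMillis, long nowNanos) {
        // Where the query sat relative to the start of the recording.
        long targetOffset = TimeUnit.MILLISECONDS.toNanos(recordedMillis - firstRecordedMillis);
        // How far the replay has progressed so far.
        long elapsed = nowNanos - replayStartNanos;
        // If we are already behind schedule, run immediately instead of waiting.
        return Math.max(0L, targetOffset - elapsed);
    }
}
```

In a replay loop, nowNanos would be System.nanoTime() and the result could be passed to something like LockSupport.parkNanos. Because the delay is computed against the replay start, a query that runs late shortens the wait for the next one rather than shifting the whole schedule.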


was (Author: benedict):
Some comments on the in progress patch:

* Don't create a string with the header and convert it to bytes - convert the string to bytes and write a normal byte-encoded header with timestamp + length as a long. This will make encoding the prepared statement parameters much easier also
* Encapsulate queryQue and logPosition into a single object, and use an atomicinteger for the position - don't synchronise, just bump the position however much you need, then write to the owned range. On flush swap the object (use an AtomicReference to track the current buffer)
* On flush, append directly from the byte buffer, don't copy it. Create a FileOutputStream and call its appropriate write method with the range that is in use
* On the read path, you're now eagerly reading _all_ files which is likely to blow up the heap; at least create an Iterator that only reads a whole file at once (preferably read a chunk of a file at a time, with a BufferedInputStream)
* On replay timing we want to target hitting the same delta from epoch for running the query, not the delta from the prior query - this should help prevent massive timing drifts
* Query frequency can be an int rather than an Integer to avoid unboxing
* I think it would be nice if we checked the actual CFMetaData for the keyspaces we're modifying in the CQLStatement, rather than doing a find within the whole string, but it's not too big a deal
* atomicCounterLock needs to be removed
* As a general rule, never copy array contents with a loop - always use System.arraycopy
* Still need to log the thread + query id as Jonathan mentioned



> Workload recording / playback
> -----------------------------
>
>                 Key: CASSANDRA-6572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6572
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Tools
>            Reporter: Jonathan Ellis
>            Assignee: Lyuben Todorov
>             Fix For: 2.1.1
>
>         Attachments: 6572-trunk.diff
>
>
> "Write sample mode" gets us part way to testing new versions against a real world workload, but we need an easy way to test the query side as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)