Posted to commits@cassandra.apache.org by "Yuki Morishita (JIRA)" <ji...@apache.org> on 2012/12/17 22:32:13 UTC

[jira] [Updated] (CASSANDRA-4894) log number of combined/merged rows during a compaction

     [ https://issues.apache.org/jira/browse/CASSANDRA-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuki Morishita updated CASSANDRA-4894:
--------------------------------------

    Attachment: 4894-1.2.txt

Patch attached. It keeps a separate count for each number of rows merged.

To log the counters, I simply append a dump of them to the end of the compaction log line.

{code}
 INFO [CompactionExecutor:1] 2012-12-17 15:22:53,528 CompactionTask.java (line 238) Compacted to [/Users/yuki/.ccm/1.2/node1/data/system/local/system-local-ia-18-Data.db,].  957 to 629 (~65% of original) bytes for 1 keys at 0.017139MB/s.  Time: 35ms.  Merged row stats: [0, 0, 0, 1].
{code}

The 'Merged row stats' part is the new addition. If there is a better format, please let me know.
                
> log number of combined/merged rows during a compaction
> ------------------------------------------------------
>
>                 Key: CASSANDRA-4894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Matthew F. Dennis
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.2.1
>
>         Attachments: 4894-1.2.txt
>
>
> We already log some details about compactions, but it would be useful to know how many rows were merged (resulting in "useful" work) and how many were unique (representing "wasted" work).
> The simple approach requires two additional counters (one for unique rows, one for merged rows).  As the merge join progresses, if two or more rows are combined, tick the merged counter.  If a row is simply copied, tick the unique counter.
> A more complete solution would be to keep a separate count for each number of merges.  This would require number_of_files_being_merged counters.  If no rows were merged, tick counters[0]; if two rows were merged, tick counters[1]; and so on.
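
The per-merge-count scheme in the description above could be sketched roughly as follows. This is only an illustration, not the code in 4894-1.2.txt: the class name MergedRowStats and its methods are hypothetical, and the index mapping (counters[i] counts output rows built from i + 1 input rows) is assumed from the description's "counters[0] for no merge, counters[1] for two rows" wording.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not taken from the attached patch): keeps one counter
 * per possible number of input rows that were combined into one output row.
 */
final class MergedRowStats {
    // counts[i] = number of output rows built from (i + 1) input rows
    private final long[] counts;

    MergedRowStats(int filesBeingMerged) {
        if (filesBeingMerged < 1) {
            throw new IllegalArgumentException("need at least one input file");
        }
        this.counts = new long[filesBeingMerged];
    }

    /** Record one output row that was produced from sourceRows input rows. */
    void record(int sourceRows) {
        if (sourceRows < 1 || sourceRows > counts.length) {
            throw new IllegalArgumentException("sourceRows out of range: " + sourceRows);
        }
        counts[sourceRows - 1]++;
    }

    /** Rows that were simply copied (the "unique" counter of the simple approach). */
    long uniqueRows() {
        return counts[0];
    }

    /** Rows combined from two or more inputs (the "merged" counter of the simple approach). */
    long mergedRows() {
        long total = 0;
        for (int i = 1; i < counts.length; i++) {
            total += counts[i];
        }
        return total;
    }

    /** Renders the counters as a bracketed list, e.g. for appending to a log line. */
    @Override
    public String toString() {
        return Arrays.toString(counts);
    }
}
```

Under these assumptions, compacting four files where the single key appears in all four would call record(4) once, and toString() would yield the same shape as the sample log line. The two counters of the simple approach fall out as uniqueRows() and mergedRows(), so one array can serve both proposals.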
