Posted to commits@cassandra.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2009/05/05 19:10:30 UTC

[jira] Created: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Support flush based on timer interval, in addition to size
----------------------------------------------------------

                 Key: CASSANDRA-134
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-134
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Jun Rao
            Assignee: Jun Rao


Today, column families (CFs) are flushed purely based on the size of the data accumulated in their Memtable. If a table has multiple CFs and some CFs are updated at a much slower pace than others, this can prevent a large number of commit log files from being deleted. This is because a CF's bit in the commit log header is only turned off when that CF is flushed, and a log can't be deleted until all CF bits in its header are cleared. One solution is to add a background flusher that periodically force-flushes every CF.
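To make the retention problem concrete, here is a minimal Java sketch of a commit log header that keeps one dirty bit per CF. CommitLogHeaderSketch and its methods are hypothetical illustrations, not Cassandra's actual commit log classes; the sketch only assumes what is described above: a write sets the CF's bit, and a flush of that CF clears it.

```java
import java.util.BitSet;

/**
 * Hypothetical model of a commit log segment header: one "dirty" bit per
 * column family (CF). A bit is set when a write to that CF lands in this
 * segment and is cleared only when that CF is flushed.
 */
public class CommitLogHeaderSketch {
    private final int cfCount;
    private final BitSet dirty;

    public CommitLogHeaderSketch(int cfCount) {
        this.cfCount = cfCount;
        this.dirty = new BitSet(cfCount);
    }

    /** Called when a write to column family cfId is appended to this segment. */
    public void markDirty(int cfId) {
        checkId(cfId);
        dirty.set(cfId);
    }

    /** Called when column family cfId has been flushed to disk. */
    public void markClean(int cfId) {
        checkId(cfId);
        dirty.clear(cfId);
    }

    /** A segment can be deleted only once every CF bit has been cleared. */
    public boolean isDeletable() {
        return dirty.isEmpty();
    }

    private void checkId(int cfId) {
        if (cfId < 0 || cfId >= cfCount) {
            throw new IllegalArgumentException("unknown CF id " + cfId);
        }
    }

    public static void main(String[] args) {
        CommitLogHeaderSketch header = new CommitLogHeaderSketch(2);
        header.markDirty(0); // busy CF
        header.markDirty(1); // rarely-updated CF
        header.markClean(0); // busy CF reaches its size threshold and flushes
        // The rarely-updated CF has not reached its threshold, so its bit is
        // still set and the whole segment must be kept.
        System.out.println(header.isDeletable()); // prints false
        header.markClean(1); // e.g. forced by a periodic flush
        System.out.println(header.isDeletable()); // prints true
    }
}
```

Size-triggered flushes of the busy CF alone never free the segment; only flushing the quiet CF does, which is what a timer-based forced flush would supply.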

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706894#action_12706894 ] 

Hudson commented on CASSANDRA-134:
----------------------------------

Integrated in Cassandra #63 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/63/])
    



[jira] Resolved: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-134.
--------------------------------------

    Resolution: Fixed

committed



[jira] Updated: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-134:
-------------------------------------

    Attachment: 134-v2.patch

Make forced flush a no-op when there is nothing to flush. This allows cleaning up the flush logic a bit too. (If there is nothing being flushed, then ipso facto there will not be any commit log (clog) segments we can reclaim.)
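A minimal sketch of the no-op forced flush, assuming a toy in-memory memtable. ForcedFlushSketch, apply, forceFlush and flushCount are hypothetical names, not the code in 134-v2.patch; the point is only the early return when the memtable is empty.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified column family store: a forced flush does nothing
 * when there is nothing buffered in the memtable.
 */
public class ForcedFlushSketch {
    private final List<String> memtable = new ArrayList<>();
    private int flushCount = 0;

    public synchronized void apply(String mutation) {
        memtable.add(mutation);
    }

    /**
     * Returns true if a flush was actually performed. With an empty memtable,
     * flushing could not let any commit log segment be reclaimed, so skip it.
     */
    public synchronized boolean forceFlush() {
        if (memtable.isEmpty()) {
            return false; // no-op: nothing to flush
        }
        // A real store would write the data to disk and then clear this CF's
        // bits in the commit log headers; here we just count and reset.
        flushCount++;
        memtable.clear();
        return true;
    }

    public synchronized int flushCount() {
        return flushCount;
    }

    public static void main(String[] args) {
        ForcedFlushSketch cf = new ForcedFlushSketch();
        System.out.println(cf.forceFlush()); // prints false: nothing to flush
        cf.apply("row1");
        System.out.println(cf.forceFlush()); // prints true
        System.out.println(cf.flushCount()); // prints 1
    }
}
```

Making the empty case a no-op also means a periodic flusher can call forceFlush on every CF without producing empty flushes for idle ones.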



[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706210#action_12706210 ] 

Jonathan Ellis commented on CASSANDRA-134:
------------------------------------------

Oops, I should have named that -v3 (it follows issue134.patchv1 and issue134.patchv2).



[jira] Updated: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated CASSANDRA-134:
------------------------------

    Attachment: issue134.patchv2

Uploading patch v2:
1. Make flushPeriod configurable per CF.
2. Update storage-conf.xml with the new configuration parameter.

Currently, flushPeriod only applies to application CFs, not system CFs. Among the system CFs, we probably only want to flush hints_, and the hinted handoff logic already flushes that CF periodically.
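Per-CF flush periods can be sketched with one fixed-rate task per CF on a java.util.concurrent ScheduledExecutorService. PerCfFlushScheduler and register are hypothetical names (this is not the patch's PeriodicFlushManager), and treating a non-positive period as "disabled" is an assumption; system CFs such as hints_ would simply never be registered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical per-CF periodic flusher: every registered CF gets its own
 * fixed-rate task, so each CF can use a different flush period.
 */
public class PerCfFlushScheduler {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "periodic-flusher");
                t.setDaemon(true); // don't keep the JVM alive just for flushing
                return t;
            });
    private final List<ScheduledFuture<?>> tasks = new ArrayList<>();

    /**
     * Runs flushAction every `period` units, starting one period from now.
     * A non-positive period means "no periodic flush for this CF".
     * Returns true if a task was scheduled.
     */
    public synchronized boolean register(Runnable flushAction, long period, TimeUnit unit) {
        if (period <= 0) {
            return false;
        }
        tasks.add(timer.scheduleAtFixedRate(flushAction, period, period, unit));
        return true;
    }

    /** Cancels all periodic flushes and stops the timer thread. */
    public synchronized void shutdown() {
        for (ScheduledFuture<?> task : tasks) {
            task.cancel(false);
        }
        timer.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        PerCfFlushScheduler scheduler = new PerCfFlushScheduler();
        CountDownLatch quietFlushes = new CountDownLatch(2);
        // Hypothetical "QuietCF": rarely written, so it gets a short period
        // (milliseconds only to keep the demo fast; real periods would be minutes).
        scheduler.register(() -> {
            System.out.println("force-flush QuietCF");
            quietFlushes.countDown();
        }, 100, TimeUnit.MILLISECONDS);
        // Hypothetical "BusyCF": already flushes often on size, so periodic flushing is off.
        boolean busyScheduled = scheduler.register(
                () -> System.out.println("force-flush BusyCF"), 0, TimeUnit.MINUTES);
        System.out.println("BusyCF periodic flush scheduled: " + busyScheduled);
        quietFlushes.await(5, TimeUnit.SECONDS);
        scheduler.shutdown();
    }
}
```

Giving each CF its own period lets a nearly idle CF be flushed often without imposing the same schedule on busy CFs.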



[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707037#action_12707037 ] 

Hudson commented on CASSANDRA-134:
----------------------------------

Integrated in Cassandra #67 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/67/])
    



[jira] Updated: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao updated CASSANDRA-134:
------------------------------

    Attachment: issue134.patchv1

Attach a patch that schedules a background flusher that runs periodically, if a configuration parameter is set.
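The v1 idea, one global period that force-flushes every CF when the setting is enabled, might look roughly like this. GlobalPeriodicFlusher, FlushableCf and the minutes-based period argument are hypothetical stand-ins for the real column family store and configuration parameter.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a single background timer that force-flushes every CF. */
public class GlobalPeriodicFlusher {
    /** Stand-in for a column family store; only the flush hook matters here. */
    public interface FlushableCf {
        void forceFlush();
    }

    private final List<FlushableCf> cfs = new CopyOnWriteArrayList<>();
    private ScheduledExecutorService timer; // null until started

    public void add(FlushableCf cf) {
        cfs.add(cf);
    }

    /** One pass of the background flusher: force-flush every registered CF. */
    public void flushAll() {
        for (FlushableCf cf : cfs) {
            cf.forceFlush();
        }
    }

    /**
     * Starts the flusher only if periodMinutes > 0 (the "configuration
     * parameter is set" case). Returns true if a new timer was started.
     */
    public synchronized boolean start(long periodMinutes) {
        if (periodMinutes <= 0 || timer != null) {
            return false;
        }
        timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "background-flusher");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(this::flushAll, periodMinutes, periodMinutes, TimeUnit.MINUTES);
        return true;
    }

    public synchronized void stop() {
        if (timer != null) {
            timer.shutdownNow();
            timer = null;
        }
    }

    public static void main(String[] args) {
        GlobalPeriodicFlusher flusher = new GlobalPeriodicFlusher();
        int[] flushed = new int[1];
        flusher.add(() -> flushed[0]++);
        flusher.add(() -> flushed[0]++);
        flusher.flushAll(); // what each timer tick does
        System.out.println(flushed[0]); // prints 2
        System.out.println(flusher.start(0)); // prints false: periodic flush disabled
        flusher.stop();
    }
}
```

Because one timer drives every CF, a period short enough for a nearly idle CF also forces extra flushes on the busy ones.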



[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706241#action_12706241 ] 

Jun Rao commented on CASSANDRA-134:
-----------------------------------

The new patch looks good to me. I didn't see PeriodicFlushManager.java in this patch and assume that it is the same as v2.




[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706127#action_12706127 ] 

Jonathan Ellis commented on CASSANDRA-134:
------------------------------------------

Patch looks OK, but a single setting that is low enough to be useful on almost-unused CFs will cause unnecessary flushes on high-traffic ones.

If I were writing it, I would record the last flush time in each CFS (ColumnFamilyStore) and then, every so often (more frequently than FlushPeriod), scan all CFSs and schedule a flush only for those that have not been flushed during the last FlushPeriod minutes.

I could also go for the PeriodicFlushManager approach (always flush every FlushPeriod, even if the CF was recently flushed) if FlushPeriod were per-CF.

Either way, this is a good thing to have turned on by default, so let's include it in the sample config.
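The record-flush-time-and-scan alternative described above can be sketched as follows. StaleFlushScanner, its methods and the injectable millisecond clock are hypothetical; scan() only reports which CFs are due, and the caller is assumed to schedule the real flush and then call recordFlush, just as size-triggered flushes would.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch: flush only CFs that have gone a full period without a flush. */
public class StaleFlushScanner {
    private final long flushPeriodMillis;
    private final LongSupplier clock; // injectable so the logic is testable
    private final Map<String, Long> lastFlushMillis = new ConcurrentHashMap<>();

    public StaleFlushScanner(long flushPeriodMillis, LongSupplier clock) {
        this.flushPeriodMillis = flushPeriodMillis;
        this.clock = clock;
    }

    /** Registers a CF, treating registration time as its last flush. */
    public void register(String cf) {
        lastFlushMillis.put(cf, clock.getAsLong());
    }

    /** Records a flush, whether triggered by size or by this scanner. */
    public void recordFlush(String cf) {
        lastFlushMillis.put(cf, clock.getAsLong());
    }

    /**
     * One scan, meant to run more often than the flush period. Returns the
     * (sorted) names of CFs not flushed within the last flush period.
     */
    public List<String> scan() {
        long now = clock.getAsLong();
        List<String> due = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastFlushMillis.entrySet()) {
            if (now - e.getValue() >= flushPeriodMillis) {
                due.add(e.getKey());
            }
        }
        Collections.sort(due); // deterministic order for callers and tests
        return due;
    }

    public static void main(String[] args) {
        long[] now = {0};
        long minute = 60 * 1000L;
        // Hypothetical FlushPeriod of 60 minutes, driven by a fake clock.
        StaleFlushScanner scanner = new StaleFlushScanner(60 * minute, () -> now[0]);
        scanner.register("BusyCF");
        scanner.register("QuietCF");
        now[0] = 50 * minute;
        scanner.recordFlush("BusyCF"); // size-triggered flush at minute 50
        now[0] = 65 * minute;          // scan at minute 65
        for (String cf : scanner.scan()) { // prints QuietCF only
            System.out.println(cf);
            scanner.recordFlush(cf);   // stand-in for scheduling the real flush
        }
    }
}
```

CFs that keep flushing on size never show up in scan(), so a short FlushPeriod only costs extra flushes on the CFs that actually need them.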
