Posted to commits@cassandra.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2009/05/05 19:10:30 UTC
[jira] Created: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Support flush based on timer interval, in addition to size
----------------------------------------------------------
Key: CASSANDRA-134
URL: https://issues.apache.org/jira/browse/CASSANDRA-134
Project: Cassandra
Issue Type: Improvement
Reporter: Jun Rao
Assignee: Jun Rao
Today, column families (CFs) are flushed based purely on the size of the data accumulated in their Memtables. If a table has multiple CFs and some of them are updated much more slowly than others, a large number of commit log files can be prevented from being deleted. This is because a CF's bit in the log header is turned off only when that CF is flushed, and a log can't be deleted until all CF bits in its header are cleared. One solution is to add a background flusher that periodically force-flushes every CF.
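A minimal Java sketch of the constraint described above, modeling a commit log segment header as one dirty bit per CF. The class and method names are hypothetical and are not Cassandra's actual commit log header layout.

```java
import java.util.BitSet;

/**
 * Hypothetical model of a commit log segment header: one "dirty" bit per
 * column family (CF). Illustrative only; not Cassandra's real header class.
 */
class CommitLogSegmentSketch {
    private final BitSet dirty;

    CommitLogSegmentSketch(int columnFamilyCount) {
        this.dirty = new BitSet(columnFamilyCount);
    }

    /** A write to CF cfId was appended to this segment, so mark that CF dirty. */
    void recordWrite(int cfId) {
        dirty.set(cfId);
    }

    /** Flushing CF cfId makes its entries in this segment redundant. */
    void onFlush(int cfId) {
        dirty.clear(cfId);
    }

    /** The segment may be deleted only once every CF bit has been cleared. */
    boolean isDeletable() {
        return dirty.isEmpty();
    }
}
```

With two CFs where only the busy one is ever flushed (by size), the segment stays pinned by the slow CF's bit; a periodic forced flush of every CF is what eventually clears it.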
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706894#action_12706894 ]
Hudson commented on CASSANDRA-134:
----------------------------------
Integrated in Cassandra #63 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/63/])
> Support flush based on timer interval, in addition to size
> ----------------------------------------------------------
>
> Key: CASSANDRA-134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-134
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jun Rao
> Assignee: Jun Rao
> Attachments: 134-v2.patch, issue134.patchv1, issue134.patchv2
[jira] Resolved: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis resolved CASSANDRA-134.
--------------------------------------
Resolution: Fixed
committed
[jira] Updated: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-134:
-------------------------------------
Attachment: 134-v2.patch
Make forced flush a no-op when there is nothing to flush. This also allows cleaning up the flush logic a bit. (If nothing is being flushed, then ipso facto there will not be any commit log segments we can reclaim.)
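A minimal Java sketch of a forced flush that returns early when the memtable is empty, assuming a simple in-memory sorted map stands in for the Memtable. ColumnFamilyStoreSketch and its methods are hypothetical and are not the code in 134-v2.patch.

```java
import java.util.TreeMap;

/**
 * Hypothetical column family store whose forced flush is a no-op when the
 * memtable holds no data. Names are illustrative assumptions.
 */
class ColumnFamilyStoreSketch {
    private TreeMap<String, String> memtable = new TreeMap<>();
    private int flushCount = 0;

    synchronized void put(String key, String value) {
        memtable.put(key, value);
    }

    /**
     * Flushes the memtable if it holds data and returns true if a flush happened.
     * An empty memtable cannot be pinning any commit log segment, so there is
     * nothing to reclaim and the call returns immediately.
     */
    synchronized boolean forceFlush() {
        if (memtable.isEmpty()) {
            return false;
        }
        // A real store would write an SSTable here and clear this CF's dirty
        // bit in the commit log headers; the sketch only swaps in a new memtable.
        memtable = new TreeMap<>();
        flushCount++;
        return true;
    }

    synchronized int flushCount() {
        return flushCount;
    }
}
```

Because an idle CF's forced flush costs nothing, a periodic flusher can call it on every CF without special-casing empty ones.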
[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706210#action_12706210 ]
Jonathan Ellis commented on CASSANDRA-134:
------------------------------------------
oops, should have named that -v3.
[jira] Updated: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao updated CASSANDRA-134:
------------------------------
Attachment: issue134.patchv2
Uploaded patch v2:
1. Make flushPeriod configurable per CF.
2. Update storage-conf.xml with the new configuration parameter.
Currently, flushPeriod applies only to application CFs, not system CFs. Among the system CFs, we probably want to flush only hints_, and the hinted handoff logic already flushes that CF periodically.
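A minimal Java sketch of per-CF periodic flushing, assuming each application CF has its own flush period in minutes and that a period of zero or less disables periodic flushing for that CF. PerCfFlushScheduler, the CF names in the usage, and the forceFlush callback are hypothetical; this is not the PeriodicFlushManager from the patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical scheduler giving each CF its own fixed-rate flush task. */
class PerCfFlushScheduler {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();
    private final Map<String, ScheduledFuture<?>> tasks = new HashMap<>();

    /** Schedules periodic flushes for every CF whose period is positive. */
    synchronized void start(Map<String, Long> flushPeriodMinutes, Consumer<String> forceFlush) {
        for (Map.Entry<String, Long> entry : flushPeriodMinutes.entrySet()) {
            final String cf = entry.getKey();
            final long period = entry.getValue();
            if (period <= 0) {
                continue; // periodic flushing disabled for this CF
            }
            Runnable task = () -> {
                try {
                    forceFlush.accept(cf);
                } catch (RuntimeException e) {
                    // An exception escaping a fixed-rate task cancels all later
                    // runs, so report it and keep the schedule alive.
                    System.err.println("periodic flush of " + cf + " failed: " + e);
                }
            };
            tasks.put(cf, executor.scheduleAtFixedRate(task, period, period, TimeUnit.MINUTES));
        }
    }

    /** Names of the CFs that currently have a periodic flush scheduled. */
    synchronized Set<String> scheduledColumnFamilies() {
        return new HashSet<>(tasks.keySet());
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Giving only slowly updated CFs a period lets busy CFs, which already flush on size, stay unscheduled.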
> Support flush based on timer interval, in addition to size
> ----------------------------------------------------------
>
> Key: CASSANDRA-134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-134
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jun Rao
> Assignee: Jun Rao
> Attachments: issue134.patchv1, issue134.patchv2
[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707037#action_12707037 ]
Hudson commented on CASSANDRA-134:
----------------------------------
Integrated in Cassandra #67 (See [http://hudson.zones.apache.org/hudson/job/Cassandra/67/])
[jira] Updated: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao updated CASSANDRA-134:
------------------------------
Attachment: issue134.patchv1
Attached a patch that schedules a background flusher to run periodically if a configuration parameter is set.
> Support flush based on timer interval, in addition to size
> ----------------------------------------------------------
>
> Key: CASSANDRA-134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-134
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jun Rao
> Assignee: Jun Rao
> Attachments: issue134.patchv1
[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706241#action_12706241 ]
Jun Rao commented on CASSANDRA-134:
-----------------------------------
The new patch looks good to me. I didn't see PeriodicFlushManager.java in this patch, so I assume it is unchanged from v2.
[jira] Commented: (CASSANDRA-134) Support flush based on timer interval, in addition to size
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12706127#action_12706127 ]
Jonathan Ellis commented on CASSANDRA-134:
------------------------------------------
Patch looks OK, but a setting low enough to be useful on almost-unused CFs will cause unnecessary flushes on high-traffic ones.
If I were writing it, I would record the last flush time in each CFS and then, every so often (more frequently than FlushPeriod), scan all CFSs and schedule a flush only for those that have not been flushed during the last FlushPeriod minutes.
I could also go for the PeriodicFlushManager approach (always flush every FlushPeriod, even if the CF was recently flushed) if FlushPeriod were per-CF.
Either way, this is a good thing to have turned on by default, so let's include it in the sample config.
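A minimal Java sketch of the last-flush-time approach described in this comment, assuming every flush (size-triggered or forced) calls recordFlush and that the checker is polled more often than the flush period. StaleFlushChecker and its methods are hypothetical; the clock is injectable so the logic can be tested without waiting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical checker: each CF records when it was last flushed, and a scan
 * selects only CFs whose last flush is at least one flush period old.
 */
class StaleFlushChecker {
    private final long flushPeriodMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private final Map<String, Long> lastFlushMillis = new LinkedHashMap<>();

    StaleFlushChecker(long flushPeriodMillis, LongSupplier clock) {
        if (flushPeriodMillis <= 0) {
            throw new IllegalArgumentException("flush period must be positive");
        }
        this.flushPeriodMillis = flushPeriodMillis;
        this.clock = clock;
    }

    /** Called whenever a CF is flushed for any reason (also used to register a CF). */
    synchronized void recordFlush(String cf) {
        lastFlushMillis.put(cf, clock.getAsLong());
    }

    /** Returns the CFs that have not been flushed within the last period. */
    synchronized List<String> columnFamiliesToFlush() {
        long now = clock.getAsLong();
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastFlushMillis.entrySet()) {
            if (now - e.getValue() >= flushPeriodMillis) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }
}
```

A background task (for example one scheduled with ScheduledExecutorService.scheduleWithFixedDelay) would call columnFamiliesToFlush() and force-flush only the returned CFs, so a busy CF that recently flushed on size is skipped.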
> Support flush based on timer interval, in addition to size
> ----------------------------------------------------------
>
> Key: CASSANDRA-134
> URL: https://issues.apache.org/jira/browse/CASSANDRA-134
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jun Rao
> Assignee: Jun Rao
> Attachments: issue134.patchv1