Posted to commits@cassandra.apache.org by "Robert Coli (JIRA)" <ji...@apache.org> on 2011/01/11 22:26:46 UTC

[jira] Created: (CASSANDRA-1967) commit log replay shouldn't end with a flush

commit log replay shouldn't end with a flush
--------------------------------------------

                 Key: CASSANDRA-1967
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1967
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Robert Coli


(Apologies in advance if there is some very compelling reason to flush after replay, of which I am not currently aware. ;D)

Currently, when a node restarts, the following sequence occurs :

a) commitlog is replayed
b) any memtables resulting from a) are flushed 
c) a new commitlog is opened, new memtables are switched in
... (other stuff happens)
d) node starts taking traffic

This has side effects, perhaps most seriously the potential of triggering compaction. As a node is likely to struggle performance-wise after restarting, triggering compaction at that time seems like something we might wish to avoid.

I propose that the sequence be :

a) commitlog is replayed
b) a new commitlog is opened, new memtables are switched in 
... (other stuff happens)
c) node starts taking traffic

Looking through the relevant code, the only code that appears to depend on this flush is at src/java/org/apache/cassandra/db/commitlog/CommitLog.java:112 :
"
        // all old segments are recovered and deleted before CommitLog is instantiated.
        // All we need to do is create a new one.
        segments.add(new CommitLogSegment());
"

Presumably this code would have to be refactored to be aware of the currently open commitlog.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12980424#action_12980424 ] 

Jonathan Ellis commented on CASSANDRA-1967:
-------------------------------------------

The main reason to flush after replay is that you never have to replay that same data again.

Every once in a while we have someone with excessively large memtable thresholds OOM himself during replay.  I'd actually like to flush after replaying each segment, so that as long as you can finish one segment before OOMing you'll make progress.

The problem isn't flushing per se (if you're 90% full, it's immaterial if you flush now or in two minutes of write load), but rather flushing mostly-empty sstables that still count towards compaction threshold.

Perhaps introducing a "don't bother compacting if one sstable is < X% of the next-smallest" rule would fix this.
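
As a rough illustration of the rule suggested above, the sketch below skips compaction when the smallest candidate sstable is less than a given fraction of the next-smallest one. It is not Cassandra's compaction code: the class TinySSTableRule, the method worthCompacting, and the 10% value in the usage note are all hypothetical, and each sstable is represented only by its size in bytes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not part of Cassandra.
final class TinySSTableRule
{
    // The "X%" from the comment above, expressed as a fraction (0.10 == 10%).
    private final double minRatio;

    TinySSTableRule(double minRatio)
    {
        this.minRatio = minRatio;
    }

    /**
     * Returns false when the smallest candidate is less than minRatio of the
     * next-smallest one, so that a mostly-empty sstable does not trigger compaction.
     */
    boolean worthCompacting(List<Long> sstableSizesInBytes)
    {
        if (sstableSizesInBytes.size() < 2)
            return false; // nothing to merge

        // Sort a copy so the caller's list is left untouched.
        List<Long> sorted = new ArrayList<>(sstableSizesInBytes);
        Collections.sort(sorted);

        long smallest = sorted.get(0);
        long nextSmallest = sorted.get(1);
        return smallest >= minRatio * nextSmallest;
    }
}
```

With new TinySSTableRule(0.10), candidates of 1,000, 500,000, 600,000 and 700,000 bytes would be left alone, since 1,000 is well under 10% of 500,000.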



[jira] [Updated] (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1967:
--------------------------------------

    Priority: Minor  (was: Major)
    

        

[jira] [Commented] (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422628#comment-13422628 ] 

Jonathan Ellis commented on CASSANDRA-1967:
-------------------------------------------

You're barking up the wrong tree by blaming flush.  To the degree that compaction is a problem (and on a properly tuned system it shouldn't be), we can simply extend the five minute delay on autocompaction to these flushes as well.
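
One way to picture extending the startup delay to these flushes is a simple time gate, sketched below. The names StartupCompactionGate and compactionAllowed are hypothetical, and this is not how Cassandra implements its autocompaction delay; clock values are passed in explicitly so the sketch stays easy to test.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not part of Cassandra.
final class StartupCompactionGate
{
    private final long startNanos;
    private final long delayNanos;

    StartupCompactionGate(long startNanos, long delay, TimeUnit unit)
    {
        this.startNanos = startNanos;
        this.delayNanos = unit.toNanos(delay);
    }

    /**
     * True once the startup delay has elapsed. A flush that happens earlier
     * (including the one at the end of replay) would not submit a compaction.
     */
    boolean compactionAllowed(long nowNanos)
    {
        return nowNanos - startNanos >= delayNanos;
    }
}
```

A caller would construct the gate with System.nanoTime() at startup and check compactionAllowed(System.nanoTime()) after each flush.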
                

        

[jira] Commented: (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999522#comment-12999522 ] 

Jonathan Ellis commented on CASSANDRA-1967:
-------------------------------------------

Right.  What we have now is sort of a compromise between "never flush at all" and "flush after each segment is replayed."


        

[jira] [Resolved] (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1967.
---------------------------------------

    Resolution: Not A Problem
    

        

[jira] [Commented] (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Robert Coli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422613#comment-13422613 ] 

Robert Coli commented on CASSANDRA-1967:
----------------------------------------

After making the above update, I noticed Cassandra 1.0.10 flushing after replay. Since this experience clashed with my interpretation of the code, I conjectured that the flush must be deeper in the code paths than in previous versions, and deeper than I read this time. I asked about this in #cassandra.

Per jbellis in #cassandra :

1) Explicit flush at the end of replay is by design.
2) The design goal in this case is to avoid multiple replay of the same log, if node crashes before replayed data is flushed.

I don't find 2) a compelling design goal, and believe it violates the principle of least surprise. 

The purpose of the commitlog is to hold the contents of memtables. In the case of a crash, I expect the commitlog replay process to result in the same memtables that my node contained before it crashed. If it then crashes again, I expect the same memtables to be replayed again. There may be some negative externalities to this repeated replay which are not currently clear to me, but I am relatively confident that being surprised by my memtable state is not one of them.

In my opinion, avoiding compaction as a side effect of restart/replay is, in contrast, a compelling design goal.

Significant production users appear to agree in CASSANDRA-2444 ("[Twitter has] ran into many times where we do not want compaction to run right away against CFs when booting up a node.") But the resolution of CASSANDRA-2444 ("If the node needs to compact, it will do so at the first flush, which is more likely to be staggered across the cluster") does not make sense if commitlog replay always ends with a flush. The logical result of both code paths appears the same : restart has a potential to trigger immediate compaction.

In summary... +1 for re-opening this ticket and making commit log replay not end with a flush.
                

        

[jira] Commented: (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Norman Maurer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999496#comment-12999496 ] 

Norman Maurer commented on CASSANDRA-1967:
------------------------------------------

Maybe related to this: I think if we keep the flush, we should remove each commitlog file (segment) as soon as it has been replayed. At the moment the files are deleted only after all segments have been replayed. So it is possible to have 19 segments replayed and then have the 20th segment throw an exception, in which case no file gets deleted. That would lead to a complete replay of the previous 19 files on the next start.
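
The idea above, combined with the per-segment flush mentioned earlier in this thread, can be sketched as follows. This is only an illustration under stated assumptions, not the actual recovery code: PerSegmentReplay, SegmentReplayer and Flusher are hypothetical names, and the replay and flush steps are left as hooks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not part of Cassandra.
final class PerSegmentReplay
{
    /** Hypothetical hook: applies one segment's mutations to the memtables. */
    interface SegmentReplayer
    {
        void replay(Path segment) throws IOException;
    }

    /** Hypothetical hook: flushes the memtables produced by replay to disk. */
    interface Flusher
    {
        void flush() throws IOException;
    }

    /**
     * Replays segments in order. Each segment is flushed and deleted as soon as
     * it has been replayed, so an exception on a later segment does not force
     * the earlier ones to be replayed again on the next start.
     * Returns the number of segments fully processed.
     */
    static int replayAll(List<Path> segments, SegmentReplayer replayer, Flusher flusher) throws IOException
    {
        int done = 0;
        for (Path segment : segments)
        {
            replayer.replay(segment); // may throw; this segment then stays on disk
            flusher.flush();          // replayed data is now in sstables
            Files.delete(segment);    // the segment is no longer needed
            done++;
        }
        return done;
    }
}
```

If the replayer throws on the 20th of 20 segments, the first 19 files are already gone and only the failing segment remains for the next start.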


        

[jira] [Commented] (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425295#comment-13425295 ] 

Jonathan Ellis commented on CASSANDRA-1967:
-------------------------------------------

Created CASSANDRA-4474 for that approach.
                

        

[jira] [Commented] (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Robert Coli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411102#comment-13411102 ] 

Robert Coli commented on CASSANDRA-1967:
----------------------------------------

Relevant code section now (1.1.1 release) reads :
"
            public boolean accept(File dir, String name)
            {
                // we used to try to avoid instantiating commitlog (thus creating an empty segment ready for writes)
                // until after recover was finished.  this turns out to be fragile; it is less error-prone to go
                // ahead and allow writes before recover(), and just skip active segments when we do.
                return CommitLogSegment.possibleCommitLogFile(name) && !instance.allocator.manages(name);
            }
"

This suggests that the described pattern of an explicit flush triggering compaction is no longer a concern.
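
For readers without the source at hand, the filter quoted above can be approximated in self-contained form as below. This is a sketch under assumptions: the class name ReplayableSegmentFilter is hypothetical, the file-name pattern merely stands in for CommitLogSegment.possibleCommitLogFile, and a plain set of names stands in for instance.allocator.manages.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Cassandra.
final class ReplayableSegmentFilter implements FilenameFilter
{
    // Assumed naming scheme, for illustration only; real segment names may differ.
    private static final Pattern SEGMENT_NAME = Pattern.compile("CommitLog-\\d+\\.log");

    // Names of segments that are currently open for writes.
    private final Set<String> activeSegmentNames;

    ReplayableSegmentFilter(Set<String> activeSegmentNames)
    {
        this.activeSegmentNames = new HashSet<>(activeSegmentNames);
    }

    /** Accepts commit log files, skipping segments that are already active. */
    @Override
    public boolean accept(File dir, String name)
    {
        return SEGMENT_NAME.matcher(name).matches() && !activeSegmentNames.contains(name);
    }
}
```

Passing such a filter to File.list(FilenameFilter) on the commit log directory would yield only the segments left over from before the restart.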

A node which has just been restarted might start compacting shortly after restart as a side effect of accepting new writes during replay. This might fill a memtable and flush, triggering compaction. Unless the heap has been made smaller between restarts, I don't believe a flush can be triggered during replay in any other way. If you have changed the size of your heap between restarts, it seems reasonable and logical to presume that replay might result in flush. 

This situation is the same as normal operation : a node being written to might flush and compact. Unless we are very very stringent about wanting to ELIMINATE ANY CHANCE that a node which has "recently" restarted might start compacting as a side effect of restart (and https://issues.apache.org/jira/browse/CASSANDRA-2444 doesn't seem interested in being that stringent...) I think we are probably at best case behavior here.

In any case, this particular ticket should probably be resolved as it seems to no longer describe the current state of code.
                

        

[jira] Updated: (CASSANDRA-1967) commit log replay shouldn't end with a flush

Posted by "Robert Coli (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Coli updated CASSANDRA-1967:
-----------------------------------

    Affects Version/s: 0.3
