Posted to commits@cassandra.apache.org by "Christian Spriegel (Created) (JIRA)" <ji...@apache.org> on 2012/04/15 02:53:17 UTC

[jira] [Created] (CASSANDRA-4153) Optimize truncate when snapshots are disabled or keyspace not durable

Optimize truncate when snapshots are disabled or keyspace not durable
---------------------------------------------------------------------

                 Key: CASSANDRA-4153
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4153
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Christian Spriegel
            Priority: Minor


My goal is to make truncate less IO intensive so that my JUnit tests run faster (as already explained in CASSANDRA-3710). I think I now have a solution that does not change too much:

I created a patch that optimizes three things within truncate:
- Skip the whole CommitLog.forceNewSegment/discardCompletedSegments sequence if durable_writes is disabled for the keyspace.
- With CASSANDRA-3710 implemented, truncate does not need to flush memtables to disk when snapshots are disabled.
- Reduce the sleep interval
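
The three points above can be pictured as a small decision table. What follows is only a simplified model, not the attached patch: the class name, method name, step labels and their ordering are all made up for illustration, and real truncate does more than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of which truncate steps run under which settings.
// Not Cassandra code; step names and ordering are illustrative only.
final class TruncatePlanSketch {

    // Returns the steps a truncate would perform for the given settings.
    static List<String> steps(boolean durableWrites, boolean autoSnapshot) {
        List<String> steps = new ArrayList<>();
        // A flush to disk is only needed if a snapshot will capture the data.
        steps.add(autoSnapshot ? "flush memtable to disk" : "drop memtable in memory");
        if (durableWrites) {
            // Only a durable keyspace has commit log contents to discard.
            steps.add("force new commit log segment");
            steps.add("flush all other column families");
            steps.add("discard completed commit log segments");
        }
        steps.add("short sleep until the millisecond clock advances");
        return steps;
    }

    public static void main(String[] args) {
        // Typical unit-test setup: no durability, no snapshots.
        System.out.println(steps(false, false));
        // Production-like setup: everything enabled.
        System.out.println(steps(true, true));
    }
}
```

With both settings disabled, the model is left with only in-memory work plus the short sleep, which is where the speedup for a test suite would come from.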

The patch works nicely for me. Applying it and disabling durable_writes/autoSnapshot made my test suite run much faster. I hope I did not overlook anything.

Let me know if my patch needs cleanup. I'd be glad to change it, if it means the patch will get accepted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (CASSANDRA-4153) Optimize truncate when snapshots are disabled or keyspace not durable

Posted by "Jonathan Ellis (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-4153.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1.1
         Reviewer: jbellis
         Assignee: Christian Spriegel

Looks good to me, committed.

(We do want the lock: we're not concerned about writes-in-progress per se (either keeping them or discarding them is fine), but we definitely want to keep them consistent with their indexes, and taking out the writeLock here is the only way I can see to do that.)
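
The locking argument can be illustrated with a toy model. This is a sketch under assumptions, not Cassandra's implementation: SwitchLockSketch and its two maps are hypothetical stand-ins for a column family and one index, and the only thing taken from the discussion is the pattern itself, where writers share the read side of the lock and the memtable swap takes the write side.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a column family with one secondary index.
// None of these names are Cassandra's; only the locking pattern matters.
final class SwitchLockSketch {
    private final ReentrantReadWriteLock switchLock = new ReentrantReadWriteLock();
    private Map<String, String> data = new ConcurrentHashMap<>();
    private Map<String, String> index = new ConcurrentHashMap<>();

    // Writers share the read lock, so they do not block each other.
    void write(String key, String value) {
        switchLock.readLock().lock();
        try {
            data.put(key, value);
            index.put(value, key);
        } finally {
            switchLock.readLock().unlock();
        }
    }

    // Swapping both memtables under the write lock means no write can put
    // its data entry in the old memtable and its index entry in the new one.
    void dropMemtables() {
        switchLock.writeLock().lock();
        try {
            data = new ConcurrentHashMap<>();
            index = new ConcurrentHashMap<>();
        } finally {
            switchLock.writeLock().unlock();
        }
    }

    // True if every data entry has a matching index entry and vice versa.
    boolean isConsistent() {
        switchLock.writeLock().lock();
        try {
            for (Map.Entry<String, String> e : data.entrySet()) {
                if (!e.getKey().equals(index.get(e.getValue()))) return false;
            }
            return data.size() == index.size();
        } finally {
            switchLock.writeLock().unlock();
        }
    }

    int size() {
        switchLock.readLock().lock();
        try {
            return data.size();
        } finally {
            switchLock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SwitchLockSketch cf = new SwitchLockSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) cf.write("k" + i, "v" + i);
        });
        writer.start();
        for (int i = 0; i < 1_000; i++) cf.dropMemtables();
        writer.join();
        System.out.println("consistent: " + cf.isConsistent());
    }
}
```

Whether an in-flight write survives the swap or not depends on timing, which matches the comment that either outcome is acceptable; what the lock rules out is a write that ends up half in the old memtables and half in the new ones.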
                

        

[jira] [Updated] (CASSANDRA-4153) Optimize truncate when snapshots are disabled or keyspace not durable

Posted by "Christian Spriegel (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Spriegel updated CASSANDRA-4153:
------------------------------------------

    Attachment: OptimizeTruncate_v1.diff

Added patch
                

        

[jira] [Commented] (CASSANDRA-4153) Optimize truncate when snapshots are disabled or keyspace not durable

Posted by "Christian Spriegel (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254305#comment-13254305 ] 

Christian Spriegel commented on CASSANDRA-4153:
-----------------------------------------------

Yes, you are right. That is why I call renewMemtable() instead. It drops the old memtable and creates a new one:
{code}
        if (DatabaseDescriptor.isAutoSnapshot())
        {
            forceBlockingFlush(); // this was the old flush
        }
        else
        {
            Table.switchLock.writeLock().lock();
            try
            {
                for (ColumnFamilyStore cfs : concatWithIndexes())
                {
                    Memtable mt = cfs.getMemtableThreadSafe();
                    if (!mt.isClean() && !mt.isFrozen())
                    {
                        mt.cfs.data.renewMemtable(); // just drop the memtable
                    }
                }
            }
            finally
            {
                Table.switchLock.writeLock().unlock();
            }
        }
{code}
This code handles only the memtable that is to be truncated.

Unfortunately, that is not all. To be able to delete the commit log, truncate also flushes all other memtables (which probably has the worst impact on my test performance). These flushes become unnecessary, however, if the CF does not use the commit log (more precisely, if the keyspace that the CF is in does not):
{code}
        KSMetaData ksm = Schema.instance.getKSMetaData(this.table.name);
        if(ksm.durableWrites)
        {
            CommitLog.instance.forceNewSegment();
            ReplayPosition position = CommitLog.instance.getContext();
            // now flush everyone else.  re-flushing ourselves is not necessary, but harmless
            for (ColumnFamilyStore cfs : ColumnFamilyStore.all())
                cfs.forceFlush(); // these flushes are obsolete if durableWrites are off
            waitForActiveFlushes();
            // if everything was clean, flush won't have called discard
            CommitLog.instance.discardCompletedSegments(metadata.cfId, position);
        }
{code}

By the way, I ran my test suite against the patched Cassandra and truncate worked properly. So the very basic stuff should work, but I am not so sure about side effects :-)

While we're at it, I have two other questions:
# Do I need to call Table.switchLock.writeLock().lock() for renewMemtable()?
# Are you OK with my sleep change? I think waiting a full 100 ms is not necessary; we just want to ensure that System.currentTimeMillis() advances.
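
For the second question, one way to wait only as long as needed is to poll the clock rather than sleep a fixed interval. This is a minimal sketch of that idea and not the code in the attached patch: ClockAdvanceSketch and awaitLaterThan are hypothetical names, and the 100 microsecond park interval is an arbitrary choice.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical helper: wait just long enough for the millisecond clock
// to move past a given timestamp, instead of a fixed 100 ms sleep.
final class ClockAdvanceSketch {

    // Blocks until System.currentTimeMillis() is strictly greater than
    // timestampMillis and returns the new clock value.
    static long awaitLaterThan(long timestampMillis) {
        long now;
        while ((now = System.currentTimeMillis()) <= timestampMillis) {
            // Park briefly between polls; parkNanos may return early,
            // which is harmless because the loop re-checks the clock.
            LockSupport.parkNanos(100_000L);
        }
        return now;
    }

    public static void main(String[] args) {
        long before = System.currentTimeMillis();
        long after = awaitLaterThan(before);
        System.out.println("clock advanced by " + (after - before) + " ms");
    }
}
```

The wait is normally at most about one millisecond, compared with the fixed 100 ms per truncate mentioned above.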
                

        

[jira] [Issue Comment Edited] (CASSANDRA-4153) Optimize truncate when snapshots are disabled or keyspace not durable

Posted by "Christian Spriegel (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254305#comment-13254305 ] 

Christian Spriegel edited comment on CASSANDRA-4153 at 4/15/12 11:23 AM:
-------------------------------------------------------------------------

Yes, you are right. That is why I call renewMemtable() instead. It drops the old memtable and creates a new one:
{code}
        if (DatabaseDescriptor.isAutoSnapshot())
        {
            forceBlockingFlush(); // this was the old flush
        }
        else
        {
            Table.switchLock.writeLock().lock();
            try
            {
                for (ColumnFamilyStore cfs : concatWithIndexes())
                {
                    Memtable mt = cfs.getMemtableThreadSafe();
                    if (!mt.isClean() && !mt.isFrozen())
                    {
                        mt.cfs.data.renewMemtable(); // just drop the memtable
                    }
                }
            }
            finally
            {
                Table.switchLock.writeLock().unlock();
            }
        }
{code}
This code handles only the memtable that is to be truncated.

Unfortunately, that is not all. To be able to delete the commit log, truncate also flushes all other memtables (which probably has the worst impact on my test performance). These flushes become unnecessary, however, if the CF does not use the commit log (more precisely, if the keyspace that the CF is in does not):
{code}
        KSMetaData ksm = Schema.instance.getKSMetaData(this.table.name);
        if(ksm.durableWrites)
        {
            CommitLog.instance.forceNewSegment();
            ReplayPosition position = CommitLog.instance.getContext();
            // now flush everyone else.  re-flushing ourselves is not necessary, but harmless
            for (ColumnFamilyStore cfs : ColumnFamilyStore.all())
                cfs.forceFlush(); // these flushes are obsolete if durableWrites are off
            waitForActiveFlushes();
            // if everything was clean, flush won't have called discard
            CommitLog.instance.discardCompletedSegments(metadata.cfId, position);
        }
{code}

By the way, I ran my test suite against the patched Cassandra and truncate worked properly. So the very basic stuff should work, but I am not so sure about side effects :-)

While we're at it, I have two other questions:
# Do I need to call Table.switchLock.writeLock().lock() for renewMemtable()?
# Are you OK with my sleep change? I think waiting a full 100 ms is not necessary; we just want to ensure that System.currentTimeMillis() advances.
                

        

[jira] [Commented] (CASSANDRA-4153) Optimize truncate when snapshots are disabled or keyspace not durable

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13254250#comment-13254250 ] 

Jonathan Ellis commented on CASSANDRA-4153:
-------------------------------------------

bq. truncate does not need to flush memtables to disk when snapshots are disabled

It still needs to clear out the memtables somehow though, or truncate won't actually discard all the data it's expected to.
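
A toy model may make the concern concrete. It assumes a heavily simplified store in which reads consult an in-memory memtable before flushed sstables; MemtableTruncateSketch and all of its methods are hypothetical and are not Cassandra's read or truncate path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy store: reads see the memtable plus flushed sstables.
final class MemtableTruncateSketch {
    private Map<String, String> memtable = new HashMap<>();
    private final List<Map<String, String>> sstables = new ArrayList<>();

    void write(String key, String value) {
        memtable.put(key, value);
    }

    // Moves the memtable contents to "disk" and starts a fresh memtable.
    void flush() {
        if (!memtable.isEmpty()) sstables.add(memtable);
        memtable = new HashMap<>();
    }

    // Checks the memtable first, then sstables from newest to oldest.
    String read(String key) {
        String value = memtable.get(key);
        for (int i = sstables.size() - 1; value == null && i >= 0; i--) {
            value = sstables.get(i).get(key);
        }
        return value;
    }

    // Incomplete truncate: discards sstables but leaves the memtable alone.
    void truncateSstablesOnly() {
        sstables.clear();
    }

    // Truncate without a flush: replace the memtable, then discard sstables.
    void truncate() {
        memtable = new HashMap<>();
        sstables.clear();
    }

    public static void main(String[] args) {
        MemtableTruncateSketch store = new MemtableTruncateSketch();
        store.write("a", "1");
        store.flush();
        store.write("b", "2");
        store.truncateSstablesOnly();
        System.out.println("after incomplete truncate, b = " + store.read("b"));
        store.truncate();
        System.out.println("after full truncate, b = " + store.read("b"));
    }
}
```

In this model the unflushed write stays readable after the incomplete truncate, so skipping the flush is only safe if the memtable is replaced in some other way.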
                