Posted to commits@cassandra.apache.org by "Sean Bridges (JIRA)" <ji...@apache.org> on 2010/06/14 00:43:14 UTC

[jira] Created: (CASSANDRA-1187) make the number of compaction threads configurable

make the number of compaction threads configurable
--------------------------------------------------

                 Key: CASSANDRA-1187
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1187
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.6.1
            Reporter: Sean Bridges


On our test machines, compaction is the limiting factor when we are writing to Cassandra.  It's easy to write to Cassandra faster than the single compaction thread can keep up with, which leads to a large number of SSTables.

In one extreme example, we inserted a TB of data into a single Cassandra node overnight, and ended up with 100,000 SSTables, which took another two days to finish compacting.

If the number of compaction threads were configurable, we could tune Cassandra to support a higher write workload.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-1187) make the number of compaction threads configurable

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1187:
--------------------------------------

        Fix Version/s: 0.7
    Affects Version/s:     (was: 0.6.1)

> make the number of compaction threads configurable
> --------------------------------------------------
>
>                 Key: CASSANDRA-1187
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1187
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sean Bridges
>             Fix For: 0.7
>
>         Attachments: CASSANDRA-1187.patch



[jira] Updated: (CASSANDRA-1187) make the number of compaction threads configurable

Posted by "Sean Bridges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-1187:
------------------------------------

    Attachment: CASSANDRA-1187.patch

This patch allows setting the number of threads used in compaction.

A queue is created for each column family, and only one compaction thread is allowed to compact a column family at a time.
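The scheduling idea described above can be sketched roughly as follows. This is not the code from the attached patch; it is a minimal illustration that assumes a fixed-size thread pool and uses hypothetical names (PerColumnFamilyCompactor, submit, shutdownAndWait). It is written in present-day Java for brevity. Each column family gets its own queue, and a column family's queue is drained by at most one pool thread at a time.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Cassandra's CompactionManager.
final class PerColumnFamilyCompactor {
    private final ExecutorService pool;
    // One FIFO queue of pending compaction tasks per column family.
    private final Map<String, Queue<Runnable>> pending = new HashMap<>();
    // Column families whose queue is currently being drained by a pool thread.
    private final Set<String> active = new HashSet<>();

    PerColumnFamilyCompactor(int compactionThreads) {
        // The thread count is the configurable knob the issue asks for.
        this.pool = Executors.newFixedThreadPool(compactionThreads);
    }

    synchronized void submit(String columnFamily, Runnable task) {
        pending.computeIfAbsent(columnFamily, k -> new ArrayDeque<>()).add(task);
        // Start a drainer only if no thread is already working on this column family.
        if (active.add(columnFamily)) {
            pool.execute(() -> drain(columnFamily));
        }
    }

    private void drain(String columnFamily) {
        while (true) {
            Runnable next;
            synchronized (this) {
                next = pending.get(columnFamily).poll();
                if (next == null) {
                    // Queue is empty: release the column family and stop.
                    active.remove(columnFamily);
                    return;
                }
            }
            try {
                next.run(); // runs outside the lock so other column families can proceed
            } catch (RuntimeException e) {
                // A real implementation would log the failure; the sketch just continues.
            }
        }
    }

    // Stops accepting new drainers and waits for queued work to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a pool of four threads, for example, up to four different column families could be compacted at once, while two tasks submitted for the same column family would still run one after the other. One simplification to note: a drainer keeps its pool thread until its column family's queue is empty, so a busy column family can hold a thread for a long time.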

> make the number of compaction threads configurable
> --------------------------------------------------
>
>                 Key: CASSANDRA-1187
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1187
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6.1
>            Reporter: Sean Bridges
>         Attachments: CASSANDRA-1187.patch



[jira] Updated: (CASSANDRA-1187) make the number of compaction threads configurable

Posted by "Sean Bridges (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-1187:
------------------------------------

    Attachment: CASSANDRA-1187-2.patch

Is this what you were thinking of?  

The patch adds a new ConcurrentCompactedRow which can read columns from multiple SSTables in parallel.  I'm not sure how much parallelism this patch gives.  For the case where two SSTables have no rows in common, there is no benefit.

Trying to read from multiple rows in parallel seems like it would get messy.
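
To make the trade-off concrete, here is a rough sketch of reading one row from several sources in parallel and then merging the columns. It is not the ConcurrentCompactedRow from the patch: RowSource, Column and mergeRow are hypothetical stand-ins, and the merge rule (keep the column with the newest timestamp) is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch of a parallel single-row merge; names are illustrative only.
final class ParallelRowMergeSketch {
    // A column value plus the write timestamp used to pick a winner.
    static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Stand-in for one SSTable: returns the row's columns by name, or null if the row is absent.
    interface RowSource {
        Map<String, Column> readRow(String key);
    }

    static Map<String, Column> mergeRow(String key, List<RowSource> sources, ExecutorService pool) {
        // Issue one read per source so the reads can overlap.
        List<Future<Map<String, Column>>> reads = new ArrayList<>();
        for (RowSource source : sources) {
            Callable<Map<String, Column>> read = () -> source.readRow(key);
            reads.add(pool.submit(read));
        }

        // Merge on the calling thread; for each column name the newest timestamp wins.
        Map<String, Column> merged = new TreeMap<>();
        for (Future<Map<String, Column>> pendingRead : reads) {
            Map<String, Column> columns;
            try {
                columns = pendingRead.get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
            if (columns == null) {
                continue; // this source does not contain the row
            }
            for (Map.Entry<String, Column> entry : columns.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (a, b) -> b.timestamp > a.timestamp ? b : a);
            }
        }
        return merged;
    }
}
```

In this sketch the parallelism is bounded by the number of sources that actually hold the row. If only one source has it, the other reads return nothing and the work is effectively serial, which matches the "no rows in common" case mentioned above.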

> make the number of compaction threads configurable
> --------------------------------------------------
>
>                 Key: CASSANDRA-1187
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1187
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sean Bridges
>         Attachments: CASSANDRA-1187-2.patch, CASSANDRA-1187.patch
