Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/07/12 21:15:59 UTC

[jira] [Created] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Avoids having replicate on write tasks stacking up at CL.ONE
------------------------------------------------------------

                 Key: CASSANDRA-2889
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2889
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.8.0
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne


The counter design involves a read on the first replica during a write. At CL.ONE, this read is not involved in the latency of the operation (the write is acknowledged before). This means it is fairly easy to insert too quickly at CL.ONE and have the replicate on write tasks falling behind. The goal of this ticket is to protect against that.

An option could be to bound the replicate on write task queue so that writes start to block once we have too many of those in the queue. Another option could be to drop the oldest tasks when they are too old, but it's probably a less safe option.
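
For illustration only, the first option (a bounded queue whose submitters block when it is full) might look roughly like the sketch below. It uses only java.util.concurrent; the class name BoundedStage and its parameters are hypothetical and are not part of Cassandra or of any patch on this ticket.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a bounded, blocking stage; not Cassandra code. */
final class BoundedStage
{
    /** Builds an executor whose submitters block while maxQueued tasks are already waiting. */
    static ThreadPoolExecutor create(int threads, int maxQueued)
    {
        // When the bounded queue is full, execute() calls this handler on the
        // submitting thread; put() then blocks that thread until a slot frees up.
        RejectedExecutionHandler blockCaller = (task, executor) -> {
            if (executor.isShutdown())
                throw new RejectedExecutionException("executor has shut down");
            try {
                executor.getQueue().put(task);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException(e);
            }
        };
        return new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS,
                                      new ArrayBlockingQueue<Runnable>(maxQueued),
                                      blockCaller);
    }
}
```

With something like BoundedStage.create(4, 1000) (both numbers are arbitrary), a caller that submits faster than the pool can drain ends up waiting inside execute() rather than growing the queue without limit.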

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083228#comment-13083228 ] 

Jonathan Ellis commented on CASSANDRA-2889:
-------------------------------------------

Is this the same as CASSANDRA-2892?

> Avoids having replicate on write tasks stacking up at CL.ONE
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-2889
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2889
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>
> The counter design involves a read on the first replica during a write. At CL.ONE, this read is not involved in the latency of the operation (the write is acknowledged before). This means it is fairly easy to insert too quickly at CL.ONE and have the replicate on write tasks falling behind. The goal of this ticket is to protect against that.
> An option could be to bound the replicate on write task queue so that writes start to block once we have too many of those in the queue. Another option could be to drop the oldest tasks when they are too old, but it's probably a less safe option.

[jira] [Commented] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Posted by "David Phillips (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071873#comment-13071873 ] 

David Phillips commented on CASSANDRA-2889:
-------------------------------------------

You can run this temporary fix before starting the server to bound the queue size:

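{code}
// Imports this snippet appears to need (package names assumed from the 0.8-era code base):
import java.lang.reflect.Field;
import java.util.EnumMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor;
import org.apache.cassandra.concurrent.NamedThreadFactory;
import org.apache.cassandra.concurrent.Stage;
import org.apache.cassandra.concurrent.StageManager;
import org.apache.cassandra.config.DatabaseDescriptor;

@SuppressWarnings("unchecked")
private static void monkeyPatchCassandra()
{
    // hack to fix CASSANDRA-2889
    try {
        // StageManager keeps its executors in a private static map; reach it via reflection.
        Field field = StageManager.class.getDeclaredField("stages");
        field.setAccessible(true);
        // The field is static, so no instance is needed to read it.
        EnumMap<Stage, ThreadPoolExecutor> stages = (EnumMap<Stage, ThreadPoolExecutor>) field.get(null);

        // Replace the REPLICATE_ON_WRITE executor with one backed by a bounded queue.
        Stage stage = Stage.REPLICATE_ON_WRITE;
        stages.get(stage).shutdown();
        stages.put(stage, new JMXConfigurableThreadPoolExecutor(
                DatabaseDescriptor.getConcurrentReplicators(),
                StageManager.KEEPALIVE,
                TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(100000), // at most 100,000 queued tasks
                new NamedThreadFactory(stage.getJmxName()),
                stage.getJmxType()));
    }
    catch (Exception e) {
        throw new AssertionError(e);
    }
}
{code}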

[jira] [Commented] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083830#comment-13083830 ] 

Stu Hood commented on CASSANDRA-2889:
-------------------------------------

bq.  But maybe the best solution here would be to make CL.ONE wait for the read to have happened to ack the client. 
Or perhaps some kind of proportional backpressure for writes based on the length of the replicate-on-write queue? No backpressure below X, and then linear backpressure above X?
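
As a rough sketch of what such a rule could look like: no delay at or below a threshold X, then a delay that grows linearly with the queue length. The class QueueBackpressure, its parameters, and the cap at maxQueue are all assumptions made for illustration; nothing like this is proposed in code on the ticket.

```java
/** Hypothetical illustration of threshold-plus-linear backpressure; not Cassandra code. */
final class QueueBackpressure
{
    /**
     * Returns how long (in milliseconds) a writer should pause for a given queue length.
     *
     * @param queueLength    current length of the replicate-on-write queue
     * @param threshold      the "X" below which no backpressure is applied
     * @param maxQueue       queue length at which the delay reaches its maximum (assumed cap)
     * @param maxDelayMillis delay applied at or above maxQueue
     */
    static long delayMillis(int queueLength, int threshold, int maxQueue, long maxDelayMillis)
    {
        if (maxQueue <= threshold)
            throw new IllegalArgumentException("maxQueue must be greater than threshold");
        if (queueLength <= threshold)
            return 0;                       // no backpressure below X
        if (queueLength >= maxQueue)
            return maxDelayMillis;          // cap the delay once the queue is very long
        // Linear interpolation between (threshold, 0) and (maxQueue, maxDelayMillis).
        return maxDelayMillis * (queueLength - threshold) / (maxQueue - threshold);
    }
}
```

For example, with threshold 1000, maxQueue 5000 and maxDelayMillis 100, a queue of 3000 tasks sits halfway between the two bounds and yields a 50 ms pause.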

[jira] [Commented] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263765#comment-13263765 ] 

Jonathan Ellis commented on CASSANDRA-2889:
-------------------------------------------

+1
                
> Avoids having replicate on write tasks stacking up at CL.ONE
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-2889
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2889
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 1.1.1
>
>         Attachments: 2889.txt
>
>
> The counter design involves a read on the first replica during a write. At CL.ONE, this read is not involved in the latency of the operation (the write is acknowledged before). This means it is fairly easy to insert too quickly at CL.ONE and have the replicate on write tasks falling behind. The goal of this ticket is to protect against that.
> An option could be to bound the replicate on write task queue so that writes start to block once we have too many of those in the queue. Another option could be to drop the oldest tasks when they are too old, but it's probably a less safe option.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2889:
----------------------------------------

    Attachment: 2889.txt

Forgot a bit about this issue. Attaching a simple patch to simply limit the queue size for the replicate_on_write stage. My intuition is that this is probably "good enough" so not sure if it's worth getting much more fancy.
                
[jira] [Commented] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263758#comment-13263758 ] 

Jonathan Ellis commented on CASSANDRA-2889:
-------------------------------------------

Who is going to block if the ROW (replicate-on-write) queue fills up?  The read stage?
                
[jira] [Commented] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263761#comment-13263761 ] 

Sylvain Lebresne commented on CASSANDRA-2889:
---------------------------------------------

No, the write stage (it's the one pushing the replicate task)
                
[jira] [Commented] (CASSANDRA-2889) Avoids having replicate on write tasks stacking up at CL.ONE

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083261#comment-13083261 ] 

Sylvain Lebresne commented on CASSANDRA-2889:
---------------------------------------------

bq. Is this the same as CASSANDRA-2892?

No, CASSANDRA-2892 was really just saying "if we have nothing to replicate to, let's not push a replication task that will do nothing anyway". It was really just a super easy optimization for the RF=1 case.

This one is because at CL.ONE, we ack the client as soon as we have written the local mutation. But the replication involves a read. So if you write very quickly at CL.ONE, your "read to replicate" tasks may stack up because you're not able to do them fast enough. But maybe the best solution here would be to make CL.ONE wait for the read to have happened to ack the client. The current behavior makes for better latency at CL.ONE, but this is kind of a lie, because the hardest part of the work (the read) still happens in the background, and it is thus easy to flood the node.
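
To make the difference concrete, here is a toy sketch of the two acknowledgement orders, using plain CompletableFuture. The class CounterWriteAck and the three Runnable parameters (local write, read, send to replicas) are hypothetical stand-ins for the real steps; this is not how Cassandra's write path is actually structured.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical illustration of when the client ack happens; not Cassandra code. */
final class CounterWriteAck
{
    /** Behaviour described above: ack right after the local write; read and send run in the background. */
    static CompletableFuture<Void> ackBeforeRead(Runnable localWrite, Runnable read,
                                                 Runnable send, Executor replicateStage)
    {
        localWrite.run();
        // Fire and forget: the returned ack does not depend on this work.
        CompletableFuture.runAsync(read, replicateStage).thenRunAsync(send, replicateStage);
        return CompletableFuture.completedFuture(null);
    }

    /** Suggested alternative: the ack completes only once the read has happened. */
    static CompletableFuture<Void> ackAfterRead(Runnable localWrite, Runnable read,
                                                Runnable send, Executor replicateStage)
    {
        localWrite.run();
        CompletableFuture<Void> readDone = CompletableFuture.runAsync(read, replicateStage);
        // Sending to the other replicas still happens after the ack.
        readDone.thenRunAsync(send, replicateStage);
        return readDone;
    }
}
```

In the second variant a client cannot issue acknowledged writes faster than the node can perform the reads, which is the point of the suggestion; the cost is higher latency at CL.ONE.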
