Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/01/11 17:54:54 UTC

[jira] Created: (CASSANDRA-685) add backpressure to StorageProxy

add backpressure to StorageProxy
--------------------------------

                 Key: CASSANDRA-685
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Jonathan Ellis
            Assignee: Jonathan Ellis
            Priority: Minor
             Fix For: 0.9


Now that we have CASSANDRA-401 and CASSANDRA-488, there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can, only to have them take up space in the mutation queue and eventually fill up memory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment:     (was: 0001-impose-stage-queue-limit-of-4096-operations-which-shou.txt)



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840315#action_12840315 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

Yes, capping the RM stage is part of this.



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Comment: was deleted

(was: new version: creates a write queue for streaming mode (turns out that needs it after all, barely))



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800323#action_12800323 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

new version: creates a write queue for streaming mode (turns out that needs it after all, barely)



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment:     (was: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt)



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802445#action_12802445 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

Explained backpressure motivation on IRC:

Under heavy load, each node A has two kinds of traffic to each other node B: new commands that A needs to send to B, and replies to commands that B sent to A.  If B is overloaded, you need to be able to backpressure new commands to it while still allowing replies to it to go through.  Replies create virtually no extra load, and letting them through makes the clients much happier.
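
A minimal sketch of that command/reply split on the sending side, assuming one bounded queue for new commands and one unbounded queue for replies. The class and method names are hypothetical and are not taken from the attached patches:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: these names are illustrative, not Cassandra classes.
final class OutboundTraffic {
    // New commands are bounded, so an overloaded peer pushes back on the sender.
    private final BlockingQueue<String> commands;
    // Replies add almost no load on the peer, so they are never refused here.
    private final BlockingQueue<String> replies = new LinkedBlockingQueue<>();

    OutboundTraffic(int commandCapacity) {
        this.commands = new ArrayBlockingQueue<>(commandCapacity);
    }

    /** Returns false when the command queue is full; the caller sees backpressure. */
    boolean enqueueCommand(String message) {
        return commands.offer(message);
    }

    /** Replies are always accepted. */
    void enqueueReply(String message) {
        replies.add(message);
    }

    /** Next message to write to the socket; replies are drained first. Null when idle. */
    String nextToSend() {
        String reply = replies.poll();
        return reply != null ? reply : commands.poll();
    }
}
```

With a command capacity of 1, a second enqueueCommand call returns false until the first command has been sent, while enqueueReply keeps succeeding.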




[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
                0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803740#action_12803740 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

Another thought: there is tension between "I want to make the client slow down, so it stops making things worse by attempting more operations against an almost-overloaded node" and "if only one node is overloaded for whatever reason (maybe it is doing compactions and handling bootstrap simultaneously for instance), I want to be able to continue if my ConsistencyLevel and ReplicationFactor allow it."

Also: reads are different from writes. A read against an overloaded node may just be dropped; a write should be HH'd (kept as a hint for hinted handoff).
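
A small sketch of the two rules above, assuming the coordinator already knows which replicas are overloaded. The enum and method names are hypothetical, and the consistency check is simplified to a count of responsive replicas:

```java
// Hypothetical policy sketch; none of these names come from the Cassandra code base.
final class OverloadPolicy {
    enum Operation { READ, WRITE }
    enum Action { SEND, DROP, HINT }

    /** Reads to an overloaded node are dropped; writes are kept as hints instead. */
    static Action decide(Operation op, boolean targetOverloaded) {
        if (!targetOverloaded) {
            return Action.SEND;
        }
        return op == Operation.READ ? Action.DROP : Action.HINT;
    }

    /** True if enough replicas remain responsive to satisfy the requested consistency level. */
    static boolean canProceed(int replicationFactor, int overloadedReplicas, int requiredResponses) {
        return replicationFactor - overloadedReplicas >= requiredResponses;
    }
}
```

For example, with a replication factor of 3 and a requirement of 2 responses, one overloaded replica still lets the operation proceed, while two overloaded replicas do not.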



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Comment: was deleted

(was: attached new patches w/ NPE fix)



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
                0001-impose-stage-queue-limit-of-4096-operations-which-shou.txt



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
                0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Comment: was deleted

(was: Receiving the following traceback with these patches:

ERROR - Internal error processing batch_insert
java.lang.NullPointerException
        at org.apache.cassandra.net.TcpConnection.write(TcpConnection.java:141)
        at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:350)
        at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:208)
        at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:495)
        at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:427)
        at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1113)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:842)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
)



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800275#action_12800275 ] 

Brandon Williams commented on CASSANDRA-685:
--------------------------------------------

Now I receive this:

ERROR - Internal error processing batch_insert
java.lang.IllegalStateException: Queue full
        at java.util.AbstractQueue.add(AbstractQueue.java:99)
        at org.apache.cassandra.net.TcpConnection.write(TcpConnection.java:143)
        at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:350)
        at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:208)
        at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:495)
        at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:427)
        at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1113)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:842)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
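
For reference on the trace above: on a bounded queue, add() throws IllegalStateException("Queue full") when there is no room, whereas offer() reports the same condition by returning false. A minimal sketch of the difference, using a hypothetical one-slot queue rather than the actual TcpConnection write queue:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: contrasts add() and offer() on a full bounded queue.
final class QueueFullDemo {
    /** Uses add(), which throws IllegalStateException when the queue is full. */
    static String describeAdd(BlockingQueue<String> queue, String item) {
        try {
            queue.add(item);
            return "added";
        } catch (IllegalStateException e) {
            return "rejected: " + e.getMessage();
        }
    }

    /** Uses offer(), which returns false when the queue is full. */
    static String describeOffer(BlockingQueue<String> queue, String item) {
        return queue.offer(item) ? "added" : "rejected";
    }

    public static void main(String[] args) {
        BlockingQueue<String> writeQueue = new ArrayBlockingQueue<>(1);
        System.out.println(describeAdd(writeQueue, "first"));
        System.out.println(describeOffer(writeQueue, "second"));
        System.out.println(describeAdd(writeQueue, "third"));
    }
}
```

A caller that wants to signal overload with its own exception type could check the result of offer() and throw from there, instead of letting the unchecked IllegalStateException escape.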




[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment:     (was: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt)



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment:     (was: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt)



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment:     (was: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt)



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment:     (was: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt)



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800272#action_12800272 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

attached new patches w/ NPE fix



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
                0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803613#action_12803613 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

We need to either (a) have a separate deserialization queue for "reply" traffic (we could use one of the "header" bits that isn't part of the Message proper to control this), (b) drop messages for overloaded stages on the floor so the deserializer doesn't overload, or (c) give up the command/reply division entirely.

Option (b) reminds me that instead of "backpressure" we could just use "timeoutpressure": rather than overloaded stages backpressuring the message deserializer, which in turn backpressures socket reads, the deserializer can just discard messages the system is too busy to handle.  The downside is that it will take an extra rpc_timeout of latency before the clients start to get timeouts.  The upside is that, as things unclog, the messages that get processed will be fresh ones, so we are less likely to waste work processing messages that the client isn't even waiting for anymore.
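
A rough sketch of the discard idea, assuming the deserializer hands messages to a bounded stage queue and simply counts what it has to throw away. The class and method names are hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the deserializer never blocks; it drops when the stage is full.
final class DroppingDeserializer {
    private final BlockingQueue<String> stageQueue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingDeserializer(int stageCapacity) {
        this.stageQueue = new ArrayBlockingQueue<>(stageCapacity);
    }

    /** Hands a message to the stage, or discards it if the stage has no room. */
    boolean deliver(String message) {
        if (stageQueue.offer(message)) {
            return true;
        }
        // No error goes back to the sender; it finds out through its own timeout.
        dropped.incrementAndGet();
        return false;
    }

    /** Called by the stage's worker; returns null when nothing is queued. */
    String takeForProcessing() {
        return stageQueue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

Because deliver() never blocks, socket reads keep draining, and whatever arrives after the stage frees a slot is recent traffic rather than a backlog.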

Also, I'd like to dynamically adjust stage capacity based on the amount of work that gets processed, rather than have a fixed value that has to be manually tuned.  Not sure what that would look like -- none of the Java BlockingQueue classes have adjustable capacity post-construction.  But, stage enqueueing is only done in one place (by the deserializer executor) so we can one-off something if we have to.
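
One possible shape for such a one-off, sketched under the assumption that a non-blocking offer is enough because only the deserializer enqueues. The class is hypothetical, and how the capacity would be chosen from observed throughput is left out:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a bounded queue whose capacity can change after construction.
final class ResizableBoundedQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private int capacity;

    ResizableBoundedQueue(int initialCapacity) {
        setCapacity(initialCapacity);
    }

    /** Returns false when the queue already holds at least `capacity` items. */
    synchronized boolean offer(T item) {
        if (items.size() >= capacity) {
            return false;
        }
        items.addLast(item);
        return true;
    }

    /** Returns the oldest item, or null when the queue is empty. */
    synchronized T poll() {
        return items.pollFirst();
    }

    /** Shrinking keeps queued items; new offers are refused until the queue drains. */
    synchronized void setCapacity(int newCapacity) {
        if (newCapacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        capacity = newCapacity;
    }

    synchronized int size() {
        return items.size();
    }
}
```

A monitoring task could call setCapacity periodically, raising the limit when the stage keeps up and lowering it when queued work starts to go stale.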



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Comment: was deleted

(was: Now I receive this:

ERROR - Internal error processing batch_insert
java.lang.IllegalStateException: Queue full
        at java.util.AbstractQueue.add(AbstractQueue.java:99)
        at org.apache.cassandra.net.TcpConnection.write(TcpConnection.java:143)
        at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:350)
        at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:208)
        at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:495)
        at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:427)
        at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1113)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:842)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
)

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.6
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
                0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.9
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834548#action_12834548 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

note to self: cap CFS.flushWriter queue

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment:     (was: 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt)

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.9
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798923#action_12798923 ] 

Jonathan Ellis commented on CASSANDRA-685:
------------------------------------------

The first patch makes it so the target node won't OOM and will instead backpressure the control node.

The second makes it so the control node will notice the backpressure and pass it on (via a timeout exception) to the thrift client, rather than OOMing itself by continuing to enqueue messages to an unresponsive target.
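
To illustrate the sender side (a sketch under assumptions, not the attached patch): the outbound queue to a target is bounded, and a full queue is reported to the caller at once instead of letting messages pile up in memory. BoundedOutbound and EnqueueRejectedException are hypothetical names. The sketch uses offer(), which returns false on a full queue, rather than add(), which throws IllegalStateException("Queue full").

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical exception telling the caller that the target is not keeping up.
// Unchecked here for brevity; a real design might prefer a checked exception.
class EnqueueRejectedException extends RuntimeException {
    EnqueueRejectedException(String message) {
        super(message);
    }
}

// Hypothetical sketch of a bounded per-connection outbound queue.
class BoundedOutbound {
    private final BlockingQueue<byte[]> pending;

    BoundedOutbound(int limit) {
        this.pending = new ArrayBlockingQueue<>(limit);
    }

    // Queues the message for sending, or fails fast when the queue is full.
    void write(byte[] message) {
        if (!pending.offer(message)) {
            throw new EnqueueRejectedException(
                "outbound queue full (" + pending.size() + " pending)");
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The caller can then translate the rejection into whatever error the client protocol defines for an overloaded or unresponsive replica.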

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.9
>
>         Attachments: 0001-impose-stage-queue-limit-of-4096-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Fix Version/s:     (was: 0.6)
                   0.7

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Stu Hood (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840288#action_12840288 ] 

Stu Hood commented on CASSANDRA-685:
------------------------------------

> Would this back-pressure apply to commit log replay?
It seems that one of the reasons why commit logs get large in the first place is because we don't have backpressure, so I think this should fix the problem indirectly by preventing huge commit logs.

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Ryan King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840272#action_12840272 ] 

Ryan King commented on CASSANDRA-685:
-------------------------------------

Would this back-pressure apply to commit log replay? We recently ran into a situation where a node with very large commit logs managed to OOM itself by backing up the row mutation stage queue.

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.7
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Commented: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800006#action_12800006 ] 

Brandon Williams commented on CASSANDRA-685:
--------------------------------------------

Receiving the following traceback with these patches:

ERROR - Internal error processing batch_insert
java.lang.NullPointerException
        at org.apache.cassandra.net.TcpConnection.write(TcpConnection.java:141)
        at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:350)
        at org.apache.cassandra.service.StorageProxy.mutateBlocking(StorageProxy.java:208)
        at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:495)
        at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:427)
        at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1113)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:842)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)


> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.9
>
>         Attachments: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt, 0002-make-TcpConnection.write-throw-WriteEnqueueException-i.txt
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.



[jira] Updated: (CASSANDRA-685) add backpressure to StorageProxy

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-685:
-------------------------------------

    Attachment:     (was: 0001-impose-stage-queue-limit-of-2048-operations-which-shou.txt)

> add backpressure to StorageProxy
> --------------------------------
>
>                 Key: CASSANDRA-685
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-685
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.9
>
>
> Now that we have CASSANDRA-401 and CASSANDRA-488 there is one last piece: we need to stop the target node from pulling mutations out of MessagingService as fast as it can only to take up space in the mutation queue and eventually fill up memory.
