Posted to commits@cassandra.apache.org by "Chris Goffinet (Created) (JIRA)" <ji...@apache.org> on 2012/02/15 02:03:59 UTC

[jira] [Created] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

Basic QoS support for helping reduce OOMing cluster
---------------------------------------------------

                 Key: CASSANDRA-3911
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3911
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Chris Goffinet
            Assignee: Chris Goffinet
            Priority: Minor
             Fix For: 1.2


We'd like to propose adding some basic QoS features to Cassandra. There is a lot that could be done here, but for v1, to keep things less invasive while still providing the basics, we would like to contribute the following features and see if the community thinks this is OK.

We would set these on the server (cassandra.yaml). If a threshold is crossed, we throw an exception up to the client.

1) Limit how many rows a client can fetch over RPC through multi-get.
2) Limit how many columns may be returned: if count > N, throw an exception before processing.
3) Limit how many rows and columns a client can try to batch mutate.

This can be added to our Thrift logic, before any processing is done. The main reason we want this is so that customers don't shoot themselves in the foot by making mistakes or by not knowing how many columns a request might return.
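A minimal sketch of the three proposed checks, in Java since that is Cassandra's language. The class, method, and exception names are hypothetical (they are not Cassandra's API or the attached patch), and counting batch columns as a total across all rows is an assumption. The point is only that each check inspects the request's shape and rejects it before any read or write work starts.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical pre-processing checks for the three proposed limits (not Cassandra's API). */
public final class RequestLimits {

    /** Hypothetical stand-in for the exception that would be thrown up to the client. */
    public static final class LimitExceededException extends RuntimeException {
        public LimitExceededException(String message) {
            super(message);
        }
    }

    private final int maxReadRows;
    private final int maxReadColumns;
    private final int maxWriteRows;
    private final int maxWriteColumns;

    public RequestLimits(int maxReadRows, int maxReadColumns, int maxWriteRows, int maxWriteColumns) {
        this.maxReadRows = maxReadRows;
        this.maxReadColumns = maxReadColumns;
        this.maxWriteRows = maxWriteRows;
        this.maxWriteColumns = maxWriteColumns;
    }

    /** 1) multiget: reject if the client asks for too many row keys at once. */
    public void checkMultiget(List<String> keys) {
        if (keys.size() > maxReadRows) {
            throw new LimitExceededException(
                "multiget of " + keys.size() + " rows exceeds limit " + maxReadRows);
        }
    }

    /** 2) slice: reject if the requested column count is larger than N, before reading. */
    public void checkSliceCount(int requestedCount) {
        if (requestedCount > maxReadColumns) {
            throw new LimitExceededException(
                "slice of " + requestedCount + " columns exceeds limit " + maxReadColumns);
        }
    }

    /** 3) batch_mutate: row key -> column names; caps row count and total column count. */
    public void checkBatch(Map<String, List<String>> columnsByRow) {
        if (columnsByRow.size() > maxWriteRows) {
            throw new LimitExceededException(
                "batch of " + columnsByRow.size() + " rows exceeds limit " + maxWriteRows);
        }
        int totalColumns = 0;
        for (List<String> columns : columnsByRow.values()) {
            totalColumns += columns.size();
        }
        if (totalColumns > maxWriteColumns) {
            throw new LimitExceededException(
                "batch of " + totalColumns + " columns exceeds limit " + maxWriteColumns);
        }
    }
}
```

Each check only looks at sizes already present in the request, so rejecting an oversized request costs almost nothing.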

We could build logic like this into a basic client, but I propose that one feature we might want in Cassandra itself is protection against a client being able to OOM a node. We've done a lot of work around memtable flushing, dropping messages, etc.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

Posted by "Harish Doddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240584#comment-13240584 ] 

Harish Doddi commented on CASSANDRA-3911:
-----------------------------------------

You can enable or disable the QoS feature in the following ways:

a. YAML file config
    -- qos: true
    -- qos: false

b. Tuning through nodetool/JMX
   -- nodetool -h localhost enableqos
   -- nodetool -h localhost disableqos

You can tune the QoS parameters (when QoS is enabled) in the following ways:

a. YAML file config
    -- qos_read_rows: 10
    -- qos_write_rows: 10
    -- qos_read_columns: 10
    -- qos_write_columns: 10

b. Tuning through nodetool/JMX
   -- nodetool -h localhost qos read_columns 10
   -- nodetool -h localhost qos write_columns 10
   -- nodetool -h localhost qos read_rows 10
   -- nodetool -h localhost qos write_rows 10

Note that the default value of each of the above options is 1000.
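A minimal Java sketch of how these settings might be held on the server. The key names and the 1000 default come from the comment; the class and method names are hypothetical, reading from an already-parsed YAML map stands in for Cassandra's real config loading, and treating QoS as disabled when `qos` is absent is an assumption. The actual JMX/nodetool wiring is not shown.

```java
import java.util.Map;

/** Hypothetical holder for the settings described above; not the attached patch's code. */
public final class QosSettings {
    /** Default for every limit, as stated in the comment. */
    public static final int DEFAULT_LIMIT = 1000;

    // volatile so that a change made at runtime (e.g. via JMX) is visible to request threads
    private volatile boolean enabled;
    private volatile int readRows = DEFAULT_LIMIT;
    private volatile int writeRows = DEFAULT_LIMIT;
    private volatile int readColumns = DEFAULT_LIMIT;
    private volatile int writeColumns = DEFAULT_LIMIT;

    /** Builds settings from an already-parsed YAML map; missing limits fall back to the default. */
    public static QosSettings fromYaml(Map<String, Object> yaml) {
        QosSettings s = new QosSettings();
        s.enabled = Boolean.TRUE.equals(yaml.get("qos")); // assumption: off unless "qos: true"
        s.readRows = limitOrDefault(yaml, "qos_read_rows");
        s.writeRows = limitOrDefault(yaml, "qos_write_rows");
        s.readColumns = limitOrDefault(yaml, "qos_read_columns");
        s.writeColumns = limitOrDefault(yaml, "qos_write_columns");
        return s;
    }

    private static int limitOrDefault(Map<String, Object> yaml, String key) {
        Object value = yaml.get(key);
        return value == null ? DEFAULT_LIMIT : ((Number) value).intValue();
    }

    /** Runtime tuning, mirroring "nodetool qos <setting> <value>". */
    public void set(String setting, int value) {
        switch (setting) {
            case "read_rows": readRows = value; break;
            case "write_rows": writeRows = value; break;
            case "read_columns": readColumns = value; break;
            case "write_columns": writeColumns = value; break;
            default: throw new IllegalArgumentException("unknown qos setting: " + setting);
        }
    }

    /** Mirrors "nodetool enableqos" / "nodetool disableqos". */
    public void setEnabled(boolean enabled) { this.enabled = enabled; }

    public boolean isEnabled() { return enabled; }
    public int readRows() { return readRows; }
    public int writeRows() { return writeRows; }
    public int readColumns() { return readColumns; }
    public int writeColumns() { return writeColumns; }
}
```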
                

        

[jira] [Updated] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

Posted by "Harish Doddi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harish Doddi updated CASSANDRA-3911:
------------------------------------

    Attachment: CASSANDRA-3911-trunk.txt

Patch attached against trunk
                

        

[jira] [Commented] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

Posted by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241502#comment-13241502 ] 

Brandon Williams commented on CASSANDRA-3911:
---------------------------------------------

First off, this is setting result limits rather than providing quality of service, since we aren't [de]prioritizing anything, so I don't think QoS is the right term here.

Second, I don't think this patch satisfies the goal of "Limit how many columns may be returned (if count > N) throw exception before processing", since the server still does the processing and only then limits what it will return, which saves just a copy in Thrift. You can still request all 2B columns in a row and OOM the server.
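A small Java sketch of the distinction drawn above. The `readColumns` function is a hypothetical stand-in for the storage read, and both method names are illustrative only. Trimming a result afterwards still pays for materializing all of it, while checking the requested count up front never starts the read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical contrast between limiting after a read and rejecting before it. */
public final class SliceLimitCheck {

    /** Limit applied after processing: the full result is materialized first, then trimmed. */
    static List<String> readThenTruncate(int requested, int limit, IntFunction<List<String>> readColumns) {
        List<String> all = readColumns.apply(requested); // memory is already spent here
        return all.size() > limit ? new ArrayList<>(all.subList(0, limit)) : all;
    }

    /** Limit applied before processing: an oversized request never reaches the read path. */
    static List<String> rejectThenRead(int requested, int limit, IntFunction<List<String>> readColumns) {
        if (requested > limit) {
            throw new IllegalArgumentException(
                "requested " + requested + " columns, limit is " + limit);
        }
        return readColumns.apply(requested);
    }
}
```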
                

        

[jira] [Assigned] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

Posted by "Harish Doddi (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harish Doddi reassigned CASSANDRA-3911:
---------------------------------------

    Assignee: Harish Doddi  (was: Chris Goffinet)
    

        

[jira] [Commented] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

Posted by "Harish Doddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241543#comment-13241543 ] 

Harish Doddi commented on CASSANDRA-3911:
-----------------------------------------

Comments from Brandon via IRC
=============================

1. Possibly move all the QoS validations into the Thrift validation class.

2. Add validation on the CQL side (QueryProcessor validation).

3. Possibly rename "QoS" to something else.
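A minimal Java sketch of points 1 and 2: a single check that both the Thrift path and the CQL path call, so the two cannot drift apart. The method names are hypothetical stand-ins, not Cassandra's actual Thrift or CQL validation code, and the fixed limit of 1000 simply reuses the default mentioned earlier in the thread.

```java
import java.util.List;

/** Hypothetical: one shared row-count check reused by both request paths. */
public final class SharedReadValidation {
    static final int READ_ROWS_LIMIT = 1000; // the default quoted earlier in the thread

    /** The single place where the limit is enforced. */
    static void validateRowCount(int rows) {
        if (rows > READ_ROWS_LIMIT) {
            throw new IllegalArgumentException(
                "request for " + rows + " rows exceeds limit " + READ_ROWS_LIMIT);
        }
    }

    /** Stand-in for a Thrift multiget handler; returns how many rows it would read. */
    static int thriftMultiget(List<String> keys) {
        validateRowCount(keys.size());
        return keys.size();
    }

    /** Stand-in for a CQL SELECT over several keys; returns how many rows it would read. */
    static int cqlSelect(List<String> keys) {
        validateRowCount(keys.size());
        return keys.size();
    }
}
```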
                

        

[jira] [Commented] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

Posted by "Jonathan Ellis (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241507#comment-13241507 ] 

Jonathan Ellis commented on CASSANDRA-3911:
-------------------------------------------

I'm not a huge fan of this approach in general, but we're already doing something similar with the Thrift frame size, so I guess I'm okay with it.

We should probably stick to just the read side, though, since the write side *is* already covered by that frame size. (I suppose you could make the frame size setting a double, if MB is not granular enough.)

Finally, I would prefer to see just settings for read_rows and read_columns (read_columns_per_row?), rather than a two-tier system of Master Switch *and* individual settings.
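A minimal Java sketch of the simplified shape suggested here: only read-side limits, each standing on its own with no master switch, plus a frame size expressed as a double number of megabytes. The class and method names are hypothetical; using a value of 0 or less to mean "no limit" is an assumption about how a single setting could be turned off without a separate switch, and the conversion assumes 1 MB = 1024 * 1024 bytes.

```java
/** Hypothetical read-only limits with no master switch; a value <= 0 disables that limit. */
public final class ReadLimits {
    private final int readRows;           // max rows in one multi-row read
    private final int readColumnsPerRow;  // max columns requested from any one row

    public ReadLimits(int readRows, int readColumnsPerRow) {
        this.readRows = readRows;
        this.readColumnsPerRow = readColumnsPerRow;
    }

    public boolean allowsRows(int rows) {
        return readRows <= 0 || rows <= readRows;
    }

    public boolean allowsColumnsPerRow(int columns) {
        return readColumnsPerRow <= 0 || columns <= readColumnsPerRow;
    }

    /** Writes stay bounded by the frame size; a double allows limits finer than whole MB. */
    public static long frameSizeBytes(double frameSizeInMb) {
        return (long) (frameSizeInMb * 1024 * 1024);
    }
}
```

With this shape, each limit is switched off by setting it to 0 rather than by a separate global flag.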
                

        

[jira] [Updated] (CASSANDRA-3911) Basic QoS support for helping reduce OOMing cluster

Posted by "Jonathan Ellis (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-3911:
--------------------------------------

    Reviewer: brandon.williams
    