Posted to commits@cassandra.apache.org by "Tomas Salfischberger (JIRA)" <ji...@apache.org> on 2011/06/27 10:48:51 UTC

[jira] [Created] (CASSANDRA-2830) Allow summing of counter columns in CQL

Allow summing of counter columns in CQL
---------------------------------------

                 Key: CASSANDRA-2830
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2830
             Project: Cassandra
          Issue Type: New Feature
          Components: API
            Reporter: Tomas Salfischberger


CQL could be extended with a method to calculate the sum of a set of counter columns. This avoids transferring a long list of counter columns to be summed by the client, while the server could calculate the total and instead only transfer that result. My proposal for the syntax (based on the COUNT() suggestion in the comments of CASSANDRA-1704):
{code}SELECT SUM(<columnFrom>..<columnTo>) FROM <CF> WHERE ...{code}

The simplest approach would be to only allow summing of counters under the same key, thus a query with a WHERE part that specifies multiple keys would return 1 result per key. This avoids summing values from different nodes.
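
A minimal sketch of the per-key semantics described above, not an implementation against Cassandra itself. It assumes a hypothetical in-memory layout (row key mapped to a dict of column name to counter value), an inclusive `<columnFrom>..<columnTo>` range, and plain string ordering of column names; the function name `sum_counter_slice` is made up for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for a counter column family:
# row key -> {column name -> counter value}. This is not Cassandra's API.
Row = Dict[str, int]


def sum_counter_slice(rows: Dict[str, Row], keys: List[str],
                      column_from: str, column_to: str) -> Dict[str, int]:
    """Return one total per requested key.

    Only columns whose names fall in the inclusive range
    [column_from, column_to] are summed (inclusive bounds and string
    ordering are assumptions). Totals are never combined across keys.
    """
    totals = {}
    for key in keys:
        row = rows.get(key, {})  # a missing row contributes a total of 0
        totals[key] = sum(value for name, value in row.items()
                          if column_from <= name <= column_to)
    return totals


rows = {
    "page1": {"2011-06-25": 3, "2011-06-26": 5, "2011-06-27": 2},
    "page2": {"2011-06-26": 4, "2011-06-28": 1},
}
result = sum_counter_slice(rows, ["page1", "page2"],
                           "2011-06-26", "2011-06-27")
print(result)  # {'page1': 7, 'page2': 4}
```

Because each total is computed from a single row, a query naming several keys yields one result per key, and no values from different rows need to be merged.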

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2830) Allow summing of counter columns in CQL

Posted by "Tomas Salfischberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055448#comment-13055448 ] 

Tomas Salfischberger commented on CASSANDRA-2830:
-------------------------------------------------

Good point, you could have a generic function implementation that is allowed to do whatever it wants with an Iterator over the counter values and return a single value. That would support easy implementation of SUM, MIN, MAX, AVG, but also things like standard deviation and variance when the need arises.
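
A rough sketch of that idea, under stated assumptions: each aggregate is modelled as a plain callable that consumes an iterator of counter values and returns a single value. The registry `AGGREGATES` and the helper `apply_aggregate` are hypothetical names, and the choice of population (rather than sample) standard deviation and variance is also an assumption.

```python
import statistics
from typing import Any, Callable, Dict, Iterable

# An aggregate takes an iterable of counter values and returns one value.
Aggregate = Callable[[Iterable[int]], Any]

# Hypothetical registry keyed by the function name used in a query.
AGGREGATES: Dict[str, Aggregate] = {
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "AVG": statistics.mean,
    # Population variants are an assumption; a real design would have to pick.
    "STDDEV": statistics.pstdev,
    "VARIANCE": statistics.pvariance,
}


def apply_aggregate(name: str, values: Iterable[int]) -> Any:
    """Look up an aggregate by name (case-insensitive) and apply it."""
    try:
        func = AGGREGATES[name.upper()]
    except KeyError:
        raise ValueError(f"unknown aggregate function: {name}") from None
    return func(values)


counters = [2, 4, 4, 4, 5, 5, 7, 9]
print(apply_aggregate("SUM", iter(counters)))     # 40
print(apply_aggregate("MAX", iter(counters)))     # 9
print(apply_aggregate("STDDEV", iter(counters)))  # 2.0
```

Adding a new function is then a matter of registering another callable, which is what makes the iterator-based shape attractive for less common aggregates.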

> Allow summing of counter columns in CQL
> ---------------------------------------
>
>                 Key: CASSANDRA-2830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2830
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API
>            Reporter: Tomas Salfischberger
>            Priority: Minor
>              Labels: CQL
>
> CQL could be extended with a method to calculate the sum of a set of counter columns. This avoids transferring a long list of counter columns to be summed by the client, while the server could calculate the total and instead only transfer that result. My proposal for the syntax (based on the COUNT() suggestion in the comments of CASSANDRA-1704):
> {code}SELECT SUM(<columnFrom>..<columnTo>) FROM <CF> WHERE ...{code}
> The simplest approach would be to only allow summing of counters under the same key, thus a query with a WHERE part that specifies multiple keys would return 1 result per key. This avoids summing values from different nodes.


[jira] [Commented] (CASSANDRA-2830) Allow summing of counter columns in CQL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260613#comment-13260613 ] 

Jonathan Ellis commented on CASSANDRA-2830:
-------------------------------------------

My problem with this is that I don't have a clear line in my head between supporting {{WHERE KEY IN (...)}}, which is clearly bounded in terms of space and time required, and any arbitrary WHERE clause (indexed or even seq scan).

Kind of inclined to say "we should leave aggregation to Hive."
                

[jira] [Commented] (CASSANDRA-2830) Allow summing of counter columns in CQL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055445#comment-13055445 ] 

Sylvain Lebresne commented on CASSANDRA-2830:
---------------------------------------------

It's not a crazy idea. Though we should at the very least make it generic enough to have AVG(), MIN(), MAX() and such.


[jira] [Commented] (CASSANDRA-2830) Allow summing of counter columns in CQL

Posted by "Tomas Salfischberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055798#comment-13055798 ] 

Tomas Salfischberger commented on CASSANDRA-2830:
-------------------------------------------------

I did a really crude trial of adding the SUM keyword to the Cql.g definition and handling it in the same way as COUNT is done. It works: I can SUM my counter columns using CQL, but this is of course not a great way to implement it.

Any pointers for the right way to implement this? How do we want to define generic aggregate functions in the grammar and what would be the best way to handle them?


[jira] [Updated] (CASSANDRA-2830) Allow summing of counter columns in CQL

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2830:
----------------------------------------

    Priority: Minor  (was: Major)


[jira] [Commented] (CASSANDRA-2830) Allow summing of counter columns in CQL

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055802#comment-13055802 ] 

Jonathan Ellis commented on CASSANDRA-2830:
-------------------------------------------

Aggregate functions really need to wait for CASSANDRA-2474 and its treat-a-row-as-a-table feature.


[jira] [Issue Comment Edited] (CASSANDRA-2830) Allow summing of counter columns in CQL

Posted by "Tomas Salfischberger (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055798#comment-13055798 ] 

Tomas Salfischberger edited comment on CASSANDRA-2830 at 6/27/11 10:06 PM:
---------------------------------------------------------------------------

I did a really crude trial of adding the SUM keyword to the Cql.g definition and handling it in the same way as COUNT is done. It works: I can SUM my counter columns using CQL, but this is of course not a great way to implement it.

Any pointers for the right way to implement this? How do we want to define generic aggregate functions in the grammar and what would be the best way to handle them?

      was (Author: t0mas):
    I did a really crude trail of adding the SUM keyword to the Cql.g definition and handle it in the same way as COUNT is done. It works, I can SUM my counter columns using CQL, but this is of course not a great way to implement it.

Any pointers for the right way to implement this? How do we want to define generic aggregate functions in the grammar and what would be the best way to handle them?
  