Posted to dev@cassandra.apache.org by Brian O'Neill <bo...@alumni.brown.edu> on 2013/12/18 03:41:03 UTC

Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

We are seeking to replace Acunu in our technology stack / platform.  It is
the only component in our stack that is not open source.

In preparation, over the last few weeks I've migrated Virgil to CQL.  The
vision is that Virgil could receive a REST request to upsert/delete data
(hierarchical JSON to support collections).  Virgil would look up the
dimensions/aggregations for that table, add the key to the pertinent
dimensional tables (e.g. DISTINCT), incorporate values into aggregations
(e.g. SUMs) and increment/decrement relevant counters (COUNT), using
additional CFs.
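
As a rough illustration of that fan-out, here is a minimal in-memory sketch
in Python.  It is not Virgil code: the class name, the dimension names
("state", "specialty") and the measure ("amount") are all hypothetical, and
plain dicts stand in for the additional CFs.  It also assumes every upsert is
a brand-new row; replacing an existing row would first require subtracting
the old values.

```python
from collections import defaultdict


class Aggregates:
    """Hypothetical stand-in for the dimensional tables kept beside a base table."""

    def __init__(self, dimensions, measure):
        self.dimensions = dimensions        # columns to aggregate by
        self.measure = measure              # numeric column fed into SUM
        self.distinct = defaultdict(set)    # dimension -> values currently present
        self.sums = defaultdict(float)      # (dimension, value) -> SUM of measure
        self.counts = defaultdict(int)      # (dimension, value) -> COUNT of rows

    def upsert(self, row):
        # Assumption: the row is new, so its values are simply added.
        self._apply(row, +1)

    def delete(self, row):
        self._apply(row, -1)

    def _apply(self, row, sign):
        for dim in self.dimensions:
            cell = (dim, row[dim])
            self.counts[cell] += sign
            self.sums[cell] += sign * row[self.measure]
            # Keep DISTINCT in step with COUNT: drop a value once no row has it.
            if self.counts[cell] > 0:
                self.distinct[dim].add(row[dim])
            else:
                self.distinct[dim].discard(row[dim])
```

For example, after upserting two rows with state "PA" and amounts 10.0 and
5.0, `counts[("state", "PA")]` is 2 and `sums[("state", "PA")]` is 15.0.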

This seems straightforward, but appears to require a read-before-write
(e.g. read the current value of a SUM, incorporate the new value, then use
the lightweight transactions of C* 2.0 to conditionally update the value).
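
A minimal sketch of that read-then-conditionally-write loop, in Python.  To
stay self-contained it uses an in-memory table whose compare_and_set() plays
the role of a conditional update (in CQL, an UPDATE with an IF clause); the
class, the function, and the table/column names in the comments are
hypothetical, not an existing API.

```python
import threading


class InMemoryTable:
    """Stand-in for a C* table; compare_and_set mimics a conditional update."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def read(self, key):
        with self._lock:
            return self._rows.get(key)

    def compare_and_set(self, key, expected, new):
        # Roughly: UPDATE sums SET total = <new> WHERE dim = <key> IF total = <expected>
        # (hypothetical table "sums"; expected=None means "no row yet").
        with self._lock:
            if self._rows.get(key) != expected:
                return False          # someone else changed the value first
            self._rows[key] = new
            return True


def add_to_sum(table, key, delta, max_retries=10):
    """Read the current SUM, add delta, and write it back only if unchanged."""
    for _ in range(max_retries):
        current = table.read(key)                 # the read-before-write
        new = (current or 0) + delta
        if table.compare_and_set(key, current, new):
            return new
    raise RuntimeError("too much contention on %r" % (key,))
```

The retry loop is the cost of this design: under heavy write contention on
one dimension value, a writer may have to read and retry several times.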

Before I go down this path, I was wondering if anyone is designing/working
on the same, perhaps at a lower level?  (CQL?)

Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT,
etc) at the CQL level?  If so, is there a preliminary design?

I can see a lower-level approach, which would leverage the commit logs (and
mem/sstables) and perform the aggregation during read-operations (and
flush/compaction).
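
To make that idea concrete, here is a toy Python model of it.  This is only
an illustration of the general technique (store deltas, merge them at read
and compaction time), not a description of Cassandra internals; DeltaStore
and its methods are hypothetical names.

```python
class DeltaStore:
    """Toy model: writes record deltas, reads and compaction merge them."""

    def __init__(self):
        self.memtable = {}    # key -> sum of deltas not yet flushed
        self.segments = []    # immutable flushed dicts, standing in for sstables

    def write(self, key, delta):
        # No lookup of the flushed total is needed; only the delta is recorded.
        self.memtable[key] = self.memtable.get(key, 0) + delta

    def flush(self):
        # Freeze the current memtable as a new immutable segment.
        if self.memtable:
            self.segments.append(self.memtable)
            self.memtable = {}

    def compact(self):
        # Collapse all segments into one by summing the deltas per key.
        merged = {}
        for segment in self.segments:
            for key, delta in segment.items():
                merged[key] = merged.get(key, 0) + delta
        self.segments = [merged] if merged else []

    def read_sum(self, key):
        # Aggregate at read time across the memtable and every segment.
        return self.memtable.get(key, 0) + sum(
            segment.get(key, 0) for segment in self.segments
        )
```

Because addition is commutative and associative, the SUM is the same no
matter when flushes and compactions happen; the trade-off is that reads do
more work until compaction catches up.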

Thoughts?  I'm open to all ideas.

-brian
-- 
Brian ONeill
Chief Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42

Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

Posted by Brian O'Neill <bo...@alumni.brown.edu>.
Thanks for the pointer Alain.

At a quick glance, it looks like people are looking for query-time
filtering/aggregation, which will suffice for small data sets.  Hopefully we
might be able to extend that to perform pre-computations as well, which
would support much larger data sets / volumes.

I'll continue the discussion on the issue.

thanks again,
brian


---
Brian O'Neill
Chief Architect
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
healthmarketscience.com




From:  Alain RODRIGUEZ <ar...@gmail.com>
Reply-To:  <us...@cassandra.apache.org>
Date:  Wednesday, December 18, 2013 at 5:13 AM
To:  <us...@cassandra.apache.org>
Cc:  "dev@cassandra.apache.org" <de...@cassandra.apache.org>
Subject:  Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

Hi, this would indeed be much appreciated by a lot of people.

There is an existing issue on this subject:

 https://issues.apache.org/jira/browse/CASSANDRA-4914

Maybe you could help the committers there.

Hope this will be useful to you.

Please let us know when you find a way to do these operations.

Cheers.


2013/12/18 Brian O'Neill <bo...@alumni.brown.edu>
> We are seeking to replace Acunu in our technology stack / platform.  It is the
> only component in our stack that is not open source.
> 
> In preparation, over the last few weeks I've migrated Virgil to CQL.   The
> vision is that Virgil could receive a REST request to upsert/delete data
> (hierarchical JSON to support collections).  Virgil would lookup the
> dimensions/aggregations for that table, add the key to the pertinent
> dimensional tables (e.g. DISTINCT), incorporate values into aggregations (e.g.
> SUMs) and increment/decrement relevant counters (COUNT).  (using additional
> CF's)
> 
> This seems straight-forward, but appears to require a read-before-write.
> (e.g. read the current value of a SUM, incorporate the new value, then use the
> lightweight transactions of C* 2.0 to conditionally update the value.)
> 
> Before I go down this path, I was wondering if anyone is designing/working on
> the same, perhaps at a lower level?  (CQL?)
> 
> Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT,
> etc) at the CQL level?  If so, is there a preliminary design?
> 
> I can see a lower-level approach, which would leverage the commit logs (and
> mem/sstables) and perform the aggregation during read-operations (and
> flush/compaction).
> 
> thoughts?  i'm open to all ideas.
> 
> -brian
> -- 
> Brian ONeill
> Chief Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42




Re: Dimensional SUM, COUNT, & DISTINCT in C* (replacing Acunu)

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi, this would indeed be much appreciated by a lot of people.

There is an existing issue on this subject:

 https://issues.apache.org/jira/browse/CASSANDRA-4914

Maybe you could help the committers there.

Hope this will be useful to you.

Please let us know when you find a way to do these operations.

Cheers.


2013/12/18 Brian O'Neill <bo...@alumni.brown.edu>

> We are seeking to replace Acunu in our technology stack / platform.  It is
> the only component in our stack that is not open source.
>
> In preparation, over the last few weeks I’ve migrated Virgil to CQL.   The
> vision is that Virgil could receive a REST request to upsert/delete data
> (hierarchical JSON to support collections).  Virgil would lookup the
> dimensions/aggregations for that table, add the key to the pertinent
> dimensional tables (e.g. DISTINCT), incorporate values into aggregations
> (e.g. SUMs) and increment/decrement relevant counters (COUNT).  (using
> additional CF’s)
>
> This seems straight-forward, but appears to require a read-before-write.
>  (e.g. read the current value of a SUM, incorporate the new value, then use
> the lightweight transactions of C* 2.0 to conditionally update the value.)
>
> Before I go down this path, I was wondering if anyone is designing/working
> on the same, perhaps at a lower level?  (CQL?)
>
> Is there any intent to support aggregations/filters (COUNT, SUM, DISTINCT,
> etc) at the CQL level?  If so, is there a preliminary design?
>
> I can see a lower-level approach, which would leverage the commit logs
> (and mem/sstables) and perform the aggregation during read-operations (and
> flush/compaction).
>
> thoughts?  i'm open to all ideas.
>
> -brian
> --
> Brian ONeill
> Chief Architect, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42
>