Posted to user@cassandra.apache.org by Brian O'Neill <bo...@alumni.brown.edu> on 2012/10/01 17:22:01 UTC

Re: 1000's of column families

Dean,

We have the same question...

We have thousands of separate feeds of data as well (20,000+).  To
date, we've been using a CF per feed strategy, but as we scale this
thing out to accommodate all of those feeds, we're trying to figure
out if we're going to blow out the memory.

The initial documentation for heap sizing had column families in the equation:
http://www.datastax.com/docs/0.7/operations/tuning#heap-sizing
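If I recall that page correctly, the rule of thumb boiled down to roughly the following; the formula and its constants here are reconstructed from memory, so treat them as approximate rather than as the documented equation:

```python
# Rough heap estimate per the old 0.7-era rule of thumb. The formula and
# constants are reconstructed from memory of the linked page, so treat
# them as approximate:
#   heap_mb ~= memtable_throughput_mb * 3 * (number of hot CFs)
#              + 1024 (system overhead) + key cache
def estimate_heap_mb(memtable_throughput_mb, hot_cfs, key_cache_mb=0):
    return memtable_throughput_mb * 3 * hot_cfs + 1024 + key_cache_mb

# With the old 64 MB default memtable size, even 100 "hot" CFs wants ~20 GB,
# which is why thousands of CFs looked scary under that equation:
print(estimate_heap_mb(64, 100))  # 20224
```

With 20,000+ CFs that term dominates everything else, which is exactly the memory blow-up we're worried about.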

But in the more recent documentation, it looks like they removed the
column family variable with the introduction of the universal
key_cache_size.
http://www.datastax.com/docs/1.0/operations/tuning#tuning-java-heap-size

We haven't committed either way yet, but given Ed Anuff's presentation
on virtual keyspaces, we were leaning towards a single column family
approach:
http://blog.apigee.com/detail/building_a_mobile_data_platform_with_cassandra_-_apigee_under_the_hood/?
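For what it's worth, the virtual-keyspace idea amounts to namespacing row keys with a feed/tenant prefix; here's a toy sketch of that (a plain dict stands in for the physical CF, and none of the names are Hector's or Apigee's actual API):

```python
# All feeds share one physical CF; each row key is namespaced by a
# feed/tenant prefix. A dict stands in for the CF here; this illustrates
# the idea behind virtual keyspaces, not Hector's implementation.
PHYSICAL_CF = {}

def vkey(feed_id, row_key):
    return f"{feed_id}:{row_key}"

def put(feed_id, row_key, columns):
    PHYSICAL_CF[vkey(feed_id, row_key)] = columns

def get(feed_id, row_key):
    return PHYSICAL_CF.get(vkey(feed_id, row_key))

def scan_feed(feed_id):
    # All rows of one "virtual CF" share the prefix, so a prefix scan
    # recovers exactly that feed's rows.
    prefix = feed_id + ":"
    return {k: v for k, v in PHYSICAL_CF.items() if k.startswith(prefix)}

put("feed42", "2012-10-01T17:00", {"temp": 21.5})
put("feed43", "2012-10-01T17:00", {"co2": 400})
print(len(scan_feed("feed42")))  # 1
```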

Definitely let us know what you decide.

-brian

On Fri, Sep 28, 2012 at 11:48 AM, Flavio Baronti
<f....@list-group.com> wrote:
> We had some serious trouble with dynamically adding CFs, although last time
> we tried we were using version 0.7, so maybe
> that's not an issue any more.
> Our problems were two:
> - You are (were?) not supposed to add CFs concurrently. Since we had more
> servers talking to the same Cassandra cluster,
> we had to use distributed locks (Hazelcast) to avoid concurrency.
> - You must be very careful to add new CFs to different Cassandra nodes. If
> you do that fast enough, and the clocks of
> the two servers are skewed, you will severely compromise your schema
> (Cassandra will not understand in which order the
> updates must be applied).
>
> As I said, this applied to version 0.7, maybe current versions solved these
> problems.
>
> Flavio
>
>
> On 2012/09/27 16:11, Hiller, Dean wrote:
>> We have 1000's of different building devices and we stream data from these
> devices.  The format and data from each one varies so one device has temperature
> at timeX with some other variables, another device has CO2 percentage and other
> variables.  Every device is unique and streams its own data.  We dynamically
> discover devices and register them.  Basically, one CF or table per thing really
> makes sense in this environment.  While we could try to find out which devices
> "are" similar, this would really be a pain and some devices add some new
> variable into the equation.  NOT only that but researchers can register new
> datasets and upload them as well and each dataset they have they do NOT want to
> share with other researchers necessarily, so we have security groups and each CF
> belongs to security groups.  We dynamically create CF's on the fly as people
> register new datasets.
>>
>> On top of that, when the data sets get too large, we probably want to
> partition a single CF into time partitions.  We could create one CF and put all
> the data and have a partition per device, but then a time partition will contain
> "multiple" devices of data meaning we need to shrink our time partition size
> where if we have CF per device, the time partition can be larger as it is only
> for that one device.
>>
>> THEN, on top of that, we have a meta CF for these devices so some people want
> to query for streams that match criteria AND which returns a CF name and they
> query that CF name so we almost need a query with variables like select cfName
> from Meta where x = y and then select * from cfName where xxxxx. Which we can do
> today.
>>
>> Dean
>>
>> From: Marcelo Elias Del Valle <mv...@gmail.com>>
>> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
> <us...@cassandra.apache.org>>
>> Date: Thursday, September 27, 2012 8:01 AM
>> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
> <us...@cassandra.apache.org>>
>> Subject: Re: 1000's of column families
>>
>> Out of curiosity, is it really necessary to have that amount of CFs?
>> I am probably still used to relational databases, where you would use a new
> table just in case you need to store different kinds of data. As Cassandra
> stores anything in each CF, it might probably make sense to have a lot of CFs to
> store your data...
>> But why wouldn't you use a single CF with partitions in this case? Wouldn't
> it be the same thing? I am asking because I might learn a new modeling technique
> with the answer.
>>
>> []s
>>
>> 2012/9/26 Hiller, Dean <De...@nrel.gov>>
>> We are streaming data with 1 stream per 1 CF and we have 1000's of CF.  When
> using the tools they are all geared to analyzing ONE column family at a time :(.
> If I remember correctly, Cassandra supports as many CF's as you want, correct?
> Even though I am going to have tons of fun with limitations on the tools,
> correct?
>>
>> (I may end up wrapping the node tool with my own aggregate calls if needed to
> sum up multiple column families and such).
>>
>> Thanks,
>> Dean
>>
>>
>>
>> --
>> Marcelo Elias Del Valle
>> http://mvalle.com - @mvallebr
>>
>
>



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
Apache Cassandra MVP
mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42

Re: 1000's of column families

Posted by Jeremy Hanna <je...@gmail.com>.
It's always had data locality (since hadoop support was added in 0.6).

You don't need to specify a partition; you specify the input predicate with ConfigHelper or the cassandra.input.predicate property.
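Conceptually, an input slice predicate narrows which columns each mapper receives, not which rows are scanned; a toy model of that distinction (a dict stands in for the CF, and this is not the real ConfigHelper/Hadoop API):

```python
# Toy model of an input slice predicate: it narrows which COLUMNS each
# mapper receives, while every row in the CF is still fed to the job.
# Conceptual sketch only, not the real ConfigHelper/Hadoop API.
def apply_slice_predicate(rows, wanted_columns):
    # rows: {row_key: {column: value}}; keep every row, filter its columns
    for key, cols in rows.items():
        yield key, {c: v for c, v in cols.items() if c in wanted_columns}

cf = {
    "r1": {"virtual_cf": "a", "temp": 20},
    "r2": {"virtual_cf": "b", "co2": 400},
}
sliced = dict(apply_slice_predicate(cf, {"virtual_cf"}))
print(len(sliced))  # 2: both rows survive; only the predicate columns remain
```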

On Oct 2, 2012, at 2:26 PM, "Hiller, Dean" <De...@nrel.gov> wrote:

> So you're saying that you can access the primary index with a key range, but to access the secondary index, you first need to get all keys and follow up with a multiget, which would use the secondary index to speed the lookup of the matching rows?
> 
> Yes, that is how I "believe" it works.  I am by no means an expert.
> 
> I also wanted to fire off a MR to process matching rows in the "virtual" CF ideally running on the nodes where it reads data in.  In 0.7, I thought the M/R jobs did not run locally with the data like hadoop does???  Anyone know if that is still true or does it run locally to the data now?
> 
> Thanks,
> Dean
> 
> From: Ben Hood <0x...@gmail.com>>
> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
> Date: Tuesday, October 2, 2012 1:01 PM
> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
> Subject: Re: 1000's of column families
> 
> Dean,
> 
> On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:
> 
Because the data for an index is not all together (i.e. you need a multiget to fetch it). It is not contiguous.

Within a partition, the data is kept ordered by prefix, so all the data for a given prefix, from what I understand, is contiguous.
> 
> 
> 
> 
> 
> QUESTION: What I don't get in the comment: I assume you are referring to CQL, in which case we would need to specify the partition (in addition to the index), which means all that data is on one node, correct? Or did I miss something there.
> 
> Maybe my question was just silly - I wasn't referring to CQL.
> 
> As for the locality of the data, I was hoping to be able to fire off an MR job to process all matching rows in the CF - I was assuming that that this job would get executed on the same node as the data.
> 
> But I think the real confusion in my question has to do with the way the ColumnFamilyInputFormat has been implemented, since it would appear that it ingests the entire (non-OPP) CF into Hadoop, such that the predicate needs to be applied in the job rather than up front in the Cassandra query.
> 
> Cheers,
> 
> Ben
> 


Re: 1000's of column families

Posted by "Hiller, Dean" <De...@nrel.gov>.
So you're saying that you can access the primary index with a key range, but to access the secondary index, you first need to get all keys and follow up with a multiget, which would use the secondary index to speed the lookup of the matching rows?

Yes, that is how I "believe" it works.  I am by no means an expert.

I also wanted to fire off a MR to process matching rows in the "virtual" CF ideally running on the nodes where it reads data in.  In 0.7, I thought the M/R jobs did not run locally with the data like hadoop does???  Anyone know if that is still true or does it run locally to the data now?

Thanks,
Dean

From: Ben Hood <0x...@gmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Tuesday, October 2, 2012 1:01 PM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: 1000's of column families

Dean,

On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:

Because the data for an index is not all together (i.e. you need a multiget to fetch it). It is not contiguous.

Within a partition, the data is kept ordered by prefix, so all the data for a given prefix, from what I understand, is contiguous.





QUESTION: What I don't get in the comment: I assume you are referring to CQL, in which case we would need to specify the partition (in addition to the index), which means all that data is on one node, correct? Or did I miss something there.

Maybe my question was just silly - I wasn't referring to CQL.

As for the locality of the data, I was hoping to be able to fire off an MR job to process all matching rows in the CF - I was assuming that that this job would get executed on the same node as the data.

But I think the real confusion in my question has to do with the way the ColumnFamilyInputFormat has been implemented, since it would appear that it ingests the entire (non-OPP) CF into Hadoop, such that the predicate needs to be applied in the job rather than up front in the Cassandra query.

Cheers,

Ben


Re: 1000's of column families

Posted by Ben Hood <0x...@gmail.com>.
Dean, 


On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote:

> Because the data for an index is not all together (i.e. you need a multiget to fetch it). It is not contiguous.
> 
> Within a partition, the data is kept ordered by prefix, so all the data for a given prefix, from what I understand, is contiguous.
> 

So you're saying that you can access the primary index with a key range, but to access the secondary index, you first need to get all keys and follow up with a multiget, which would use the secondary index to speed the lookup of the matching rows?

 
> 
> QUESTION: What I don't get in the comment: I assume you are referring to CQL, in which case we would need to specify the partition (in addition to the index), which means all that data is on one node, correct? Or did I miss something there.
> 
> 


Maybe my question was just silly - I wasn't referring to CQL.

As for the locality of the data, I was hoping to be able to fire off an MR job to process all matching rows in the CF - I was assuming that that this job would get executed on the same node as the data.

But I think the real confusion in my question has to do with the way the ColumnFamilyInputFormat has been implemented, since it would appear that it ingests the entire (non-OPP) CF into Hadoop, such that the predicate needs to be applied in the job rather than up front in the Cassandra query.

Cheers,

Ben



Re: 1000's of column families

Posted by "Hiller, Dean" <De...@nrel.gov>.
Because the data for an index is not all together (i.e. you need a multiget to fetch it).  It is not contiguous.

Within a partition, the data is kept ordered by prefix, so all the data for a given prefix, from what I understand, is contiguous.

QUESTION: What I don't get in the comment: I assume you are referring to CQL, in which case we would need to specify the partition (in addition to the index), which means all that data is on one node, correct?  Or did I miss something there.

Thanks,
Dean

From: Ben Hood <0x...@gmail.com>>
Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Date: Tuesday, October 2, 2012 11:18 AM
To: "user@cassandra.apache.org<ma...@cassandra.apache.org>" <us...@cassandra.apache.org>>
Subject: Re: 1000's of column families

Jeremy,

On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote:

Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which virtual column family it is part of. Then when doing mapreduce jobs, you could use that field as the secondary index limiter. Secondary index mapreduce is not as efficient since you first get all of the keys and then do multigets to get the data that you need for the mapreduce job. However, it's another option for not scanning the whole column family.

Interesting. This is probably a stupid question but why shouldn't you be able to use the secondary index to go straight to the slices that belong to the attribute you are searching by? Is this something to do with the way Cassandra is exposed as an InputFormat for Hadoop or is this a general property for searching by secondary index?

Ben


Re: 1000's of column families

Posted by Ben Hood <0x...@gmail.com>.
Jeremy, 


On Tuesday, October 2, 2012 at 17:06, Jeremy Hanna wrote:

> Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which virtual column family it is part of. Then when doing mapreduce jobs, you could use that field as the secondary index limiter. Secondary index mapreduce is not as efficient since you first get all of the keys and then do multigets to get the data that you need for the mapreduce job. However, it's another option for not scanning the whole column family.
> 


Interesting. This is probably a stupid question but why shouldn't you be able to use the secondary index to go straight to the slices that belong to the attribute you are searching by? Is this something to do with the way Cassandra is exposed as an InputFormat for Hadoop or is this a general property for searching by secondary index?

Ben 


Re: 1000's of column families

Posted by Jeremy Hanna <je...@gmail.com>.
Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job.  What you might do is add a field to the column family that represents which virtual column family it is part of.  Then when doing mapreduce jobs, you could use that field as the secondary index limiter.  Secondary index mapreduce is not as efficient since you first get all of the keys and then do multigets to get the data that you need for the mapreduce job.  However, it's another option for not scanning the whole column family.
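That two-step pattern can be sketched like this (dicts stand in for the CF and the index; this is a toy model, not the real Cassandra API):

```python
# Toy model of the secondary-index mapreduce pattern: step 1 resolves
# matching row keys via the index, step 2 multigets the full rows.
# Dicts stand in for the CF and the index; not the real Cassandra API.
cf = {
    "r1": {"vcf": "sensors", "temp": 20},
    "r2": {"vcf": "research", "ph": 7.1},
    "r3": {"vcf": "sensors", "temp": 22},
}

# Secondary index: indexed value -> set of row keys.
index = {}
for key, row in cf.items():
    index.setdefault(row["vcf"], set()).add(key)

def multiget(keys):
    # Each key may live on a different node, hence the extra round trips
    # compared with a contiguous scan.
    return {k: cf[k] for k in keys}

matching = multiget(index["sensors"])
print(sorted(matching))  # ['r1', 'r3']
```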

On Oct 2, 2012, at 10:09 AM, Ben Hood <0x...@gmail.com> wrote:

> On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill <bo...@gmail.com> wrote:
>> Exactly.
> 
> So you're back to the deliberation between using multiple CFs
> (potentially with some known working upper bound*) or feeding your map
> reduce in some other way (as you decided to do with Storm). In my
> particular scenario I'd like to be able to do a combination of some
> batch processing on top of less frequently changing data (hence why I
> was looking at Hadoop) and some real time analytics.
> 
> Cheers,
> 
> Ben
> 
> (*) Not sure whether this applies to an individual keyspace or an
> entire cluster.


Re: 1000's of column families

Posted by "Hiller, Dean" <De...@nrel.gov>.
Ben, Brian, 
   By the way, PlayOrm offers a NoSqlTypedSession that is different from
the ORM half of PlayOrm, dealing in raw stuff that does indexing (so you
can do scalable SQL on data that has no ORM on top of it).  That is what we
use for our 1000's of CF's, as we don't know the format of any of those
tables ahead of time (in our world, users tell us the format and wire in
streams through an API we expose AND they tell PlayOrm which columns to
index).  That layer deals with BigInteger, BigDecimal, String and I think
byte[].

So, I am going to add virtual CF's to PlayOrm in the coming week and we
are going to feed in streams and partition the virtual CF's which sit in a
single real CF using PlayOrm partitioning, and then we can query into
each partition.  

The only issue is really what partitions exist and that is left to the
client to keep track of, but if your app knows all the partitions(and that
could be saved to some rows in the nosql store), then I will probably try
out storm after that.

Later,
Dean

On 10/2/12 9:09 AM, "Ben Hood" <0x...@gmail.com> wrote:

>On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill <bo...@gmail.com> wrote:
>> Exactly.
>
>So you're back to the deliberation between using multiple CFs
>(potentially with some known working upper bound*) or feeding your map
>reduce in some other way (as you decided to do with Storm). In my
>particular scenario I'd like to be able to do a combination of some
>batch processing on top of less frequently changing data (hence why I
>was looking at Hadoop) and some real time analytics.
>
>Cheers,
>
>Ben
>
>(*) Not sure whether this applies to an individual keyspace or an
>entire cluster.


Re: 1000's of column families

Posted by Ben Hood <0x...@gmail.com>.
On Tue, Oct 2, 2012 at 3:37 PM, Brian O'Neill <bo...@gmail.com> wrote:
> Exactly.

So you're back to the deliberation between using multiple CFs
(potentially with some known working upper bound*) or feeding your map
reduce in some other way (as you decided to do with Storm). In my
particular scenario I'd like to be able to do a combination of some
batch processing on top of less frequently changing data (hence why I
was looking at Hadoop) and some real time analytics.

Cheers,

Ben

(*) Not sure whether this applies to an individual keyspace or an
entire cluster.

Re: 1000's of column families

Posted by Brian O'Neill <bo...@gmail.com>.
Exactly.

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or
the person responsible to deliver it to the intended recipient, please
contact the sender at the email above and delete this email and any
attachments and destroy any copies thereof. Any review, retransmission,
dissemination, copying or other use of, or taking any action in reliance
upon, this information by persons or entities other than the intended
recipient is strictly prohibited.
 






On 10/2/12 9:55 AM, "Ben Hood" <0x...@gmail.com> wrote:

>Brian,
>
>On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill <bo...@gmail.com> wrote:
>>
>> Without putting too much thought into it...
>>
>> Given the underlying architecture, I think you could/would have to write
>> your own partitioner, which would partition based on the prefix/virtual
>> keyspace.
>
>I might be barking up the wrong tree here, but looking at source of
>ColumnFamilyInputFormat, it seems that you can specify a KeyRange for
>the input, but only when you use an order preserving partitioner. So I
>presume that if you are using the RandomPartitioner, you are
>effectively doing a full CF scan (i.e. including all tenants in your
>system).
>
>Ben



Re: 1000's of column families

Posted by Ben Hood <0x...@gmail.com>.
Brian,

On Tue, Oct 2, 2012 at 2:20 PM, Brian O'Neill <bo...@gmail.com> wrote:
>
> Without putting too much thought into it...
>
> Given the underlying architecture, I think you could/would have to write
> your own partitioner, which would partition based on the prefix/virtual
> keyspace.

I might be barking up the wrong tree here, but looking at source of
ColumnFamilyInputFormat, it seems that you can specify a KeyRange for
the input, but only when you use an order preserving partitioner. So I
presume that if you are using the RandomPartitioner, you are
effectively doing a full CF scan (i.e. including all tenants in your
system).
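The partitioner point can be illustrated with a toy model (simplified stand-ins, not Cassandra's exact token computation):

```python
# Toy model of why a KeyRange scan needs an order-preserving partitioner:
# RandomPartitioner places rows by a hash of the key, so lexically adjacent
# keys land on scattered tokens and a key range is no longer contiguous.
import hashlib

def order_preserving_token(key):
    return key  # token order == key order

def random_partitioner_token(key):
    # RandomPartitioner historically used an MD5-based token; this is a
    # simplified stand-in, not Cassandra's exact token computation.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = ["a", "b", "c"]
opp = [order_preserving_token(k) for k in keys]
rnd = [random_partitioner_token(k) for k in keys]
print(opp == sorted(opp))  # True: key ranges map to contiguous token ranges
print(rnd == sorted(rnd))  # False for these keys: hashing scrambles the order
```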

Ben

Re: 1000's of column families

Posted by Brian O'Neill <bo...@gmail.com>.
Agreed. 

Do we know yet what the overhead is for each column family?  What is the
limit?
If you have a SINGLE keyspace w/ 20000+ CF's, what happens?  Anyone know?

-brian


---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 10/2/12 9:28 AM, "Hiller, Dean" <De...@nrel.gov> wrote:

>Thanks for the idea (but please keep thinking on it)...
>
>That's 100% what we don't want, since partitioned data resides on the same node.
>I want to map/reduce the column families and leverage the parallel disks
>
>:( :(
>
>I am sure others would want to do the same… We almost need a feature of
>virtual Column Families and column family should really not be column
>family but should be called ReplicationGroup or something where
>replication is configured for all CF's in that group.
>
>ANYONE have any other ideas???
>
>Dean
>
>On 10/2/12 7:20 AM, "Brian O'Neill" <bo...@gmail.com> wrote:
>
>>
>>Without putting too much thought into it...
>>
>>Given the underlying architecture, I think you could/would have to write
>>your own partitioner, which would partition based on the prefix/virtual
>>keyspace.  
>>
>>-brian
>>
>>---
>>Brian O'Neill
>>Lead Architect, Software Development
>> 
>>Health Market Science
>>The Science of Better Results
>>2700 Horizon Drive • King of Prussia, PA • 19406
>>M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>>healthmarketscience.com
>>
>> 
>>
>>
>>
>>
>>
>>
>>On 10/2/12 9:00 AM, "Ben Hood" <0x...@gmail.com> wrote:
>>
>>>Dean,
>>>
>>>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean <De...@nrel.gov>
>>>wrote:
>>>> Ben,
>>>>   to address your question, read my last post but to summarize, yes,
>>>>there
>>>> is less overhead in memory to prefix keys than manage multiple Cfs
>>>>EXCEPT
>>>> when doing map/reduce.  Doing map/reduce, you will now have HUGE
>>>>overhead
>>>> in reading a whole slew of rows you don't care about as you can't
>>>> map/reduce a single virtual CF but must map/reduce the whole CF
>>>>wasting
>>>> TONS of resources.
>>>
>>>That's a good point that I hadn't considered beforehand, especially as
>>>I'd like to run MR jobs against these CFs.
>>>
>>>Is this limitation inherent in the way that Cassandra is modelled as
>>>input for Hadoop or could you write a custom slice query to only feed
>>>in one particular prefix into Hadoop?
>>>
>>>Cheers,
>>>
>>>Ben
>>
>>
>



Re: 1000's of column families

Posted by "Hiller, Dean" <De...@nrel.gov>.
Thanks for the idea (but please keep thinking on it)...

That's 100% what we don't want, since partitioned data resides on the same node.
I want to map/reduce the column families and leverage the parallel disks

:( :(

I am sure others would want to do the same… We almost need a feature of
virtual Column Families and column family should really not be column
family but should be called ReplicationGroup or something where
replication is configured for all CF's in that group.

ANYONE have any other ideas???

Dean

On 10/2/12 7:20 AM, "Brian O'Neill" <bo...@gmail.com> wrote:

>
>Without putting too much thought into it...
>
>Given the underlying architecture, I think you could/would have to write
>your own partitioner, which would partition based on the prefix/virtual
>keyspace.  
>
>-brian
>
>---
>Brian O'Neill
>Lead Architect, Software Development
> 
>Health Market Science
>The Science of Better Results
>2700 Horizon Drive • King of Prussia, PA • 19406
>M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
>healthmarketscience.com
>
> 
>
>
>
>
>
>
>On 10/2/12 9:00 AM, "Ben Hood" <0x...@gmail.com> wrote:
>
>>Dean,
>>
>>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean <De...@nrel.gov>
>>wrote:
>>> Ben,
>>>   to address your question, read my last post but to summarize, yes,
>>>there
>>> is less overhead in memory to prefix keys than manage multiple Cfs
>>>EXCEPT
>>> when doing map/reduce.  Doing map/reduce, you will now have HUGE
>>>overhead
>>> in reading a whole slew of rows you don't care about as you can't
>>> map/reduce a single virtual CF but must map/reduce the whole CF wasting
>>> TONS of resources.
>>
>>That's a good point that I hadn't considered beforehand, especially as
>>I'd like to run MR jobs against these CFs.
>>
>>Is this limitation inherent in the way that Cassandra is modelled as
>>input for Hadoop or could you write a custom slice query to only feed
>>in one particular prefix into Hadoop?
>>
>>Cheers,
>>
>>Ben
>
>


Re: 1000's of column families

Posted by Brian O'Neill <bo...@gmail.com>.
Without putting too much thought into it...

Given the underlying architecture, I think you could/would have to write
your own partitioner, which would partition based on the prefix/virtual
keyspace.  

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 10/2/12 9:00 AM, "Ben Hood" <0x...@gmail.com> wrote:

>Dean,
>
>On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean <De...@nrel.gov> wrote:
>> Ben,
>>   to address your question, read my last post but to summarize, yes,
>>there
>> is less overhead in memory to prefix keys than manage multiple Cfs
>>EXCEPT
>> when doing map/reduce.  Doing map/reduce, you will now have HUGE
>>overhead
>> in reading a whole slew of rows you don't care about as you can't
>> map/reduce a single virtual CF but must map/reduce the whole CF wasting
>> TONS of resources.
>
>That's a good point that I hadn't considered beforehand, especially as
>I'd like to run MR jobs against these CFs.
>
>Is this limitation inherent in the way that Cassandra is modelled as
>input for Hadoop or could you write a custom slice query to only feed
>in one particular prefix into Hadoop?
>
>Cheers,
>
>Ben



Re: 1000's of column families

Posted by Ben Hood <0x...@gmail.com>.
Dean,

On Tue, Oct 2, 2012 at 1:37 PM, Hiller, Dean <De...@nrel.gov> wrote:
> Ben,
>   to address your question, read my last post but to summarize, yes, there
> is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT
> when doing map/reduce.  Doing map/reduce, you will now have HUGE overhead
> in reading a whole slew of rows you don't care about as you can't
> map/reduce a single virtual CF but must map/reduce the whole CF wasting
> TONS of resources.

That's a good point that I hadn't considered beforehand, especially as
I'd like to run MR jobs against these CFs.

Is this limitation inherent in the way that Cassandra is modelled as
input for Hadoop or could you write a custom slice query to only feed
in one particular prefix into Hadoop?

Cheers,

Ben

Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

Posted by Brian O'Neill <bo...@gmail.com>.
Dean,

We moved away from Hadoop and M/R, and instead we are using Storm as our
compute grid.  We queue keys in Kafka, then Storm distributes the work to
the grid.  Its working well so far, but we haven't taken it to prod yet.
Data is read from Cassandra using a Cassandra-bolt.

If you end up using Storm, let me know.  We have an unreleased version of
the bolt that you probably want to use.  (we're waiting on Nathan/Storm to
fix some classpath loading issues)
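That queue-plus-workers shape can be sketched generically like this (a plain queue and thread pool stand in for Kafka and Storm; the names here are illustrative, not Storm's or the bolt's actual API):

```python
# Generic sketch of the Kafka/Storm pattern: keys are queued, a pool of
# workers pulls them and reads the corresponding rows. Queue + threads
# stand in for Kafka and Storm here; not their actual APIs.
import queue
import threading

# Fake "Cassandra" rows keyed by row key; a real bolt would read these
# from the cluster.
cassandra_rows = {f"key{i}": {"value": i} for i in range(100)}

work = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        key = work.get()
        if key is None:            # poison pill: shut the worker down
            break
        row = cassandra_rows[key]  # the "Cassandra bolt" read, simulated
        with lock:
            results.append(row["value"])

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for key in cassandra_rows:         # "Kafka": enqueue the keys to process
    work.put(key)
for _ in threads:
    work.put(None)
for t in threads:
    t.join()

print(sum(results))  # 0 + 1 + ... + 99 = 4950
```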

RE: a custom virtual keyspace Partitioner, point well taken

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive • King of Prussia, PA • 19406
M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42>  •
healthmarketscience.com

 






On 10/2/12 9:33 AM, "Hiller, Dean" <De...@nrel.gov> wrote:

>Well, I think I know the direction we may follow so we can
>1. Have Virtual CF's
>2. Be able to map/reduce ONE Virtual CF
>
>Well, not map/reduce exactly but really really close.  We use PlayOrm with
>its partitioning so I am now thinking what we will do is have a compute
>grid  where we can have each node doing a findAll query into the
>partitions it is responsible for.  In this way, I think we can 1000's of
>virtual CF's inside ONE CF and then PlayOrm does it's query and retrieves
>the rows for that partition of one virtual CF.
>
>Anyone know of a compute grid we can dish out work to?  That would be my
>only missing piece (well, that and the PlayOrm virtual CF feature but I
>can add that within a week probably though I am on vacation this Thursday
>to monday).
>
>Later,
>Dean
>
>
>On 10/2/12 6:35 AM, "Hiller, Dean" <De...@nrel.gov> wrote:
>
>>So basically, with moving towards the 1000's of CF all being put in one
>>CF, our performance is going to tank on map/reduce, correct?  I mean,
>>from
>>what I remember we could do map/reduce on a single CF, but by stuffing
>>1000's of virtual CF's into one CF, our map/reduce will have to read in
>>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>>CF.
>>
>>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>>
>>Is this correct?  This really sounds like highly undesirable behavior.
>>There needs to be a way for people with 1000's of CF's to also run
>>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>>rows will be 1000 times slower… and of course, we will most likely get up
>>to 20,000 tables from my most recent projections… our last test load, we
>>ended up with 8k+ CF's.  Since I kept two other keyspaces, Cassandra
>>started getting really REALLY slow when we got up to 15k+ CF's in the
>>system though I didn't look into why.
>>
>>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>>map/reduce "just" the virtual CF!!!!!  Ugh.
>>
>>Thanks,
>>Dean
>>
>>On 10/1/12 3:38 PM, "Ben Hood" <0x...@gmail.com> wrote:
>>
>>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <bo...@alumni.brown.edu>
>>>wrote:
>>>> Its just a convenient way of prefixing:
>>>> 
>>>>http://hector-client.github.com/hector/build/html/content/virtual_keysp
>>>>a
>>>>c
>>>>es.html
>>>
>>>So given that it is possible to use a CF per tenant, should we assume
>>>that, at sufficient scale, there is less overhead to prefix
>>>keys than there is to manage multiple CFs?
>>>
>>>Ben
>>
>



Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

Posted by "Hiller, Dean" <De...@nrel.gov>.
Well, I think I know the direction we may follow so we can
1. Have Virtual CF's
2. Be able to map/reduce ONE Virtual CF

Well, not map/reduce exactly but really really close.  We use PlayOrm with
its partitioning, so I am now thinking what we will do is have a compute
grid where we can have each node doing a findAll query into the
partitions it is responsible for.  In this way, I think we can have 1000's of
virtual CF's inside ONE CF, and then PlayOrm does its query and retrieves
the rows for that partition of one virtual CF.

Anyone know of a compute grid we can dish out work to?  That would be my
only missing piece (well, that and the PlayOrm virtual CF feature, but I
can probably add that within a week, though I am on vacation this Thursday
to Monday).

Later,
Dean


On 10/2/12 6:35 AM, "Hiller, Dean" <De...@nrel.gov> wrote:

>So basically, with moving towards the 1000's of CF all being put in one
>CF, our performance is going to tank on map/reduce, correct?  I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000's of virtual CF's into one CF, our map/reduce will have to read in
>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>CF.
>
>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>
>Is this correct?  This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000's of CF's to also run
>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>rows will be 1000 times slower… and of course, we will most likely get up
>to 20,000 tables from my most recent projections… our last test load, we
>ended up with 8k+ CF's.  Since I kept two other keyspaces, Cassandra
>started getting really REALLY slow when we got up to 15k+ CF's in the
>system though I didn't look into why.
>
>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!!!!!  Ugh.
>
>Thanks,
>Dean
>
>On 10/1/12 3:38 PM, "Ben Hood" <0x...@gmail.com> wrote:
>
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <bo...@alumni.brown.edu>
>>wrote:
>>> Its just a convenient way of prefixing:
>>> 
>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspa
>>>c
>>>es.html
>>
>>So given that it is possible to use a CF per tenant, should we assume
>>that, at sufficient scale, there is less overhead to prefix
>>keys than there is to manage multiple CFs?
>>
>>Ben
>


Re: 1000's of CF's. virtual CFs do NOT work… map/reduce

Posted by Brian O'Neill <bo...@gmail.com>.
Dean,

Great point.  I hadn't considered that either.  Per my other email, do you
think we would need a custom partitioner for this? (a mix of
OrderPreservingPartitioner and RandomPartitioner, with OPP for the prefix)
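
A rough sketch of that hybrid idea, order-preserving on the virtual-CF prefix and hashed on the rest of the key (illustrative only, in Python; the real thing would implement Cassandra's partitioner interface in Java):

```python
import hashlib

def hybrid_token(key: bytes, delim: bytes = b"\x00") -> tuple:
    # Token = (virtual-CF prefix, MD5 of the remainder). Sorting by this
    # tuple keeps each virtual CF's rows contiguous on the ring (so one
    # virtual CF can be scanned alone, OPP-style) while hashing the rest
    # of the key to avoid the hot spots fully ordered keys would create.
    prefix, _, rest = key.partition(delim)
    return (prefix, hashlib.md5(rest).hexdigest())

# Rows of one virtual CF sort together no matter what their row keys are:
tokens = sorted(hybrid_token(k) for k in
                [b"feed_b\x00r1", b"feed_a\x00r9", b"feed_a\x00r1"])
assert tokens[0][0] == tokens[1][0] == b"feed_a"
```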

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive ? King of Prussia, PA ? 19406
M: 215.588.6024 ? @boneill42 <http://www.twitter.com/boneill42>  ?
healthmarketscience.com







On 10/2/12 8:35 AM, "Hiller, Dean" <De...@nrel.gov> wrote:

>So basically, with moving towards the 1000's of CF all being put in one
>CF, our performance is going to tank on map/reduce, correct?  I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000's of virtual CF's into one CF, our map/reduce will have to read in
>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>CF.
>
>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>
>Is this correct?  This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000's of CF's to also run
>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>rows will be 1000 times slower… and of course, we will most likely get up
>to 20,000 tables from my most recent projections… our last test load, we
>ended up with 8k+ CF's.  Since I kept two other keyspaces, Cassandra
>started getting really REALLY slow when we got up to 15k+ CF's in the
>system though I didn't look into why.
>
>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!!!!!  Ugh.
>
>Thanks,
>Dean
>
>On 10/1/12 3:38 PM, "Ben Hood" <0x...@gmail.com> wrote:
>
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <bo...@alumni.brown.edu>
>>wrote:
>>> Its just a convenient way of prefixing:
>>> 
>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspa
>>>c
>>>es.html
>>
>>So given that it is possible to use a CF per tenant, should we assume
>>that, at sufficient scale, there is less overhead to prefix
>>keys than there is to manage multiple CFs?
>>
>>Ben
>



1000's of CF's. virtual CFs do NOT work… map/reduce

Posted by "Hiller, Dean" <De...@nrel.gov>.
So basically, with moving towards the 1000's of CF all being put in one
CF, our performance is going to tank on map/reduce, correct?  I mean, from
what I remember we could do map/reduce on a single CF, but by stuffing
1000's of virtual CF's into one CF, our map/reduce will have to read in
all 999 virtual CF's rows that we don't want just to map/reduce the ONE CF.

Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.

Is this correct?  This really sounds like highly undesirable behavior.
There needs to be a way for people with 1000's of CF's to also run
map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
rows will be 1000 times slower… and of course, we will most likely get up
to 20,000 tables from my most recent projections… our last test load, we
ended up with 8k+ CF's.  Since I kept two other keyspaces, Cassandra
started getting really REALLY slow when we got up to 15k+ CF's in the
system though I didn't look into why.

I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
map/reduce "just" the virtual CF!!!!!  Ugh.

Thanks,
Dean

On 10/1/12 3:38 PM, "Ben Hood" <0x...@gmail.com> wrote:

>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <bo...@alumni.brown.edu>
>wrote:
>> Its just a convenient way of prefixing:
>> 
>>http://hector-client.github.com/hector/build/html/content/virtual_keyspac
>>es.html
>
>So given that it is possible to use a CF per tenant, should we assume
>that, at sufficient scale, there is less overhead to prefix
>keys than there is to manage multiple CFs?
>
>Ben


Re: 1000's of column families

Posted by "Hiller, Dean" <De...@nrel.gov>.
Ben, 
  to address your question, read my last post but to summarize, yes, there
is less overhead in memory to prefix keys than to manage multiple CFs EXCEPT
when doing map/reduce.  Doing map/reduce, you will now have HUGE overhead
in reading a whole slew of rows you don't care about as you can't
map/reduce a single virtual CF but must map/reduce the whole CF wasting
TONS of resources.
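
The trade-off described above is easy to see in a sketch (hypothetical helper names, Python for brevity; Hector's virtual keyspaces do the Java equivalent):

```python
DELIM = b"\x00"  # separator assumed never to appear in a virtual CF name

def physical_key(virtual_cf: str, row_key: bytes) -> bytes:
    # One physical CF holds every virtual CF; the row key carries the prefix.
    return virtual_cf.encode() + DELIM + row_key

def split_key(key: bytes):
    # Recover (virtual_cf, row_key) from a stored composite key.
    name, _, row = key.partition(DELIM)
    return name.decode(), row

# Point reads/writes for one virtual CF stay cheap...
k = physical_key("sensor_feed_42", b"2012-10-01")
assert split_key(k) == ("sensor_feed_42", b"2012-10-01")
# ...but a map/reduce scan of the physical CF must still read and filter out
# every other virtual CF's rows, which is exactly the overhead in question.
```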

Thanks,
Dean

On 10/1/12 3:38 PM, "Ben Hood" <0x...@gmail.com> wrote:

>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <bo...@alumni.brown.edu>
>wrote:
>> Its just a convenient way of prefixing:
>> 
>>http://hector-client.github.com/hector/build/html/content/virtual_keyspac
>>es.html
>
>So given that it is possible to use a CF per tenant, should we assume
>that, at sufficient scale, there is less overhead to prefix
>keys than there is to manage multiple CFs?
>
>Ben


Re: 1000's of column families

Posted by Ben Hood <0x...@gmail.com>.
On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <bo...@alumni.brown.edu> wrote:
> Its just a convenient way of prefixing:
> http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html

So given that it is possible to use a CF per tenant, should we assume
that, at sufficient scale, there is less overhead to prefix
keys than there is to manage multiple CFs?

Ben

Re: 1000's of column families

Posted by Brian O'Neill <bo...@alumni.brown.edu>.
It's just a convenient way of prefixing:
http://hector-client.github.com/hector/build/html/content/virtual_keyspaces.html

-brian

On Mon, Oct 1, 2012 at 4:22 PM, Ben Hood <0x...@gmail.com> wrote:
> Brian,
>
> On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill <bo...@alumni.brown.edu> wrote:
>> We haven't committed either way yet, but given Ed Anuff's presentation
>> on virtual keyspaces, we were leaning towards a single column family
>> approach:
>> http://blog.apigee.com/detail/building_a_mobile_data_platform_with_cassandra_-_apigee_under_the_hood/?
>
> Is this doing something special or is it just a convenient way of
> prefixing keys to make the storage space multi-tenanted?
>
> Cheers,
>
> Ben



-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)

mobile:215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42

Re: 1000's of column families

Posted by Ben Hood <0x...@gmail.com>.
Brian,

On Mon, Oct 1, 2012 at 4:22 PM, Brian O'Neill <bo...@alumni.brown.edu> wrote:
> We haven't committed either way yet, but given Ed Anuff's presentation
> on virtual keyspaces, we were leaning towards a single column family
> approach:
> http://blog.apigee.com/detail/building_a_mobile_data_platform_with_cassandra_-_apigee_under_the_hood/?

Is this doing something special or is it just a convenient way of
prefixing keys to make the storage space multi-tenanted?

Cheers,

Ben

Re: 1000's of column families

Posted by "Hiller, Dean" <De...@nrel.gov>.
Well, I am now thinking of adding a virtual CF capability to PlayOrm (which we
currently use) to allow grouping entities into one column family.  Right
now the CF creation comes from a single entity, so this may change for
those entities that declare they are in a single CF group… This should not
be a very hard change if we decide to do it.

This makes us rely even more on PlayOrm's command line tool (instead of
cassandra-cli), as I can't stand reading hex all the time, nor do I like
switching my "assume validator" from utf8 to decimal to integer just so I
can read stuff.

Later,
Dean

On 10/1/12 9:22 AM, "Brian O'Neill" <bo...@alumni.brown.edu> wrote:

>Dean,
>
>We have the same question...
>
>We have thousands of separate feeds of data as well (20,000+).  To
>date, we've been using a CF per feed strategy, but as we scale this
>thing out to accommodate all of those feeds, we're trying to figure
>out if we're going to blow out the memory.
>
>The initial documentation for heap sizing had column families in the
>equation:
>http://www.datastax.com/docs/0.7/operations/tuning#heap-sizing
>
>But in the more recent documentation, it looks like they removed the
>column family variable with the introduction of the universal
>key_cache_size.
>http://www.datastax.com/docs/1.0/operations/tuning#tuning-java-heap-size
>
>We haven't committed either way yet, but given Ed Anuff's presentation
>on virtual keyspaces, we were leaning towards a single column family
>approach:
>http://blog.apigee.com/detail/building_a_mobile_data_platform_with_cassand
>ra_-_apigee_under_the_hood/?
>
>Definitely let us know what you decide.
>
>-brian
>
>On Fri, Sep 28, 2012 at 11:48 AM, Flavio Baronti
><f....@list-group.com> wrote:
>> We had some serious trouble with dynamically adding CFs, although last
>>time
>> we tried we were using version 0.7, so maybe
>> that's not an issue any more.
>> Our problems were two:
>> - You are (were?) not supposed to add CFs concurrently. Since we had
>>more
>> servers talking to the same Cassandra cluster,
>> we had to use distributed locks (Hazelcast) to avoid concurrency.
>> - You must be very careful to add new CFs to different Cassandra nodes.
>>If
>> you do that fast enough, and the clocks of
>> the two servers are skewed, you will severely compromise your schema
>> (Cassandra will not understand in which order the
>> updates must be applied).
>>
>> As I said, this applied to version 0.7, maybe current versions solved
>>these
>> problems.
>>
>> Flavio
>>
>>
>> Il 2012/09/27 16:11 PM, Hiller, Dean ha scritto:
>>> We have 1000's of different building devices and we stream data from
>>>these
>> devices.  The format and data from each one varies so one device has
>>temperature
>> at timeX with some other variables, another device has CO2 percentage
>>and other
>> variables.  Every device is unique and streams its own data.  We
>>dynamically
>> discover devices and register them.  Basically, one CF or table per
>>thing really
>> makes sense in this environment.  While we could try to find out which
>>devices
>> "are" similar, this would really be a pain and some devices add some new
>> variable into the equation.  NOT only that but researchers can register
>>new
>> datasets and upload them as well and each dataset they have they do NOT
>>want to
>> share with other researches necessarily so we have security groups and
>>each CF
>> belongs to security groups.  We dynamically create CF's on the fly as
>>people
>> register new datasets.
>>>
>>> On top of that, when the data sets get too large, we probably want to
>> partition a single CF into time partitions.  We could create one CF and
>>put all
>> the data and have a partition per device, but then a time partition
>>will contain
>> "multiple" devices of data meaning we need to shrink our time partition
>>size
>> where if we have CF per device, the time partition can be larger as it
>>is only
>> for that one device.
>>>
>>> THEN, on top of that, we have a meta CF for these devices so some
>>>people want
>> to query for streams that match criteria AND which returns a CF name
>>and they
>> query that CF name so we almost need a query with variables like select
>>cfName
>> from Meta where x = y and then select * from cfName where xxxxx. Which
>>we can do
>> today.
>>>
>>> Dean
>>>
>>> From: Marcelo Elias Del Valle
>>><mv...@gmail.com>>
>>> Reply-To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
>> <us...@cassandra.apache.org>>
>>> Date: Thursday, September 27, 2012 8:01 AM
>>> To: "user@cassandra.apache.org<ma...@cassandra.apache.org>"
>> <us...@cassandra.apache.org>>
>>> Subject: Re: 1000's of column families
>>>
>>> Out of curiosity, is it really necessary to have that amount of CFs?
>>> I am probably still used to relational databases, where you would use
>>>a new
>> table just in case you need to store different kinds of data. As
>>Cassandra
>> stores anything in each CF, it might probably make sense to have a lot
>>of CFs to
>> store your data...
>>> But why wouldn't you use a single CF with partitions in this case?
>>>Wouldn't
>> it be the same thing? I am asking because I might learn a new modeling
>>technique
>> with the answer.
>>>
>>> []s
>>>
>>> 2012/9/26 Hiller, Dean
>>><De...@nrel.gov>>
>>> We are streaming data with 1 stream per 1 CF and we have 1000's of CF.
>>> When
>> using the tools they are all geared to analyzing ONE column family at a
>>time :(.
>> If I remember correctly, Cassandra supports as many CF's as you want,
>>correct?
>> Even though I am going to have tons of fun with limitations on the
>>tools,
>> correct?
>>>
>>> (I may end up wrapping the node tool with my own aggregate calls if
>>>needed to
>> sum up multiple column families and such).
>>>
>>> Thanks,
>>> Dean
>>>
>>>
>>>
>>> --
>>> Marcelo Elias Del Valle
>>> http://mvalle.com - @mvallebr
>>>
>>
>>
>
>
>
>-- 
>Brian ONeill
>Lead Architect, Health Market Science (http://healthmarketscience.com)
>Apache Cassandra MVP
>mobile:215.588.6024
>blog: http://brianoneill.blogspot.com/
>twitter: @boneill42