You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Brian O'Neill <bo...@gmail.com> on 2012/10/02 15:29:54 UTC
Re: 1000's of CF's. virtual CFs do NOTworkŠ..map/reduce

Dean,

Great point.  I hadn't considered that either.  Per my other email, think
we would need a custom partitioner for this? (a mix of
OrderPreservingPartitioner and RandomPartitioner, OPP for the prefix)

-brian

---
Brian O'Neill
Lead Architect, Software Development
 
Health Market Science
The Science of Better Results
2700 Horizon Drive ? King of Prussia, PA ? 19406
M: 215.588.6024 ? @boneill42 <http://www.twitter.com/boneill42>  ?
healthmarketscience.com

This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or
the person responsible to deliver it to the intended recipient, please
contact the sender at the email above and delete this email and any
attachments and destroy any copies thereof. Any review, retransmission,
dissemination, copying or other use of, or taking any action in reliance
upon, this information by persons or entities other than the intended
recipient is strictly prohibited.
 






On 10/2/12 8:35 AM, "Hiller, Dean" <De...@nrel.gov> wrote:

>So basically, with moving towards the 1000's of CF all being put in one
>CF, our performance is going to tank on map/reduce, correct?  I mean, from
>what I remember we could do map/reduce on a single CF, but by stuffing
>1000's of virtual Cf's into one CF, our map/reduce will have to read in
>all 999 virtual CF's rows that we don't want just to map/reduce the ONE
>CF.
>
>Map/reduce VERY VERY SLOW when reading in 1000 times more rows :( :(.
>
>Is this correct?  This really sounds like highly undesirable behavior.
>There needs to be a way for people with 1000's of CF's to also run
>map/reduce on any one CF.  Doing Map/reduce on 1000 times the number of
>rows will be 1000 times slowerŠ.and of course, we will most likely get up
>to 20,000 tables from my most recent projectionsŠ.our last test load, we
>ended up with 8k+ CF's.  Since I kept two other keyspaces, cassandra
>started getting really REALLY slow when we got up to 15k+ CF's in the
>system though I didn't look into why.
>
>I don't mind having 1000's of virtual CF's in ONE CF, BUT I need to
>map/reduce "just" the virtual CF!!!!!  Ugh.
>
>Thanks,
>Dean
>
>On 10/1/12 3:38 PM, "Ben Hood" <0x...@gmail.com> wrote:
>
>>On Mon, Oct 1, 2012 at 9:38 PM, Brian O'Neill <bo...@alumni.brown.edu>
>>wrote:
>>> Its just a convenient way of prefixing:
>>> 
>>>http://hector-client.github.com/hector/build/html/content/virtual_keyspa
>>>c
>>>es.html
>>
>>So given that it is possible to use a CF per tenant, should we assume
>>that there at sufficient scale that there is less overhead to prefix
>>keys than there is to manage multiple CFs?
>>
>>Ben
>