Posted to user@cassandra.apache.org by Oleg Dulin <ol...@gmail.com> on 2012/11/05 17:57:19 UTC

Replication factor and performance questions

I have 4 nodes at my disposal.

I can configure them like this:

1) RF=1, each node has 25% of the data. On random reads, how big is the
performance penalty if a node needs to look for data on another node?

2) RF=2, each node has 50% of the data. Same question?



-- 
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/

Re: Replication factor and performance questions

Posted by "B. Todd Burruss" <bt...@gmail.com>.

@oleg, to answer your last question: a Cassandra node never asks another
node for data that node doesn't hold. It uses the key and the partitioner
to work out which nodes own the data before it contacts any of them.
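A minimal Python sketch of that lookup, under stated assumptions: an MD5-based token space of 0..2**127 (similar in spirit to RandomPartitioner, but not its exact code), four hypothetical nodes with evenly spaced tokens, and SimpleStrategy-style placement where the extra replicas are the next nodes clockwise on the ring. The function and node names are illustrative only. The point is that any node (or a token-aware client) can compute the owners locally, so nobody has to go asking around for a key.

```python
import bisect
import hashlib

# Assumed token space, similar in spirit to RandomPartitioner (0 .. 2**127).
RING_SIZE = 2**127


def token_for(key: bytes) -> int:
    """Hypothetical MD5-based token; the real partitioner differs in detail."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(key: bytes, node_tokens: dict, rf: int) -> list:
    """Return the rf nodes that own `key`, computed locally with no network calls."""
    # Order the nodes by token to form the ring.
    ring = sorted(node_tokens.items(), key=lambda item: item[1])
    tokens = [token for _, token in ring]
    # Primary owner: first node whose token is >= the key's token, wrapping around.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    # SimpleStrategy-style placement: further replicas are the next nodes clockwise.
    return [ring[(start + i) % len(ring)][0] for i in range(rf)]


# Four hypothetical nodes with evenly spaced tokens.
NODES = {f"node{i + 1}": i * RING_SIZE // 4 for i in range(4)}

print(replicas_for(b"user:42", NODES, rf=1))
print(replicas_for(b"user:42", NODES, rf=2))
```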

On Mon, Nov 5, 2012 at 9:45 AM, Andrey Ilinykh <ai...@gmail.com> wrote:
> You will have one extra hop. Not a big deal, actually. And many client
> libraries (Astyanax, for example) are token-aware, so they are smart
> enough to call the right node.

Re: Replication factor and performance questions

Posted by Andrey Ilinykh <ai...@gmail.com>.

You will have one extra hop. Not a big deal, actually. And many client
libraries (Astyanax, for example) are token-aware, so they are smart
enough to call the right node.
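The "one extra hop" can be put into numbers with a simplified model. This is an assumption, not Cassandra's exact read path: reads at consistency level ONE, a coordinator that already holds a replica answers locally, and a non-token-aware client picks its coordinator uniformly at random. The function name is hypothetical.

```python
from fractions import Fraction


def extra_hop_fraction(n_nodes: int, rf: int, token_aware: bool) -> Fraction:
    """Share of single-key reads that need one coordinator -> replica hop.

    Simplified model (an assumption, not Cassandra's exact read path):
    consistency level ONE, a coordinator holding a replica answers locally,
    and a non-token-aware client picks its coordinator uniformly at random.
    """
    if not 1 <= rf <= n_nodes:
        raise ValueError("rf must be between 1 and n_nodes")
    if token_aware:
        # The client computes the replicas itself and sends the read to one of them.
        return Fraction(0)
    # A random coordinator is not a replica with probability (N - RF) / N.
    return Fraction(n_nodes - rf, n_nodes)


for rf in (1, 2):
    for aware in (False, True):
        print(f"RF={rf} token_aware={aware}: {extra_hop_fraction(4, rf, aware)}")
# RF=1 token_aware=False: 3/4
# RF=1 token_aware=True: 0
# RF=2 token_aware=False: 1/2
# RF=2 token_aware=True: 0
```

Under this model, with four nodes a non-token-aware client pays the extra hop on 3 of every 4 reads at RF=1 and on half of them at RF=2, while a token-aware client avoids it in both cases.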

On Mon, Nov 5, 2012 at 9:12 AM, Oleg Dulin <ol...@gmail.com> wrote:
> It should all be under 400GB on each node.
>
> My question is: is there additional overhead from replicas making requests
> to one another for keys they don't have? How much overhead is that?

Re: Replication factor and performance questions

Posted by Oleg Dulin <ol...@gmail.com>.

It should all be under 400GB on each node.

My question is: is there additional overhead from replicas making
requests to one another for keys they don't have? How much overhead
is that?

On 2012-11-05 17:00:37 +0000, Michael Kjellman said:

> The rule of thumb is to keep each node under 400GB. Compactions, repairs,
> move operations, etc. become a nightmare otherwise. How much data do you
> expect to have on each node? It also depends on caches, bloom filters, etc.

Re: Replication factor and performance questions

Posted by Bryan <br...@appssavvy.com>.

Our compactions and repairs have already become nightmares, and we have not approached the data volumes you describe here (we are at ~200 GB). Do you have any pointers or case studies for optimizing this?


On Nov 5, 2012, at 12:00 PM, Michael Kjellman wrote:

> The rule of thumb is to keep each node under 400GB. Compactions, repairs,
> move operations, etc. become a nightmare otherwise. How much data do you
> expect to have on each node? It also depends on caches, bloom filters, etc.

Re: Replication factor and performance questions

Posted by Michael Kjellman <mk...@barracuda.com>.

The rule of thumb is to keep each node under 400GB. Compactions, repairs,
move operations, etc. become a nightmare otherwise. How much data do you
expect to have on each node? It also depends on caches, bloom filters, etc.
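A back-of-the-envelope check against the 400GB rule of thumb, assuming data is spread evenly across the ring and counting only the data itself (not the caches and bloom filters mentioned above). The function names and the 600GB dataset size are hypothetical.

```python
import math

# Per-node guideline quoted in this thread (a rule of thumb, not a hard limit).
RULE_OF_THUMB_GB = 400


def per_node_gb(total_gb: float, rf: int, n_nodes: int) -> float:
    """Data each node stores if every row is kept rf times on a balanced ring."""
    return total_gb * rf / n_nodes


def min_nodes(total_gb: float, rf: int, limit_gb: float = RULE_OF_THUMB_GB) -> int:
    """Smallest cluster that keeps every node at or under limit_gb (never fewer than rf)."""
    return max(rf, math.ceil(total_gb * rf / limit_gb))


# Hypothetical 600GB dataset on the four nodes from the question.
print(per_node_gb(600, rf=1, n_nodes=4))  # 150.0
print(per_node_gb(600, rf=2, n_nodes=4))  # 300.0
print(min_nodes(600, rf=2))               # 3
```

This makes the trade-off in the original question explicit: going from RF=1 to RF=2 on the same four nodes doubles the data each node carries, so the 400GB guideline is reached with half as much unique data.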

On 11/5/12 8:57 AM, "Oleg Dulin" <ol...@gmail.com> wrote:

>I have 4 nodes at my disposal.
>
>I can configure them like this:
>
>1) RF=1, each node has 25% of the data. On random reads, how big is the
>performance penalty if a node needs to look for data on another node?
>
>2) RF=2, each node has 50% of the data. Same question?

'Like' us on Facebook for exclusive content and other resources on all Barracuda Networks solutions.
Visit http://barracudanetworks.com/facebook