Posted to user@cassandra.apache.org by Jonathan Colby <jo...@gmail.com> on 2011/03/24 14:02:58 UTC

Quorum, Hector, and datacenter preference

Hi -

Our cluster is spread between 2 datacenters. We have a straightforward IP assignment so that OldNetworkTopologyStrategy (RackInferringSnitch) works well. We have Cassandra clients written with Hector in each of those data centers. The Hector clients all have a list of all Cassandra nodes across both data centers. RF=3.

Is there an order as to which data center gets the first write?    In other words, would (or can) the Hector client do its first write to the cassandra nodes in its own data center?

It would be ideal if Hector chose the "local" Cassandra nodes. That way, if one data center is unreachable, the quorum of replicas in Cassandra is still reached (because it was written to the working data center first).

Otherwise, if the Cassandra writes are effectively random from the Hector client's point of view, a data center outage would result in a QUORUM read failure for any data that has 2 of its 3 replicas in the lost data center.

Is anyone doing this?  Is there a flaw in my logic?
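For concreteness, a minimal sketch of the kind of client setup I'm describing (Hector 0.7-style API; the cluster name, host names, keyspace and column family names below are made up). Hector's default consistency policy is QUORUM, so with RF=3 every read and write needs 2 of the 3 replicas to respond:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class TwoDcClient {
    public static void main(String[] args) {
        // Every client knows about nodes in BOTH data centers (hypothetical host names).
        CassandraHostConfigurator hosts = new CassandraHostConfigurator(
                "dc1-cass1:9160,dc1-cass2:9160,dc2-cass1:9160,dc2-cass2:9160");
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hosts);

        // Hector's default policy reads and writes at QUORUM. With RF=3 a
        // quorum is floor(3/2) + 1 = 2 replicas, so if 2 of a row's 3
        // replicas sit in the data center that just went away, QUORUM fails.
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insert("some-row-key", "MyColumnFamily",
                HFactory.createStringColumn("colName", "colValue"));
    }
}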



Re: Re: Quorum, Hector, and datacenter preference

Posted by Patricio Echagüe <pa...@gmail.com>.
Doesn't CL=LOCAL_QUORUM solve your problem?
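Something along these lines (a sketch only; the keyspace, cluster and host names are placeholders, and the DC-aware consistency levels assume the keyspace uses NetworkTopologyStrategy with a replication factor per data center, rather than OldNetworkTopologyStrategy):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class LocalQuorumClient {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster",
                new CassandraHostConfigurator("dc1-cass1:9160,dc2-cass1:9160"));

        // At LOCAL_QUORUM only the replicas in the coordinator's own data
        // center have to acknowledge, so losing the other data center does
        // not block reads or writes.
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.LOCAL_QUORUM);
        policy.setDefaultReadConsistencyLevel(HConsistencyLevel.LOCAL_QUORUM);

        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster, policy);
        // ... create Mutators / queries against this keyspace as usual ...
    }
}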

On Thu, Mar 24, 2011 at 9:33 AM, <jo...@gmail.com> wrote:

> Hi Nate -
>
> That sounds really promising and I'm looking forward to trying that out.
>
> My original question came up while thinking about how to achieve quorum (with
> RF=3) after the loss of 1 of 2 data centers. My logic was that if you had 2
> replicas in the same data center the client originally wrote to,
> then that client is guaranteed to be able to satisfy quorum, even if the
> other data center is unreachable.
>
> But I think there is no way to guarantee where the first write is written
> to. That would be based on the token range, which could very well be in any
> data center.
>
> Jon
>
>
>
>
> On Mar 24, 2011 3:05pm, Nate McCall <na...@datastax.com> wrote:
> > We have a load balancing policy which selects the host based on latency
> > and uses a Phi convict algorithm in a manner similar to the DynamicSnitch.
> > Using this policy, you would inherently get the closest replica
> > whenever possible, as that would most likely be the best performing.
> >
> > This policy is still in trunk and 0.7.0 tip. We should have a new
> > release out containing the above in the next few days.
> >
> > On Thu, Mar 24, 2011 at 8:46 AM, Jonathan Colby
> > <jonathan.colby@gmail.com> wrote:
> > > Indeed I found the big flaw in my own logic. Even writing to the
> > > "local" Cassandra nodes does not guarantee where the replicas will end
> > > up. The decision where to write the first replica is based on the token
> > > ring, which is spread out over all nodes regardless of data center.
> > > Right?
> > >
> > > On Mar 24, 2011, at 2:02 PM, Jonathan Colby wrote:
> > >
> > >> Hi -
> > >>
> > >> Our cluster is spread between 2 datacenters. We have a straightforward
> > >> IP assignment so that OldNetworkTopologyStrategy (RackInferringSnitch)
> > >> works well. We have Cassandra clients written with Hector in each of
> > >> those data centers. The Hector clients all have a list of all Cassandra
> > >> nodes across both data centers. RF=3.
> > >>
> > >> Is there an order as to which data center gets the first write? In
> > >> other words, would (or can) the Hector client do its first write to the
> > >> Cassandra nodes in its own data center?
> > >>
> > >> It would be ideal if Hector chose the "local" Cassandra nodes. That
> > >> way, if one data center is unreachable, the quorum of replicas in
> > >> Cassandra is still reached (because it was written to the working data
> > >> center first).
> > >>
> > >> Otherwise, if the Cassandra writes are really random from the Hector
> > >> client point of view, a data center outage would result in a read
> > >> failure for any data that has 2 replicas in the lost data center.
> > >>
> > >> Is anyone doing this? Is there a flaw in my logic?

Re: Re: Quorum, Hector, and datacenter preference

Posted by jo...@gmail.com.
Hi Nate -

That sounds really promising and I'm looking forward to trying that out.

My original question came up while thinking about how to achieve quorum (with
RF=3) after the loss of 1 of 2 data centers. My logic was that if you had 2
replicas in the same data center the client originally wrote to,
then that client is guaranteed to be able to satisfy quorum, even if the
other data center is unreachable.

But I think there is no way to guarantee where the first write is written  
to. That would be based on the token range, which could very well be in any  
data center.

Jon



On Mar 24, 2011 3:05pm, Nate McCall <na...@datastax.com> wrote:
> We have a load balancing policy which selects the host based on latency
> and uses a Phi convict algorithm in a manner similar to the DynamicSnitch.
> Using this policy, you would inherently get the closest replica
> whenever possible, as that would most likely be the best performing.
>
> This policy is still in trunk and 0.7.0 tip. We should have a new
> release out containing the above in the next few days.
>
> On Thu, Mar 24, 2011 at 8:46 AM, Jonathan Colby
> <jonathan.colby@gmail.com> wrote:
> > Indeed I found the big flaw in my own logic. Even writing to the
> > "local" Cassandra nodes does not guarantee where the replicas will end
> > up. The decision where to write the first replica is based on the token
> > ring, which is spread out over all nodes regardless of data center.
> > Right?
> >
> > On Mar 24, 2011, at 2:02 PM, Jonathan Colby wrote:
> >
> >> Hi -
> >>
> >> Our cluster is spread between 2 datacenters. We have a straightforward
> >> IP assignment so that OldNetworkTopologyStrategy (RackInferringSnitch)
> >> works well. We have Cassandra clients written with Hector in each of
> >> those data centers. The Hector clients all have a list of all Cassandra
> >> nodes across both data centers. RF=3.
> >>
> >> Is there an order as to which data center gets the first write? In
> >> other words, would (or can) the Hector client do its first write to the
> >> Cassandra nodes in its own data center?
> >>
> >> It would be ideal if Hector chose the "local" Cassandra nodes. That
> >> way, if one data center is unreachable, the quorum of replicas in
> >> Cassandra is still reached (because it was written to the working data
> >> center first).
> >>
> >> Otherwise, if the Cassandra writes are really random from the Hector
> >> client point of view, a data center outage would result in a read
> >> failure for any data that has 2 replicas in the lost data center.
> >>
> >> Is anyone doing this? Is there a flaw in my logic?


Re: Quorum, Hector, and datacenter preference

Posted by Nate McCall <na...@datastax.com>.
We have a load balancing policy which selects the host based on latency
and uses a Phi convict algorithm in a manner similar to the DynamicSnitch.
Using this policy, you would inherently get the closest replica
whenever possible, as that would most likely be the best performing.

This policy is still in trunk and 0.7.0 tip. We should have a new
release out containing the above in the next few days.
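Roughly, once it ships, wiring it in should look something like this (a sketch, not final API; DynamicLoadBalancingPolicy is the name I'm assuming for the latency/Phi-based policy, and the host names are placeholders):

import me.prettyprint.cassandra.connection.DynamicLoadBalancingPolicy;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

public class LatencyAwareClient {
    public static void main(String[] args) {
        CassandraHostConfigurator hosts = new CassandraHostConfigurator(
                "dc1-cass1:9160,dc1-cass2:9160,dc2-cass1:9160");

        // Route each request to whichever host currently looks best by
        // measured latency / Phi score; in practice that favors nodes in
        // the client's own data center.
        hosts.setLoadBalancingPolicy(new DynamicLoadBalancingPolicy());

        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hosts);
        // ... create Keyspace, Mutators, etc. as usual ...
    }
}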

On Thu, Mar 24, 2011 at 8:46 AM, Jonathan Colby
<jo...@gmail.com> wrote:
> Indeed I found the big flaw in my own logic. Even writing to the "local" Cassandra nodes does not guarantee where the replicas will end up. The decision where to write the first replica is based on the token ring, which is spread out over all nodes regardless of data center. Right?
>
> On Mar 24, 2011, at 2:02 PM, Jonathan Colby wrote:
>
>> Hi -
>>
>> Our cluster is spread between 2 datacenters. We have a straightforward IP assignment so that OldNetworkTopologyStrategy (RackInferringSnitch) works well. We have Cassandra clients written with Hector in each of those data centers. The Hector clients all have a list of all Cassandra nodes across both data centers. RF=3.
>>
>> Is there an order as to which data center gets the first write?    In other words, would (or can) the Hector client do its first write to the cassandra nodes in its own data center?
>>
>> It would be ideal if Hector chose the "local" Cassandra nodes. That way, if one data center is unreachable, the quorum of replicas in Cassandra is still reached (because it was written to the working data center first).
>>
>> Otherwise, if the cassandra writes are really random from the Hector client point-of-view, a data center outage would result in a read failure for any data that has 2 replicas in the lost data center.
>>
>> Is anyone doing this?  Is there a flaw in my logic?
>>
>>
>
>

Re: Quorum, Hector, and datacenter preference

Posted by Jonathan Colby <jo...@gmail.com>.
Indeed I found the big flaw in my own logic. Even writing to the "local" Cassandra nodes does not guarantee where the replicas will end up. The decision where to write the first replica is based on the token ring, which is spread out over all nodes regardless of data center. Right?
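Rough sketch of why, assuming the RandomPartitioner: the row key is hashed to a token, and whichever node owns that token range gets the first replica, no matter which node the client (or Hector) happened to contact:

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TokenDemo {
    // RandomPartitioner-style token: abs(MD5(key)) interpreted as a BigInteger.
    static BigInteger token(String rowKey) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(digest).abs();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // The token, and therefore the node holding the first replica, depends
        // only on the row key, not on which node (or data center) the client
        // happened to send the write to.
        System.out.println(token("some-row-key"));
    }
}

If I understand OldNetworkTopologyStrategy correctly, the second replica then goes to the other data center and the third back to the first one, so one DC always ends up with 2 of the 3 copies, and which DC that is follows from the key's token alone.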

On Mar 24, 2011, at 2:02 PM, Jonathan Colby wrote:

> Hi -
> 
> Our cluster is spread between 2 datacenters. We have a straightforward IP assignment so that OldNetworkTopologyStrategy (RackInferringSnitch) works well. We have Cassandra clients written with Hector in each of those data centers. The Hector clients all have a list of all Cassandra nodes across both data centers. RF=3.
> 
> Is there an order as to which data center gets the first write?    In other words, would (or can) the Hector client do its first write to the cassandra nodes in its own data center?
> 
> It would be ideal if Hector chose the "local" Cassandra nodes. That way, if one data center is unreachable, the quorum of replicas in Cassandra is still reached (because it was written to the working data center first).
> 
> Otherwise, if the cassandra writes are really random from the Hector client point-of-view, a data center outage would result in a read failure for any data that has 2 replicas in the lost data center.
> 
> Is anyone doing this?  Is there a flaw in my logic?
> 
>