Posted to user@cassandra.apache.org by Donald Smith <Do...@audiencescience.com> on 2014/01/30 02:07:45 UTC

Question about local reads with multiple data centers

We have two datacenters, DC1 and DC2, in our test cluster. Our write process uses a connection string with just the two hosts in DC1. Our read process uses a connection string with just the two hosts in DC2. We use a PropertyFileSnitch with its property file, and replication of 'DC1':2, 'DC2':1 between data centers.

I notice from the read process's logs that the reader adds ALL the hosts (in both datacenters) to the list of queried hosts.

My question: will the read process try to read first locally from the datacenter DC2 I specified in its connection string? I presume so. (I doubt that it uses the client's IP address to decide which datacenter is closer. And I am unaware of another way to tell it to read locally.)

Also, will read repair happen between datacenters automatically ("read_repair_chance=0.100000")?  Or does that only happen within a single data center?

We're using Cassandra 2.0.4 and CQL.

Thank you

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
donalds@AudienceScience.com<ma...@AudienceScience.com>


Re: Question about local reads with multiple data centers

Posted by Chris Burroughs <ch...@gmail.com>.
On 01/29/2014 08:07 PM, Donald Smith wrote:
> My question: will the read process try to read first locally from the datacenter DC2 I specified in its connection string? I presume so. (I doubt that it uses the client's IP address to decide which datacenter is closer. And I am unaware of another way to tell it to read locally.)
>

From the rest of this thread it looks like you were asking about how
the client selects a Cassandra node to act as coordinator.  Note,
however, that if you are using a DC-oblivious CL (ONE, QUORUM), that
Cassandra coordinator may send requests to the remote data center.
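
To make the point about DC-oblivious consistency levels concrete, here is a small sketch of the replica arithmetic in plain Java (no driver needed). It assumes the 'DC1':2, 'DC2':1 in the original question are per-data-center replication factors, and the class and method names are made up for illustration. A quorum is a majority, floor(n / 2) + 1; QUORUM counts replicas in all data centers, LOCAL_QUORUM only those in the coordinator's data center.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy arithmetic only: how many replica responses each consistency level needs. */
final class QuorumSketch {

    /** Majority of n replicas: floor(n / 2) + 1. */
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    /** QUORUM counts replicas across every data center. */
    static int globalQuorum(Map<String, Integer> rfByDc) {
        int total = 0;
        for (int rf : rfByDc.values()) {
            total += rf;
        }
        return quorum(total);
    }

    /** LOCAL_QUORUM counts only replicas in the coordinator's data center. */
    static int localQuorum(Map<String, Integer> rfByDc, String localDc) {
        return quorum(rfByDc.get(localDc));
    }

    public static void main(String[] args) {
        // Assumed replication factors, read off the 'DC1':2, 'DC2':1 in the question.
        Map<String, Integer> rf = new LinkedHashMap<>();
        rf.put("DC1", 2);
        rf.put("DC2", 1);

        System.out.println("QUORUM needs " + globalQuorum(rf)
                + " replicas; DC2 holds " + rf.get("DC2"));
        System.out.println("LOCAL_QUORUM in DC2 needs " + localQuorum(rf, "DC2"));
    }
}
```

Under those assumed factors, QUORUM needs 2 responses while DC2 holds only 1 replica, so a coordinator in DC2 has to contact DC1 for a QUORUM read. LOCAL_QUORUM in DC2 needs just 1, which the local replica can satisfy.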


> Also, will read repair happen between datacenters automatically ("read_repair_chance=0.100000")?  Or does that only happen within a single data center?

Yes, read_repair_chance is global.  There is a separate DC-local read
repair chance (dclocal_read_repair_chance) if you want to make local
read repairs more common.
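
As a rough illustration of how a global and a DC-local read repair chance could interact, here is a toy model in plain Java. It assumes a single random draw per read that is compared first against the global chance and then against the DC-local one; that ordering, the class name, and the 0.3 DC-local value are assumptions for the sketch, not something stated in this thread, and Cassandra's actual internals may differ.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Toy model only: picks a read repair scope from one random draw. */
final class ReadRepairSketch {

    enum Scope { GLOBAL, DC_LOCAL, NONE }

    /**
     * roll is a uniform draw in [0, 1). The global chance is checked first,
     * then the DC-local chance (an assumed ordering, see the lead-in).
     */
    static Scope decide(double roll, double globalChance, double dcLocalChance) {
        if (globalChance > roll) {
            return Scope.GLOBAL;
        }
        if (dcLocalChance > roll) {
            return Scope.DC_LOCAL;
        }
        return Scope.NONE;
    }

    public static void main(String[] args) {
        double globalChance = 0.1;  // read_repair_chance from the question
        double dcLocalChance = 0.3; // hypothetical value, chosen for illustration

        int reads = 100_000;
        int[] counts = new int[Scope.values().length];
        for (int i = 0; i < reads; i++) {
            double roll = ThreadLocalRandom.current().nextDouble();
            counts[decide(roll, globalChance, dcLocalChance).ordinal()]++;
        }
        for (Scope scope : Scope.values()) {
            System.out.println(scope + ": " + counts[scope.ordinal()] + " of " + reads);
        }
    }
}
```

In this model the DC-local chance only has an effect when it is larger than the global chance, since any draw below the global chance has already been classified as a cross-datacenter repair.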

RE: Question about local reads with multiple data centers

Posted by Donald Smith <Do...@audiencescience.com>.
I found the answer.

By default, the Datastax driver for Cassandra uses the RoundRobinPolicy for deciding which Cassandra node a client read or write request should be routed to. But that policy is independent of data center.

Per the documentation (http://www.datastax.com/drivers/java/2.0/apidocs/com/datastax/driver/core/policies/LoadBalancingPolicy.html), one can see that if you have multiple data centers, it's probably better to use DCAwareRoundRobinPolicy, which gives preference to the local data center. The client program needs to know which datacenter it resides in (e.g., "DC1").


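        import com.datastax.driver.core.Cluster;
        import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
        import com.datastax.driver.core.policies.LoadBalancingPolicy;
        import com.datastax.driver.core.policies.TokenAwarePolicy;

        // The fields used below (m_session, m_cluster, m_cassandraNode,
        // localDataCenterName, useTokenAwarePolicy) and prepareQueries()
        // are defined elsewhere in the class.
        private void connect() {
                if (m_session != null) {
                        return; // already connected
                }
                // m_cassandraNode is a comma-separated list of contact points.
                Cluster.Builder builder = Cluster.builder();
                for (String component : m_cassandraNode.split(",")) {
                        builder.addContactPoint(component.trim());
                }
                long start = System.currentTimeMillis();
                // Prefer hosts in the client's own data center (e.g., "DC1").
                LoadBalancingPolicy loadBalancingPolicy = new DCAwareRoundRobinPolicy(localDataCenterName);
                if (useTokenAwarePolicy) {
                        // Optionally wrap the DC-aware policy in a token-aware one.
                        loadBalancingPolicy = new TokenAwarePolicy(loadBalancingPolicy);
                }
                m_cluster = builder.withLoadBalancingPolicy(loadBalancingPolicy).build();
                m_session = m_cluster.connect();
                prepareQueries();
                float seconds = 0.001f * (System.currentTimeMillis() - start);
                System.out.println("Connected to cassandra host " + m_cassandraNode
                                + " in " + seconds + " seconds.");
        }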
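
For anyone who wants to see the difference between the two policies without a cluster, here is a toy model in plain Java. It does not use the driver and is not the driver's implementation; the Host class, the method names, and the 10.0.x.x addresses are hypothetical. It only shows the idea: plain round robin cycles through every known host, while the DC-aware variant cycles through hosts in the named local data center.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: coordinator choice with and without data center awareness. */
final class DcAwareSketch {

    /** A host and the data center it belongs to. */
    static final class Host {
        final String address;
        final String dc;

        Host(String address, String dc) {
            this.address = address;
            this.dc = dc;
        }
    }

    /** Two hosts per data center, with made-up addresses. */
    static List<Host> sampleHosts() {
        return Arrays.asList(
                new Host("10.0.1.1", "DC1"), new Host("10.0.1.2", "DC1"),
                new Host("10.0.2.1", "DC2"), new Host("10.0.2.2", "DC2"));
    }

    /** Plain round robin: cycles through all hosts. requestNumber must be >= 0. */
    static Host roundRobin(List<Host> hosts, int requestNumber) {
        return hosts.get(requestNumber % hosts.size());
    }

    /** DC-aware round robin: cycles through hosts in localDc only. */
    static Host dcAwareRoundRobin(List<Host> hosts, String localDc, int requestNumber) {
        List<Host> local = new ArrayList<>();
        for (Host host : hosts) {
            if (host.dc.equals(localDc)) {
                local.add(host);
            }
        }
        if (local.isEmpty()) {
            throw new IllegalStateException("no hosts in data center " + localDc);
        }
        return local.get(requestNumber % local.size());
    }

    public static void main(String[] args) {
        List<Host> hosts = sampleHosts();
        for (int i = 0; i < 4; i++) {
            Host plain = roundRobin(hosts, i);
            Host aware = dcAwareRoundRobin(hosts, "DC2", i);
            System.out.println("request " + i + ": plain -> " + plain.address + " (" + plain.dc
                    + "), DC-aware -> " + aware.address + " (" + aware.dc + ")");
        }
    }
}
```

With the sample hosts, the plain version sends half of the requests to DC1 even though the reader only listed DC2 hosts as contact points, which matches what I saw in the reader's logs. The DC-aware version stays in DC2.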



Re: Question about local reads with multiple data centers

Posted by Duncan Sands <du...@gmail.com>.
Hi Donald, which driver are you using?  With the DataStax Python driver you need
to use the DCAwareRoundRobinPolicy as the load balancing policy if you want the
driver to distinguish between your data centres.  Otherwise, by default it
round-robins requests amongst all nodes, regardless of which data centre they
are in and regardless of which data centre the nodes you told it to connect to
are in.  It is probably the same for the other DataStax drivers.

Best wishes, Duncan.

On 30/01/14 02:07, Donald Smith wrote:
> We have two datacenters, DC1 and DC2 in our test cluster. Our *write* process
> uses a connection string with just the two hosts in DC1. Our *read* process uses
> a connection string just with the two hosts in DC2.   We use a
> PropertyFileSnitch and a property file that ‘DC1’:2, ‘DC2’:1 between data centers.
>
> I notice from the *read* process’s logs that the reader adds ALL the hosts (in
> both datacenters) to the list of queried hosts.
>
> My question: will the *read* process try to read first locally from the
> datacenter DC2 I specified in its connection string?     I presume so.  (I doubt
> that it uses the client’s IP address to decide which datacenter is closer. And I
> am unaware of another way to tell it to read locally.)
>
> Also, will read repair happen between datacenters automatically
> (“read_repair_chance=0.100000”)?  Or does that only happen within a single data
> center?
>
> We’re using Cassandra 2.0.4  and CQL.
>
> Thank you
>
> *Donald A. Smith*| Senior Software Engineer
> P: 425.201.3900 x 3866
> C: (206) 819-5965
> F: (646) 443-2333
> donalds@AudienceScience.com <ma...@AudienceScience.com>
>
>
> AudienceScience
>