Posted to user@cassandra.apache.org by Shannon Carey <sc...@expedia.com> on 2017/03/21 19:07:08 UTC

ONE has much higher latency than LOCAL_ONE

I am seeing unexpected behavior: consistency level ONE increases the 99th-percentile read latency to ~108ms (and the 95th percentile to 5ms-90ms), up from ~5ms (99th percentile) when using LOCAL_ONE.

I am using DSE 5.0 with DataStax client 3.0.0. The client is configured with a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with usedHostsPerRemoteDc set to a very high number. The Cassandra cluster has two datacenters.

I would expect that when the cluster is operating normally (all local nodes reachable), ONE would behave the same as LOCAL_ONE. Does anyone know why this is not the case?

Re: ONE has much higher latency than LOCAL_ONE

Posted by Shannon Carey <sc...@expedia.com>.

Thanks for the link, I hadn't seen that before.

It's unfortunate that they don't explain what they mean by "closest replica". The nodes in the remote DC should not be regarded as "closest". Also, it's not clear what the black arrows mean… the coordinator sends the read to all three replicas, but only one of them responds?

Reading further (assuming this article from 2012 is still accurate for Cassandra 3.0: http://www.datastax.com/dev/blog/dynamic-snitching-in-cassandra-past-present-and-future), it appears that by "closest replica" what they really mean is the replica chosen by the "dynamic snitch". The structure of the documentation at https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archSnitchesAbout.html is misleading in this regard: it puts the "Dynamic snitching" section side by side with the other snitch implementations, implying that it's one of the choices you can configure as a snitch, which is why I hadn't read that section (I didn't want a snitch that "monitors the performance of reads"). Instead, the info about the dynamic snitch should be on the top-level page. In any case, the dynamic snitch is apparently governed by read latency, the node's state, and whether the node is doing a compaction ("severity"). So why is it routing requests to nodes whose latency is ~20 times higher? I don't know, but I wish it weren't.

I guess it's important to differentiate between that and the load balancing policy called LatencyAwarePolicy… even if you're not using the LatencyAwarePolicy, internally the snitch is still doing stuff based on latency.

This is also unfortunate because it makes the DCAwareRoundRobinPolicy useless for anything but local consistency levels, and (if you read it at face value) contradicts the description in the documentation that "This policy queries nodes of the local data-center in a round-robin fashion; optionally, it can also try a configurable number of hosts in remote data centers if all local hosts failed."

Also, if you're right that the requests are getting routed to the remote DC, then those requests aren't showing up in my graph of read request rate… which is problematic because I'm not getting an accurate representation of what's actually happening. I can't find any other metric beyond org.apache.cassandra.metrics.ClientRequest.* which might include these internal read requests.

I am wondering if perhaps there's a mistake with the way that the dynamic snitch measures latency… if it's only measuring requests coming from clients, then if a remote node happens to win the dynamic snitch's favor momentarily, the latency of the local node will increase (because it's querying the remote node), and then the dynamic snitch will see that the local node is performing poorly, and will continue directing traffic to the remote cluster. Or, perhaps they're measuring the latency of each node based not on how long the client request takes but on how long the internal request takes… which, again, could mislead the snitch into thinking that the remote host is providing a better deal to the client than it really is. It seems like a mistake that the dynamic snitch would think that a remote node will be faster or less work than the local node which actually contains a copy of the data being queried.

Looks like I'm not the only one who's run into this: https://issues.apache.org/jira/browse/CASSANDRA-6908

I think I'm going to try setting the system property "cassandra.ignore_dynamic_snitch_severity" to "true" and see what happens. That or "dynamic_snitch: false"… it's not documented in https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html but it appears to be a valid config option.
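
For reference, the two settings mentioned above might look like the fragment below. This is a sketch rather than a tested recommendation: `dynamic_snitch` is the cassandra.yaml option and `cassandra.ignore_dynamic_snitch_severity` is the JVM system property named above. Where the JVM flag is set depends on the install (that part is an assumption), and either change would normally take effect only after the node is restarted.

```yaml
# cassandra.yaml (per node)
# Option 1: turn the dynamic snitch off entirely, so replica ordering comes
# only from the configured snitch.
dynamic_snitch: false

# Option 2 (instead of option 1): keep the dynamic snitch but ignore the
# compaction "severity" input. This is a JVM system property, not a YAML key;
# it would be passed on the Cassandra JVM command line, e.g.
#   -Dcassandra.ignore_dynamic_snitch_severity=true
```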




From: Eric Plowe <er...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Wednesday, March 22, 2017 at 11:44 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: ONE has much higher latency than LOCAL_ONE

Yes, your request from the client is going to the LocalDC that you've defined for the data center aware load balancing policy, but with a consistency level of ONE, there is a chance for the coordinator (the node your client has connected to) to route the request across DCs.

Please see: https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dmlClientRequestsRead.html#dmlClientRequestsRead__two-dc-one

A two datacenter cluster with a consistency level of ONE:

"In a multiple datacenter cluster with a replication factor of 3, and a read consistency of ONE, the closest replica for the given row, regardless of datacenter, is contacted to fulfill the read request. In the background a read repair is potentially initiated, based on the read_repair_chance setting of the table, for the other replicas."

A two datacenter cluster with a consistency level of LOCAL_ONE <https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dmlClientRequestsRead.html#dmlClientRequestsRead__two-dc-local-one>:

"In a multiple datacenter cluster with a replication factor of 3, and a read consistency of LOCAL_ONE, the closest replica for the given row in the same datacenter as the coordinator node is contacted to fulfill the read request. In the background a read repair is potentially initiated, based on the read_repair_chance setting of the table, for the other replicas."


Dynamic snitching also comes into play with reads. Even though your client is using TokenAware and should connect to the appropriate replica node (which is then your coordinator), the coordinator can still route your read request away from what it believes to be poorly performing nodes to another replica, which with CL = ONE could be in the other DC. Read more about the dynamic snitch here: https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchDynamic_c.html

Regards,

Eric Plowe

Re: ONE has much higher latency than LOCAL_ONE

Posted by Eric Plowe <er...@gmail.com>.

Yes, your request from the client is going to the LocalDC that you've
defined for the data center aware load balancing policy, but with a
consistency level of ONE, there is a chance for the coordinator (the node
your client has connected to) to route the request across DCs.

Please see:
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dmlClientRequestsRead.html#dmlClientRequestsRead__two-dc-one

A two datacenter cluster with a consistency level of ONE:

"In a multiple datacenter cluster with a replication factor of 3, and a
read consistency of ONE, the closest replica for the given row, regardless
of datacenter, is contacted to fulfill the read request. In the background
a read repair is potentially initiated, based on the read_repair_chance setting
of the table, for the other replicas."

A two datacenter cluster with a consistency level of LOCAL_ONE
<https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dmlClientRequestsRead.html#dmlClientRequestsRead__two-dc-local-one>:

"In a multiple datacenter cluster with a replication factor of 3, and a read
consistency of LOCAL_ONE, the closest replica for the given row in the same
datacenter as the coordinator node is contacted to fulfill the read
request. In the background a read repair is potentially initiated, based on
the read_repair_chance setting of the table, for the other replicas."


Dynamic snitching also comes into play with reads. Even though your client
is using TokenAware and should connect to the appropriate replica node
(which is then your coordinator), the coordinator can still route your read
request away from what it believes to be poorly performing nodes to another
replica, which with CL = ONE could be in the other DC. Read more about the
dynamic snitch here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/architecture/architectureSnitchDynamic_c.html
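
The re-routing described above can be pictured with a toy model. The sketch below is not Cassandra's code: `Replica`, `SnitchModel` and the scores are made-up names and numbers, and the comparison is only a simplified reading of how a latency-based score and a badness threshold (cf. `dynamic_snitch_badness_threshold` in cassandra.yaml) could override the topology-based ordering. It shows that under CL = ONE a remote replica could end up first if its score happened to look better.

```scala
// Toy model only: names, scores and the threshold are illustrative assumptions.
final case class Replica(name: String, dc: String, score: Double)

object SnitchModel {
  // Keep the topology order (local replicas first) unless some replica's score
  // exceeds the score at the same position in the score-sorted list by more
  // than the threshold; in that case order purely by score (lower is better).
  def order(topologyOrder: List[Replica], badnessThreshold: Double): List[Replica] = {
    val scores = topologyOrder.map(_.score)
    val tooBad = scores.zip(scores.sorted).exists {
      case (actual, best) => actual > best * (1.0 + badnessThreshold)
    }
    if (tooBad) topologyOrder.sortBy(_.score) else topologyOrder
  }
}
```

With two local replicas scored 1.0 and a remote replica scored 0.2, `SnitchModel.order(..., 0.1)` would put the remote replica first; with near-equal scores the local-first order is kept.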


Regards,

Eric Plowe







On Wed, Mar 22, 2017 at 12:21 PM Shannon Carey <sc...@expedia.com> wrote:

I understand all that, but it doesn't explain why the latency increases.
The requests are not going to a remote DC. I know this because currently
all requests are going to the client in one particular DC. The read request
rate of the Cassandra nodes in the other DC remained flat (near zero) the
whole time, compared to ~200 reads/s on the Cassandra nodes in the DC local
to the client doing the reads. This is expected, because the
DCAwareRoundRobinPolicy will cause local nodes to be used preferentially
whenever possible. What's not expected is the dramatic latency increase.
Btw this client is read-only: no writes.


From: Eric Plowe <er...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, March 21, 2017 at 7:23 PM

To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: ONE has much higher latency than LOCAL_ONE

ONE means at least one replica node to ack the write, but doesn't require
that the coordinator route the request to a node in the local data center.

LOCAL_ONE was introduced to handle the case of when you have multiple data
centers and cross data center traffic is not desirable.

In multiple datacenter clusters, a consistency level of ONE is often
desirable, but cross-DC traffic is not. LOCAL_ONE accomplishes this. For
security and quality reasons, you can use this consistency level in an
offline datacenter to prevent automatic connection to online nodes in other
datacenters if an offline node goes down.

From:
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html

Regards,

Eric

On Tue, Mar 21, 2017 at 7:49 PM Shannon Carey <sc...@expedia.com> wrote:

The cluster is in two DCs, and yes the client is deployed locally to each
DC.

From: Matija Gobec <ma...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, March 21, 2017 at 2:56 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: ONE has much higher latency than LOCAL_ONE

Are you running a multi DC cluster? If yes do you have application in both
data centers/regions ?

On Tue, Mar 21, 2017 at 8:07 PM, Shannon Carey <sc...@expedia.com> wrote:

I am seeing unexpected behavior: consistency level ONE increases read
latency 99th percentile to ~108ms (95th percentile to 5ms-90ms) up from
~5ms (99th percentile) when using LOCAL_ONE.

I am using DSE 5.0 with Datastax client 3.0.0. The client is configured
with a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with
usedHostsPerRemoteDc set to a very high number. Cassandra cluster has two
datacenters.

I would expect that when the cluster is operating normally (all local nodes
reachable), ONE would behave the same as LOCAL_ONE. Does anyone know
why this is not the case?

Re: ONE has much higher latency than LOCAL_ONE

Posted by Shannon Carey <sc...@expedia.com>.

I understand all that, but it doesn't explain why the latency increases. The requests are not going to a remote DC. I know this because currently all requests are going to the client in one particular DC. The read request rate of the Cassandra nodes in the other DC remained flat (near zero) the whole time, compared to ~200 reads/s on the Cassandra nodes in the DC local to the client doing the reads. This is expected, because the DCAwareRoundRobinPolicy will cause local nodes to be used preferentially whenever possible. What's not expected is the dramatic latency increase. Btw this client is read-only: no writes.

From: Eric Plowe <er...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, March 21, 2017 at 7:23 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: ONE has much higher latency than LOCAL_ONE

ONE means at least one replica node to ack the write, but doesn't require that the coordinator route the request to a node in the local data center.

LOCAL_ONE was introduced to handle the case of when you have multiple data centers and cross data center traffic is not desirable.

In multiple datacenter clusters, a consistency level of ONE is often desirable, but cross-DC traffic is not. LOCAL_ONE accomplishes this. For security and quality reasons, you can use this consistency level in an offline datacenter to prevent automatic connection to online nodes in other datacenters if an offline node goes down.

From: https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html

Regards,

Eric

On Tue, Mar 21, 2017 at 7:49 PM Shannon Carey <sc...@expedia.com> wrote:
The cluster is in two DCs, and yes the client is deployed locally to each DC.

From: Matija Gobec <ma...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, March 21, 2017 at 2:56 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: ONE has much higher latency than LOCAL_ONE

Are you running a multi DC cluster? If yes do you have application in both data centers/regions ?

On Tue, Mar 21, 2017 at 8:07 PM, Shannon Carey <sc...@expedia.com> wrote:
I am seeing unexpected behavior: consistency level ONE increases read latency 99th percentile to ~108ms (95th percentile to 5ms-90ms) up from ~5ms (99th percentile) when using LOCAL_ONE.

I am using DSE 5.0 with Datastax client 3.0.0. The client is configured with a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with usedHostsPerRemoteDc set to a very high number. Cassandra cluster has two datacenters.

I would expect that when the cluster is operating normally (all local nodes reachable), ONE would behave the same as LOCAL_ONE. Does anyone know why this is not the case?

Re: ONE has much higher latency than LOCAL_ONE

Posted by Eric Plowe <er...@gmail.com>.

ONE means that at least one replica node must ack the request, but it doesn't
require that the coordinator route the request to a node in the local data
center.

LOCAL_ONE was introduced to handle the case where you have multiple data
centers and cross data center traffic is not desirable.

In multiple datacenter clusters, a consistency level of ONE is often
desirable, but cross-DC traffic is not. LOCAL_ONE accomplishes this. For
security and quality reasons, you can use this consistency level in an
offline datacenter to prevent automatic connection to online nodes in other
datacenters if an offline node goes down.

From:
https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html

Regards,

Eric

On Tue, Mar 21, 2017 at 7:49 PM Shannon Carey <sc...@expedia.com> wrote:

> The cluster is in two DCs, and yes the client is deployed locally to each
> DC.
>
> From: Matija Gobec <ma...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Tuesday, March 21, 2017 at 2:56 PM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: ONE has much higher latency than LOCAL_ONE
>
> Are you running a multi DC cluster? If yes do you have application in both
> data centers/regions ?
>
> On Tue, Mar 21, 2017 at 8:07 PM, Shannon Carey <sc...@expedia.com> wrote:
>
> I am seeing unexpected behavior: consistency level ONE increases read
> latency 99th percentile to ~108ms (95th percentile to 5ms-90ms) up from
> ~5ms (99th percentile) when using LOCAL_ONE.
>
> I am using DSE 5.0 with Datastax client 3.0.0. The client is configured
> with a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with
> usedHostsPerRemoteDc set to a very high number. Cassandra cluster has two
> datacenters.
>
> I would expect that when the cluster is operating normally (all local
> nodes reachable), ONE would behave the same as LOCAL_ONE. Does anyone
> know why this is not the case?
>
>
>

Re: ONE has much higher latency than LOCAL_ONE

Posted by Shannon Carey <sc...@expedia.com>.

Yes, as I mentioned in my other thread, LOCAL_ONE does not allow the retry policy to take action if all local nodes are down.

Yes, I am using withLocalDc(); Here's the code (Scala):

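  // Imports this snippet relies on (DataStax Java driver 3.x plus its extras module);
  // `config` is our own settings object and is not shown here.
  import com.datastax.driver.core.{Cluster, CodecRegistry, HostDistance, PoolingOptions}
  import com.datastax.driver.core.Cluster.Builder
  import com.datastax.driver.core.policies.{DCAwareRoundRobinPolicy, DowngradingConsistencyRetryPolicy, TokenAwarePolicy}
  import com.datastax.driver.extras.codecs.date.SimpleTimestampCodec
  import com.datastax.driver.extras.codecs.jdk8.InstantCodec
  import scala.collection.JavaConverters._

  def getClusterBuilder: Builder = {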
    val pool = new PoolingOptions
    pool.setConnectionsPerHost(HostDistance.LOCAL, config.coreConnectionsPerHost, config.maxConnectionsPerHost)

    val codecRegistry: CodecRegistry = new CodecRegistry()
        .register(InstantCodec.instance)
        .register(SimpleTimestampCodec.instance)

    // By specifying MaxValue here, we allow any & all hosts in remote DCs to be used by queries when necessary. That
    // allows TokenAwarePolicy to choose the appropriate nodes in remote DCs.
    val maxHostsToUsePerRemoteDc = Int.MaxValue

    // We have nodes from the remote DC in the initial list (so that we can tolerate DC failover), so we have to
    // specify the local DC explicitly.
    val dcAwarePolicy = DCAwareRoundRobinPolicy.builder()
        .withLocalDc(config.localDc)
        .withUsedHostsPerRemoteDc(maxHostsToUsePerRemoteDc)
        .build()

    val shuffleReplicas = true
    val builder = Cluster.builder()
        .withClusterName(config.clusterName)
        .withPoolingOptions(pool)
        .withLoadBalancingPolicy(new TokenAwarePolicy(dcAwarePolicy, shuffleReplicas))
        .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
        .withCodecRegistry(codecRegistry)

    val contactPoints = config.contactPointsProvider.getContactPoints.asScala
    contactPoints.foreach(builder.addContactPoint)

    builder
  }

I will have to turn up the logging in order to see that log message you refer to. But it seems to me that the DC config is the same regardless of whether I use ONE or LOCAL_ONE, so I don't think it would make a difference. From what I've seen, I'd expect all the non-local nodes to be listed in that message. But I'll see what I can find.

Thanks for your responses! I posted to the other list as you suggested: https://groups.google.com/a/lists.datastax.com/forum/#!topic/java-driver-user/o0GVBjFCHCA

From: Nate McCall <na...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, March 21, 2017 at 7:16 PM
To: Cassandra Users <us...@cassandra.apache.org>
Subject: Re: ONE has much higher latency than LOCAL_ONE

On Wed, Mar 22, 2017 at 1:11 PM, Nate McCall <na...@thelastpickle.com> wrote:

On Wed, Mar 22, 2017 at 12:48 PM, Shannon Carey <sc...@expedia.com> wrote:
>
> The cluster is in two DCs, and yes the client is deployed locally to each DC.

First off, what is the goal of using ONE instead of LOCAL_ONE? If it's failover, this could be addressed with a RetryPolicy starting with LOCAL_ONE and falling back to ONE.

Just read your previous thread about this. That's pretty unintuitive and counter to the way I remember that working (though admittedly, it's been a while).

Do please open a thread on the driver mailing list, I'm curious about the response.

Re: ONE has much higher latency than LOCAL_ONE

Posted by Nate McCall <na...@thelastpickle.com>.

On Wed, Mar 22, 2017 at 1:11 PM, Nate McCall <na...@thelastpickle.com> wrote:

>
>
> On Wed, Mar 22, 2017 at 12:48 PM, Shannon Carey <sc...@expedia.com>
> wrote:
> >
> > The cluster is in two DCs, and yes the client is deployed locally to
> > each DC.
>
> First off, what is the goal of using ONE instead of LOCAL_ONE? If it's
> failover, this could be addressed with a RetryPolicy starting with LOCAL_ONE
> and falling back to ONE.
>
>
Just read your previous thread about this. That's pretty unintuitive and
counter to the way I remember that working (though admittedly, it's been a
while).

Do please open a thread on the driver mailing list, I'm curious about the
response.

Re: ONE has much higher latency than LOCAL_ONE

Posted by Nate McCall <na...@thelastpickle.com>.

On Wed, Mar 22, 2017 at 12:48 PM, Shannon Carey <sc...@expedia.com> wrote:
>
> The cluster is in two DCs, and yes the client is deployed locally to each
> DC.

First off, what is the goal of using ONE instead of LOCAL_ONE? If it's
failover, this could be addressed with a RetryPolicy starting with LOCAL_ONE
and falling back to ONE.
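
The LOCAL_ONE-then-ONE fallback suggested above could also be done in application code instead of in a driver RetryPolicy. Below is a minimal sketch of that variant, not the RetryPolicy approach itself. `Level`, `LocalOne`, `One` and `readWithFallback` are hypothetical stand-ins invented for illustration, not driver types, and the `read` function is assumed to wrap whatever call actually executes the query at the given consistency level.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-ins for consistency levels; not the driver's own types.
sealed trait Level
case object LocalOne extends Level
case object One extends Level

object ReadFallback {
  // Try each level in order; return the first success, or the last failure.
  def readWithFallback[A](levels: List[Level])(read: Level => Try[A]): Try[A] =
    levels match {
      case Nil =>
        Failure(new IllegalArgumentException("no consistency levels given"))
      case level :: rest =>
        read(level) match {
          case ok @ Success(_)     => ok
          case failed @ Failure(_) =>
            if (rest.isEmpty) failed else readWithFallback(rest)(read)
        }
    }
}

object FallbackDemo {
  def main(args: Array[String]): Unit = {
    // Simulated read: LOCAL_ONE fails (local DC "down"), ONE succeeds.
    val result = ReadFallback.readWithFallback[String](List(LocalOne, One)) {
      case LocalOne => Failure(new RuntimeException("no local replica available"))
      case One      => Success("row served at ONE")
    }
    println(result)
  }
}
```

In normal operation only the LOCAL_ONE attempt runs, so the cross-DC path is taken only after a local failure. Whether every failure should trigger the fallback, or only unavailability errors, is a design choice this sketch leaves open.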

Are you using the ".withLocalDc" option in the DCAwareRoundRobinPolicy
builder? (It's been a while since I've gone through this in detail,
though). If you could provide a snippet that included the complete options
passed to the builder that might be helpful.

Also, check for the complete forms of these two logging messages on the app
side during startup (the second one is at INFO so adjust if needed):
"Some contact points don't match local data center. Local DC = {}.
Non-conforming contact points: {}"
"Using data-center name '{}' for DCAwareRoundRobinPolicy..."

Make sure those line up with the cluster topology and your expectations.

Actually, in typing that up, it may be more appropriate to move the
conversation over here since this is probably driver specific:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user

--
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: ONE has much higher latency than LOCAL_ONE

Posted by Shannon Carey <sc...@expedia.com>.

The cluster is in two DCs, and yes the client is deployed locally to each DC.

From: Matija Gobec <ma...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, March 21, 2017 at 2:56 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: ONE has much higher latency than LOCAL_ONE

Are you running a multi DC cluster? If yes do you have application in both data centers/regions ?

On Tue, Mar 21, 2017 at 8:07 PM, Shannon Carey <sc...@expedia.com> wrote:
I am seeing unexpected behavior: consistency level ONE increases read latency 99th percentile to ~108ms (95th percentile to 5ms-90ms) up from ~5ms (99th percentile) when using LOCAL_ONE.

I am using DSE 5.0 with Datastax client 3.0.0. The client is configured with a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with usedHostsPerRemoteDc set to a very high number. Cassandra cluster has two datacenters.

I would expect that when the cluster is operating normally (all local nodes reachable), ONE would behave the same as LOCAL_ONE. Does anyone know why this is not the case?

Re: ONE has much higher latency than LOCAL_ONE

Posted by Matija Gobec <ma...@gmail.com>.

Are you running a multi-DC cluster? If so, do you have the application in
both data centers/regions?

On Tue, Mar 21, 2017 at 8:07 PM, Shannon Carey <sc...@expedia.com> wrote:

> I am seeing unexpected behavior: consistency level ONE increases read
> latency 99th percentile to ~108ms (95th percentile to 5ms-90ms) up from
> ~5ms (99th percentile) when using LOCAL_ONE.
>
> I am using DSE 5.0 with Datastax client 3.0.0. The client is configured
> with a TokenAwarePolicy wrapping a DCAwareRoundRobinPolicy with
> usedHostsPerRemoteDc set to a very high number. Cassandra cluster has two
> datacenters.
>
> I would expect that when the cluster is operating normally (all local
> nodes reachable), ONE would behave the same as LOCAL_ONE. Does anyone
> know why this is not the case?
>