Posted to commits@cassandra.apache.org by "Joey Lynch (Jira)" <ji...@apache.org> on 2019/11/11 07:37:00 UTC

[jira] [Commented] (CASSANDRA-14459) DynamicEndpointSnitch should never prefer latent nodes

    [ https://issues.apache.org/jira/browse/CASSANDRA-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971360#comment-16971360 ] 

Joey Lynch commented on CASSANDRA-14459:
----------------------------------------

[~bdeggleston] Alright, I finally got this rebase sorted out. The messaging patches and in-jvm dtests conflicted a lot, unfortunately, but on the bright side they actually did a lot of the cleanups for us, so that was nice. I still have a few test cases to fix up (testScheduleProbes and the DESLongTest), but I figured getting this patch moving again is a good idea.
||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-14459-rebase]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-14459.png?circle-token=1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-14459-rebase]|
{quote}Some general notes:
 * I don't think we need a dynamicsnitch package, the 2 implementations could just live in the locator package.
 * I can't find anywhere that we validate the various dynamic snitch values? I didn't look super hard, but if we're not, it would be good to check people are using sane values on startup and jmx update.
 * It's probably a good time to rebase onto current trunk.{quote}
Ok, I've removed the package and rebased. I've also added validation of the snitch values in {{applyConfigChanges}}, which I believe is called on all paths (constructor, dynamic swap, etc ...).
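Roughly the kind of checks I mean (a sketch only, not the exact code on the branch; the parameter names and bounds here are illustrative):
{code:java}
// Sketch of the sort of validation done before new settings are applied, whether
// they come from the constructor or a JMX update. Names are placeholders.
public class SnitchConfigValidationSketch
{
    static void validate(long updateIntervalMs, long resetIntervalMs, double badnessThreshold)
    {
        if (updateIntervalMs <= 0)
            throw new IllegalArgumentException("dynamic snitch update interval must be positive: " + updateIntervalMs);
        if (resetIntervalMs <= 0)
            throw new IllegalArgumentException("dynamic snitch reset interval must be positive: " + resetIntervalMs);
        if (badnessThreshold < 0 || badnessThreshold > 1)
            throw new IllegalArgumentException("dynamic snitch badness threshold must be in [0, 1]: " + badnessThreshold);
    }
}
{code}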
{quote}ScheduledExecutors:
 * {{getOrCreateSharedExecutor}} could be private
 * shutdown times accumulate in the wait section. ie: if we have 2 executors with a shutdown time of 2 seconds, we'll wait up to 4 seconds for them to stop.
 * we should validate shutdown hasn't been called in {{getOrCreateSharedExecutor}}
 * I don't see any benefit to making this class lock free. Making the {{getOrCreateSharedExecutor}} and {{shutdownAndWait}} methods synchronized and just using a normal hash map for {{executors}} would make this class easier to reason about. As it stands, you could argue there's some raciness with creating and shutting down at the same time, although it's unlikely to be a problem in practice. I do think I might be borderline nit picking here though.{quote}
My intention for {{getOrCreateSharedExecutor}} is that it is {{public}} so that we can safely create STPEs elsewhere in the codebase without running the risk of failing to shut the executors down (e.g. during in-jvm dtests). Are you just referring to the overload that doesn't take a constructor?

I'm also not sure I agree we need to validate shutdown, since the executors themselves will throw exceptions if a task is scheduled after shutdown, and I can't think of a better thing to do than throw an exception. Curious what you'd prefer we do in that case?

I've fixed the shutdown slowness you mentioned by re-using the existing shut-down-multiple-threadpools logic, and I've made it just use regular HashMaps.
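The shape of the class is now roughly the following (a simplified sketch given those changes, not the actual code on the branch; method and field names are illustrative):
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class SharedExecutorsSketch
{
    private final Map<String, ScheduledThreadPoolExecutor> executors = new HashMap<>();

    // Callers get (or lazily create) a named shared pool; synchronized methods plus
    // a plain HashMap keep create/shutdown easy to reason about.
    public synchronized ScheduledThreadPoolExecutor getOrCreateSharedExecutor(String name,
                                                                              Function<String, ScheduledThreadPoolExecutor> factory)
    {
        return executors.computeIfAbsent(name, factory);
    }

    // Shut every pool down first, then wait once across the whole group so the
    // timeouts don't accumulate per executor.
    public synchronized void shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException
    {
        for (ExecutorService executor : executors.values())
            executor.shutdown();

        long deadline = System.nanoTime() + unit.toNanos(timeout);
        for (ExecutorService executor : executors.values())
        {
            if (!executor.awaitTermination(deadline - System.nanoTime(), TimeUnit.NANOSECONDS))
                throw new TimeoutException("Timed out waiting for shared executors to stop");
        }
        executors.clear();
    }
}
{code}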
{quote}StorageService
 * {{doLocalReadTest}} method is unused{quote}
Oops, missed that in the refactors (I was planning on making PingMessages require a local disk read to make the ping messages a little more realistic, but then I decided against it). Removed.
{quote}MessagingService
 * Fix class declaration indentation
 * remove unused import{quote}
Fixed.
{quote}DynamicEndpointSnitch
 * I don't think we need to check {{logger.isTraceEnabled}} before calling {{logger.trace()}}? At least I don't see any toString computation that would happen in the calls.
 * The class hierarchy could be improved a bit. There's code in {{DynamicEndpointSnitchHistogram}} that's also used in the legacy snitch, and code in {{DynamicEndpointSnitch}} that's only used in {{DynamicEndpointSnitchHistogram}}. The boundary between DynamicEndpointSnitch and DynamicEndpointSnitchHistogram in particular feels kind of arbitrary.{quote}
Agreed on the logging point; I've taken the checks out since, as far as I can tell, there isn't any hot path doing that logging.
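For context on why the guards are safe to drop: with slf4j parameterized logging the arguments aren't formatted unless TRACE is actually enabled, so a guard only pays off when building an argument is itself expensive. A quick self-contained illustration (not code from the patch; names are made up):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TraceGuardSketch
{
    private static final Logger logger = LoggerFactory.getLogger(TraceGuardSketch.class);

    void cheap(Object address)
    {
        // No guard needed: the message is only formatted if TRACE is enabled.
        logger.trace("Updated scores for {}", address);
    }

    void expensive()
    {
        // Guard worth it: the argument itself is costly to build.
        if (logger.isTraceEnabled())
            logger.trace("Current snapshot: {}", buildExpensiveSnapshot());
    }

    private String buildExpensiveSnapshot()
    {
        return "";  // placeholder for an expensive computation
    }
}
{code}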

Regarding the class hierarchy, I agree that now that the whole DES is pluggable these seams don't make much sense (since you can just load your own dsnitch). I've still kept the specific methods I hope to play around with in CASSANDRA-14817 (to test out different snitch implementations) protected, but I've gotten rid of the inheritance between DES and DESH (leaving only {{DynamicEndpointSnitchLegacy}}). I hope that's acceptable?
{quote}DynamicEndpointLegacySnitch
 * If we're going to keep the old behavior around as a failsafe (and we should), I think we should avoid improving it by changing the reset behavior. Only resetting under some situations intuitively feels like the right thing to do, but it would suck if there were unforeseen problems with it that made it a regression from the 3.x behavior.{quote}
Ok, we will dump the entire map every 10 minutes by default in the Legacy behavior.
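In other words, roughly the 3.x behavior; a minimal sketch (illustrative names, not the code on the branch):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// The legacy behavior simply drops every accumulated latency sample on a fixed
// schedule and lets subsequent measurements repopulate the map from scratch.
public class LegacyResetSketch
{
    private final Map<String, Long> samples = new ConcurrentHashMap<>();
    private final ScheduledExecutorService resetExecutor = Executors.newSingleThreadScheduledExecutor();

    public LegacyResetSketch(long resetIntervalMs)
    {
        // resetIntervalMs defaults to 10 minutes.
        resetExecutor.scheduleWithFixedDelay(samples::clear, resetIntervalMs, resetIntervalMs, TimeUnit.MILLISECONDS);
    }
}
{code}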

> DynamicEndpointSnitch should never prefer latent nodes
> ------------------------------------------------------
>
>                 Key: CASSANDRA-14459
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14459
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Coordination
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Low
>              Labels: 4.0-feature-freeze-review-requested, pull-request-available
>             Fix For: 4.x
>
>          Time Spent: 25.5h
>  Remaining Estimate: 0h
>
> The DynamicEndpointSnitch has two unfortunate behaviors that allow it to provide latent hosts as replicas:
>  # Loses all latency information when Cassandra restarts
>  # Clears latency information entirely every ten minutes (by default), allowing global queries to be routed to _other datacenters_ (and local queries cross racks/azs)
> This means that the first few queries after restart/reset could be quite slow compared to average latencies. I propose we solve this by resetting to the minimum observed latency instead of completely clearing the samples, and by extending the {{isLatencyForSnitch}} idea to a three-state variable instead of two, in particular {{YES}}, {{NO}}, {{MAYBE}}. This extension allows {{EchoMessages}} and {{PingMessages}} to send {{MAYBE}}, indicating that the DS should use those measurements only if it has one or fewer samples for a host. This fixes both problems because on process restart we send out {{PingMessages}} / {{EchoMessages}} as part of startup, so we would reset to effectively the RTT of the hosts (and at that point normal gossip {{EchoMessages}} have an opportunity to add an additional latency measurement).
> This strategy also nicely deals with the "a host got slow but now it's fine" problem that the DS resets were (afaik) designed to stop, because the {{EchoMessage}} ping latency will count only after the reset for that host. Ping latency is a more reasonable lower bound on host latency (as opposed to the status quo of zero).
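> A minimal sketch of the three-state idea (class, enum, and helper names are illustrative, not the final patch):
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> public class ThreeStateLatencySketch
> {
>     // Should the snitch use this measurement?
>     public enum UseForSnitch { YES, MAYBE, NO }
>
>     // host -> number of samples seen so far (stand-in for the real per-host state)
>     private final Map<String, Integer> sampleCounts = new ConcurrentHashMap<>();
>
>     public void receiveTiming(String host, long latencyMicros, UseForSnitch use)
>     {
>         if (use == UseForSnitch.NO)
>             return;
>         // MAYBE measurements (pings/echoes) only seed hosts we know almost nothing
>         // about, e.g. right after a restart or a reset.
>         if (use == UseForSnitch.MAYBE && sampleCounts.getOrDefault(host, 0) > 1)
>             return;
>         sampleCounts.merge(host, 1, Integer::sum);
>         // ... record latencyMicros into the per-host latency state here ...
>     }
> }
> {code}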



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
