Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2014/12/01 20:09:13 UTC

[jira] [Comment Edited] (CASSANDRA-6976) Determining replicas to query is very slow with large numbers of nodes or vnodes

    [ https://issues.apache.org/jira/browse/CASSANDRA-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230244#comment-14230244 ] 

Ariel Weisberg edited comment on CASSANDRA-6976 at 12/1/14 7:08 PM:
--------------------------------------------------------------------

bq. Sure it does - if an action that is likely memory bound (like this one - after all, i
The entire thing runs in 60 milliseconds with 2000 tokens. That is 2x the time to warm up the cache (assuming a correct number for warmup). So warming up the cache is definitely impacting the numbers, but not changing them from 100s of milliseconds to 10s. Tack the time to warm up the last-level cache onto the current time and it is still the same order of magnitude.

bq. For a lookup (i.e. small) table query, or a range query that can be serviced entirely by the local node, it is quite unlikely that the fetching would dominate when talking about timescales >= 1ms.
Range queries are slow because they produce a lot of ranges. That means contacting a lot of nodes. The cost of getRestrictedRanges is proportional to the cost of getRangeSlice, but still a small part of overall execution time.

If the lookup table really only needed to contact one node, getRestrictedRanges wouldn't run for long and would return a small set of ranges, right?
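
To illustrate why the number of ranges returned tracks how many ring tokens the query spans, here is a minimal sketch. It is not the actual getRestrictedRanges code: the class name, the method splitAtRingTokens, the use of plain long tokens, and the evenly spaced 2000-token ring are all assumptions made for illustration, and wraparound is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, not Cassandra code: split a query range at ring token boundaries.
public class RestrictedRangesSketch {

    // Splits (left, right] at every ring token strictly inside it.
    // Each returned long[] is {rangeLeft, rangeRight}. Assumes left <= right (no wraparound).
    public static List<long[]> splitAtRingTokens(TreeSet<Long> ringTokens, long left, long right) {
        List<long[]> result = new ArrayList<>();
        long start = left;
        // subSet with both bounds exclusive yields the tokens strictly between left and right, ascending.
        for (long token : ringTokens.subSet(left, false, right, false)) {
            result.add(new long[] {start, token});
            start = token;
        }
        // The final piece runs from the last boundary seen to the end of the query range.
        result.add(new long[] {start, right});
        return result;
    }

    public static void main(String[] args) {
        // Assumed ring: 2000 evenly spaced tokens (0, 1000, ..., 1999000).
        TreeSet<Long> ring = new TreeSet<>();
        for (long t = 0; t < 2000; t++) {
            ring.add(t * 1000);
        }
        // A range spanning the whole ring is cut at almost every token.
        System.out.println(splitAtRingTokens(ring, 0, 1_999_000).size());
        // A range that falls between two adjacent tokens stays in one piece.
        System.out.println(splitAtRingTokens(ring, 1500, 1600).size());
    }
}
```

In this sketch the work done is linear in the number of tokens inside the query range, so a query that stays within one token interval produces a single range and returns almost immediately.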

bq. Like I said, please do feel to drop this particular line of enquiry for the moment, ...
What you're describing is that it's bad in production but we just don't see it in test. I don't see a reason to drop it just because the ticket got caught up in implementation details rather than the user-facing issue we want to address. [~jbellis]?

bq. In the meantime it might be worth having a simple short-circuit path for queries that may be answered by the local node only, though.
What queries could identify that this shortcut is possible? By nature, those queries would only hit one local node if they didn't cover a lot of ranges, in which case all the problem code we are discussing runs relatively fast (compared to its worst case).


was (Author: aweisberg):
bq. Sure it does - if an action that is likely memory bound (like this one - after all, i
The entire thing runs in 60 milliseconds with 2000 tokens. That is 2x the time to warm up the cache (assuming a correct number for warmup). So warming up the cache is definitely impacting the numbers, but not changing them from 100s of milliseconds to 10s. Tack the time to warm up the last-level cache onto the current time and it is still the same order of magnitude. We could do the cache optimization thing and then find out that in practice the cache is not beneficial anyway.

bq. For a lookup (i.e. small) table query, or a range query that can be serviced entirely by the local node, it is quite unlikely that the fetching would dominate when talking about timescales >= 1ms.
Range queries are slow because they produce a lot of ranges. That means contacting a lot of nodes. The cost of getRestrictedRanges is proportional to the cost of getRangeSlice, but still a small part of overall execution time.

If the lookup table really only needed to contact one node, getRestrictedRanges wouldn't run for long and would return a small set of ranges, right?

bq. Like I said, please do feel to drop this particular line of enquiry for the moment, ...
What you're describing is that it's bad in production but we just don't see it in test. I don't see a reason to drop it just because the ticket got caught up in implementation details rather than the user-facing issue we want to address. [~jbellis]?

bq. In the meantime it might be worth having a simple short-circuit path for queries that may be answered by the local node only, though.
What queries could identify that this shortcut is possible? By nature those queries would only hit one local node if they didn't cover a lot of ranges in which case all the problem code we are discussing runs relatively fast (compared to its worst case).

> Determining replicas to query is very slow with large numbers of nodes or vnodes
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6976
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6976
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>              Labels: performance
>         Attachments: GetRestrictedRanges.java, jmh_output.txt, jmh_output_murmur3.txt, make_jmh_work.patch
>
>
> As described in CASSANDRA-6906, this can be ~100ms for a relatively small cluster with vnodes, which is longer than it will spend in transit on the network. This should be much faster.


