You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Updated) (JIRA)" <ji...@apache.org> on 2012/01/02 15:04:32 UTC
[jira] [Updated] (CASSANDRA-1600) Merge get_indexed_slices with get_range_slices

     [ https://issues.apache.org/jira/browse/CASSANDRA-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-1600:
----------------------------------------

    Attachment: 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v4.patch
                0002-thrift-generated-code-changes-v4.patch
                0001-Add-optional-index-clause-to-KeyRange-v4.patch

Attaching rebased and updated v4 (I've squashed the CQL changes into the 3rd patch).

bq. What do we gain from "typedefing" List<IndexExpression> to FilterClause? (I note this was part of Stu and my original attempts back in April but I don't remember a good reason for that.)

I do not really see a good reason either (I rebased that part without really thinking about it). I've removed FilterClause in v4.

{quote}
{noformat}
+        /*
+         * XXX: If the range requested is a token range, we'll have to start at the beginning (and stop at the end) of
+         * the indexed row unfortunately (which will be inefficient), because we have not way to intuit the small
+         * possible key having a given token. A fix would be to actually store the token along the key in the
+         * indexed row.
+         */
{noformat}
This is fine since there's no reason to be searching by token unless you're doing an exhaustive scan, i.e. a m/r job.
{quote}

Actually, I believe this does have some impact. When a requested range of keys spans multiple nodes, the range is splitted using nodes Token (by SP.getRestrictedRanges()). And while it is true that in simple cases this amount to checking all the keys in the node and thus is not inefficient, there is at least 2 cases where this isn't the case:
* If the ranges a node is responsible of are not contiguous. Say a node is responsible for range (5, 10] and (20, 30]. And say a user requests a range of keys such that the start key token is 15 (and the end key token is say 25). Then the node will end up being requested for keys in (20, 25] (where those are tokens, not keys). In that case, it will have to uselessly scan (and skip) all keys in (0, 10] the node has.
* When token range wraps. This is actually the same problem than above in that when a node has a range like (2^127-1, 2^126], it really hold both ranges (0, 2^126] and (2^127, 2^127-1] which are non contiguous as far as the ordering on disk is concerned.
While the 'the strategy creates non-contiguous ranges' problem is not a huge deal since neither SimpleStrategy nor NTS creates them (if I'm correct), the instance of the problem due to wrapping ranges is a very real inefficiency of the implementation.

As a side note, storing the token as suggested in the comments above would also likely have the benefit of helping a good deal the problem of the time spent doing digest computations noted in CASSANDRA-3545.

{quote}
{noformat}
+                        rows.addAll(RangeSliceVerbHandler.executeLocally(command));
{noformat}
Another place the original patches failed... we should avoid this because it means we're now allowing one range scan per thrift client instead of one per read stage thread, and it bypasses the "drop hopeless requests" overcapacity protection built in there. Look at SP.LocalReadRunnable for how to do this safely. Simplest fix would be to just continue routing all range scans over MessagingService.
{quote}

While I agree with you, I'll note that this is not a problem introduced by this patch: the current code does a call to CFS.getRangeSlice() directly in that case (bypassing the read stage and the protections coming with it). We clearly should fix that but it's neither a detail nor related to this patch so I suggest spawning another ticket for that instead.
                
> Merge get_indexed_slices with get_range_slices
> ----------------------------------------------
>
>                 Key: CASSANDRA-1600
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1600
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API
>            Reporter: Stu Hood
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1
>
>         Attachments: 0001-Add-optional-FilterClause-to-KeyRange-and-support-do-v2.patch, 0001-Add-optional-FilterClause-to-KeyRange-and-support-doin.txt, 0001-Add-optional-FilterClause-to-KeyRange-v3.patch, 0001-Add-optional-index-clause-to-KeyRange-v4.patch, 0002-allow-get_range_slices-to-apply-filter-to-a-sequenti-v2.patch, 0002-allow-get_range_slices-to-apply-filter-to-a-sequential.txt, 0002-thrift-generated-code-changes-v3.patch, 0002-thrift-generated-code-changes-v4.patch, 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v3.patch, 0003-Allow-get_range_slices-to-apply-filter-to-a-sequenti-v4.patch, 0004-Update-cql-to-not-use-deprecated-index-scan-v3.patch
>
>
> From a comment on 1157:
> {quote}
> IndexClause only has a start key for get_indexed_slices, but it would seem that the reasoning behind using 'KeyRange' for get_range_slices applies there as well, since if you know the range you care about in the primary index, you don't want to continue scanning until you exhaust 'count' (or the cluster).
> Since it would appear that get_indexed_slices would benefit from a KeyRange, why not smash get_(range|indexed)_slices together, and make IndexClause an optional field on KeyRange?
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira