Posted to user@cassandra.apache.org by Voytek Jarnot <vo...@gmail.com> on 2017/03/17 17:03:05 UTC

Very odd & inconsistent results from SASI query

Cassandra 3.9, 4 nodes, rf=3

Hi folks, we're seeing 0 results returned from queries that (a) should return
results, and (b) will return results with minor tweaks.

I've attached the sanitized trace outputs for the following 3 queries (pk1
and pk2 are partition keys, ck1 is clustering key, val1 is SASI indexed
non-key column):

Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefgh%'  LIMIT 1001
ALLOW FILTERING;
Q1 works - it returns a list of records, one of which has
val1='abcdefghijklmn'.

Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 LIKE 'abcdefghi%'  LIMIT
1001 ALLOW FILTERING;
Q2 does not work - 0 results returned. The only difference from Q1 is one
additional character in the LIKE pattern.

Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11  AND  ck1 >=
'2017-03-16' AND ck1 <= '2017-03-17'  AND val1 = 'abcdefghijklmn'  LIMIT
1001 ALLOW FILTERING;
Q3 does not work - 0 results returned.

As I've written above, the data set *does* include a record with
val1='abcdefghijklmn'.
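For anyone trying to reproduce this, a table shaped like the one described above (composite partition key pk1/pk2, clustering column ck1, SASI index on the non-key column val1) might be declared roughly as follows. This is a sketch, not the actual schema: the column types, the index name t1_val1_idx and the index mode are assumptions, since the thread gives none of them.

```cql
-- Hypothetical schema matching the description in this post.
-- Column types are assumptions (ck1 could equally be a date or text column).
CREATE TABLE t1 (
    pk1 int,
    pk2 int,
    ck1 timestamp,
    val1 text,
    PRIMARY KEY ((pk1, pk2), ck1)
);

-- SASI index on the non-key column; PREFIX mode lets the index serve
-- prefix patterns such as LIKE 'abcdefgh%' as well as equality (val1 = ...).
-- The index name t1_val1_idx is hypothetical.
CREATE CUSTOM INDEX t1_val1_idx ON t1 (val1)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'PREFIX'};
```

Under this sketch, 'abcdefghijklmn' matches both 'abcdefgh%' and 'abcdefghi%', so Q2 should return a superset-compatible subset of Q1's rows rather than nothing; the pattern itself does not explain the empty result.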

Confounding the issue is that this behavior is inconsistent. For different
values of val1, I'll have scenarios where Q3 works, but Q1 and Q2 do not.
Now, that particular behavior I could explain with index/LIKE problems, but
it is Q3 that sometimes does not work, and that's a simple equality
comparison (although still using the index).

Further confounding the issue is that if my testers run these same queries
with the same parameters tomorrow, they're likely to work correctly.

The only thing I've been able to glean from tracing execution is that the
queries that work follow "Executing read..." with "Executing single
partition query on t1" and so forth, whereas the queries that don't work
simply follow "Executing read..." with "Read 0 live and 0 tombstone cells",
with no actual work seemingly done. But that's not helping me narrow this
down much.

Thanks for your time - appreciate any help.

Re: Very odd & inconsistent results from SASI query

Posted by Voytek Jarnot <vo...@gmail.com>.

Apologies for the stream-of-consciousness replies, but are the dropped
message stats output by tpstats accumulated since the node came up, or
are there processes which clear and/or time out the info?

(In reply to Voytek Jarnot, Mon, Mar 20, 2017 at 3:18 PM.)

Re: Very odd & inconsistent results from SASI query

Posted by Voytek Jarnot <vo...@gmail.com>.

No dropped messages in tpstats on any of the nodes.

(In reply to Voytek Jarnot, Mon, Mar 20, 2017 at 3:11 PM.)

Re: Very odd & inconsistent results from SASI query

Posted by Voytek Jarnot <vo...@gmail.com>.

Appreciate the reply, Kurt.

I sanitized it out of the traces, but all trace outputs listed the same
node for all three queries (1 working, 2 not working). Read repair chance
is set to 0.0, as recommended when using TWCS.
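One way to double-check this in Cassandra 3.x is to read the settings back from the schema tables and, if needed, set both options explicitly. dclocal_read_repair_chance is a separate table option whose 3.x default is 0.1, so setting only read_repair_chance to 0.0 still leaves probabilistic read repair enabled within the local datacenter. The keyspace name ks below is a placeholder.

```cql
-- Inspect the current read repair settings ('ks' is a placeholder keyspace).
SELECT read_repair_chance, dclocal_read_repair_chance
FROM system_schema.tables
WHERE keyspace_name = 'ks' AND table_name = 't1';

-- Disable both probabilistic read repair options on the table.
ALTER TABLE ks.t1
WITH read_repair_chance = 0.0
AND dclocal_read_repair_chance = 0.0;
```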

I'll check tpstats - in this environment, load is not an issue, but network
issues may be.

(In reply to kurt greaves, Mon, Mar 20, 2017 at 2:42 PM.)

Re: Very odd & inconsistent results from SASI query

Posted by kurt greaves <ku...@instaclustr.com>.

As secondary indexes are stored individually on each node, what you're
describing sounds exactly like a consistency issue. The fact that you read
0 cells on one query implies the node that got the query did not have any
data for the row. The reason you would sometimes see different behaviours
is likely read repair. The fact that the repair fixed the issue pretty much
guarantees it's a consistency issue.

You should check for dropped mutations in tpstats/logs and, if they are
occurring, try to stop that from happening (probably load related). You
could also try performing reads and writes at LOCAL_QUORUM for stronger
consistency; note, however, that this has a performance/latency impact.
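A minimal cqlsh sketch of the LOCAL_QUORUM suggestion, re-using Q3 with ck1 bounding both ends of the range. With rf=3, a local quorum is floor(3/2) + 1 = 2 replicas, so a write acknowledged by 2 replicas and a read that consults 2 replicas must overlap in at least one replica (2 + 2 > 3). The CONSISTENCY and TRACING commands only affect statements issued from this cqlsh session; the overlap argument also assumes the application itself writes at LOCAL_QUORUM (or stronger).

```cql
-- Both settings apply only to statements issued from this cqlsh session.
CONSISTENCY LOCAL_QUORUM;
TRACING ON;

-- Q3, with ck1 bounding both ends of the clustering range.
SELECT * FROM t1
WHERE pk1 = 2017 AND pk2 = 11
  AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17'
  AND val1 = 'abcdefghijklmn'
LIMIT 1001 ALLOW FILTERING;

TRACING OFF;
```

With tracing on, the source column of the trace output should show which replicas took part in the read, which makes it easier to tell whether an empty result came from a single lagging node.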

Re: Very odd & inconsistent results from SASI query

Posted by Voytek Jarnot <vo...@gmail.com>.

A wrinkle further confounds the issue: running a repair on the node which
was servicing the queries has cleared things up and all the queries now
work.

That doesn't make a whole lot of sense to me - my assumption was that a
repair shouldn't have fixed it.

(In reply to Voytek Jarnot, Fri, Mar 17, 2017 at 12:03 PM.)