Posted to user@cassandra.apache.org by Owen Kim <oh...@gmail.com> on 2014/10/07 22:38:12 UTC

Missing data in range query

Hello,

I'm running Cassandra 1.2.16 with supercolumns and Hector.

create column family CFName
  with column_type = 'Super'
  and comparator = 'UTF8Type'
  and subcomparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and read_repair_chance = 0.2
  and dclocal_read_repair_chance = 0.0
  and populate_io_cache_on_flush = false
  and gc_grace = 43200
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'KEYS_ONLY';


I'm adding a time series supercolumn, then doing a slice query over
this supercolumn. I'm really just trying to see if any data is in the time
slice, so I'm doing a slice query with limit 1. The insert isn't at the data
bounds.

However, sometimes, nothing shows up in the time slice, even 8 seconds
after the insert. I'm doing quorum reads and writes so I'd expect
consistent results but the slice query comes up empty, even if there have
been multiple inserts.

I'm not sure what's happening here and am trying to narrow down suspects. Can
key caching produce stale results? Do slice queries have different
consistency guarantees?

Re: Missing data in range query

Posted by Owen Kim <oh...@gmail.com>.

Nope. No secondary index. Just a slice query on the PK.



On Tuesday, October 7, 2014, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Oct 7, 2014 at 3:11 PM, Owen Kim <oh...@gmail.com> wrote:
>
>> Sigh, it is a bit grating. I (genuinely) appreciate your acknowledgement
>> of that. Though, I didn't intend for the question to be "about"
>> supercolumns.
>>
>
> (Yep, understand tho that if you hadn't been told that advice before, it
> would grate a lot less. I will try to remember that "Owen Kim" has received
> this piece of info, and will do my best to not repeat it to you... :D)
>
>
>> It is possible I'm hitting an odd edge case though I'm having trouble
>> reproducing the issue in a controlled environment since there seems to be a
>> timing element to it, or at least it's not consistently happening. I
>> haven't been able to reproduce it on a single node test cluster. I'm moving
>> on to test a larger one now.
>>
>
> Right, my hypothesis is that there is something within the supercolumn
> write path which differs from the non-supercolumn write path. In theory
> this should be less possible since the 1.2 era supercolumn rewrite.
>
> To be clear, are you reading back via PK? No secondary indexes involved,
> right? The only bells your symptoms are ringing are secondary index bugs...
>
> =Rob
>
>

Re: Missing data in range query

Posted by Robert Coli <rc...@eventbrite.com>.

On Tue, Oct 7, 2014 at 3:11 PM, Owen Kim <oh...@gmail.com> wrote:

> Sigh, it is a bit grating. I (genuinely) appreciate your acknowledgement
> of that. Though, I didn't intend for the question to be "about"
> supercolumns.
>

(Yep, understand tho that if you hadn't been told that advice before, it
would grate a lot less. I will try to remember that "Owen Kim" has received
this piece of info, and will do my best to not repeat it to you... :D)

> It is possible I'm hitting an odd edge case though I'm having trouble
> reproducing the issue in a controlled environment since there seems to be a
> timing element to it, or at least it's not consistently happening. I
> haven't been able to reproduce it on a single node test cluster. I'm moving
> on to test a larger one now.
>

Right, my hypothesis is that there is something within the supercolumn
write path which differs from the non-supercolumn write path. In theory
this should be less possible since the 1.2 era supercolumn rewrite.

To be clear, are you reading back via PK? No secondary indexes involved,
right? The only bells your symptoms are ringing are secondary index bugs...

=Rob

Re: Missing data in range query

Posted by Owen Kim <oh...@gmail.com>.

Sigh, it is a bit grating. I (genuinely) appreciate your acknowledgement of
that. Though, I didn't intend for the question to be "about" supercolumns.

It is possible I'm hitting an odd edge case though I'm having trouble
reproducing the issue in a controlled environment since there seems to be a
timing element to it, or at least it's not consistently happening. I
haven't been able to reproduce it on a single node test cluster. I'm moving
on to test a larger one now.

On Tue, Oct 7, 2014 at 2:39 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Oct 7, 2014 at 2:03 PM, Owen Kim <oh...@gmail.com> wrote:
>
>> I'm aware. I've had the system up since pre-composite columns and haven't
>> had the cycles to do a major data and schema migration.
>>
>> And that's not "slightly" non-responsive.
>>
>
> "There may be unknown bugs in the code you're using, especially because no
> one else uses it" is in fact slightly responsive. While I'm sure it does
> grate to be told that one should not be using a feature one cannot choose
> to not-use, I consider "don't use them" responsive to every question about
> supercolumns since 2010, unless the asker pre-emptively states they know
> this fact. I assure you that my meta-response is infinitely more responsive
> than the total non-response you were otherwise likely to receive...
>
> ... aaaaaaanyway ...
>
> Probably you are just hitting an edge case in the 1.2 era rewrite of
> supercolumns which no one else has ever encountered because no one uses
> them. For the record, I do not believe either of your hypotheses (key cache
> or slice queries having different guarantees) are likely to be implicated.
> One of them is trivial to test : create a test CF with the key cache
> disabled and try to repro there.
>
> Instead of attempting to debug by yourself, or on the user list (which
> will be full of people not-using supercolumns), I suggest filing a JIRA
> with reproduction steps, and then mentioning the URL on this thread for
> future googlers.
>
> =Rob
>
>
>

Re: Missing data in range query

Posted by Robert Coli <rc...@eventbrite.com>.

On Tue, Oct 7, 2014 at 2:03 PM, Owen Kim <oh...@gmail.com> wrote:

> I'm aware. I've had the system up since pre-composite columns and haven't
> had the cycles to do a major data and schema migration.
>
> And that's not "slightly" non-responsive.
>

"There may be unknown bugs in the code you're using, especially because no
one else uses it" is in fact slightly responsive. While I'm sure it does
grate to be told that one should not be using a feature one cannot choose
to not-use, I consider "don't use them" responsive to every question about
supercolumns since 2010, unless the asker pre-emptively states they know
this fact. I assure you that my meta-response is infinitely more responsive
than the total non-response you were otherwise likely to receive...

... aaaaaaanyway ...

Probably you are just hitting an edge case in the 1.2 era rewrite of
supercolumns which no one else has ever encountered because no one uses
them. For the record, I do not believe either of your hypotheses (key cache
or slice queries having different guarantees) are likely to be implicated.
One of them is trivial to test : create a test CF with the key cache
disabled and try to repro there.
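
A minimal sketch of such a test CF in cassandra-cli, assuming the same
layout as the original CFName. The name CFNameNoCache is hypothetical, and
caching = 'NONE' is assumed here as the way to turn off the key cache (it
also disables the row cache) for this CF only:

```
/* Hypothetical test CF: same layout as CFName, but with caching disabled. */
create column family CFNameNoCache
  with column_type = 'Super'
  and comparator = 'UTF8Type'
  and subcomparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and compaction_strategy = 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'
  and caching = 'NONE';
```

If the empty slice results still show up against this CF under the same
quorum reads and writes, the key cache can be ruled out as a suspect.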

Instead of attempting to debug by yourself, or on the user list (which will
be full of people not-using supercolumns), I suggest filing a JIRA with
reproduction steps, and then mentioning the URL on this thread for future
googlers.

=Rob

Re: Missing data in range query

Posted by Owen Kim <oh...@gmail.com>.

I'm aware. I've had the system up since pre-composite columns and haven't
had the cycles to do a major data and schema migration.

And that's not "slightly" non-responsive.

On Tue, Oct 7, 2014 at 1:49 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Oct 7, 2014 at 1:38 PM, Owen Kim <oh...@gmail.com> wrote:
>
>> I'm running Cassandra 1.2.16 with supercolumns and Hector.
>>
>
> Slightly non-responsive response :
>
> In general supercolumn use is not recommended. It makes it more difficult
> to get support when one uses a feature no one else uses.
>
> =Rob
>
>

Re: Missing data in range query

Posted by Robert Coli <rc...@eventbrite.com>.

On Tue, Oct 7, 2014 at 1:38 PM, Owen Kim <oh...@gmail.com> wrote:

> I'm running Cassandra 1.2.16 with supercolumns and Hector.
>

Slightly non-responsive response :

In general supercolumn use is not recommended. It makes it more difficult
to get support when one uses a feature no one else uses.

=Rob