Posted to commits@cassandra.apache.org by "Mathijs Vogelzang (JIRA)" <ji...@apache.org> on 2015/06/04 14:13:38 UTC

[jira] [Commented] (CASSANDRA-9540) Cql query doesn't return right information when using IN on columns for some keys

    [ https://issues.apache.org/jira/browse/CASSANDRA-9540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572659#comment-14572659 ] 

Mathijs Vogelzang commented on CASSANDRA-9540:
----------------------------------------------

I was able to make this case reproducible by extracting the affected key from our production database and exporting it with sstable2json.
It seems that the broken keys have one relatively large cell (144 kB in this case).
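For reference, the export step mentioned above would look roughly like this in Cassandra 2.1 (paths and the hex key are illustrative; the actual SSTable file name depends on your data directory layout):
{noformat}
# Dump a single partition from the SSTable to JSON; -k takes the hex-encoded partition key
sstable2json DATADIR/test/testbug/test-testbug-ka-1-Data.db -k 62726f6b656e > testcase.json
{noformat}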

A testcase json file is available at https://www.dropbox.com/s/kzu2jmqmwz788k8/testcase.json?dl=0 (the SSTable that it creates is available at https://www.dropbox.com/s/ilwpjka5r70n2os/test-testbug-ka-2-Data.db?dl=0 )

The commands I executed in cqlsh:
{noformat}
create keyspace test with replication={'class':'SimpleStrategy','replication_factor':1};
use test;
CREATE TABLE test.testbug (
    key blob,
    column1 blob,
    value blob,
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 1.0
    AND speculative_retry = 'NONE';
{noformat}
I then imported the JSON on the command line as follows:
{noformat}
json2sstable -K test -c testbug testcase.json DATADIR/test/testbug/test-testbug-ka-1-Data.db
nodetool refresh test testbug
{noformat}

cqlsh then behaves as reported earlier: querying for row "test" works as expected, but for row "broken" we get no data when asking for columns that don't exist in that row (depending on the exact set of requested columns, either 'metadata' is dropped, or no data is returned at all):
{noformat}
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('test') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
            test |            bucket/0
            test |            metadata

(2 rows)
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
          broken |            bucket/0
          broken |            metadata

(2 rows)
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('test') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdf'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
            test |            bucket/0
            test |            metadata

(2 rows)
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdf'));

 key | column1 | value
-----+---------+-------

(0 rows)
cqlsh:test> select blobAsText(key),blobAsText(column1) from testbug where key=textAsBlob('broken') and column1 in (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'));

 blobAsText(key) | blobAsText(column1)
-----------------+---------------------
          broken |            bucket/0

(1 rows)
{noformat}

> Cql query doesn't return right information when using IN on columns for some keys
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9540
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>         Environment: Cassandra 2.1.5
>            Reporter: Mathijs Vogelzang
>
> We are investigating a weird issue where one of our clients doesn't get data on his dashboard. It seems Cassandra is not returning data for a particular key ("brokenkey" from now on).
> Some background:
> We have a row where we store a "metadata" column and data in columns "bucket/0", "bucket/1", "bucket/2", etc. Depending on the date selection in the UI, we know that we only need to retrieve bucket/0, or bucket/0 and bucket/1, and so on (we always need to retrieve "metadata").
> A typical query may look like this (using SELECT column1 just to show what is returned; normally we would of course SELECT value):
> {noformat}
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey');
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey');
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> {noformat}
> These two queries work as expected, and return the information that we actually stored.
> However, when we "filter" for certain columns, the brokenkey starts behaving very strangely:
> {noformat}
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/workingkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
>             metadata
> (2 rows)
> ***  As expected, querying for more information doesn't really matter for the working key ***
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'));
>  blobAsText(column1)
> ---------------------
>             bucket/0
> (1 rows)
> *** Cassandra stops giving us the metadata column when asking for a few more columns! ***
> cqlsh:AppBrain> select blobAsText(column1) from "GroupedSeries" where key=textAsBlob('install/brokenkey') and column1 IN (textAsBlob('metadata'),textAsBlob('bucket/0'),textAsBlob('bucket/1'),textAsBlob('bucket/2'),textAsBlob('asdfasdfasdf'));
>  key | column1 | value
> -----+---------+-------
> (0 rows)
> *** Adding the bogus column name even makes it return nothing from this row anymore! ***
> {noformat}
> There are at least two rows in our table that malfunction like this (the table is quite old and has gone through a number of Cassandra upgrades). I've upgraded our whole cluster to 2.1.5 (we were on 2.1.2 when I discovered this problem) and compacted, repaired, and scrubbed this column family, none of which has helped.
> Our table structure is:
> {noformat}
> cqlsh:AppBrain> describe table "GroupedSeries";
> CREATE TABLE "AppBrain"."GroupedSeries" (
>     key blob,
>     column1 blob,
>     value blob,
>     PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE
>     AND CLUSTERING ORDER BY (column1 ASC)
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 1.0
>     AND speculative_retry = 'NONE';
> {noformat}
> Let me know if I can give more information that may be helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)