You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Edouard COLE <Ed...@rgsystem.com> on 2016/02/04 09:36:40 UTC

Duplicated key with an IN statement

Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
            ...     key int,
            ...     value int,
            ...     PRIMARY KEY (key)
            ... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-----+-------
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-----+-------
 123 |   456
 123 |   456 <----- WTF?

(2 rows)

Adding multiple time the same key into an IN statement make the query returns multiple time the tuple

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE

RE: Duplicated key with an IN statement

Posted by Edouard COLE <Ed...@rgsystem.com>.

Thanks :)

De : Robert Wille [mailto:rwille@fold3.com]
Envoyé : Thursday, February 04, 2016 4:37 PM
À : user@cassandra.apache.org
Objet : Re: Duplicated key with an IN statement

You shouldn't be using IN anyway. It is better to issue multiple queries, each for a single key, and issue them in parallel. Better performance. Less GC pressure.

On Feb 4, 2016, at 7:54 AM, Sylvain Lebresne <sy...@datastax.com>> wrote:


That behavior has been changed in 2.2 and upwards. If you don't like it, upgrade. In the meantime, it's probably not hard to avoid passing duplicate keys in IN.

On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE <Ed...@rgsystem.com>> wrote:
Hello,

When running that kind of query with TRACING ON; I noticed the coordinator is also performing multiple time the same query

Because the element in the IN statement can involve many nodes, it makes sense to map/reduce the query, but running multiple time the same sub query should not happen. What if the result set change? Let's imagine that query : SELECT * FROM t WHERE key IN (123, 123, .... X1000, 123), and while this query runs, the data for 123 change?

key | value
-----+-------
123 |   456
123 |   456
 123 |   456
 123 |   789 <-- Change here :(
123 |   789


There's also something very important: when your table define a tuple being unique for a specific key, this is a real problem to be able to have a result set having multiple time the same key, which should be unique. This is why on every SQL implementation, this is not happening

I think this is a bug

Edouard COLE


De : Alain RODRIGUEZ [mailto:arodrime@gmail.com<ma...@gmail.com>]
Envoyé : Thursday, February 04, 2016 11:55 AM
À : Edouard COLE
Cc : user@cassandra.apache.org<ma...@cassandra.apache.org>
Objet : Re: Duplicated key with an IN statement

Hi,

This is interesting.

It seems rational that if you are looking at 2 keys and both exist (which is the case) it returns you 2 keys, it. Yet, I just checked this kind of command on MySQL and it gives a one line result. So here CQL differs from SQL (at least MySQL). I know we are trying to fit as much as possible with SQL to avoid loosing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure someone ever tried to do this kind of query actually :).

Does anyone know about that ? Should we raise a ticket ?

-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com<http://www.thelastpickle.com/>



2016-02-04 8:36 GMT+00:00 Edouard COLE <Ed...@rgsystem.com>>:
Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160<http://192.168.10.8:9160/>.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
            ...     key int,
            ...     value int,
            ...     PRIMARY KEY (key)
            ... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-----+-------
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-----+-------
 123 |   456
 123 |   456 <----- WTF?

(2 rows)

Adding multiple time the same key into an IN statement make the query returns multiple time the tuple

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE

Re: Duplicated key with an IN statement

Posted by Robert Wille <rw...@fold3.com>.

You shouldn’t be using IN anyway. It is better to issue multiple queries, each for a single key, and issue them in parallel. Better performance. Less GC pressure.

On Feb 4, 2016, at 7:54 AM, Sylvain Lebresne <sy...@datastax.com>> wrote:

That behavior has been changed in 2.2 and upwards. If you don't like it, upgrade. In the meantime, it's probably not hard to avoid passing duplicate keys in IN.

On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE <Ed...@rgsystem.com>> wrote:
Hello,

When running that kind of query with TRACING ON; I noticed the coordinator is also performing multiple time the same query

Because the element in the IN statement can involve many nodes, it makes sense to map/reduce the query, but running multiple time the same sub query should not happen. What if the result set change? Let’s imagine that query : SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while this query runs, the data for 123 change?

key | value
-----+-------
123 |   456
123 |   456
 123 |   456
 123 |   789 <-- Change here :(
123 |   789


There’s also something very important: when your table define a tuple being unique for a specific key, this is a real problem to be able to have a result set having multiple time the same key, which should be unique. This is why on every SQL implementation, this is not happening

I think this is a bug

Edouard COLE


De : Alain RODRIGUEZ [mailto:arodrime@gmail.com<ma...@gmail.com>]
Envoyé : Thursday, February 04, 2016 11:55 AM
À : Edouard COLE
Cc : user@cassandra.apache.org<ma...@cassandra.apache.org>
Objet : Re: Duplicated key with an IN statement

Hi,

This is interesting.

It seems rational that if you are looking at 2 keys and both exist (which is the case) it returns you 2 keys, it. Yet, I just checked this kind of command on MySQL and it gives a one line result. So here CQL differs from SQL (at least MySQL). I know we are trying to fit as much as possible with SQL to avoid loosing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure someone ever tried to do this kind of query actually :).

Does anyone know about that ? Should we raise a ticket ?

-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com<http://www.thelastpickle.com/>



2016-02-04 8:36 GMT+00:00 Edouard COLE <Ed...@rgsystem.com>>:
Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160<http://192.168.10.8:9160/>.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
            ...     key int,
            ...     value int,
            ...     PRIMARY KEY (key)
            ... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-----+-------
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-----+-------
 123 |   456
 123 |   456 <----- WTF?

(2 rows)

Adding multiple time the same key into an IN statement make the query returns multiple time the tuple

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE

Re: Duplicated key with an IN statement

Posted by Tyler Hobbs <ty...@datastax.com>.

On Thu, Feb 4, 2016 at 9:57 AM, Jack Krupansky <ja...@gmail.com>
wrote:

> there's a bug in CHANGES.TXT for this issue. It says: "Duplicate rows
> returned when in clause has repeated values (CASSANDRA-6707)", but the
> issue number is really 6706.
>

Thanks, I've fixed this.


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Duplicated key with an IN statement

Posted by Jack Krupansky <ja...@gmail.com>.

Sylvain, there's a bug in CHANGES.TXT for this issue. It says: "Duplicate
rows returned when in clause has repeated values (CASSANDRA-6707)", but the
issue number is really 6706.

-- Jack Krupansky

On Thu, Feb 4, 2016 at 9:54 AM, Sylvain Lebresne <sy...@datastax.com>
wrote:

> That behavior has been changed in 2.2 and upwards. If you don't like it,
> upgrade. In the meantime, it's probably not hard to avoid passing duplicate
> keys in IN.
>
> On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE <Ed...@rgsystem.com>
> wrote:
>
>> Hello,
>>
>>
>>
>> When running that kind of query with TRACING ON; I noticed the
>> coordinator is also performing multiple time the same query
>>
>>
>>
>> Because the element in the IN statement can involve many nodes, it makes
>> sense to map/reduce the query, but running multiple time the same sub query
>> should not happen. What if the result set change? Let’s imagine that query
>> : SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while this
>> query runs, the data for 123 change?
>>
>>
>>
>> key | value
>>
>> -----+-------
>>
>> 123 |   456
>>
>> 123 |   456
>>
>>  123 |   456
>>
>>  123 |   789 <-- Change here L
>>
>> 123 |   789
>>
>>
>>
>>
>>
>> There’s also something very important: when your table define a tuple
>> being unique for a specific key, this is a real problem to be able to have
>> a result set having multiple time the same key, which should be unique.
>> This is why on every SQL implementation, this is not happening
>>
>>
>>
>> I think this is a bug
>>
>>
>>
>> Edouard COLE
>>
>>
>>
>>
>>
>> *De :* Alain RODRIGUEZ [mailto:arodrime@gmail.com]
>> *Envoyé :* Thursday, February 04, 2016 11:55 AM
>> *À :* Edouard COLE
>> *Cc :* user@cassandra.apache.org
>> *Objet :* Re: Duplicated key with an IN statement
>>
>>
>>
>> Hi,
>>
>>
>>
>> This is interesting.
>>
>>
>>
>> It seems rational that if you are looking at 2 keys and both exist (which
>> is the case) it returns you 2 keys, it. Yet, I just checked this kind of
>> command on MySQL and it gives a one line result. So here CQL differs from
>> SQL (at least MySQL). I know we are trying to fit as much as possible with
>> SQL to avoid loosing people, so we might want to change this.
>>
>> Not sure if this behavior is intentional / known. Not even sure someone
>> ever tried to do this kind of query actually :).
>>
>>
>>
>> Does anyone know about that ? Should we raise a ticket ?
>>
>>
>>
>> -----------------
>>
>> Alain Rodriguez
>>
>> France
>>
>>
>>
>> The Last Pickle
>>
>> http://www.thelastpickle.com
>>
>>
>>
>>
>>
>>
>>
>> 2016-02-04 8:36 GMT+00:00 Edouard COLE <Ed...@rgsystem.com>:
>>
>> Hello,
>>
>> I just discovered this, and I think this is weird:
>>
>> ed@debian:~$ cqlsh 192.168.10.8
>> Connected to _CLUSTER_ at 192.168.10.8:9160.
>> [cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol
>> 19.39.0]
>> Use HELP for help.
>> cqlsh> USE ks-test ;
>> cqlsh:ks-test> CREATE TABLE t (
>>             ...     key int,
>>             ...     value int,
>>             ...     PRIMARY KEY (key)
>>             ... );
>> cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
>> cqlsh:ks-test> SELECT * FROM t ;
>>
>>  key | value
>> -----+-------
>>  123 |   456
>>
>> (1 rows)
>>
>> cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);
>>
>>  key | value
>> -----+-------
>>  123 |   456
>>  123 |   456 <----- WTF?
>>
>> (2 rows)
>>
>> Adding multiple time the same key into an IN statement make the query
>> returns multiple time the tuple
>>
>> This looks weird to me, can anyone give me some feedback on such a
>> behavior?
>>
>> Edouard COLE
>>
>>
>>
>
>

Re: Duplicated key with an IN statement

Posted by Sylvain Lebresne <sy...@datastax.com>.

That behavior has been changed in 2.2 and upwards. If you don't like it,
upgrade. In the meantime, it's probably not hard to avoid passing duplicate
keys in IN.

On Thu, Feb 4, 2016 at 3:48 PM, Edouard COLE <Ed...@rgsystem.com>
wrote:

> Hello,
>
>
>
> When running that kind of query with TRACING ON; I noticed the coordinator
> is also performing multiple time the same query
>
>
>
> Because the element in the IN statement can involve many nodes, it makes
> sense to map/reduce the query, but running multiple time the same sub query
> should not happen. What if the result set change? Let’s imagine that query
> : SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while this
> query runs, the data for 123 change?
>
>
>
> key | value
>
> -----+-------
>
> 123 |   456
>
> 123 |   456
>
>  123 |   456
>
>  123 |   789 <-- Change here L
>
> 123 |   789
>
>
>
>
>
> There’s also something very important: when your table define a tuple
> being unique for a specific key, this is a real problem to be able to have
> a result set having multiple time the same key, which should be unique.
> This is why on every SQL implementation, this is not happening
>
>
>
> I think this is a bug
>
>
>
> Edouard COLE
>
>
>
>
>
> *De :* Alain RODRIGUEZ [mailto:arodrime@gmail.com]
> *Envoyé :* Thursday, February 04, 2016 11:55 AM
> *À :* Edouard COLE
> *Cc :* user@cassandra.apache.org
> *Objet :* Re: Duplicated key with an IN statement
>
>
>
> Hi,
>
>
>
> This is interesting.
>
>
>
> It seems rational that if you are looking at 2 keys and both exist (which
> is the case) it returns you 2 keys, it. Yet, I just checked this kind of
> command on MySQL and it gives a one line result. So here CQL differs from
> SQL (at least MySQL). I know we are trying to fit as much as possible with
> SQL to avoid loosing people, so we might want to change this.
>
> Not sure if this behavior is intentional / known. Not even sure someone
> ever tried to do this kind of query actually :).
>
>
>
> Does anyone know about that ? Should we raise a ticket ?
>
>
>
> -----------------
>
> Alain Rodriguez
>
> France
>
>
>
> The Last Pickle
>
> http://www.thelastpickle.com
>
>
>
>
>
>
>
> 2016-02-04 8:36 GMT+00:00 Edouard COLE <Ed...@rgsystem.com>:
>
> Hello,
>
> I just discovered this, and I think this is weird:
>
> ed@debian:~$ cqlsh 192.168.10.8
> Connected to _CLUSTER_ at 192.168.10.8:9160.
> [cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol
> 19.39.0]
> Use HELP for help.
> cqlsh> USE ks-test ;
> cqlsh:ks-test> CREATE TABLE t (
>             ...     key int,
>             ...     value int,
>             ...     PRIMARY KEY (key)
>             ... );
> cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
> cqlsh:ks-test> SELECT * FROM t ;
>
>  key | value
> -----+-------
>  123 |   456
>
> (1 rows)
>
> cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);
>
>  key | value
> -----+-------
>  123 |   456
>  123 |   456 <----- WTF?
>
> (2 rows)
>
> Adding multiple time the same key into an IN statement make the query
> returns multiple time the tuple
>
> This looks weird to me, can anyone give me some feedback on such a
> behavior?
>
> Edouard COLE
>
>
>

RE: Duplicated key with an IN statement

Posted by Edouard COLE <Ed...@rgsystem.com>.

Hello,

When running that kind of query with TRACING ON; I noticed the coordinator is also performing multiple time the same query

Because the element in the IN statement can involve many nodes, it makes sense to map/reduce the query, but running multiple time the same sub query should not happen. What if the result set change? Let’s imagine that query : SELECT * FROM t WHERE key IN (123, 123, …. X1000, 123), and while this query runs, the data for 123 change?

key | value
-----+-------
123 |   456
123 |   456
 123 |   456
 123 |   789 <-- Change here ☹
123 |   789


There’s also something very important: when your table define a tuple being unique for a specific key, this is a real problem to be able to have a result set having multiple time the same key, which should be unique. This is why on every SQL implementation, this is not happening

I think this is a bug

Edouard COLE


De : Alain RODRIGUEZ [mailto:arodrime@gmail.com]
Envoyé : Thursday, February 04, 2016 11:55 AM
À : Edouard COLE
Cc : user@cassandra.apache.org
Objet : Re: Duplicated key with an IN statement

Hi,

This is interesting.

It seems rational that if you are looking at 2 keys and both exist (which is the case) it returns you 2 keys, it. Yet, I just checked this kind of command on MySQL and it gives a one line result. So here CQL differs from SQL (at least MySQL). I know we are trying to fit as much as possible with SQL to avoid loosing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure someone ever tried to do this kind of query actually :).

Does anyone know about that ? Should we raise a ticket ?

-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com



2016-02-04 8:36 GMT+00:00 Edouard COLE <Ed...@rgsystem.com>>:
Hello,

I just discovered this, and I think this is weird:

ed@debian:~$ cqlsh 192.168.10.8
Connected to _CLUSTER_ at 192.168.10.8:9160<http://192.168.10.8:9160>.
[cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> USE ks-test ;
cqlsh:ks-test> CREATE TABLE t (
            ...     key int,
            ...     value int,
            ...     PRIMARY KEY (key)
            ... );
cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
cqlsh:ks-test> SELECT * FROM t ;

 key | value
-----+-------
 123 |   456

(1 rows)

cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);

 key | value
-----+-------
 123 |   456
 123 |   456 <----- WTF?

(2 rows)

Adding multiple time the same key into an IN statement make the query returns multiple time the tuple

This looks weird to me, can anyone give me some feedback on such a behavior?

Edouard COLE

Re: Duplicated key with an IN statement

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

Hi,

This is interesting.

It seems rational that if you are looking at 2 keys and both exist (which
is the case) it returns you 2 keys, it. Yet, I just checked this kind of
command on MySQL and it gives a one line result. So here CQL differs from
SQL (at least MySQL). I know we are trying to fit as much as possible with
SQL to avoid loosing people, so we might want to change this.
Not sure if this behavior is intentional / known. Not even sure someone
ever tried to do this kind of query actually :).

Does anyone know about that ? Should we raise a ticket ?

-----------------
Alain Rodriguez
France

The Last Pickle
http://www.thelastpickle.com



2016-02-04 8:36 GMT+00:00 Edouard COLE <Ed...@rgsystem.com>:

> Hello,
>
> I just discovered this, and I think this is weird:
>
> ed@debian:~$ cqlsh 192.168.10.8
> Connected to _CLUSTER_ at 192.168.10.8:9160.
> [cqlsh 4.0.1 | Cassandra 2.0.14.459 | CQL spec 3.1.1 | Thrift protocol
> 19.39.0]
> Use HELP for help.
> cqlsh> USE ks-test ;
> cqlsh:ks-test> CREATE TABLE t (
>             ...     key int,
>             ...     value int,
>             ...     PRIMARY KEY (key)
>             ... );
> cqlsh:ks-test> INSERT INTO t (key, value) VALUES (123, 456) ;
> cqlsh:ks-test> SELECT * FROM t ;
>
>  key | value
> -----+-------
>  123 |   456
>
> (1 rows)
>
> cqlsh:ks-test> SELECT * FROM t WHERE key IN (123, 123);
>
>  key | value
> -----+-------
>  123 |   456
>  123 |   456 <----- WTF?
>
> (2 rows)
>
> Adding multiple time the same key into an IN statement make the query
> returns multiple time the tuple
>
> This looks weird to me, can anyone give me some feedback on such a
> behavior?
>
> Edouard COLE
>
>