Posted to commits@cassandra.apache.org by "Bill Mitchell (JIRA)" <ji...@apache.org> on 2014/05/26 22:54:02 UTC

[jira] [Comment Edited] (CASSANDRA-6875) CQL3: select multiple CQL rows in a single partition using IN

    [ https://issues.apache.org/jira/browse/CASSANDRA-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009075#comment-14009075 ] 

Bill Mitchell edited comment on CASSANDRA-6875 at 5/26/14 8:52 PM:
-------------------------------------------------------------------

To try this out, I cobbled together a test case that uses TupleType directly on the client side, as this feature is not yet supported in the Java driver.  My approach was to serialize my two ordering column values, use TupleType.buildValue() to concatenate them into a single ByteBuffer, collect these tuples into a List, serialize that List with a ListType<ByteBuffer> instance to get a single ByteBuffer representing the entire list, and bind the result using setBytesUnsafe().  I'm not totally sure this is the right approach, but it seems reasonable.  
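
As a rough illustration of the kind of byte layout this produces, the sketch below builds the tuple and list buffers by hand using only java.nio.ByteBuffer.  It does not call TupleType or ListType.  The layouts are assumptions, not taken from the Cassandra source: a 4-byte length before each tuple component, and a 2-byte element count followed by a 2-byte length before each list element.  The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the byte layouts are assumptions, not the Cassandra implementation.
final class TupleListEncodingSketch
{
    // Encode an int as a 4-byte big-endian buffer (position 0, limit 4).
    static ByteBuffer intBytes(int v)
    {
        return ByteBuffer.allocate(4).putInt(0, v);
    }

    // Assumed tuple layout: each component is a 4-byte length followed by its bytes.
    static ByteBuffer packTuple(ByteBuffer... components)
    {
        int size = 0;
        for (ByteBuffer c : components)
            size += 4 + c.remaining();
        ByteBuffer out = ByteBuffer.allocate(size);
        for (ByteBuffer c : components)
        {
            out.putInt(c.remaining());
            out.put(c.duplicate()); // duplicate() leaves the caller's position untouched
        }
        out.flip();
        return out;
    }

    // Assumed list layout: 2-byte element count, then a 2-byte length before each element.
    static ByteBuffer packList(List<ByteBuffer> elements)
    {
        int size = 2;
        for (ByteBuffer e : elements)
            size += 2 + e.remaining();
        ByteBuffer out = ByteBuffer.allocate(size);
        out.putShort((short) elements.size());
        for (ByteBuffer e : elements)
        {
            out.putShort((short) e.remaining());
            out.put(e.duplicate());
        }
        out.flip();
        return out;
    }

    public static void main(String[] args)
    {
        // The (c1, c2) pairs (0, 0) and (1, 1) from the issue description.
        List<ByteBuffer> tuples = new ArrayList<>();
        tuples.add(packTuple(intBytes(0), intBytes(0)));
        tuples.add(packTuple(intBytes(1), intBytes(1)));
        ByteBuffer bound = packList(tuples);
        System.out.println("list value is " + bound.remaining() + " bytes");
    }
}
```

The single resulting buffer is what would be bound to the one ? marker, which is why the statement only needs to be prepared once.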

My SELECT statement syntax followed the first of the three forms Tyler suggested: ... WHERE (c1, c2) IN ?, as this allows the statement to be prepared only once, irrespective of the number of compound keys provided.  

What I saw was the following traceback on the server:
14/05/26 14:33:09 ERROR messages.ErrorMessage: Unexpected exception during request
java.util.NoSuchElementException
	at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:396)
	at java.util.LinkedHashMap$ValueIterator.next(LinkedHashMap.java:409)
	at org.apache.cassandra.cql3.statements.SelectStatement.buildMultiColumnInBound(SelectStatement.java:941)
	at org.apache.cassandra.cql3.statements.SelectStatement.buildBound(SelectStatement.java:814)
	at org.apache.cassandra.cql3.statements.SelectStatement.getRequestedBound(SelectStatement.java:977)
	at org.apache.cassandra.cql3.statements.SelectStatement.makeFilter(SelectStatement.java:444)
	at org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:340)
	at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:210)
	at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:61)
	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:158)
	at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:309)
	at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:132)
	at org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)

Stepping through the code, I found that the server appears to have analyzed my statement correctly.  In buildMultiColumnInBound, splitInValues contains 1426 tuples, which is the number I intended to pass.  The names parameter identifies two columns, createdate and emailcrypt.  The loop over splitInValues completes two iterations, each calling next() on the same names iterator, so on the third iteration the iterator has no more elements, hence the NoSuchElementException. 

Moving the construction of the iterator inside the loop fixed the exception.  The code still looks suspect, though: it calculates a bound b based on whether the first column is reversed, but then uses bound, not b, in the following statement.  I have not researched which would be correct, as this appears closely related to the fix Sylvain just developed for CASSANDRA-7105.  In my test case, where the columns were declared as DESC, the code as written did return all the expected rows. 

{code}
        TreeSet<ByteBuffer> inValues = new TreeSet<>(isReversed ? cfDef.cfm.comparator.reverseComparator : cfDef.cfm.comparator);
        for (List<ByteBuffer> components : splitInValues)
        {
            ColumnNameBuilder nameBuilder = builder.copy();
            for (ByteBuffer component : components)
                nameBuilder.add(component);

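            // Iterator now created per tuple; when built once outside the loop it ran out after names.size() tuples.
            Iterator<CFDefinition.Name> iter = names.iterator();
            // Suspect: b is computed here, but the next line tests bound rather than b.
            Bound b = isReversed == isReversedType(iter.next()) ? bound : Bound.reverse(bound);
            inValues.add((bound == Bound.END && nameBuilder.remainingCount() > 0) ? nameBuilder.buildAsEndOfRange() : nameBuilder.build());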
        }
        return new ArrayList<>(inValues);
{code}  
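
The iterator problem described above can be reproduced in isolation.  The following is a minimal stand-alone sketch, not the Cassandra code: the class and method names are hypothetical, and a LinkedHashMap with two entries stands in for the names parameter.  Sharing one iterator across all passes of the loop fails as soon as there are more tuples than columns, while creating it inside the loop does not.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-alone demo; none of these names come from Cassandra.
final class IteratorReuseDemo
{
    // Failing shape: one iterator shared by every pass of the loop.
    static int sharedIterator(Map<String, Boolean> names, List<String> tuples)
    {
        int processed = 0;
        Iterator<Boolean> iter = names.values().iterator();
        for (String tuple : tuples)
        {
            iter.next(); // throws NoSuchElementException once names is exhausted
            processed++;
        }
        return processed;
    }

    // Workaround shape: a fresh iterator per pass, so next() always returns the first column.
    static int freshIterator(Map<String, Boolean> names, List<String> tuples)
    {
        int processed = 0;
        for (String tuple : tuples)
        {
            Iterator<Boolean> iter = names.values().iterator();
            iter.next();
            processed++;
        }
        return processed;
    }

    public static void main(String[] args)
    {
        Map<String, Boolean> names = new LinkedHashMap<>();
        names.put("createdate", true); // the value stands in for "is reversed"
        names.put("emailcrypt", true);
        List<String> tuples = Arrays.asList("t1", "t2", "t3");

        try
        {
            sharedIterator(names, tuples);
        }
        catch (NoSuchElementException e)
        {
            System.out.println("shared iterator ran out of elements");
        }
        System.out.println("fresh iterator processed " + freshIterator(names, tuples) + " tuples");
    }
}
```

Since only the first element is ever needed, reading it once before the loop would avoid the failure equally well.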



> CQL3: select multiple CQL rows in a single partition using IN
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-6875
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6875
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>            Reporter: Nicolas Favre-Felix
>            Assignee: Tyler Hobbs
>            Priority: Minor
>             Fix For: 2.0.9, 2.1 rc1
>
>
> In the spirit of CASSANDRA-4851 and to bring CQL to parity with Thrift, it is important to support reading several distinct CQL rows from a given partition using a distinct set of "coordinates" for these rows within the partition.
> CASSANDRA-4851 introduced a range scan over the multi-dimensional space of clustering keys. We also need to support a "multi-get" of CQL rows, potentially using the "IN" keyword to define a set of clustering keys to fetch at once.
> (reusing the same example:)
> Consider the following table:
> {code}
> CREATE TABLE test (
>   k int,
>   c1 int,
>   c2 int,
>   PRIMARY KEY (k, c1, c2)
> );
> {code}
> with the following data:
> {code}
>  k | c1 | c2
> ---+----+----
>  0 |  0 |  0
>  0 |  0 |  1
>  0 |  1 |  0
>  0 |  1 |  1
> {code}
> We can fetch a single row or a range of rows, but not a set of them:
> {code}
> > SELECT * FROM test WHERE k = 0 AND (c1, c2) IN ((0, 0), (1,1)) ;
> Bad Request: line 1:54 missing EOF at ','
> {code}
> Supporting this syntax would return:
> {code}
>  k | c1 | c2
> ---+----+----
>  0 |  0 |  0
>  0 |  1 |  1
> {code}
> Being able to fetch these two CQL rows in a single read is important to maintain partition-level isolation.


