Posted to user@cassandra.apache.org by Johannes Hoerle <jo...@yoochoose.com> on 2011/03/14 16:17:27 UTC

problems while TimeUUIDType-index-querying with two expressions

Hi all,

In order to improve our queries, we started to use IndexedSlicesQuery from the Hector project (https://github.com/zznate/hector-examples). I followed the instructions for creating an IndexedSlicesQuery with GetIndexedSlices.java.
I created the corresponding CF within a keyspace called "Keyspace1" ("create keyspace Keyspace1;") with:
"create column family Indexed1 with column_type='Standard' and comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and rows_cached=20000 and column_metadata=[{column_name: birthdate, validation_class: LongType, index_name: dateIndex, index_type: KEYS},{column_name: birthmonth, validation_class: LongType, index_name: monthIndex, index_type: KEYS}];"
and the example GetIndexedSlices.java worked fine.

Output of CF Indexed1:
---------------------------------------
[default@Keyspace1] list Indexed1;
Using default limit of 100
-------------------
RowKey: fake_key_12
=> (column=birthdate, value=1974, timestamp=1300110485826059)
=> (column=birthmonth, value=0, timestamp=1300110485826060)
=> (column=fake_column_0, value=66616b655f76616c75655f305f3132, timestamp=1300110485826056)
=> (column=fake_column_1, value=66616b655f76616c75655f315f3132, timestamp=1300110485826057)
=> (column=fake_column_2, value=66616b655f76616c75655f325f3132, timestamp=1300110485826058)
-------------------
RowKey: fake_key_8
=> (column=birthdate, value=1974, timestamp=1300110485826039)
=> (column=birthmonth, value=8, timestamp=1300110485826040)
=> (column=fake_column_0, value=66616b655f76616c75655f305f38, timestamp=1300110485826036)
=> (column=fake_column_1, value=66616b655f76616c75655f315f38, timestamp=1300110485826037)
=> (column=fake_column_2, value=66616b655f76616c75655f325f38, timestamp=1300110485826038)
-------------------
....


Now to the problem:
As we have another column format in our cluster (using TimeUUIDType as the comparator in the CF definition), I adapted the application to our schema on a cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator ID index (00000000-0000-1000-0000-000000000000) and another one for a user ID index (00000001-0000-1000-0000-000000000000). The column family can be created with:
"create column family ByUser with column_type='Standard' and comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0 and rows_cached=20000 and column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS}, {column_name: 00000001-0000-1000-0000-000000000000, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];"


In cassandra-cli, the resulting schema and data look like this:

[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: ByUser
      Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
      Column Metadata:
        Column Name: 00000001-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: useridIndex
          Index Type: KEYS
        Column Name: 00000000-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: mandatorIndex
          Index Type: KEYS
    ColumnFamily: Indexed1
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
      Column Metadata:
        Column Name: birthmonth (birthmonth)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: monthIndex
          Index Type: KEYS
        Column Name: birthdate (birthdate)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: dateIndex
          Index Type: KEYS
[default@Keyspace1] list ByUser;
Using default limit of 100
-------------------
RowKey: testMandator!!user01
=> (column=00000000-0000-1000-0000-000000000000, value=746573744d616e6461746f72, timestamp=1300111213321000)
=> (column=00000001-0000-1000-0000-000000000000, value=757365723031, timestamp=1300111213322000)
=> (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135, timestamp=1300111213561000)

1 Row Returned.

The values of the index columns 00000000-0000-1000-0000-000000000000 and 00000001-0000-1000-0000-000000000000 represent "testMandator" and "user01" as bytes.
The third column is a randomly generated one with value "15" that is inserted by the GetTimeUUIDIndexedSlices app.
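
To check the hex values in the cassandra-cli listing by hand, a small decoder is enough. This is only an illustrative sketch: the class and method names are made up for this example and are not part of Hector or Cassandra.

```java
import java.nio.charset.StandardCharsets;

final class HexValueDecoder {
    /** Turns a hex string (as printed by cassandra-cli for BytesType values) into raw bytes. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Two hex digits encode one byte.
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Interprets the decoded bytes as UTF-8 text. */
    static String asUtf8(String hex) {
        return new String(fromHex(hex), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(asUtf8("746573744d616e6461746f72")); // mandator index column
        System.out.println(asUtf8("757365723031"));             // userid index column
        System.out.println(asUtf8("3135"));                     // randomly named third column
    }
}
```

The three calls in `main` decode the three column values of the row `testMandator!!user01` shown above.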
I attached both source files, GetIndexedSlices and GetTimeUUIDIndexedSlices. Currently the second index expression, for the userid index in the GetTimeUUIDIndexedSlices.queryCf(...) method,

        indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new StringSerializer().toBytes(mandator));
        //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new StringSerializer().toBytes(dummyUserId));

is commented out so that GetTimeUUIDIndexedSlices will run. Using one index expression works perfectly fine, but as soon as I add a second eq, gt, gte or lt expression I get an IndexOutOfBoundsException (see below).
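
The asByteArray helper called in the snippet above lives in the attached source and is not reproduced in this message. A minimal sketch of what such a helper might look like, assuming it simply serializes the UUID as 16 big-endian bytes (the class name and the reverse helper are hypothetical):

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidBytes {
    /** Serializes a UUID as 16 bytes: most significant long first, then least significant long. */
    static byte[] asByteArray(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Reverse operation, useful for checking the round trip. */
    static UUID fromByteArray(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("a UUID needs exactly 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSignificant = buf.getLong();
        long leastSignificant = buf.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID useridIndex = UUID.fromString("00000001-0000-1000-0000-000000000000");
        byte[] name = asByteArray(useridIndex);
        System.out.println(name.length);              // column name length in bytes
        System.out.println(useridIndex.version());    // UUID version of the manually defined name
        System.out.println(fromByteArray(name).equals(useridIndex));
    }
}
```

Both manually defined index column names carry the version nibble 1, so they are well-formed names for a TimeUUIDType comparator; the column values, by contrast, are arbitrary byte strings.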

This issue can be easily reproduced by
- downloading the zznate example (https://github.com/zznate/hector-examples),
- mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
- importing it in eclipse and
- letting it run against a locally running cassandra instance (v0.7.3) which has the default settings (no changes in the .yaml)

I hope that someone can help me with this issue ... after a couple of days it's driving me bonkers.

Thx in advance,
Johannes


Exception:
ERROR 14:47:56,842 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IndexOutOfBoundsException: 6
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
        ... 4 more
ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

Re: AW: problems while TimeUUIDType-index-querying with two expressions

Posted by Aaron Morton <aa...@thelastpickle.com>.

Good work.

Aaron

On 17/03/2011, at 4:37 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Thanks for tracking that down, Roland.  I've created
> https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this.
> 
> On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude <ro...@yoochoose.com> wrote:
>> I applied the suggested changes to my local source tree and ran all my
>> test cases (the supplied ones as well as those with real data).
>> 
>> They work now.
>> 
>> 
>> 
>> From: Roland Gude [mailto:roland.gude@yoochoose.com]
>> Sent: Wednesday, 16 March 2011 16:29
>> To: user@cassandra.apache.org
>> Subject: AW: AW: problems while TimeUUIDType-index-querying with two expressions
>> 
>> 
>> 
>> While debugging, I found something that might be the issue (please
>> correct me if I am wrong):
>> 
>> ColumnFamilyStore.java, lines 1597 to 1613, contains the code that checks
>> whether a column satisfies an index expression.
>> 
>> In line 1608 it compares the value of the stored column with the value
>> given in the expression.
>> 
>> For this comparison it uses the comparator of the column family (which
>> orders column names), while it should use the comparator of the column's
>> validation class (which describes column values).
>> 
>> 
>> 
>>     private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
>>     {
>>         for (IndexExpression expression : clause.expressions)
>>         {
>>             // (we can skip "first" since we already know it's satisfied)
>>             if (expression == first)
>>                 continue;
>>             // check column data vs expression
>>             IColumn column = data.getColumn(expression.column_name);
>>             if (column == null)
>>                 return false;
>>             int v = data.getComparator().compare(column.value(), expression.value);
>>             if (!satisfies(v, expression.op))
>>                 return false;
>>         }
>>         return true;
>>     }
>> 
>> 
>> 
>> 
>> 
>> Line 1608 should be changed from:
>> 
>>             int v = data.getComparator().compare(column.value(), expression.value);
>> 
>> to
>> 
>>             int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> greetings roland
>> 
>> 
>> 
>> 
>> 
>> From: Roland Gude [mailto:roland.gude@yoochoose.com]
>> Sent: Wednesday, 16 March 2011 14:50
>> To: user@cassandra.apache.org
>> Subject: AW: AW: problems while TimeUUIDType-index-querying with two expressions
>> 
>> 
>> 
>> Hi Aaron,
>> 
>> 
>> 
>> Now I am completely confused.
>> 
>> The code that did not work for days now works, as if by a miracle, even
>> against the unpatched Cassandra 0.7.3, but the test case still does not.
>> 
>> There seems to be some randomness in whether it works or not (which is a bad
>> sign, I think). I will debug a little deeper into this and report anything I
>> find.
>> 
>> 
>> 
>> Greetings,
>> 
>> roland
>> 
>> 
>> 
>> From: aaron morton [mailto:aaron@thelastpickle.com]
>> Sent: Wednesday, 16 March 2011 01:15
>> To: user@cassandra.apache.org
>> Subject: Re: AW: problems while TimeUUIDType-index-querying with two expressions
>> 
>> 
>> 
>> I have attached a patch
>> to https://issues.apache.org/jira/browse/CASSANDRA-2328
>> 
>> Can you give it a try? You should now get an InvalidRequestException when
>> you send an invalid name or value in the query expression.
>> 
>> 
>> 
>> Aaron
>> 
>> 
>> 
>> On 16 Mar 2011, at 10:30, aaron morton wrote:
>> 
>> 
>> 
>> I will have the Jira issue I created finished soon. It's a legitimate issue: we
>> should be validating the column names and values when a get_indexed_slices()
>> request is sent. The error in your original email shows that.
>> 
>> WRT your code example: you are using the TimeUUID validator for the column
>> name when creating the index expression, but a string serialiser
>> for the value...
>> 
>> IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
>>         .createIndexedSlicesQuery(keyspace, stringSerializer, UUID_SERIALIZER, stringSerializer);
>> indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
>> 
>> But your schema says it is a bytes type...
>> 
>> column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000,
>> validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
>> {column_name: 00000001-0000-1000-0000-000000000000, validation_class:
>> BytesType, index_name: useridIndex, index_type: KEYS}];
>> 
>> Once I have the patch, can you apply it and run your test again?
>> 
>> You may also want to ask on the Hector list whether it automatically checks
>> that you are using the correct types when creating an IndexedSlicesQuery.
>> 
>> Aaron
>> 
>> On 15 Mar 2011, at 22:41, Roland Gude wrote:
>> 
>> 
>> 
>> Forgot to attach the source code… here it comes
>> 
>> 
>> 
>> From: Roland Gude [mailto:roland.gude@yoochoose.com]
>> Sent: Tuesday, 15 March 2011 10:39
>> To: user@cassandra.apache.org
>> Subject: AW: problems while TimeUUIDType-index-querying with two expressions
>> 
>> 
>> 
>> Actually, it's not the column values that should be UUIDs in our case, but the
>> column names. The CF uses TimeUUID ordering and the values are just
>> byte arrays. Even after changing the code to use UUIDSerializer instead of
>> serializing the UUIDs manually, the issue still exists.
>> 
>> As far as I can see, there is nothing wrong with the IndexExpression.
>> 
>> Using two index expressions with name=TimeUUID and value=anything does not
>> work.
>> 
>> Using one index expression (either of the two) alone works fine.
>> 
>> I refactored Johannes' code into a JUnit test case. It needs the cluster
>> configured as described in Johannes' mail.
>> 
>> There are three cases: two with one of the index expressions each, and one with
>> both index expressions. The one with both index expressions will never finish,
>> and you will see the exception in the Cassandra logs.
>> 
>> 
>> 
>> Bye,
>> 
>> roland
>> 
>> 
>> 
>> From: aaron morton [mailto:aaron@thelastpickle.com]
>> Sent: Tuesday, 15 March 2011 07:54
>> To: user@cassandra.apache.org
>> Cc: Juergen Link; Roland Gude; hermes@datastax.com
>> Subject: Re: problems while TimeUUIDType-index-querying with two expressions
>> 
>> 
>> 
>> Perfectly reasonable,
>> created https://issues.apache.org/jira/browse/CASSANDRA-2328
>> 
>> 
>> 
>> Aaron
>> 
>> On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:
>> 
>> 
>> 
>> Sounds like we should send an InvalidRequestException then.
>> 
>> On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> 
>> It's failing when comparing two TimeUUID values because one of them is not
>> properly formatted. In this case it's comparing a stored value with the
>> value passed in the get_indexed_slices() query expression.
>> 
>> I'm going to assume it's the value passed for the expression.
>> 
>> When you create the IndexedSlicesQuery, this is incorrect:
>> 
>> IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
>>         .createIndexedSlicesQuery(keyspace, stringSerializer, bytesSerializer, bytesSerializer);
>> 
>> Use a UUIDSerializer for the last param and then pass the UUID you want to
>> build the expression with, rather than the string/byte thing you are passing.
>> 
>> Hope that helps.
>> 
>> Aaron
>> 
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com

Re: AW: problems while TimeUUIDType-index-querying with two expressions

Posted by Jonathan Ellis <jb...@gmail.com>.

Thanks for tracking that down, Roland.  I've created
https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this.
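
Roland's diagnosis in this thread can be illustrated outside Cassandra. The following is a minimal, self-contained sketch and not Cassandra's actual code: TIME_UUID_LIKE is a simplified stand-in for a comparator that assumes 16-byte version 1 UUIDs, and BYTES stands in for the plain byte-wise comparison one would expect from a BytesType validator. All class, field and method names here are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

final class ComparatorMismatch {
    // Byte positions of a version 1 UUID's timestamp, most significant first
    // (time_hi at 6-7, time_mid at 4-5, time_low at 0-3).
    private static final int[] TIMESTAMP_ORDER = {6, 7, 4, 5, 0, 1, 2, 3};

    /** Stand-in for a TimeUUID comparator: it assumes both inputs are 16 bytes long. */
    static final Comparator<ByteBuffer> TIME_UUID_LIKE = (a, b) -> {
        for (int i : TIMESTAMP_ORDER) {
            int mask = (i == 6) ? 0x0F : 0xFF; // ignore the version nibble in byte 6
            int d = (a.get(a.position() + i) & mask) - (b.get(b.position() + i) & mask);
            if (d != 0) {
                return d;
            }
        }
        return 0;
    };

    /** Unsigned lexicographic comparison that works for values of any length. */
    static final Comparator<ByteBuffer> BYTES = (a, b) -> {
        int n = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < n; i++) {
            int d = (a.get(a.position() + i) & 0xFF) - (b.get(b.position() + i) & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.remaining() - b.remaining();
    };

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns true if the comparator fails with an IndexOutOfBoundsException. */
    static boolean throwsOutOfBounds(Comparator<ByteBuffer> c, ByteBuffer a, ByteBuffer b) {
        try {
            c.compare(a, b);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer stored = utf8("user01");  // column value as stored
        ByteBuffer queried = utf8("user01"); // value from the second index expression
        System.out.println("bytes comparison result: " + BYTES.compare(stored, queried));
        System.out.println("TimeUUID-like comparison throws: "
                + throwsOutOfBounds(TIME_UUID_LIKE, stored, queried));
    }
}
```

In this sketch the 6-byte value "user01" fails as soon as byte index 6 is read, which is at least consistent with the index reported in the stack trace. A 12-byte value such as "testMandator" would not throw here; it would simply be compared on byte positions that carry no meaning for that value.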

On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude <ro...@yoochoose.com> wrote:
> I have applied the suggested changes in my local source tree and did run all
> my testcases (the supplied ones as well as those with real data).
>
> They do work now.
>
>
>
> Von: Roland Gude [mailto:roland.gude@yoochoose.com]
> Gesendet: Mittwoch, 16. März 2011 16:29
>
> An: user@cassandra.apache.org
> Betreff: AW: AW: problems while TimeUUIDType-index-querying with two
> expressions
>
>
>
> With debugging into it i found something that might be the issue (please
> correct me if I am wrong):
>
> In ColumnFamilyStore.java lines 1597 to 1613 is the code that checks whether
> some column satisfies an index expression.
>
> In line 1608 it compares the value of the index expression with the value
> given in the expression.
>
>
>
> For this comparison it utilizes the comparator of the columnfamily while it
> should use the comparator of the Column validation class.
>
>
>
>     private static boolean satisfies(ColumnFamily data, IndexClause clause,
> IndexExpression first)
>
>     {
>
>         for (IndexExpression expression : clause.expressions)
>
>         {
>
>             // (we can skip "first" since we already know it's satisfied)
>
>             if (expression == first)
>
>                 continue;
>
>             // check column data vs expression
>
>             IColumn column = data.getColumn(expression.column_name);
>
>             if (column == null)
>
>                 return false;
>
>             int v = data.getComparator().compare(column.value(),
> expression.value);
>
>             if (!satisfies(v, expression.op))
>
>                 return false;
>
>         }
>
>         return true;
>
>     }
>
>
>
>
>
> The line 1608 should be changed from:
>
>             int v = data.getComparator().compare(column.value(),
> expression.value);
>
>
>
> to
>
>             int v = data.metadata().getValueValidator
> (expression.column_name).compare(column.value(), expression.value);
>
>
>
>
>
>
>
> greetings roland
>
>
>
>
>
> Von: Roland Gude [mailto:roland.gude@yoochoose.com]
> Gesendet: Mittwoch, 16. März 2011 14:50
> An: user@cassandra.apache.org
> Betreff: AW: AW: problems while TimeUUIDType-index-querying with two
> expressions
>
>
>
> Hi Aaron,
>
>
>
> now I am completely confused.
>
> The code that did not work for days now – like a miracle – works even
> against the unpatched Cassandra 0.7.3 but the testcase still does not…
>
> There seems to be some randomness in whether it works or not (which is a bad
> sign I think)… I will debug a little deeper into this and report anything I
> find.
>
>
>
> Greetings,
>
> roland
>
>
>
> Von: aaron morton [mailto:aaron@thelastpickle.com]
> Gesendet: Mittwoch, 16. März 2011 01:15
> An: user@cassandra.apache.org
> Betreff: Re: AW: problems while TimeUUIDType-index-querying with two
> expressions
>
>
>
> Have attached a patch
> to https://issues.apache.org/jira/browse/CASSANDRA-2328
>
>
>
> Can you give it a try ? You should not get a InvalidRequestException when
> you send an invalid name or value in the query expression.
>
>
>
> Aaron
>
>
>
> On 16 Mar 2011, at 10:30, aaron morton wrote:
>
>
>
> Will have the Jira I created finished soon, it's a legitimate issue we
> should be validating the column names and values when a ger_indexed_slice()
> request is sent. The error in your original email shows that.
>
>
>
> WRT your code example. You are using the TimeUUID Validator for the column
> name when creating the index expression, but are using a string serialiser
> for the value...
>
> IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
>                         .createIndexedSlicesQuery(keyspace,
>                                                stringSerializer,
> UUID_SERIALIZER, stringSerializer);
>         indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
>
> But your schema is saying it is a bytes type...
>
>
>
> column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000,
> validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
> {column_name: 00000001-0000-1000-0000-000000000000, validation_class:
> BytesType, index_name: useridIndex, index_type: KEYS}];"On 15 Mar 2011, at
> 22:41,
>
>
>
> Once I have the patch can you apply it and run your test again ?
>
>
>
> You may also want to ask on the Hector list if it automagically check you
> are using the correct types when creating an IndexedSlicesQuery.
>
>
>
> Aaron
>
>
>
> Roland Gude wrote:
>
>
>
> Forgot to attach the source code… here it comes
>
>
>
> Von: Roland Gude [mailto:roland.gude@yoochoose.com]
> Gesendet: Dienstag, 15. März 2011 10:39
> An: user@cassandra.apache.org
> Betreff: AW: problems while TimeUUIDType-index-querying with two expressions
>
>
>
> Actually its not the column values that should be UUIDs in our case, but the
> column keys. The CF uses TimeUUID ordering and the values are just some
> ByteArrays. Even with changing the code to use UUIDSerializer instead of
> serializing the UUIDs manually the issue still exists.
>
>
>
> As far as I can see, there is nothing wrong with the IndexExpression.
>
> using two Index expressions with key=TimedUUID and Value=anything does not
> work
>
> using one index expression (any one of the other two) alone does work fine.
>
>
>
> I refactored Johannes code into a junit testcase. It  needs the cluster
> configured as described in Johannes mail.
>
> There are three cases. Two with one of the indexExpressions and one with
> both index expression. The one with Both IndexExpression will never finish
> and youz will see the exception in the Cassandra logs.
>
>
>
> Bye,
>
> roland
>
>
>
> Von: aaron morton [mailto:aaron@thelastpickle.com]
> Gesendet: Dienstag, 15. März 2011 07:54
> An: user@cassandra.apache.org
> Cc: Juergen Link; Roland Gude; hermes@datastax.com
> Betreff: Re: problems while TimeUUIDType-index-querying with two expressions
>
>
>
> Perfectly reasonable,
> created https://issues.apache.org/jira/browse/CASSANDRA-2328
>
>
>
> Aaron
>
> On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:
>
>
>
> Sounds like we should send an InvalidRequestException then.
>
> On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aa...@thelastpickle.com>
> wrote:
>
> It's failing when comparing two TimeUUID values because one of them is not
> properly formatted. In this case it's comparing a stored value with the
> value passed in the get_indexed_slices() query expression.
>
> I'm going to assume it's the value passed for the expression.
>
> When you create the IndexedSlicesQuery, this is incorrect:
>
> IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
>         .createIndexedSlicesQuery(keyspace,
>                 stringSerializer, bytesSerializer, bytesSerializer);
>
> Use a UUIDSerializer for the last param and then pass the UUID you want to
> build the expression with, rather than the string/byte thing you are passing.
>
> Hope that helps.
>
> Aaron
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
>
>
> <GetTimeUUIDIndexedSlices.java>
>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

AW: AW: problems while TimeUUIDType-index-querying with two expressions

Posted by Roland Gude <ro...@yoochoose.com>.

I applied the suggested changes to my local source tree and ran all my test cases (the supplied ones as well as those with real data).
They work now.

From: Roland Gude [mailto:roland.gude@yoochoose.com]
Sent: Wednesday, 16 March 2011 16:29
To: user@cassandra.apache.org
Subject: AW: AW: problems while TimeUUIDType-index-querying with two expressions

While debugging, I found something that might be the issue (please correct me if I am wrong):
In ColumnFamilyStore.java, lines 1597 to 1613 contain the code that checks whether a column satisfies an index expression.
In line 1608 it compares the value of the stored column with the value given in the expression.

For this comparison it uses the comparator of the column family (which orders column names), while it should use the comparator of the column's validation class (which applies to column values).

    private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
    {
        for (IndexExpression expression : clause.expressions)
        {
            // (we can skip "first" since we already know it's satisfied)
            if (expression == first)
                continue;
            // check column data vs expression
            IColumn column = data.getColumn(expression.column_name);
            if (column == null)
                return false;
            int v = data.getComparator().compare(column.value(), expression.value);
            if (!satisfies(v, expression.op))
                return false;
        }
        return true;
    }


Line 1608 should be changed from:
            int v = data.getComparator().compare(column.value(), expression.value);

to:
            int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);



greetings roland


AW: AW: problems while TimeUUIDType-index-querying with two expressions

Posted by Roland Gude <ro...@yoochoose.com>.

While debugging, I found something that might be the issue (please correct me if I am wrong):
In ColumnFamilyStore.java, lines 1597 to 1613 contain the code that checks whether a column satisfies an index expression.
In line 1608 it compares the value of the stored column with the value given in the expression.

For this comparison it uses the comparator of the column family (which orders column names), while it should use the comparator of the column's validation class (which applies to column values).

    private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
    {
        for (IndexExpression expression : clause.expressions)
        {
            // (we can skip "first" since we already know it's satisfied)
            if (expression == first)
                continue;
            // check column data vs expression
            IColumn column = data.getColumn(expression.column_name);
            if (column == null)
                return false;
            int v = data.getComparator().compare(column.value(), expression.value);
            if (!satisfies(v, expression.op))
                return false;
        }
        return true;
    }


Line 1608 should be changed from:
            int v = data.getComparator().compare(column.value(), expression.value);

to:
            int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);
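
The mismatch can be illustrated outside Cassandra. The sketch below is a simplified illustration, not Cassandra's code: compareAsTimeUuid and compareAsBytes are hypothetical stand-ins for a TimeUUID-style comparator and a bytes-style comparator. It assumes the TimeUUID-style comparison starts by reading the byte at offset 6, which would be consistent with the "IndexOutOfBoundsException: 6" in the stack trace and with the stored value "user01" being only 6 bytes long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified stand-ins for illustration only; these are not Cassandra's classes.
class ComparatorMismatchDemo {

    // Hypothetical TimeUUID-style comparison: it looks at the byte at offset 6,
    // which only exists if the buffer is long enough (a real UUID has 16 bytes).
    static int compareAsTimeUuid(ByteBuffer a, ByteBuffer b) {
        int d = (a.get(a.position() + 6) & 0x0F) - (b.get(b.position() + 6) & 0x0F);
        if (d != 0) {
            return d;
        }
        return a.compareTo(b);
    }

    // Bytes-style comparison: unsigned lexicographic order, valid for any length.
    static int compareAsBytes(ByteBuffer a, ByteBuffer b) {
        int n = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < n; i++) {
            int d = (a.get(a.position() + i) & 0xFF) - (b.get(b.position() + i) & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return a.remaining() - b.remaining();
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Returns true if the TimeUUID-style comparator throws on the two values.
    static boolean timeUuidCompareFails(String stored, String queried) {
        try {
            compareAsTimeUuid(utf8(stored), utf8(queried));
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Column value "user01" compared with the name comparator vs. the value comparator.
        System.out.println(timeUuidCompareFails("user01", "user01"));
        System.out.println(compareAsBytes(utf8("user01"), utf8("user01")));
    }
}
```

In this sketch a 12-byte value such as "testMandator" does not throw, because offset 6 exists, but the result is still computed with rules meant for column names rather than values.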



greetings roland


From: Roland Gude [mailto:roland.gude@yoochoose.com]
Sent: Wednesday, 16 March 2011 14:50
To: user@cassandra.apache.org
Subject: AW: AW: problems while TimeUUIDType-index-querying with two expressions

Hi Aaron,

now I am completely confused.
The code that did not work for days now works, like a miracle, even against the unpatched Cassandra 0.7.3, but the test case still does not.
There seems to be some randomness in whether it works or not (which is a bad sign, I think). I will debug a little deeper into this and report anything I find.

Greetings,
roland

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Wednesday, 16 March 2011 01:15
To: user@cassandra.apache.org
Subject: Re: AW: problems while TimeUUIDType-index-querying with two expressions

Have attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-2328

Can you give it a try? You should now get an InvalidRequestException when you send an invalid name or value in the query expression.

Aaron

On 16 Mar 2011, at 10:30, aaron morton wrote:

Will have the Jira I created finished soon. It's a legitimate issue: we should be validating the column names and values when a get_indexed_slices() request is sent. The error in your original email shows that.

WRT your code example: you are using the TimeUUID validator for the column name when creating the index expression, but are using a string serialiser for the value...
    IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
            .createIndexedSlicesQuery(keyspace,
                    stringSerializer, UUID_SERIALIZER, stringSerializer);
    indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
But your schema is saying it is a bytes type...

column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS}, {column_name: 00000001-0000-1000-0000-000000000000, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];"

On 15 Mar 2011, at 22:41,

Once I have the patch, can you apply it and run your test again?

You may also want to ask on the Hector list whether it automatically checks that you are using the correct types when creating an IndexedSlicesQuery.

Aaron

Roland Gude wrote:

Forgot to attach the source code... here it comes

From: Roland Gude [mailto:roland.gude@yoochoose.com]
Sent: Tuesday, 15 March 2011 10:39
To: user@cassandra.apache.org
Subject: AW: problems while TimeUUIDType-index-querying with two expressions

Actually, it's not the column values that should be UUIDs in our case, but the column names. The CF uses TimeUUID ordering and the values are just some byte arrays. Even after changing the code to use UUIDSerializer instead of serializing the UUIDs manually, the issue still exists.
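
For reference, serializing a UUID by hand usually amounts to writing its two 64-bit halves into a 16-byte array. The sketch below shows that layout with plain JDK classes. The class and method names are hypothetical, and it is not the asByteArray(...) helper from the attached code or Hector's UUIDSerializer, whose sources are not shown in this thread.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration; not part of Hector or Cassandra.
class UuidBytes {

    // Writes the UUID's two 64-bit halves, most significant first, into 16 bytes.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Rebuilds the UUID from its 16-byte form.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new UUID(buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        // The manually defined column name used for the mandator index.
        UUID mandatorIndex = UUID.fromString("00000000-0000-1000-0000-000000000000");
        byte[] raw = toBytes(mandatorIndex);
        System.out.println(raw.length + " bytes, version " + mandatorIndex.version());
        System.out.println(fromBytes(raw).equals(mandatorIndex));
    }
}
```

Column names serialized this way are always 16 bytes long, whereas the column values in this schema (for example "user01") can be any length.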

As far as I can see, there is nothing wrong with the IndexExpression.
Using two index expressions with key=TimeUUID and value=anything does not work.
Using one index expression (either of the two) alone works fine.

I refactored Johannes's code into a JUnit test case. It needs the cluster configured as described in Johannes's mail.
There are three cases: two with one of the index expressions each, and one with both index expressions. The one with both index expressions will never finish, and you will see the exception in the Cassandra logs.

Bye,
roland

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, 15 March 2011 07:54
To: user@cassandra.apache.org
Cc: Juergen Link; Roland Gude; hermes@datastax.com
Subject: Re: problems while TimeUUIDType-index-querying with two expressions

Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328

Aaron
On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:

Sounds like we should send an InvalidRequestException then.

On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aa...@thelastpickle.com> wrote:
It's failing when comparing two TimeUUID values because one of them is not
properly formatted. In this case it's comparing a stored value with the
value passed in the get_indexed_slices() query expression.
I'm going to assume it's the value passed for the expression.
When you create the IndexedSlicesQuery, this is incorrect:
IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
        .createIndexedSlicesQuery(keyspace,
                stringSerializer, bytesSerializer, bytesSerializer);
Use a UUIDSerializer for the last param and then pass the UUID you want to
build the expression with, rather than the string/byte thing you are passing.
Hope that helps.
Aaron
On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:

Hi all,

in order to improve our queries, we started to use IndexedSliceQueries from
the hector project (https://github.com/zznate/hector-examples). I followed
the instructions for creating IndexedSlicesQuery with
GetIndexedSlices.java.
I created the corresponding CF with in a keyspace called "Keyspace1" (
"create keyspace  Keyspace1;") with:
"create column family Indexed1 with column_type='Standard' and
comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
rows_cached=20000 and column_metadata=[{column_name: birthdate,
validation_class: LongType, index_name: dateIndex, index_type:
KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
monthIndex, index_type: KEYS}];"
and the example GetIndexedSlices.java worked fine.

Output of CF Indexed1:
---------------------------------------
[default@Keyspace1] list Indexed1;
Using default limit of 100
-------------------
RowKey: fake_key_12
=> (column=birthdate, value=1974, timestamp=1300110485826059)
=> (column=birthmonth, value=0, timestamp=1300110485826060)
=> (column=fake_column_0, value=66616b655f76616c75655f305f3132, timestamp=1300110485826056)
=> (column=fake_column_1, value=66616b655f76616c75655f315f3132, timestamp=1300110485826057)
=> (column=fake_column_2, value=66616b655f76616c75655f325f3132, timestamp=1300110485826058)
-------------------
RowKey: fake_key_8
=> (column=birthdate, value=1974, timestamp=1300110485826039)
=> (column=birthmonth, value=8, timestamp=1300110485826040)
=> (column=fake_column_0, value=66616b655f76616c75655f305f38, timestamp=1300110485826036)
=> (column=fake_column_1, value=66616b655f76616c75655f315f38, timestamp=1300110485826037)
=> (column=fake_column_2, value=66616b655f76616c75655f325f38, timestamp=1300110485826038)
-------------------
....


Now to the problem:
As we have another column format in our cluster (using TimeUUIDType as
comparator in CF definition) I adapted the application to our schema on a
cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator id index
(00000000-0000-1000-0000-000000000000) and another one for a userid index
(00000001-0000-1000-0000-000000000000). It can be created with:
"create column family ByUser with column_type='Standard' and
comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
and rows_cached=20000 and column_metadata=[{column_name:
00000000-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: mandatorIndex, index_type: KEYS}, {column_name:
00000001-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: useridIndex, index_type: KEYS}];"


which looks in the cluster using cassandra-cli like this:

[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: ByUser
      Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
      Column Metadata:
        Column Name: 00000001-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: useridIndex
          Index Type: KEYS
        Column Name: 00000000-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: mandatorIndex
          Index Type: KEYS
    ColumnFamily: Indexed1
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
      Column Metadata:
        Column Name: birthmonth (birthmonth)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: monthIndex
          Index Type: KEYS
        Column Name: birthdate (birthdate)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: dateIndex
          Index Type: KEYS
[default@Keyspace1] list ByUser;
Using default limit of 100
-------------------
RowKey: testMandator!!user01
=> (column=00000000-0000-1000-0000-000000000000, value=746573744d616e6461746f72, timestamp=1300111213321000)
=> (column=00000001-0000-1000-0000-000000000000, value=757365723031, timestamp=1300111213322000)
=> (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135, timestamp=1300111213561000)

1 Row Returned.
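
As a side check on the two hand-written column names in the listing above, the following sketch uses java.util.UUID to confirm that both are well-formed version 1 (time-based) UUIDs, which is what a TimeUUIDType comparator orders by. The class name ManualTimeUuids is hypothetical; the constant names follow the ones used in the code excerpt in this mail.

```java
import java.util.UUID;

// Hypothetical demo class; only the UUID literals come from the schema above.
final class ManualTimeUuids {
    static final UUID MANDATOR_UUID =
            UUID.fromString("00000000-0000-1000-0000-000000000000");
    static final UUID USERID_INDEX_UUID =
            UUID.fromString("00000001-0000-1000-0000-000000000000");

    public static void main(String[] args) {
        for (UUID u : new UUID[] {MANDATOR_UUID, USERID_INDEX_UUID}) {
            // version() is 1 for time-based UUIDs; timestamp() throws
            // UnsupportedOperationException for any other version.
            System.out.println(u + " version=" + u.version()
                    + " timestamp=" + u.timestamp());
        }
    }
}
```

Both names carry very small timestamps, which is consistent with them being listed before the generated column f064b480-495e-11e0-abc4-0024e89fa587.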

The values of the index columns 00000000-0000-1000-0000-000000000000 and
00000001-0000-1000-0000-000000000000 represent "testMandator" and
"user01" as bytes.
The third column is a randomly generated one with value "15" that is
inserted by the GetTimeUUIDIndexedSlices app.
I attached both source codes, GetIndexedSlices and GetTimeUUIDIndexedSlices.
Currently the second index expression for the userid index in the
GetTimeUUIDIndexedSlices.queryCf(...) method

        indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new StringSerializer().toBytes(mandator));
        //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new StringSerializer().toBytes(dummyUserId));

is commented out, so that GetTimeUUIDIndexedSlices will run. Using one
index expression works perfectly fine, but as soon as I add a second eq, gt,
gte or lt expression I get an IndexOutOfBoundsException (see below).

This issue can be easily reproduced by
- downloading the zznate example
(https://github.com/zznate/hector-examples),
- mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
- importing it in eclipse and
- letting it run against a locally running cassandra instance (v0.7.3) which
has the default settings (no changes in the .yaml)

I hope that someone can help me with this issue ... after a couple of days
it's driving me bonkers.

Thx in advance,
Johannes


Exception:
ERROR 14:47:56,842 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IndexOutOfBoundsException: 6
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
        ... 4 more
ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
<GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

<GetTimeUUIDIndexedSlices.java>

AW: AW: problems while TimeUUIDType-index-querying with two expressions

Posted by Roland Gude <ro...@yoochoose.com>.

Hi Aaron,

now I am completely confused.
The code that had not worked for days now works, like a miracle, even against the unpatched Cassandra 0.7.3, but the test case still does not.
There seems to be some randomness in whether it works or not (which is a bad sign, I think). I will debug a little deeper into this and report anything I find.

Greetings,
roland

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Wednesday, 16 March 2011 01:15
To: user@cassandra.apache.org
Subject: Re: AW: problems while TimeUUIDType-index-querying with two expressions

Have attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-2328

Can you give it a try? You should now get an InvalidRequestException when you send an invalid name or value in the query expression.

Aaron

On 16 Mar 2011, at 10:30, aaron morton wrote:


Will have the Jira I created finished soon. It's a legitimate issue: we should be validating the column names and values when a get_indexed_slices() request is sent. The error in your original email shows that.

WRT your code example. You are using the TimeUUID Validator for the column name when creating the index expression, but are using a string serialiser for the value...
IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
        .createIndexedSlicesQuery(keyspace,
                stringSerializer, UUID_SERIALIZER, stringSerializer);
indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
But your schema is saying it is a bytes type...

column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS}, {column_name: 00000001-0000-1000-0000-000000000000, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];

Once I have the patch, can you apply it and run your test again?

You may also want to ask on the Hector list whether it automatically checks that you are using the correct types when creating an IndexedSlicesQuery.

Aaron

On 15 Mar 2011, at 22:41, Roland Gude wrote:


Forgot to attach the source code... here it comes

From: Roland Gude [mailto:roland.gude@yoochoose.com]
Sent: Tuesday, 15 March 2011 10:39
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: AW: problems while TimeUUIDType-index-querying with two expressions

Actually, it's not the column values that should be UUIDs in our case, but the column keys. The CF uses TimeUUID ordering and the values are just some byte arrays. Even after changing the code to use UUIDSerializer instead of serializing the UUIDs manually, the issue still exists.

As far as I can see, there is nothing wrong with the IndexExpression.
Using two index expressions with key=TimeUUID and value=anything does not work.
Using one index expression (either of the two) alone works fine.

I refactored Johannes's code into a JUnit test case. It needs the cluster configured as described in Johannes's mail.
There are three cases: two with one of the index expressions each, and one with both index expressions. The one with both index expressions will never finish, and you will see the exception in the Cassandra logs.

Bye,
roland


AW: AW: problems while TimeUUIDType-index-querying with two expressions

Posted by Roland Gude <ro...@yoochoose.com>.

Hi Aaron,

First of all, thank you for putting effort into this, but I still think we are not talking about the same issue. Nevertheless, I applied the patch and ran the test case again, and it still fails.
I also made the change you pointed out and used a different value validator and serializer. It does not make much difference in this case, though, as BytesType validation should blindly accept any value, I guess. It should not matter if I have a BytesType column and the bytes I put in there happen to be valid UTF-8 strings. I only did that in the test case to make the output more readable; in the real environment there are just bytes. We also tried this with other value validation types, and judging from the exception I still don't think it has anything to do with the issue.
I attached the modified test cases again, but let me briefly describe the issue once more, because I think we keep misunderstanding each other. Sorry for that.

Given a ColumnFamily with UTF8Type column ordering and any value type:
- It is possible to create indexes for any column.
- It is possible to query for rows based on one or more indexes.

Given a ColumnFamily with TimeUUIDType column ordering and any value type:
- It is possible to create indexes for any column.
- It is possible to query for rows based on exactly one of the indexed columns.
- It is not possible to query for rows based on more than one indexed column (the operator does not matter; any combination fails).

If a query is constructed that queries based on more than one indexed column, the Cassandra server will log the following exception:

ERROR 14:08:38,774 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IndexOutOfBoundsException: 6
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
        ... 4 more
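
The "IndexOutOfBoundsException: 6" raised under compareTimestampBytes is what an absolute read at offset 6 produces on a buffer shorter than seven bytes. The following is only a sketch under that assumption, not Cassandra's actual comparator code: the class ShortBufferRead and the method byteAtOffsetSix are hypothetical names. Note that the stored value "user01" (757365723031) is exactly six bytes long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the failing read; not Cassandra's code.
final class ShortBufferRead {
    // Assumption: the comparator inspects the byte at offset 6, where a
    // version 1 UUID keeps its version nibble and the top timestamp bits.
    static int byteAtOffsetSix(ByteBuffer value) {
        return value.get(value.position() + 6) & 0xFF;
    }

    public static void main(String[] args) {
        ByteBuffer uuidSized = ByteBuffer.allocate(16); // 16 bytes, like a UUID
        ByteBuffer userId =
                ByteBuffer.wrap("user01".getBytes(StandardCharsets.UTF_8)); // 6 bytes

        System.out.println("16-byte value: " + byteAtOffsetSix(uuidSized));
        try {
            byteAtOffsetSix(userId);
        } catch (IndexOutOfBoundsException e) {
            // Valid indexes of a 6-byte buffer are 0..5, so offset 6 is rejected.
            System.out.println("6-byte value: " + e);
        }
    }
}
```

If this reading is right, the exception would mean that something other than a 16-byte UUID is being handed to the TimeUUIDType comparison while the second expression is evaluated, but that is a hypothesis to verify, not a confirmed diagnosis.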

I am still very certain that the index expressions are correct, because they work when used on their own but not in combination.
If they were invalid, the patched version should throw an InvalidRequestException, which it does not.



Thanks for looking into this.
Roland

Von: aaron morton [mailto:aaron@thelastpickle.com]
Gesendet: Mittwoch, 16. März 2011 01:15
An: user@cassandra.apache.org
Betreff: Re: AW: problems while TimeUUIDType-index-querying with two expressions

Have attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-2328

Can you give it a try ? You should not get a InvalidRequestException when you send an invalid name or value in the query expression.

Aaron

On 16 Mar 2011, at 10:30, aaron morton wrote:


Will have the Jira I created finished soon, it's a legitimate issue we should be validating the column names and values when a ger_indexed_slice() request is sent. The error in your original email shows that.

WRT your code example. You are using the TimeUUID Validator for the column name when creating the index expression, but are using a string serialiser for the value...
IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
                        .createIndexedSlicesQuery(keyspace,
                                               stringSerializer, UUID_SERIALIZER, stringSerializer);
        indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
But your schema is saying it is a bytes type...

column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS}, {column_name: 00000001-0000-1000-0000-000000000000, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];"On 15 Mar 2011, at 22:41,

Once I have the patch can you apply it and run your test again ?

You may also want to ask on the Hector list if it automagically check you are using the correct types when creating an IndexedSlicesQuery.

Aaron

Roland Gude wrote:


Forgot to attach the source code... here it comes

Von: Roland Gude [mailto:roland.gude@yoochoose.com]
Gesendet: Dienstag, 15. März 2011 10:39
An: user@cassandra.apache.org<ma...@cassandra.apache.org>
Betreff: AW: problems while TimeUUIDType-index-querying with two expressions

Actually its not the column values that should be UUIDs in our case, but the column keys. The CF uses TimeUUID ordering and the values are just some ByteArrays. Even with changing the code to use UUIDSerializer instead of serializing the UUIDs manually the issue still exists.

As far as I can see, there is nothing wrong with the IndexExpression.
using two Index expressions with key=TimedUUID and Value=anything does not work
using one index expression (any one of the other two) alone does work fine.

I refactored Johannes code into a junit testcase. It  needs the cluster configured as described in Johannes mail.
There are three cases. Two with one of the indexExpressions and one with both index expression. The one with Both IndexExpression will never finish and youz will see the exception in the Cassandra logs.

Bye,
roland

Von: aaron morton [mailto:aaron@thelastpickle.com]
Gesendet: Dienstag, 15. März 2011 07:54
An: user@cassandra.apache.org<ma...@cassandra.apache.org>
Cc: Juergen Link; Roland Gude; hermes@datastax.com<ma...@datastax.com>
Betreff: Re: problems while TimeUUIDType-index-querying with two expressions

Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328

Aaron
On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:

Sounds like we should send an InvalidRequestException then.

On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aa...@thelastpickle.com>> wrote:
It's failing to when comparing two TimeUUID values because on of them is not
properly formatted. In this case it's comparing a stored value with the
value passed in the get_indexed_slice() query expression.
I'm going to assume it's the value passed for the expression.
When you create the IndexedSlicesQuery this is incorrect
IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
.createIndexedSlicesQuery(keyspace,
stringSerializer, bytesSerializer, bytesSerializer);
Use a UUIDSerializer for the last param and then pass the UUID you want to
build the expressing. Rather than the string/byte thing you are passing
Hope that helps.
Aaron
On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:

Hi all,

in order to improve our queries, we started to use IndexedSliceQueries from
the hector project (https://github.com/zznate/hector-examples). I followed
the instructions for creating IndexedSlicesQuery with
GetIndexedSlices.java.
I created the corresponding CF with in a keyspace called "Keyspace1" (
"create keyspace  Keyspace1;") with:
"create column family Indexed1 with column_type='Standard' and
comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
rows_cached=20000 and column_metadata=[{column_name: birthdate,
validation_class: LongType, index_name: dateIndex, index_type:
KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
monthIndex, index_type: KEYS}];"
and the example GetIndexedSlices.java worked fine.

Output of CF Indexed1:
---------------------------------------
[default@Keyspace1] list Indexed1;
Using default limit of 100
-------------------
RowKey: fake_key_12
=> (column=birthdate, value=1974, timestamp=1300110485826059)
=> (column=birthmonth, value=0, timestamp=1300110485826060)
=> (column=fake_column_0, value=66616b655f76616c75655f305f3132,
timestamp=1300110485826056)
=> (column=fake_column_1, value=66616b655f76616c75655f315f3132,
timestamp=1300110485826057)
=> (column=fake_column_2, value=66616b655f76616c75655f325f3132,
timestamp=1300110485826058)
-------------------
RowKey: fake_key_8
=> (column=birthdate, value=1974, timestamp=1300110485826039)
=> (column=birthmonth, value=8, timestamp=1300110485826040)
=> (column=fake_column_0, value=66616b655f76616c75655f305f38,
timestamp=1300110485826036)
=> (column=fake_column_1, value=66616b655f76616c75655f315f38,
timestamp=1300110485826037)
=> (column=fake_column_2, value=66616b655f76616c75655f325f38,
timestamp=1300110485826038)
-------------------
....


Now to the problem:
As we have another column format in our cluster (using TimeUUIDType as
comparator in CF definition) I adapted the application to our schema on a
cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator id index
(00000000-0000-1000-0000-000000000000) and another one for a userid index
(00000001-0000-1000-0000-000000000000). It can be created with:
"create column family ByUser with column_type='Standard' and
comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
and rows_cached=20000 and column_metadata=[{column_name:
00000000-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: mandatorIndex, index_type: KEYS}, {column_name:
00000001-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: useridIndex, index_type: KEYS}];"


which looks in the cluster using cassandra-cli like this:

[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: ByUser
      Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
      Column Metadata:
        Column Name: 00000001-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: useridIndex
          Index Type: KEYS
        Column Name: 00000000-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: mandatorIndex
          Index Type: KEYS
    ColumnFamily: Indexed1
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
      Column Metadata:
        Column Name: birthmonth (birthmonth)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: monthIndex
          Index Type: KEYS
        Column Name: birthdate (birthdate)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: dateIndex
          Index Type: KEYS
[default@Keyspace1] list ByUser;
Using default limit of 100
-------------------
RowKey: testMandator!!user01
=> (column=00000000-0000-1000-0000-000000000000,
value=746573744d616e6461746f72, timestamp=1300111213321000)
=> (column=00000001-0000-1000-0000-000000000000, value=757365723031,
timestamp=1300111213322000)
=> (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135,
timestamp=1300111213561000)

1 Row Returned.

The values of the index columns 00000000-0000-1000-0000-000000000000 and
00000001-0000-1000-0000-000000000000 represent "testMandator" and
"user01" as bytes.
The third column is a randomly generated one with value "15" that is
inserted by the GetTimeUUIDIndexedSlices app.
I attached both source files, GetIndexedSlices and GetTimeUUIDIndexedSlices.
Currently the second index expression for the userid index in the
GetTimeUUIDIndexedSlices.queryCf(...) method

        indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new StringSerializer().toBytes(mandator));
        //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new StringSerializer().toBytes(dummyUserId));

is commented out, so that GetTimeUUIDIndexedSlices will run. Using one
index expression works perfectly fine, but as soon as I add a second eq, gt, gte or
lt expression I get an IndexOutOfBoundsException (see below).
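
For readers who want to check the "as bytes" claim, the hex values printed by `list ByUser` can be decoded with a few lines of plain JDK code. This is only an illustrative sketch; the class and method names (`HexValues`, `decodeUtf8`) are made up for this example and are not part of the attached sources.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the hector-examples project.
final class HexValues {
    // Decodes a hex string, as printed by cassandra-cli for BytesType values, into UTF-8 text.
    static String decodeUtf8(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8("746573744d616e6461746f72")); // testMandator
        System.out.println(decodeUtf8("757365723031"));             // user01
        System.out.println(decodeUtf8("3135"));                     // 15
    }
}
```

Note that the decoded values are 12, 6 and 2 bytes long, none of which is the 16 bytes of a serialized UUID.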

This issue can be easily reproduced by
- downloading the zznate example
(https://github.com/zznate/hector-examples),
- mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
- importing it in eclipse and
- letting it run against a locally running cassandra instance (v0.7.3) which
has the default settings (no changes in the .yaml)

I hope that someone can help me with this issue ... after a couple of days
it's driving me bonkers.

Thx in advance,
Johannes


Exception:
ERROR 14:47:56,842 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IndexOutOfBoundsException: 6
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
        ... 4 more
ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
<GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

<GetTimeUUIDIndexedSlices.java>

Re: AW: problems while TimeUUIDType-index-querying with two expressions

Posted by aaron morton <aa...@thelastpickle.com>.

I have attached a patch to https://issues.apache.org/jira/browse/CASSANDRA-2328

Can you give it a try? You should now get an InvalidRequestException when you send an invalid name or value in the query expression.

Aaron

On 16 Mar 2011, at 10:30, aaron morton wrote:

> I will have the Jira I created finished soon. It's a legitimate issue: we should be validating the column names and values when a get_indexed_slices() request is sent. The error in your original email shows that.
>
> WRT your code example: you are using the TimeUUID validator for the column name when creating the index expression, but a string serializer for the value...
> IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
>         .createIndexedSlicesQuery(keyspace,
>                 stringSerializer, UUID_SERIALIZER, stringSerializer);
> indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
>
> But your schema says it is a bytes type...
>
> column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS}, {column_name: 00000001-0000-1000-0000-000000000000, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];
>
> Once I have the patch, can you apply it and run your test again?
>
> You may also want to ask on the Hector list whether it automagically checks that you are using the correct types when creating an IndexedSlicesQuery.
>
> Aaron
> 

Re: AW: problems while TimeUUIDType-index-querying with two expressions

Posted by aaron morton <aa...@thelastpickle.com>.

I will have the Jira I created finished soon. It's a legitimate issue: we should be validating the column names and values when a get_indexed_slices() request is sent. The error in your original email shows that.
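
To make the idea of validating names concrete, a check for a TimeUUID column name could look roughly like the sketch below. This is an assumption-laden illustration in plain JDK code, not the code from the CASSANDRA-2328 patch; `TimeUuidCheck`, `looksLikeTimeUuid` and `toBuffer` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical validation sketch; Cassandra's real validation code may differ.
final class TimeUuidCheck {
    // True if the buffer holds exactly 16 bytes and the UUID version nibble is 1 (time-based).
    static boolean looksLikeTimeUuid(ByteBuffer buf) {
        if (buf.remaining() != 16) {
            return false;
        }
        // The version is stored in the high nibble of byte 6 of the UUID.
        int version = (buf.get(buf.position() + 6) >> 4) & 0x0f;
        return version == 1;
    }

    // Serializes a UUID into its 16-byte big-endian form.
    static ByteBuffer toBuffer(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        buf.flip();
        return buf;
    }
}
```

With a check like this, the index column name 00000000-0000-1000-0000-000000000000 passes, while a 6-byte string such as "user01" would be rejected up front instead of failing inside the comparator.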

WRT your code example: you are using the TimeUUID validator for the column name when creating the index expression, but a string serializer for the value...
IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
        .createIndexedSlicesQuery(keyspace,
                stringSerializer, UUID_SERIALIZER, stringSerializer);
indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);

But your schema says it is a bytes type...

column_metadata=[{column_name: 00000000-0000-1000-0000-000000000000, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS}, {column_name: 00000001-0000-1000-0000-000000000000, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];

Once I have the patch, can you apply it and run your test again?

You may also want to ask on the Hector list whether it automagically checks that you are using the correct types when creating an IndexedSlicesQuery.

Aaron

On 15 Mar 2011, at 22:41, Roland Gude wrote:

> Forgot to attach the source code... here it comes
>
> From: Roland Gude [mailto:roland.gude@yoochoose.com]
> Sent: Tuesday, 15 March 2011 10:39
> To: user@cassandra.apache.org
> Subject: AW: problems while TimeUUIDType-index-querying with two expressions
>
> Actually, it's not the column values that should be UUIDs in our case, but the column keys. The CF uses TimeUUID ordering and the values are just some byte arrays. Even after changing the code to use UUIDSerializer instead of serializing the UUIDs manually, the issue still exists.
>
> As far as I can see, there is nothing wrong with the IndexExpression:
> - Using two index expressions with key=TimeUUID and value=anything does not work.
> - Using one index expression (either of the two) alone works fine.
>
> I refactored Johannes' code into a JUnit test case. It needs the cluster configured as described in Johannes' mail.
> There are three cases: two with one of the index expressions each, and one with both index expressions. The one with both index expressions will never finish, and you will see the exception in the Cassandra logs.
>
> Bye,
> roland
>  

AW: problems while TimeUUIDType-index-querying with two expressions

Posted by Roland Gude <ro...@yoochoose.com>.

Forgot to attach the source code... here it comes

From: Roland Gude [mailto:roland.gude@yoochoose.com]
Sent: Tuesday, 15 March 2011 10:39
To: user@cassandra.apache.org
Subject: AW: problems while TimeUUIDType-index-querying with two expressions

Actually, it's not the column values that should be UUIDs in our case, but the column keys. The CF uses TimeUUID ordering and the values are just some byte arrays. Even after changing the code to use UUIDSerializer instead of serializing the UUIDs manually, the issue still exists.

As far as I can see, there is nothing wrong with the IndexExpression:
- Using two index expressions with key=TimeUUID and value=anything does not work.
- Using one index expression (either of the two) alone works fine.
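
For reference, serializing a UUID "manually" into the 16 bytes used as a column name usually amounts to something like the sketch below. The attached sources are not reproduced in this thread, so this `asByteArray` is only a hypothetical stand-in for the helper of the same name, written with plain JDK classes.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical stand-in for the asByteArray(...) helper mentioned in this thread.
final class UuidBytes {
    // Writes the most and least significant 64 bits in big-endian order (16 bytes total).
    static byte[] asByteArray(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Reverses asByteArray; rejects anything that is not exactly 16 bytes long.
    static UUID fromByteArray(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long mostSigBits = buf.getLong();
        long leastSigBits = buf.getLong();
        return new UUID(mostSigBits, leastSigBits);
    }
}
```

Whether the bytes come from a helper like this or from a UUID serializer should make no difference to the server, which fits the observation that switching serializers did not change the outcome.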

I refactored Johannes' code into a JUnit test case. It needs the cluster configured as described in Johannes' mail.
There are three cases: two with one of the index expressions each, and one with both index expressions. The one with both index expressions will never finish, and you will see the exception in the Cassandra logs.

Bye,
roland

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, 15 March 2011 07:54
To: user@cassandra.apache.org
Cc: Juergen Link; Roland Gude; hermes@datastax.com
Subject: Re: problems while TimeUUIDType-index-querying with two expressions

Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328

Aaron
On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:

Sounds like we should send an InvalidRequestException then.

On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aa...@thelastpickle.com> wrote:
It's failing when comparing two TimeUUID values because one of them is not
properly formatted. In this case it's comparing a stored value with the
value passed in the get_indexed_slices() query expression.
I'm going to assume it's the value passed for the expression.
When you create the IndexedSlicesQuery, this is incorrect:
IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
        .createIndexedSlicesQuery(keyspace,
                stringSerializer, bytesSerializer, bytesSerializer);
Use a UUIDSerializer for the last param and then pass the UUID you want to
build the expression with, rather than the string/byte thing you are passing.
Hope that helps.
Aaron
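
The "IndexOutOfBoundsException: 6" in the trace would be consistent with a comparator that reads the byte at offset 6 of a value shorter than 7 bytes, for example the 6-byte string "user01". The thread does not confirm which value was involved, so the following plain JDK sketch only illustrates that mechanism; `ShortBufferDemo` and `canReadByteSix` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: mimics a comparator that reads the byte at absolute offset 6.
final class ShortBufferDemo {
    // Returns false when the buffer is too short for an absolute read at offset 6.
    static boolean canReadByteSix(ByteBuffer buf) {
        try {
            buf.get(buf.position() + 6);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ByteBuffer sixBytes = ByteBuffer.wrap("user01".getBytes(StandardCharsets.UTF_8));
        ByteBuffer sixteenBytes = ByteBuffer.allocate(16); // same size as a serialized UUID
        System.out.println(canReadByteSix(sixBytes));     // false
        System.out.println(canReadByteSix(sixteenBytes)); // true
    }
}
```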
On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:

Hi all,

In order to improve our queries, we started to use IndexedSlicesQuery from
the hector project (https://github.com/zznate/hector-examples). I followed
the instructions for creating an IndexedSlicesQuery with
GetIndexedSlices.java.
I created the corresponding CF in a keyspace called "Keyspace1"
("create keyspace Keyspace1;") with:
"create column family Indexed1 with column_type='Standard' and
comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
rows_cached=20000 and column_metadata=[{column_name: birthdate,
validation_class: LongType, index_name: dateIndex, index_type:
KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
monthIndex, index_type: KEYS}];"
and the example GetIndexedSlices.java worked fine.

Output of CF Indexed1:
---------------------------------------
[default@Keyspace1] list Indexed1;
Using default limit of 100
-------------------
RowKey: fake_key_12
=> (column=birthdate, value=1974, timestamp=1300110485826059)
=> (column=birthmonth, value=0, timestamp=1300110485826060)
=> (column=fake_column_0, value=66616b655f76616c75655f305f3132,
timestamp=1300110485826056)
=> (column=fake_column_1, value=66616b655f76616c75655f315f3132,
timestamp=1300110485826057)
=> (column=fake_column_2, value=66616b655f76616c75655f325f3132,
timestamp=1300110485826058)
-------------------
RowKey: fake_key_8
=> (column=birthdate, value=1974, timestamp=1300110485826039)
=> (column=birthmonth, value=8, timestamp=1300110485826040)
=> (column=fake_column_0, value=66616b655f76616c75655f305f38,
timestamp=1300110485826036)
=> (column=fake_column_1, value=66616b655f76616c75655f315f38,
timestamp=1300110485826037)
=> (column=fake_column_2, value=66616b655f76616c75655f325f38,
timestamp=1300110485826038)
-------------------
....


Now to the problem:
As we have another column format in our cluster (using TimeUUIDType as
comparator in CF definition) I adapted the application to our schema on a
cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator id index
(00000000-0000-1000-0000-000000000000) and another one for a userid index
(00000001-0000-1000-0000-000000000000). It can be created with:
"create column family ByUser with column_type='Standard' and
comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
and rows_cached=20000 and column_metadata=[{column_name:
00000000-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: mandatorIndex, index_type: KEYS}, {column_name:
00000001-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: useridIndex, index_type: KEYS}];"


which looks in the cluster using cassandra-cli like this:

[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: ByUser
      Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
      Column Metadata:
        Column Name: 00000001-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: useridIndex
          Index Type: KEYS
        Column Name: 00000000-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: mandatorIndex
          Index Type: KEYS
    ColumnFamily: Indexed1
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
      Column Metadata:
        Column Name: birthmonth (birthmonth)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: monthIndex
          Index Type: KEYS
        Column Name: birthdate (birthdate)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: dateIndex
          Index Type: KEYS
[default@Keyspace1] list ByUser;
Using default limit of 100
-------------------
RowKey: testMandator!!user01
=> (column=00000000-0000-1000-0000-000000000000,
value=746573744d616e6461746f72, timestamp=1300111213321000)
=> (column=00000001-0000-1000-0000-000000000000, value=757365723031,
timestamp=1300111213322000)
=> (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135,
timestamp=1300111213561000)

1 Row Returned.

The values of the index columns 00000000-0000-1000-0000-000000000000 and
00000001-0000-1000-0000-000000000000 represent "testMandator" and
"user01" as bytes.
The third column is a randomly generated one with value "15". The columns are
inserted in the GetTimeUUIDIndexedSlices app.
I attached both source codes, GetIndexedSlices and GetTimeUUIDIndexedSlices.
Currently the second index expression for the userid index in
GetTimeUUIDIndexedSlices.queryCf(...) method

        indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new StringSerializer().toBytes(mandator));
        //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new StringSerializer().toBytes(dummyUserId));

is commented out, so the GetTimeUUIDIndexedSlices will run. Using one
IndexQuery works perfectly fine but as soon as I add a second eq, gt, gte or
lt expression I get an IndexOutOfBoundsException (see below).

This issue can be easily reproduced by
- downloading the zznate example
(https://github.com/zznate/hector-examples),
- mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
- importing it in eclipse and
- letting it run against a locally running cassandra instance (v0.7.3) which
has the default settings (no changes in the .yaml)

I hope that someone can help me with this issue ... after a couple of days
it's driving me bonkers.

Thx in advance,
Johannes


Exception:
ERROR 14:47:56,842 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IndexOutOfBoundsException: 6
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
        ... 4 more
ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
<GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

AW: problems while TimeUUIDType-index-querying with two expressions

Posted by Roland Gude <ro...@yoochoose.com>.

Actually, it's not the column values that should be UUIDs in our case, but the column names. The CF uses TimeUUID ordering and the values are just byte arrays. Even after changing the code to use UUIDSerializer instead of serializing the UUIDs manually, the issue still exists.
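
For reference, serializing a UUID column name by hand amounts to writing its two 64-bit halves in big-endian order. The following is a minimal sketch using only the JDK. The class name UuidBytes and both helpers are illustrative stand-ins, not the code attached to this thread, and Hector's UUIDSerializer is assumed (not verified here) to produce the same 16-byte layout.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidBytes {
    // Hypothetical helper: 16-byte big-endian form of a UUID
    // (most significant 64 bits first, then least significant 64 bits).
    public static byte[] asByteArray(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Inverse of asByteArray: rebuild the UUID from its 16 bytes.
    public static UUID fromByteArray(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        // The manually defined mandator index column name from the thread.
        UUID mandatorIndex = UUID.fromString("00000000-0000-1000-0000-000000000000");
        byte[] raw = asByteArray(mandatorIndex);

        System.out.println("length:     " + raw.length);
        System.out.println("version:    " + mandatorIndex.version());
        System.out.println("round trip: " + fromByteArray(raw).equals(mandatorIndex));
    }
}
```

The version digit "1" in the third group of the UUID string is what marks these hand-written names as time-based, so they are valid names under a TimeUUIDType comparator.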

As far as I can see, there is nothing wrong with the IndexExpression.
Using two index expressions with key=TimeUUID and value=anything does not work;
using one index expression (either of the two) alone works fine.
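
One possible reading of the stack trace in the original report: the failing frame is TimeUUIDType.compareTimestampBytes calling HeapByteBuffer.get, and the reported index is 6. A 16-byte UUID keeps its version nibble in byte 6, while the value "user01" is only 6 bytes long, so reading byte 6 of it runs past the end. The sketch below reproduces just that arithmetic with plain JDK classes. That the server applies the column-name comparator to the expression values is an assumption drawn from the trace, not something confirmed in this thread, and versionNibble is a hypothetical stand-in, not Cassandra code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class ByteSixDemo {
    // Hypothetical stand-in for a comparator step that inspects byte 6,
    // where a 16-byte UUID stores its version in the high nibble.
    public static int versionNibble(ByteBuffer b) {
        return (b.get(b.position() + 6) >> 4) & 0x0F;
    }

    // True if inspecting byte 6 fails because the buffer is too short.
    public static boolean readsPastEnd(ByteBuffer b) {
        try {
            versionNibble(b);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Column name from the thread: a 16-byte time-based UUID.
        UUID name = UUID.fromString("00000001-0000-1000-0000-000000000000");
        ByteBuffer columnName = ByteBuffer.allocate(16);
        columnName.putLong(name.getMostSignificantBits());
        columnName.putLong(name.getLeastSignificantBits());
        columnName.flip();

        // Column value from the thread: the 6 bytes of "user01".
        ByteBuffer columnValue = ByteBuffer.wrap("user01".getBytes(StandardCharsets.UTF_8));

        System.out.println("name fails at byte 6:  " + readsPastEnd(columnName));
        System.out.println("value fails at byte 6: " + readsPastEnd(columnValue));
    }
}
```

If this reading is right, the problem lies in which comparator is applied to the value, not in how the client builds the expression, which would match the observation that switching serializers does not help.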

I refactored Johannes' code into a JUnit test case. It needs the cluster configured as described in Johannes' mail.
There are three cases: two with a single index expression each, and one with both index expressions. The case with both index expressions never finishes, and you will see the exception in the Cassandra logs.

Bye,
roland

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, 15 March 2011 07:54
To: user@cassandra.apache.org
Cc: Juergen Link; Roland Gude; hermes@datastax.com
Subject: Re: problems while TimeUUIDType-index-querying with two expressions

Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328

Aaron
On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:


Sounds like we should send an InvalidRequestException then.

On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aa...@thelastpickle.com> wrote:

It's failing when comparing two TimeUUID values because one of them is not
properly formatted. In this case it's comparing a stored value with the
value passed in the get_indexed_slices() query expression.
I'm going to assume it's the value passed for the expression.
When you create the IndexedSlicesQuery, this is incorrect:
IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
.createIndexedSlicesQuery(keyspace,
stringSerializer, bytesSerializer, bytesSerializer);
Use a UUIDSerializer for the last param and then pass the UUID you want to
build the expression with, rather than the string/byte thing you are passing.
Hope that helps.
Aaron
On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:

Hi all,

in order to improve our queries, we started to use IndexedSliceQueries from
the hector project (https://github.com/zznate/hector-examples). I followed
the instructions for creating IndexedSlicesQuery with
GetIndexedSlices.java.
I created the corresponding CF within a keyspace called "Keyspace1" (
"create keyspace  Keyspace1;") with:
"create column family Indexed1 with column_type='Standard' and
comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
rows_cached=20000 and column_metadata=[{column_name: birthdate,
validation_class: LongType, index_name: dateIndex, index_type:
KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
monthIndex, index_type: KEYS}];"
and the example GetIndexedSlices.java worked fine.

Output of CF Indexed1:
---------------------------------------
[default@Keyspace1] list Indexed1;
Using default limit of 100
-------------------
RowKey: fake_key_12
=> (column=birthdate, value=1974, timestamp=1300110485826059)
=> (column=birthmonth, value=0, timestamp=1300110485826060)
=> (column=fake_column_0, value=66616b655f76616c75655f305f3132,
timestamp=1300110485826056)
=> (column=fake_column_1, value=66616b655f76616c75655f315f3132,
timestamp=1300110485826057)
=> (column=fake_column_2, value=66616b655f76616c75655f325f3132,
timestamp=1300110485826058)
-------------------
RowKey: fake_key_8
=> (column=birthdate, value=1974, timestamp=1300110485826039)
=> (column=birthmonth, value=8, timestamp=1300110485826040)
=> (column=fake_column_0, value=66616b655f76616c75655f305f38,
timestamp=1300110485826036)
=> (column=fake_column_1, value=66616b655f76616c75655f315f38,
timestamp=1300110485826037)
=> (column=fake_column_2, value=66616b655f76616c75655f325f38,
timestamp=1300110485826038)
-------------------
....


Now to the problem:
As we have another column format in our cluster (using TimeUUIDType as
comparator in CF definition) I adapted the application to our schema on a
cassandra-0.7.3 cluster.
We use a manually defined UUID for a mandator id index
(00000000-0000-1000-0000-000000000000) and another one for a userid index
(00000001-0000-1000-0000-000000000000). It can be created with:
"create column family ByUser with column_type='Standard' and
comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
and rows_cached=20000 and column_metadata=[{column_name:
00000000-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: mandatorIndex, index_type: KEYS}, {column_name:
00000001-0000-1000-0000-000000000000, validation_class: BytesType,
index_name: useridIndex, index_type: KEYS}];"


which looks in the cluster using cassandra-cli like this:

[default@Keyspace1] describe keyspace;
Keyspace: Keyspace1:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
    Replication Factor: 1
  Column Families:
    ColumnFamily: ByUser
      Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
      Column Metadata:
        Column Name: 00000001-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: useridIndex
          Index Type: KEYS
        Column Name: 00000000-0000-1000-0000-000000000000
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: mandatorIndex
          Index Type: KEYS
    ColumnFamily: Indexed1
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period: 20000.0/0
      Key cache size / save period: 200000.0/14400
      Memtable thresholds: 0.2953125/63/1440
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.01
      Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
      Column Metadata:
        Column Name: birthmonth (birthmonth)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: monthIndex
          Index Type: KEYS
        Column Name: birthdate (birthdate)
          Validation Class: org.apache.cassandra.db.marshal.LongType
          Index Name: dateIndex
          Index Type: KEYS
[default@Keyspace1] list ByUser;
Using default limit of 100
-------------------
RowKey: testMandator!!user01
=> (column=00000000-0000-1000-0000-000000000000,
value=746573744d616e6461746f72, timestamp=1300111213321000)
=> (column=00000001-0000-1000-0000-000000000000, value=757365723031,
timestamp=1300111213322000)
=> (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135,
timestamp=1300111213561000)

1 Row Returned.

The values of the index columns 00000000-0000-1000-0000-000000000000 and
00000001-0000-1000-0000-000000000000 represent "testMandator" and
"user01" as bytes.
The third column is a randomly generated one with value "15". The columns are
inserted in the GetTimeUUIDIndexedSlices app.
I attached both source codes, GetIndexedSlices and GetTimeUUIDIndexedSlices.
Currently the second index expression for the userid index in
GetTimeUUIDIndexedSlices.queryCf(...) method

        indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new StringSerializer().toBytes(mandator));
        //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new StringSerializer().toBytes(dummyUserId));

is commented out, so the GetTimeUUIDIndexedSlices will run. Using one
IndexQuery works perfectly fine but as soon as I add a second eq, gt, gte or
lt expression I get an IndexOutOfBoundsException (see below).

This issue can be easily reproduced by
- downloading the zznate example
(https://github.com/zznate/hector-examples),
- mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
- importing it in eclipse and
- letting it run against a locally running cassandra instance (v0.7.3) which
has the default settings (no changes in the .yaml)

I hope that someone can help me with this issue ... after a couple of days
it's driving me bonkers.

Thx in advance,
Johannes


Exception:
ERROR 14:47:56,842 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IndexOutOfBoundsException: 6
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
        at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
        ... 4 more
ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
        at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
<GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: problems while TimeUUIDType-index-querying with two expressions

Posted by aaron morton <aa...@thelastpickle.com>.

Perfectly reasonable, created https://issues.apache.org/jira/browse/CASSANDRA-2328

Aaron
On 15 Mar 2011, at 16:52, Jonathan Ellis wrote:

> Sounds like we should send an InvalidRequestException then.
> 
> On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> It's failing when comparing two TimeUUID values because one of them is not
>> properly formatted. In this case it's comparing a stored value with the
>> value passed in the get_indexed_slices() query expression.
>> I'm going to assume it's the value passed for the expression.
>> When you create the IndexedSlicesQuery, this is incorrect:
>> IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
>> .createIndexedSlicesQuery(keyspace,
>> stringSerializer, bytesSerializer, bytesSerializer);
>> Use a UUIDSerializer for the last param and then pass the UUID you want to
>> build the expression with, rather than the string/byte thing you are passing.
>> Hope that helps.
>> Aaron
>> On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:
>> 
>> Hi all,
>> 
>> in order to improve our queries, we started to use IndexedSliceQueries from
>> the hector project (https://github.com/zznate/hector-examples). I followed
>> the instructions for creating IndexedSlicesQuery with
>> GetIndexedSlices.java.
>> I created the corresponding CF within a keyspace called “Keyspace1” (
>> “create keyspace  Keyspace1;”) with:
>> "create column family Indexed1 with column_type='Standard' and
>> comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
>> rows_cached=20000 and column_metadata=[{column_name: birthdate,
>> validation_class: LongType, index_name: dateIndex, index_type:
>> KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
>> monthIndex, index_type: KEYS}];"
>> and the example GetIndexedSlices.java worked fine.
>> 
>> Output of CF Indexed1:
>> ---------------------------------------
>> [default@Keyspace1] list Indexed1;
>> Using default limit of 100
>> -------------------
>> RowKey: fake_key_12
>> => (column=birthdate, value=1974, timestamp=1300110485826059)
>> => (column=birthmonth, value=0, timestamp=1300110485826060)
>> => (column=fake_column_0, value=66616b655f76616c75655f305f3132,
>> timestamp=1300110485826056)
>> => (column=fake_column_1, value=66616b655f76616c75655f315f3132,
>> timestamp=1300110485826057)
>> => (column=fake_column_2, value=66616b655f76616c75655f325f3132,
>> timestamp=1300110485826058)
>> -------------------
>> RowKey: fake_key_8
>> => (column=birthdate, value=1974, timestamp=1300110485826039)
>> => (column=birthmonth, value=8, timestamp=1300110485826040)
>> => (column=fake_column_0, value=66616b655f76616c75655f305f38,
>> timestamp=1300110485826036)
>> => (column=fake_column_1, value=66616b655f76616c75655f315f38,
>> timestamp=1300110485826037)
>> => (column=fake_column_2, value=66616b655f76616c75655f325f38,
>> timestamp=1300110485826038)
>> -------------------
>> ....
>> 
>> 
>> Now to the problem:
>> As we have another column format in our cluster (using TimeUUIDType as
>> comparator in CF definition) I adapted the application to our schema on a
>> cassandra-0.7.3 cluster.
>> We use a manually defined UUID for a mandator id index
>> (00000000-0000-1000-0000-000000000000) and another one for a userid index
>> (00000001-0000-1000-0000-000000000000). It can be created with:
>> "create column family ByUser with column_type='Standard' and
>> comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
>> and rows_cached=20000 and column_metadata=[{column_name:
>> 00000000-0000-1000-0000-000000000000, validation_class: BytesType,
>> index_name: mandatorIndex, index_type: KEYS}, {column_name:
>> 00000001-0000-1000-0000-000000000000, validation_class: BytesType,
>> index_name: useridIndex, index_type: KEYS}];"
>> 
>> 
>> which looks in the cluster using cassandra-cli like this:
>> 
>> [default@Keyspace1] describe keyspace;
>> Keyspace: Keyspace1:
>>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>     Replication Factor: 1
>>   Column Families:
>>     ColumnFamily: ByUser
>>       Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
>>       Row cache size / save period: 20000.0/0
>>       Key cache size / save period: 200000.0/14400
>>       Memtable thresholds: 0.2953125/63/1440
>>       GC grace seconds: 864000
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.01
>>       Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
>>       Column Metadata:
>>         Column Name: 00000001-0000-1000-0000-000000000000
>>           Validation Class: org.apache.cassandra.db.marshal.BytesType
>>           Index Name: useridIndex
>>           Index Type: KEYS
>>         Column Name: 00000000-0000-1000-0000-000000000000
>>           Validation Class: org.apache.cassandra.db.marshal.BytesType
>>           Index Name: mandatorIndex
>>           Index Type: KEYS
>>     ColumnFamily: Indexed1
>>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>       Row cache size / save period: 20000.0/0
>>       Key cache size / save period: 200000.0/14400
>>       Memtable thresholds: 0.2953125/63/1440
>>       GC grace seconds: 864000
>>       Compaction min/max thresholds: 4/32
>>       Read repair chance: 0.01
>>       Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
>>       Column Metadata:
>>         Column Name: birthmonth (birthmonth)
>>           Validation Class: org.apache.cassandra.db.marshal.LongType
>>           Index Name: monthIndex
>>           Index Type: KEYS
>>         Column Name: birthdate (birthdate)
>>           Validation Class: org.apache.cassandra.db.marshal.LongType
>>           Index Name: dateIndex
>>           Index Type: KEYS
>> [default@Keyspace1] list ByUser;
>> Using default limit of 100
>> -------------------
>> RowKey: testMandator!!user01
>> => (column=00000000-0000-1000-0000-000000000000,
>> value=746573744d616e6461746f72, timestamp=1300111213321000)
>> => (column=00000001-0000-1000-0000-000000000000, value=757365723031,
>> timestamp=1300111213322000)
>> => (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135,
>> timestamp=1300111213561000)
>> 
>> 1 Row Returned.
>> 
>> The values of the index columns 00000000-0000-1000-0000-000000000000 and
>> 00000001-0000-1000-0000-000000000000 represent "testMandator" and
>> "user01" as bytes.
>> The third column is a randomly generated one with value "15". The columns are
>> inserted in the GetTimeUUIDIndexedSlices app.
>> I attached both source codes, GetIndexedSlices and GetTimeUUIDIndexedSlices.
>> Currently the second index expression for the userid index in
>> GetTimeUUIDIndexedSlices.queryCf(...) method
>> 
>>         indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new StringSerializer().toBytes(mandator));
>>         //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new StringSerializer().toBytes(dummyUserId));
>> 
>> is commented out, so the GetTimeUUIDIndexedSlices will run. Using one
>> IndexQuery works perfectly fine but as soon as I add a second eq, gt, gte or
>> lt expression I get an IndexOutOfBoundsException (see below).
>> 
>> This issue can be easily reproduced by
>> - downloading the zznate example
>> (https://github.com/zznate/hector-examples),
>> - mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
>> - importing it in eclipse and
>> - letting it run against a locally running cassandra instance (v0.7.3) which
>> has the default settings (no changes in the .yaml)
>> 
>> I hope that someone can help me with this issue ... after a couple of days
>> it's driving me bonkers.
>> 
>> Thx in advance,
>> Johannes
>> 
>> 
>> Exception:
>> ERROR 14:47:56,842 Error in ThreadPoolExecutor
>> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.lang.IndexOutOfBoundsException: 6
>>         at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
>>         at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
>>         at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
>>         at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
>>         ... 4 more
>> ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
>> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
>>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
>>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> <GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

Re: problems while TimeUUIDType-index-querying with two expressions

Posted by Jonathan Ellis <jb...@gmail.com>.

Sounds like we should send an InvalidRequestException then.

On Mon, Mar 14, 2011 at 8:06 PM, aaron morton <aa...@thelastpickle.com> wrote:
> It's failing when comparing two TimeUUID values because one of them is not
> properly formatted. In this case it's comparing a stored value with the
> value passed in the get_indexed_slices() query expression.
> I'm going to assume it's the value passed for the expression.
> When you create the IndexedSlicesQuery, this is incorrect:
> IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
> .createIndexedSlicesQuery(keyspace,
> stringSerializer, bytesSerializer, bytesSerializer);
> Use a UUIDSerializer for the last param and then pass the UUID you want to
> build the expression with, rather than the string/byte thing you are passing.
> Hope that helps.
> Aaron
> On 15 Mar 2011, at 04:17, Johannes Hoerle wrote:
>
> Hi all,
>
> in order to improve our queries, we started to use IndexedSliceQueries from
> the hector project (https://github.com/zznate/hector-examples). I followed
> the instructions for creating IndexedSlicesQuery with
> GetIndexedSlices.java.
> I created the corresponding CF within a keyspace called “Keyspace1” (
> “create keyspace  Keyspace1;”) with:
> "create column family Indexed1 with column_type='Standard' and
> comparator='UTF8Type' and keys_cached=200000 and read_repair_chance=1.0 and
> rows_cached=20000 and column_metadata=[{column_name: birthdate,
> validation_class: LongType, index_name: dateIndex, index_type:
> KEYS},{column_name: birthmonth, validation_class: LongType, index_name:
> monthIndex, index_type: KEYS}];"
> and the example GetIndexedSlices.java worked fine.
>
> Output of CF Indexed1:
> ---------------------------------------
> [default@Keyspace1] list Indexed1;
> Using default limit of 100
> -------------------
> RowKey: fake_key_12
> => (column=birthdate, value=1974, timestamp=1300110485826059)
> => (column=birthmonth, value=0, timestamp=1300110485826060)
> => (column=fake_column_0, value=66616b655f76616c75655f305f3132,
> timestamp=1300110485826056)
> => (column=fake_column_1, value=66616b655f76616c75655f315f3132,
> timestamp=1300110485826057)
> => (column=fake_column_2, value=66616b655f76616c75655f325f3132,
> timestamp=1300110485826058)
> -------------------
> RowKey: fake_key_8
> => (column=birthdate, value=1974, timestamp=1300110485826039)
> => (column=birthmonth, value=8, timestamp=1300110485826040)
> => (column=fake_column_0, value=66616b655f76616c75655f305f38,
> timestamp=1300110485826036)
> => (column=fake_column_1, value=66616b655f76616c75655f315f38,
> timestamp=1300110485826037)
> => (column=fake_column_2, value=66616b655f76616c75655f325f38,
> timestamp=1300110485826038)
> -------------------
> ....
>
>
> Now to the problem:
> As we have another column format in our cluster (using TimeUUIDType as
> comparator in CF definition) I adapted the application to our schema on a
> cassandra-0.7.3 cluster.
> We use a manually defined UUID for a mandator id index
> (00000000-0000-1000-0000-000000000000) and another one for a userid index
> (00000001-0000-1000-0000-000000000000). It can be created with:
> "create column family ByUser with column_type='Standard' and
> comparator='TimeUUIDType' and keys_cached=200000 and read_repair_chance=1.0
> and rows_cached=20000 and column_metadata=[{column_name:
> 00000000-0000-1000-0000-000000000000, validation_class: BytesType,
> index_name: mandatorIndex, index_type: KEYS}, {column_name:
> 00000001-0000-1000-0000-000000000000, validation_class: BytesType,
> index_name: useridIndex, index_type: KEYS}];"
>
>
> which looks in the cluster using cassandra-cli like this:
>
> [default@Keyspace1] describe keyspace;
> Keyspace: Keyspace1:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>     Replication Factor: 1
>   Column Families:
>     ColumnFamily: ByUser
>       Columns sorted by: org.apache.cassandra.db.marshal.TimeUUIDType
>       Row cache size / save period: 20000.0/0
>       Key cache size / save period: 200000.0/14400
>       Memtable thresholds: 0.2953125/63/1440
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 0.01
>       Built indexes: [ByUser.mandatorIndex, ByUser.useridIndex]
>       Column Metadata:
>         Column Name: 00000001-0000-1000-0000-000000000000
>           Validation Class: org.apache.cassandra.db.marshal.BytesType
>           Index Name: useridIndex
>           Index Type: KEYS
>         Column Name: 00000000-0000-1000-0000-000000000000
>           Validation Class: org.apache.cassandra.db.marshal.BytesType
>           Index Name: mandatorIndex
>           Index Type: KEYS
>     ColumnFamily: Indexed1
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period: 20000.0/0
>       Key cache size / save period: 200000.0/14400
>       Memtable thresholds: 0.2953125/63/1440
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 0.01
>       Built indexes: [Indexed1.dateIndex, Indexed1.monthIndex]
>       Column Metadata:
>         Column Name: birthmonth (birthmonth)
>           Validation Class: org.apache.cassandra.db.marshal.LongType
>           Index Name: monthIndex
>           Index Type: KEYS
>         Column Name: birthdate (birthdate)
>           Validation Class: org.apache.cassandra.db.marshal.LongType
>           Index Name: dateIndex
>           Index Type: KEYS
> [default@Keyspace1] list ByUser;
> Using default limit of 100
> -------------------
> RowKey: testMandator!!user01
> => (column=00000000-0000-1000-0000-000000000000,
> value=746573744d616e6461746f72, timestamp=1300111213321000)
> => (column=00000001-0000-1000-0000-000000000000, value=757365723031,
> timestamp=1300111213322000)
> => (column=f064b480-495e-11e0-abc4-0024e89fa587, value=3135,
> timestamp=1300111213561000)
>
> 1 Row Returned.
>
> The values of the index columns 00000000-0000-1000-0000-000000000000 and
> 00000001-0000-1000-0000-000000000000 represent "testMandator" and
> "user01" as bytes.
> The third column is a randomly generated one with value "15" that is
> inserted by the GetTimeUUIDIndexedSlices app.
> I attached both source codes, GetIndexedSlices and GetTimeUUIDIndexedSlices.
> Currently the second index expression for the userid index in
> GetTimeUUIDIndexedSlices.queryCf(...) method
>
>             indexQuery.addEqualsExpression(asByteArray(MANDATOR_UUID), new
> StringSerializer().toBytes(mandator));
>         //indexQuery.addEqualsExpression(asByteArray(USERID_INDEX_UUID), new
> StringSerializer().toBytes(dummyUserId));
>
> is commented out, so the GetTimeUUIDIndexedSlices will run. Using one
> IndexQuery works perfectly fine but as soon as I add a second eq, gt, gte or
> lt expression I get an IndexOutOfBoundsException (see below).
>
> This issue can be easily reproduced by
> - downloading the zznate example
> (https://github.com/zznate/hector-examples),
> - mavenizing it to an eclipse project with "mvn clean eclipse:eclipse",
> - importing it in eclipse and
> - letting it run against a locally running cassandra instance (v0.7.3) which
> has the default settings (no changes in the .yaml)
>
> I hope that someone can help me with this issue ... after a couple of days
> it's driving me bonkers.
>
> Thx in advance,
> Johannes
>
>
> Exception:
> ERROR 14:47:56,842 Error in ThreadPoolExecutor
> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.IndexOutOfBoundsException: 6
>         at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
>         at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
>         at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
>         at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
>         at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore.java:1608)
>         at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1552)
>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:42)
>         ... 4 more
> ERROR 14:47:56,852 Fatal exception in thread Thread[ReadStage:14,5,main]
> java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
>         at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:51)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> <GetIndexedSlices.java><GetTimeUUIDIndexedSlices.java>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: problems while TimeUUIDType-index-querying with two expressions

Posted by aaron morton <aa...@thelastpickle.com>.

It's failing when comparing two TimeUUID values because one of them is not properly formatted. In this case it's comparing a stored value with the value passed in the get_indexed_slices() query expression.

I'm going to assume it's the value passed for the expression. 
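
To illustrate the failure mode: a UUID serializes to 16 bytes, while "user01" as bytes is only 6 bytes long, so a comparison that reads byte 6 of both operands runs off the end of the shorter one. That would be consistent with the "IndexOutOfBoundsException: 6" raised from HeapByteBuffer.get in the trace. Below is a minimal sketch using only the JDK. The class and the compareByteSix method are hypothetical stand-ins for illustration, not Cassandra's actual TimeUUIDType code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical illustration class; not part of Cassandra or Hector.
final class TimeUuidBytesSketch {

    /** Serializes a UUID into its 16-byte big-endian form. */
    public static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /**
     * Assumed stand-in for one step of a TimeUUID comparison:
     * it reads byte 6 (the version / time-high byte) of each operand.
     */
    public static int compareByteSix(ByteBuffer a, ByteBuffer b) {
        return (a.get(6) & 0x0F) - (b.get(6) & 0x0F);
    }

    /** Returns true if comparing the two values throws IndexOutOfBoundsException. */
    public static boolean failsOnShortValue(byte[] value, byte[] other) {
        try {
            compareByteSix(ByteBuffer.wrap(value), ByteBuffer.wrap(other));
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] uuidBytes = toBytes(UUID.fromString("00000001-0000-1000-0000-000000000000"));
        byte[] userId = "user01".getBytes(StandardCharsets.UTF_8);

        System.out.println("UUID length:   " + uuidBytes.length);
        System.out.println("userId length: " + userId.length);
        System.out.println("UUID vs UUID fails:   " + failsOnShortValue(uuidBytes, uuidBytes));
        System.out.println("userId vs UUID fails: " + failsOnShortValue(userId, uuidBytes));
    }
}
```

The sketch only shows why a value shorter than 7 bytes cannot survive such a comparison; it does not say which side of the real comparison held the short value.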

When you create the IndexedSlicesQuery, this is incorrect:

IndexedSlicesQuery<String, byte[], byte[]> indexQuery = HFactory
		.createIndexedSlicesQuery(keyspace,
				stringSerializer, bytesSerializer, bytesSerializer);

Use a UUIDSerializer for the last param and then pass the UUID you want when building the expression, rather than the string/byte thing you are passing.

Hope that helps.
Aaron
