Posted to user@cassandra.apache.org by Nate Sammons <NS...@ften.com> on 2011/11/07 22:43:08 UTC

Secondary index issue, unable to query for records that should be there

Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several secondary indexes to try out some options.  Right now I have the following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      -- absolute timestamp for this message, also indexed year/month/day/hour/minute
      -- index these as they are low cardinality
      {column_name:messageTimestamp, validation_class:LongType},
      {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

                ... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these values on a Hector ColumnFamilyUpdater instance and update that way.  Then later I can query from the CLI with a command such as:

                get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works; however, at some point queries that I know should return data no longer return any rows.

So for instance, part way through my test (inserting 250K rows), I can run a query such as the one above and get back the data that should be there, but later that same query returns 0 rows.  Similarly, a query with fewer clauses in the expression, like this:

                get MyTest where messageYear=2011 and messageMonth=6;

also returns 0 rows.


Any idea what could be going wrong?  I'm not getting any exceptions in my client during the write, and I don't see anything in the logs (no errors anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on CQL queries with multiple indexed columns is good (does Cassandra intelligently use all available indexes on these queries?)



Thanks,

-nate

RE: Secondary index issue, unable to query for records that should be there

Posted by Nate Sammons <NS...@ften.com>.
Here is a simple test that shows the problem.  My setup is:


- DSE 1.0.3 on Ubuntu 11.04, JDK 1.6.0_29 on x86_64, installed from the DataStax Debian repo (yesterday)

- Hector 1.0-1 (from Maven)

Attached are a CLI file to create the keyspace and CF, and a Java file to insert data and do some queries.


This creates the following CF:

create column family IndexTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      {column_name:year, validation_class:IntegerType, index_type: KEYS},
      {column_name:month, validation_class:IntegerType, index_type: KEYS},
      {column_name:day, validation_class:IntegerType, index_type: KEYS},
      {column_name:hour, validation_class:IntegerType, index_type: KEYS},
      {column_name:minute, validation_class:IntegerType, index_type: KEYS},
      {column_name:data, validation_class:UTF8Type}
  ];


The test then inserts 5 rows per minute value, with the following values for year/month/day/hour/minute:

                Year: 2011
                Month: 1, 2
                Day: 1-15
                Hour: 1-23
                Minute: 1-59

That gives a total of 203,550 rows.  For queries, it picks some known values for year/month/day/hour/minute at random and looks for rows; there should be 5 rows per combination.

Row keys are of the form YEAR-MONTH-DAY-HOUR-MINUTE--NUM (where NUM is 1-5; note the double hyphen before NUM).
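
As a sanity check on the row count and key format, here is a minimal sketch. It is not the attached Java test; the names (row_keys, ROWS_PER_MINUTE and so on) are made up for illustration, and the double hyphen before NUM is copied from the keys that appear in the CLI output in this thread.

```python
from itertools import product

YEAR = 2011
MONTHS = range(1, 3)     # 1, 2
DAYS = range(1, 16)      # 1-15
HOURS = range(1, 24)     # 1-23
MINUTES = range(1, 60)   # 1-59
ROWS_PER_MINUTE = 5      # NUM is 1-5


def row_keys():
    """Yield every row key the test is described as inserting."""
    for month, day, hour, minute in product(MONTHS, DAYS, HOURS, MINUTES):
        for num in range(1, ROWS_PER_MINUTE + 1):
            # Double hyphen before NUM, matching keys such as 2011-1-8-18-30--1.
            yield f"{YEAR}-{month}-{day}-{hour}-{minute}--{num}"


keys = list(row_keys())
print(len(keys))  # 2 * 15 * 23 * 59 * 5 = 203550
```

Each year/month/day/hour/minute combination contributes exactly 5 keys, which is why every fully specified query should return 5 rows.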


Now once that data is inserted, using the CLI I can find records such as the following:


[default@Test] get IndexTest[2011-1-8-18-30--1];
=> (column=data, value=xvktwirapi0qs0ta29w9rchbdc2omsuv0k2chjqp9pmaodlj9ngecllaa8eq3nnx66p591b2a06mry4rpsvkd54ji5pbxikpc6mxj4czi4nuuxgoasibjd5yk65hdtqe8a0uq3yxnw81dgq6hkx8wnbs177rwo51xtkwuhwizoc0gul92pvo6tfivjgdschd9fjzfu4v1d1uxhih3argr1mp4i1h6fqybfv2utlzdzzqczq3ruu90647prrnqwdw1zqmd46ia175a929ltx2hoz8sv6rs817zm2myhp3wekfk3flnuniqgtpth7g5fns8q3oc8qde5btivt1j99gc1h2kxjbek1p448t1hs91lh9r6yrg1douj53sn7d81bnwp4nnbmz01dbr46fae1b9ter0zljet2nl1x751no6pdt64k2mdh0un01gerfihak6vn0wdvgzuv9soji3pwgnffkw2zvm5q0jlp1uf9nmy7gzswydpxwtvc35c6jw64d, timestamp=1320769482652005)
=> (column=day, value=8, timestamp=1320769482652002)
=> (column=hour, value=18, timestamp=1320769482652003)
=> (column=minute, value=30, timestamp=1320769482652004)
=> (column=month, value=1, timestamp=1320769482652001)
=> (column=year, value=2011, timestamp=1320769482652000)
Returned 6 results.


However, an indexed query in the CLI for that same record returns nothing:

[default@Test] get IndexTest where year=2011 and month=1 and day=8 and hour=18 and minute=30;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1 and day=8 and hour=18;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1 and day=8;

0 Row Returned.
[default@Test] get IndexTest where year=2011 and month=1;


Similar results using CQLSH:

cqlsh> select * from IndexTest where year=2011 and month=1 and day=8 and hour=18 and minute=30;
cqlsh> select * from IndexTest where year=2011 and month=1 and day=8 and hour=18;
cqlsh> select * from IndexTest where year=2011 and month=1 and day=8;

(no results in any of those cases).




However, some data does show up through indexed queries (I omitted the column data for brevity):

[default@Test] get IndexTest where year=2011 and month=2 and day=8 and hour=18 and minute=30;
-------------------
RowKey: 2011-2-8-18-30--1
-------------------
RowKey: 2011-2-8-18-30--4
-------------------
RowKey: 2011-2-8-18-30--5
-------------------
RowKey: 2011-2-8-18-30--2
-------------------
RowKey: 2011-2-8-18-30--3

5 Rows Returned.


So it seems that (in this case) month=1 is not working, but month=2 does work (along with the other parts of the expression).  I haven't tried this a bunch of times to see if this is always the case, but it seems to be.


When I run those queries using Hector, the debugger shows the QueryResult's get() method returning null when it should return rows.



Thanks,

-nate



From: Jake Luciani [mailto:jakers@gmail.com]
Sent: Tuesday, November 08, 2011 8:56 AM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Hi Nate,

Could you try running it with debug enabled on the logs? It will give more insight into what's going on.

-Jake

On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons <NS...@ften.com> wrote:
This is against a single server, not a cluster.  Replication factor for the keyspace is set to 1, CL is the default for Hector, which I think is QUORUM.

I'm trying to get a simple test together that shows this.  Does anyone know if multiple indexes like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rkalla@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Nate, is this all against a single Cassandra server, or do you have a ring setup? If you do have a ring setup, what is your replication factor set to? Also, what ConsistencyLevel are you writing with when storing the values?

-R
--
http://twitter.com/tjake

Re: Secondary index issue, unable to query for records that should be there

Posted by Nate McCall <na...@datastax.com>.
I think you wanted to use Int32Type instead of IntegerType for
creating the indexes. IntegerType actually represents
java.math.BigInteger.
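
To illustrate why the type can matter: a 32-bit int written as four bytes and the same number in BigInteger's minimal form are different byte strings. The sketch below is Python rather than Java, and biginteger_bytes only mimics what java.math.BigInteger.toByteArray() returns. Whether Hector's setInteger(...) and the CLI really produce these two encodings in this setup is an assumption, although the DEBUG logs elsewhere in this thread do show 4-byte values for the integer columns.

```python
import struct


def int32_bytes(n: int) -> bytes:
    """Fixed-width 4-byte big-endian encoding of a 32-bit signed integer."""
    return struct.pack(">i", n)


def biginteger_bytes(n: int) -> bytes:
    """Minimal two's-complement big-endian encoding, mimicking
    java.math.BigInteger.toByteArray()."""
    # Bits needed excluding the sign bit, as BigInteger.bitLength() counts them.
    bits = n.bit_length() if n >= 0 else (~n).bit_length()
    return n.to_bytes(bits // 8 + 1, "big", signed=True)


for value in (1, 2, 49, 2011):
    # Print the value next to both encodings, in hex.
    print(value, int32_bytes(value).hex(), biginteger_bytes(value).hex())
```

For month=1, for example, the first form is 00000001 and the second is 01, so a lookup keyed on one encoding would not find an entry stored under the other if the bytes are compared as-is.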

On Tue, Nov 8, 2011 at 12:28 PM, Nate Sammons <NS...@ften.com> wrote:
> Interesting…  if I switch the columns to be UTF8 instead of integers, like
> this:
>
> create column family IndexTest with
>   key_validation_class = UTF8Type
>   and comparator = UTF8Type
>   and column_metadata = [
>       {column_name:year, validation_class:UTF8Type, index_type: KEYS},
>       {column_name:month, validation_class:UTF8Type, index_type: KEYS},
>       {column_name:day, validation_class:UTF8Type, index_type: KEYS},
>       {column_name:hour, validation_class:UTF8Type, index_type: KEYS},
>       {column_name:minute, validation_class:UTF8Type, index_type: KEYS},
>       {column_name:data, validation_class:UTF8Type}
>   ];
>
> And change the hector code to use setString(…) instead of setInteger(…).
>
> Then everything works fine.   Is there a CQL bug with respect to non-string
> columns?
>
> Thanks,
>
> -nate

RE: Secondary index issue, unable to query for records that should be there

Posted by Nate Sammons <NS...@ften.com>.
Interesting...  if I switch the columns to be UTF8 instead of integers, like this:

create column family IndexTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      {column_name:year, validation_class:UTF8Type, index_type: KEYS},
      {column_name:month, validation_class:UTF8Type, index_type: KEYS},
      {column_name:day, validation_class:UTF8Type, index_type: KEYS},
      {column_name:hour, validation_class:UTF8Type, index_type: KEYS},
      {column_name:minute, validation_class:UTF8Type, index_type: KEYS},
      {column_name:data, validation_class:UTF8Type}
  ];


And change the Hector code to use setString(...) instead of setInteger(...).

Then everything works fine.   Is there a CQL bug with respect to non-string columns?


Thanks,

-nate



From: Nate Sammons [mailto:NSammons@ften.com]
Sent: Tuesday, November 08, 2011 11:14 AM
To: user@cassandra.apache.org
Subject: RE: Secondary index issue, unable to query for records that should be there

Note that I had identical behavior using a fresh download of Cassandra 1.0.2 as of today.

Thanks,

-nate


From: Nate Sammons [mailto:NSammons@ften.com]
Sent: Tuesday, November 08, 2011 10:20 AM
To: user@cassandra.apache.org
Subject: RE: Secondary index issue, unable to query for records that should be there

I restarted with logging turned up to DEBUG, and after quite a bit of logging during startup, I re-ran a query:


get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49;


produced the following in the logs:

DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) scan
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) restricted ranges for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) scan ranges are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@7bc203c from natebookpro/127.0.1.1
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 808@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 ResponseVerbHandler.java (line 44) Processing response on a callback from 808@natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@6a25a21d from natebookpro/127.0.1.1
DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 809@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 ResponseVerbHandler.java (line 44) Processing response on a callback from 809@natebookpro/127.0.1.1



Whereas a direct read of a key using "get IndexTest[2011-1-14-18-49--1];" produced a result, and the following in the logs:

DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraServer.java (line 323) get_slice
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 623) Command/ConsistencyLevel is SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 ReadCallback.java (line 77) Blockfor/repair is 1/true; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 639) reading data locally
DEBUG [ReadStage:37] 2011-11-08 10:11:20,160 StorageProxy.java (line 792) LocalReadRunnable reading SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)
DEBUG [ReadStage:37] 2011-11-08 10:11:20,161 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:37] 2011-11-08 10:11:20,162 SliceQueryFilter.java (line 123) collecting 0 of 1000000: data:false:512@1320769510502017
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 1 of 1000000: day:false:4@1320769510502014
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 2 of 1000000: hour:false:4@1320769510502015
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 3 of 1000000: minute:false:4@1320769510502016
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 4 of 1000000: month:false:4@1320769510502013
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 5 of 1000000: year:false:4@1320769510502012
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,170 StorageProxy.java (line 689) Read: 10 ms.
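
As a reading aid for the log above, the sketch below decodes the key= field and pulls the value size out of one "collecting" line. Two things are assumed rather than confirmed by anything in this thread: that key= is the hex form of the UTF-8 row key, and that the tail of a "collecting" line reads name:deleted-flag:value-size-in-bytes@timestamp. The helper name parse_column is hypothetical.

```python
import re

# Values copied from the DEBUG output above.
HEX_KEY = "323031312d312d31342d31382d34392d2d31"
LOG_LINE = "collecting 4 of 1000000: month:false:4@1320769510502013"

# Assumption: the key= field is the hex form of the UTF-8 row key.
row_key = bytes.fromhex(HEX_KEY).decode("utf-8")

# Assumption: name:deleted-flag:value-size-in-bytes@timestamp.
COLUMN_RE = re.compile(
    r"(?P<name>\w+):(?P<deleted>true|false):(?P<size>\d+)@(?P<ts>\d+)$"
)


def parse_column(line: str) -> dict:
    """Pull the column name, value size and timestamp out of one log line."""
    match = COLUMN_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    return {
        "name": match["name"],
        "deleted": match["deleted"] == "true",
        "size": int(match["size"]),
        "timestamp": int(match["ts"]),
    }


print(row_key)
print(parse_column(LOG_LINE))
```

Under those assumptions the key decodes to the row that was requested, and each of the year/month/day/hour/minute values is stored as 4 bytes.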



Note that a query for "get IndexTest where minute=49" (which also returns no records) results in the following logs:


DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,210 CassandraServer.java (line 728) scan
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1017) restricted ranges for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1104) scan ranges are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@547d6c11 from natebookpro/127.0.1.1
DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 462@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:17] 2011-11-08 10:13:40,215 ResponseVerbHandler.java (line 44) Processing response on a callback from 462@natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@62132898 from natebookpro/127.0.1.1
DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 463@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:18] 2011-11-08 10:13:40,219 ResponseVerbHandler.java (line 44) Processing response on a callback from 463@natebookpro/127.0.1.1





Thanks,

-nate






From: Jake Luciani [mailto:jakers@gmail.com]
Sent: Tuesday, November 08, 2011 8:56 AM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Hi Nate,

Could you try running it with debug logging enabled? It will give more insight into what's going on.

-Jake

On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons <NS...@ften.com> wrote:
This is against a single server, not a cluster.  Replication factor for the keyspace is set to 1, CL is the default for Hector, which I think is QUORUM.

I'm trying to get a simple test together that shows this.  Does anyone know if multiple indexes like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rkalla@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Nate, is this all against a single Cassandra server, or do you have a ring setup? If you do have a ring setup, what is your replication factor set to? Also, what ConsistencyLevel are you writing with when storing the values?

-R
On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <NS...@ften.com> wrote:
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several secondary indexes to try out some options.  Right now I have the following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      -- absolute timestamp for this message, also indexed year/month/day/hour/minute
      -- index these as they are low cardinality
      {column_name:messageTimestamp, validation_class:LongType},
      {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

                ... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these values on a Hector ColumnFamilyUpdater instance and update that way.  Then later I can query from the command line with CQL such as:

                get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works, however at some point queries that I know should return data no longer return any rows.

So for instance, part way through my test (inserting 250K rows), I can query for what should be there and get data back such as the above query, but later that same query returns 0 rows.  Similarly, with fewer clauses in the expression, like this:

                get MyTest where messageYear=2011 and messageMonth=6;

Will also return 0 rows.


???????
Any idea what could be going wrong?  I'm not getting any exceptions in my client during the write, and I don't see anything in the logs (no errors anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on CQL queries with multiple indexed columns is good (does Cassandra intelligently use all available indexes on these queries?)



Thanks,

-nate




--
http://twitter.com/tjake

RE: Secondary index issue, unable to query for records that should be there

Posted by Nate Sammons <NS...@ften.com>.
Note that I had identical behavior using a fresh download of Cassandra 1.0.2 as of today.

Thanks,

-nate


From: Nate Sammons [mailto:NSammons@ften.com]
Sent: Tuesday, November 08, 2011 10:20 AM
To: user@cassandra.apache.org
Subject: RE: Secondary index issue, unable to query for records that should be there

I restarted with logging turned up to DEBUG, and after quite a bit of logging during startup, I re-ran a query:


get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49;


This produced the following in the logs:

DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) scan
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) restricted ranges for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) scan ranges are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@7bc203c from natebookpro/127.0.1.1
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 808@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 ResponseVerbHandler.java (line 44) Processing response on a callback from 808@natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@6a25a21d from natebookpro/127.0.1.1
DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 809@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 ResponseVerbHandler.java (line 44) Processing response on a callback from 809@natebookpro/127.0.1.1



Whereas a direct read of a key using "get IndexTest[2011-1-14-18-49--1];" produced a result, and the following in the logs:

DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraServer.java (line 323) get_slice
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 623) Command/ConsistencyLevel is SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 ReadCallback.java (line 77) Blockfor/repair is 1/true; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 639) reading data locally
DEBUG [ReadStage:37] 2011-11-08 10:11:20,160 StorageProxy.java (line 792) LocalReadRunnable reading SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)
DEBUG [ReadStage:37] 2011-11-08 10:11:20,161 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:37] 2011-11-08 10:11:20,162 SliceQueryFilter.java (line 123) collecting 0 of 1000000: data:false:512@1320769510502017
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 1 of 1000000: day:false:4@1320769510502014
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 2 of 1000000: hour:false:4@1320769510502015
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 3 of 1000000: minute:false:4@1320769510502016
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 4 of 1000000: month:false:4@1320769510502013
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 5 of 1000000: year:false:4@1320769510502012
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,170 StorageProxy.java (line 689) Read: 10 ms.



Note that a query for "get IndexTest where minute=49" (which also returns no records) results in the following logs:


DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,210 CassandraServer.java (line 728) scan
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1017) restricted ranges for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1104) scan ranges are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@547d6c11 from natebookpro/127.0.1.1
DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 462@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:17] 2011-11-08 10:13:40,215 ResponseVerbHandler.java (line 44) Processing response on a callback from 462@natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@62132898 from natebookpro/127.0.1.1
DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 463@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:18] 2011-11-08 10:13:40,219 ResponseVerbHandler.java (line 44) Processing response on a callback from 463@natebookpro/127.0.1.1





Thanks,

-nate






From: Jake Luciani [mailto:jakers@gmail.com]
Sent: Tuesday, November 08, 2011 8:56 AM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Hi Nate,

Could you try running it with debug logging enabled? It will give more insight into what's going on.

-Jake

On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons <NS...@ften.com> wrote:
This is against a single server, not a cluster.  Replication factor for the keyspace is set to 1, CL is the default for Hector, which I think is QUORUM.

I'm trying to get a simple test together that shows this.  Does anyone know if multiple indexes like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rkalla@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Nate, is this all against a single Cassandra server, or do you have a ring setup? If you do have a ring setup, what is your replication factor set to? Also, what ConsistencyLevel are you writing with when storing the values?

-R
On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <NS...@ften.com> wrote:
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several secondary indexes to try out some options.  Right now I have the following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      -- absolute timestamp for this message, also indexed year/month/day/hour/minute
      -- index these as they are low cardinality
      {column_name:messageTimestamp, validation_class:LongType},
      {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

                ... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these values on a Hector ColumnFamilyUpdater instance and update that way.  Then later I can query from the command line with CQL such as:

                get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works, however at some point queries that I know should return data no longer return any rows.

So for instance, part way through my test (inserting 250K rows), I can query for what should be there and get data back such as the above query, but later that same query returns 0 rows.  Similarly, with fewer clauses in the expression, like this:

                get MyTest where messageYear=2011 and messageMonth=6;

Will also return 0 rows.


???????
Any idea what could be going wrong?  I'm not getting any exceptions in my client during the write, and I don't see anything in the logs (no errors anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on CQL queries with multiple indexed columns is good (does Cassandra intelligently use all available indexes on these queries?)



Thanks,

-nate




--
http://twitter.com/tjake

RE: Secondary index issue, unable to query for records that should be there

Posted by Nate Sammons <NS...@ften.com>.
I restarted with logging turned up to DEBUG, and after quite a bit of logging during startup, I re-ran a query:


get IndexTest where year=2011 and month=1 and day=14 and hour=18 and minute=49;


This produced the following in the logs:

DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,823 CassandraServer.java (line 728) scan
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1017) restricted ranges for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,824 StorageProxy.java (line 1104) scan ranges are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,825 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,826 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@7bc203c from natebookpro/127.0.1.1
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:46] 2011-11-08 10:19:21,827 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:46] 2011-11-08 10:19:21,828 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 808@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:21] 2011-11-08 10:19:21,829 ResponseVerbHandler.java (line 44) Processing response on a callback from 808@natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,829 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-3] 2011-11-08 10:19:21,830 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@6a25a21d from natebookpro/127.0.1.1
DEBUG [ReadStage:47] 2011-11-08 10:19:21,831 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:47] 2011-11-08 10:19:21,832 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:47] 2011-11-08 10:19:21,833 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 809@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:22] 2011-11-08 10:19:21,834 ResponseVerbHandler.java (line 44) Processing response on a callback from 809@natebookpro/127.0.1.1



Whereas a direct read of a key using "get IndexTest[2011-1-14-18-49--1];" produced a result, and the following in the logs:

DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,153 CassandraServer.java (line 323) get_slice
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 623) Command/ConsistencyLevel is SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)/ONE
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 ReadCallback.java (line 77) Blockfor/repair is 1/true; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,159 StorageProxy.java (line 639) reading data locally
DEBUG [ReadStage:37] 2011-11-08 10:11:20,160 StorageProxy.java (line 792) LocalReadRunnable reading SliceFromReadCommand(table='Test', key='323031312d312d31342d31382d34392d2d31', column_parent='QueryPath(columnFamilyName='IndexTest', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=1000000)
DEBUG [ReadStage:37] 2011-11-08 10:11:20,161 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:37] 2011-11-08 10:11:20,162 SliceQueryFilter.java (line 123) collecting 0 of 1000000: data:false:512@1320769510502017
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 1 of 1000000: day:false:4@1320769510502014
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 2 of 1000000: hour:false:4@1320769510502015
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 3 of 1000000: minute:false:4@1320769510502016
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 4 of 1000000: month:false:4@1320769510502013
DEBUG [ReadStage:37] 2011-11-08 10:11:20,163 SliceQueryFilter.java (line 123) collecting 5 of 1000000: year:false:4@1320769510502012
DEBUG [pool-2-thread-2] 2011-11-08 10:11:20,170 StorageProxy.java (line 689) Read: 10 ms.
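
For reference, the hex `key` value in the SliceFromReadCommand lines above appears to be nothing more than the bytes of the row key typed into the CLI. The short Python sketch below (not part of the original test; it assumes the key is UTF-8 text, as with the UTF8Type key validation in the MyTest definition) decodes it so the log line can be matched back to the `get` command:

```python
# Hex-encoded row key copied from the SliceFromReadCommand log line above.
hex_key = "323031312d312d31342d31382d34392d2d31"

# Assumption: the key bytes are UTF-8 text, so they decode directly to a string.
row_key = bytes.fromhex(hex_key).decode("utf-8")

print(row_key)
```

The decoded string can then be compared against the key used in the direct read, `2011-1-14-18-49--1`.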



Note that a query for "get IndexTest where minute=49" (which also returns no records) results in the following logs:


DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,210 CassandraServer.java (line 728) scan
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1017) restricted ranges for query [-1,-1] are [[-1,160425280223280959086247334056682279392], (160425280223280959086247334056682279392,-1]]
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,211 StorageProxy.java (line 1104) scan ranges are [-1,160425280223280959086247334056682279392],(160425280223280959086247334056682279392,-1]
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,212 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@547d6c11 from natebookpro/127.0.1.1
DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:42] 2011-11-08 10:13:40,213 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:42] 2011-11-08 10:13:40,214 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 462@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:17] 2011-11-08 10:13:40,215 ResponseVerbHandler.java (line 44) Processing response on a callback from 462@natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 ReadCallback.java (line 77) Blockfor/repair is 1/false; setting up requests to natebookpro/127.0.1.1
DEBUG [pool-2-thread-2] 2011-11-08 10:13:40,216 StorageProxy.java (line 1131) reading org.apache.cassandra.db.IndexScanCommand@62132898 from natebookpro/127.0.1.1
DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 96) Primary scan clause is minute
DEBUG [ReadStage:43] 2011-11-08 10:13:40,217 KeysSearcher.java (line 109) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 151) Scanning index 'IndexTest.minute EQ 49' starting with
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 CollationController.java (line 189) collectAllData
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 KeysSearcher.java (line 163) fetched null
DEBUG [ReadStage:43] 2011-11-08 10:13:40,218 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 463@natebookpro/127.0.1.1
DEBUG [RequestResponseStage:18] 2011-11-08 10:13:40,219 ResponseVerbHandler.java (line 44) Processing response on a callback from 463@natebookpro/127.0.1.1
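
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: in the direct-read log above, entries such as `minute:false:4@...` appear to report a stored value length of 4 bytes, while a variable-length integer encoding would represent 49 in a single byte. If the client and the CLI serialize the same number differently, a lookup on the raw bytes might not match. The Python sketch below only illustrates the two encodings; which one Hector or the CLI actually produces is not shown in these logs.

```python
import struct

value = 49  # the minute value used in the query above

# Fixed-width 4-byte big-endian encoding, as a 32-bit int serializer would write it.
fixed = struct.pack(">i", value)

# Minimal-length big-endian two's-complement encoding for a non-negative value,
# the style used by variable-length integer types.
length = max(1, (value.bit_length() + 8) // 8)
minimal = value.to_bytes(length, "big", signed=True)

print(fixed.hex(), minimal.hex())
print(fixed == minimal)
```

Both byte strings denote the same number, but they are not equal byte for byte, which is the kind of mismatch that could be ruled in or out by inspecting how the values are written.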





Thanks,

-nate






From: Jake Luciani [mailto:jakers@gmail.com]
Sent: Tuesday, November 08, 2011 8:56 AM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Hi Nate,

Could you try running it with debug logging enabled? It will give more insight into what's going on.

-Jake

On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons <NS...@ften.com> wrote:
This is against a single server, not a cluster.  Replication factor for the keyspace is set to 1, CL is the default for Hector, which I think is QUORUM.

I'm trying to get a simple test together that shows this.  Does anyone know if multiple indexes like this are efficient?

Thanks,

-nate


From: Riyad Kalla [mailto:rkalla@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Nate, is this all against a single Cassandra server, or do you have a ring setup? If you do have a ring setup, what is your replication factor set to? Also, what ConsistencyLevel are you writing with when storing the values?

-R
On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <NS...@ften.com> wrote:
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF with several secondary indexes to try out some options.  Right now I have the following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
      -- absolute timestamp for this message, also indexed year/month/day/hour/minute
      -- index these as they are low cardinality
      {column_name:messageTimestamp, validation_class:LongType},
      {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
      {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

                ... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these values on a Hector ColumnFamilyUpdater instance and update that way.  Then later I can query from the command line with CQL such as:

                get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works, however at some point queries that I know should return data no longer return any rows.

So for instance, part way through my test (inserting 250K rows), I can query for what should be there and get data back such as the above query, but later that same query returns 0 rows.  Similarly, with fewer clauses in the expression, like this:

                get MyTest where messageYear=2011 and messageMonth=6;

Will also return 0 rows.


???????
Any idea what could be going wrong?  I'm not getting any exceptions in my client during the write, and I don't see anything in the logs (no errors anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on CQL queries with multiple indexed columns is good (does Cassandra intelligently use all available indexes on these queries?)



Thanks,

-nate




--
http://twitter.com/tjake

Re: Secondary index issue, unable to query for records that should be there

Posted by Jake Luciani <ja...@gmail.com>.
Hi Nate,

Could you try running it with debug logging enabled? It will give more
insight into what's going on.

-Jake


On Tue, Nov 8, 2011 at 3:45 PM, Nate Sammons <NS...@ften.com> wrote:

> This is against a single server, not a cluster.  Replication factor for
> the keyspace is set to 1, CL is the default for Hector, which I think is
> QUORUM.
>
> I’m trying to get a simple test together that shows this.  Does anyone
> know if multiple indexes like this are efficient?
>
> Thanks,
>
> -nate
>
> From: Riyad Kalla [mailto:rkalla@gmail.com]
> Sent: Monday, November 07, 2011 4:31 PM
> To: user@cassandra.apache.org
> Subject: Re: Secondary index issue, unable to query for records that
> should be there
>
> Nate, is this all against a single Cassandra server, or do you have a ring
> setup? If you do have a ring setup, what is your replication factor set to?
> Also, what ConsistencyLevel are you writing with when storing the values?
>
> -R
>
> On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons <NS...@ften.com> wrote:
>
> Hello,
>
> I’m experimenting with Cassandra (DataStax Enterprise 1.0.3), and I’ve got
> a CF with several secondary indexes to try out some options.  Right now I
> have the following to create my CF using the CLI:
>
> create column family MyTest with
>   key_validation_class = UTF8Type
>   and comparator = UTF8Type
>   and column_metadata = [
>       -- absolute timestamp for this message, also indexed year/month/day/hour/minute
>       -- index these as they are low cardinality
>       {column_name:messageTimestamp, validation_class:LongType},
>       {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
>       {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
>       {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
>       {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
>       {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},
>
>                 … other non-indexed columns defined
>
>   ];
>
> So when I insert data, I calculate a year/month/day/hour/minute and set
> these values on a Hector ColumnFamilyUpdater instance and update that way.
> Then later I can query from the command line with CQL such as:
>
>                 get MyTest where messageYear=2011 and messageMonth=6 and messageDay=1 and messageHour=13 and messageMinute=44;
>
> etc.  This generally works, however at some point queries that I know
> should return data no longer return any rows.
>
> So for instance, part way through my test (inserting 250K rows), I can
> query for what should be there and get data back such as the above query,
> but later that same query returns 0 rows.  Similarly, with fewer clauses in
> the expression, like this:
>
>                 get MyTest where messageYear=2011 and messageMonth=6;
>
> Will also return 0 rows.
>
> ???????
>
> Any idea what could be going wrong?  I’m not getting any exceptions in my
> client during the write, and I don’t see anything in the logs (no errors
> anyway).
>
> A second question – is what I’m doing insane?  I’m not sure that
> performance on CQL queries with multiple indexed columns is good (does
> Cassandra intelligently use all available indexes on these queries?)
>
> Thanks,
>
> -nate
>



-- 
http://twitter.com/tjake

RE: Secondary index issue, unable to query for records that should be there

Posted by Nate Sammons <NS...@ften.com>.
This is against a single server, not a cluster.  The replication factor for the keyspace is set to 1, and the CL is the default for Hector, which I think is QUORUM.
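
For reference, a quorum is floor(RF / 2) + 1 replicas, so with a replication factor of 1 a QUORUM read or write needs only the one replica and behaves like ONE here. A minimal Java sketch of that arithmetic follows; the class and method names are illustrative only and are not part of Hector or Cassandra:

```java
public class QuorumSketch {
    // Replicas that must respond for a QUORUM operation: floor(RF / 2) + 1.
    // Hypothetical helper for illustration, not a Hector or Cassandra API.
    public static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1; // integer division rounds down
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.println("RF=" + rf + " quorum=" + quorum(rf));
        }
    }
}
```

On a single node with RF=1, then, the consistency level alone should not explain rows going missing.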

I'm trying to get a simple test together that shows this.  Does anyone know if multiple indexes like this are efficient?
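
On the efficiency question, my understanding of the 1.0-era behaviour (an assumption, not something confirmed in this thread) is that a query with several indexed equality clauses does not intersect all of the indexes. It picks the one index estimated to be most selective, walks it, and checks the remaining clauses against each candidate row. Below is a simplified in-memory Java sketch of that strategy. The class and method names are hypothetical, each key is assumed to be inserted once, and this is not Cassandra's code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexQuerySketch {
    // row key -> (column name -> value)
    private final Map<String, Map<String, Integer>> rows = new HashMap<>();
    // column name -> (value -> keys of rows holding that value)
    private final Map<String, Map<Integer, List<String>>> index = new HashMap<>();

    // Stores a row and indexes every column (assumes each key is inserted once).
    public void insert(String key, Map<String, Integer> columns) {
        rows.put(key, columns);
        for (Map.Entry<String, Integer> c : columns.entrySet()) {
            index.computeIfAbsent(c.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(c.getValue(), v -> new ArrayList<>())
                 .add(key);
        }
    }

    // Answers an AND of equality clauses using only the most selective index.
    public List<String> query(Map<String, Integer> clauses) {
        List<String> candidates = null;
        for (Map.Entry<String, Integer> c : clauses.entrySet()) {
            Map<Integer, List<String>> byValue = index.get(c.getKey());
            List<String> keys = (byValue == null) ? null : byValue.get(c.getValue());
            if (keys == null) {
                return new ArrayList<>(); // no row has this value at all
            }
            if (candidates == null || keys.size() < candidates.size()) {
                candidates = keys; // fewest matches so far
            }
        }
        List<String> result = new ArrayList<>();
        if (candidates == null) {
            return result; // no clauses given
        }
        // Check every clause against each candidate row.
        for (String key : candidates) {
            Map<String, Integer> row = rows.get(key);
            boolean matches = true;
            for (Map.Entry<String, Integer> c : clauses.entrySet()) {
                if (!c.getValue().equals(row.get(c.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                result.add(key);
            }
        }
        return result;
    }
}
```

If that reading is right, the cost of a query like the messageYear/messageMonth one depends mostly on how many rows the single chosen index returns, since every one of them has to be read and filtered.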

Thanks,

-nate


From: Riyad Kalla [mailto:rkalla@gmail.com]
Sent: Monday, November 07, 2011 4:31 PM
To: user@cassandra.apache.org
Subject: Re: Secondary index issue, unable to query for records that should be there

Nate, is this all against a single Cassandra server, or do you have a ring set up? If you do have a ring, what is your replication factor set to? Also, which ConsistencyLevel are you writing with when storing the values?

-R


Re: Secondary index issue, unable to query for records that should be there

Posted by Riyad Kalla <rk...@gmail.com>.
Nate, is this all against a single Cassandra server, or do you have a ring
set up? If you do have a ring, what is your replication factor set to?
Also, which ConsistencyLevel are you writing with when storing the values?

-R
