You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Matthew Allen <ma...@gmail.com> on 2014/07/14 06:09:14 UTC

Multi-column range scans

Hi,

We have a roll-up table that as follows.

CREATE TABLE SKILL_COUNT (
  skill text,
  interval_id bigint,
  skill_level int,
  skill_count int,
  PRIMARY KEY (skill, interval_id, skill_level));

Essentially,
  skill = a names skill i.e. "Complaints"
  interval_id = a rounded epoch time (15 minute intervals)
  skill_level = a number/rating from 1-10
  skill_count = the number of people with the specified skill, with the
specified skill level, logged in at the interval_id

We'd like to run the following query against it

select * from skill_count where skill='Complaints' and interval_id >=
1402359300000 and interval_id < 1402359900000 and skill_level >= 5;

to get a count of people with the relevant skill and level at the
appropriate time.  However I am getting the following message.

Bad Request: PRIMARY KEY part skill_level cannot be restricted (preceding
part interval_id is either not restricted or by a non-EQ relation)

Looking at how the data is stored ...

-------------------
RowKey: Complaints
=> (name=1402359300000:2:, value=, timestamp=1405308260403000)
=> (name=1402359300000:2:skill_count, value=0000000a,
timestamp=1405308260403000)
=> (name=1402359300000:5:, value=, timestamp=1405308260403001)
=> (name=1402359300000:5:skill_count, value=00000014,
timestamp=1405308260403001)
=> (name=1402359300000:8:, value=, timestamp=1405308260419000)
=> (name=1402359300000:8:skill_count, value=0000001e,
timestamp=1405308260419000)
=> (name=1402359300000:10:, value=, timestamp=1405308260419001)
=> (name=1402359300000:10:skill_count, value=00000001,
timestamp=1405308260419001)

Should cassandra be able to allow for an extra level of filtering ? or is
this something that should be performed from within the application.

We have a solution working in Oracle, but would like to store this data in
Cassandra, as all the other data that this solution relies on already sits
within Cassandra.

Appreciate any guidance on this matter.

Matt

Re: Multi-column range scans

Posted by Matthew Allen <ma...@gmail.com>.

Thanks for both your help, greatly appreciated.

We'll proceed down the path of putting the filtering into the application
logic for the time being.

Matt.


On Tue, Jul 15, 2014 at 1:20 AM, DuyHai Doan <do...@gmail.com> wrote:

> Exact Ken, I get bitten again by the semantics of composite tuples.
>
>  This kind of query won't be possible until something like wide row end
> slice predicate is available (
> https://issues.apache.org/jira/browse/CASSANDRA-6167), if it will one day
>
>
>
>
> On Mon, Jul 14, 2014 at 5:02 PM, Ken Hancock <ke...@schange.com>
> wrote:
>
>> I don't think your query is doing what he wants.  Your query will
>> correctly set the starting point, but will also return larger interval_id's
>> but with lower skill_levels:
>>
>> cqlsh:test> select * from skill_count where skill='Complaints' and
>> (interval_id, skill_level) >= (1402359300000, 5);
>>
>>  skill      | interval_id   | skill_level | skill_count
>> ------------+---------------+-------------+-------------
>>  Complaints | 1402359300000 |           5 |          20
>>  Complaints | 1402359300000 |           8 |          30
>>  Complaints | 1402359300000 |          10 |           1
>>  Complaints | 1402359400000 |           2 |          10
>>  Complaints | 1402359400000 |           8 |          30
>>
>> (5 rows)
>>
>> cqlsh:test> select * from skill_count where skill='Complaints' and
>> (interval_id, skill_level) >= (1402359300000, 5) and (interval_id) <
>> (1402359900000);
>>
>>  skill      | interval_id   | skill_level | skill_count
>> ------------+---------------+-------------+-------------
>>  Complaints | 1402359300000 |           5 |          20  <- desired
>>  Complaints | 1402359300000 |           8 |          30  <- desired
>>  Complaints | 1402359300000 |          10 |           1  <- desired
>>  Complaints | 1402359400000 |           2 |          10  <- SKIP
>>  Complaints | 1402359400000 |           8 |          30  <- desired
>>
>> The query results in a discontinuous range slice so isn't supported --
>> Essentially, the client will have to read the entire range and perform
>> client-side filtering.  Whether this is efficient depends on the
>> cardinality of skill_level.
>>
>> I tried playing with the "allow filtering" cql clause, but it would
>> appear from the documentation it's very restrictive...
>>
>>
>>
>>
>>
>> On Mon, Jul 14, 2014 at 7:44 AM, DuyHai Doan <do...@gmail.com>
>> wrote:
>>
>>> or :
>>>
>>>
>>> select * from skill_count where skill='Complaints'
>>> and (interval_id,skill_level) >= (1402359300000,5)
>>> and (interval_id) < (1402359900000)
>>>
>>> Strange enough, when starting using tuple notation you'll need to stick
>>> to it even if there is only one element in the tuple
>>>
>>>
>>> On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan <do...@gmail.com>
>>> wrote:
>>>
>>>> Sorry, I've just checked, the correct query should be:
>>>>
>>>> select * from skill_count where skill='Complaints' and
>>>> (interval_id,skill_level) >= (1402359300000,5) and
>>>> (interval_id,skill_level) < (1402359900000,11)
>>>>
>>>>
>>>> On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan <do...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Mathew
>>>>>
>>>>>  Since Cassandra 2.0.6 it is possible to query over composites:
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>>>>
>>>>> For your example:
>>>>>
>>>>> select * from skill_count where skill='Complaints' and
>>>>> (interval_id,skill_level) >= (1402359300000,5) and interval_id <
>>>>> 1402359900000;
>>>>>
>>>>>
>>>>> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <
>>>>> matthew.j.allen@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a roll-up table that as follows.
>>>>>>
>>>>>> CREATE TABLE SKILL_COUNT (
>>>>>>   skill text,
>>>>>>   interval_id bigint,
>>>>>>   skill_level int,
>>>>>>   skill_count int,
>>>>>>   PRIMARY KEY (skill, interval_id, skill_level));
>>>>>>
>>>>>> Essentially,
>>>>>>   skill = a names skill i.e. "Complaints"
>>>>>>   interval_id = a rounded epoch time (15 minute intervals)
>>>>>>   skill_level = a number/rating from 1-10
>>>>>>   skill_count = the number of people with the specified skill, with
>>>>>> the specified skill level, logged in at the interval_id
>>>>>>
>>>>>> We'd like to run the following query against it
>>>>>>
>>>>>> select * from skill_count where skill='Complaints' and interval_id >=
>>>>>> 1402359300000 and interval_id < 1402359900000 and skill_level >= 5;
>>>>>>
>>>>>> to get a count of people with the relevant skill and level at the
>>>>>> appropriate time.  However I am getting the following message.
>>>>>>
>>>>>> Bad Request: PRIMARY KEY part skill_level cannot be restricted
>>>>>> (preceding part interval_id is either not restricted or by a non-EQ
>>>>>> relation)
>>>>>>
>>>>>> Looking at how the data is stored ...
>>>>>>
>>>>>> -------------------
>>>>>> RowKey: Complaints
>>>>>> => (name=1402359300000:2:, value=, timestamp=1405308260403000)
>>>>>> => (name=1402359300000:2:skill_count, value=0000000a,
>>>>>> timestamp=1405308260403000)
>>>>>> => (name=1402359300000:5:, value=, timestamp=1405308260403001)
>>>>>> => (name=1402359300000:5:skill_count, value=00000014,
>>>>>> timestamp=1405308260403001)
>>>>>> => (name=1402359300000:8:, value=, timestamp=1405308260419000)
>>>>>> => (name=1402359300000:8:skill_count, value=0000001e,
>>>>>> timestamp=1405308260419000)
>>>>>> => (name=1402359300000:10:, value=, timestamp=1405308260419001)
>>>>>> => (name=1402359300000:10:skill_count, value=00000001,
>>>>>> timestamp=1405308260419001)
>>>>>>
>>>>>> Should cassandra be able to allow for an extra level of filtering ?
>>>>>> or is this something that should be performed from within the application.
>>>>>>
>>>>>> We have a solution working in Oracle, but would like to store this
>>>>>> data in Cassandra, as all the other data that this solution relies on
>>>>>> already sits within Cassandra.
>>>>>>
>>>>>> Appreciate any guidance on this matter.
>>>>>>
>>>>>> Matt
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> *Ken Hancock *| System Architect, Advanced Advertising
>> SeaChange International
>> 50 Nagog Park
>> Acton, Massachusetts 01720
>> ken.hancock@schange.com | www.schange.com | NASDAQ:SEAC
>> <http://www.schange.com/en-US/Company/InvestorRelations.aspx>
>> Office: +1 (978) 889-3329 | [image: Google Talk:] ken.hancock@schange.com
>>  | [image: Skype:]hancockks | [image: Yahoo IM:]hancockks [image:
>> LinkedIn] <http://www.linkedin.com/in/kenhancock>
>>
>> [image: SeaChange International]
>>  <http://www.schange.com/>This e-mail and any attachments may contain
>> information which is SeaChange International confidential. The information
>> enclosed is intended only for the addressees herein and may not be copied
>> or forwarded without permission from SeaChange International.
>>
>
>

Re: Multi-column range scans

Posted by DuyHai Doan <do...@gmail.com>.

Exact Ken, I get bitten again by the semantics of composite tuples.

 This kind of query won't be possible until something like wide row end
slice predicate is available (
https://issues.apache.org/jira/browse/CASSANDRA-6167), if it will one day




On Mon, Jul 14, 2014 at 5:02 PM, Ken Hancock <ke...@schange.com>
wrote:

> I don't think your query is doing what he wants.  Your query will
> correctly set the starting point, but will also return larger interval_id's
> but with lower skill_levels:
>
> cqlsh:test> select * from skill_count where skill='Complaints' and
> (interval_id, skill_level) >= (1402359300000, 5);
>
>  skill      | interval_id   | skill_level | skill_count
> ------------+---------------+-------------+-------------
>  Complaints | 1402359300000 |           5 |          20
>  Complaints | 1402359300000 |           8 |          30
>  Complaints | 1402359300000 |          10 |           1
>  Complaints | 1402359400000 |           2 |          10
>  Complaints | 1402359400000 |           8 |          30
>
> (5 rows)
>
> cqlsh:test> select * from skill_count where skill='Complaints' and
> (interval_id, skill_level) >= (1402359300000, 5) and (interval_id) <
> (1402359900000);
>
>  skill      | interval_id   | skill_level | skill_count
> ------------+---------------+-------------+-------------
>  Complaints | 1402359300000 |           5 |          20  <- desired
>  Complaints | 1402359300000 |           8 |          30  <- desired
>  Complaints | 1402359300000 |          10 |           1  <- desired
>  Complaints | 1402359400000 |           2 |          10  <- SKIP
>  Complaints | 1402359400000 |           8 |          30  <- desired
>
> The query results in a discontinuous range slice so isn't supported --
> Essentially, the client will have to read the entire range and perform
> client-side filtering.  Whether this is efficient depends on the
> cardinality of skill_level.
>
> I tried playing with the "allow filtering" cql clause, but it would appear
> from the documentation it's very restrictive...
>
>
>
>
>
> On Mon, Jul 14, 2014 at 7:44 AM, DuyHai Doan <do...@gmail.com> wrote:
>
>> or :
>>
>>
>> select * from skill_count where skill='Complaints'
>> and (interval_id,skill_level) >= (1402359300000,5)
>> and (interval_id) < (1402359900000)
>>
>> Strange enough, when starting using tuple notation you'll need to stick
>> to it even if there is only one element in the tuple
>>
>>
>> On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan <do...@gmail.com>
>> wrote:
>>
>>> Sorry, I've just checked, the correct query should be:
>>>
>>> select * from skill_count where skill='Complaints' and
>>> (interval_id,skill_level) >= (1402359300000,5) and
>>> (interval_id,skill_level) < (1402359900000,11)
>>>
>>>
>>> On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan <do...@gmail.com>
>>> wrote:
>>>
>>>> Hello Mathew
>>>>
>>>>  Since Cassandra 2.0.6 it is possible to query over composites:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>>>
>>>> For your example:
>>>>
>>>> select * from skill_count where skill='Complaints' and
>>>> (interval_id,skill_level) >= (1402359300000,5) and interval_id <
>>>> 1402359900000;
>>>>
>>>>
>>>> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <
>>>> matthew.j.allen@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a roll-up table that as follows.
>>>>>
>>>>> CREATE TABLE SKILL_COUNT (
>>>>>   skill text,
>>>>>   interval_id bigint,
>>>>>   skill_level int,
>>>>>   skill_count int,
>>>>>   PRIMARY KEY (skill, interval_id, skill_level));
>>>>>
>>>>> Essentially,
>>>>>   skill = a names skill i.e. "Complaints"
>>>>>   interval_id = a rounded epoch time (15 minute intervals)
>>>>>   skill_level = a number/rating from 1-10
>>>>>   skill_count = the number of people with the specified skill, with
>>>>> the specified skill level, logged in at the interval_id
>>>>>
>>>>> We'd like to run the following query against it
>>>>>
>>>>> select * from skill_count where skill='Complaints' and interval_id >=
>>>>> 1402359300000 and interval_id < 1402359900000 and skill_level >= 5;
>>>>>
>>>>> to get a count of people with the relevant skill and level at the
>>>>> appropriate time.  However I am getting the following message.
>>>>>
>>>>> Bad Request: PRIMARY KEY part skill_level cannot be restricted
>>>>> (preceding part interval_id is either not restricted or by a non-EQ
>>>>> relation)
>>>>>
>>>>> Looking at how the data is stored ...
>>>>>
>>>>> -------------------
>>>>> RowKey: Complaints
>>>>> => (name=1402359300000:2:, value=, timestamp=1405308260403000)
>>>>> => (name=1402359300000:2:skill_count, value=0000000a,
>>>>> timestamp=1405308260403000)
>>>>> => (name=1402359300000:5:, value=, timestamp=1405308260403001)
>>>>> => (name=1402359300000:5:skill_count, value=00000014,
>>>>> timestamp=1405308260403001)
>>>>> => (name=1402359300000:8:, value=, timestamp=1405308260419000)
>>>>> => (name=1402359300000:8:skill_count, value=0000001e,
>>>>> timestamp=1405308260419000)
>>>>> => (name=1402359300000:10:, value=, timestamp=1405308260419001)
>>>>> => (name=1402359300000:10:skill_count, value=00000001,
>>>>> timestamp=1405308260419001)
>>>>>
>>>>> Should cassandra be able to allow for an extra level of filtering ? or
>>>>> is this something that should be performed from within the application.
>>>>>
>>>>> We have a solution working in Oracle, but would like to store this
>>>>> data in Cassandra, as all the other data that this solution relies on
>>>>> already sits within Cassandra.
>>>>>
>>>>> Appreciate any guidance on this matter.
>>>>>
>>>>> Matt
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> *Ken Hancock *| System Architect, Advanced Advertising
> SeaChange International
> 50 Nagog Park
> Acton, Massachusetts 01720
> ken.hancock@schange.com | www.schange.com | NASDAQ:SEAC
> <http://www.schange.com/en-US/Company/InvestorRelations.aspx>
> Office: +1 (978) 889-3329 | [image: Google Talk:] ken.hancock@schange.com
>  | [image: Skype:]hancockks | [image: Yahoo IM:]hancockks [image:
> LinkedIn] <http://www.linkedin.com/in/kenhancock>
>
> [image: SeaChange International]
>  <http://www.schange.com/>This e-mail and any attachments may contain
> information which is SeaChange International confidential. The information
> enclosed is intended only for the addressees herein and may not be copied
> or forwarded without permission from SeaChange International.
>

Re: Multi-column range scans

Posted by Ken Hancock <ke...@schange.com>.

I don't think your query is doing what he wants.  Your query will correctly
set the starting point, but will also return larger interval_id's but with
lower skill_levels:

cqlsh:test> select * from skill_count where skill='Complaints' and
(interval_id, skill_level) >= (1402359300000, 5);

 skill      | interval_id   | skill_level | skill_count
------------+---------------+-------------+-------------
 Complaints | 1402359300000 |           5 |          20
 Complaints | 1402359300000 |           8 |          30
 Complaints | 1402359300000 |          10 |           1
 Complaints | 1402359400000 |           2 |          10
 Complaints | 1402359400000 |           8 |          30

(5 rows)

cqlsh:test> select * from skill_count where skill='Complaints' and
(interval_id, skill_level) >= (1402359300000, 5) and (interval_id) <
(1402359900000);

 skill      | interval_id   | skill_level | skill_count
------------+---------------+-------------+-------------
 Complaints | 1402359300000 |           5 |          20  <- desired
 Complaints | 1402359300000 |           8 |          30  <- desired
 Complaints | 1402359300000 |          10 |           1  <- desired
 Complaints | 1402359400000 |           2 |          10  <- SKIP
 Complaints | 1402359400000 |           8 |          30  <- desired

The query results in a discontinuous range slice so isn't supported --
Essentially, the client will have to read the entire range and perform
client-side filtering.  Whether this is efficient depends on the
cardinality of skill_level.

I tried playing with the "allow filtering" cql clause, but it would appear
from the documentation it's very restrictive...





On Mon, Jul 14, 2014 at 7:44 AM, DuyHai Doan <do...@gmail.com> wrote:

> or :
>
>
> select * from skill_count where skill='Complaints'
> and (interval_id,skill_level) >= (1402359300000,5)
> and (interval_id) < (1402359900000)
>
> Strange enough, when starting using tuple notation you'll need to stick to
> it even if there is only one element in the tuple
>
>
> On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan <do...@gmail.com> wrote:
>
>> Sorry, I've just checked, the correct query should be:
>>
>> select * from skill_count where skill='Complaints' and
>> (interval_id,skill_level) >= (1402359300000,5) and
>> (interval_id,skill_level) < (1402359900000,11)
>>
>>
>> On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan <do...@gmail.com>
>> wrote:
>>
>>> Hello Mathew
>>>
>>>  Since Cassandra 2.0.6 it is possible to query over composites:
>>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>>
>>> For your example:
>>>
>>> select * from skill_count where skill='Complaints' and
>>> (interval_id,skill_level) >= (1402359300000,5) and interval_id <
>>> 1402359900000;
>>>
>>>
>>> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <
>>> matthew.j.allen@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have a roll-up table that as follows.
>>>>
>>>> CREATE TABLE SKILL_COUNT (
>>>>   skill text,
>>>>   interval_id bigint,
>>>>   skill_level int,
>>>>   skill_count int,
>>>>   PRIMARY KEY (skill, interval_id, skill_level));
>>>>
>>>> Essentially,
>>>>   skill = a names skill i.e. "Complaints"
>>>>   interval_id = a rounded epoch time (15 minute intervals)
>>>>   skill_level = a number/rating from 1-10
>>>>   skill_count = the number of people with the specified skill, with the
>>>> specified skill level, logged in at the interval_id
>>>>
>>>> We'd like to run the following query against it
>>>>
>>>> select * from skill_count where skill='Complaints' and interval_id >=
>>>> 1402359300000 and interval_id < 1402359900000 and skill_level >= 5;
>>>>
>>>> to get a count of people with the relevant skill and level at the
>>>> appropriate time.  However I am getting the following message.
>>>>
>>>> Bad Request: PRIMARY KEY part skill_level cannot be restricted
>>>> (preceding part interval_id is either not restricted or by a non-EQ
>>>> relation)
>>>>
>>>> Looking at how the data is stored ...
>>>>
>>>> -------------------
>>>> RowKey: Complaints
>>>> => (name=1402359300000:2:, value=, timestamp=1405308260403000)
>>>> => (name=1402359300000:2:skill_count, value=0000000a,
>>>> timestamp=1405308260403000)
>>>> => (name=1402359300000:5:, value=, timestamp=1405308260403001)
>>>> => (name=1402359300000:5:skill_count, value=00000014,
>>>> timestamp=1405308260403001)
>>>> => (name=1402359300000:8:, value=, timestamp=1405308260419000)
>>>> => (name=1402359300000:8:skill_count, value=0000001e,
>>>> timestamp=1405308260419000)
>>>> => (name=1402359300000:10:, value=, timestamp=1405308260419001)
>>>> => (name=1402359300000:10:skill_count, value=00000001,
>>>> timestamp=1405308260419001)
>>>>
>>>> Should cassandra be able to allow for an extra level of filtering ? or
>>>> is this something that should be performed from within the application.
>>>>
>>>> We have a solution working in Oracle, but would like to store this data
>>>> in Cassandra, as all the other data that this solution relies on already
>>>> sits within Cassandra.
>>>>
>>>> Appreciate any guidance on this matter.
>>>>
>>>> Matt
>>>>
>>>
>>>
>>
>


-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hancock@schange.com | www.schange.com | NASDAQ:SEAC
<http://www.schange.com/en-US/Company/InvestorRelations.aspx>
Office: +1 (978) 889-3329 | [image: Google Talk:]
ken.hancock@schange.com | [image:
Skype:]hancockks | [image: Yahoo IM:]hancockks [image: LinkedIn]
<http://www.linkedin.com/in/kenhancock>

[image: SeaChange International]
 <http://www.schange.com/>This e-mail and any attachments may contain
information which is SeaChange International confidential. The information
enclosed is intended only for the addressees herein and may not be copied
or forwarded without permission from SeaChange International.

Re: Multi-column range scans

Posted by DuyHai Doan <do...@gmail.com>.

or :

select * from skill_count where skill='Complaints'
and (interval_id,skill_level) >= (1402359300000,5)
and (interval_id) < (1402359900000)

Strange enough, when starting using tuple notation you'll need to stick to
it even if there is only one element in the tuple


On Mon, Jul 14, 2014 at 1:40 PM, DuyHai Doan <do...@gmail.com> wrote:

> Sorry, I've just checked, the correct query should be:
>
> select * from skill_count where skill='Complaints' and
> (interval_id,skill_level) >= (1402359300000,5) and
> (interval_id,skill_level) < (1402359900000,11)
>
>
> On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan <do...@gmail.com> wrote:
>
>> Hello Mathew
>>
>>  Since Cassandra 2.0.6 it is possible to query over composites:
>> https://issues.apache.org/jira/browse/CASSANDRA-4851
>>
>> For your example:
>>
>> select * from skill_count where skill='Complaints' and
>> (interval_id,skill_level) >= (1402359300000,5) and interval_id <
>> 1402359900000;
>>
>>
>> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <matthew.j.allen@gmail.com
>> > wrote:
>>
>>> Hi,
>>>
>>> We have a roll-up table that as follows.
>>>
>>> CREATE TABLE SKILL_COUNT (
>>>   skill text,
>>>   interval_id bigint,
>>>   skill_level int,
>>>   skill_count int,
>>>   PRIMARY KEY (skill, interval_id, skill_level));
>>>
>>> Essentially,
>>>   skill = a names skill i.e. "Complaints"
>>>   interval_id = a rounded epoch time (15 minute intervals)
>>>   skill_level = a number/rating from 1-10
>>>   skill_count = the number of people with the specified skill, with the
>>> specified skill level, logged in at the interval_id
>>>
>>> We'd like to run the following query against it
>>>
>>> select * from skill_count where skill='Complaints' and interval_id >=
>>> 1402359300000 and interval_id < 1402359900000 and skill_level >= 5;
>>>
>>> to get a count of people with the relevant skill and level at the
>>> appropriate time.  However I am getting the following message.
>>>
>>> Bad Request: PRIMARY KEY part skill_level cannot be restricted
>>> (preceding part interval_id is either not restricted or by a non-EQ
>>> relation)
>>>
>>> Looking at how the data is stored ...
>>>
>>> -------------------
>>> RowKey: Complaints
>>> => (name=1402359300000:2:, value=, timestamp=1405308260403000)
>>> => (name=1402359300000:2:skill_count, value=0000000a,
>>> timestamp=1405308260403000)
>>> => (name=1402359300000:5:, value=, timestamp=1405308260403001)
>>> => (name=1402359300000:5:skill_count, value=00000014,
>>> timestamp=1405308260403001)
>>> => (name=1402359300000:8:, value=, timestamp=1405308260419000)
>>> => (name=1402359300000:8:skill_count, value=0000001e,
>>> timestamp=1405308260419000)
>>> => (name=1402359300000:10:, value=, timestamp=1405308260419001)
>>> => (name=1402359300000:10:skill_count, value=00000001,
>>> timestamp=1405308260419001)
>>>
>>> Should cassandra be able to allow for an extra level of filtering ? or
>>> is this something that should be performed from within the application.
>>>
>>> We have a solution working in Oracle, but would like to store this data
>>> in Cassandra, as all the other data that this solution relies on already
>>> sits within Cassandra.
>>>
>>> Appreciate any guidance on this matter.
>>>
>>> Matt
>>>
>>
>>
>

Re: Multi-column range scans

Posted by DuyHai Doan <do...@gmail.com>.

Sorry, I've just checked, the correct query should be:

select * from skill_count where skill='Complaints' and
(interval_id,skill_level) >= (1402359300000,5) and
(interval_id,skill_level) < (1402359900000,11)


On Mon, Jul 14, 2014 at 9:45 AM, DuyHai Doan <do...@gmail.com> wrote:

> Hello Mathew
>
>  Since Cassandra 2.0.6 it is possible to query over composites:
> https://issues.apache.org/jira/browse/CASSANDRA-4851
>
> For your example:
>
> select * from skill_count where skill='Complaints' and
> (interval_id,skill_level) >= (1402359300000,5) and interval_id <
> 1402359900000;
>
>
> On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <ma...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We have a roll-up table that as follows.
>>
>> CREATE TABLE SKILL_COUNT (
>>   skill text,
>>   interval_id bigint,
>>   skill_level int,
>>   skill_count int,
>>   PRIMARY KEY (skill, interval_id, skill_level));
>>
>> Essentially,
>>   skill = a names skill i.e. "Complaints"
>>   interval_id = a rounded epoch time (15 minute intervals)
>>   skill_level = a number/rating from 1-10
>>   skill_count = the number of people with the specified skill, with the
>> specified skill level, logged in at the interval_id
>>
>> We'd like to run the following query against it
>>
>> select * from skill_count where skill='Complaints' and interval_id >=
>> 1402359300000 and interval_id < 1402359900000 and skill_level >= 5;
>>
>> to get a count of people with the relevant skill and level at the
>> appropriate time.  However I am getting the following message.
>>
>> Bad Request: PRIMARY KEY part skill_level cannot be restricted (preceding
>> part interval_id is either not restricted or by a non-EQ relation)
>>
>> Looking at how the data is stored ...
>>
>> -------------------
>> RowKey: Complaints
>> => (name=1402359300000:2:, value=, timestamp=1405308260403000)
>> => (name=1402359300000:2:skill_count, value=0000000a,
>> timestamp=1405308260403000)
>> => (name=1402359300000:5:, value=, timestamp=1405308260403001)
>> => (name=1402359300000:5:skill_count, value=00000014,
>> timestamp=1405308260403001)
>> => (name=1402359300000:8:, value=, timestamp=1405308260419000)
>> => (name=1402359300000:8:skill_count, value=0000001e,
>> timestamp=1405308260419000)
>> => (name=1402359300000:10:, value=, timestamp=1405308260419001)
>> => (name=1402359300000:10:skill_count, value=00000001,
>> timestamp=1405308260419001)
>>
>> Should cassandra be able to allow for an extra level of filtering ? or is
>> this something that should be performed from within the application.
>>
>> We have a solution working in Oracle, but would like to store this data
>> in Cassandra, as all the other data that this solution relies on already
>> sits within Cassandra.
>>
>> Appreciate any guidance on this matter.
>>
>> Matt
>>
>
>

Re: Multi-column range scans

Posted by DuyHai Doan <do...@gmail.com>.

Hello Mathew

 Since Cassandra 2.0.6 it is possible to query over composites:
https://issues.apache.org/jira/browse/CASSANDRA-4851

For your example:

select * from skill_count where skill='Complaints' and
(interval_id,skill_level) >= (1402359300000,5) and interval_id <
1402359900000;


On Mon, Jul 14, 2014 at 6:09 AM, Matthew Allen <ma...@gmail.com>
wrote:

> Hi,
>
> We have a roll-up table that as follows.
>
> CREATE TABLE SKILL_COUNT (
>   skill text,
>   interval_id bigint,
>   skill_level int,
>   skill_count int,
>   PRIMARY KEY (skill, interval_id, skill_level));
>
> Essentially,
>   skill = a names skill i.e. "Complaints"
>   interval_id = a rounded epoch time (15 minute intervals)
>   skill_level = a number/rating from 1-10
>   skill_count = the number of people with the specified skill, with the
> specified skill level, logged in at the interval_id
>
> We'd like to run the following query against it
>
> select * from skill_count where skill='Complaints' and interval_id >=
> 1402359300000 and interval_id < 1402359900000 and skill_level >= 5;
>
> to get a count of people with the relevant skill and level at the
> appropriate time.  However I am getting the following message.
>
> Bad Request: PRIMARY KEY part skill_level cannot be restricted (preceding
> part interval_id is either not restricted or by a non-EQ relation)
>
> Looking at how the data is stored ...
>
> -------------------
> RowKey: Complaints
> => (name=1402359300000:2:, value=, timestamp=1405308260403000)
> => (name=1402359300000:2:skill_count, value=0000000a,
> timestamp=1405308260403000)
> => (name=1402359300000:5:, value=, timestamp=1405308260403001)
> => (name=1402359300000:5:skill_count, value=00000014,
> timestamp=1405308260403001)
> => (name=1402359300000:8:, value=, timestamp=1405308260419000)
> => (name=1402359300000:8:skill_count, value=0000001e,
> timestamp=1405308260419000)
> => (name=1402359300000:10:, value=, timestamp=1405308260419001)
> => (name=1402359300000:10:skill_count, value=00000001,
> timestamp=1405308260419001)
>
> Should cassandra be able to allow for an extra level of filtering ? or is
> this something that should be performed from within the application.
>
> We have a solution working in Oracle, but would like to store this data in
> Cassandra, as all the other data that this solution relies on already sits
> within Cassandra.
>
> Appreciate any guidance on this matter.
>
> Matt
>