You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Subodh Nijsure <su...@gmail.com> on 2014/09/01 01:43:16 UTC

Help with select IN query in cassandra

I have following database schema

CREATE TABLE sensor_info_table (
  asset_id text,
  event_time timestamp,
  "timestamp" timeuuid,
  sensor_reading map<text, text>,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY ((asset_id), event_time, "timestamp")
);

CREATE INDEX event_time_index ON sensor_info_table (event_time);

CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");

Now I am able to insert the data into this table, however I am unable
to do following query where I want to select items with specific
timeuuid values.

It gives me following error.

SELECT * from mydb.sensor_info_table where timestamp IN (
bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
bf4521ac-3166-11e4-87a3-b888e30f5d17) ;

Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
(preceding column "event_time" is either not restricted or by a non-EQ
relation)

What do I have to do to make this work?

For what its worth I am using django for my front end development and
I am using "timestamp timeuuid" field as unique indentifier to
reference specific sensor reading from django framework -- since
cassandra doesn't have way to generate unique id upon insert (like
old-style rdms's auto-fields).


Below is software version info.

show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
Thrift protocol 19.39.0]

I really don't understand what the error message preceeding column
"event_time" is either not restricted or by no-EQ relation?

-Subodh Nijsure

Re: Help with select IN query in cassandra

Posted by Subodh Nijsure <su...@gmail.com>.
Thanks Michael I will certainly go with this approach for now.

-Subodh

On Mon, Sep 1, 2014 at 6:33 AM, Laing, Michael
<mi...@nytimes.com> wrote:
> This should work for your query requirements - 2 tables w same info because
> disk is cheap and writes are fast so optimize for reads:
>
> CREATE TABLE sensor_asset (
>   asset_id text,
>   event_time timestamp,
>   tuuid timeuuid,
>   sensor_reading map<text, text>,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY ((asset_id), event_time)
> );
>
> CREATE TABLE sensor_tuuid (
>   asset_id text,
>   event_time timestamp,
>   tuuid timeuuid,
>   sensor_reading map<text, text>,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY (tuuid)
> );
>
> 1. Give me all sensor data for an asset:
>
> select * from sensor_asset where asset_id = <asset>;
>
> 2. Give me sensor data that matches a set of timeuuids:
>
> select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);
>
> 3. Give me all sensor data for an asset collected after | before | between
> event_time(s):
>
> select * from sensor_asset where asset_id = <asset> and event_time > <ts1>;
> select * from sensor_asset where asset_id = <asset> and event_time < <ts1>;
> select * from sensor_asset where asset_id = <asset> and event_time < <ts1>
> and event_time > <ts2>;
>
> ***
>
> Many people (not me) handle sensor data, so there may be better overall
> approaches considering volumes, deletion, compaction etc.
>
> But the above is simple and should make your current approach workable as
> you iterate toward a complete solution.
>
> Cheers,
> ml
>
>
>
> On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure <su...@gmail.com>
> wrote:
>>
>> Thanks for your help Michael.
>>
>> If specifying asset_id would help I can construct queries that can
>> include asset_id
>>
>> So I have been "playing" around with PRIMARY KEY definition and
>> following table definition
>>
>> CREATE TABLE sensor_info_table (
>>   asset_id text,
>>   event_time timestamp,
>>   "timestamp" timeuuid,
>>   sensor_reading map<text, text>,
>>   sensor_serial_number text,
>>   sensor_type int,
>>   PRIMARY KEY ((asset_id, "timestamp"), event_time)
>> );
>>
>> It does what I want to do, and I removed the index for timestamp item
>> since now it is part of primary key and thus my query like this works.
>>
>> SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
>> timestamp IN (
>> 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
>> );
>>
>> But now this doesn't work it give
>>
>> SELECT * from sensor_info_table where  asset_id='3' ;
>>
>> Bad Request: Partition key part timestamp must be restricted since
>> preceding part is
>>
>> I am keeping index on event_time as I sometime need to query something
>> "give me all data since time x" i.e. something like this works.
>>
>>  SELECT * from sensor_info_table where  event_time > '2014-08-31
>> 16:54:02-0700' ALLOW FILTERING;
>>
>> However if I do this things then this don't work:
>>
>> SELECT * from sensor_info_table where  asset_id='3' AND event_time >
>> '2014-08-31 16:54:02-0700';
>>
>> Bad Request: Partition key part timestamp must be restricted since
>> preceding part is
>>
>> Also  I am not conformable with fact that I need to specify ALLOW
>> FILTERING.
>>
>> I guess cassandra schema design task asks designer to write down
>> queries before designing schema.
>>
>> For the above table definition I want to do following queries:
>>
>> - Give me all sensor data for given asset.
>> - Give me sensor data that matches given set of timeuuids
>> - Give me all sendor data for a given asset, that were collected after
>> | before | between  certain event_time.
>>
>> Given these query criteria how should  I construct my schema? One
>> thought has occurred to me is make three tables with each item
>> asset_id , event_time, timeuuid as primary keys and depending on type
>> of query choose the table to do query upon. That seems like a waste of
>> resources (disk, cpu ), also increasing insert times(!) but thats the
>> way things need to happen in cassandra world its okay. ( I am
>> two-three weeks into learning about cassandra).
>>
>> -Subodh
>>
>> On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael
>> <mi...@nytimes.com> wrote:
>> > Oh it must be late - I missed the fact that you didn't want to specify
>> > asset_id. The above queries will still work but you have to use 'allow
>> > filtering' - generally not a good idea. I'll look again in the morning.
>> >
>> >
>> > On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael
>> > <mi...@nytimes.com>
>> > wrote:
>> >>
>> >> Hmm. Because the clustering key is (event_time, "timestamp"),
>> >> event_time
>> >> must be specified as well - hopefully that info is available to the ux.
>> >>
>> >> Unfortunately you will then hit another problem with your query: you
>> >> are
>> >> selecting a collection field... this will not work with IN on
>> >> "timestamp".
>> >>
>> >> So you could select all the "timestamp"s for an asset_id/event_time:
>> >>>
>> >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>> >>> 1231234;
>> >>
>> >>
>> >> Or you could apply a range of "timestamp"s:
>> >>>
>> >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>> >>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
>> >>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
>> >>
>> >>
>> >> BTW the secondary indices are not a good idea: high cardinality and of
>> >> no
>> >> use in this query that I can see.
>> >>
>> >> ml
>> >>
>> >>
>> >> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure
>> >> <su...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Not really event time stamp is created by the sensor when it reads
>> >>> data
>> >>> and  timestamp is something server creates when inserting data into
>> >>> cassandra db.  At later point in time my django ux allows users to
>> >>> browse
>> >>> this data and reference interesting data points via the timestamp
>> >>> field. The
>> >>> timestamp field is my bridge between Sal and nosql world.
>> >>>
>> >>> Subodh
>> >>>
>> >>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
>> >>> wrote:
>> >>>>
>> >>>> Are event_time and timestamp essentially representing the same
>> >>>> datetime?
>> >>>>
>> >>>> On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> I have following database schema
>> >>>>>
>> >>>>> CREATE TABLE sensor_info_table (
>> >>>>>   asset_id text,
>> >>>>>   event_time timestamp,
>> >>>>>   "timestamp" timeuuid,
>> >>>>>   sensor_reading map<text, text>,
>> >>>>>   sensor_serial_number text,
>> >>>>>   sensor_type int,
>> >>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>> >>>>> );
>> >>>>>
>> >>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>> >>>>>
>> >>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>> >>>>>
>> >>>>> Now I am able to insert the data into this table, however I am
>> >>>>> unable
>> >>>>> to do following query where I want to select items with specific
>> >>>>> timeuuid values.
>> >>>>>
>> >>>>> It gives me following error.
>> >>>>>
>> >>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>> >>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>> >>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>> >>>>>
>> >>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>> >>>>> (preceding column "event_time" is either not restricted or by a
>> >>>>> non-EQ
>> >>>>> relation)
>> >>>>>
>> >>>>> What do I have to do to make this work?
>> >>>>>
>> >>>>> For what its worth I am using django for my front end development
>> >>>>> and
>> >>>>> I am using "timestamp timeuuid" field as unique indentifier to
>> >>>>> reference specific sensor reading from django framework -- since
>> >>>>> cassandra doesn't have way to generate unique id upon insert (like
>> >>>>> old-style rdms's auto-fields).
>> >>>>>
>> >>>>>
>> >>>>> Below is software version info.
>> >>>>>
>> >>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>> >>>>> Thrift protocol 19.39.0]
>> >>>>>
>> >>>>> I really don't understand what the error message preceeding column
>> >>>>> "event_time" is either not restricted or by no-EQ relation?
>> >>>>>
>> >>>>> -Subodh Nijsure
>> >>
>> >>
>> >
>
>

Re: Help with select IN query in cassandra

Posted by Subodh Nijsure <su...@gmail.com>.
I am guessing OP => Original Poster. I had not asked about deletion,
but Jack thanks for your suggestion for overall handling of deletion.
For the type of sensor data I am collecting we really don't want to
delete the data as we want to figure out data trend over real long
term ( 1, 2 years).

BTW - I found a good link that describes behavior of where clause in
cassandra, just posting that link in case someone stumbles upon this
thread in the future

http://mechanics.flite.com/blog/2013/11/05/breaking-down-the-cql-where-clause/

-Subodh

On Mon, Sep 1, 2014 at 8:56 AM, Jack Krupansky <ja...@basetechnology.com> wrote:
> I did see a reference to deletions: “overall approaches considering volumes,
> deletion, compaction etc.” Did I merely misunderstand the reference? That’s
> all I was responding to... sorry if my misunderstanding added any confusion!
>
> -- Jack Krupansky
>
> From: Laing, Michael
> Sent: Monday, September 1, 2014 11:34 AM
> To: user@cassandra.apache.org
> Subject: Re: Help with select IN query in cassandra
>
> Did the OP propose that?
>
>
> On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky <ja...@basetechnology.com>
> wrote:
>>
>> One comment on deletions – aren’t deletions kind of an anti-pattern for
>> modern data processing, such as sensor data, time series data, and social
>> media? I mean, isn’t it usually better to return a full history of the data,
>> with some aging scheme, and manage the tracking of which values are
>> “current” (or “recent”)? Shouldn’t we be looking for and promoting “write
>> once” approaches as a much stronger preference/pattern? Or maybe I should
>> say “write once and bulk delete on aging” rather than the exercise in
>> futility of doing a massive number of deletes and updates in place?
>>
>> -- Jack Krupansky
>>
>> From: Laing, Michael
>> Sent: Monday, September 1, 2014 9:33 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Help with select IN query in cassandra
>>
>> This should work for your query requirements - 2 tables w same info
>> because disk is cheap and writes are fast so optimize for reads:
>>
>> CREATE TABLE sensor_asset (
>>   asset_id text,
>>   event_time timestamp,
>>   tuuid timeuuid,
>>   sensor_reading map<text, text>,
>>   sensor_serial_number text,
>>   sensor_type int,
>>   PRIMARY KEY ((asset_id), event_time)
>> );
>>
>> CREATE TABLE sensor_tuuid (
>>   asset_id text,
>>   event_time timestamp,
>>   tuuid timeuuid,
>>   sensor_reading map<text, text>,
>>   sensor_serial_number text,
>>   sensor_type int,
>>   PRIMARY KEY (tuuid)
>> );
>>
>> 1. Give me all sensor data for an asset:
>>
>> select * from sensor_asset where asset_id = <asset>;
>>
>> 2. Give me sensor data that matches a set of timeuuids:
>>
>> select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);
>>
>> 3. Give me all sensor data for an asset collected after | before | between
>> event_time(s):
>>
>> select * from sensor_asset where asset_id = <asset> and event_time >
>> <ts1>;
>> select * from sensor_asset where asset_id = <asset> and event_time <
>> <ts1>;
>> select * from sensor_asset where asset_id = <asset> and event_time < <ts1>
>> and event_time > <ts2>;
>>
>> ***
>>
>> Many people (not me) handle sensor data, so there may be better overall
>> approaches considering volumes, deletion, compaction etc.
>>
>> But the above is simple and should make your current approach workable as
>> you iterate toward a complete solution.
>>
>> Cheers,
>> ml
>>
>>
>>
>> On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure
>> <su...@gmail.com> wrote:
>>>
>>> Thanks for your help Michael.
>>>
>>> If specifying asset_id would help I can construct queries that can
>>> include asset_id
>>>
>>> So I have been "playing" around with PRIMARY KEY definition and
>>> following table definition
>>>
>>> CREATE TABLE sensor_info_table (
>>>   asset_id text,
>>>   event_time timestamp,
>>>   "timestamp" timeuuid,
>>>   sensor_reading map<text, text>,
>>>   sensor_serial_number text,
>>>   sensor_type int,
>>>   PRIMARY KEY ((asset_id, "timestamp"), event_time)
>>> );
>>>
>>> It does what I want to do, and I removed the index for timestamp item
>>> since now it is part of primary key and thus my query like this works.
>>>
>>> SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
>>> timestamp IN (
>>> 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
>>> );
>>>
>>> But now this doesn't work it give
>>>
>>> SELECT * from sensor_info_table where  asset_id='3' ;
>>>
>>> Bad Request: Partition key part timestamp must be restricted since
>>> preceding part is
>>>
>>> I am keeping index on event_time as I sometime need to query something
>>> "give me all data since time x" i.e. something like this works.
>>>
>>> SELECT * from sensor_info_table where  event_time > '2014-08-31
>>> 16:54:02-0700' ALLOW FILTERING;
>>>
>>> However if I do this things then this don't work:
>>>
>>> SELECT * from sensor_info_table where  asset_id='3' AND event_time >
>>> '2014-08-31 16:54:02-0700';
>>>
>>> Bad Request: Partition key part timestamp must be restricted since
>>> preceding part is
>>>
>>> Also  I am not conformable with fact that I need to specify ALLOW
>>> FILTERING.
>>>
>>> I guess cassandra schema design task asks designer to write down
>>> queries before designing schema.
>>>
>>> For the above table definition I want to do following queries:
>>>
>>> - Give me all sensor data for given asset.
>>> - Give me sensor data that matches given set of timeuuids
>>> - Give me all sendor data for a given asset, that were collected after
>>> | before | between  certain event_time.
>>>
>>> Given these query criteria how should  I construct my schema? One
>>> thought has occurred to me is make three tables with each item
>>> asset_id , event_time, timeuuid as primary keys and depending on type
>>> of query choose the table to do query upon. That seems like a waste of
>>> resources (disk, cpu ), also increasing insert times(!) but thats the
>>> way things need to happen in cassandra world its okay. ( I am
>>> two-three weeks into learning about cassandra).
>>>
>>> -Subodh
>>>
>>> On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael
>>> <mi...@nytimes.com> wrote:
>>> > Oh it must be late - I missed the fact that you didn't want to specify
>>> > asset_id. The above queries will still work but you have to use 'allow
>>> > filtering' - generally not a good idea. I'll look again in the morning.
>>> >
>>> >
>>> > On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael
>>> > <mi...@nytimes.com>
>>> > wrote:
>>> >>
>>> >> Hmm. Because the clustering key is (event_time, "timestamp"),
>>> >> event_time
>>> >> must be specified as well - hopefully that info is available to the
>>> >> ux.
>>> >>
>>> >> Unfortunately you will then hit another problem with your query: you
>>> >> are
>>> >> selecting a collection field... this will not work with IN on
>>> >> "timestamp".
>>> >>
>>> >> So you could select all the "timestamp"s for an asset_id/event_time:
>>> >>>
>>> >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>>> >>> 1231234;
>>> >>
>>> >>
>>> >> Or you could apply a range of "timestamp"s:
>>> >>>
>>> >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>>> >>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
>>> >>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
>>> >>
>>> >>
>>> >> BTW the secondary indices are not a good idea: high cardinality and of
>>> >> no
>>> >> use in this query that I can see.
>>> >>
>>> >> ml
>>> >>
>>> >>
>>> >> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure
>>> >> <su...@gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Not really event time stamp is created by the sensor when it reads
>>> >>> data
>>> >>> and  timestamp is something server creates when inserting data into
>>> >>> cassandra db.  At later point in time my django ux allows users to
>>> >>> browse
>>> >>> this data and reference interesting data points via the timestamp
>>> >>> field. The
>>> >>> timestamp field is my bridge between Sal and nosql world.
>>> >>>
>>> >>> Subodh
>>> >>>
>>> >>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Are event_time and timestamp essentially representing the same
>>> >>>> datetime?
>>> >>>>
>>> >>>> On Sunday, August 31, 2014, Subodh Nijsure
>>> >>>> <su...@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> I have following database schema
>>> >>>>>
>>> >>>>> CREATE TABLE sensor_info_table (
>>> >>>>>   asset_id text,
>>> >>>>>   event_time timestamp,
>>> >>>>>   "timestamp" timeuuid,
>>> >>>>>   sensor_reading map<text, text>,
>>> >>>>>   sensor_serial_number text,
>>> >>>>>   sensor_type int,
>>> >>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>>> >>>>> );
>>> >>>>>
>>> >>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>>> >>>>>
>>> >>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>>> >>>>>
>>> >>>>> Now I am able to insert the data into this table, however I am
>>> >>>>> unable
>>> >>>>> to do following query where I want to select items with specific
>>> >>>>> timeuuid values.
>>> >>>>>
>>> >>>>> It gives me following error.
>>> >>>>>
>>> >>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>>> >>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>>> >>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>>> >>>>>
>>> >>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>>> >>>>> (preceding column "event_time" is either not restricted or by a
>>> >>>>> non-EQ
>>> >>>>> relation)
>>> >>>>>
>>> >>>>> What do I have to do to make this work?
>>> >>>>>
>>> >>>>> For what its worth I am using django for my front end development
>>> >>>>> and
>>> >>>>> I am using "timestamp timeuuid" field as unique indentifier to
>>> >>>>> reference specific sensor reading from django framework -- since
>>> >>>>> cassandra doesn't have way to generate unique id upon insert (like
>>> >>>>> old-style rdms's auto-fields).
>>> >>>>>
>>> >>>>>
>>> >>>>> Below is software version info.
>>> >>>>>
>>> >>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>>> >>>>> Thrift protocol 19.39.0]
>>> >>>>>
>>> >>>>> I really don't understand what the error message preceeding column
>>> >>>>> "event_time" is either not restricted or by no-EQ relation?
>>> >>>>>
>>> >>>>> -Subodh Nijsure
>>> >>
>>> >>
>>> >
>>
>>
>
>

Re: Help with select IN query in cassandra

Posted by Jack Krupansky <ja...@basetechnology.com>.
I did see a reference to deletions: “overall approaches considering volumes, deletion, compaction etc.” Did I merely misunderstand the reference? That’s all I was responding to... sorry if my misunderstanding added any confusion!

-- Jack Krupansky

From: Laing, Michael 
Sent: Monday, September 1, 2014 11:34 AM
To: user@cassandra.apache.org 
Subject: Re: Help with select IN query in cassandra

Did the OP propose that?



On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky <ja...@basetechnology.com> wrote:

  One comment on deletions – aren’t deletions kind of an anti-pattern for modern data processing, such as sensor data, time series data, and social media? I mean, isn’t it usually better to return a full history of the data, with some aging scheme, and manage the tracking of which values are “current” (or “recent”)? Shouldn’t we be looking for and promoting “write once” approaches as a much stronger preference/pattern? Or maybe I should say “write once and bulk delete on aging” rather than the exercise in futility of doing a massive number of deletes and updates in place?

  -- Jack Krupansky

  From: Laing, Michael 
  Sent: Monday, September 1, 2014 9:33 AM
  To: user@cassandra.apache.org 
  Subject: Re: Help with select IN query in cassandra

  This should work for your query requirements - 2 tables w same info because disk is cheap and writes are fast so optimize for reads: 

  CREATE TABLE sensor_asset (
    asset_id text,
    event_time timestamp,
    tuuid timeuuid,
    sensor_reading map<text, text>,
    sensor_serial_number text,
    sensor_type int,
    PRIMARY KEY ((asset_id), event_time)
  );

  CREATE TABLE sensor_tuuid (
    asset_id text,
    event_time timestamp,
    tuuid timeuuid,
    sensor_reading map<text, text>,
    sensor_serial_number text,
    sensor_type int,
    PRIMARY KEY (tuuid)
  );

  1. Give me all sensor data for an asset:

  select * from sensor_asset where asset_id = <asset>;

  2. Give me sensor data that matches a set of timeuuids:

  select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);

  3. Give me all sensor data for an asset collected after | before | between event_time(s):

  select * from sensor_asset where asset_id = <asset> and event_time > <ts1>;
  select * from sensor_asset where asset_id = <asset> and event_time < <ts1>;
  select * from sensor_asset where asset_id = <asset> and event_time < <ts1> and event_time > <ts2>;

  ***

  Many people (not me) handle sensor data, so there may be better overall approaches considering volumes, deletion, compaction etc.

  But the above is simple and should make your current approach workable as you iterate toward a complete solution.

  Cheers,
  ml




  On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure <su...@gmail.com> wrote:

    Thanks for your help Michael.

    If specifying asset_id would help I can construct queries that can
    include asset_id

    So I have been "playing" around with PRIMARY KEY definition and
    following table definition


    CREATE TABLE sensor_info_table (
      asset_id text,
      event_time timestamp,
      "timestamp" timeuuid,
      sensor_reading map<text, text>,
      sensor_serial_number text,
      sensor_type int,

      PRIMARY KEY ((asset_id, "timestamp"), event_time)
    );

    It does what I want to do, and I removed the index for timestamp item
    since now it is part of primary key and thus my query like this works.

    SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
    timestamp IN ( 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
    );

    But now this doesn't work it give

    SELECT * from sensor_info_table where  asset_id='3' ;

    Bad Request: Partition key part timestamp must be restricted since
    preceding part is

    I am keeping index on event_time as I sometime need to query something
    "give me all data since time x" i.e. something like this works.

    SELECT * from sensor_info_table where  event_time > '2014-08-31
    16:54:02-0700' ALLOW FILTERING;

    However if I do this things then this don't work:

    SELECT * from sensor_info_table where  asset_id='3' AND event_time >
    '2014-08-31 16:54:02-0700';

    Bad Request: Partition key part timestamp must be restricted since
    preceding part is

    Also  I am not conformable with fact that I need to specify ALLOW FILTERING.

    I guess cassandra schema design task asks designer to write down
    queries before designing schema.

    For the above table definition I want to do following queries:

    - Give me all sensor data for given asset.
    - Give me sensor data that matches given set of timeuuids
    - Give me all sendor data for a given asset, that were collected after
    | before | between  certain event_time.

    Given these query criteria how should  I construct my schema? One
    thought has occurred to me is make three tables with each item
    asset_id , event_time, timeuuid as primary keys and depending on type
    of query choose the table to do query upon. That seems like a waste of
    resources (disk, cpu ), also increasing insert times(!) but thats the
    way things need to happen in cassandra world its okay. ( I am
    two-three weeks into learning about cassandra).

    -Subodh

    On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael

    <mi...@nytimes.com> wrote:
    > Oh it must be late - I missed the fact that you didn't want to specify
    > asset_id. The above queries will still work but you have to use 'allow
    > filtering' - generally not a good idea. I'll look again in the morning.
    >
    >
    > On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael <mi...@nytimes.com>
    > wrote:
    >>
    >> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
    >> must be specified as well - hopefully that info is available to the ux.
    >>
    >> Unfortunately you will then hit another problem with your query: you are
    >> selecting a collection field... this will not work with IN on "timestamp".
    >>
    >> So you could select all the "timestamp"s for an asset_id/event_time:
    >>>
    >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
    >>> 1231234;
    >>
    >>
    >> Or you could apply a range of "timestamp"s:
    >>>
    >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
    >>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
    >>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
    >>
    >>
    >> BTW the secondary indices are not a good idea: high cardinality and of no
    >> use in this query that I can see.
    >>
    >> ml
    >>
    >>
    >> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure <su...@gmail.com>
    >> wrote:
    >>>
    >>> Not really event time stamp is created by the sensor when it reads data
    >>> and  timestamp is something server creates when inserting data into
    >>> cassandra db.  At later point in time my django ux allows users to browse
    >>> this data and reference interesting data points via the timestamp field. The
    >>> timestamp field is my bridge between Sal and nosql world.
    >>>
    >>> Subodh
    >>>
    >>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
    >>> wrote:
    >>>>
    >>>> Are event_time and timestamp essentially representing the same datetime?
    >>>>
    >>>> On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com>
    >>>> wrote:
    >>>>>
    >>>>> I have following database schema
    >>>>>
    >>>>> CREATE TABLE sensor_info_table (
    >>>>>   asset_id text,
    >>>>>   event_time timestamp,
    >>>>>   "timestamp" timeuuid,
    >>>>>   sensor_reading map<text, text>,
    >>>>>   sensor_serial_number text,
    >>>>>   sensor_type int,
    >>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
    >>>>> );
    >>>>>
    >>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
    >>>>>
    >>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
    >>>>>
    >>>>> Now I am able to insert the data into this table, however I am unable
    >>>>> to do following query where I want to select items with specific
    >>>>> timeuuid values.
    >>>>>
    >>>>> It gives me following error.
    >>>>>
    >>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
    >>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
    >>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
    >>>>>
    >>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
    >>>>> (preceding column "event_time" is either not restricted or by a non-EQ
    >>>>> relation)
    >>>>>
    >>>>> What do I have to do to make this work?
    >>>>>
    >>>>> For what its worth I am using django for my front end development and
    >>>>> I am using "timestamp timeuuid" field as unique indentifier to
    >>>>> reference specific sensor reading from django framework -- since
    >>>>> cassandra doesn't have way to generate unique id upon insert (like
    >>>>> old-style rdms's auto-fields).
    >>>>>
    >>>>>
    >>>>> Below is software version info.
    >>>>>
    >>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
    >>>>> Thrift protocol 19.39.0]
    >>>>>
    >>>>> I really don't understand what the error message preceeding column
    >>>>> "event_time" is either not restricted or by no-EQ relation?
    >>>>>
    >>>>> -Subodh Nijsure
    >>
    >>
    >



Re: Help with select IN query in cassandra

Posted by "Laing, Michael" <mi...@nytimes.com>.
Did the OP propose that?


On Mon, Sep 1, 2014 at 10:53 AM, Jack Krupansky <ja...@basetechnology.com>
wrote:

>   One comment on deletions – aren’t deletions kind of an anti-pattern for
> modern data processing, such as sensor data, time series data, and social
> media? I mean, isn’t it usually better to return a full history of the
> data, with some aging scheme, and manage the tracking of which values are
> “current” (or “recent”)? Shouldn’t we be looking for and promoting “write
> once” approaches as a much stronger preference/pattern? Or maybe I should
> say “write once and bulk delete on aging” rather than the exercise in
> futility of doing a massive number of deletes and updates in place?
>
> -- Jack Krupansky
>
>  *From:* Laing, Michael <mi...@nytimes.com>
> *Sent:* Monday, September 1, 2014 9:33 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Help with select IN query in cassandra
>
>  This should work for your query requirements - 2 tables w same info
> because disk is cheap and writes are fast so optimize for reads:
>
>  CREATE TABLE sensor_asset (
>   asset_id text,
>   event_time timestamp,
>   tuuid timeuuid,
>   sensor_reading map<text, text>,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY ((asset_id), event_time)
> );
>
> CREATE TABLE sensor_tuuid (
>   asset_id text,
>   event_time timestamp,
>   tuuid timeuuid,
>   sensor_reading map<text, text>,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY (tuuid)
> );
>
> 1. Give me all sensor data for an asset:
>
> select * from sensor_asset where asset_id = <asset>;
>
> 2. Give me sensor data that matches a set of timeuuids:
>
> select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);
>
> 3. Give me all sensor data for an asset collected after | before | between
> event_time(s):
>
> select * from sensor_asset where asset_id = <asset> and event_time > <ts1>;
>  select * from sensor_asset where asset_id = <asset> and event_time <
> <ts1>;
>  select * from sensor_asset where asset_id = <asset> and event_time <
> <ts1> and event_time > <ts2>;
>
> ***
>
> Many people (not me) handle sensor data, so there may be better overall
> approaches considering volumes, deletion, compaction etc.
>
> But the above is simple and should make your current approach workable as
> you iterate toward a complete solution.
>
> Cheers,
> ml
>
>
>
> On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure <subodh.nijsure@gmail.com
> > wrote:
>
>> Thanks for your help Michael.
>>
>> If specifying asset_id would help I can construct queries that can
>> include asset_id
>>
>> So I have been "playing" around with PRIMARY KEY definition and
>> following table definition
>>
>> CREATE TABLE sensor_info_table (
>>   asset_id text,
>>   event_time timestamp,
>>   "timestamp" timeuuid,
>>   sensor_reading map<text, text>,
>>   sensor_serial_number text,
>>   sensor_type int,
>>   PRIMARY KEY ((asset_id, "timestamp"), event_time)
>> );
>>
>> It does what I want to do, and I removed the index for timestamp item
>> since now it is part of primary key and thus my query like this works.
>>
>> SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
>> timestamp IN (
>> 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
>> );
>>
>> But now this doesn't work it give
>>
>> SELECT * from sensor_info_table where  asset_id='3' ;
>>
>> Bad Request: Partition key part timestamp must be restricted since
>> preceding part is
>>
>> I am keeping index on event_time as I sometime need to query something
>> "give me all data since time x" i.e. something like this works.
>>
>> SELECT * from sensor_info_table where  event_time > '2014-08-31
>> 16:54:02-0700' ALLOW FILTERING;
>>
>> However if I do this things then this don't work:
>>
>> SELECT * from sensor_info_table where  asset_id='3' AND event_time >
>> '2014-08-31 16:54:02-0700';
>>
>> Bad Request: Partition key part timestamp must be restricted since
>> preceding part is
>>
>> Also  I am not conformable with fact that I need to specify ALLOW
>> FILTERING.
>>
>> I guess cassandra schema design task asks designer to write down
>> queries before designing schema.
>>
>> For the above table definition I want to do following queries:
>>
>> - Give me all sensor data for given asset.
>> - Give me sensor data that matches given set of timeuuids
>> - Give me all sendor data for a given asset, that were collected after
>> | before | between  certain event_time.
>>
>> Given these query criteria how should  I construct my schema? One
>> thought has occurred to me is make three tables with each item
>> asset_id , event_time, timeuuid as primary keys and depending on type
>> of query choose the table to do query upon. That seems like a waste of
>> resources (disk, cpu ), also increasing insert times(!) but thats the
>> way things need to happen in cassandra world its okay. ( I am
>> two-three weeks into learning about cassandra).
>>
>> -Subodh
>>
>> On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael
>>  <mi...@nytimes.com> wrote:
>> > Oh it must be late - I missed the fact that you didn't want to specify
>> > asset_id. The above queries will still work but you have to use 'allow
>> > filtering' - generally not a good idea. I'll look again in the morning.
>> >
>> >
>> > On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael <
>> michael.laing@nytimes.com>
>> > wrote:
>> >>
>> >> Hmm. Because the clustering key is (event_time, "timestamp"),
>> event_time
>> >> must be specified as well - hopefully that info is available to the ux.
>> >>
>> >> Unfortunately you will then hit another problem with your query: you
>> are
>> >> selecting a collection field... this will not work with IN on
>> "timestamp".
>> >>
>> >> So you could select all the "timestamp"s for an asset_id/event_time:
>> >>>
>> >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>> >>> 1231234;
>> >>
>> >>
>> >> Or you could apply a range of "timestamp"s:
>> >>>
>> >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>> >>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
>> >>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
>> >>
>> >>
>> >> BTW the secondary indices are not a good idea: high cardinality and of
>> no
>> >> use in this query that I can see.
>> >>
>> >> ml
>> >>
>> >>
>> >> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure <
>> subodh.nijsure@gmail.com>
>> >> wrote:
>> >>>
>> >>> Not really event time stamp is created by the sensor when it reads
>> data
>> >>> and  timestamp is something server creates when inserting data into
>> >>> cassandra db.  At later point in time my django ux allows users to
>> browse
>> >>> this data and reference interesting data points via the timestamp
>> field. The
>> >>> timestamp field is my bridge between Sal and nosql world.
>> >>>
>> >>> Subodh
>> >>>
>> >>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
>> >>> wrote:
>> >>>>
>> >>>> Are event_time and timestamp essentially representing the same
>> datetime?
>> >>>>
>> >>>> On Sunday, August 31, 2014, Subodh Nijsure <subodh.nijsure@gmail.com
>> >
>> >>>> wrote:
>> >>>>>
>> >>>>> I have following database schema
>> >>>>>
>> >>>>> CREATE TABLE sensor_info_table (
>> >>>>>   asset_id text,
>> >>>>>   event_time timestamp,
>> >>>>>   "timestamp" timeuuid,
>> >>>>>   sensor_reading map<text, text>,
>> >>>>>   sensor_serial_number text,
>> >>>>>   sensor_type int,
>> >>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>> >>>>> );
>> >>>>>
>> >>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>> >>>>>
>> >>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>> >>>>>
>> >>>>> Now I am able to insert the data into this table, however I am
>> unable
>> >>>>> to do following query where I want to select items with specific
>> >>>>> timeuuid values.
>> >>>>>
>> >>>>> It gives me following error.
>> >>>>>
>> >>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>> >>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>> >>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>> >>>>>
>> >>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>> >>>>> (preceding column "event_time" is either not restricted or by a
>> non-EQ
>> >>>>> relation)
>> >>>>>
>> >>>>> What do I have to do to make this work?
>> >>>>>
>> >>>>> For what its worth I am using django for my front end development
>> and
>> >>>>> I am using "timestamp timeuuid" field as unique indentifier to
>> >>>>> reference specific sensor reading from django framework -- since
>> >>>>> cassandra doesn't have way to generate unique id upon insert (like
>> >>>>> old-style rdms's auto-fields).
>> >>>>>
>> >>>>>
>> >>>>> Below is software version info.
>> >>>>>
>> >>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>> >>>>> Thrift protocol 19.39.0]
>> >>>>>
>> >>>>> I really don't understand what the error message preceeding column
>> >>>>> "event_time" is either not restricted or by no-EQ relation?
>> >>>>>
>> >>>>> -Subodh Nijsure
>> >>
>> >>
>> >
>>
>
>

Re: Help with select IN query in cassandra

Posted by Jack Krupansky <ja...@basetechnology.com>.
One comment on deletions – aren’t deletions kind of an anti-pattern for modern data processing, such as sensor data, time series data, and social media? I mean, isn’t it usually better to return a full history of the data, with some aging scheme, and manage the tracking of which values are “current” (or “recent”)? Shouldn’t we be looking for and promoting “write once” approaches as a much stronger preference/pattern? Or maybe I should say “write once and bulk delete on aging” rather than the exercise in futility of doing a massive number of deletes and updates in place?

-- Jack Krupansky

From: Laing, Michael 
Sent: Monday, September 1, 2014 9:33 AM
To: user@cassandra.apache.org 
Subject: Re: Help with select IN query in cassandra

This should work for your query requirements - 2 tables w same info because disk is cheap and writes are fast so optimize for reads: 

CREATE TABLE sensor_asset (
  asset_id text,
  event_time timestamp,
  tuuid timeuuid,
  sensor_reading map<text, text>,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY ((asset_id), event_time)
);

CREATE TABLE sensor_tuuid (
  asset_id text,
  event_time timestamp,
  tuuid timeuuid,
  sensor_reading map<text, text>,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY (tuuid)
);

1. Give me all sensor data for an asset:

select * from sensor_asset where asset_id = <asset>;

2. Give me sensor data that matches a set of timeuuids:

select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);

3. Give me all sensor data for an asset collected after | before | between event_time(s):

select * from sensor_asset where asset_id = <asset> and event_time > <ts1>;
select * from sensor_asset where asset_id = <asset> and event_time < <ts1>;
select * from sensor_asset where asset_id = <asset> and event_time < <ts1> and event_time > <ts2>;

***

Many people (not me) handle sensor data, so there may be better overall approaches considering volumes, deletion, compaction etc.

But the above is simple and should make your current approach workable as you iterate toward a complete solution.

Cheers,
ml




On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure <su...@gmail.com> wrote:

  Thanks for your help Michael.

  If specifying asset_id would help I can construct queries that can
  include asset_id

  So I have been "playing" around with PRIMARY KEY definition and
  following table definition


  CREATE TABLE sensor_info_table (
    asset_id text,
    event_time timestamp,
    "timestamp" timeuuid,
    sensor_reading map<text, text>,
    sensor_serial_number text,
    sensor_type int,

    PRIMARY KEY ((asset_id, "timestamp"), event_time)
  );

  It does what I want to do, and I removed the index for timestamp item
  since now it is part of primary key and thus my query like this works.

  SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
  timestamp IN ( 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
  );

  But now this doesn't work it give

  SELECT * from sensor_info_table where  asset_id='3' ;

  Bad Request: Partition key part timestamp must be restricted since
  preceding part is

  I am keeping index on event_time as I sometime need to query something
  "give me all data since time x" i.e. something like this works.

  SELECT * from sensor_info_table where  event_time > '2014-08-31
  16:54:02-0700' ALLOW FILTERING;

  However if I do this things then this don't work:

  SELECT * from sensor_info_table where  asset_id='3' AND event_time >
  '2014-08-31 16:54:02-0700';

  Bad Request: Partition key part timestamp must be restricted since
  preceding part is

  Also  I am not conformable with fact that I need to specify ALLOW FILTERING.

  I guess cassandra schema design task asks designer to write down
  queries before designing schema.

  For the above table definition I want to do following queries:

  - Give me all sensor data for given asset.
  - Give me sensor data that matches given set of timeuuids
  - Give me all sendor data for a given asset, that were collected after
  | before | between  certain event_time.

  Given these query criteria how should  I construct my schema? One
  thought has occurred to me is make three tables with each item
  asset_id , event_time, timeuuid as primary keys and depending on type
  of query choose the table to do query upon. That seems like a waste of
  resources (disk, cpu ), also increasing insert times(!) but thats the
  way things need to happen in cassandra world its okay. ( I am
  two-three weeks into learning about cassandra).

  -Subodh

  On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael

  <mi...@nytimes.com> wrote:
  > Oh it must be late - I missed the fact that you didn't want to specify
  > asset_id. The above queries will still work but you have to use 'allow
  > filtering' - generally not a good idea. I'll look again in the morning.
  >
  >
  > On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael <mi...@nytimes.com>
  > wrote:
  >>
  >> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
  >> must be specified as well - hopefully that info is available to the ux.
  >>
  >> Unfortunately you will then hit another problem with your query: you are
  >> selecting a collection field... this will not work with IN on "timestamp".
  >>
  >> So you could select all the "timestamp"s for an asset_id/event_time:
  >>>
  >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
  >>> 1231234;
  >>
  >>
  >> Or you could apply a range of "timestamp"s:
  >>>
  >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
  >>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
  >>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
  >>
  >>
  >> BTW the secondary indices are not a good idea: high cardinality and of no
  >> use in this query that I can see.
  >>
  >> ml
  >>
  >>
  >> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure <su...@gmail.com>
  >> wrote:
  >>>
  >>> Not really event time stamp is created by the sensor when it reads data
  >>> and  timestamp is something server creates when inserting data into
  >>> cassandra db.  At later point in time my django ux allows users to browse
  >>> this data and reference interesting data points via the timestamp field. The
  >>> timestamp field is my bridge between Sal and nosql world.
  >>>
  >>> Subodh
  >>>
  >>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
  >>> wrote:
  >>>>
  >>>> Are event_time and timestamp essentially representing the same datetime?
  >>>>
  >>>> On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com>
  >>>> wrote:
  >>>>>
  >>>>> I have following database schema
  >>>>>
  >>>>> CREATE TABLE sensor_info_table (
  >>>>>   asset_id text,
  >>>>>   event_time timestamp,
  >>>>>   "timestamp" timeuuid,
  >>>>>   sensor_reading map<text, text>,
  >>>>>   sensor_serial_number text,
  >>>>>   sensor_type int,
  >>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
  >>>>> );
  >>>>>
  >>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
  >>>>>
  >>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
  >>>>>
  >>>>> Now I am able to insert the data into this table, however I am unable
  >>>>> to do following query where I want to select items with specific
  >>>>> timeuuid values.
  >>>>>
  >>>>> It gives me following error.
  >>>>>
  >>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
  >>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
  >>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
  >>>>>
  >>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
  >>>>> (preceding column "event_time" is either not restricted or by a non-EQ
  >>>>> relation)
  >>>>>
  >>>>> What do I have to do to make this work?
  >>>>>
  >>>>> For what its worth I am using django for my front end development and
  >>>>> I am using "timestamp timeuuid" field as unique indentifier to
  >>>>> reference specific sensor reading from django framework -- since
  >>>>> cassandra doesn't have way to generate unique id upon insert (like
  >>>>> old-style rdms's auto-fields).
  >>>>>
  >>>>>
  >>>>> Below is software version info.
  >>>>>
  >>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
  >>>>> Thrift protocol 19.39.0]
  >>>>>
  >>>>> I really don't understand what the error message preceeding column
  >>>>> "event_time" is either not restricted or by no-EQ relation?
  >>>>>
  >>>>> -Subodh Nijsure
  >>
  >>
  >


Re: Help with select IN query in cassandra

Posted by "Laing, Michael" <mi...@nytimes.com>.
This should work for your query requirements - 2 tables w same info because
disk is cheap and writes are fast so optimize for reads:

CREATE TABLE sensor_asset (
  asset_id text,
  event_time timestamp,
  tuuid timeuuid,
  sensor_reading map<text, text>,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY ((asset_id), event_time)
);

CREATE TABLE sensor_tuuid (
  asset_id text,
  event_time timestamp,
  tuuid timeuuid,
  sensor_reading map<text, text>,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY (tuuid)
);

1. Give me all sensor data for an asset:

select * from sensor_asset where asset_id = <asset>;

2. Give me sensor data that matches a set of timeuuids:

select * from sensor_tuuid where tuuid in (<tuuid1>, <tuuid2>, ...);

3. Give me all sensor data for an asset collected after | before | between
event_time(s):

select * from sensor_asset where asset_id = <asset> and event_time > <ts1>;
select * from sensor_asset where asset_id = <asset> and event_time < <ts1>;
select * from sensor_asset where asset_id = <asset> and event_time < <ts1>
and event_time > <ts2>;

***

Many people (not me) handle sensor data, so there may be better overall
approaches considering volumes, deletion, compaction etc.

But the above is simple and should make your current approach workable as
you iterate toward a complete solution.

Cheers,
ml



On Sun, Aug 31, 2014 at 11:08 PM, Subodh Nijsure <su...@gmail.com>
wrote:

> Thanks for your help Michael.
>
> If specifying asset_id would help I can construct queries that can
> include asset_id
>
> So I have been "playing" around with PRIMARY KEY definition and
> following table definition
>
> CREATE TABLE sensor_info_table (
>   asset_id text,
>   event_time timestamp,
>   "timestamp" timeuuid,
>   sensor_reading map<text, text>,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY ((asset_id, "timestamp"), event_time)
> );
>
> It does what I want to do, and I removed the index for timestamp item
> since now it is part of primary key and thus my query like this works.
>
> SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
> timestamp IN (
> 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
> );
>
> But now this doesn't work it give
>
> SELECT * from sensor_info_table where  asset_id='3' ;
>
> Bad Request: Partition key part timestamp must be restricted since
> preceding part is
>
> I am keeping index on event_time as I sometime need to query something
> "give me all data since time x" i.e. something like this works.
>
>  SELECT * from sensor_info_table where  event_time > '2014-08-31
> 16:54:02-0700' ALLOW FILTERING;
>
> However if I do this things then this don't work:
>
> SELECT * from sensor_info_table where  asset_id='3' AND event_time >
> '2014-08-31 16:54:02-0700';
>
> Bad Request: Partition key part timestamp must be restricted since
> preceding part is
>
> Also  I am not conformable with fact that I need to specify ALLOW
> FILTERING.
>
> I guess cassandra schema design task asks designer to write down
> queries before designing schema.
>
> For the above table definition I want to do following queries:
>
> - Give me all sensor data for given asset.
> - Give me sensor data that matches given set of timeuuids
> - Give me all sendor data for a given asset, that were collected after
> | before | between  certain event_time.
>
> Given these query criteria how should  I construct my schema? One
> thought has occurred to me is make three tables with each item
> asset_id , event_time, timeuuid as primary keys and depending on type
> of query choose the table to do query upon. That seems like a waste of
> resources (disk, cpu ), also increasing insert times(!) but thats the
> way things need to happen in cassandra world its okay. ( I am
> two-three weeks into learning about cassandra).
>
> -Subodh
>
> On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael
> <mi...@nytimes.com> wrote:
> > Oh it must be late - I missed the fact that you didn't want to specify
> > asset_id. The above queries will still work but you have to use 'allow
> > filtering' - generally not a good idea. I'll look again in the morning.
> >
> >
> > On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael <
> michael.laing@nytimes.com>
> > wrote:
> >>
> >> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
> >> must be specified as well - hopefully that info is available to the ux.
> >>
> >> Unfortunately you will then hit another problem with your query: you are
> >> selecting a collection field... this will not work with IN on
> "timestamp".
> >>
> >> So you could select all the "timestamp"s for an asset_id/event_time:
> >>>
> >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
> >>> 1231234;
> >>
> >>
> >> Or you could apply a range of "timestamp"s:
> >>>
> >>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
> >>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
> >>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
> >>
> >>
> >> BTW the secondary indices are not a good idea: high cardinality and of
> no
> >> use in this query that I can see.
> >>
> >> ml
> >>
> >>
> >> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure <
> subodh.nijsure@gmail.com>
> >> wrote:
> >>>
> >>> Not really event time stamp is created by the sensor when it reads data
> >>> and  timestamp is something server creates when inserting data into
> >>> cassandra db.  At later point in time my django ux allows users to
> browse
> >>> this data and reference interesting data points via the timestamp
> field. The
> >>> timestamp field is my bridge between Sal and nosql world.
> >>>
> >>> Subodh
> >>>
> >>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
> >>> wrote:
> >>>>
> >>>> Are event_time and timestamp essentially representing the same
> datetime?
> >>>>
> >>>> On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> I have following database schema
> >>>>>
> >>>>> CREATE TABLE sensor_info_table (
> >>>>>   asset_id text,
> >>>>>   event_time timestamp,
> >>>>>   "timestamp" timeuuid,
> >>>>>   sensor_reading map<text, text>,
> >>>>>   sensor_serial_number text,
> >>>>>   sensor_type int,
> >>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
> >>>>> );
> >>>>>
> >>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
> >>>>>
> >>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
> >>>>>
> >>>>> Now I am able to insert the data into this table, however I am unable
> >>>>> to do following query where I want to select items with specific
> >>>>> timeuuid values.
> >>>>>
> >>>>> It gives me following error.
> >>>>>
> >>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
> >>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
> >>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
> >>>>>
> >>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
> >>>>> (preceding column "event_time" is either not restricted or by a
> non-EQ
> >>>>> relation)
> >>>>>
> >>>>> What do I have to do to make this work?
> >>>>>
> >>>>> For what its worth I am using django for my front end development and
> >>>>> I am using "timestamp timeuuid" field as unique indentifier to
> >>>>> reference specific sensor reading from django framework -- since
> >>>>> cassandra doesn't have way to generate unique id upon insert (like
> >>>>> old-style rdms's auto-fields).
> >>>>>
> >>>>>
> >>>>> Below is software version info.
> >>>>>
> >>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
> >>>>> Thrift protocol 19.39.0]
> >>>>>
> >>>>> I really don't understand what the error message preceeding column
> >>>>> "event_time" is either not restricted or by no-EQ relation?
> >>>>>
> >>>>> -Subodh Nijsure
> >>
> >>
> >
>

Re: Help with select IN query in cassandra

Posted by Subodh Nijsure <su...@gmail.com>.
Thanks for your help Michael.

If specifying asset_id would help I can construct queries that can
include asset_id

So I have been "playing" around with PRIMARY KEY definition and
following table definition

CREATE TABLE sensor_info_table (
  asset_id text,
  event_time timestamp,
  "timestamp" timeuuid,
  sensor_reading map<text, text>,
  sensor_serial_number text,
  sensor_type int,
  PRIMARY KEY ((asset_id, "timestamp"), event_time)
);

It does what I want to do, and I removed the index for timestamp item
since now it is part of primary key and thus my query like this works.

SELECT * from sigsense.sensor_info_table where  asset_id='3' AND
timestamp IN ( 17830bb0-316a-11e4-800f-b888e30f5d17,16ddbdfe-316a-11e4-9f50-b888e30f5d17
);

But now this doesn't work it give

SELECT * from sensor_info_table where  asset_id='3' ;

Bad Request: Partition key part timestamp must be restricted since
preceding part is

I am keeping index on event_time as I sometime need to query something
"give me all data since time x" i.e. something like this works.

 SELECT * from sensor_info_table where  event_time > '2014-08-31
16:54:02-0700' ALLOW FILTERING;

However if I do this things then this don't work:

SELECT * from sensor_info_table where  asset_id='3' AND event_time >
'2014-08-31 16:54:02-0700';

Bad Request: Partition key part timestamp must be restricted since
preceding part is

Also  I am not conformable with fact that I need to specify ALLOW FILTERING.

I guess cassandra schema design task asks designer to write down
queries before designing schema.

For the above table definition I want to do following queries:

- Give me all sensor data for given asset.
- Give me sensor data that matches given set of timeuuids
- Give me all sendor data for a given asset, that were collected after
| before | between  certain event_time.

Given these query criteria how should  I construct my schema? One
thought has occurred to me is make three tables with each item
asset_id , event_time, timeuuid as primary keys and depending on type
of query choose the table to do query upon. That seems like a waste of
resources (disk, cpu ), also increasing insert times(!) but thats the
way things need to happen in cassandra world its okay. ( I am
two-three weeks into learning about cassandra).

-Subodh

On Sun, Aug 31, 2014 at 6:44 PM, Laing, Michael
<mi...@nytimes.com> wrote:
> Oh it must be late - I missed the fact that you didn't want to specify
> asset_id. The above queries will still work but you have to use 'allow
> filtering' - generally not a good idea. I'll look again in the morning.
>
>
> On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael <mi...@nytimes.com>
> wrote:
>>
>> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
>> must be specified as well - hopefully that info is available to the ux.
>>
>> Unfortunately you will then hit another problem with your query: you are
>> selecting a collection field... this will not work with IN on "timestamp".
>>
>> So you could select all the "timestamp"s for an asset_id/event_time:
>>>
>>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>>> 1231234;
>>
>>
>> Or you could apply a range of "timestamp"s:
>>>
>>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
>>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
>>
>>
>> BTW the secondary indices are not a good idea: high cardinality and of no
>> use in this query that I can see.
>>
>> ml
>>
>>
>> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure <su...@gmail.com>
>> wrote:
>>>
>>> Not really event time stamp is created by the sensor when it reads data
>>> and  timestamp is something server creates when inserting data into
>>> cassandra db.  At later point in time my django ux allows users to browse
>>> this data and reference interesting data points via the timestamp field. The
>>> timestamp field is my bridge between Sal and nosql world.
>>>
>>> Subodh
>>>
>>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
>>> wrote:
>>>>
>>>> Are event_time and timestamp essentially representing the same datetime?
>>>>
>>>> On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com>
>>>> wrote:
>>>>>
>>>>> I have following database schema
>>>>>
>>>>> CREATE TABLE sensor_info_table (
>>>>>   asset_id text,
>>>>>   event_time timestamp,
>>>>>   "timestamp" timeuuid,
>>>>>   sensor_reading map<text, text>,
>>>>>   sensor_serial_number text,
>>>>>   sensor_type int,
>>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>>>>> );
>>>>>
>>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>>>>>
>>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>>>>>
>>>>> Now I am able to insert the data into this table, however I am unable
>>>>> to do following query where I want to select items with specific
>>>>> timeuuid values.
>>>>>
>>>>> It gives me following error.
>>>>>
>>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>>>>>
>>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>>>>> (preceding column "event_time" is either not restricted or by a non-EQ
>>>>> relation)
>>>>>
>>>>> What do I have to do to make this work?
>>>>>
>>>>> For what its worth I am using django for my front end development and
>>>>> I am using "timestamp timeuuid" field as unique indentifier to
>>>>> reference specific sensor reading from django framework -- since
>>>>> cassandra doesn't have way to generate unique id upon insert (like
>>>>> old-style rdms's auto-fields).
>>>>>
>>>>>
>>>>> Below is software version info.
>>>>>
>>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>>>>> Thrift protocol 19.39.0]
>>>>>
>>>>> I really don't understand what the error message preceeding column
>>>>> "event_time" is either not restricted or by no-EQ relation?
>>>>>
>>>>> -Subodh Nijsure
>>
>>
>

Re: Help with select IN query in cassandra

Posted by "Laing, Michael" <mi...@nytimes.com>.
Oh it must be late - I missed the fact that you didn't want to specify
asset_id. The above queries will still work but you have to use 'allow
filtering' - generally not a good idea. I'll look again in the morning.


On Sun, Aug 31, 2014 at 9:41 PM, Laing, Michael <mi...@nytimes.com>
wrote:

> Hmm. Because the clustering key is (event_time, "timestamp"), event_time
> must be specified as well - hopefully that info is available to the ux.
>
> Unfortunately you will then hit another problem with your query: you are
> selecting a collection field... this will not work with IN on "timestamp".
>
> So you could select all the "timestamp"s for an asset_id/event_time:
>
>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>> 1231234;
>
>
> Or you could apply a range of "timestamp"s:
>
>> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
>> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
>> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;
>
>
> BTW the secondary indices are not a good idea: high cardinality and of no
> use in this query that I can see.
>
> ml
>
>
> On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure <su...@gmail.com>
> wrote:
>
>> Not really event time stamp is created by the sensor when it reads data
>> and  timestamp is something server creates when inserting data into
>> cassandra db.  At later point in time my django ux allows users to browse
>> this data and reference interesting data points via the timestamp field.
>> The timestamp field is my bridge between Sal and nosql world.
>>
>> Subodh
>> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
>> wrote:
>>
>>> Are event_time and timestamp essentially representing the same datetime?
>>>
>>> On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com>
>>> wrote:
>>>
>>>> I have following database schema
>>>>
>>>> CREATE TABLE sensor_info_table (
>>>>   asset_id text,
>>>>   event_time timestamp,
>>>>   "timestamp" timeuuid,
>>>>   sensor_reading map<text, text>,
>>>>   sensor_serial_number text,
>>>>   sensor_type int,
>>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>>>> );
>>>>
>>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>>>>
>>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>>>>
>>>> Now I am able to insert the data into this table, however I am unable
>>>> to do following query where I want to select items with specific
>>>> timeuuid values.
>>>>
>>>> It gives me following error.
>>>>
>>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>>>>
>>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>>>> (preceding column "event_time" is either not restricted or by a non-EQ
>>>> relation)
>>>>
>>>> What do I have to do to make this work?
>>>>
>>>> For what its worth I am using django for my front end development and
>>>> I am using "timestamp timeuuid" field as unique indentifier to
>>>> reference specific sensor reading from django framework -- since
>>>> cassandra doesn't have way to generate unique id upon insert (like
>>>> old-style rdms's auto-fields).
>>>>
>>>>
>>>> Below is software version info.
>>>>
>>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>>>> Thrift protocol 19.39.0]
>>>>
>>>> I really don't understand what the error message preceeding column
>>>> "event_time" is either not restricted or by no-EQ relation?
>>>>
>>>> -Subodh Nijsure
>>>>
>>>
>

Re: Help with select IN query in cassandra

Posted by "Laing, Michael" <mi...@nytimes.com>.
Hmm. Because the clustering key is (event_time, "timestamp"), event_time
must be specified as well - hopefully that info is available to the ux.

Unfortunately you will then hit another problem with your query: you are
selecting a collection field... this will not work with IN on "timestamp".

So you could select all the "timestamp"s for an asset_id/event_time:

> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
> 1231234;


Or you could apply a range of "timestamp"s:

> SELECT * from sensor_info_table where asset_id = 'a' and event_time =
> 1231234 and "timestamp" > 1d934af3-3178-11e4-ba8d-406c8f1838fa and
> "timestamp" < 20b82021-3178-11e4-abc2-406c8f1838fa;


BTW the secondary indices are not a good idea: high cardinality and of no
use in this query that I can see.

ml


On Sun, Aug 31, 2014 at 8:40 PM, Subodh Nijsure <su...@gmail.com>
wrote:

> Not really event time stamp is created by the sensor when it reads data
> and  timestamp is something server creates when inserting data into
> cassandra db.  At later point in time my django ux allows users to browse
> this data and reference interesting data points via the timestamp field.
> The timestamp field is my bridge between Sal and nosql world.
>
> Subodh
> On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com>
> wrote:
>
>> Are event_time and timestamp essentially representing the same datetime?
>>
>> On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com>
>> wrote:
>>
>>> I have following database schema
>>>
>>> CREATE TABLE sensor_info_table (
>>>   asset_id text,
>>>   event_time timestamp,
>>>   "timestamp" timeuuid,
>>>   sensor_reading map<text, text>,
>>>   sensor_serial_number text,
>>>   sensor_type int,
>>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>>> );
>>>
>>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>>>
>>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>>>
>>> Now I am able to insert the data into this table, however I am unable
>>> to do following query where I want to select items with specific
>>> timeuuid values.
>>>
>>> It gives me following error.
>>>
>>> SELECT * from mydb.sensor_info_table where timestamp IN (
>>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>>>
>>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>>> (preceding column "event_time" is either not restricted or by a non-EQ
>>> relation)
>>>
>>> What do I have to do to make this work?
>>>
>>> For what its worth I am using django for my front end development and
>>> I am using "timestamp timeuuid" field as unique indentifier to
>>> reference specific sensor reading from django framework -- since
>>> cassandra doesn't have way to generate unique id upon insert (like
>>> old-style rdms's auto-fields).
>>>
>>>
>>> Below is software version info.
>>>
>>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>>> Thrift protocol 19.39.0]
>>>
>>> I really don't understand what the error message preceeding column
>>> "event_time" is either not restricted or by no-EQ relation?
>>>
>>> -Subodh Nijsure
>>>
>>

Re: Help with select IN query in cassandra

Posted by Subodh Nijsure <su...@gmail.com>.
Not really event time stamp is created by the sensor when it reads data
and  timestamp is something server creates when inserting data into
cassandra db.  At later point in time my django ux allows users to browse
this data and reference interesting data points via the timestamp field.
The timestamp field is my bridge between Sal and nosql world.

Subodh
On Aug 31, 2014 5:33 PM, "Laing, Michael" <mi...@nytimes.com> wrote:

> Are event_time and timestamp essentially representing the same datetime?
>
> On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com>
> wrote:
>
>> I have following database schema
>>
>> CREATE TABLE sensor_info_table (
>>   asset_id text,
>>   event_time timestamp,
>>   "timestamp" timeuuid,
>>   sensor_reading map<text, text>,
>>   sensor_serial_number text,
>>   sensor_type int,
>>   PRIMARY KEY ((asset_id), event_time, "timestamp")
>> );
>>
>> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>>
>> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>>
>> Now I am able to insert the data into this table, however I am unable
>> to do following query where I want to select items with specific
>> timeuuid values.
>>
>> It gives me following error.
>>
>> SELECT * from mydb.sensor_info_table where timestamp IN (
>> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
>> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>>
>> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
>> (preceding column "event_time" is either not restricted or by a non-EQ
>> relation)
>>
>> What do I have to do to make this work?
>>
>> For what its worth I am using django for my front end development and
>> I am using "timestamp timeuuid" field as unique indentifier to
>> reference specific sensor reading from django framework -- since
>> cassandra doesn't have way to generate unique id upon insert (like
>> old-style rdms's auto-fields).
>>
>>
>> Below is software version info.
>>
>> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
>> Thrift protocol 19.39.0]
>>
>> I really don't understand what the error message preceeding column
>> "event_time" is either not restricted or by no-EQ relation?
>>
>> -Subodh Nijsure
>>
>

Re: Help with select IN query in cassandra

Posted by "Laing, Michael" <mi...@nytimes.com>.
Are event_time and timestamp essentially representing the same datetime?

On Sunday, August 31, 2014, Subodh Nijsure <su...@gmail.com> wrote:

> I have following database schema
>
> CREATE TABLE sensor_info_table (
>   asset_id text,
>   event_time timestamp,
>   "timestamp" timeuuid,
>   sensor_reading map<text, text>,
>   sensor_serial_number text,
>   sensor_type int,
>   PRIMARY KEY ((asset_id), event_time, "timestamp")
> );
>
> CREATE INDEX event_time_index ON sensor_info_table (event_time);
>
> CREATE INDEX timestamp_index ON sensor_info_table ("timestamp");
>
> Now I am able to insert the data into this table, however I am unable
> to do following query where I want to select items with specific
> timeuuid values.
>
> It gives me following error.
>
> SELECT * from mydb.sensor_info_table where timestamp IN (
> bfdfa614-3166-11e4-a61d-b888e30f5d17 ,
> bf4521ac-3166-11e4-87a3-b888e30f5d17) ;
>
> Bad Request: PRIMARY KEY column "timestamp" cannot be restricted
> (preceding column "event_time" is either not restricted or by a non-EQ
> relation)
>
> What do I have to do to make this work?
>
> For what its worth I am using django for my front end development and
> I am using "timestamp timeuuid" field as unique indentifier to
> reference specific sensor reading from django framework -- since
> cassandra doesn't have way to generate unique id upon insert (like
> old-style rdms's auto-fields).
>
>
> Below is software version info.
>
> show VERSION ; [cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 |
> Thrift protocol 19.39.0]
>
> I really don't understand what the error message preceeding column
> "event_time" is either not restricted or by no-EQ relation?
>
> -Subodh Nijsure
>