You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Peter Figliozzi <pe...@gmail.com> on 2016/08/27 03:17:22 UTC

Need help with simple schema for time-series

I have data from many sensors as time-series:

   - Sensor name
   - Date
   - Time
   - value

I want to query windows of both date and time.  For example, 8am - 9am from
Aug. 1st to Aug 10th.

Here's what I did:

CREATE TABLE mykeyspace.mytable (
    sensorname text,
    date date,
    time time,
    data MAP<text, int>,
    PRIMARY KEY (sensorname, date, time)
);


However, when we query this, Cassandra restricts us to an "equal" relation
for the date, if we are to select a window of time.  So with that schema,
I'd have to query once for each date.


What's the right way to do this??  ("Right" defined as extracting a window
of date and of time in one query.)


Thank you,


Pete

Re: Need help with simple schema for time-series

Posted by Peter Figliozzi <pe...@gmail.com>.
Thanks, guys, for your help.  I tried the filtering method and it works
great.
Sincerely,
Pete

On Sat, Aug 27, 2016 at 12:36 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Ah, i see what you're looking for.  No, my schema wouldn't work for that.
> I had read through your question a little quickly.
>
> In cassandra 3.5 support was added for more flexible ALLOW FILTERING
> statements.  Here's an example:
>
> CREATE TABLE mytable (
>     sensorname text,
>     date date,
>     time time,
>     data int,
>     PRIMARY KEY (sensorname, date, time)
> );
>
> cqlsh:test> insert into mytable (sensorname, date, time, data) values
> ('test', '2016-03-03', '08:00:00', 1);
> cqlsh:test> insert into mytable (sensorname, date, time, data) values
> ('test', '2016-03-04', '13:00:00', 2);
> cqlsh:test> insert into mytable (sensorname, date, time, data) values
> ('test', '2016-03-05', '17:00:00', 3);
> cqlsh:test> insert into mytable (sensorname, date, time, data) values
> ('test', '2016-03-06', '12:00:00', 4);
> cqlsh:test> insert into mytable (sensorname, date, time, data) values
> ('test', '2016-03-07', '07:00:00', 5);
> cqlsh:test> insert into mytable (sensorname, date, time, data) values
> ('test', '2016-03-08', '15:00:00', 6);
> cqlsh:test>
> cqlsh:test>
> cqlsh:test> select * from mytable where sensorname = 'test'
>         ...                       and date >= '2016-03-04'
>         ...                       and date <= '2016-03-07'
>         ...                       and time >= '11:00:00'
>         ...                       and time <= '14:00:00'
>         ...                       allow filtering;
>
>  sensorname | date       | time               | data
> ------------+------------+--------------------+------
>        test | 2016-03-04 | 13:00:00.000000000 |    2
>        test | 2016-03-06 | 12:00:00.000000000 |    4
>
> (2 rows)
>
>
>
> On Fri, Aug 26, 2016 at 10:02 PM Peter Figliozzi <pe...@gmail.com>
> wrote:
>
>> I don't believe that would let me query a time of day range, over a date
>> range, would it?  For example, between 8am and 9am, August 1st through
>> August 10th.
>>
>> On Fri, Aug 26, 2016 at 11:52 PM, Jonathan Haddad <jo...@jonhaddad.com>
>> wrote:
>>
>>> Use a timestamp instead of 2 separate fields and you can query on the
>>> range.
>>>
>>> CREATE TABLE mytable (
>>>     sensorname text,
>>>     reading_time timestamp,
>>>     data MAP<text, int>,
>>>     PRIMARY KEY (sensorname, reading_time)
>>> );
>>>
>>>
>>>
>>> On Fri, Aug 26, 2016 at 8:17 PM Peter Figliozzi <
>>> pete.figliozzi@gmail.com> wrote:
>>>
>>>> I have data from many sensors as time-series:
>>>>
>>>>    - Sensor name
>>>>    - Date
>>>>    - Time
>>>>    - value
>>>>
>>>> I want to query windows of both date and time.  For example, 8am - 9am
>>>> from Aug. 1st to Aug 10th.
>>>>
>>>> Here's what I did:
>>>>
>>>> CREATE TABLE mykeyspace.mytable (
>>>>     sensorname text,
>>>>     date date,
>>>>     time time,
>>>>     data MAP<text, int>,
>>>>     PRIMARY KEY (sensorname, date, time)
>>>> );
>>>>
>>>>
>>>> However, when we query this, Cassandra restricts us to an "equal"
>>>> relation for the date, if we are to select a window of time.  So with that
>>>> schema, I'd have to query once for each date.
>>>>
>>>>
>>>> What's the right way to do this??  ("Right" defined as extracting a
>>>> window of date and of time in one query.)
>>>>
>>>>
>>>> Thank you,
>>>>
>>>>
>>>> Pete
>>>>
>>>
>>

Re: Need help with simple schema for time-series

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Ah, i see what you're looking for.  No, my schema wouldn't work for that.
I had read through your question a little quickly.

In cassandra 3.5 support was added for more flexible ALLOW FILTERING
statements.  Here's an example:

CREATE TABLE mytable (
    sensorname text,
    date date,
    time time,
    data int,
    PRIMARY KEY (sensorname, date, time)
);

cqlsh:test> insert into mytable (sensorname, date, time, data) values
('test', '2016-03-03', '08:00:00', 1);
cqlsh:test> insert into mytable (sensorname, date, time, data) values
('test', '2016-03-04', '13:00:00', 2);
cqlsh:test> insert into mytable (sensorname, date, time, data) values
('test', '2016-03-05', '17:00:00', 3);
cqlsh:test> insert into mytable (sensorname, date, time, data) values
('test', '2016-03-06', '12:00:00', 4);
cqlsh:test> insert into mytable (sensorname, date, time, data) values
('test', '2016-03-07', '07:00:00', 5);
cqlsh:test> insert into mytable (sensorname, date, time, data) values
('test', '2016-03-08', '15:00:00', 6);
cqlsh:test>
cqlsh:test>
cqlsh:test> select * from mytable where sensorname = 'test'
        ...                       and date >= '2016-03-04'
        ...                       and date <= '2016-03-07'
        ...                       and time >= '11:00:00'
        ...                       and time <= '14:00:00'
        ...                       allow filtering;

 sensorname | date       | time               | data
------------+------------+--------------------+------
       test | 2016-03-04 | 13:00:00.000000000 |    2
       test | 2016-03-06 | 12:00:00.000000000 |    4

(2 rows)



On Fri, Aug 26, 2016 at 10:02 PM Peter Figliozzi <pe...@gmail.com>
wrote:

> I don't believe that would let me query a time of day range, over a date
> range, would it?  For example, between 8am and 9am, August 1st through
> August 10th.
>
> On Fri, Aug 26, 2016 at 11:52 PM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> Use a timestamp instead of 2 separate fields and you can query on the
>> range.
>>
>> CREATE TABLE mytable (
>>     sensorname text,
>>     reading_time timestamp,
>>     data MAP<text, int>,
>>     PRIMARY KEY (sensorname, reading_time)
>> );
>>
>>
>>
>> On Fri, Aug 26, 2016 at 8:17 PM Peter Figliozzi <pe...@gmail.com>
>> wrote:
>>
>>> I have data from many sensors as time-series:
>>>
>>>    - Sensor name
>>>    - Date
>>>    - Time
>>>    - value
>>>
>>> I want to query windows of both date and time.  For example, 8am - 9am
>>> from Aug. 1st to Aug 10th.
>>>
>>> Here's what I did:
>>>
>>> CREATE TABLE mykeyspace.mytable (
>>>     sensorname text,
>>>     date date,
>>>     time time,
>>>     data MAP<text, int>,
>>>     PRIMARY KEY (sensorname, date, time)
>>> );
>>>
>>>
>>> However, when we query this, Cassandra restricts us to an "equal"
>>> relation for the date, if we are to select a window of time.  So with that
>>> schema, I'd have to query once for each date.
>>>
>>>
>>> What's the right way to do this??  ("Right" defined as extracting a
>>> window of date and of time in one query.)
>>>
>>>
>>> Thank you,
>>>
>>>
>>> Pete
>>>
>>
>

Re: Need help with simple schema for time-series

Posted by Jeff Jirsa <je...@crowdstrike.com>.
To do 8-9am on Aug-1 through Aug-10, you’d likely need to do either multiple queries in parallel (fire off async), or use some clever IN logic.

 

Or, you’d need to break your table up so the first clustering key is the hour of the day, and then you could do this:

 

CREATE TABLE mytable (

    sensorname text,

    hour int,

    reading_time timestamp,

    data MAP<text, int>,

    PRIMARY KEY ((sensorname, hour), reading_time)

);          

 

The obvious downside here is that if you need to query for more than one hour of a given day, you’re back to multiple queries.

 

 

From: Peter Figliozzi <pe...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Friday, August 26, 2016 at 10:02 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: Need help with simple schema for time-series

 

I don't believe that would let me query a time of day range, over a date range, would it?  For example, between 8am and 9am, August 1st through August 10th.

 

On Fri, Aug 26, 2016 at 11:52 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

Use a timestamp instead of 2 separate fields and you can query on the range. 

 

CREATE TABLE mytable (

    sensorname text,

    reading_time timestamp,

    data MAP<text, int>,

    PRIMARY KEY (sensorname, reading_time)

);        

 

 

 

On Fri, Aug 26, 2016 at 8:17 PM Peter Figliozzi <pe...@gmail.com> wrote:

I have data from many sensors as time-series:
Sensor name
Date
Time
value
I want to query windows of both date and time.  For example, 8am - 9am from Aug. 1st to Aug 10th.

Here's what I did:

CREATE TABLE mykeyspace.mytable (
    sensorname text,
    date date,
    time time,
    data MAP<text, int>,
    PRIMARY KEY (sensorname, date, time)
);

 

However, when we query this, Cassandra restricts us to an "equal" relation for the date, if we are to select a window of time.  So with that schema, I'd have to query once for each date.

 

What's the right way to do this??  ("Right" defined as extracting a window of date and of time in one query.)

 

Thank you,

 

Pete

 


Re: Need help with simple schema for time-series

Posted by Peter Figliozzi <pe...@gmail.com>.
I don't believe that would let me query a time of day range, over a date
range, would it?  For example, between 8am and 9am, August 1st through
August 10th.

On Fri, Aug 26, 2016 at 11:52 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Use a timestamp instead of 2 separate fields and you can query on the
> range.
>
> CREATE TABLE mytable (
>     sensorname text,
>     reading_time timestamp,
>     data MAP<text, int>,
>     PRIMARY KEY (sensorname, reading_time)
> );
>
>
>
> On Fri, Aug 26, 2016 at 8:17 PM Peter Figliozzi <pe...@gmail.com>
> wrote:
>
>> I have data from many sensors as time-series:
>>
>>    - Sensor name
>>    - Date
>>    - Time
>>    - value
>>
>> I want to query windows of both date and time.  For example, 8am - 9am
>> from Aug. 1st to Aug 10th.
>>
>> Here's what I did:
>>
>> CREATE TABLE mykeyspace.mytable (
>>     sensorname text,
>>     date date,
>>     time time,
>>     data MAP<text, int>,
>>     PRIMARY KEY (sensorname, date, time)
>> );
>>
>>
>> However, when we query this, Cassandra restricts us to an "equal"
>> relation for the date, if we are to select a window of time.  So with that
>> schema, I'd have to query once for each date.
>>
>>
>> What's the right way to do this??  ("Right" defined as extracting a
>> window of date and of time in one query.)
>>
>>
>> Thank you,
>>
>>
>> Pete
>>
>

Re: Need help with simple schema for time-series

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Use a timestamp instead of 2 separate fields and you can query on the range.

CREATE TABLE mytable (
    sensorname text,
    reading_time timestamp,
    data MAP<text, int>,
    PRIMARY KEY (sensorname, reading_time)
);



On Fri, Aug 26, 2016 at 8:17 PM Peter Figliozzi <pe...@gmail.com>
wrote:

> I have data from many sensors as time-series:
>
>    - Sensor name
>    - Date
>    - Time
>    - value
>
> I want to query windows of both date and time.  For example, 8am - 9am
> from Aug. 1st to Aug 10th.
>
> Here's what I did:
>
> CREATE TABLE mykeyspace.mytable (
>     sensorname text,
>     date date,
>     time time,
>     data MAP<text, int>,
>     PRIMARY KEY (sensorname, date, time)
> );
>
>
> However, when we query this, Cassandra restricts us to an "equal"
> relation for the date, if we are to select a window of time.  So with that
> schema, I'd have to query once for each date.
>
>
> What's the right way to do this??  ("Right" defined as extracting a window
> of date and of time in one query.)
>
>
> Thank you,
>
>
> Pete
>

Re: Need help with simple schema for time-series

Posted by Noorul Islam K M <no...@noorul.com>.
http://kairosdb.github.io/

Regards,
Noorul

Peter Figliozzi <pe...@gmail.com> writes:

> I have data from many sensors as time-series:
>
>    - Sensor name
>    - Date
>    - Time
>    - value
>
> I want to query windows of both date and time.  For example, 8am - 9am from
> Aug. 1st to Aug 10th.
>
> Here's what I did:
>
> CREATE TABLE mykeyspace.mytable (
>     sensorname text,
>     date date,
>     time time,
>     data MAP<text, int>,
>     PRIMARY KEY (sensorname, date, time)
> );
>
>
> However, when we query this, Cassandra restricts us to an "equal" relation
> for the date, if we are to select a window of time.  So with that schema,
> I'd have to query once for each date.
>
>
> What's the right way to do this??  ("Right" defined as extracting a window
> of date and of time in one query.)
>
>
> Thank you,
>
>
> Pete