You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Pål Andreassen <pa...@bouvet.no> on 2015/08/31 15:48:12 UTC

Cassandra 2.2 for time series

Hi

I'm currently evaluating Cassandra as a potiantial database for storing time series data from lots of devices (IoT type of scenario).
Currently we have a few thousand devices with X channels (measurements) that they report at different intervals (from 5 minutes and up).

I've created as simple test table to store the data:

CREATE TABLE DataRaw(
  channelId int,
  sampleTime timestamp,
  value double,
  PRIMARY KEY (channelId, sampleTime)
) WITH CLUSTERING ORDER BY (sampleTime ASC);

This schema seems to work ok, but I have queries that I need to support that I cannot easily figure out how to perform (except getting all the data out and iterate it myself).

Query 1: For max and min queries, I not only want the maximum/minimum value, but also the corresponding timestamp.


sampleTime          value

2015-08-28 00:00    10

2015-08-28 01:00    15

2015-08-28 02:00    13

I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT sampleTime, max(value) FROM DataRAW return the max value, but the first sampleTime.
Also I wonder if Cassandra has built-in support for interpolation/extrapolation. Some sort of group by hour/day/week/month and even year function.

Query 2: Give me hourly averages for channel X for yesterday. I'd expect to get 24 values each of which is the hourly average. Or give my daily averages for last year for a given channel. Should return 365 daily averages.

Best regards

Pål Andreassen
54°23'58"S 3°18'53"E
Konsulent
Mobil +47 982 85 504
pal.andreassen@bouvet.no<ma...@bouvet.no>

Bouvet Norge AS
Avdeling Grenland
Uniongata 18, Klosterøya
N-3732 Skien
Tlf +47 23 40 60 00
bouvet.no<http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>


Re: Cassandra 2.2 for time series

Posted by Kévin LOVATO <kl...@alprema.com>.
This solution is very "sql-like", meaning that you query what you want when
you need it.
Unfortunately this will probably not scale as your data grows, you might
want to consider de-normalizing your data. You could maintain a min/max
average in the application that inserts the data, or have a batch that runs
periodically to precompute the data.

On Thu, Sep 3, 2015 at 6:59 AM, Kevin Burton <bu...@spinn3r.com> wrote:

> Check out kairosd for a time series db on Cassandra.
> On Aug 31, 2015 7:12 AM, "Peter Lin" <wo...@gmail.com> wrote:
>
>>
>> I didn't realize they had added max and min as stock functions.
>>
>> to get the sample time. you'll probably need to write a custom function.
>> google for it and you'll find people that have done it.
>>
>> On Mon, Aug 31, 2015 at 10:09 AM, Pål Andreassen <
>> pal.andreassen@bouvet.no> wrote:
>>
>>> Cassandra 2.2 has min and max built-in. My problem is getting the
>>> corresponding sample time as well.
>>>
>>>
>>>
>>> *Pål Andreassen*
>>>
>>> *54°23'58"S 3°18'53"E*
>>>
>>> *Konsulent*
>>>
>>> Mobil +47 982 85 504
>>>
>>> pal.andreassen@bouvet.no
>>>
>>>
>>>
>>>
>>> *Bouvet Norge AS Avdeling Grenland*
>>>
>>> Uniongata 18, Klosterøya
>>>
>>> N-3732 Skien
>>>
>>> Tlf +47 23 40 60 00
>>>
>>> *bouvet.no*
>>> <http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>
>>>
>>>
>>>
>>> *From:* Peter Lin [mailto:woolfel@gmail.com]
>>> *Sent:* mandag 31. august 2015 16.09
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Re: Cassandra 2.2 for time series
>>>
>>>
>>>
>>>
>>>
>>> Unlike SQL, CQL doesn't have built-in functions like max/min
>>>
>>> In the past, people would create summary tables to keep rolling stats
>>> for reports/analytics. In cql3, there's user defined functions, so you can
>>> write a function to do max/min
>>>
>>> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#selectStmt
>>> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs
>>>
>>>
>>>
>>> On Mon, Aug 31, 2015 at 9:48 AM, Pål Andreassen <
>>> pal.andreassen@bouvet.no> wrote:
>>>
>>> Hi
>>>
>>>
>>>
>>> I’m currently evaluating Cassandra as a potiantial database for storing
>>> time series data from lots of devices (IoT type of scenario).
>>>
>>> Currently we have a few thousand devices with X channels (measurements)
>>> that they report at different intervals (from 5 minutes and up).
>>>
>>>
>>>
>>> I’ve created as simple test table to store the data:
>>>
>>>
>>>
>>> CREATE TABLE DataRaw(
>>>
>>>   channelId int,
>>>
>>>   sampleTime timestamp,
>>>
>>>   value double,
>>>
>>>   PRIMARY KEY (channelId, sampleTime)
>>>
>>> ) WITH CLUSTERING ORDER BY (sampleTime ASC);
>>>
>>>
>>>
>>> This schema seems to work ok, but I have queries that I need to support
>>> that I cannot easily figure out how to perform (except getting all the data
>>> out and iterate it myself).
>>>
>>>
>>>
>>> Query 1: For max and min queries, I not only want the maximum/minimum
>>> value, but also the corresponding timestamp.
>>>
>>>
>>>
>>> sampleTime          value
>>>
>>> 2015-08-28 00:00    10
>>>
>>> 2015-08-28 01:00    15
>>>
>>> 2015-08-28 02:00    13
>>>
>>>
>>> I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT
>>> sampleTime, max(value) FROM DataRAW return the max value, but the first
>>> sampleTime.
>>>
>>> Also I wonder if Cassandra has built-in support for
>>> interpolation/extrapolation. Some sort of group by hour/day/week/month and
>>> even year function.
>>>
>>>
>>>
>>> Query 2: Give me hourly averages for channel X for yesterday. I’d expect
>>> to get 24 values each of which is the hourly average. Or give my daily
>>> averages for last year for a given channel. Should return 365 daily
>>> averages.
>>>
>>>
>>>
>>> Best regards
>>>
>>>
>>>
>>> *Pål Andreassen*
>>>
>>> *54°23'58"S 3°18'53"E*
>>>
>>> *Konsulent*
>>>
>>> Mobil +47 982 85 504
>>>
>>> pal.andreassen@bouvet.no
>>>
>>>
>>>
>>>
>>> *Bouvet Norge AS Avdeling Grenland*
>>>
>>> Uniongata 18, Klosterøya
>>>
>>> N-3732 Skien
>>>
>>> Tlf +47 23 40 60 00
>>>
>>> *bouvet.no*
>>> <http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>
>>>
>>>
>>>
>>>
>>>
>>
>>

Re: Cassandra 2.2 for time series

Posted by Kevin Burton <bu...@spinn3r.com>.
Check out kairosd for a time series db on Cassandra.
On Aug 31, 2015 7:12 AM, "Peter Lin" <wo...@gmail.com> wrote:

>
> I didn't realize they had added max and min as stock functions.
>
> to get the sample time. you'll probably need to write a custom function.
> google for it and you'll find people that have done it.
>
> On Mon, Aug 31, 2015 at 10:09 AM, Pål Andreassen <pal.andreassen@bouvet.no
> > wrote:
>
>> Cassandra 2.2 has min and max built-in. My problem is getting the
>> corresponding sample time as well.
>>
>>
>>
>> *Pål Andreassen*
>>
>> *54°23'58"S 3°18'53"E*
>>
>> *Konsulent*
>>
>> Mobil +47 982 85 504
>>
>> pal.andreassen@bouvet.no
>>
>>
>>
>>
>> *Bouvet Norge AS Avdeling Grenland*
>>
>> Uniongata 18, Klosterøya
>>
>> N-3732 Skien
>>
>> Tlf +47 23 40 60 00
>>
>> *bouvet.no*
>> <http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>
>>
>>
>>
>> *From:* Peter Lin [mailto:woolfel@gmail.com]
>> *Sent:* mandag 31. august 2015 16.09
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra 2.2 for time series
>>
>>
>>
>>
>>
>> Unlike SQL, CQL doesn't have built-in functions like max/min
>>
>> In the past, people would create summary tables to keep rolling stats for
>> reports/analytics. In cql3, there's user defined functions, so you can
>> write a function to do max/min
>>
>> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#selectStmt
>> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs
>>
>>
>>
>> On Mon, Aug 31, 2015 at 9:48 AM, Pål Andreassen <pa...@bouvet.no>
>> wrote:
>>
>> Hi
>>
>>
>>
>> I’m currently evaluating Cassandra as a potiantial database for storing
>> time series data from lots of devices (IoT type of scenario).
>>
>> Currently we have a few thousand devices with X channels (measurements)
>> that they report at different intervals (from 5 minutes and up).
>>
>>
>>
>> I’ve created as simple test table to store the data:
>>
>>
>>
>> CREATE TABLE DataRaw(
>>
>>   channelId int,
>>
>>   sampleTime timestamp,
>>
>>   value double,
>>
>>   PRIMARY KEY (channelId, sampleTime)
>>
>> ) WITH CLUSTERING ORDER BY (sampleTime ASC);
>>
>>
>>
>> This schema seems to work ok, but I have queries that I need to support
>> that I cannot easily figure out how to perform (except getting all the data
>> out and iterate it myself).
>>
>>
>>
>> Query 1: For max and min queries, I not only want the maximum/minimum
>> value, but also the corresponding timestamp.
>>
>>
>>
>> sampleTime          value
>>
>> 2015-08-28 00:00    10
>>
>> 2015-08-28 01:00    15
>>
>> 2015-08-28 02:00    13
>>
>>
>> I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT
>> sampleTime, max(value) FROM DataRAW return the max value, but the first
>> sampleTime.
>>
>> Also I wonder if Cassandra has built-in support for
>> interpolation/extrapolation. Some sort of group by hour/day/week/month and
>> even year function.
>>
>>
>>
>> Query 2: Give me hourly averages for channel X for yesterday. I’d expect
>> to get 24 values each of which is the hourly average. Or give my daily
>> averages for last year for a given channel. Should return 365 daily
>> averages.
>>
>>
>>
>> Best regards
>>
>>
>>
>> *Pål Andreassen*
>>
>> *54°23'58"S 3°18'53"E*
>>
>> *Konsulent*
>>
>> Mobil +47 982 85 504
>>
>> pal.andreassen@bouvet.no
>>
>>
>>
>>
>> *Bouvet Norge AS Avdeling Grenland*
>>
>> Uniongata 18, Klosterøya
>>
>> N-3732 Skien
>>
>> Tlf +47 23 40 60 00
>>
>> *bouvet.no*
>> <http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>
>>
>>
>>
>>
>>
>
>

Re: Cassandra 2.2 for time series

Posted by Peter Lin <wo...@gmail.com>.
I didn't realize they had added max and min as stock functions.

to get the sample time. you'll probably need to write a custom function.
google for it and you'll find people that have done it.

On Mon, Aug 31, 2015 at 10:09 AM, Pål Andreassen <pa...@bouvet.no>
wrote:

> Cassandra 2.2 has min and max built-in. My problem is getting the
> corresponding sample time as well.
>
>
>
> *Pål Andreassen*
>
> *54°23'58"S 3°18'53"E*
>
> *Konsulent*
>
> Mobil +47 982 85 504
>
> pal.andreassen@bouvet.no
>
>
>
>
> *Bouvet Norge AS Avdeling Grenland*
>
> Uniongata 18, Klosterøya
>
> N-3732 Skien
>
> Tlf +47 23 40 60 00
>
> *bouvet.no*
> <http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>
>
>
>
> *From:* Peter Lin [mailto:woolfel@gmail.com]
> *Sent:* mandag 31. august 2015 16.09
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra 2.2 for time series
>
>
>
>
>
> Unlike SQL, CQL doesn't have built-in functions like max/min
>
> In the past, people would create summary tables to keep rolling stats for
> reports/analytics. In cql3, there's user defined functions, so you can
> write a function to do max/min
>
> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#selectStmt
> http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs
>
>
>
> On Mon, Aug 31, 2015 at 9:48 AM, Pål Andreassen <pa...@bouvet.no>
> wrote:
>
> Hi
>
>
>
> I’m currently evaluating Cassandra as a potiantial database for storing
> time series data from lots of devices (IoT type of scenario).
>
> Currently we have a few thousand devices with X channels (measurements)
> that they report at different intervals (from 5 minutes and up).
>
>
>
> I’ve created as simple test table to store the data:
>
>
>
> CREATE TABLE DataRaw(
>
>   channelId int,
>
>   sampleTime timestamp,
>
>   value double,
>
>   PRIMARY KEY (channelId, sampleTime)
>
> ) WITH CLUSTERING ORDER BY (sampleTime ASC);
>
>
>
> This schema seems to work ok, but I have queries that I need to support
> that I cannot easily figure out how to perform (except getting all the data
> out and iterate it myself).
>
>
>
> Query 1: For max and min queries, I not only want the maximum/minimum
> value, but also the corresponding timestamp.
>
>
>
> sampleTime          value
>
> 2015-08-28 00:00    10
>
> 2015-08-28 01:00    15
>
> 2015-08-28 02:00    13
>
>
> I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT
> sampleTime, max(value) FROM DataRAW return the max value, but the first
> sampleTime.
>
> Also I wonder if Cassandra has built-in support for
> interpolation/extrapolation. Some sort of group by hour/day/week/month and
> even year function.
>
>
>
> Query 2: Give me hourly averages for channel X for yesterday. I’d expect
> to get 24 values each of which is the hourly average. Or give my daily
> averages for last year for a given channel. Should return 365 daily
> averages.
>
>
>
> Best regards
>
>
>
> *Pål Andreassen*
>
> *54°23'58"S 3°18'53"E*
>
> *Konsulent*
>
> Mobil +47 982 85 504
>
> pal.andreassen@bouvet.no
>
>
>
>
> *Bouvet Norge AS Avdeling Grenland*
>
> Uniongata 18, Klosterøya
>
> N-3732 Skien
>
> Tlf +47 23 40 60 00
>
> *bouvet.no*
> <http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>
>
>
>
>
>

RE: Cassandra 2.2 for time series

Posted by Pål Andreassen <pa...@bouvet.no>.
Cassandra 2.2 has min and max built-in. My problem is getting the corresponding sample time as well.

Pål Andreassen
54°23'58"S 3°18'53"E
Konsulent
Mobil +47 982 85 504
pal.andreassen@bouvet.no<ma...@bouvet.no>

Bouvet Norge AS
Avdeling Grenland
Uniongata 18, Klosterøya
N-3732 Skien
Tlf +47 23 40 60 00
bouvet.no<http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>

From: Peter Lin [mailto:woolfel@gmail.com]
Sent: mandag 31. august 2015 16.09
To: user@cassandra.apache.org
Subject: Re: Cassandra 2.2 for time series


Unlike SQL, CQL doesn't have built-in functions like max/min
In the past, people would create summary tables to keep rolling stats for reports/analytics. In cql3, there's user defined functions, so you can write a function to do max/min

http://cassandra.apache.org/doc/cql3/CQL-2.2.html#selectStmt
http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs

On Mon, Aug 31, 2015 at 9:48 AM, Pål Andreassen <pa...@bouvet.no>> wrote:
Hi

I’m currently evaluating Cassandra as a potiantial database for storing time series data from lots of devices (IoT type of scenario).
Currently we have a few thousand devices with X channels (measurements) that they report at different intervals (from 5 minutes and up).

I’ve created as simple test table to store the data:

CREATE TABLE DataRaw(
  channelId int,
  sampleTime timestamp,
  value double,
  PRIMARY KEY (channelId, sampleTime)
) WITH CLUSTERING ORDER BY (sampleTime ASC);

This schema seems to work ok, but I have queries that I need to support that I cannot easily figure out how to perform (except getting all the data out and iterate it myself).

Query 1: For max and min queries, I not only want the maximum/minimum value, but also the corresponding timestamp.


sampleTime          value

2015-08-28 00:00    10

2015-08-28 01:00    15

2015-08-28 02:00    13

I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT sampleTime, max(value) FROM DataRAW return the max value, but the first sampleTime.
Also I wonder if Cassandra has built-in support for interpolation/extrapolation. Some sort of group by hour/day/week/month and even year function.

Query 2: Give me hourly averages for channel X for yesterday. I’d expect to get 24 values each of which is the hourly average. Or give my daily averages for last year for a given channel. Should return 365 daily averages.

Best regards

Pål Andreassen
54°23'58"S 3°18'53"E
Konsulent
Mobil +47 982 85 504<tel:%2B47%C2%A0982%2085%20504>
pal.andreassen@bouvet.no<ma...@bouvet.no>

Bouvet Norge AS
Avdeling Grenland
Uniongata 18, Klosterøya
N-3732 Skien
Tlf +47 23 40 60 00<tel:%2B47%2023%2040%2060%2000>
bouvet.no<http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>



Re: Cassandra 2.2 for time series

Posted by Peter Lin <wo...@gmail.com>.
Unlike SQL, CQL doesn't have built-in functions like max/min

In the past, people would create summary tables to keep rolling stats for
reports/analytics. In cql3, there's user defined functions, so you can
write a function to do max/min

http://cassandra.apache.org/doc/cql3/CQL-2.2.html#selectStmt
http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs

On Mon, Aug 31, 2015 at 9:48 AM, Pål Andreassen <pa...@bouvet.no>
wrote:

> Hi
>
>
>
> I’m currently evaluating Cassandra as a potiantial database for storing
> time series data from lots of devices (IoT type of scenario).
>
> Currently we have a few thousand devices with X channels (measurements)
> that they report at different intervals (from 5 minutes and up).
>
>
>
> I’ve created as simple test table to store the data:
>
>
>
> CREATE TABLE DataRaw(
>
>   channelId int,
>
>   sampleTime timestamp,
>
>   value double,
>
>   PRIMARY KEY (channelId, sampleTime)
>
> ) WITH CLUSTERING ORDER BY (sampleTime ASC);
>
>
>
> This schema seems to work ok, but I have queries that I need to support
> that I cannot easily figure out how to perform (except getting all the data
> out and iterate it myself).
>
>
>
> Query 1: For max and min queries, I not only want the maximum/minimum
> value, but also the corresponding timestamp.
>
>
>
> sampleTime          value
>
> 2015-08-28 00:00    10
>
> 2015-08-28 01:00    15
>
> 2015-08-28 02:00    13
>
>
> I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT
> sampleTime, max(value) FROM DataRAW return the max value, but the first
> sampleTime.
>
> Also I wonder if Cassandra has built-in support for
> interpolation/extrapolation. Some sort of group by hour/day/week/month and
> even year function.
>
>
>
> Query 2: Give me hourly averages for channel X for yesterday. I’d expect
> to get 24 values each of which is the hourly average. Or give my daily
> averages for last year for a given channel. Should return 365 daily
> averages.
>
>
>
> Best regards
>
>
>
> *Pål Andreassen*
>
> *54°23'58"S 3°18'53"E*
>
> *Konsulent*
>
> Mobil +47 982 85 504
>
> pal.andreassen@bouvet.no
>
>
>
>
> *Bouvet Norge AS Avdeling Grenland*
>
> Uniongata 18, Klosterøya
>
> N-3732 Skien
>
> Tlf +47 23 40 60 00
>
> *bouvet.no*
> <http://www.bouvet.no/?utm_campaign=e-post&utm_source=ansatt&utm_medium=email>
>
>
>