Posted to user@ignite.apache.org by Welly Tambunan <if...@gmail.com> on 2015/12/03 10:50:27 UTC

Storing Time Series data efficiently on Ignite

Hi Igniters,

We are currently trying to assess the possibility of using Ignite in our
architecture.

We have a case where we want to store time series data in memory. We will
have a lot of sensor data, so I think I will use the sensor id as a key to
retrieve the time series from the cache.

However, I can't find any sorted list structure in Ignite to store our time
series. The index can be a long (for time), so the series will need to be
sorted by index.

We also have a query for getting a range of the index, e.g. give me all series
from start idx to end idx.
For updates, we also need to be able to update a range with a new data
series.

We just don't want to re-upload the data to the cache over and over again
every time there's an update on the range. We want to be able to
update the cache partially, based on the range.

Is there any way we can achieve this on Ignite?

Any suggestion or reference would be really appreciated.

Cheers

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>

Re: Storing Time Series data efficiently on Ignite

Posted by bintisepaha <bi...@tudor.com>.
Welly, would you mind sharing how this worked out for you? What was
the time-series size, and how was the performance?

Thanks,
Binti



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Storing Time Series data efficiently on Ignite

Posted by bintisepaha <bi...@tudor.com>.
Welly,

Hi, wondering how this turned out for you? We have a similar use case now.
Did you end up using Ignite for this?

Thanks,
Binti




Re: Storing Time Series data efficiently on Ignite

Posted by Welly Tambunan <if...@gmail.com>.
Hi Denis,

Thanks a lot. This is really useful. I will experiment with this approach.

Cheers

On Thu, Dec 3, 2015 at 7:59 PM, Denis Magda <dm...@gridgain.com> wrote:

> Welly,
>
> Please see below
>

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>

Re: Storing Time Series data efficiently on Ignite

Posted by Denis Magda <dm...@gridgain.com>.
Welly,

Please see below

On 12/3/2015 3:33 PM, Welly Tambunan wrote:
> Hi Denis,
>
> Thanks for your clear explanation.
>
> Our data structure is something like this one <sensorid: UUID, time: 
> Long, value: Double>
>
> When I put a data point with that composite key, <sensorid, time>, is
> there any guarantee that it will be stored close together on the same node?
>
Yes, to force such behavior you should use so-called affinity collocation.
https://apacheignite.readme.io/docs/affinity-collocation

In your case you can mark sensorId as an affinity key, and this will
ensure that all of its data (and keys) are stored in the same partition
that sensorId is mapped to.

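import org.apache.ignite.cache.affinity.AffinityKeyMapped;

class SampleKey {
    // All keys with the same sensorId are mapped to the same partition.
    @AffinityKeyMapped
    int sensorId;

    long sampleTime;
}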
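
To make the effect of the affinity key concrete, here is a minimal sketch in
plain Java. It is not Ignite's actual affinity function: it assumes a
simplified model in which the partition is derived only from the hash of the
affinity field (sensorId). The class name AffinitySketch, the partitionFor
helper, and the partition count of 1024 are illustrative assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of affinity collocation; NOT Ignite's real affinity function.
final class AffinitySketch {
    // Hypothetical partition count, chosen only for illustration.
    static final int PARTITIONS = 1024;

    // Stand-in for the key class from the message above.
    static final class SampleKey {
        final int sensorId;    // plays the role of the @AffinityKeyMapped field
        final long sampleTime;

        SampleKey(int sensorId, long sampleTime) {
            this.sensorId = sensorId;
            this.sampleTime = sampleTime;
        }
    }

    // Only the affinity field takes part in choosing the partition,
    // so sampleTime has no influence on where an entry lives.
    static int partitionFor(SampleKey key) {
        return Math.floorMod(Integer.hashCode(key.sensorId), PARTITIONS);
    }

    public static void main(String[] args) {
        Set<Integer> partitions = new HashSet<>();
        for (long t = 0; t < 1_000; t++) {
            partitions.add(partitionFor(new SampleKey(42, t)));
        }
        // Every sample of sensor 42 maps to a single partition in this model.
        System.out.println("distinct partitions for sensor 42: " + partitions.size());
    }
}
```

In this model a range query for one sensor only ever touches one partition,
which is the property the affinity key is meant to provide.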

> In case of an update, however, we need to be able to update a range, i.e.
> replace the range <startIdx, endIdx, List[DataPoint]> (where DataPoint
> => <time, value>),
> so that it will clear that range and insert the new data points into it.
If the result set of such a query is not large, then you can split the
update into two steps:
- use an SQL query to retrieve the keys of the entries that should be
updated or removed according to your 'WHERE' clause;
- use removeAll or putAll to delete or update the values for those keys.
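
The two steps above can be sketched as follows. This is a minimal
illustration in plain Java, assuming that a java.util.Map stands in for the
Ignite cache and that an in-memory filter stands in for the SQL query. The
names RangeReplaceSketch, keysInRange and replaceRange are hypothetical. With
a real IgniteCache, step 1 would be an SQL query returning sensorId and
sampleTime, and step 2 would call removeAll and putAll on the cache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class RangeReplaceSketch {
    static final class SampleKey {
        final int sensorId;
        final long sampleTime;

        SampleKey(int sensorId, long sampleTime) {
            this.sensorId = sensorId;
            this.sampleTime = sampleTime;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof SampleKey)) return false;
            SampleKey k = (SampleKey) o;
            return sensorId == k.sensorId && sampleTime == k.sampleTime;
        }

        @Override public int hashCode() {
            return Objects.hash(sensorId, sampleTime);
        }
    }

    // Step 1 (stand-in for the SQL query): collect the keys of one sensor
    // whose sampleTime lies in the inclusive range [start, end].
    static Set<SampleKey> keysInRange(Map<SampleKey, Double> cache,
                                      int sensorId, long start, long end) {
        Set<SampleKey> keys = new HashSet<>();
        for (SampleKey k : cache.keySet()) {
            if (k.sensorId == sensorId && k.sampleTime >= start && k.sampleTime <= end) {
                keys.add(k);
            }
        }
        return keys;
    }

    // Step 2: remove the old entries in bulk, then write the new points in bulk.
    // The caller is expected to pass new points that fall inside [start, end].
    static void replaceRange(Map<SampleKey, Double> cache, int sensorId,
                             long start, long end, Map<Long, Double> newPoints) {
        cache.keySet().removeAll(keysInRange(cache, sensorId, start, end));

        Map<SampleKey, Double> batch = new HashMap<>();
        for (Map.Entry<Long, Double> e : newPoints.entrySet()) {
            batch.put(new SampleKey(sensorId, e.getKey()), e.getValue());
        }
        cache.putAll(batch);
    }

    public static void main(String[] args) {
        Map<SampleKey, Double> cache = new HashMap<>();
        for (long t = 0; t < 10; t++) {
            cache.put(new SampleKey(1, t), 1.0);
        }
        cache.put(new SampleKey(2, 5L), 2.0);

        Map<Long, Double> fresh = new HashMap<>();
        fresh.put(4L, 9.5);
        replaceRange(cache, 1, 3L, 6L, fresh);

        // 11 entries - 4 removed (t = 3..6 of sensor 1) + 1 inserted = 8
        System.out.println(cache.size());
    }
}
```

Only the keys travel back from the query step, so the data outside the range
is never re-uploaded; the cost grows with the size of the range being replaced.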

>
> So we need a two-step process to do that? Select the keys and then
> delete them?
>
If the question is about the range-like query above, then yes: in
cases where the SQL result set is not large, follow this approach.
Otherwise, it's always possible to come up with another solution.

Regards,
Denis

Re: Storing Time Series data efficiently on Ignite

Posted by Welly Tambunan <if...@gmail.com>.
Hi Denis,

Thanks for your clear explanation.

Our data structure is something like this one <sensorid: UUID, time: Long,
value: Double>

When I put a data point with that composite key, <sensorid, time>, is there
any guarantee that it will be stored close together on the same node?

In case of an update, however, we need to be able to update a range, i.e.
replace the range <startIdx, endIdx, List[DataPoint]> (where DataPoint =>
<time, value>),
so that it will clear that range and insert the new data points into it.

So we need a two-step process to do that? Select the keys and then
delete them?


Thanks





On Thu, Dec 3, 2015 at 7:00 PM, Denis Magda <dm...@gridgain.com> wrote:

> Hi Welly,
>
> Ignite perfectly fits for your task.
>

-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>

Re: Storing Time Series data efficiently on Ignite

Posted by Denis Magda <dm...@gridgain.com>.
Hi Welly,

Ignite is a perfect fit for your task.

First, if I understand you properly, there will be many time series
for a given sensor ID.
If so, I would use a compound key like (sensorId, time) for all
cache-related operations.

As an example, you may want to use classes like these.

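class SampleKey {
    int sensorId;
    long sampleTime;

    SampleKey(int sensorId, long sampleTime) {
        this.sensorId = sensorId;
        this.sampleTime = sampleTime;
    }
}

class Sample {
    int sensorId;
    long sampleTime;
    byte[] data1;
    byte[] data2;
    // etc.
}

And use them this way:

cache.put(new SampleKey(1, time), sample);
cache.get(new SampleKey(2, time));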
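
For cache.get(new SampleKey(...)) to find an entry that was stored under a
separately constructed key, the key generally needs value-based equality. The
sketch below shows that in plain Java, with a HashMap standing in for the
cache. That substitution and the class name KeySketch are assumptions made
for illustration; how a particular Ignite version and marshaller compare keys
may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class KeySketch {
    // Compound key: two keys are equal when both fields are equal.
    static final class SampleKey {
        final int sensorId;
        final long sampleTime;

        SampleKey(int sensorId, long sampleTime) {
            this.sensorId = sensorId;
            this.sampleTime = sampleTime;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof SampleKey)) return false;
            SampleKey k = (SampleKey) o;
            return sensorId == k.sensorId && sampleTime == k.sampleTime;
        }

        @Override public int hashCode() {
            return Objects.hash(sensorId, sampleTime);
        }
    }

    public static void main(String[] args) {
        // A HashMap stands in for the cache in this sketch.
        Map<SampleKey, String> cache = new HashMap<>();
        long time = 1_449_136_227_000L; // example timestamp in milliseconds

        cache.put(new SampleKey(1, time), "sample-1");

        System.out.println(cache.get(new SampleKey(1, time))); // prints "sample-1"
        System.out.println(cache.get(new SampleKey(2, time))); // prints "null"
    }
}
```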

Second, to retrieve samples by sensor ID, time (time range) or
other parameters, you can leverage the Ignite SQL engine, which is designed
exactly for use cases like yours. [1]
However, if you're going to use an object field in a 'SELECT' or 'WHERE'
clause, you have to annotate it properly or specify it using
CacheTypeMetadata. [2]

Third, when you need to update a data series, you can remove the old one
and insert the new one, which should have the new time.
To perform the remove, I would suggest doing the following:
- select the sensor ID and sampleTime of all the entries to delete;
- call cache.removeAll, passing SampleKeys that are created from the
data retrieved with the SQL query above.

Moreover, you can use an eviction or expiry policy for cases
when old data must be removed from the cache automatically.
Just refer to these articles for more info - [3], [4]

Finally, Ignite has a bunch of cache- and SQL-related examples. They are
located in the "datagrid" folder of the "examples" module.
Have a look at them; you'll probably come up with a better Ignite-based
solution than the one I suggested above, because you definitely know all
the details of your case better ;)

[1] https://apacheignite.readme.io/docs/sql-queries
[2] https://apacheignite.readme.io/docs/sql-queries#configuring-sql-indexes-by-annotations
[3] https://apacheignite.readme.io/docs/evictions
[4] https://apacheignite.readme.io/docs/expiry-policies


Regards,
Denis
