Posted to user@geode.apache.org by Olivier Mallassi <ol...@gmail.com> on 2016/05/04 09:34:57 UTC

Geode and data snapshotting

Hi everybody

I am facing an issue and do not know what would be the right pattern. I
guess you can help.

The need is to create snapshots of data:
- let's say you have a stream of incoming objects that you want to store in
a region; let's say *MyRegion*. Clients are listening (via CQ) to updates
on *MyRegion*.
- at a fixed period (e.g. every 3 seconds or every hour, depending on the
case) you want to snapshot this data (while continuing to update *MyRegion*
with incoming objects). Let's say the snapshot regions follow the naming
convention *MyRegion/snapshot-id1*, *MyRegion/snapshot-id2*... I am
currently thinking about keeping a fixed number of snapshots and rolling
over them.

I see several options to implement this.
- *option#1*: at a fixed period, I execute a function to copy data from
*MyRegion* to *MyRegion/snapshot-id1*. I am not sure this works well with a
large amount of data, nor how to properly handle new objects arriving in
*MyRegion* while I am snapshotting it.

- *option#2*: I write each object twice: once in *MyRegion* and also in
*MyRegion/snapshot-idN*, assuming *snapshot-idN* is the latest snapshot.
Switching to a new snapshot is then just a matter of writing the objects to
*MyRegion* and *MyRegion/snapshot-idN+1*.

Regarding option#2 (which is my preferred one, but I may be wrong), I see
two implementations:
- *implem#1*: use a custom function that writes the object twice (the
regions can be collocated, etc.). I can use a local transaction within the
function in order to guarantee consistency between both regions (a sketch
follows below).
- *implem#2*: use an AsyncEventListener. If listeners are declared on
multiple nodes, I assume there is no risk of losing data in case of
failure (e.g. a node crashing before all the "objects" queued in the
AsyncEventListener are processed)?

Implem#1 looks easier to me (and I do not think it costs much more in
terms of performance than an HA AsyncEventListener).
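
For illustration, here is a rough sketch of what I have in mind for
implem#1 - a dual write inside a Geode local transaction. The class and
method names are just illustrative, and the package names assume the
org.apache.geode namespace (older releases used com.gemstone.gemfire):

    import org.apache.geode.cache.Cache;
    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.CacheTransactionManager;
    import org.apache.geode.cache.Region;

    public class DualWrite {
        // Writes the same entry to the live region and the current snapshot
        // region atomically; both regions should be collocated so the
        // transaction stays on a single node.
        public static void put(String snapshotRegionPath, Object key, Object value) {
            Cache cache = CacheFactory.getAnyInstance();
            Region<Object, Object> live = cache.getRegion("MyRegion");
            Region<Object, Object> snapshot = cache.getRegion(snapshotRegionPath);
            CacheTransactionManager txMgr = cache.getCacheTransactionManager();
            boolean committed = false;
            txMgr.begin();
            try {
                live.put(key, value);
                snapshot.put(key, value);
                txMgr.commit();
                committed = true;
            } finally {
                if (!committed && txMgr.exists()) {
                    txMgr.rollback();
                }
            }
        }
    }

Such a helper would be called once per incoming object from the function's
execute() body.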

What would be your opinions? favorite options? alternative options?

I hope my email is clear enough. Many thanks for your help.

olivier.

Re: Geode and data snapshotting

Posted by Olivier Mallassi <ol...@gmail.com>.
Aggregate size should be around 500 bytes max.
(Will try a first implementation.)


Thx


Re: Geode and data snapshotting

Posted by Michael Stolz <ms...@pivotal.io>.
For the CQ to deliver only the latest value, you would need a separately
keyed "Latest" entity that clients can register interest on.

How big are each of the aggregates? If they are large you will not get much
benefit from my array model.

The array model is ideal for fixed numbers of doubles or integers, like
availability counts and rates in hotel systems or 5-minute prices on
financial instruments.
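
As a rough sketch, assuming a String-keyed region and a key scheme of my
own invention (not a prescribed API):

    import org.apache.geode.cache.Region;

    public class LatestWriter {
        // Each update is written under a versioned key for history and
        // mirrored under a stable "latest" key that CQs can match on.
        public static void write(Region<String, Object> region,
                                 String aggregateKey, long txId, Object aggregate) {
            region.put(aggregateKey + "/" + txId, aggregate); // history entry
            region.put(aggregateKey + "/latest", aggregate);  // stable key for CQs
        }
    }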

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771


Re: Geode and data snapshotting

Posted by Olivier Mallassi <ol...@gmail.com>.
Hi all

thank you for your answers.

Mike, this is not for market data (one day...) but is more related to our
Geode / Storm integration (as you know).

At one point, I need to snapshot my aggregates: every xx minutes, a
specific event is emitted. This event specifies a txId (long) and, in the
end, every txId maps to a snapshot (i.e. a well-known version of the
aggregates).

I was thinking about using regions like MyRegion/txID1, MyRegion/txID2,
etc.

I like your pattern; it could work and be modeled like:

key: aggregateKey = a.b.c
value: aggregates[] where index 0 is the latest txId, index 1 the previous
txId, and so on

The thing with this model (and this is maybe not a real issue) is that,
since I use CQs, the client will be notified with the whole aggregates[]
and not only the latest object. (Unless I implement delta propagation?)

Maybe another option (in my case) would be to use the txId in the key:
key: aggregateKey = [a.b.c, txID1]
value: aggregate
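
That composite key could be a plain serializable class, something like
this rough sketch (equals/hashCode are required for region lookups and
partitioning):

    import java.io.Serializable;
    import java.util.Objects;

    public class AggregateSnapshotKey implements Serializable {
        private final String aggregateKey; // e.g. "a.b.c"
        private final long txId;

        public AggregateSnapshotKey(String aggregateKey, long txId) {
            this.aggregateKey = aggregateKey;
            this.txId = txId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AggregateSnapshotKey)) return false;
            AggregateSnapshotKey k = (AggregateSnapshotKey) o;
            return txId == k.txId && aggregateKey.equals(k.aggregateKey);
        }

        @Override
        public int hashCode() {
            return Objects.hash(aggregateKey, txId);
        }
    }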

if you have any ideas :) but in all cases, thank you.

oliv/


Re: Geode and data snapshotting

Posted by Michael Stolz <ms...@pivotal.io>.
Yes, the lists can be first-class objects with the same key as the
description object, possibly with some sort of date stamp appended,
depending on how many observations over how many days you want to keep.

Yes, I think this model can be used very well for any periodic time-series
data, and would therefore be a very useful pattern.
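
For example, a minimal sketch of that key convention (the separator and
date format are arbitrary choices):

    public class SeriesKeys {
        // Same base key as the description object, with a date stamp
        // appended; one observation array per day.
        public static String seriesKey(String descriptionKey, java.time.LocalDate day) {
            return descriptionKey + "|" + day; // e.g. "IBM|2016-05-04"
        }
    }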

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771


Re: Geode and data snapshotting

Posted by Real Wes Williams <Th...@outlook.com>.
>> How will the indexes work in this case ?

They won't work directly against the aggregate. One solution is to think of a CQRS design where you have your write model - as Mike described with the time series - and a view model that contains your indexes for fast lookup against the view. A poor man's approach would be to create your view model (and/or indexes) in a CacheListener on the MetaData-object, which is your aggregate.
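
A minimal sketch of that poor man's approach, assuming a view region named
"MetaDataView" and a hypothetical toView() projection (both are
illustrative, not a prescribed API):

    import org.apache.geode.cache.CacheFactory;
    import org.apache.geode.cache.EntryEvent;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.util.CacheListenerAdapter;

    public class ViewModelListener extends CacheListenerAdapter<String, Object> {
        @Override
        public void afterCreate(EntryEvent<String, Object> event) {
            project(event);
        }

        @Override
        public void afterUpdate(EntryEvent<String, Object> event) {
            project(event);
        }

        // Projects each aggregate update into a query-friendly view region.
        private void project(EntryEvent<String, Object> event) {
            Region<String, Object> view =
                CacheFactory.getAnyInstance().getRegion("MetaDataView");
            view.put(event.getKey(), toView(event.getNewValue()));
        }

        // Hypothetical projection from the aggregate to the indexed view shape.
        private Object toView(Object aggregate) {
            return aggregate;
        }
    }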


Re: Geode and data snapshotting

Posted by Alan Kash <cr...@gmail.com>.
Mike,

The model you just described: are you referring to one parent object which
describes an entity, and multiple List objects describing measurable
metrics (e.g. stock price, temperature), with fixed-length array objects to
store time slices?

Metadata-Object
    - List of [metric1 timeslice array] - List<Array>
    - List of [metric2 timeslice array]

How will the indexes work in this case?

This model can be used as a general time-series pattern for Geode.
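
A rough sketch of the shape I mean (class and method names are just for
illustration):

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class MetadataObject implements Serializable {
        // metric name -> list of time-slice arrays (e.g. one array per day)
        private final Map<String, List<double[]>> metrics = new HashMap<>();

        public void addSlice(String metric, double[] slice) {
            metrics.computeIfAbsent(metric, m -> new ArrayList<>()).add(slice);
        }

        public List<double[]> slices(String metric) {
            return metrics.getOrDefault(metric, new ArrayList<>());
        }
    }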

Thanks,
Alan


Re: Geode and data snapshotting

Posted by Michael Stolz <ms...@pivotal.io>.
If what you are trying to do is get a consistent picture of market data and
trade data at a point in time, then maybe some form of temporal storage
organization would give you the best approach.

If you can define a regular interval, we can do a very elegant mechanism
based on fixed-length arrays in GemFire that contain point-in-time
snapshots of the rapidly changing elements. For instance, you might want a
single top-level market data description object and then a price object
with individual prices at 5-minute intervals, built as a simple array of
doubles.

Does that sound like it might be a workable pattern for you?
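
As a rough sketch of that array model (the slot arithmetic is just one way
to index a day of 5-minute intervals):

    public class PriceSeries implements java.io.Serializable {
        private static final int SLOTS_PER_DAY = 24 * 60 / 5; // 288 five-minute slots
        private final double[] prices = new double[SLOTS_PER_DAY];

        // Maps a millisecond timestamp to its 5-minute slot within the day.
        private static int slotFor(long epochMillis) {
            long minutesIntoDay = (epochMillis / 60_000) % (24 * 60);
            return (int) (minutesIntoDay / 5);
        }

        public void record(long epochMillis, double price) {
            prices[slotFor(epochMillis)] = price;
        }

        public double priceAt(long epochMillis) {
            return prices[slotFor(epochMillis)];
        }
    }

The whole array is stored as the value under the instrument's key, so a
single get returns a full day of point-in-time snapshots.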


--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771


Re: Geode and data snapshotting

Posted by Luke Shannon <ls...@pivotal.io>.
If you are using disk stores you can also use the backup command in gfsh:
http://gemfire.docs.pivotal.io/docs-gemfire/latest/managing/disk_storage/backup_restore_disk_store.html#backup_restore_disk_store

It even supports incremental backups. If you are not using disk stores, I
would not add them just for this, as it does add complexity to managing the
cluster. Import and Export are good options.
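
For example (the locator address and directories are illustrative):

    gfsh> connect --locator=localhost[10334]
    gfsh> backup disk-store --dir=/backups/geode
    gfsh> backup disk-store --dir=/backups/geode --baseline-dir=/backups/geode/previous

The second backup form is incremental relative to the baseline directory.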


-- 
Luke Shannon | Platform Engineering | Pivotal
-------------------------------------------------------------------------
Mobile:416-571-9495
Join the Toronto Pivotal Usergroup:
http://www.meetup.com/Toronto-Pivotal-User-Group/

Re: Geode and data snapshotting

Posted by james bedenbaugh <jb...@pivotal.io>.
If I understand you properly and you want to do snapshots, then I would use
the snapshot
<http://geode.docs.pivotal.io/docs/managing/cache_snapshots/chapter_overview.html>
facilities already in Geode -

   - gfsh import
   <http://geode.docs.pivotal.io/docs/managing/cache_snapshots/importing_a_snapshot.html>
   or export
   <http://geode.docs.pivotal.io/docs/managing/cache_snapshots/exporting_a_snapshot.html>
   data
   - Java API

These can import/export into files or regions. I would think a cron job
running a gfsh script would do the trick - no need to write any code unless
you want to do filtering, etc.

I would read the caveats in the docs concerning CacheListeners, etc.
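
A minimal sketch of the Java API route (package names assume the
org.apache.geode namespace; older releases used com.gemstone.gemfire):

    import java.io.File;
    import java.io.IOException;
    import org.apache.geode.cache.Region;
    import org.apache.geode.cache.snapshot.RegionSnapshotService;
    import org.apache.geode.cache.snapshot.SnapshotOptions.SnapshotFormat;

    public class RegionSnapshots {
        // Exports all entries of the region to a binary snapshot file.
        public static void export(Region<Object, Object> region, File target)
                throws IOException {
            RegionSnapshotService<Object, Object> svc = region.getSnapshotService();
            svc.save(target, SnapshotFormat.GEMFIRE);
        }

        // Loads a previously exported snapshot back into the region.
        public static void restore(Region<Object, Object> region, File source)
                throws IOException, ClassNotFoundException {
            RegionSnapshotService<Object, Object> svc = region.getSnapshotService();
            svc.load(source, SnapshotFormat.GEMFIRE);
        }
    }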


-- 
Regards,
Jim Bedenbaugh
Advisory Data Engineer
Pivotal Software

Optimism is not a naive hope for a better world, but a philosophical
doctrine that is unafraid to voice harsh realities, embrace their
confrontation and execute painful decisions with an
unyielding commitment to excellence, buoyed with the confidence that by
doing the right thing, one creates a better world, having done the least
harm.