
Posted to user@geode.apache.org by Paul Perez <pa...@pymma.com> on 2017/01/19 11:39:17 UTC

Send an asynchronous event to one client among many

Hello All, 

 

As explained in a previous email, we are trying to use Geode to process and
aggregate a stream of traces. Our requirement is to process billions of
simple traces every day.

We imagine the aggregation process in several steps.

One: traces are generated by third-party tools and stored in a first Geode
region.

Two: once a trace is put in the first region, we use the async event feature
to invoke a client that executes the first aggregation step. The result is
then put in a second region.

Three: the second aggregation step works the same way: when traces are put
in the second region, an asynchronous event is sent to the client to
execute the second part of the aggregation, and so on.

For scalability purposes, we plan to use many clients that can receive the
events, execute the aggregation and put the results back into Geode.

Consequently, as far as we understand the documentation, when an entry is
put in a region, each client that registered an interest receives an event
and aggregates the trace. So the trace will be aggregated many times.

My question is: if many clients are registered, can we configure the
region to send each event, at random, to one client only?

A subsidiary question: do we have the same behaviour with the function
execution feature, or could it be an alternative in that case?

Thank you for your help 



Best regards

 

Paul

RE: Send an asynchronous event to one client among many

Posted by Paul Perez <pa...@pymma.com>.

Hello Udo, 

 

Thank you for your prompt answer.

As I manage another open source forum, I know how difficult it is to reply
promptly, and how happy we are to learn what users do with the product we
are working on.

Further to your answer, let me restate our concern from an architect's
point of view and point out what looks like a gap in Geode in the field it
wants to play in.

We see two designs for our aggregation. The first relies on a pull design:
the aggregator invokes the persistence system to get "food to eat".

The second is a push design, where the persistence system sends the "food
to eat" to the aggregator through events.

Our background pushes us towards the second solution. It is the way we used
GemFire (a long time ago) to put database-backed pull designs to shame in
banking and financial projects. In these projects we had a "1:1
relationship": each event type was processed by one and only one
application.

Since then, many things have changed, and a 1:1 relationship in a push
architecture cannot keep up with current requirements. Today, large
applications need to scale linearly. Geode highlights its ability to scale
and store more and more data better than anyone else, and I believe that is
true. Likewise, Geode offers many nice features such as events and
functions. Unfortunately, these features cannot scale linearly because of
the "1:1 relationship".

I am eager to read the result of your search. In the meantime, a pull
solution may be sufficient and scalable too. Another option would be an
additional tool that can scale the aggregation process, such as Storm. But
scaling Geode's persistence and processing capability at the same time
would be perfect.

 

Thank you for your time and your patience with me 

 

Best regards

 

Paul 

 

From: Udo Kohlmeyer [mailto:udo@apache.org] 
Sent: 19 January 2017 16:21
To: user@geode.apache.org; Paul.perez@pymma.com
Cc: bruno.sinkovic@pymma.com
Subject: Re: Send an asynchronous event to one client among many

 

Hi there Paul,

Firstly, your use case looks really interesting and I hope to see a few more
posts on how you use Geode further. Keep us informed; we like to hear what
you guys are doing with GEODE! :)

The subscription or CQ (continuous query) paradigm is, as stated, a 1:1
relationship. When a client registers interest on a region that client will
be notified. This is more of a topic semantic rather than a queue semantic.

This is not the first time I've heard the request for this kind of
functionality. The reason GEODE currently implements the 1:1 relationship
has to do with guaranteed delivery and in-order delivery. If we use a queue
semantic, with multiple clients being able to process data in a balanced
manner, we end up with potential out-of-order processing of messages. In
addition, it becomes significantly harder to track and deal with client
failures and the potential replaying of messages.

But that said, I have seen other users resolve this problem and could detail
some approaches in a later correspondence if you'd like.

--Udo
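
The difference between the topic semantic and the queue semantic described
above can be sketched with a toy model. This is plain Java, not the Geode
API: the class and method names (DeliverySemantics, deliverAsTopic,
deliverAsQueue) and the round-robin choice of consumer are assumptions made
purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of topic vs queue delivery. Plain Java, not the Geode API. */
final class DeliverySemantics {

    /** Topic semantic: every subscriber receives every event. */
    static List<List<String>> deliverAsTopic(List<String> events, int subscribers) {
        List<List<String>> inboxes = emptyInboxes(subscribers);
        for (String event : events) {
            for (List<String> inbox : inboxes) {
                inbox.add(event);
            }
        }
        return inboxes;
    }

    /** Queue semantic: each event goes to exactly one consumer (round-robin is an assumption). */
    static List<List<String>> deliverAsQueue(List<String> events, int consumers) {
        List<List<String>> inboxes = emptyInboxes(consumers);
        int next = 0;
        for (String event : events) {
            inboxes.get(next).add(event);
            next = (next + 1) % consumers;
        }
        return inboxes;
    }

    private static List<List<String>> emptyInboxes(int count) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> traces = Arrays.asList("t1", "t2", "t3", "t4");
        System.out.println("topic: " + deliverAsTopic(traces, 2));
        System.out.println("queue: " + deliverAsQueue(traces, 2));
    }
}
```

With two subscribers, the topic model hands all four traces to both of them,
which is the duplicate aggregation described in the original question. The
queue model splits the traces between consumers, but nothing then guarantees
that t1 finishes processing before t2, which is the ordering concern raised
above.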

 

 

 


Re: Send an asynchronous event to one client among many

Posted by Dan Smith <ds...@pivotal.io>.

Hi Paul,

The way a serial AsyncEventListener works is that you create the
AsyncEventQueue in multiple members. All of those members will hold the
queue of events to be dispatched. One of the members is chosen by Geode to
be the primary. That member will take from the queue, invoke your
AsyncEventListener and pass it the events. The other members will just hold
redundant copies of the queue. If the chosen primary crashes, another
member will become primary and start dispatching events from the queue
where the old primary left off.
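
The primary and failover behaviour described above can be modelled in a few
lines. This is a toy simulation in plain Java, not Geode code: the
SerialQueueModel class, its methods and the "first surviving member becomes
primary" rule are assumptions for illustration, and the model ignores any
redelivery of in-flight events that a real system may perform after a
failover.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a serial queue with one dispatching primary. Not the Geode API. */
final class SerialQueueModel {
    private final List<String> members;                        // members hosting a copy of the queue
    private final List<String> queue = new ArrayList<>();      // logically identical on every member
    private final List<String> dispatchLog = new ArrayList<>();
    private int nextToDispatch = 0;                            // position the primary has reached

    SerialQueueModel(List<String> members) {
        this.members = new ArrayList<>(members);
    }

    void enqueue(String event) {
        queue.add(event);
    }

    /** Only the current primary (first member in this model) invokes the listener. */
    void dispatch(int maxEvents) {
        String primary = members.get(0);
        for (int i = 0; i < maxEvents && nextToDispatch < queue.size(); i++) {
            dispatchLog.add(primary + ":" + queue.get(nextToDispatch));
            nextToDispatch++;
        }
    }

    /** Simulates a crash of the primary; the next member takes over where it left off. */
    void crashPrimary() {
        if (members.size() < 2) {
            throw new IllegalStateException("no secondary left to take over");
        }
        members.remove(0);
    }

    List<String> dispatchLog() {
        return dispatchLog;
    }

    public static void main(String[] args) {
        SerialQueueModel model = new SerialQueueModel(Arrays.asList("A", "B", "C"));
        for (String e : Arrays.asList("e1", "e2", "e3", "e4")) {
            model.enqueue(e);
        }
        model.dispatch(2);      // A dispatches e1 and e2
        model.crashPrimary();   // A fails, B becomes primary
        model.dispatch(10);     // B continues with e3 and e4
        System.out.println(model.dispatchLog()); // [A:e1, A:e2, B:e3, B:e4]
    }
}
```

Each event appears once in the log even though three members hold the
queue, which is the point of the explanation: redundancy is for failover,
not for repeated invocation.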

Parallel AsyncEventQueues work in a similar fashion, except that instead of
a single queue, there are multiple queues partitioned across all of the
members. You attach the parallel AsyncEventQueue to a partitioned region.
Each partition of that region has its own queue.
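
A similar toy model shows why a parallel queue still processes each event
once while spreading the work across members. Again this is plain Java and
not the Geode API: ParallelQueueModel, the hash-based bucketFor and the
round-robin ownerOf are hypothetical stand-ins for however the real
partitioning assigns keys to partitions and partitions to members.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of per-partition queues spread over members. Not the Geode API. */
final class ParallelQueueModel {
    private final int bucketCount;
    private final List<String> members;
    private final Map<String, List<String>> dispatched = new TreeMap<>();

    ParallelQueueModel(int bucketCount, List<String> members) {
        this.bucketCount = bucketCount;
        this.members = new ArrayList<>(members);
    }

    /** Hash partitioning of keys into buckets (an assumption of this model). */
    int bucketFor(String key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    /** Member hosting the queue for a bucket (round-robin placement is an assumption). */
    String ownerOf(int bucket) {
        return members.get(bucket % members.size());
    }

    /** An event is queued and dispatched only on the member that owns the key's bucket. */
    void put(String key, String value) {
        String owner = ownerOf(bucketFor(key));
        dispatched.computeIfAbsent(owner, m -> new ArrayList<>()).add(key + "=" + value);
    }

    Map<String, List<String>> dispatchedByMember() {
        return dispatched;
    }

    public static void main(String[] args) {
        ParallelQueueModel model = new ParallelQueueModel(8, Arrays.asList("A", "B", "C"));
        for (int i = 0; i < 12; i++) {
            model.put("trace-" + (i % 4), "v" + i);
        }
        // Each member lists only the events of the buckets it owns.
        model.dispatchedByMember().forEach((member, events) ->
                System.out.println(member + " -> " + events));
    }
}
```

In this model all events for a given key land on the same member in
insertion order, so per-key ordering is kept while different keys are
processed on different members.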

-Dan

On Thu, Jan 19, 2017 at 12:06 PM, Paul Perez <pa...@pymma.com> wrote:

> Dan
>
>
>
> Thank you for your time, this correction and your kindness.
>
>
>
> I don’t really understand your sentence: *Serial AsyncEventListeners
> have their own concept of a primary, and the primary dispatches all events
> until it fails.*
>
> I would be pleased to read your explanation if you have a few moments.
>
>
>
> Best regards
>
>
>
> Paul
>
>
>
> *From:* Dan Smith [mailto:dsmith@pivotal.io]
> *Sent:* 19 January 2017 19:26
> *To:* user@geode.apache.org
> *Cc:* Udo Kohlmeyer <uk...@pivotal.io>; Michael Stolz <
> mstolz@pivotal.io>; bruno.sinkovic@pymma.com
>
> *Subject:* Re: Send an asynchronous event to one client among many
>
>
>
> One minor correction - even serial AsyncEventListeners are only fired on
> a single node for each event. Serial AsyncEventListeners have their own
> concept of a primary, and the primary dispatches all events until it fails.
>
> In this case I do think the parallel AsyncEventListener is probably the
> better option so you can scale out.
>
> -Dan
>
>
>
> On Thu, Jan 19, 2017 at 11:13 AM, Paul Perez <pa...@pymma.com> wrote:
>
> Hi Udo
> That is a very good idea
> I will test it and let you know.
> Thank you everyone I really appreciate your help and your comments.
>
> Sent from my mobile phone
> Paul Perez
> Pymma
>
> On Jan 19, 2017, at 19:00, Udo Kohlmeyer <uk...@pivotal.io> wrote:
>
> Hi there Paul.
>
> We will assume that you are using partitioned regions. In partitioned
> regions you have the notion of "primary" and "redundant" data copies. Any
> CUD (create,update,destroy) operations will ALWAYS only happen on the
> primary node. Which means that with an AsyncEventListener, it will only
> ever "fire" on the primary data node.
>
> So no, you will not have the AsyncEventListener fire 3 times.
>
> With a replicate region, the AsyncEventListener will fire 3 times.
>
> The concept of serial vs parallel just means the number of
> threads/executors that each AsyncEventListener will use. With serial, there
> will only be 1. With parallel you could have many threads, but once again
> an event will only ever be processed by one of the AsyncEventListener
> threads (if you are using partitioned regions).
>
> You can try this out if you want.
>
> --Udo
>
>
>
> On 1/19/17 10:42, Paul Perez wrote:
>
> Hello Michael
>
>
>
> I did not see your answer before replying to Udo, so maybe in my last
> email I made mistakes and wrote wrong things.
>
> We also thought about AsyncEventListeners, but we found a difficulty there.
>
> Geode documentation says:
>
> *“You can configure an AsyncEventQueue to be either serial or parallel. A
> serial queue is deployed to one Geode member, and it delivers all of a
> region’s events, in order of occurrence, to a configured AsyncEventListener
> implementation. A parallel queue is deployed to multiple Geode members, and
> each instance of the queue delivers region events, possibly simultaneously,
> to a local AsyncEventListener implementation.” *
>
>
>
> Let’s say that for scalability reasons we have 3 members in our aggregation
> system, and we implement the aggregation process in the listener. We
> understand that it will be invoked 3 times. And since aggregation is not
> a stateless process, the result will be wrong.
>
>
>
> Maybe I’m wrong and I did not understand the documentation. I would be
> very happy with that.
>
>
>
> Please let me know
>
>
>
>
>
> Best regards
>
>
>
> Paul
>
>
>
> *From:* Michael Stolz [mailto:mstolz@pivotal.io <ms...@pivotal.io>]
> *Sent:* 19 January 2017 16:55
> *To:* user@geode.apache.org
> *Cc:* Paul.perez@pymma.com; bruno.sinkovic@pymma.com
> *Subject:* Re: Send an asynchronous event to one client among many
>
>
>
> Instead of hopping out to a client, you could get horizontal scale and
> asynchronous processing by using an AsyncEventListener in the servers. That
> will take care of multi-threading and queuing and all the plumbing, and you
> just go ahead and write your processing code and deploy it as
> AsyncEventListeners. This gives you guaranteed ordering semantics for each
> key as well.
>
>
>
> I *think* it even gives you a notion of H/A so that if the primary fails
> the queued messages will be processed by a secondary. (I know the WAN
> Gateway does and it uses pretty much the same plumbing under the covers).
>
>
>
>
> --
>
> Mike Stolz
>
> Principal Engineer, GemFire Product Manager
>
> Mobile: 631-835-4771
>
>
>

RE: Send an asynchronous event to one client among many

Posted by Paul Perez <pa...@pymma.com>.

Dan

 

Thank you for your time, this correction and your kindness.

I don’t really understand your sentence: Serial AsyncEventListeners have their own concept of a primary, and the primary dispatches all events until it fails.

I would be pleased to read your explanation if you have a few moments.

 

Best regards

 

Paul

 


Re: Send an asynchronous event to one client among many

Posted by Dan Smith <ds...@pivotal.io>.

One minor correction - even serial AsyncEventListeners are only fired on a
single node for each event. Serial AsyncEventListeners have their own
concept of a primary, and the primary dispatches all events until it fails.

In this case I do think the parallel AsyncEventListener is probably the
better option so you can scale out.

-Dan


Re: Send an asynchronous event to one client among many

Posted by Paul Perez <pa...@pymma.com>.

Hi Udo
That is a very good idea.
I will test it and let you know.
Thank you everyone, I really appreciate your help and your comments.

Sent from my mobile phone
Paul Perez
Pymma

On Jan 19, 2017, 19:00, at 19:00, Udo Kohlmeyer <uk...@pivotal.io> wrote:
>Hi there Paul.
>
>We will assume that you are using partitioned regions. In partitioned 
>regions you have the notion of "primary" and "redundant" data copies. 
>Any CUD (create,update,destroy) operations will ALWAYS only happen on 
>the primary node. Which means that with an AsyncEventListener, it will 
>only ever "fire" on the primary data node.
>
>So no, you will not have the AsyncEventListener fire 3 times.
>
>With a replicate region, the AsyncEventListener will fire 3 times.
>
>The concept of serial vs parallel just means the amount to 
>threads/executors that each AsyncEventListener will use. With serial, 
>there will only be 1. With parallel you could have many threads, but 
>once again an event will only ever be processed by one of the 
>AsyncEventListener threads. (if you are using partitioned regions).
>
>You can try this out if you want.
>
>--Udo
>
>
>On 1/19/17 10:42, Paul Perez wrote:
>>
>> Hello Michael
>>
>> I did not see your answer before replying to Udo so may be in my last
>
>> email I made mistakes and wrote wrong things.
>>
>> We also though  about AsyncEventListeners but we found a difficulty
>in.
>>
>> Geode documentation says:
>>
>> /\u201cYou can configure an AsyncEventQueue to be either serial or 
>> parallel. A serial queue is deployed to one Geode member, and it 
>> delivers all of a region\u2019s events, in order of occurrence, to a 
>> configured AsyncEventListener implementation. A parallel queue is 
>> deployed to multiple Geode members, and each instance of the queue 
>> delivers region events, possibly simultaneously, to a local 
>> AsyncEventListener implementation.\u201d/
>>
>> Let\u2019s say that for scalability reason we have 3 members in our 
>> aggregation system, and we implement the aggregation process in the 
>> Listener. We understand that it will be invoked 3 times. And since an
>
>> aggregation is not a stateless process the aggregation will be wrong.
>>
>> May be I\u2019m wrong and I did not understand the documentation. I would 
>> be very happy with that.
>>
>> Please let me know
>>
>> Best regards
>>
>> Paul
>>
>> *From:*Michael Stolz [mailto:mstolz@pivotal.io]
>> *Sent:* 19 January 2017 16:55
>> *To:* user@geode.apache.org
>> *Cc:* Paul.perez@pymma.com; bruno.sinkovic@pymma.com
>> *Subject:* Re: Send an asynchronous event to one client among many
>>
>> Instead of hopping out to a client, you could get horizontal scale
>and 
>> asynchronous processing by using an AsyncEventListener in the
>servers. 
>> That will take care of multi-threading and queuing and all the 
>> plumbing, and you just go ahead and write your processing code and 
>> deploy it as AsyncEventListeners. This gives you guaranteed ordering 
>> semantics for each key as well.
>>
>> I *think* it even gives you a notion of H/A so that if the primary 
>> fails the queued messages will be processed by a secondary. (I know 
>> the WAN Gateway does and it uses pretty much the same plumbing under 
>> the covers).
>>
>>
>> --
>>
>> Mike Stolz
>>
>> Principal Engineer, GemFire Product Manager
>>
>> Mobile: 631-835-4771
>>
>> On Thu, Jan 19, 2017 at 11:21 AM, Udo Kohlmeyer <udo@apache.org 
>> <ma...@apache.org>> wrote:
>>
>>     Hi there Paul,
>>
>>     Firstly, your use case looks really interesting and hope to see a
>>     few more posts on how you use Geode further. Keep us informed we
>>     like to hear what you guys are doing with GEODE! :)
>>
>>     The subscription or CQ (continuous query) paradigm is, as stated,
>>     a 1:1 relationship. When a client registers interest on a region
>>     that client will be notified. This is more of a topic semantic
>>     rather than a queue semantic.
>>
>>     Although this is not the first time I've heard the request for
>>     this kind of functionality. To best explain why GEODE, currently,
>>     implements the 1:1 relationship has got to do with guaranteed
>>     delivery and in-order delivery. If we use a queue semantic, with
>>     multiple clients being able to process data in a balanced manner,
>>     we end up with potential out-of-order processing of messages. In
>>     addition to that it now becomes significantly harder to track and
>>     deal with client failures and the potential replaying of messages.
>>
>>     But that said, I have seen other users resolve this problem and
>>     could detail some approaches in a later correspondence if you'd like.
>>
>>     --Udo
>>
>>     On 1/19/17 03:39, Paul Perez wrote:
>>
>>         Hello All,
>>
>>         As explained in a previous email, we try to use Geode to
>>         process and aggregate a stream of Traces. Our requirement is
>>         to process billions of simple traces  every day.
>>
>>         We imagine the aggregation process  in many steps.
>>
>>         One: traces are generated by a tiers tools and stored in a
>>         first geode region
>>
>>         Two: once a trace  put in the first region we use the  async
>>         event feature to invoke a client that executes the first
>>         aggregation steps. Then the result will be put in a second
>>         region.
>>
>>         Three: the second aggregation step is in the same way, when
>>         traces are put in the second region, then an asynchronous
>>         event is sent to  the client to  execute the second part of
>>         the aggregation etc.…
>>
>>         For scalability purposes, we plan to use many clients that
>>         could receive the events and execute the aggregation and put
>>         the results back to Geode.
>>
>>         Consequently, as far as we understand the documentation, when
>>         an entry is put in a region, each client that registered an
>>         interest receives an event and aggregate the trace.  So, the
>>         trace will be aggregated many times.
>>
>>         My  question is: If many clients are registered, could we
>>         configure the region to send randomly, the event to one client
>>         only.
>>
>>         A subsidiary question: Do we have the same behaviour with the
>>         function execution feature or it could  be an alternative in
>>         that case
>>
>>         Thank you for your help
>>
>>         Best regards
>>
>>         Paul
>>

Re: Send an asynchronous event to one client among many

Posted by Udo Kohlmeyer <uk...@pivotal.io>.

Hi there Paul.

We will assume that you are using partitioned regions. In partitioned 
regions you have the notion of "primary" and "redundant" data copies. 
Any CUD (create, update, destroy) operation will ALWAYS happen only on 
the primary node, which means that an AsyncEventListener will only 
ever "fire" on the primary data node.

So no, you will not have the AsyncEventListener fire 3 times.

With a replicate region, the AsyncEventListener will fire 3 times.

The concept of serial vs parallel just means the number of 
threads/executors that each AsyncEventListener will use. With serial, 
there will only be 1. With parallel you could have many threads, but 
once again an event will only ever be processed by one of the 
AsyncEventListener threads (if you are using partitioned regions).
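
To make the partitioned case concrete, here is a minimal toy model in plain Java. It is not Geode code: the class name, the methods and the round-robin assignment of primaries are all assumptions made purely for illustration. Each key hashes to one bucket, each bucket has exactly one primary member, and only that member's listener fires for a put.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT the Geode API. All names here are hypothetical.
public class PrimaryOnlyDispatch {
    private final int bucketCount;
    private final int memberCount;
    // member index -> keys whose events that member's listener handled
    private final Map<Integer, List<String>> handledByMember = new HashMap<>();

    public PrimaryOnlyDispatch(int bucketCount, int memberCount) {
        this.bucketCount = bucketCount;
        this.memberCount = memberCount;
    }

    // A given key always hashes to the same bucket.
    int bucketOf(String key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    // Assumption: primary buckets are spread round-robin over the members.
    int primaryOf(int bucket) {
        return bucket % memberCount;
    }

    // Simulates a put: only the primary member's listener fires.
    // Returns the index of the member that handled the event.
    public int put(String key) {
        int member = primaryOf(bucketOf(key));
        handledByMember.computeIfAbsent(member, m -> new ArrayList<>()).add(key);
        return member;
    }

    // Total number of listener invocations across all members.
    public int totalInvocations() {
        int total = 0;
        for (List<String> keys : handledByMember.values()) {
            total += keys.size();
        }
        return total;
    }

    public static void main(String[] args) {
        PrimaryOnlyDispatch cluster = new PrimaryOnlyDispatch(113, 3);
        for (int i = 0; i < 9; i++) {
            cluster.put("trace-" + i);
        }
        // 9 puts on 3 members give 9 listener invocations in total, not 27.
        System.out.println("listener invocations: " + cluster.totalInvocations());
    }
}
```

In this model a stateful aggregation stays correct because every event is seen once, and the same key is always seen by the same member.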

You can try this out if you want.

--Udo


On 1/19/17 10:42, Paul Perez wrote:
>
> Hello Michael
>
> I did not see your answer before replying to Udo so may be in my last 
> email I made mistakes and wrote wrong things.
>
> We also though  about AsyncEventListeners but we found a difficulty in.
>
> Geode documentation says:
>
> /“You can configure an AsyncEventQueue to be either serial or 
> parallel. A serial queue is deployed to one Geode member, and it 
> delivers all of a region’s events, in order of occurrence, to a 
> configured AsyncEventListener implementation. A parallel queue is 
> deployed to multiple Geode members, and each instance of the queue 
> delivers region events, possibly simultaneously, to a local 
> AsyncEventListener implementation.”/
>
> Let’s say that for scalability reason we have 3 members in our 
> aggregation system, and we implement the aggregation process in the 
> Listener. We understand that it will be invoked 3 times. And since an 
> aggregation is not a stateless process the aggregation will be wrong.
>
> May be I’m wrong and I did not understand the documentation. I would 
> be very happy with that.
>
> Please let me know
>
> Best regards
>
> Paul
>
> *From:*Michael Stolz [mailto:mstolz@pivotal.io]
> *Sent:* 19 January 2017 16:55
> *To:* user@geode.apache.org
> *Cc:* Paul.perez@pymma.com; bruno.sinkovic@pymma.com
> *Subject:* Re: Send an asynchronous event to one client among many
>
> Instead of hopping out to a client, you could get horizontal scale and 
> asynchronous processing by using an AsyncEventListener in the servers. 
> That will take care of multi-threading and queuing and all the 
> plumbing, and you just go ahead and write your processing code and 
> deploy it as AsyncEventListeners. This gives you guaranteed ordering 
> semantics for each key as well.
>
> I *think* it even gives you a notion of H/A so that if the primary 
> fails the queued messages will be processed by a secondary. (I know 
> the WAN Gateway does and it uses pretty much the same plumbing under 
> the covers).
>
>
> --
>
> Mike Stolz
>
> Principal Engineer, GemFire Product Manager
>
> Mobile: 631-835-4771
>
> On Thu, Jan 19, 2017 at 11:21 AM, Udo Kohlmeyer <udo@apache.org> wrote:
>
>     Hi there Paul,
>
>     Firstly, your use case looks really interesting and hope to see a
>     few more posts on how you use Geode further. Keep us informed we
>     like to hear what you guys are doing with GEODE! :)
>
>     The subscription or CQ (continuous query) paradigm is, as stated,
>     a 1:1 relationship. When a client registers interest on a region
>     that client will be notified. This is more of a topic semantic
>     rather than a queue semantic.
>
>     Although this is not the first time I've heard the request for
>     this kind of functionality. To best explain why GEODE, currently,
>     implements the 1:1 relationship has got to do with guaranteed
>     delivery and in-order delivery. If we use a queue semantic, with
>     multiple clients being able to process data in a balanced manner,
>     we end up with potential out-of-order processing of messages. In
>     addition to that it now becomes significantly harder to track and
>     deal with client failures and the potential replaying of messages.
>
>     But that said, I have seen other users resolve this problem and
>     could detail some approaches in a later correspondence if you'd like.
>
>     --Udo
>
>     On 1/19/17 03:39, Paul Perez wrote:
>
>         Hello All,
>
>         As explained in a previous email, we try to use Geode to
>         process and aggregate a stream of Traces. Our requirement is
>         to process billions of simple traces  every day.
>
>         We imagine the aggregation process  in many steps.
>
>         One: traces are generated by a tiers tools and stored in a
>         first geode region
>
>         Two: once a trace  put in the first region we use the  async
>         event feature to invoke a client that executes the first
>         aggregation steps. Then the result will be put in a second
>         region.
>
>         Three: the second aggregation step is in the same way, when
>         traces are put in the second region, then an asynchronous
>         event is sent to  the client to  execute the second part of
>         the aggregation etc.…
>
>         For scalability purposes, we plan to use many clients that
>         could receive the events and execute the aggregation and put
>         the results back to Geode.
>
>         Consequently, as far as we understand the documentation, when
>         an entry is put in a region, each client that registered an
>         interest receives an event and aggregate the trace.  So, the
>         trace will be aggregated many times.
>
>         My  question is: If many clients are registered, could we
>         configure the region to send randomly, the event to one client
>         only.
>
>         A subsidiary question: Do we have the same behaviour with the
>         function execution feature or it could  be an alternative in
>         that case
>
>         Thank you for your help
>
>         Best regards
>
>         Paul
>

RE: Send an asynchronous event to one client among many

Posted by Paul Perez <pa...@pymma.com>.

Hello Michael,

I did not see your answer before replying to Udo, so I may have made mistakes in my last email.

We also thought about AsyncEventListeners, but we ran into a difficulty.

The Geode documentation says:

“You can configure an AsyncEventQueue to be either serial or parallel. A serial queue is deployed to one Geode member, and it delivers all of a region’s events, in order of occurrence, to a configured AsyncEventListener implementation. A parallel queue is deployed to multiple Geode members, and each instance of the queue delivers region events, possibly simultaneously, to a local AsyncEventListener implementation.”

Let’s say that, for scalability reasons, we have 3 members in our aggregation system, and we implement the aggregation process in the listener. We understand that it will be invoked 3 times. And since an aggregation is not a stateless process, the aggregation will be wrong.

Maybe I’m wrong and have misunderstood the documentation. I would be very happy if that were the case.

Please let me know.

Best regards

Paul

 

From: Michael Stolz [mailto:mstolz@pivotal.io] 
Sent: 19 January 2017 16:55
To: user@geode.apache.org
Cc: Paul.perez@pymma.com; bruno.sinkovic@pymma.com
Subject: Re: Send an asynchronous event to one client among many

 

Instead of hopping out to a client, you could get horizontal scale and asynchronous processing by using an AsyncEventListener in the servers. That will take care of multi-threading and queuing and all the plumbing, and you just go ahead and write your processing code and deploy it as AsyncEventListeners. This gives you guaranteed ordering semantics for each key as well.

 

I *think* it even gives you a notion of H/A so that if the primary fails the queued messages will be processed by a secondary. (I know the WAN Gateway does and it uses pretty much the same plumbing under the covers).

 




--

Mike Stolz

Principal Engineer, GemFire Product Manager 

Mobile: 631-835-4771

 

On Thu, Jan 19, 2017 at 11:21 AM, Udo Kohlmeyer <udo@apache.org> wrote:

Hi there Paul,

Firstly, your use case looks really interesting and hope to see a few more posts on how you use Geode further. Keep us informed we like to hear what you guys are doing with GEODE! :)

The subscription or CQ (continuous query) paradigm is, as stated, a 1:1 relationship. When a client registers interest on a region that client will be notified. This is more of a topic semantic rather than a queue semantic.

Although this is not the first time I've heard the request for this kind of functionality. To best explain why GEODE, currently, implements the 1:1 relationship has got to do with guaranteed delivery and in-order delivery. If we use a queue semantic, with multiple clients being able to process data in a balanced manner, we end up with potential out-of-order processing of messages. In addition to that it now becomes significantly harder to track and deal with client failures and the potential replaying of messages.

But that said, I have seen other users resolve this problem and could detail some approaches in a later correspondence if you'd like.

--Udo

 

 

 

On 1/19/17 03:39, Paul Perez wrote:

Hello All, 

 

As explained in a previous email, we try to use Geode to process and aggregate a stream of Traces. Our requirement is to process billions of simple traces  every day.

We imagine the aggregation process  in many steps. 

One: traces are generated by a tiers tools and stored in a first geode region

Two: once a trace  put in the first region we use the  async event feature to invoke a client that executes the first aggregation steps. Then the result will be put in a second region. 

Three: the second aggregation step is in the same way, when traces are put in the second region, then an asynchronous event is sent to  the client to  execute the second part of the aggregation etc.…

 

For scalability purposes, we plan to use many clients that could receive the events and execute the aggregation and put the results back to Geode. 

Consequently, as far as we understand the documentation, when an entry is put in a region, each client that registered an interest receives an event and aggregate the trace.  So, the trace will be aggregated many times. 

 

My  question is: If many clients are registered, could we configure the region to send randomly, the event to one client only. 

A subsidiary question: Do we have the same behaviour with the function execution feature or it could  be an alternative in that case 

Thank you for your help 

Best regards

 

Paul

Re: Send an asynchronous event to one client among many

Posted by Udo Kohlmeyer <uk...@pivotal.io>.

+1 Good idea Mike


On 1/19/17 08:55, Michael Stolz wrote:
> Instead of hopping out to a client, you could get horizontal scale and 
> asynchronous processing by using an AsyncEventListener in the servers. 
> That will take care of multi-threading and queuing and all the 
> plumbing, and you just go ahead and write your processing code and 
> deploy it as AsyncEventListeners. This gives you guaranteed ordering 
> semantics for each key as well.
>
> I *think* it even gives you a notion of H/A so that if the primary 
> fails the queued messages will be processed by a secondary. (I know 
> the WAN Gateway does and it uses pretty much the same plumbing under 
> the covers).
>
>
> -- 
> Mike Stolz
> Principal Engineer, GemFire Product Manager
> Mobile: 631-835-4771
>
> On Thu, Jan 19, 2017 at 11:21 AM, Udo Kohlmeyer <udo@apache.org> wrote:
>
>     Hi there Paul,
>
>     Firstly, your use case looks really interesting and hope to see a
>     few more posts on how you use Geode further. Keep us informed we
>     like to hear what you guys are doing with GEODE! :)
>
>     The subscription or CQ (continuous query) paradigm is, as stated,
>     a 1:1 relationship. When a client registers interest on a region
>     that client will be notified. This is more of a topic semantic
>     rather than a queue semantic.
>
>     Although this is not the first time I've heard the request for
>     this kind of functionality. To best explain why GEODE, currently,
>     implements the 1:1 relationship has got to do with guaranteed
>     delivery and in-order delivery. If we use a queue semantic, with
>     multiple clients being able to process data in a balanced manner,
>     we end up with potential out-of-order processing of messages. In
>     addition to that it now becomes significantly harder to track and
>     deal with client failures and the potential replaying of messages.
>
>     But that said, I have seen other users resolve this problem and
>     could detail some approaches in a later correspondence if you'd like.
>
>     --Udo
>
>
>
>
>     On 1/19/17 03:39, Paul Perez wrote:
>>
>>     Hello All,
>>
>>     As explained in a previous email, we try to use Geode to process
>>     and aggregate a stream of Traces. Our requirement is to process
>>     billions of simple traces  every day.
>>
>>     We imagine the aggregation process  in many steps.
>>
>>     One: traces are generated by a tiers tools and stored in a first
>>     geode region
>>
>>     Two: once a trace  put in the first region we use the  async
>>     event feature to invoke a client that executes the first
>>     aggregation steps. Then the result will be put in a second region.
>>
>>     Three: the second aggregation step is in the same way, when
>>     traces are put in the second region, then an asynchronous event
>>     is sent to  the client to  execute the second part of the
>>     aggregation etc.…
>>
>>     For scalability purposes, we plan to use many clients that could
>>     receive the events and execute the aggregation and put the
>>     results back to Geode.
>>
>>     Consequently, as far as we understand the documentation, when an
>>     entry is put in a region, each client that registered an interest
>>     receives an event and aggregate the trace.  So, the trace will be
>>     aggregated many times.
>>
>>     My  question is: If many clients are registered, could we
>>     configure the region to send randomly, the event to one client only.
>>
>>     A subsidiary question: Do we have the same behaviour with the
>>     function execution feature or it could  be an alternative in that
>>     case
>>
>>     Thank you for your help
>>
>>     Best regards
>>
>>     Paul
>>
>
>

Re: Send an asynchronous event to one client among many

Posted by Michael Stolz <ms...@pivotal.io>.

Instead of hopping out to a client, you could get horizontal scale and
asynchronous processing by using an AsyncEventListener in the servers. That
will take care of multi-threading and queuing and all the plumbing, and you
just go ahead and write your processing code and deploy it as
AsyncEventListeners. This gives you guaranteed ordering semantics for each
key as well.
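
To show what per-key ordering with horizontal scale looks like, here is a rough sketch in plain Java. It does not use the Geode API and says nothing about how Geode implements this internally: the class, the "lane" idea and the hash-based routing are assumptions for illustration only. Events are routed by key to one of several single-threaded workers, so different keys run in parallel while events for the same key stay in submission order.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: NOT the Geode API. All names are hypothetical.
public class KeyOrderedWorkers {
    // One single-threaded executor per lane; a key always maps to the same lane.
    private final ExecutorService[] lanes;
    // key -> sequence numbers in the order they were processed
    private final Map<String, List<Integer>> processed = new ConcurrentHashMap<>();

    public KeyOrderedWorkers(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Queues an event; processing happens asynchronously on the key's lane.
    public void submit(String key, int sequence) {
        int lane = Math.floorMod(key.hashCode(), lanes.length);
        lanes[lane].execute(() ->
            processed.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(sequence));
    }

    // Stops accepting work, waits for queued events, and returns what was processed.
    public Map<String, List<Integer>> drain() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        KeyOrderedWorkers workers = new KeyOrderedWorkers(3);
        for (int seq = 0; seq < 5; seq++) {
            workers.submit("trace-A", seq);
            workers.submit("trace-B", seq);
        }
        // Each key's list comes back in submission order: 0, 1, 2, 3, 4.
        System.out.println(workers.drain());
    }
}
```

The design choice is that ordering is only promised within a key, which is what lets the work spread across lanes.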

I *think* it even gives you a notion of H/A so that if the primary fails
the queued messages will be processed by a secondary. (I know the WAN
Gateway does and it uses pretty much the same plumbing under the covers).


--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: 631-835-4771

On Thu, Jan 19, 2017 at 11:21 AM, Udo Kohlmeyer <ud...@apache.org> wrote:

> Hi there Paul,
>
> Firstly, your use case looks really interesting and hope to see a few more
> posts on how you use Geode further. Keep us informed we like to hear what
> you guys are doing with GEODE! :)
>
> The subscription or CQ (continuous query) paradigm is, as stated, a 1:1
> relationship. When a client registers interest on a region that client will
> be notified. This is more of a topic semantic rather than a queue semantic.
>
> Although this is not the first time I've heard the request for this kind
> of functionality. To best explain why GEODE, currently, implements the 1:1
> relationship has got to do with guaranteed delivery and in-order delivery.
> If we use a queue semantic, with multiple clients being able to process
> data in a balanced manner, we end up with potential out-of-order processing
> of messages. In addition to that it now becomes significantly harder to
> track and deal with client failures and the potential replaying of
> messages.
>
> But that said, I have seen other users resolve this problem and could
> detail some approaches in a later correspondence if you'd like.
>
> --Udo
>
>
>
> On 1/19/17 03:39, Paul Perez wrote:
>
> Hello All,
>
>
>
> As explained in a previous email, we try to use Geode to process and
> aggregate a stream of Traces. Our requirement is to process billions of
> simple traces  every day.
>
> We imagine the aggregation process  in many steps.
>
> One: traces are generated by a tiers tools and stored in a first geode
> region
>
> Two: once a trace  put in the first region we use the  async event feature
> to invoke a client that executes the first aggregation steps. Then the
> result will be put in a second region.
>
> Three: the second aggregation step is in the same way, when traces are put
> in the second region, then an asynchronous event is sent to  the client to
>  execute the second part of the aggregation etc.…
>
>
>
> For scalability purposes, we plan to use many clients that could receive
> the events and execute the aggregation and put the results back to Geode.
>
> Consequently, as far as we understand the documentation, when an entry is
> put in a region, each client that registered an interest receives an event
> and aggregate the trace.  So, the trace will be aggregated many times.
>
>
>
> My  question is: If many clients are registered, could we configure the
> region to send randomly, the event to one client only.
>
> A subsidiary question: Do we have the same behaviour with the function
> execution feature or it could  be an alternative in that case
>
> Thank you for your help
>
> Best regards
>
>
>
> Paul
>
>
>
>
>

Re: Send an asynchronous event to one client among many

Posted by Udo Kohlmeyer <ud...@apache.org>.

Hi there Paul,

Firstly, your use case looks really interesting and I hope to see a few 
more posts on how you use Geode further. Keep us informed; we like to 
hear what you guys are doing with GEODE! :)

The subscription or CQ (continuous query) paradigm is, as stated, a 1:1 
relationship. When a client registers interest on a region, that client 
will be notified. This is more of a topic semantic than a queue 
semantic.

This is not the first time I've heard a request for this kind of 
functionality, though. The reason GEODE currently implements the 1:1 
relationship has to do with guaranteed delivery and in-order 
delivery. If we use a queue semantic, with multiple clients being able 
to process data in a balanced manner, we end up with potential 
out-of-order processing of messages. In addition to that, it becomes 
significantly harder to track and deal with client failures and the 
potential replaying of messages.
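
For anyone unfamiliar with the two terms, here is a small plain-Java sketch of the difference. It is not Geode code; the class and method names are made up for illustration, and the round-robin choice of consumer is just one assumed balancing policy. With a topic semantic every subscriber gets every event; with a queue semantic each event goes to exactly one consumer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: NOT the Geode API. All names are hypothetical.
public class TopicVsQueue {

    // Topic semantic: every subscriber receives every event.
    public static List<List<String>> deliverAsTopic(List<String> events, int subscribers) {
        List<List<String>> inboxes = emptyInboxes(subscribers);
        for (String event : events) {
            for (List<String> inbox : inboxes) {
                inbox.add(event);
            }
        }
        return inboxes;
    }

    // Queue semantic: each event goes to exactly one consumer (round-robin here).
    public static List<List<String>> deliverAsQueue(List<String> events, int consumers) {
        List<List<String>> inboxes = emptyInboxes(consumers);
        int next = 0;
        for (String event : events) {
            inboxes.get(next).add(event);
            next = (next + 1) % consumers;
        }
        return inboxes;
    }

    private static List<List<String>> emptyInboxes(int count) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("e1", "e2", "e3");
        // prints: topic: [[e1, e2, e3], [e1, e2, e3]]
        System.out.println("topic: " + deliverAsTopic(events, 2));
        // prints: queue: [[e1, e3], [e2]]
        System.out.println("queue: " + deliverAsQueue(events, 2));
    }
}
```

Note how, in the queue case, e2 sits with a different consumer than e1 and e3, so nothing forces e2 to be processed between them. That is the out-of-order risk described above.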

But that said, I have seen other users resolve this problem and could 
detail some approaches in a later correspondence if you'd like.

--Udo

On 1/19/17 03:39, Paul Perez wrote:
>
> Hello All,
>
> As explained in a previous email, we try to use Geode to process and 
> aggregate a stream of Traces. Our requirement is to process billions 
> of simple traces  every day.
>
> We imagine the aggregation process  in many steps.
>
> One: traces are generated by a tiers tools and stored in a first geode 
> region
>
> Two: once a trace  put in the first region we use the  async event 
> feature to invoke a client that executes the first aggregation steps. 
> Then the result will be put in a second region.
>
> Three: the second aggregation step is in the same way, when traces are 
> put in the second region, then an asynchronous event is sent to  the 
> client to  execute the second part of the aggregation etc.
>
> For scalability purposes, we plan to use many clients that could 
> receive the events and execute the aggregation and put the results 
> back to Geode.
>
> Consequently, as far as we understand the documentation, when an entry 
> is put in a region, each client that registered an interest receives 
> an event and aggregate the trace.  So, the trace will be aggregated 
> many times.
>
> My  question is: If many clients are registered, could we configure 
> the region to send randomly, the event to one client only.
>
> A subsidiary question: Do we have the same behaviour with the function 
> execution feature or it could  be an alternative in that case
>
> Thank you for your help
>
> Best regards
>
> Paul
>