Posted to dev@streams.apache.org by Danny Sullivan <ds...@hotmail.com> on 2013/07/09 21:40:57 UTC

Subscriber/ Publisher handling of activity

Will publishers or subscribers be in charge of making sure that only specific activity stream entries make it to a certain queue?
If publishers are in charge, I would imagine that each publisher would keep a list of all its subscribers. Each activity published would then be added to every subscriber in that publisher's subscriber list.
If subscribers are in charge, each subscriber would have a list of publishers he/she is subscribed to. Then, on some sort of timer, the list would be iterated through and all activity entries not already consumed by that subscriber would be output.
Looking at the application architecture here: http://streams.incubator.apache.org/architecture.html, it looks like all activity is passed through a single queue. If this is going to be the implementation going forward, I would think it would make more sense for subscribers to handle the filtering. That way all activity entries could be dumped in a single database by the publishers, and activity could be extracted and filtered based on some list kept by each individual subscriber. Let me know if this sounds like it aligns with the direction of the project. I would like to have the functionality to allow subscribers to get only specific messages that are published.
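
The two options can be compared with a minimal sketch. It is written in Python purely for illustration; `Activity`, `PushPublisher`, and `PullSubscriber` are hypothetical names, not classes from the Streams source, and a plain list stands in for the single queue described on the architecture page.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    publisher: str
    verb: str


# Option 1: publishers are in charge. Each publisher keeps its own
# subscriber list and copies every activity into each subscriber's inbox.
@dataclass
class PushPublisher:
    name: str
    subscriber_inboxes: list = field(default_factory=list)

    def publish(self, verb):
        activity = Activity(self.name, verb)
        for inbox in self.subscriber_inboxes:
            inbox.append(activity)


# Option 2: subscribers are in charge. Publishers append to one shared log;
# each subscriber remembers how far it has read and filters what is new.
@dataclass
class PullSubscriber:
    followed: set
    cursor: int = 0  # index of the first entry not yet consumed

    def poll(self, shared_log):
        fresh = shared_log[self.cursor:]
        self.cursor = len(shared_log)
        return [a for a in fresh if a.publisher in self.followed]
```

In the second option `poll` would be called on a timer; the cursor is what keeps already-consumed entries from being returned again.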

Re: Subscriber/ Publisher handling of activity

Posted by Jason Letourneau <jl...@gmail.com>.
It isn't necessarily true that a subscriber knows which publisher it
cares about, or that it cares about the publisher at all, so tying it
to queues created for specific publishers gets messy pretty fast: you'd
need a queue for every possible type of filter you'd want as well (e.g.
by activity type, user account, subject, etc.).

The aggregator service basically manages each activity independent of
where it came from and allows each subscriber to decide for itself
whether it cares about that activity.  In theory it is in a constant
state of popping an activity and iterating over the subscribers.  One
way to scale this would be to create multiple aggregator service
instances, each of which grabs an activity whenever it is free to do so
and keeps the same list of subscribers to iterate through.
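
A rough sketch of that pop-and-iterate loop, in Python for illustration only. `Subscriber`, `Aggregator`, `offer`, and `drain` are hypothetical names rather than the project's actual classes, and each activity is assumed to be a plain dict.

```python
import queue


class Subscriber:
    def __init__(self, name, accepts):
        self.name = name
        self.accepts = accepts  # predicate: activity -> bool
        self.received = []

    def offer(self, activity):
        # The subscriber, not the aggregator, decides whether it cares.
        if self.accepts(activity):
            self.received.append(activity)
            return True
        return False


class Aggregator:
    def __init__(self, activities, subscribers):
        self.activities = activities    # a queue.Queue of activities
        self.subscribers = subscribers  # list every instance would share

    def drain(self):
        """Pop until the queue is empty, offering each activity to everyone."""
        handled = 0
        while True:
            try:
                activity = self.activities.get_nowait()
            except queue.Empty:
                return handled
            for subscriber in self.subscribers:
                subscriber.offer(activity)
            handled += 1
```

The cost per activity grows with the number of subscribers, which is why this loop is the natural place to look when scaling.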

A subscriber is currently instantiated by posting the filters it wants
to be created with, along with the filter criteria to apply to each
activity it is offered, at which point a bean for the subscriber
is instantiated.  My only concern with creating multiple aggregator
services would be blocking conditions that could occur when they
independently offer activities to a subscriber.  Since we are looking
to be as high performance as possible, that may or may not be an
issue; I am not sure.  But there may be a smart way for aggregators to
collaborate on how they work with the subscribers...
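
One way to picture the blocking concern, as a sketch under stated assumptions rather than a proposal for the actual design: several aggregator workers share one queue and one subscriber list, and each subscriber guards only its own state with a lock. `LockedSubscriber` and `run_aggregators` are hypothetical names, and the filter predicate is assumed to be stateless so that it can run outside the lock.

```python
import queue
import threading


class LockedSubscriber:
    def __init__(self, accepts):
        self.accepts = accepts  # assumed stateless, so safe to call unlocked
        self.received = []
        self._lock = threading.Lock()

    def offer(self, activity):
        if not self.accepts(activity):  # filtering needs no lock
            return
        with self._lock:                # aggregators contend only here
            self.received.append(activity)


def run_aggregators(activities, subscribers, workers=4):
    """Start several aggregator workers that share one queue and one list."""
    def work():
        while True:
            try:
                activity = activities.get_nowait()
            except queue.Empty:
                return
            for subscriber in subscribers:
                subscriber.offer(activity)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

With this shape, two aggregators block each other only when they deliver to the same subscriber at the same moment. The trade-off is that delivery order across workers is no longer guaranteed.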

On Wed, Jul 10, 2013 at 5:11 PM, Danny Sullivan <ds...@hotmail.com> wrote:
> Rather than iterating through the entire queue of all activity entries from all publishers (presumably since the beginning of the queue's existence), would it make more sense to have a map of publisher URLs to activity queues for specific publishers? That way a subscriber could quickly look up activity only for the publishers he/she has subscribed to, rather than sifting through publishers that he/she doesn't care about.

RE: Subscriber/ Publisher handling of activity

Posted by Danny Sullivan <ds...@hotmail.com>.
Rather than iterating through the entire queue of all activity entries from all publishers (presumably since the beginning of the queue's existence), would it make more sense to have a map of publisher URLs to activity queues for specific publishers? That way a subscriber could quickly look up activity only for the publishers he/she has subscribed to, rather than sifting through publishers that he/she doesn't care about.
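
A minimal sketch of such a map, in Python for illustration. `PublisherIndex` and its methods are hypothetical names, and the activities are left as opaque values.

```python
from collections import defaultdict, deque


class PublisherIndex:
    """Maps a publisher URL to a queue holding only that publisher's activity."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def publish(self, publisher_url, activity):
        self._queues[publisher_url].append(activity)

    def activities_for(self, subscribed_urls):
        # Touch only the queues the subscriber asked for; .get() avoids
        # creating empty queues for publishers that never published.
        found = []
        for url in subscribed_urls:
            found.extend(self._queues.get(url, ()))
        return found
```

The lookup skips unrelated publishers entirely, but it helps only when the publisher is the filter criterion; filtering by activity type or user would need a separate index or a scan.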

> Date: Tue, 9 Jul 2013 20:01:48 -0400
> Subject: Re: Subscriber/ Publisher handling of activity
> From: jletourneau80@gmail.com
> To: dev@streams.incubator.apache.org
> 
> A clarifying point on the iteration: the aggregator service knows about
> each subscriber and is responsible for pulling activities from the queue
> and offering them to each subscriber. This is a potential bottleneck, and
> it would be good to get a discussion going on how to mitigate it.

Re: Subscriber/ Publisher handling of activity

Posted by Jason Letourneau <jl...@gmail.com>.
A clarifying point on the iteration: the aggregator service knows about
each subscriber and is responsible for pulling activities from the queue
and offering them to each subscriber. This is a potential bottleneck, and
it would be good to get a discussion going on how to mitigate it.

On Tuesday, July 9, 2013, Jason Letourneau wrote:

> The last discussion on this topic had subscribers applying a filter to
> each published message on the queue. There should be some stub classes in
> the source that show this line of thinking. Each subscriber would be
> iterated over and asked to process each published activity on the queue;
> it would apply a filter adhering to the filter interface. The
> implementation of that filter could be anything; one thought was that a DSL
> like Lucene syntax could be the default implementation. To answer your
> foundational question: publishers should have no knowledge of who is
> subscribed, and a subscriber should be able to filter in the best way for it
> (i.e. based on source of message, user, activity streams properties, etc.).
>
> Jason

Re: Subscriber/ Publisher handling of activity

Posted by Jason Letourneau <jl...@gmail.com>.
The last discussion on this topic had subscribers applying a filter to each
published message on the queue. There should be some stub classes in the
source that show this line of thinking. Each subscriber would be
iterated over and asked to process each published activity on the queue;
it would apply a filter adhering to the filter interface. The
implementation of that filter could be anything; one thought was that a DSL
like Lucene syntax could be the default implementation. To answer your
foundational question: publishers should have no knowledge of who is
subscribed, and a subscriber should be able to filter in the best way for it
(i.e. based on source of message, user, activity streams properties, etc.).
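
A sketch of what a pluggable filter could look like, in Python for illustration only. `ActivityFilter`, `FieldEquals`, `AllOf`, and `parse_terms` are hypothetical names, not the stub classes in the source, and the `field:value` parser is a toy stand-in for a query DSL, not Lucene syntax. Activities are assumed to be flat dicts.

```python
from abc import ABC, abstractmethod


class ActivityFilter(ABC):
    """The one method a subscriber's filter would have to provide."""

    @abstractmethod
    def matches(self, activity: dict) -> bool:
        ...


class FieldEquals(ActivityFilter):
    def __init__(self, field, value):
        self.field = field
        self.value = value

    def matches(self, activity):
        return activity.get(self.field) == self.value


class AllOf(ActivityFilter):
    def __init__(self, *filters):
        self.filters = filters

    def matches(self, activity):
        return all(f.matches(activity) for f in self.filters)


def parse_terms(query):
    """Parse space-separated `field:value` terms, all of which must match."""
    terms = [term.split(":", 1) for term in query.split()]
    return AllOf(*(FieldEquals(field, value) for field, value in terms))
```

Because the aggregator would only ever call `matches`, a subscriber could swap in a source-based, user-based, or query-string filter without the publishers knowing anything about it. A term without a colon raises `ValueError` in this toy parser.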

Jason


On Tuesday, July 9, 2013, Danny Sullivan wrote:

> Will publishers or subscribers be in charge of making sure that only
> specific activity stream entries make it to a certain queue?
> If publishers are in charge, I would imagine that there would exist a list
> of all subscribers for each publisher. Then each activity published would
> be added to all the subscribers in that publisher's subscriber list.
> If subscribers are in charge, each subscriber would have a list of
> publishers he/she is subscribed to. Then on some sort of timer, the list
> would be iterated through and all activity entries not already consumed by
> that subscriber would be output.
> Looking at the application architecture here:
> http://streams.incubator.apache.org/architecture.html It looks like all
> activity is passed through a single queue. If this is going to be the
> implementation going forward, I would think it would make more sense for
> subscribers to handle the filtering. That would make it so that all
> activity entries could be dumped in a single database by the publishers and
> activity could be extracted and filtered based on some list kept by each
> individual subscriber. Let me know if this sounds like it aligns with the
> direction of the project. I would like to have the functionality to allow
> subscribers to get only specific messages that are published.
>