Posted to dev@streams.apache.org by "Lavender, Beth A" <la...@mitre.org> on 2013/06/03 22:53:36 UTC

Question about processing architecture

Many of our current systems that will feed the integrated activity stream are noisy. For example, if I update a page 4 times in 5 minutes, it generates an activity for each update. I want to be able to set a rule to discard the last n activities if they have the same actor, verb, and object within an x time frame.

This assumes a sub processor that detects the pattern and takes an action described in a rule. Where do rules and sub processors fit in this architecture? Is anyone doing this in their existing systems?
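
To make the rule concrete, here is a minimal sketch of such a sub processor in Python. It is not part of Streams; the `Activity` class, the `collapse_bursts` function, and the choice to keep only the newest activity of each burst are all assumptions made for illustration. The window is measured from the most recent surviving activity with the same actor, verb, and object.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Activity:
    # Hypothetical minimal activity; "obj" stands in for the activity's object.
    actor: str
    verb: str
    obj: str
    published: datetime


def collapse_bursts(activities: Iterable[Activity], window: timedelta) -> List[Activity]:
    """Keep only the newest activity of each burst of same-key activities.

    Two activities belong to the same burst when they share actor, verb,
    and object and the later one arrives within `window` of the previous
    surviving one.
    """
    kept: List[Activity] = []
    # Index into `kept` of the latest surviving activity for each key.
    last_index: Dict[Tuple[str, str, str], int] = {}
    for act in sorted(activities, key=lambda a: a.published):
        key = (act.actor, act.verb, act.obj)
        idx = last_index.get(key)
        if idx is not None and act.published - kept[idx].published <= window:
            kept[idx] = act  # the newer activity supersedes the older one
        else:
            last_index[key] = len(kept)
            kept.append(act)
    # Superseding in place can disturb time order, so restore it.
    return sorted(kept, key=lambda a: a.published)
```

With a 5 minute window, four updates to the same page by the same actor inside 5 minutes would collapse into the last one, while an update by a different actor, or one made half an hour later, would survive.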


Re: Question about processing architecture

Posted by Ryan Baxter <rb...@gmail.com>.
On Thu, Jun 6, 2013 at 8:08 AM, Beth Lavender <la...@gmail.com> wrote:
> The rollup [1] reference is useful.  There is an implied set of use cases
> given the context for "views that support roll-up".  Do we have a set of
> use cases documented (or that can be referenced) that would help drive
> where in the architecture the plug-ins are needed?

Views are just a subset of your stream in Connections.  One view where
rollup is not used is the "action required" view.  This shows all
activities that the application deemed to need some type of action
performed by a user, for example an approval for something.  This
concept of "action required" is also an extension to the activity
entry that IBM came up with.  We don't "rollup" activity entries in
this view because there might be several actions you need to take on a
single object, and we don't want to give the impression that there may
only be one by rolling them up.


Re: Question about processing architecture

Posted by Beth Lavender <la...@gmail.com>.
On Tue, Jun 4, 2013 at 8:32 AM, Matt Franklin <m....@gmail.com> wrote:

> On Mon, Jun 3, 2013 at 10:52 PM, Jason Letourneau
> <jl...@gmail.com> wrote:
>
> > The current vision is that filters will be implemented agnostic to the
> > overall processing architecture - there may be subscribers using lucene
> > dsl as part of the initial streams implementation - but the interface
> > won't dictate how a subscriber filters its activities -
>
>
> Maybe I am misunderstanding your statement.  In my mind, we really need
> inbound and outbound data pipelines.  I don't think a simple outbound
> filter can solve this easily.  I don't see how the system can do
> de-duplication, supersession, aggregation, etc during the outbound phase.
>  We will need to do a lot of processing before we hit an intermediate
> persistence layer that can then be used by subscriber filters and query
> endpoints.
>
> The pipeline components themselves should be pluggable and we just need a
> series of workflow events that they can hook and do work against the
> incoming data.
>
> Am I off base?
>

The rollup [1] reference is useful.  There is an implied set of use cases
given the context for "views that support roll-up".  Do we have a set of
use cases documented (or that can be referenced) that would help drive
where in the architecture the plug-ins are needed?


Re: Question about processing architecture

Posted by Jason Letourneau <jl...@gmail.com>.
No - that's all correct - I was just addressing that the way filters
are put together won't be restricted by the streams processing
architecture.  In other words, how different implementations and use
cases filter or implement filtering doesn't matter in the context of
the processing architecture.  Your statement about pluggable pipeline
components is right on - hence the messaging architecture/EIP.


Re: Question about processing architecture

Posted by Matt Franklin <m....@gmail.com>.
On Mon, Jun 3, 2013 at 10:52 PM, Jason Letourneau
<jl...@gmail.com> wrote:

> The current vision is that filters will be implemented agnostic to the
> overall processing architecture - there may be subscribers using lucene dsl
> as part of the initial streams implementation - but the interface won't
> dictate how a subscriber filters its activities -


Maybe I am misunderstanding your statement.  In my mind, we really need
inbound and outbound data pipelines.  I don't think a simple outbound
filter can solve this easily.  I don't see how the system can do
de-duplication, supersession, aggregation, etc during the outbound phase.
 We will need to do a lot of processing before we hit an intermediate
persistence layer that can then be used by subscriber filters and query
endpoints.

The pipeline components themselves should be pluggable and we just need a
series of workflow events that they can hook and do work against the
incoming data.

Am I off base?
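
A rough sketch of what pluggable inbound components could look like, in Python. Nothing here is an existing Streams interface: `InboundPipeline`, `register`, `ingest`, the `store` list standing in for the intermediate persistence layer, and the two example stages are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional

Activity = Dict[str, str]
# A stage returns the (possibly modified) activity, or None to drop it.
Stage = Callable[[Activity], Optional[Activity]]


class InboundPipeline:
    """Runs each incoming activity through registered stages, then persists it."""

    def __init__(self) -> None:
        self._stages: List[Stage] = []
        # Stand-in for the intermediate persistence layer.
        self.store: List[Activity] = []

    def register(self, stage: Stage) -> None:
        """Plug in a component; stages run in registration order."""
        self._stages.append(stage)

    def ingest(self, activity: Activity) -> bool:
        """Return True if the activity survived every stage and was stored."""
        current: Optional[Activity] = activity
        for stage in self._stages:
            current = stage(current)
            if current is None:
                return False  # a stage discarded the activity
        self.store.append(current)
        return True


def normalize_verb(activity: Activity) -> Activity:
    """Example stage: lower-case the verb so later stages compare consistently."""
    return {**activity, "verb": activity["verb"].lower()}


def make_deduplicator() -> Stage:
    """Example stage factory: drop activities whose actor/verb/object was already seen."""
    seen = set()

    def dedupe(activity: Activity) -> Optional[Activity]:
        key = (activity["actor"], activity["verb"], activity["object"])
        if key in seen:
            return None
        seen.add(key)
        return activity

    return dedupe
```

The point of the sketch is only that de-duplication, supersession, and aggregation run before anything reaches the store, so subscriber filters and query endpoints see already-cleaned data.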



Re: Question about processing architecture

Posted by Jason Letourneau <jl...@gmail.com>.
The current vision is that filters will be implemented agnostic to the
overall processing architecture.  There may be subscribers using the
Lucene DSL as part of the initial Streams implementation, but the
interface won't dictate how a subscriber filters its activities.  I
don't know that we've figured out whether the subscriber delegate tells
the aggregator its filter via an interface or whether the aggregator
tells each subscriber about every activity and the subscriber filters.
Either way, you can implement filters however you want, provided they
adhere to the common filter interface (which, to my recollection, is
very simplistic).
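
The two open options can be sketched side by side in Python. This is an illustration only, not the actual Streams filter interface: `ActivityFilter`, `accepts`, `VerbFilter`, `Subscriber`, and `Aggregator` are assumed names, and the single-method interface is a guess at what "very simplistic" might mean.

```python
from typing import Dict, List, Protocol

Activity = Dict[str, str]


class ActivityFilter(Protocol):
    """Assumed common filter interface: one yes/no decision per activity."""

    def accepts(self, activity: Activity) -> bool: ...


class VerbFilter:
    """One possible implementation; a Lucene-backed filter could be another."""

    def __init__(self, verb: str) -> None:
        self.verb = verb

    def accepts(self, activity: Activity) -> bool:
        return activity.get("verb") == self.verb


class Subscriber:
    def __init__(self, name: str, activity_filter: ActivityFilter) -> None:
        self.name = name
        self.filter = activity_filter
        self.inbox: List[Activity] = []

    def receive(self, activity: Activity) -> None:
        # Option B: the subscriber sees everything and filters on its own side.
        if self.filter.accepts(activity):
            self.inbox.append(activity)


class Aggregator:
    def __init__(self) -> None:
        self.subscribers: List[Subscriber] = []

    def publish_filtered(self, activity: Activity) -> None:
        # Option A: the aggregator knows each filter and only delivers matches.
        for sub in self.subscribers:
            if sub.filter.accepts(activity):
                sub.inbox.append(activity)

    def broadcast(self, activity: Activity) -> None:
        # Option B: the aggregator tells every subscriber about every activity.
        for sub in self.subscribers:
            sub.receive(activity)
```

Either path ends with the same inbox contents; the difference is where the filter runs and how many activities cross the wire, and the filter implementation itself does not change.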


Re: Question about processing architecture

Posted by Ryan Baxter <rb...@apache.org>.
To solve some of this in our activity stream in Connections we
introduced a concept called "rollup" [1] so if there are entries with
the same rollup id in the stream we only ever show the latest one.
The user can then open the embedded experience for the entry to see
the prior entries for this rollup id.

[1] http://www-10.lotus.com/ldd/appdevwiki.nsf/xpDocViewer.xsp?lookupName=IBM+Connections+4.5+API+Documentation#action=openDocument&res_title=Support_for_Rollup_ic45&content=pdcontent
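
As a rough illustration of the idea (not the Connections implementation or its API), the following Python sketch groups entries by rollup id and shows only the latest one, keeping the earlier entries available for a detail view. The `Entry` and `RolledUp` types, the `rollup_id` field name, and the `rollup_view` function are assumptions for the example.

```python
from typing import Dict, List, NamedTuple, Sequence


class Entry(NamedTuple):
    entry_id: str
    rollup_id: str
    title: str


class RolledUp(NamedTuple):
    latest: Entry       # the only entry shown in the stream
    prior: List[Entry]  # earlier entries, shown when the user opens the entry


def rollup_view(entries_oldest_first: Sequence[Entry]) -> List[RolledUp]:
    """Collapse entries sharing a rollup id, newest group first."""
    groups: Dict[str, List[Entry]] = {}
    for entry in entries_oldest_first:
        groups.setdefault(entry.rollup_id, []).append(entry)
    # Remember each entry's position so groups can be ordered by recency.
    position = {e.entry_id: i for i, e in enumerate(entries_oldest_first)}
    rolled = [RolledUp(latest=g[-1], prior=g[:-1]) for g in groups.values()]
    rolled.sort(key=lambda r: position[r.latest.entry_id], reverse=True)
    return rolled
```

A view that should not roll up, such as one where every entry needs its own action, would simply skip this step and list the entries as they are.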
