Posted to dev@directory.apache.org by Alex Karasulu <ao...@bellsouth.net> on 2004/09/21 23:40:19 UTC

Re: [chain] Pipeline implementation

Hi all,

Just this once I wanted to attempt to cross-pollinate both lists
(Directory and Commons) by posting this response to both.  I think there
are some very interesting ideas here.  Most likely we are not going to be
able to make the SEDA stuff this general-purpose - nor do we want to -
however, the patterns used and the problems in both spaces seem to
overlap a little.

It's good to share ideas on how different people in different spaces
solve very similar problems.  The entire trail has merit and can be
accessed here in eyebrowse:

http://nagoya.apache.org/eyebrowse/BrowseList?listName=commons-dev@jakarta.apache.org&by=thread&from=885656

Kris, my response to you is inline.

On Mon, 2004-09-20 at 13:37, Kris Nuttycombe wrote:
> Alex Karasulu wrote:
> 
> >Hi Kris,
> >
> >On Fri, 2004-09-17 at 19:29, Kris Nuttycombe wrote:
> >
> ><snip/>
> >  
> >
> >>How is routing handled if you have multiple subscribers registered that 
> >>can handle the same type of event? 
> >>    
> >>
> >
> >The event is delivered to all subscribers then.
> >  
> >
> >>Is it possible under your framework 
> >>to explicitly specify the notification sequence such that an event is 
> >>handled by one subscriber which raises the same variety of event to be 
> >>handled by the next subscriber?
> >>    
> >>
> >
> >I think we're currently exploring things like that.  Actually this is a
> >pretty good idea in some respects to achieve something we have been
> >searching for.  Thanks for the idea - I need to mull it over a bit.
> >  
> >
> Perhaps in the subscriber registration process one could pass a flag 
> that dictates whether processing of the event should be in parallel or 
> serialized? It could even be something as simple as an index where 

Funny you mentioned this.  A fellow named Trustin and I were discussing
these same tactics just last week.

> subscribers that share the same index have events processed in parallel. 
> Also, perhaps instead of returning void StageHandler.handleEvent() could 
> return a boolean value that flags whether or not the event is allowed to 
> propagate to other stages with higher serial numbers.
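Kris's suggestion above - serial indexes attached at subscription time, plus a boolean return from handleEvent() to veto propagation - could be sketched roughly as follows. All the names here (StageHandler, EventRouter, and so on) are illustrative stand-ins, not the actual SEDA API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

interface StageHandler {
    // Returning false stops the event from propagating to handlers
    // registered with higher serial numbers.
    boolean handleEvent(Object event);
}

class EventRouter {
    // Handlers keyed by serial index: handlers sharing an index would
    // conceptually process the event in parallel; distinct indexes
    // are visited in ascending order.
    private final TreeMap<Integer, List<StageHandler>> stages = new TreeMap<>();

    void subscribe(int serialIndex, StageHandler handler) {
        stages.computeIfAbsent(serialIndex, k -> new ArrayList<>()).add(handler);
    }

    // Deliver the event stage by stage; a false return from any handler
    // in a stage vetoes delivery to all later stages.
    void publish(Object event) {
        for (List<StageHandler> stage : stages.values()) {
            boolean propagate = true;
            for (StageHandler h : stage) {
                propagate &= h.handleEvent(event);
            }
            if (!propagate) {
                break;
            }
        }
    }
}
```

For example, a handler at index 1 that returns false would keep the event from ever reaching handlers at index 2.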

That's also another good idea.  This almost reminds me of rule salience
in expert system shells.  What stage does the event have the most
affinity for?

> >You still don't need to give up on type safety with this pub/sub model
> >because type safety is honored in the event types and the subscribers. 
> >But you still have dynamism.
> >  
> >
> The StageHandler still needs to do a runtime cast to get a useful event 
> class in any case, so I don't see how this really gives you any more 
> type safety, and you give up some modularity. If I have six different 
> types of geometry processors that need to run serially, it doesn't make 
> sense to have each of them generate a different event type because the 
> order in which these stages should be run may vary from dataset to 
> dataset. I guess I could work around this by having each stage generate 
> a unique event and changing the subscription arrangements in each case, 

We also discussed something similar.  We're still exploring the best
route to take.  What we have is definitely not set in stone.

> but then it seems like you have bleeding of the application logic into 
> the configuration realm. Maybe one could modify the StageHandler 
> interface by adding a method that allows you to query for the runtime 
> class of the event returned to get around this problem.

I don't understand the "bleeding of the application logic" comment. 
Could you clarify this some more and explain how this is removed when
the class of the event can be queried?

> >RE: the synchronization constructs needed for sequential processing
> >through portions of the pipeline we are in the process of thinking about
> >how we can do this without a lot of synch overhead.  Perhaps we can
> >think about this one together.
> >  
> >
> >>Doesn't this cause problems if your StageHandler raises the same variety 
> >>of event that caused it to be invoked in the first place?
> >>    
> >>
> >
> >If there is no terminating condition, yes, I would suppose so; however,
> >none of our stages generate the same event they process.  If they did,
> >this could produce an infinite event processing loop.
> >  
> >
> We do things like this all the time, but I'm beginning to see how we 
> could get around it by having a base event type that related stages all 
> process and have each stage raise a subtype of that event. Seems a bit 
> like going the long way around the horn for our use case, but it might 
> add enough value to be worth it.
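The subtyping workaround Kris describes - related stages subscribing to a shared base event, with each stage raising its own subtype - might look like this minimal sketch (all class names here are made up for illustration):

```java
// Illustrative only: a stage ignores the subtype it raises itself,
// which breaks the feedback loop described above.
class GeometryEvent { }                         // base type the stages share
class SimplifiedEvent extends GeometryEvent { } // raised by this stage

class SimplifyStage {
    // Handles any GeometryEvent except its own output type.
    GeometryEvent handle(GeometryEvent e) {
        if (e instanceof SimplifiedEvent) {
            return null; // our own output: do not process again
        }
        return new SimplifiedEvent();
    }
}
```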

Well, this way may not be the best way for you.  This is our first
attempt at using the pub/sub pattern.  Questions about subtyping versus
other means have been discussed.  Right now we simply don't know which
way is the best way.

I think this is why we have started to write some simple protocols to
test the SEDA framework and see which techniques best suit our use
cases.

Alex  


Re: [chain] Pipeline implementation

Posted by Alex Karasulu <ao...@bellsouth.net>.
On Tue, 2004-09-21 at 19:46, Kris Nuttycombe wrote:
> Alex Karasulu wrote:
> 
> >>subscribers that share the same index have events processed in parallel. 
> >>Also, perhaps instead of returning void StageHandler.handleEvent() could 
> >>return a boolean value that flags whether or not the event is allowed to 
> >>propagate to other stages with higher serial numbers.
> >
> >
> >That's also another good idea.  This almost reminds me of rule salience
> >in expert system shells.  What stage does the event have the most
> >affinity for?
> >  
> >
> I hadn't thought of things in this context, but both the stage/event 
> handling pieces of the SEDA framework and the pipeline we've developed 
> here do seem a lot like frameworks for building specialized expert 
> systems with concurrent processing.

Good characterization! I like that.

> >>but then it seems like you have bleeding of the application logic into 
> >>the configuration realm. Maybe one could modify the StageHandler 
> >>interface by adding a method that allows you to query for the runtime 
> >>class of the event returned to get around this problem.
> >>    
> >>
> >
> >I don't understand the "bleeding of the application logic" comment. 
> >Could you clarify this some more and explain how this is removed when
> >the class of the event can be queried?
> >  
> >
> StageHandler's handleEvent() method is regularly responsible for raising 
> events and passing them back to the event router, right? The problem is 
> that there's nothing in the public API that makes it clear what events a 
> particular StageHandler may generate, so establishing a routing scheme 
> is a manual process that involves the programmer having knowledge of the 
> StageHandler's internals. In a situation where you're trying to set up a 
> linear routing scheme from a configuration file, it would make more 
> sense that the ordering of elements in that file would determine the 
> routing. If it's possible for a configuration tool to look at a 
> StageHandler and determine what events the handleEvent method has the 
> potential to raise, then automatic configuration becomes much simpler. 
> It might also be useful to define a method on the interface that allows 
> a handler to announce what events it can handle.
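One hypothetical way to express what Kris proposes - handlers announcing the event classes they consume and may raise, so a configuration tool can validate routing automatically - is sketched below; none of these method names come from the actual SEDA or pipeline code:

```java
import java.util.Set;

interface IntrospectableStageHandler {
    Set<Class<?>> handledEventTypes();  // what this stage can consume
    Set<Class<?>> raisedEventTypes();   // what handleEvent() may emit
    boolean handleEvent(Object event);
}

class RoutingValidator {
    // A configuration tool could check, before the pipeline ever runs,
    // that some event raised by stage a is consumable by stage b.
    static boolean canFollow(IntrospectableStageHandler a,
                             IntrospectableStageHandler b) {
        for (Class<?> raised : a.raisedEventTypes()) {
            for (Class<?> handled : b.handledEventTypes()) {
                if (handled.isAssignableFrom(raised)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```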

Thanks, I see now what you were referring to.

> >>We do things like this all the time, but I'm beginning to see how we 
> >>could get around it by having a base event type that related stages all 
> >>process and have each stage raise a subtype of that event. Seems a bit 
> >>like going the long way around the horn for our use case, but it might 
> >>add enough value to be worth it.
> >>    
> >>
> >
> >Well, this way may not be the best way for you.  This is our first
> >attempt at using the pub/sub pattern.  Questions about subtyping versus
> >other means have been discussed.  Right now we simply don't know which
> >way is the best way.  
> >  
> >
> 
> I think that the pub/sub model definitely has the potential to be a lot 
> more powerful than our current approach; it's just a matter of 
> developing the interfaces to make them flexible enough to support use 
> cases for both projects. I think that our use cases are different enough 
> that if we can find a model that satisfies both it will be a broadly 
> useful framework.

Yeah, we just really need to be careful about trying to give birth to a
panacea from the start.  I'm more in the relaxed mindset, thinking we can
have two frameworks and see where we can reach common ground.  I would
love to work on both.  I think we will see more convergence in some areas
and divergence in others; as we gain more experience and understand all
the possible use cases, we can converge where we can.

> Initially Craig had suggested setting up a commons-pipeline project in 
> the sandbox. I've been preparing our code (licenses, submission 
> agreements, etc) to make this transition. Are you at all interested in 
> refactoring out the stage, event routing, and thread handling pieces 
> from the network-oriented bits of SEDA into this project? 

Sure I would also love to be involved.

> There are 
> definitely parts of your code that I'd like to be able to use without 
> forking them, although I'm sure you don't really want to introduce 
> extraneous dependencies.

I think we can fork it - the code is not all that much.  I would rather
fork and come to a point where we reconverge instead of slowing down
development because of opposing needs.  This is a new start for both
projects; there should be room to aggressively experiment or alter their
course before binding them tightly at the hip with a common shared API.

This may be a little radical in thought, but I think common libs don't
appear overnight - they are slowly purified from tons of different use
cases.  This, IMO, is what makes commons so successful.  But I'm totally
ready to work on the other extreme too - I have a slight case of
schizophrenia :).  If you have a preference I can go with it.

Cheers, 
Alex




---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [chain] Pipeline implementation

Posted by Alex Karasulu <ao...@bellsouth.net>.
On Wed, 2004-09-22 at 10:33, Kris Nuttycombe wrote:
<snip/>

> I'll try and get our code to Craig today so that people can have a 
> closer look at it.

Excellent.  I think this can help me see more overlap as well.

Alex




RE: [chain] Pipeline implementation

Posted by Rory Winston <rw...@eircom.net>.
Kris,

This sounds fantastic!  I would be very keen on taking a look at this when
you manage to get it into the repository.  I would concur that something like
this would probably sit better as a separate project (commons-pipeline,
perhaps?). Sorry if I am repeating any of the previous correspondence, but I
haven't had a chance to check the list yet - do you use NIO in this project,
and do your Connectors just conform to a generic interface? If so, what is
the contract for their input/output?

Cheers,
Rory

-----Original Message-----
From: Kris Nuttycombe [mailto:Kris.Nuttycombe@noaa.gov]
Sent: 22 September 2004 15:34
To: Jakarta Commons Developers List
Subject: Re: [chain] Pipeline implementation


This is exactly the sort of use case we have developed our pipeline
under, and there are in fact already plugins to obtain data over FTP and
HTTP, and work is ongoing on an XSLT plugin.

All of our pipeline implementations are configured from XML files using
Digester. It's interesting that you mention commons-chain because that
was where I started, as well. Earlier on in the thread we discussed the
relationship of commons-chain to the pipeline idea, and came to the
conclusion that since chain is more about decision making than
concurrent processing, pipeline should probably be a separate project
with an adapter that allows a chain to be used for decision making and
processing of a single stage in the pipeline.
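The adapter idea mentioned here might take a shape like the following. The Command and Context types below only mirror the commons-chain 1.x contract (a boolean execute(Context) over a map-like context) and are declared locally so the sketch stands alone; Stage is a hypothetical pipeline interface, not settled API:

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-ins mirroring the commons-chain shapes.
interface Context extends Map<String, Object> { }
class ContextBase extends HashMap<String, Object> implements Context { }
interface Command {
    boolean execute(Context context) throws Exception; // true = processing complete
}

// Hypothetical pipeline-stage contract.
interface Stage {
    Object process(Object input) throws Exception;
}

// Adapter: runs a chain Command as a single pipeline stage, passing the
// stage input in through the context and reading the result back out.
class ChainStageAdapter implements Stage {
    private final Command command;

    ChainStageAdapter(Command command) {
        this.command = command;
    }

    public Object process(Object input) throws Exception {
        Context ctx = new ContextBase();
        ctx.put("stage.input", input);
        command.execute(ctx);
        return ctx.get("stage.output");
    }
}
```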

I'll try and get our code to Craig today so that people can have a
closer look at it.

Kris

Rory Winston wrote:

<snip/>

--
=====================================================
Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337
Kris.Nuttycombe@noaa.gov
=====================================================





Re: [chain] Pipeline implementation

Posted by Kris Nuttycombe <Kr...@noaa.gov>.
This is exactly the sort of use case we have developed our pipeline 
under, and there are in fact already plugins to obtain data over FTP and 
HTTP, and work is ongoing on an XSLT plugin.

All of our pipeline implementations are configured from XML files using 
Digester. It's interesting that you mention commons-chain because that 
was where I started, as well. Earlier on in the thread we discussed the 
relationship of commons-chain to the pipeline idea, and came to the 
conclusion that since chain is more about decision making than 
concurrent processing, pipeline should probably be a separate project 
with an adapter that allows a chain to be used for decision making and 
processing of a single stage in the pipeline.

I'll try and get our code to Craig today so that people can have a 
closer look at it.

Kris

Rory Winston wrote:

>I've been following bits of this thread (will re-read the whole thread later
>when I have time), and I am fascinated with this approach. I have been
>mulling over a similar implementation for a new framework, which comes from
>real-world requirements in my work. The basic idea will be a pipeline, which
>may have many inputs, and many outputs. The outputs are configurable, and
>could be e.g. FTP, HTTP, SOAP, JMS, JDBC, JCA(??) etc. An input arrives at
>one end of the pipeline via a connector, and passes through each stage in
>the pipeline. Each individual stage takes an input and produces an output,
>and the "work" done in each pipeline stage can be performed via a plugin, of
>sorts. A hook will be provided in each stage so that a developer could
>insert his/her custom code. Here is a simple diagram of what I have in mind.
>This is a trivial plugin that reads line data files from an FTP connector,
>transforms the line data to XML via a predefined transform, then passes
>through another pipeline stage that transforms the XML files to PDF. The
>final stage is another connector that stores the generated PDF files in a
>database.
>
>
>
>    .-----------.     .------------------.     .------------------.     .-----------.
>    |  Conn. A  |---->| Pipeline Stage A |---->| Pipeline Stage B |---->|  Conn. B  |
>    |   (FTP)   |     |                  |     |                  |     |  (JDBC)   |
>    '-----------'     '------------------'     '------------------'     '-----------'
>         |                     |                       |                      |
>     Read files      Transform line data         Transform XML           Write PDF
>                           to XML                    to PDF                 to DB
>
>
>Thinking about this problem, there are a few things that are needed:
>
> - A pipeline-based processing model
> - A generic connector API
> - Event-driven async. pipeline processing
>
>A workflow specification (possibly XML) could define the pipeline "flow",
>and this could be generated via a GUI. The WebMethods Integration platform
>is the slickest example of this approach that I have seen.
>
>My questions are thus : could Commons-Chain (+ pipeline) be used as the
>basis for this type of processing? Are there any other open-source
>frameworks that anyone knows of that do this already?
>
>Cheers,
>Rory
>
>
><snip/>

-- 
=====================================================
Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337
Kris.Nuttycombe@noaa.gov
=====================================================





RE: [chain] Pipeline implementation

Posted by Rory Winston <rw...@eircom.net>.
Alex,

>Very interesting...  almost sounds like a generalized cocoon or
>something.  I like how you are building up using protocol transports to
>pump stuff in and out of this.  It's definitely a nice high level
>example of what can be done with pipelining and a juicy set of
>connectors.


Exactly - a "generalized Cocoon" was almost exactly what I was thinking of.
I like Cocoon's architecture - what I would like to see is an even more
generic implementation, utilizing async processing, and with a generalized
connector capability, so the endpoints could be HTTP/FTP/JMS/SOAP/JDBC, or
even ERP/CMS systems, via a JCA-type capability. The pipeline is there to
transform and massage the data along the way, and ideally, if there was
some sort of workflow or process definition protocol, the pipeline could
have the capability to "branch" at certain points. The pipeline stages may
have a generic "hook" for custom code, and may be reusable, so a rich
workflow could be built up quickly. Again, I'm using WebMethods
Integration for inspiration here - it's fantastic, and I haven't seen
anything in the OSS sphere that goes down exactly the same route.
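The generalized connector capability Rory describes - FTP/HTTP/JMS/JDBC endpoints behind one interface - might have a contract along these lines. This is purely an illustrative sketch; none of the names come from an actual framework:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical generic connector contract: an inbound connector produces
// raw input for the pipeline, an outbound one consumes its final output.
interface Connector {
    InputStream read() throws IOException;
    void write(InputStream data) throws IOException;
}

// A trivial in-memory endpoint standing in for, say, an FTP or JDBC
// connector, just to show the input/output shape of the contract.
class MemoryConnector implements Connector {
    private byte[] buffer = new byte[0];

    public InputStream read() {
        return new ByteArrayInputStream(buffer);
    }

    public void write(InputStream data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        data.transferTo(out);
        buffer = out.toByteArray();
    }
}
```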


Thanks for the link to Coconut, I'll check it out!

Cheers,
Rory

-----Original Message-----
From: Alex Karasulu [mailto:aok123@bellsouth.net]
Sent: 22 September 2004 16:22
To: Jakarta Commons Developers List
Subject: RE: [chain] Pipeline implementation


On Wed, 2004-09-22 at 04:53, Rory Winston wrote:
> I've been following bits of this thread (will re-read the whole thread
later
> when I have time), and I am fascinated with this approach. I have been
> mulling over a similar implementation for a new framework, which comes
from
> real-world requirements in my work. The basic idea will be a pipeline,
which
> may have many inputs, and many outputs. The outputs are configurable, and
> could be e.g. FTP, HTTP, SOAP, JMS, JDBC, JCA(??) etc. An input arrives at
> one end of the pipeline via a connector, and passes through each stage in
> the pipeline. Each individual stage takes an input and produces an output,
> and the "work" done in each pipeline stage can be performed via a plugin,
of
> sorts. A hook will be provided in each stage so that a developer could
> insert his/her custom code. Here is a simple diagram of what I have in
mind.
> This is a trivial plugin that reads line data files from an FTP connector,
> transforms the line data to XML via a predefined transform, then passes
> through another pipeline stage that transforms the XML files to PDF. The
> final stage is another connector that stores the generated PDF files in a
> database.

Very interesting...  almost sounds like a generalized cocoon or
something.  I like how you are building up using protocol transports to
pump stuff in and out of this.  It's definitely a nice high level
example of what can be done with pipelining and a juicy set of
connectors.

<snip/>

Sorry to have lost yer ascii art :(.

> Thinking about this problem, there are a few things that are needed:
>
>  - A pipeline-based processing model
>  - A generic connector API
>  - Event-driven async. pipeline processing


> A workflow specification (possibly XML) could define the pipeline "flow",
> and this could be generated via a GUI. The WebMethods Integration platform
> is the slickest example of this approach that I have seen.
>
> My questions are thus: could Commons-Chain (+ pipeline) be used as the
> basis for this type of processing? Are there any other open-source
> frameworks that anyone knows of that do this already?

This is sounding more and more like you can use a combination of chain
and seda code.  Also there is another effort which is more on the SEDA
and Async IO side called coconut at the codehaus which has similar
functionality but its missing some of the aspects of the chain library
here:

http://coconut.codehaus.org/

> -----Original Message-----
> From: Kris Nuttycombe [mailto:Kris.Nuttycombe@noaa.gov]
> Sent: 22 September 2004 00:47
> To: Jakarta Commons Developers List
> Subject: Re: [chain] Pipeline implementation
>
>
> Alex Karasulu wrote:
>
> >>subscribers that share the same index have events processed in parallel.
> >>Also, perhaps instead of returning void StageHandler.handleEvent() could
> >>return a boolean value that flags whether or not the event is allowed to
> >>propagate to other stages with higher serial numbers.
> >
> >
> >That's also another good idea.  This almost reminds me of rule salience
> >in expert system shells.  What stage does the event have the most
> >affinity for?
> >
> >
> I hadn't thought of things in this context, but both the stage/event
> handling pieces of the SEDA framework and the pipeline we've developed
> here do seem a lot like frameworks for building specialized expert
> systems with concurrent processing.
>
> >>but then it seems like you have bleeding of the application logic into
> >>the configuration realm. Maybe one could modify the StageHandler
> >>interface by adding a method that allows you to query for the runtime
> >>class of the event returned to get around this problem.
> >>
> >>
> >
> >I don't understand the "bleeding of the application logic" comment.
> >Could you clarify this some more and explain how this is removed when
> >the class of the event can be queried?
> >
> >
> StageHandler's handleEvent() method is regularly responsible for raising
> events and passing them back to the event router, right? The problem is
> that there's nothing in the public API that makes it clear what events a
> particular StageHandler may generate, so establishing a routing scheme
> is a manual process that involves the programmer having knowledge of the
> StageHandler's internals. In a situation where you're trying to set up a
> linear routing scheme from a configuration file, it would make more
> sense that the ordering of elements in that file would determine the
> routing. If it's possible for a configuration tool to look at a
> StageHandler and determine what events the handleEvent method has the
> potential to raise, then automatic configuration becomes much simpler.
> It might also be useful to define a method on the interface that allows
> a handler to announce what events it can handle.
>
> >>We do things like this all the time, but I'm beginning to see how we
> >>could get around it by having a base event type that related stages all
> >>process and have each stage raise a subtype of that event. Seems a bit
> >>like going the long way around the horn for our use case, but it might
> >>add enough value to be worth it.
> >>
> >>
> >
> >Well this way may not be the best way for you.  This is our first
> >attempt using the pub/sub pattern.  Questions about subtyping versus
> >other means have been discussed.  Right now we simply don't know which
> >way is the best way.
> >
> >
>
> I think that the pub/sub model definitely has the potential to be a lot
> more powerful than our current approach; it's just a matter of
> developing the interfaces to make them flexible enough to support use
> cases for both projects. I think that our use cases are different enough
> that if we can find a model that satisfies both it will be a broadly
> useful framework.
>
> Initially Craig had suggested setting up a commons-pipeline project in
> the sandbox. I've been preparing our code (licenses, submission
> agreements, etc) to make this transition. Are you at all interested in
> refactoring out the stage, event routing, and thread handling pieces
> from the network-oriented bits of SEDA into this project? There are
> definitely parts of your code that I'd like to be able to use without
> forking them, although I'm sure you don't really want to introduce
> extraneous dependencies.
>
> Kris
>
> --
> =====================================================
> Kris Nuttycombe
> Associate Scientist
> Geospatial Data Services Group
> CIRES, National Geophysical Data Center/NOAA
> (303) 497-6337
> Kris.Nuttycombe@noaa.gov
> =====================================================
>
>
>




RE: [chain] Pipeline implementation

Posted by Alex Karasulu <ao...@bellsouth.net>.
On Wed, 2004-09-22 at 04:53, Rory Winston wrote:
> I've been following bits of this thread (will re-read the whole thread later
> when I have time), and I am fascinated with this approach. I have been
> mulling over a similar implementation for a new framework, which comes from
> real-world requirements in my work. The basic idea will be a pipeline, which
> may have many inputs, and many outputs. The outputs are configurable, and
> could be e.g. FTP, HTTP, SOAP, JMS, JDBC, JCA(??) etc. An input arrives at
> one end of the pipeline via a connector, and passes through each stage in
> the pipeline. Each individual stage takes an input and produces an output,
> and the "work" done in each pipeline stage can be performed via a plugin, of
> sorts. A hook will be provided in each stage so that a developer could
> insert his/her custom code. Here is a simple diagram of what I have in mind.
> This is a trivial plugin that reads line data files from an FTP connector,
> transforms the line data to XML via a predefined transform, then passes
> through another pipeline stage that transforms the XML files to PDF. The
> final stage is another connector that stores the generated PDF files in a
> database.

Very interesting...  almost sounds like a generalized cocoon or
something.  I like how you are building up using protocol transports to
pump stuff in and out of this.  It's definitely a nice high level
example of what can be done with pipelining and a juicy set of
connectors.

<snip/>

Sorry to have lost yer ascii art :(.

> Thinking about this problem, there are a few things that are needed:
> 
>  - A pipeline-based processing model
>  - A generic connector API
>  - Event-driven async. pipeline processing


> A workflow specification (possibly XML) could define the pipeline "flow",
> and this could be generated via a GUI. The WebMethods Integration platform
> is the slickest example of this approach that I have seen.
> 
> My questions are thus: could Commons-Chain (+ pipeline) be used as the
> basis for this type of processing? Are there any other open-source
> frameworks that anyone knows of that do this already?

This is sounding more and more like you can use a combination of chain
and seda code.  Also there is another effort which is more on the SEDA
and Async IO side called coconut at the codehaus which has similar
functionality but its missing some of the aspects of the chain library
here:

http://coconut.codehaus.org/

> -----Original Message-----
> From: Kris Nuttycombe [mailto:Kris.Nuttycombe@noaa.gov]
> Sent: 22 September 2004 00:47
> To: Jakarta Commons Developers List
> Subject: Re: [chain] Pipeline implementation
> 
> 
> Alex Karasulu wrote:
> 
> >>subscribers that share the same index have events processed in parallel.
> >>Also, perhaps instead of returning void StageHandler.handleEvent() could
> >>return a boolean value that flags whether or not the event is allowed to
> >>propagate to other stages with higher serial numbers.
> >
> >
> >That's also another good idea.  This almost reminds me of rule salience
> >in expert system shells.  What stage does the event have the most
> >affinity for?
> >
> >
> I hadn't thought of things in this context, but both the stage/event
> handling pieces of the SEDA framework and the pipeline we've developed
> here do seem a lot like frameworks for building specialized expert
> systems with concurrent processing.
> 
> >>but then it seems like you have bleeding of the application logic into
> >>the configuration realm. Maybe one could modify the StageHandler
> >>interface by adding a method that allows you to query for the runtime
> >>class of the event returned to get around this problem.
> >>
> >>
> >
> >I don't understand the "bleeding of the application logic" comment.
> >Could you clarify this some more and explain how this is removed when
> >the class of the event can be queried?
> >
> >
> StageHandler's handleEvent() method is regularly responsible for raising
> events and passing them back to the event router, right? The problem is
> that there's nothing in the public API that makes it clear what events a
> particular StageHandler may generate, so establishing a routing scheme
> is a manual process that involves the programmer having knowledge of the
> StageHandler's internals. In a situation where you're trying to set up a
> linear routing scheme from a configuration file, it would make more
> sense that the ordering of elements in that file would determine the
> routing. If it's possible for a configuration tool to look at a
> StageHandler and determine what events the handleEvent method has the
> potential to raise, then automatic configuration becomes much simpler.
> It might also be useful to define a method on the interface that allows
> a handler to announce what events it can handle.
> 
> >>We do things like this all the time, but I'm beginning to see how we
> >>could get around it by having a base event type that related stages all
> >>process and have each stage raise a subtype of that event. Seems a bit
> >>like going the long way around the horn for our use case, but it might
> >>add enough value to be worth it.
> >>
> >>
> >
> >Well this way may not be the best way for you.  This is our first
> >attempt using the pub/sub pattern.  Questions about subtyping versus
> >other means have been discussed.  Right now we simply don't know which
> >way is the best way.
> >
> >
> 
> I think that the pub/sub model definitely has the potential to be a lot
> more powerful than our current approach; it's just a matter of
> developing the interfaces to make them flexible enough to support use
> cases for both projects. I think that our use cases are different enough
> that if we can find a model that satisfies both it will be a broadly
> useful framework.
> 
> Initially Craig had suggested setting up a commons-pipeline project in
> the sandbox. I've been preparing our code (licenses, submission
> agreements, etc) to make this transition. Are you at all interested in
> refactoring out the stage, event routing, and thread handling pieces
> from the network-oriented bits of SEDA into this project? There are
> definitely parts of your code that I'd like to be able to use without
> forking them, although I'm sure you don't really want to introduce
> extraneous dependencies.
> 
> Kris
> 
> --
> =====================================================
> Kris Nuttycombe
> Associate Scientist
> Geospatial Data Services Group
> CIRES, National Geophysical Data Center/NOAA
> (303) 497-6337
> Kris.Nuttycombe@noaa.gov
> =====================================================
> 
> 
> 




RE: [chain] Pipeline implementation

Posted by Rory Winston <rw...@eircom.net>.
Oops. Looks like my line length has messed up my ASCII art :)

The actual pipeline should look like this:


 *Connector A (FTP) ------> Pipeline Stage A (transform line data to XML) ------> Pipeline Stage B (transform XML to PDF) ------> Connector B (JDBC)*

-----Original Message-----
From: Rory Winston [mailto:rwinston@eircom.net]
Sent: 22 September 2004 09:53
To: Jakarta Commons Developers List
Subject: RE: [chain] Pipeline implementation


I've been following bits of this thread (will re-read the whole thread later
when I have time), and I am fascinated with this approach. I have been
mulling over a similar implementation for a new framework, which comes from
real-world requirements in my work. The basic idea will be a pipeline, which
may have many inputs, and many outputs. The outputs are configurable, and
could be e.g. FTP, HTTP, SOAP, JMS, JDBC, JCA(??) etc. An input arrives at
one end of the pipeline via a connector, and passes through each stage in
the pipeline. Each individual stage takes an input and produces an output,
and the "work" done in each pipeline stage can be performed via a plugin, of
sorts. A hook will be provided in each stage so that a developer could
insert his/her custom code. Here is a simple diagram of what I have in mind.
This is a trivial plugin that reads line data files from an FTP connector,
transforms the line data to XML via a predefined transform, then passes
through another pipeline stage that transforms the XML files to PDF. The
final stage is another connector that stores the generated PDF files in a
database.



     Connector A (FTP: read files) --> Pipeline Stage A (transform line data to XML) --> Pipeline Stage B (transform XML to PDF) --> Connector B (JDBC: write PDF to DB)


Thinking about this problem, there are a few things that are needed:

 - A pipeline-based processing model
 - A generic connector API
 - Event-driven async. pipeline processing

A workflow specification (possibly XML) could define the pipeline "flow",
and this could be generated via a GUI. The WebMethods Integration platform
is the slickest example of this approach that I have seen.
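A flow definition of the kind mentioned above could be as simple as the following fragment. Everything here is illustrative only: the element names, the stage classes, and the URIs are hypothetical, since no such schema exists yet.

```xml
<!-- Hypothetical workflow definition; all names are illustrative only. -->
<pipeline name="ftp-to-pdf">
  <connector id="in"    type="ftp"  uri="ftp://example.org/incoming/"/>
  <stage     id="toXml" class="com.example.LineDataToXmlStage"/>
  <stage     id="toPdf" class="com.example.XmlToPdfStage"/>
  <connector id="out"   type="jdbc" uri="jdbc:hsqldb:mem:docs"/>
  <flow>in -> toXml -> toPdf -> out</flow>
</pipeline>
```

A GUI builder in the WebMethods style would then just be an editor for this file, and branching could be expressed by letting the flow element name several downstream stages.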

My questions are thus: could Commons-Chain (+ pipeline) be used as the
basis for this type of processing? Are there any other open-source
frameworks that anyone knows of that do this already?

Cheers,
Rory


-----Original Message-----
From: Kris Nuttycombe [mailto:Kris.Nuttycombe@noaa.gov]
Sent: 22 September 2004 00:47
To: Jakarta Commons Developers List
Subject: Re: [chain] Pipeline implementation


Alex Karasulu wrote:

>>subscribers that share the same index have events processed in parallel.
>>Also, perhaps instead of returning void StageHandler.handleEvent() could
>>return a boolean value that flags whether or not the event is allowed to
>>propagate to other stages with higher serial numbers.
>
>
>That's also another good idea.  This almost reminds me of rule salience
>in expert system shells.  What stage does the event have the most
>affinity for?
>
>
I hadn't thought of things in this context, but both the stage/event
handling pieces of the SEDA framework and the pipeline we've developed
here do seem a lot like frameworks for building specialized expert
systems with concurrent processing.

>>but then it seems like you have bleeding of the application logic into
>>the configuration realm. Maybe one could modify the StageHandler
>>interface by adding a method that allows you to query for the runtime
>>class of the event returned to get around this problem.
>>
>>
>
>I don't understand the "bleeding of the application logic" comment.
>Could you clarify this some more and explain how this is removed when
>the class of the event can be queried?
>
>
StageHandler's handleEvent() method is regularly responsible for raising
events and passing them back to the event router, right? The problem is
that there's nothing in the public API that makes it clear what events a
particular StageHandler may generate, so establishing a routing scheme
is a manual process that involves the programmer having knowledge of the
StageHandler's internals. In a situation where you're trying to set up a
linear routing scheme from a configuration file, it would make more
sense that the ordering of elements in that file would determine the
routing. If it's possible for a configuration tool to look at a
StageHandler and determine what events the handleEvent method has the
potential to raise, then automatic configuration becomes much simpler.
It might also be useful to define a method on the interface that allows
a handler to announce what events it can handle.

>>We do things like this all the time, but I'm beginning to see how we
>>could get around it by having a base event type that related stages all
>>process and have each stage raise a subtype of that event. Seems a bit
>>like going the long way around the horn for our use case, but it might
>>add enough value to be worth it.
>>
>>
>
>Well this way may not be the best way for you.  This is our first
>attempt using the pub/sub pattern.  Questions about subtyping versus
>other means have been discussed.  Right now we simply don't know which
>way is the best way.
>
>

I think that the pub/sub model definitely has the potential to be a lot
more powerful than our current approach; it's just a matter of
developing the interfaces to make them flexible enough to support use
cases for both projects. I think that our use cases are different enough
that if we can find a model that satisfies both it will be a broadly
useful framework.

Initially Craig had suggested setting up a commons-pipeline project in
the sandbox. I've been preparing our code (licenses, submission
agreements, etc) to make this transition. Are you at all interested in
refactoring out the stage, event routing, and thread handling pieces
from the network-oriented bits of SEDA into this project? There are
definitely parts of your code that I'd like to be able to use without
forking them, although I'm sure you don't really want to introduce
extraneous dependencies.

Kris

--
=====================================================
Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337
Kris.Nuttycombe@noaa.gov
=====================================================





RE: [chain] Pipeline implementation

Posted by Rory Winston <rw...@eircom.net>.
I've been following bits of this thread (will re-read the whole thread later
when I have time), and I am fascinated with this approach. I have been
mulling over a similar implementation for a new framework, which comes from
real-world requirements in my work. The basic idea will be a pipeline, which
may have many inputs, and many outputs. The outputs are configurable, and
could be e.g. FTP, HTTP, SOAP, JMS, JDBC, JCA(??) etc. An input arrives at
one end of the pipeline via a connector, and passes through each stage in
the pipeline. Each individual stage takes an input and produces an output,
and the "work" done in each pipeline stage can be performed via a plugin, of
sorts. A hook will be provided in each stage so that a developer could
insert his/her custom code. Here is a simple diagram of what I have in mind.
This is a trivial plugin that reads line data files from an FTP connector,
transforms the line data to XML via a predefined transform, then passes
through another pipeline stage that transforms the XML files to PDF. The
final stage is another connector that stores the generated PDF files in a
database.



     Connector A (FTP: read files) --> Pipeline Stage A (transform line data to XML) --> Pipeline Stage B (transform XML to PDF) --> Connector B (JDBC: write PDF to DB)


Thinking about this problem, there are a few things that are needed:

 - A pipeline-based processing model
 - A generic connector API
 - Event-driven async. pipeline processing

A workflow specification (possibly XML) could define the pipeline "flow",
and this could be generated via a GUI. The WebMethods Integration platform
is the slickest example of this approach that I have seen.

My questions are thus: could Commons-Chain (+ pipeline) be used as the
basis for this type of processing? Are there any other open-source
frameworks that anyone knows of that do this already?

Cheers,
Rory


-----Original Message-----
From: Kris Nuttycombe [mailto:Kris.Nuttycombe@noaa.gov]
Sent: 22 September 2004 00:47
To: Jakarta Commons Developers List
Subject: Re: [chain] Pipeline implementation


Alex Karasulu wrote:

>>subscribers that share the same index have events processed in parallel.
>>Also, perhaps instead of returning void StageHandler.handleEvent() could
>>return a boolean value that flags whether or not the event is allowed to
>>propagate to other stages with higher serial numbers.
>
>
>That's also another good idea.  This almost reminds me of rule salience
>in expert system shells.  What stage does the event have the most
>affinity for?
>
>
I hadn't thought of things in this context, but both the stage/event
handling pieces of the SEDA framework and the pipeline we've developed
here do seem a lot like frameworks for building specialized expert
systems with concurrent processing.

>>but then it seems like you have bleeding of the application logic into
>>the configuration realm. Maybe one could modify the StageHandler
>>interface by adding a method that allows you to query for the runtime
>>class of the event returned to get around this problem.
>>
>>
>
>I don't understand the "bleeding of the application logic" comment.
>Could you clarify this some more and explain how this is removed when
>the class of the event can be queried?
>
>
StageHandler's handleEvent() method is regularly responsible for raising
events and passing them back to the event router, right? The problem is
that there's nothing in the public API that makes it clear what events a
particluar StageHandler may generate, so establishing a routing scheme
is a manual process that involves the programmer having knowledge of the
StageHandler's internals. In a situation where you're trying to set up a
linear routing scheme from a configuration file, it would make more
sense that the ordering of elements in that file would determine the
routing. If it's possible for a configuration tool to look at a
StageHandler and determine what events the handleEvent method has the
potential to raise, then automatic configuration becomes much simpler.
It might also be useful to define a method on the interface that allows
a handler to announce what events it can handle.

>>We do things like this all the time, but I'm beginning to see how we
>>could get around it by having a base event type that related stages all
>>process and have each stage raise a subtype of that event. Seems a bit
>>like going the long way around the horn for our use case, but it might
>>add enough value to be worth it.
>>
>>
>
>Well this way may not be the best way for you.  This is our first
>attempt using the pub/sub pattern.  Questions about subtyping verses
>other means have been discussed.  Right now we simply don't know which
>way is the best way.
>
>

I think that the pub/sub model definitely has the potential to be a lot
more powerful than our current approach; it's just a matter of
developing the interfaces to make them flexible enough to support use
cases for both projects. I think that our use cases are different enough
that if we can find a model that satisfies both it will be a broadly
useful framework.

Initially Craig had suggested setting up a commons-pipeline project in
the sandbox. I've been preparing our code (licenses, submission
agreements, etc) to make this transition. Are you at all interested in
refactoring out the stage, event routing, and thread handling pieces
from the network-oriented bits of SEDA into this project? There are
definitely parts of your code that I'd like to be able to use without
forking them, although I'm sure you don't really want to introduce
extraneous dependencies.

Kris

--
=====================================================
Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337
Kris.Nuttycombe@noaa.gov
=====================================================





Re: [chain] Pipeline implementation

Posted by Kris Nuttycombe <Kr...@noaa.gov>.
Alex Karasulu wrote:

>>subscribers that share the same index have events processed in parallel. 
>>Also, perhaps instead of returning void StageHandler.handleEvent() could 
>>return a boolean value that flags whether or not the event is allowed to 
>>propagate to other stages with higher serial numbers.
>
>
>That's also another good idea.  This almost reminds me of rule salience
>in expert system shells.  What stage does the event have the most
>affinity for?
>  
>
I hadn't thought of things in this context, but both the stage/event 
handling pieces of the SEDA framework and the pipeline we've developed 
here do seem a lot like frameworks for building specialized expert 
systems with concurrent processing.

>>but then it seems like you have bleeding of the application logic into 
>>the configuration realm. Maybe one could modify the StageHandler 
>>interface by adding a method that allows you to query for the runtime 
>>class of the event returned to get around this problem.
>>    
>>
>
>I don't understand the "bleeding of the application logic" comment. 
>Could you clarify this some more and explain how this is removed when
>the class of the event can be queried?
>  
>
StageHandler's handleEvent() method is regularly responsible for raising 
events and passing them back to the event router, right? The problem is 
that there's nothing in the public API that makes it clear what events a 
particular StageHandler may generate, so establishing a routing scheme 
is a manual process that involves the programmer having knowledge of the 
StageHandler's internals. In a situation where you're trying to set up a 
linear routing scheme from a configuration file, it would make more 
sense that the ordering of elements in that file would determine the 
routing. If it's possible for a configuration tool to look at a 
StageHandler and determine what events the handleEvent method has the 
potential to raise, then automatic configuration becomes much simpler. 
It might also be useful to define a method on the interface that allows 
a handler to announce what events it can handle.
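To make the idea concrete: the interface additions described above might look like the following. This is a hypothetical sketch, not the actual SEDA or commons-pipeline API; DescribedStageHandler, handledEventType(), and raisedEventTypes() are all invented names. A configuration tool could then check whether one stage can legally follow another without looking at its internals:

```java
import java.util.*;

interface DescribedStageHandler {
    Class<?> handledEventType();       // what this stage consumes
    Set<Class<?>> raisedEventTypes();  // what handleEvent() may emit
    void handleEvent(Object event);
}

public class PipelineWiring {
    /** True if 'next' can consume at least one event type 'prev' raises. */
    public static boolean canFollow(DescribedStageHandler prev,
                                    DescribedStageHandler next) {
        for (Class<?> raised : prev.raisedEventTypes()) {
            if (next.handledEventType().isAssignableFrom(raised)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DescribedStageHandler decoder = new DescribedStageHandler() {
            public Class<?> handledEventType() { return String.class; }
            public Set<Class<?>> raisedEventTypes() { return Set.of(Integer.class); }
            public void handleEvent(Object e) { /* decode */ }
        };
        DescribedStageHandler summer = new DescribedStageHandler() {
            public Class<?> handledEventType() { return Number.class; }
            public Set<Class<?>> raisedEventTypes() { return Set.of(); }
            public void handleEvent(Object e) { /* accumulate */ }
        };
        // Integer is assignable to Number, so the stages can be chained.
        System.out.println(PipelineWiring.canFollow(decoder, summer));
    }
}
```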

>>We do things like this all the time, but I'm beginning to see how we 
>>could get around it by having a base event type that related stages all 
>>process and have each stage raise a subtype of that event. Seems a bit 
>>like going the long way around the horn for our use case, but it might 
>>add enough value to be worth it.
>>    
>>
>
>Well this way may not be the best way for you.  This is our first
>attempt using the pub/sub pattern.  Questions about subtyping versus
>other means have been discussed.  Right now we simply don't know which
>way is the best way.  
>  
>

I think that the pub/sub model definitely has the potential to be a lot 
more powerful than our current approach; it's just a matter of 
developing the interfaces to make them flexible enough to support use 
cases for both projects. I think that our use cases are different enough 
that if we can find a model that satisfies both it will be a broadly 
useful framework.
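The base-event/subtype approach quoted earlier can be sketched in a few lines. All of the class names here (PipelineEvent, DecodedEvent, ValidatedEvent) are illustrative, assumed for the example: stages subscribe to a common base event class, each stage raises a more specific subtype, and an `isInstance` check lets subtype events satisfy base subscriptions:

```java
import java.util.*;

abstract class PipelineEvent {
    final Object payload;
    PipelineEvent(Object payload) { this.payload = payload; }
}

class DecodedEvent extends PipelineEvent {
    DecodedEvent(Object payload) { super(payload); }
}

class ValidatedEvent extends DecodedEvent {
    ValidatedEvent(Object payload) { super(payload); }
}

public class SubtypeRouting {
    /** Names of the subscribed event classes that would accept 'event'. */
    public static List<String> accepting(
            PipelineEvent event,
            List<Class<? extends PipelineEvent>> subscriptions) {
        List<String> matched = new ArrayList<>();
        for (Class<? extends PipelineEvent> sub : subscriptions) {
            if (sub.isInstance(event)) {  // subtype events satisfy base subscriptions
                matched.add(sub.getSimpleName());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<Class<? extends PipelineEvent>> subs =
            List.of(PipelineEvent.class, DecodedEvent.class, ValidatedEvent.class);
        // A ValidatedEvent reaches all three subscribers; a DecodedEvent only two.
        System.out.println(accepting(new ValidatedEvent("x"), subs));
        System.out.println(accepting(new DecodedEvent("x"), subs));
    }
}
```

This is the "long way around the horn" mentioned in the quote: every stage pair needs a named subtype, but in exchange the routing falls out of the type hierarchy instead of hand-written wiring.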

Initially Craig had suggested setting up a commons-pipeline project in 
the sandbox. I've been preparing our code (licenses, submission 
agreements, etc.) to make this transition. Are you at all interested in 
refactoring out the stage, event routing, and thread handling pieces 
from the network-oriented bits of SEDA into this project? There are 
definitely parts of your code that I'd like to be able to use without 
forking them, although I'm sure you don't really want to introduce 
extraneous dependencies.

Kris

-- 
=====================================================
Kris Nuttycombe
Associate Scientist
Geospatial Data Services Group
CIRES, National Geophysical Data Center/NOAA
(303) 497-6337
Kris.Nuttycombe@noaa.gov
=====================================================



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org