Posted to dev@commons.apache.org by Kris Nuttycombe <Kr...@noaa.gov> on 2005/08/11 17:18:34 UTC

Re: [pipeline] Commons Sandbox Pipeline

Hi, Matthew,

Yep, I'm still here. I had been inactive for several months, but started
doing active development on the pipeline again a few weeks ago.

I originally developed Pipeline for doing processing of vector-based
geometry data so that I could arbitrarily reorder processing steps and
create different processing engines for different datasets. At the time,
all of the other applications I found used an event-based mechanism,
which, while useful, is not the best fit when you need to enforce a strict
sequence of processing steps but want to be able to reorder those steps
without changing any of the code.

The current implementation of Pipeline attempts to provide this sort of
sequential processing ability and at the same time decouple the
threading model from the processing stage implementation. When you
construct a Pipeline you're generally intending to process a large
number of objects of the same type and want to do so as efficiently as
possible. One of the important considerations for my initial use case
was that some of the processing steps we have (like the initial
retrieval of the data, and the eventual insertion of spatial objects
into a database) are high-latency and thus it's important to be able to
parallelize the execution of the processing steps to reduce the total
processing time.
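
To make the idea of reorderable sequential stages concrete, here is a
minimal sketch in modern Java. It is not the actual Pipeline API: the
class OrderedPipelineSketch and its methods are hypothetical names made
up for illustration, and threading is left out entirely. The point is
only that the stage order is data, so it can change without touching
any stage code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch; these names are NOT the Commons Pipeline API.
final class OrderedPipelineSketch<T> {
    // Stages are held in an ordinary list, so their order is configuration.
    private final List<UnaryOperator<T>> stages = new ArrayList<>();

    OrderedPipelineSketch<T> addStage(UnaryOperator<T> stage) {
        stages.add(stage);
        return this;
    }

    // Feeds the input through every stage, strictly in list order,
    // on the calling thread.
    T process(T input) {
        T current = input;
        for (UnaryOperator<T> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }
}

final class OrderedPipelineDemo {
    public static void main(String[] args) {
        UnaryOperator<Integer> doubleIt = x -> x * 2;
        UnaryOperator<Integer> addOne = x -> x + 1;

        // Same stage code, two different orders.
        Integer first = new OrderedPipelineSketch<Integer>()
                .addStage(doubleIt).addStage(addOne).process(3);
        Integer second = new OrderedPipelineSketch<Integer>()
                .addStage(addOne).addStage(doubleIt).process(3);

        System.out.println(first + " " + second);
    }
}
```

Swapping the two addStage calls is the only change needed to reorder
the processing; neither stage knows where it sits in the sequence.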

There are a few outstanding tasks that could definitely use some
attention. Right now, there are two implementations of the StageDriver
(the class that controls the threading model for a stage): one that
supports execution of the entire pipeline in a single thread (this is
mostly for the unit tests) and one that supports execution of each
stage in a separate thread. It would be really nice to have a
StageDriver that uses a thread pool to parallelize the processing within
a single stage. As usual, the documentation is also fairly minimal at
this point and any assistance on that front would be appreciated.
Another piece that's in the works but hasn't made it into Subversion yet
is a means of dynamically creating pipeline branches and using
commons-chain to determine which branch to route an object to for
processing, so if you think you might have an application for this I'd
be happy to send you the (currently incomplete) code. And, of course, if
you start using the pipeline I'm sure you'll find areas where it doesn't
support your application as well as you'd like, so contributions are
always welcome.
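
As a rough illustration of what a thread-pool-based driver for a single
stage might look like, here is a sketch built only on
java.util.concurrent. It does not implement the real StageDriver
interface; PooledStageDriverSketch and processAll are hypothetical
names, and the real thing would also have to deal with queues between
stages and error handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch; NOT a Commons Pipeline class or interface.
final class PooledStageDriverSketch<I, O> {
    private final Function<I, O> stage;
    private final int threads;

    PooledStageDriverSketch(Function<I, O> stage, int threads) {
        this.stage = stage;
        this.threads = threads;
    }

    // Runs the one stage over all inputs using a fixed-size pool and
    // returns the outputs in the same order as the inputs.
    List<O> processAll(List<I> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                Callable<O> task = () -> stage.apply(input);
                futures.add(pool.submit(task));
            }
            List<O> outputs = new ArrayList<>();
            for (Future<O> future : futures) {
                outputs.add(future.get()); // blocks until that task is done
            }
            return outputs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("stage failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The stage itself stays a plain function with no knowledge of threads,
which is the same separation the existing drivers aim for; only the
driver decides how many objects are in flight at once.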

Currently it's possible to configure a pipeline either directly in code
or using Digester. I haven't really given much thought to how one might
go about reordering the processing stages in a running pipeline, so that
could also prove an interesting and generally useful area for development.

For your application, one possibly useful piece of the current
implementation is the event model that's built into the pipeline. Each
of your stages for fraud prevention could choose to either pass the
transaction on to the next stage or raise an event when it detects
fraud, and a listener registered with the pipeline could then take
appropriate action.
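
In outline, that pass-or-raise pattern could look something like the
sketch below. This is not the pipeline's actual event API: TxnSketch,
FraudListenerSketch and FraudPipelineSketch are hypothetical names, and
the amount-based check in the usage note is just a stand-in for a real
fraud check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from Commons Pipeline.
final class TxnSketch {
    final String id;
    final double amount;

    TxnSketch(String id, double amount) {
        this.id = id;
        this.amount = amount;
    }
}

// Callback invoked when a check flags a transaction.
interface FraudListenerSketch {
    void fraudDetected(String checkName, TxnSketch txn);
}

final class FraudPipelineSketch {
    // LinkedHashMap keeps the checks in the order they were added.
    private final Map<String, Predicate<TxnSketch>> checks = new LinkedHashMap<>();
    private final List<FraudListenerSketch> listeners = new ArrayList<>();

    // The predicate returns true when the transaction looks suspicious.
    void addCheck(String name, Predicate<TxnSketch> isSuspicious) {
        checks.put(name, isSuspicious);
    }

    void addListener(FraudListenerSketch listener) {
        listeners.add(listener);
    }

    // Returns true if the transaction passed every check. On the first
    // suspicious result, notifies all listeners and stops processing.
    boolean process(TxnSketch txn) {
        for (Map.Entry<String, Predicate<TxnSketch>> check : checks.entrySet()) {
            if (check.getValue().test(txn)) {
                for (FraudListenerSketch listener : listeners) {
                    listener.fraudDetected(check.getKey(), txn);
                }
                return false;
            }
        }
        return true;
    }
}
```

A merchant's configuration would then amount to a series of calls such
as addCheck("amount-limit", t -> t.amount > 1000.0), plus a listener
that holds or rejects the flagged transaction.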

Anyway, I hope that this all gives you a little better idea of what the
pipeline is all about. Let me know if you have any other questions or
would like to contribute!

Kris


Matthew Ryan wrote:

>Hi Kris,
>
>Greetings from Sydney Australia.
>
>I want to ask if you are still active on this Commons Sandbox Pipeline project.
>
>I had built a 'pipeline engine' for a fraud prevention application for online
>retailers. In the application a merchant can choose from several types of
>'checks', build a 'pipeline' of those checks, configure each check, and
>come back to the administration console whenever they like to re-configure
>the pipeline or the checks.
>
>That was the functionality I had built. The way I had done it probably left a
>lot to be desired, so I started looking around for how other people build
>pipelines or workflows of tasks. It should be such a common thing that I
>thought maybe there would be an open source project that was straightforward
>enough to suit most applications.
>
>First I looked at the Pipeline in Apache Tomcat, then I opened up Apache Commons
>Chain to have a look, which I found while reading an article about Apache Struts
>1.3. Then the other day I found Apache Jakarta Commons Sandbox Workflow, and
>then this Apache Jakarta Commons Sandbox Pipeline.
>
>And along the way I thought about writing something myself from scratch again.
>(In my own time.)
>
>Back to the question. Are you still active on this Pipeline endeavour?
>
>Look forward to hearing from you.
>
>Matthew Ryan
>Sydney, Australia