Posted to dev@cocoon.apache.org by Reinhard Pötz <re...@apache.org> on 2008/12/23 08:28:38 UTC

[c3] Pipeline results

Triggered by some discussions that Steven and I had with our students, we
have been thinking again about the pipeline API. Compared to previous
pipeline implementations we already made one improvement by making the
pipeline unaware of the data that flows between its components. This
means that the Cocoon 3 pipeline implementation doesn't impose any
limitations on what is passed from one component to another, whether
that is SAX, StAX or anything else.

But there is still one limitation that has bugged me several times: the
result of every pipeline is that it writes something into an output stream.
That's the original use case for pipelines in servlet environments
and probably true in many cases, but not in all.
One use case that I have is that I want to get the SAX events written
into a SAX buffer, and some might want to make the pipeline write directly
into a content handler that they pass to it.

Another use case is when you want to use pipelines to create business
objects as their result. (I guess Simone can give more details here.)

In all these cases you don't need an output stream.


                                  - o -


In order to make the pipeline result pluggable, I propose the following
changes:

I would like to introduce a generic Result interface which is the
transport vehicle for the actual result:

public interface Result<T> {

    void setResultObject(T o);

    T getResultObject();
}

And here is a possible implementation of a Result that carries
an output stream:

public class OutputStreamResult implements Result<OutputStream> {

    private OutputStream os;

    public OutputStream getResultObject() {
        return os;
    }

    public void setResultObject(OutputStream o) {
        this.os = o;
    }

    public String getContentType() {
        ...
    }

    public long getLastModified() {
        ...
    }
}

The pipeline interface would have to change accordingly:

public interface Pipeline<T> {

    void addComponent(PipelineComponent pipelineComponent);

    void addComponent(Finisher<T> finisher);

    void execute() throws Exception;

    void setup(Result<T> result);

    void setup(Result<T> result, Map<String, Object> parameters);

    void setConfiguration(Map<String, ? extends Object> parameters);
}

As you can see, the methods getContentType() and getLastModified() were
moved to the result object, where they fit much better IMO because both
are specific to our output stream use case.

And thanks to generics, the compiler makes sure that a Finisher (=
serializer) can only be added to the pipeline if it supports the
pipeline's result type:

public interface Finisher<T> extends PipelineComponent {

    void setResult(Result<T> result);
}
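
To make the type check concrete, here is a minimal sketch that is not part of the proposal itself. Result and Finisher follow the interfaces above; PipelineComponent is reduced to an empty interface, and EventListResult, RecordingFinisher, SimplePipeline and ResultSketch are hypothetical names invented for this illustration. A list of event names stands in for a SAX buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Interfaces as proposed above (PipelineComponent reduced to a stub for this sketch).
interface Result<T> {
    void setResultObject(T o);
    T getResultObject();
}

interface PipelineComponent {
}

interface Finisher<T> extends PipelineComponent {
    void setResult(Result<T> result);
}

// Hypothetical result type: carries a list of event names instead of an output stream.
class EventListResult implements Result<List<String>> {
    private List<String> events = new ArrayList<String>();

    public List<String> getResultObject() { return events; }
    public void setResultObject(List<String> o) { this.events = o; }
}

// Hypothetical finisher that records every event it receives in the result object.
class RecordingFinisher implements Finisher<List<String>> {
    private Result<List<String>> result;

    public void setResult(Result<List<String>> result) { this.result = result; }

    void consume(String event) { result.getResultObject().add(event); }
}

// Hypothetical, heavily simplified pipeline: it only knows its finisher.
class SimplePipeline<T> {
    private Finisher<T> finisher;

    void addComponent(Finisher<T> finisher) { this.finisher = finisher; }

    void setup(Result<T> result) { finisher.setResult(result); }
}

class ResultSketch {
    static List<String> run() {
        SimplePipeline<List<String>> pipeline = new SimplePipeline<List<String>>();
        RecordingFinisher finisher = new RecordingFinisher();
        // Adding a Finisher<OutputStream> here would be rejected by the compiler.
        pipeline.addComponent(finisher);

        EventListResult result = new EventListResult();
        pipeline.setup(result);

        // Stand-in for the events a real pipeline run would deliver.
        finisher.consume("startDocument");
        finisher.consume("endDocument");
        return result.getResultObject();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The caller keeps a reference to the result object and reads the outcome from it after the run, so no output stream is involved at any point.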


WDYT?

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [c3] Pipeline results

Posted by Simone Tripodi <si...@gmail.com>.

Hi Reinhard,
thanks for mentioning me :P I like your proposal and, indeed, it is what we
started thinking about when I submitted the Digester Finisher[1] patch; the
testcase demonstrates that the Finisher had to be
"forced" by using a null OutputStream - otherwise the Pipeline couldn't
start - then parsing the XML events through the Digester and finally
getting the result object using the specific Digester methods. Of
course, it works, but IMHO it's a little tricky.

The main point - which I suppose led Reinhard to think about
introducing these interfaces - is that the Pipeline module should be useful
not only to serialize XML events, but also to process them in
different ways, like mapping an XML document to Java objects, not
necessarily with Apache Digester.

With the new interface, users no longer have to "hack" the finisher
and can instead use the Pipeline in a clear and documented way.
I like Reinhard's ideas; they make the Pipeline module multi-purpose
by introducing just a few small new concepts.

Best regards,
Simone

[1] https://issues.apache.org/jira/browse/COCOON3-8

-- 
My LinkedIn profile: http://www.linkedin.com/in/simonetripodi
My GoogleCode profile: http://code.google.com/u/simone.tripodi/
My Picasa: http://picasaweb.google.com/simone.tripodi/
My Tube: http://www.youtube.com/user/stripodi
My Del.icio.us: http://del.icio.us/simone.tripodi

Re: [c3] Pipeline results

Posted by Peter Hunsberger <pe...@gmail.com>.

On Mon, Jan 5, 2009 at 2:36 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> Reinhard Pötz wrote:
> Ok, so let's throw in some other ideas:
> The pipeline interface has the execute method, we could change the
> return type of that method from void to Object. So the execution of the
> pipeline returns the result (if any).

Yes please, this makes sense to me.  This is how we handle the generic
data source providers we use with Cocoon 2.1.  If the 3.0 pipelines
worked this way we could remove an entire layer of wrappers that we use
to pass stuff around.

I do wonder if this might be more than Object: for the generic use
case it's probably just a marker interface, but then specific kinds of
pipelines could extend it so as to keep the contracts on what
is being handled by the pipeline clear?  Of course, anyone could
create their own set of interfaces, but we do know the SAX (and maybe
StAX) use cases, so it might be nice to have them standardized?
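
As a rough sketch of that idea (all names below are hypothetical and not taken from the Cocoon 3 code base): a marker interface for the generic case, one specialized result type for a SAX-like use case, and an execute() method whose return type is bound to the marker. A list of event names stands in for real SAX events.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical marker interface: anything a pipeline run may return.
interface PipelineResult {
}

// Hypothetical specialization that spells out the contract for a SAX-like use case.
interface SaxEventsResult extends PipelineResult {
    List<String> getEventNames();
}

// Hypothetical pipeline whose execute() returns its result instead of void.
interface ResultPipeline<R extends PipelineResult> {
    R execute() throws Exception;
}

// Toy implementation that always returns the same two events.
class FixedSaxPipeline implements ResultPipeline<SaxEventsResult> {
    public SaxEventsResult execute() {
        final List<String> names = Arrays.asList("startDocument", "endDocument");
        return new SaxEventsResult() {
            public List<String> getEventNames() { return names; }
        };
    }
}

class MarkerSketch {
    static int countEvents() {
        // No cast needed: the pipeline type already says what comes back.
        SaxEventsResult result = new FixedSaxPipeline().execute();
        return result.getEventNames().size();
    }

    public static void main(String[] args) {
        System.out.println(countEvents());
    }
}
```

Code that only cares about "some result" can work against PipelineResult, while SAX-specific callers get a typed contract without wrapping or casting.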

-- 
Peter Hunsberger

Re: [c3] Pipeline results

Posted by Grzegorz Kossakowski <gr...@tuffmail.com>.

Peter Hunsberger wrote:
> 
> I guess part of the debate is whether it is worth defining some
> additional method, eg:
> 
>   CocoonOutput execute( CocoonInput genericInput) throws ?;
> 
> or
> 
>   Object execute( Object genericInput) throws ?;
> 
> I really can't see that the CocoonOutput and CocoonInput are anything
> more than marker interfaces and this of course means that more
> standard objects have to be wrapped so I think the  second  version is
> preferable.  But that certainly doesn't buy you anything as far as
> pipeline contracts are concerned for traditional Java.  However, I do
> think that perhaps this opens up some longer term possibilities for
> Aspect oriented inspection across the pipeline so maybe it is actually
> worth adding?

You have got a point, it's exactly about this. The problem (as my code shows) is that this kind of thing, which looks
simple at the beginning, results in rather ugly constructs in Java... At least I couldn't make it more concise...

-- 
Best regards,
Grzegorz Kossakowski

Re: [c3] Pipeline results

Posted by Peter Hunsberger <pe...@gmail.com>.

On Tue, Jan 13, 2009 at 12:00 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> Grzegorz Kossakowski wrote:
>> Your problem made me to analyze Cocoon3 Pipeline API very carefully and think about it for a while. What I found rather
>> strange is PipelineComponent interface, see:
>>
>> public interface PipelineComponent {
>>
>>     void setConfiguration(Map<String, ? extends Object> configuration);
>>
>>     void setup(Map<String, Object> parameters);
>>
>>     void finish(Exception exception);
>> }
>>
>> Why do I think this interface is strange (and confusing)? Because it does not deal with the most important aspect of
>> PipelineComponent: that it processes something and that it can be combined with other components. The most important
>> aspect is neither configuration nor setup of a component.
>>
>> I started to think that we will have such problems like the one mentioned in this thread as long as we don't address
>> issues with PipelineComponent.
>>
>> I've decided to experiment with the code (after analyzing the problem on a paper) and you can see results here:
>> http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline
>>
>> This code proves only one thing: that I failed to address an issue mentioned above. What I'm interested in is if others
>> share my concerns about PipelineComponent interface (and Pipeline itself) and if you also see a relation to problem with
>>  pipeline result?
>>
> I agree that the PipelineComponent interface looks really strange. It's
> absolutely not clear what the difference between setConfiguration and
> setup is - so I think we need better names here.
>
> The finish method looks also strange as it gets an exception as a
> parameter. The question is if a component should really know what
> exception occurred during a pipeline run or if a boolean is sufficient.
>
> Apart from that I think, the interface is the price we have to pay for
> the flexibility we get.
>

I guess part of the debate is whether it is worth defining some
additional method, eg:

  CocoonOutput execute( CocoonInput genericInput) throws ?;

or

  Object execute( Object genericInput) throws ?;

I really can't see that CocoonOutput and CocoonInput are anything
more than marker interfaces, and this of course means that more
standard objects have to be wrapped, so I think the second version is
preferable.  But that certainly doesn't buy you anything as far as
pipeline contracts are concerned for traditional Java.  However, I do
think that perhaps this opens up some longer-term possibilities for
aspect-oriented inspection across the pipeline, so maybe it is actually
worth adding?
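
A small sketch of what the untyped variant means in practice. The interface and class names are hypothetical, and because the exception type is left open above ("throws ?"), the sketch declares none. The point it tries to show is that with Object in and Object out, the contract between two steps exists only in the casts, so a wrongly ordered chain compiles and fails at runtime.

```java
import java.util.Locale;

// Hypothetical step with the untyped signature discussed above.
interface UntypedStep {
    Object execute(Object genericInput);
}

class UpperCaseStep implements UntypedStep {
    public Object execute(Object genericInput) {
        // The contract "input is a String" lives only in this cast.
        return ((String) genericInput).toUpperCase(Locale.ROOT);
    }
}

class LengthStep implements UntypedStep {
    public Object execute(Object genericInput) {
        return Integer.valueOf(((String) genericInput).length());
    }
}

class UntypedSketch {
    // Feeds the output of each step into the next one.
    static Object runAll(Object input, UntypedStep... steps) {
        Object current = input;
        for (UntypedStep step : steps) {
            current = step.execute(current);
        }
        return current;
    }

    // LengthStep returns an Integer, which UpperCaseStep then tries to cast to String.
    static boolean misorderedChainFails() {
        try {
            runAll("sax", new LengthStep(), new UpperCaseStep());
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll("sax", new UpperCaseStep(), new LengthStep()));
        System.out.println(misorderedChainFails());
    }
}
```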

-- 
Peter Hunsberger

Re: [c3] Pipeline results

Posted by Steven Dolg <st...@indoqa.com>.

Grzegorz Kossakowski wrote:
> Steven Dolg wrote:
>   
>> Configuration and setup is clearly not the most important aspect of a
>> pipeline component.
>> But AFAIK interfaces are not designed by what is most important or not,
>> but by what is common to the implementing classes and by what is
>> really necessary for the caller of that interface.
>>     
>
> Processing input and generating output is a common to all pipeline components. For any meaningful way of using
> components you need methods that will execute given component. If we were to stick to this interface it should be
> renamed to something like PipelineComponentBase but this obviously does not solve anything more than we are honestly
> admitting our mistake here.
>
>   
>> From that point of view configuration and setup (and yes, those names
>> are not ideal - suggestions are always welcome...) are very valid
>> candidates for that interface.
>>
>> It is the common basis of *all* pipeline components.
>> This is the most basic interface for any pipeline component - no matter
>> if it is a Serializer, Generator, Transformer, uses SAX, StAX, Images,
>> Beans, ...
>> I seriously wonder what methods for content processing and component
>> linking you are missing at that level?
>>     
>
> As you were unable to check what I've come up with (btw. GitHub folks have fixed their problem already so links work) so
> you couldn't get my point. Just have a look at reworked PipelineComponent interface which should be considered as a
> starting point for a discussion.
>
>   
>> As this is basically a marker interface (with those 3 methods that are
>> common to all components) a user won't have to deal with it.
>> Even a developer implementing new components hardly ever gets in contact
>> with it, as he will usually deal with the Starter/Finisher,
>> Producer/Consumer level above PipelineComponent.
>>     
>
> In pipeline we have three types of components: generators, transformers and serializers. Could you explain to me why do
> we need 5 different interfaces supporting these three cases:
>   PipelineComponent, Starter, Producer, Consumer, Finisher
>   
We have 5 interfaces because those are the 5 "roles" any pipeline
component can have.
There are those that can be the first component of a pipeline: Starter.
... the last component of a pipeline: Finisher.
... providing input for a following component: Producer.
... receiving input from a preceding component: Consumer.
All are components that can be configured and added to a pipeline:
PipelineComponent.

IMO it makes sense to really have those 5 interfaces, because not all 
Starters are actually Producers, not all Consumers are Finishers, and 
not all Producers are Consumers as well.
There are even components that are Starters and Finishers at the same 
time (e.g. the FileReaderComponent).
We could very well define all possible combinations but that would not 
reduce the number of interfaces.
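
The roles can be sketched as follows. The five interface names are the ones listed above, but they are shown here as empty interfaces (their real methods are omitted), and the Sketch* classes and RoleCheck are hypothetical helpers made up for this illustration.

```java
// The five roles, reduced to empty interfaces for this sketch.
interface PipelineComponent { }
interface Starter extends PipelineComponent { }
interface Finisher extends PipelineComponent { }
interface Producer extends PipelineComponent { }
interface Consumer extends PipelineComponent { }

// Typical role combinations (hypothetical classes).
class SketchGenerator implements Starter, Producer { }
class SketchTransformer implements Consumer, Producer { }
class SketchSerializer implements Consumer, Finisher { }

// A component that starts and finishes a pipeline on its own,
// similar in spirit to the FileReaderComponent mentioned above.
class SketchReader implements Starter, Finisher { }

class RoleCheck {
    // A chain is valid if it begins with a Starter, ends with a Finisher,
    // and every component hands its output to a Consumer.
    static boolean isValid(PipelineComponent... components) {
        if (components.length == 0) {
            return false;
        }
        if (!(components[0] instanceof Starter)) {
            return false;
        }
        if (!(components[components.length - 1] instanceof Finisher)) {
            return false;
        }
        for (int i = 0; i + 1 < components.length; i++) {
            if (!(components[i] instanceof Producer)
                    || !(components[i + 1] instanceof Consumer)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new SketchGenerator(), new SketchTransformer(), new SketchSerializer()));
        System.out.println(isValid(new SketchReader()));
        System.out.println(isValid(new SketchGenerator(), new SketchReader()));
    }
}
```

The last call returns false because the reader is a Starter and a Finisher but not a Consumer, which is exactly the kind of distinction that separate role interfaces make expressible.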

> Moreover, we have AbstractGenerator, AbstractTransformer and AbstractSerializer. An argument, that in C2.2 it wasn't
> simpler is rather weak as we strive for finding a *better* design. It's not about pointing at anyone and blaming about
> imperfect code (because you could easily do the same for me) but about expressing current weak points and discussing
> possible solutions.
>   
Maybe it is a weak argument. But it is an argument nonetheless.
Actually it is quite hard to compare a working solution that is actually 
already in use with an approach that is yet to be converted into a 
working piece of software.
So yes, I prefer to compare it to Cocoon 2.2.

>   
>> I understand that this concept is quite a bit different than Cocoon 2.2
>> and is almost completely undocumented at this time, too.
>> But I seriously doubt that selecting interfaces randomly and questioning
>> their usefulness is really good approach...
>>     
>
> We can disagree on different things and it's ok but accusing me of choosing random pieces of code just for a sake of
> criticizing is not ok.
>   
Well the interface PipelineComponent alone may look weird.
But the Pipeline API contains some more code.
Looking at how the other interfaces (Starter, etc.) are used should 
indicate what they are used for.

The SAX components in addition to the StAX components should demonstrate 
why there is little sense in having content type related methods in the 
PipelineComponent.
Because the idea behind the Pipeline API is that the components decide 
how they communicate with each other. Thus the Pipeline API must not 
make any assumptions about this.

The intention was to have a pipeline that is flexible enough to support
different content types and still makes it easy enough to use them and
actually create new ones.
The recent experiences with StAX made me believe that the current 
implementation achieved that goal pretty well.
Having a user implementing some new generators and serializers - 
practically all on his own - supports this impression.

Of course there are still points that can (and should) be improved - 
there always are.

> I've worked with this code, I have reworked it according to different philosophy to show possible benefits and
> weaknesses. After that, I've come to conclusion that this interface looks weird and found relation to problem discussed
> in this thread (pipeline results). I wanted to bring that interface to the attention because we still have quite a lot
> of people much cleverer than me on this list that could possibly propose something better.
>
>

Re: [c3] Pipeline results

Posted by Grzegorz Kossakowski <gr...@tuffmail.com>.

Steven Dolg wrote:
> Configuration and setup is clearly not the most important aspect of a
> pipeline component.
> But AFAIK interfaces are not designed by what is most important or not,
> but by what is common to the implementing classes and by what is
> really necessary for the caller of that interface.

Processing input and generating output is common to all pipeline components. For any meaningful way of using
components you need methods that will execute a given component. If we were to stick to this interface it should be
renamed to something like PipelineComponentBase, but this obviously does not solve anything beyond honestly
admitting our mistake here.

> From that point of view configuration and setup (and yes, those names
> are not ideal - suggestions are always welcome...) are very valid
> candidates for that interface.
> 
> It is the common basis of *all* pipeline components.
> This is the most basic interface for any pipeline component - no matter
> if it is a Serializer, Generator, Transformer, uses SAX, StAX, Images,
> Beans, ...
> I seriously wonder what methods for content processing and component
> linking you are missing at that level?

As you were unable to check what I've come up with (btw. the GitHub folks have fixed their problem already, so the links work),
you couldn't get my point. Just have a look at the reworked PipelineComponent interface, which should be considered as a
starting point for a discussion.

> As this is basically a marker interface (with those 3 methods that are
> common to all components) a user won't have to deal with it.
> Even a developer implementing new components hardly ever gets in contact
> with it, as he will usually deal with the Starter/Finisher,
> Producer/Consumer level above PipelineComponent.

In a pipeline we have three types of components: generators, transformers and serializers. Could you explain to me why
we need 5 different interfaces supporting these three cases:
  PipelineComponent, Starter, Producer, Consumer, Finisher

Moreover, we have AbstractGenerator, AbstractTransformer and AbstractSerializer. The argument that it wasn't
simpler in C2.2 is rather weak, as we strive to find a *better* design. It's not about pointing at anyone and blaming them for
imperfect code (because you could easily do the same to me) but about expressing current weak points and discussing
possible solutions.

> I understand that this concept is quite a bit different than Cocoon 2.2
> and is almost completely undocumented at this time, too.
> But I seriously doubt that selecting interfaces randomly and questioning
> their usefulness is really good approach...

We can disagree on different things and it's ok, but accusing me of choosing random pieces of code just for the sake of
criticizing is not ok.

I've worked with this code and reworked it according to a different philosophy to show possible benefits and
weaknesses. After that, I came to the conclusion that this interface looks weird and found a relation to the problem discussed
in this thread (pipeline results). I wanted to bring that interface to the list's attention because we still have quite a lot
of people much cleverer than me on this list who could possibly propose something better.

-- 
Best regards,
Grzegorz Kossakowski

Re: [c3] Pipeline results

Posted by Steven Dolg <st...@indoqa.com>.

Grzegorz Kossakowski wrote:
> Reinhard Pötz wrote:
>   
>> Thanks Grek! But there is no need to hurry because neither Carsten nor I
>> will work on this until Steven has finished his refactorings.
>>
>>     
>
> Actually, for me there was a hurry as another series of exams is coming and I wanted to contribute something useful
> before February.
>
> Your problem made me to analyze Cocoon3 Pipeline API very carefully and think about it for a while. What I found rather
> strange is PipelineComponent interface, see:
>
> public interface PipelineComponent {
>
>     void setConfiguration(Map<String, ? extends Object> configuration);
>
>     void setup(Map<String, Object> parameters);
>
>     void finish(Exception exception);
> }
>
> Why do I think this interface is strange (and confusing)? Because it does not deal with the most important aspect of
> PipelineComponent: that it processes something and that it can be combined with other components. The most important
> aspect is neither configuration nor setup of a component.
>   
Configuration and setup are clearly not the most important aspect of a
pipeline component.
But AFAIK interfaces are not designed by what is most important or not,
but by what is common to the implementing classes and by what is
really necessary for the caller of that interface.
From that point of view configuration and setup (and yes, those names
are not ideal - suggestions are always welcome...) are very valid
candidates for that interface.

It is the common basis of *all* pipeline components.
This is the most basic interface for any pipeline component - no matter 
if it is a Serializer, Generator, Transformer, uses SAX, StAX, Images, 
Beans, ...
I seriously wonder what methods for content processing and component 
linking you are missing at that level?

As this is basically a marker interface (with those 3 methods that are 
common to all components) a user won't have to deal with it.
Even a developer implementing new components hardly ever gets in contact 
with it, as he will usually deal with the Starter/Finisher, 
Producer/Consumer level above PipelineComponent.


I understand that this concept is quite a bit different than Cocoon 2.2 
and is almost completely undocumented at this time, too.
But I seriously doubt that selecting interfaces randomly and questioning
their usefulness is really a good approach...

> I started to think that we will have such problems like the one mentioned in this thread as long as we don't address
> issues with PipelineComponent.
>
> I've decided to experiment with the code (after analyzing the problem on a paper) and you can see results here:
> http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline
>
> This code proves only one thing: that I failed to address an issue mentioned above. What I'm interested in is if others
> share my concerns about PipelineComponent interface (and Pipeline itself) and if you also see a relation to problem with
>  pipeline result?
>
>

Re: [c3] Pipeline results

Posted by Carsten Ziegeler <cz...@apache.org>.

Grzegorz Kossakowski wrote:
> Your problem made me to analyze Cocoon3 Pipeline API very carefully and think about it for a while. What I found rather
> strange is PipelineComponent interface, see:
> 
> public interface PipelineComponent {
> 
>     void setConfiguration(Map<String, ? extends Object> configuration);
> 
>     void setup(Map<String, Object> parameters);
> 
>     void finish(Exception exception);
> }
> 
> Why do I think this interface is strange (and confusing)? Because it does not deal with the most important aspect of
> PipelineComponent: that it processes something and that it can be combined with other components. The most important
> aspect is neither configuration nor setup of a component.
> 
> I started to think that we will have such problems like the one mentioned in this thread as long as we don't address
> issues with PipelineComponent.
> 
> I've decided to experiment with the code (after analyzing the problem on a paper) and you can see results here:
> http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline
> 
> This code proves only one thing: that I failed to address an issue mentioned above. What I'm interested in is if others
> share my concerns about PipelineComponent interface (and Pipeline itself) and if you also see a relation to problem with
>  pipeline result?
> 
I agree that the PipelineComponent interface looks really strange. It's
absolutely not clear what the difference between setConfiguration and
setup is - so I think we need better names here.

The finish method also looks strange as it gets an exception as a
parameter. The question is whether a component should really know what
exception occurred during a pipeline run or if a boolean is sufficient.

Apart from that I think, the interface is the price we have to pay for
the flexibility we get.
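
For illustration, a sketch of how a caller might drive the three methods. The order of the calls, and passing null to finish() after a successful run, are assumptions of this sketch and are not confirmed anywhere in this thread; LoggingComponent and LifecycleSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The interface under discussion, as quoted above.
interface PipelineComponent {
    void setConfiguration(Map<String, ? extends Object> configuration);
    void setup(Map<String, Object> parameters);
    void finish(Exception exception);
}

// Hypothetical component that only records which callbacks it received.
class LoggingComponent implements PipelineComponent {
    final List<String> log = new ArrayList<String>();

    public void setConfiguration(Map<String, ? extends Object> configuration) {
        log.add("setConfiguration");
    }

    public void setup(Map<String, Object> parameters) {
        log.add("setup");
    }

    public void finish(Exception exception) {
        // Only success vs. failure is used here, which is what a boolean would carry too.
        log.add(exception == null ? "finish(ok)" : "finish(failed)");
    }
}

class LifecycleSketch {
    // Assumed call order: configure once, set up per run, always finish.
    static List<String> run(boolean fail) {
        LoggingComponent component = new LoggingComponent();
        component.setConfiguration(new HashMap<String, Object>());
        component.setup(new HashMap<String, Object>());
        Exception problem = null;
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } catch (Exception e) {
            problem = e;
        } finally {
            component.finish(problem);
        }
        return component.log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

A component that wants to log or translate the failure would need the exception itself, while one that only cleans up resources gets by with the success/failure distinction.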

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [c3] Pipeline results

Posted by Grzegorz Kossakowski <gr...@tuffmail.com>.

Reinhard Pötz wrote:
> 
> Thanks Grek! But there is no need to hurry because neither Carsten nor I
> will work on this until Steven has finished his refactorings.
> 

Actually, for me there was a hurry as another series of exams is coming and I wanted to contribute something useful
before February.

Your problem made me analyze the Cocoon 3 Pipeline API very carefully and think about it for a while. What I found rather
strange is the PipelineComponent interface, see:

public interface PipelineComponent {

    void setConfiguration(Map<String, ? extends Object> configuration);

    void setup(Map<String, Object> parameters);

    void finish(Exception exception);
}

Why do I think this interface is strange (and confusing)? Because it does not deal with the most important aspect of
a PipelineComponent: that it processes something and that it can be combined with other components. The most important
aspect is neither configuration nor setup of a component.

I started to think that we will keep having problems like the one mentioned in this thread as long as we don't address
the issues with PipelineComponent.

I've decided to experiment with the code (after analyzing the problem on paper) and you can see the results here:
http://github.com/gkossakowski/cocoonpipelines/tree/master/src/org/apache/cocoon/pipeline

This code proves only one thing: that I failed to address the issue mentioned above. What I'm interested in is whether others
share my concerns about the PipelineComponent interface (and Pipeline itself) and whether you also see a relation to the problem with
the pipeline result.

-- 
Best regards,
Grzegorz Kossakowski

Re: [c3] Pipeline results

Posted by Reinhard Pötz <re...@apache.org>.

Grzegorz Kossakowski wrote:
> Reinhard Pötz wrote:
>> TBH I'm still not fully convinced of the parameters approach. If
>> no one is faster than me I will prepare patches for both solutions so
>> that we see what it means at code level.
>> 
>> I would also be interested in what others think about the two
>> solutions or maybe there are other alternatives that Carsten and I
>> haven't considered yet.
> 
> Reinhard, this problem looks quite interesting and I was thinking about it
> for a while yesterday evening.
> 
> I have some idea which I'll try to describe today as soon as I
> address a few of my doubts.
> 
> I'm writing this e-mail just to let you know that you can expect some
> more feedback. :-)
> 

Thanks Grek! But there is no need to hurry because neither Carsten nor I
will work on this until Steven has finished his refactorings.

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [c3] Pipeline results

Posted by Grzegorz Kossakowski <gr...@tuffmail.com>.

Reinhard Pötz wrote:
> 
> TBH I'm still not fully convinced of the parameters approach. If no one is
> faster than me I will prepare patches for both solutions so that we see
> what it means at code level.
> 
> I would also be interested in what others think about the two solutions
> or maybe there are other alternatives that Carsten and I haven't
> considered yet.

Reinhard, this problem looks quite interesting and I was thinking about it for a while yesterday evening.

I have an idea which I'll try to describe today as soon as I address a few of my doubts.

I'm writing this e-mail just to let you know that you can expect some more feedback. :-)

-- 
Best regards,
Grzegorz Kossakowski

Re: [c3] Pipeline results

Posted by Carsten Ziegeler <cz...@apache.org>.

Reinhard Pötz wrote:
> TBH I'm still not fully convinced of the parameters approach. If no one is
> faster than me I will prepare patches for both solutions so that we see
> what it means at code level.
> 
I can write a patch - but I would like to wait for Steven's changes first.

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [c3] Pipeline results

Posted by Reinhard Pötz <re...@apache.org>.

Carsten Ziegeler wrote:
> Carsten Ziegeler wrote:
>> Reinhard Pötz wrote:
>>> I want to provide a SAX buffer to the pipeline and the serializer just
>>> passes the SAX events to it.
>> Ah, ok.
>>
>>>> As the output stream is a runtime object for the finisher it should be
>>>> treated like other objects of this kind (the src url for the file
>>>> generator for example etc.) which means it should be passed in through
>>>> the parameters for the finisher.
>>> I'm not sure if I understand your idea: Do you propose to change the
>>> interface this way?
>>>
>>> setup(Map<String, Object> outputParameters,
>>>       Map<String, Object> inputParameters);
>>>
>>>
>>> The client of the pipeline API has to put the objects for the serializer
>>> into the output parameters map and the serializer gets them passed so
>>> that it can manipulate them?
>> Yes, but I thought about just using:
>>
>> setup(Map<String, Object> parameters);
>>
>> so we don't differentiate between input and output parameters. Maybe two
>> parameter maps make more sense.
>>
> Rethinking this, we should just use one parameters map. Other parameters
> affecting the output like the encoding or the doctype (html, xhtml...)
> are parameters in this map as well. And in the end these are all output
> parameters, so we would end up with defining input parameters for all
> pipeline components except the finisher, while the finisher just uses
> the output parameters. Therefore one map seems to be the better solution.

TBH I'm still not fully convinced of the parameters approach. If no one is
faster than me I will prepare patches for both solutions so that we see
what it means at code level.

I would also be interested in what others think about the two solutions
or maybe there are other alternatives that Carsten and I haven't
considered yet.

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [c3] Pipeline results

Posted by Carsten Ziegeler <cz...@apache.org>.

Carsten Ziegeler wrote:
> Reinhard Pötz wrote:
>> I want to provide a SAX buffer to the pipeline and the serializer just
>> passes the SAX events to it.
> Ah, ok.
> 
>>> As the output stream is a runtime object for the finisher it should be
>>> treated like other objects of this kind (the src url for the file
>>> generator for example etc.) which means it should be passed in through
>>> the parameters for the finisher.
>> I'm not sure if I understand your idea: Do you propose to change the
>> interface this way?
>>
>> setup(Map<String, Object> outputParameters,
>>       Map<String, Object> inputParameters);
>>
>>
>> The client of the pipeline API has to put the objects for the serializer
>> into the output parameters map and the serializer gets them passed so
>> that it can manipulate them?
> Yes, but I thought about just using:
> 
> setup(Map<String, Object> parameters);
> 
> so we don't differentiate between input and output parameters. Maybe two
> parameter maps make more sense.
> 
Rethinking this, we should just use one parameters map. Other parameters
affecting the output like the encoding or the doctype (html, xhtml...)
are parameters in this map as well. And in the end these are all output
parameters, so we would end up with defining input parameters for all
pipeline components except the finisher, while the finisher just uses
the output parameters. Therefore one map seems to be the better solution.
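
A minimal sketch of what the single-map variant could look like for a
finisher. The class name, the parameter keys ("outputStream", "encoding")
and the finish() method are hypothetical and only serve the illustration;
they are not part of the actual Cocoon 3 API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical finisher that takes everything it needs from one parameters map.
class ParameterDrivenFinisher {

    // Assumed key names, chosen for this sketch only.
    static final String OUTPUT_STREAM = "outputStream";
    static final String ENCODING = "encoding";

    private OutputStream out;
    private Charset charset = StandardCharsets.UTF_8;

    void setup(Map<String, Object> parameters) {
        // The runtime object (the stream) travels in the same map ...
        Object candidate = parameters.get(OUTPUT_STREAM);
        if (!(candidate instanceof OutputStream)) {
            throw new IllegalArgumentException(
                    "parameter '" + OUTPUT_STREAM + "' must be an OutputStream");
        }
        this.out = (OutputStream) candidate;

        // ... as plain output settings such as the encoding.
        Object encoding = parameters.get(ENCODING);
        if (encoding != null) {
            this.charset = Charset.forName(encoding.toString());
        }
    }

    // Stand-in for the real serialization work.
    void finish(String content) {
        try {
            out.write(content.getBytes(charset));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class SingleMapDemo {

    static String run() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        Map<String, Object> parameters = new HashMap<String, Object>();
        parameters.put(ParameterDrivenFinisher.OUTPUT_STREAM, sink);
        parameters.put(ParameterDrivenFinisher.ENCODING, "ISO-8859-1");

        ParameterDrivenFinisher finisher = new ParameterDrivenFinisher();
        finisher.setup(parameters);
        finisher.finish("<p>ok</p>");

        return new String(sink.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

The price of the single map is that the type check on the stream happens
at runtime instead of at compile time.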

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [c3] Pipeline results

Posted by Carsten Ziegeler <cz...@apache.org>.

Reinhard Pötz wrote:
> I want to provide a SAX buffer to the pipeline and the serializer just
> passes the SAX events to it.
Ah, ok.

>> As the output stream is a runtime object for the finisher it should be
>> treated like other objects of this kind (the src url for the file
>> generator for example etc.) which means it should be passed in through
>> the parameters for the finisher.
> 
> I'm not sure if I understand your idea: Do you propose to change the
> interface this way?
> 
> setup(Map<String, Object> outputParameters,
>       Map<String, Object> inputParameters);
> 
> 
> The client of the pipeline API has to put the objects for the serializer
> into the output parameters map and the serializer gets them passed so
> that it can manipulate them?
Yes, but I thought about just using:

setup(Map<String, Object> parameters);

so we don't differentiate between input and output parameters. Maybe two
parameter maps make more sense.

> The advantage compared to my proposed solution is that this way a
> serializer could handle more than one output type. Right?
Yes (and it doesn't create a new interface :) )


Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [c3] Pipeline results

Posted by Reinhard Pötz <re...@apache.org>.

Carsten Ziegeler wrote:
> Reinhard Pötz wrote:
>> Yes, and I don't think that anything speaks against it: One
>> serializer can produce one particular result type.
> Yes, or maybe two (output stream and writer).
> 
>> The problem that I want to solve is that a pipeline can produce other
>> results than OutputStreams which imposes an unnecessary limit or makes
>> things ugly because you have to produce something that you don't need
>> and use a hack to get out what you really want.
>>
>> I think we all agree on this.
> Yes, definitely.
> 
>> The question we have to answer now is, how do we want to get access to
>> the produced result. As pointed out before, I don't think that it is a
>> good idea if you have to ask the finisher to get the result. If we do it
>> this way, we would probably have to expose the finisher via the pipeline
>> API. I'm not sure if that's the way to go.
>>
> Ok, so which use cases do we have? Apart from the io stuff the finisher
> might return an arbitrary object. Obviously, there's a difference
> between these two use cases: for the io stuff, the client of the
> pipeline needs to provide an output stream (or writer) to the pipeline
> (finisher). Whereas for the other use cases, no special object is needed
> for the finisher, but the finisher just returns something.

I want to provide a SAX buffer to the pipeline and the serializer just
passes the SAX events to it.

> Ok, so let's throw in some other ideas:
> The pipeline interface has the execute method, we could change the
> return type of that method from void to Object. So the execution of the
> pipeline returns the result (if any).
> 
> As the output stream is a runtime object for the finisher it should be
> treated like other objects of this kind (the src url for the file
> generator for example etc.) which means it should be passed in through
> the parameters for the finisher.

I'm not sure if I understand your idea: Do you propose to change the
interface this way?

setup(Map<String, Object> outputParameters,
      Map<String, Object> inputParameters);


The client of the pipeline API has to put the objects for the serializer
into the output parameters map and the serializer gets them passed so
that it can manipulate them?

The advantage compared to my proposed solution is that this way a
serializer could handle more than one output type. Right?

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [c3] Pipeline results

Posted by Carsten Ziegeler <cz...@apache.org>.

Reinhard Pötz wrote:
> Yes, and I don't think that anything speaks against it: One
> serializer can produce one particular result type.
Yes, or maybe two (output stream and writer).

> The problem that I want to solve is that a pipeline can produce other
> results than OutputStreams which imposes an unnecessary limit or makes
> things ugly because you have to produce something that you don't need
> and use a hack to get out what you really want.
> 
> I think we all agree on this.
Yes, definitely.

> The question we have to answer now is, how do we want to get access to
> the produced result. As pointed out before, I don't think that it is a
> good idea if you have to ask the finisher to get the result. If we do it
> this way, we would probably have to expose the finisher via the pipeline
> API. I'm not sure if that's the way to go.
> 
Ok, so which use cases do we have? Apart from the io stuff the finisher
might return an arbitrary object. Obviously, there's a difference
between these two use cases: for the io stuff, the client of the
pipeline needs to provide an output stream (or writer) to the pipeline
(finisher). Whereas for the other use cases, no special object is needed
for the finisher, but the finisher just returns something.

Ok, so let's throw in some other ideas:
The pipeline interface has the execute method, we could change the
return type of that method from void to Object. So the execution of the
pipeline returns the result (if any).

As the output stream is a runtime object for the finisher it should be
treated like other objects of this kind (the src url for the file
generator for example etc.) which means it should be passed in through
the parameters for the finisher.
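
A rough sketch of the "execute returns the result" idea. All names here
(ReturningFinisher, ListFinisher, ReturningPipeline) are made up for the
illustration, and the list of strings is only a stand-in for whatever the
real components pass along:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical finisher contract: consume events, then hand back a result.
interface ReturningFinisher {

    void consume(String event);

    // May return null if the finisher only writes to a stream it was given.
    Object getResult();
}

// A finisher that produces an object (a list) instead of bytes.
class ListFinisher implements ReturningFinisher {

    private final List<String> collected = new ArrayList<String>();

    public void consume(String event) {
        collected.add(event);
    }

    public Object getResult() {
        return collected;
    }
}

// Stand-in for the pipeline: execute() returns Object instead of void.
class ReturningPipeline {

    private final List<String> events;
    private final ReturningFinisher finisher;

    ReturningPipeline(List<String> events, ReturningFinisher finisher) {
        this.events = events;
        this.finisher = finisher;
    }

    Object execute() {
        for (String event : events) {
            finisher.consume(event);
        }
        // The client never touches the finisher; it only sees the return value.
        return finisher.getResult();
    }
}

class ExecuteDemo {

    static Object run() {
        ReturningPipeline pipeline =
                new ReturningPipeline(Arrays.asList("a", "b"), new ListFinisher());
        return pipeline.execute();
    }
}
```

With a plain Object return type the client has to cast the result, so the
compiler can't help with the result type here.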

WDYT?
Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [c3] Pipeline results

Posted by Reinhard Pötz <re...@apache.org>.

Carsten Ziegeler wrote:
> Reinhard Pötz wrote:
>> Reinhard Pötz wrote:
>>> Carsten Ziegeler wrote:
>>>
>>>> There is a slight overlap between the serializer and the result - for
>>>> example if you want to get java objects out of the pipeline, you might
>>>> want to use a special serializer. So maybe we can merge the two?
>>> I was thinking about this too but I preferred the symmetry of defining
>>> the input and output objects at pipeline level and not at component
>>> level. OTOH it's only the serializer which needs access to the output
>>> object, hmmm ...
>>>
>> The more I think about it the less I like the idea of merging the result
>> and the finisher (serializer) interfaces. The reason is that you would
>> have to pass the result object to the finisher which means that you have
>> to do this when you create the finisher instead of doing it when you
>> setup the pipeline.
> Hmm, if you merge result and finisher, you don't have to pass the result
> object to the finisher - it is the finisher :) But I guess you mean
> something like the output stream, right? (which is currently passed to
> the finisher by the pipeline object).

Yes, I meant it that way.

> But I see your point. However, I fear this flexibility: it is unlikely
> that each finisher will cope with each possible result. Text-based
> finishers (like html or xml serializers) will be able to write to a
> stream or writer; finishers creating binary content (like the pdf
> serializer) will be able to write to a stream but not to a writer. Atm,
> I see no other possible result format for this kind of finisher (maybe
> getting the fop object model? hmm)

They can still produce an OutputStream. I have no problem with that.

> Now, if you want a tree of objects as a result of your pipeline, you
> will have a special finisher for that (maybe a castor finisher or
> whatever).
> So in the end your result object type depends heavily on the used finisher.

Yes, and I don't think that anything speaks against it: One
serializer can produce one particular result type.

The problem that I want to solve is that a pipeline can produce other
results than OutputStreams which imposes an unnecessary limit or makes
things ugly because you have to produce something that you don't need
and use a hack to get out what you really want.

I think we all agree on this.

The question we have to answer now is, how do we want to get access to
the produced result. As pointed out before, I don't think that it is a
good idea if you have to ask the finisher to get the result. If we do it
this way, we would probably have to expose the finisher via the pipeline
API. I'm not sure if that's the way to go.

> We already have a similar case with the producer/starter - we don't have
> a source interface abstracting where the input for a starter is coming
> from (like from a stream, reader, http request etc.).

Yes, the situation is similar but since we already have input parameters
that can be passed in the setup method of a pipeline, it isn't as
limiting as in the output case.

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [c3] Pipeline results

Posted by Carsten Ziegeler <cz...@apache.org>.

Reinhard Pötz wrote:
> Reinhard Pötz wrote:
>> Carsten Ziegeler wrote:
>>
>>> There is a slight overlap between the serializer and the result - for
>>> example if you want to get java objects out of the pipeline, you might
>>> want to use a special serializer. So maybe we can merge the two?
>> I was thinking about this too but I preferred the symmetry of defining
>> the input and output objects at pipeline level and not at component
>> level. OTOH it's only the serializer which needs access to the output
>> object, hmmm ...
>>
> 
> The more I think about it the less I like the idea of merging the result
> and the finisher (serializer) interfaces. The reason is that you would
> have to pass the result object to the finisher which means that you have
> to do this when you create the finisher instead of doing it when you
> setup the pipeline.
Hmm, if you merge result and finisher, you don't have to pass the result
object to the finisher - it is the finisher :) But I guess you mean
something like the output stream, right? (which is currently passed to
the finisher by the pipeline object).
But I see your point. However, I fear this flexibility: it is unlikely
that each finisher will cope with each possible result. Text-based
finishers (like html or xml serializers) will be able to write to a
stream or writer; finishers creating binary content (like the pdf
serializer) will be able to write to a stream but not to a writer. Atm,
I see no other possible result format for this kind of finisher (maybe
getting the fop object model? hmm)
Now, if you want a tree of objects as a result of your pipeline, you
will have a special finisher for that (maybe a castor finisher or
whatever).
So in the end your result object type depends heavily on the used finisher.

We already have a similar case with the producer/starter - we don't have
a source interface abstracting where the input for a starter is coming
from (like from a stream, reader, http request etc.).

Carsten

> 
> It would also mean that you have to keep a reference to the finisher (or
> even worse, expose the finisher via the pipeline API) in order to get
> access to the result.
> 
> WDYT?
> 


-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [c3] Pipeline results

Posted by Reinhard Pötz <re...@apache.org>.

Reinhard Pötz wrote:
> Carsten Ziegeler wrote:
>> Reinhard Pötz wrote:
>>> But still there is one limitation that has bugged me several times: The
>>> result of all pipelines is that they write something into an output stream.
>>> That's the original use case for pipelines used in servlet environments
>>> and probably true in many cases but not in all.
>>> One use case that I have is that I want to get the SAX events written
>>> into a SAX buffer or some might want to make the pipeline write into
>>> some content handler directly that they pass to it.
>>>
>>> Another use case is when you want to use pipelines to create business
>>> objects as their result. (I guess Simone can give more details here.)
>>>
>>> In all these cases you don't need an output stream.
>> Yes, I recently had the same idea when I wanted to write the output to a
>> writer instead of an output stream (now this is easy to do with a
>> wrapper but still it could be easier).
>>
>>>                                   - o -
>>>
>>>
>>> In order to make the pipeline result pluggable, I propose the following
>>> changes:
>>>
>>> I would like to introduce a generic Result interface which is the
>>> transport vehicle for the actual result:
>> Ah, here we have the new interface :)
>>
>>> public interface Result<T> {
>>>
>>>     void setResultObject(T o);
>>>
>>>     T getResultObject();
>>> }
>>>
>>> And here is a possible implementation of a Result that carries
>>> an output stream:
>>>
>>> public class OutputStreamResult implements Result<OutputStream> {
>>>
>>>     private OutputStream os;
>>>
>>>     public OutputStream getResultObject() {
>>>         return os;
>>>     }
>>>
>>>     public void setResultObject(OutputStream o) {
>>>         this.os = o;
>>>     }
>>>
>>>     public String getContentType() {
>>>         ...
>>>     }
>>>
>>>     public long getLastModified() {
>>>         ...
>>>     }
>>> }
>>>
>>> The pipeline interface would have to change accordingly:
>>>
>>> public interface Pipeline<T> {
>>>
>>>     void addComponent(PipelineComponent pipelineComponent);
>>>
>>>     void addComponent(Finisher<T> finisher);
>>>
>>>     void execute() throws Exception;
>>>
>>>     void setup(Result<T> result);
>>>
>>>     void setup(Result<T> result, Map<String, Object> parameters);
>>>
>>>     void setConfiguration(Map<String, ? extends Object> parameters);
>>> }
>>>
>>> As you can see, the methods getContentType() and getLastModified() were
>>> moved to the result object where they fit much better IMO because both
>>> are specific for our output stream use case.
>>>
>>> And thanks to generics, the compiler makes sure that only a Finisher (=
>>> serializer) can be added to the pipeline if it supports a particular
>>> result type:
>>>
>>> public interface Finisher<T> extends PipelineComponent {
>>>
>>>     void setResult(Result<T> result);
>>> }
>>>
>>>
>> Hmm, not sure :) How does this actually work - what does a serializer
>> need to do?
> 
> The serializer has to fill the result object with some content.
> 
>> There is a slight overlap between the serializer and the result - for
>> example if you want to get java objects out of the pipeline, you might
>> want to use a special serializer. So maybe we can merge the two?
> 
> I was thinking about this too but I preferred the symmetry of defining
> the input and output objects at pipeline level and not at component
> level. OTOH it's only the serializer which needs access to the output
> object, hmmm ...
> 

The more I think about it the less I like the idea of merging the result
and the finisher (serializer) interfaces. The reason is that you would
have to pass the result object to the finisher which means that you have
to do this when you create the finisher instead of doing it when you
setup the pipeline.

It would also mean that you have to keep a reference to the finisher (or
even worse, expose the finisher via the pipeline API) in order to get
access to the result.
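
To make the concern concrete, here is a small sketch of the merged
variant, assuming a hypothetical MergedFinisher that is its own result.
The component list and the loop are only stand-ins for a real pipeline:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical merged finisher: it is the component and the result holder.
class MergedFinisher {

    private final StringBuilder buffer = new StringBuilder();

    void consume(String chunk) {
        buffer.append(chunk);
    }

    String getResult() {
        return buffer.toString();
    }
}

class MergedDemo {

    static String run() {
        // The client has to keep this reference around ...
        MergedFinisher finisher = new MergedFinisher();

        List<Object> components = new ArrayList<Object>();
        components.add(finisher);

        // Stand-in for pipeline execution.
        for (Object component : components) {
            if (component instanceof MergedFinisher) {
                ((MergedFinisher) component).consume("<doc/>");
            }
        }

        // ... because the result can only be fetched from the finisher itself.
        return finisher.getResult();
    }
}
```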

WDYT?

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [c3] Pipeline results

Posted by Reinhard Pötz <re...@apache.org>.

Carsten Ziegeler wrote:
> Reinhard Pötz wrote:
>> But still there is one limitation that has bugged me several times: The
>> result of all pipelines is that they write something into an output stream.
>> That's the original use case for pipelines used in servlet environments
>> and probably true in many cases but not in all.
>> One use case that I have is that I want to get the SAX events written
>> into a SAX buffer or some might want to make the pipeline write into
>> some content handler directly that they pass to it.
>>
>> Another use case is when you want to use pipelines to create business
>> objects as their result. (I guess Simone can give more details here.)
>>
>> In all these cases you don't need an output stream.
> Yes, I recently had the same idea when I wanted to write the output to a
> writer instead of an output stream (now this is easy to do with a
> wrapper but still it could be easier).
> 
>>
>>                                   - o -
>>
>>
>> In order to make the pipeline result pluggable, I propose the following
>> changes:
>>
>> I would like to introduce a generic Result interface which is the
>> transport vehicle for the actual result:
> Ah, here we have the new interface :)
> 
>> public interface Result<T> {
>>
>>     void setResultObject(T o);
>>
>>     T getResultObject();
>> }
>>
>> And here is a possible implementation of a Result that carries
>> an output stream:
>>
>> public class OutputStreamResult implements Result<OutputStream> {
>>
>>     private OutputStream os;
>>
>>     public OutputStream getResultObject() {
>>         return os;
>>     }
>>
>>     public void setResultObject(OutputStream o) {
>>         this.os = o;
>>     }
>>
>>     public String getContentType() {
>>         ...
>>     }
>>
>>     public long getLastModified() {
>>         ...
>>     }
>> }
>>
>> The pipeline interface would have to change accordingly:
>>
>> public interface Pipeline<T> {
>>
>>     void addComponent(PipelineComponent pipelineComponent);
>>
>>     void addComponent(Finisher<T> finisher);
>>
>>     void execute() throws Exception;
>>
>>     void setup(Result<T> result);
>>
>>     void setup(Result<T> result, Map<String, Object> parameters);
>>
>>     void setConfiguration(Map<String, ? extends Object> parameters);
>> }
>>
>> As you can see, the methods getContentType() and getLastModified() were
>> moved to the result object where they fit much better IMO because both
>> are specific for our output stream use case.
>>
>> And thanks to generics, the compiler makes sure that only a Finisher (=
>> serializer) can be added to the pipeline if it supports a particular
>> result type:
>>
>> public interface Finisher<T> extends PipelineComponent {
>>
>>     void setResult(Result<T> result);
>> }
>>
>>
> Hmm, not sure :) How does this actually work - what does a serializer
> need to do?

The serializer has to fill the result object with some content.
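
As a minimal sketch of what "filling the result" could mean: Result,
Finisher and PipelineComponent are the interfaces from the proposal
(PipelineComponent is reduced to an empty marker here), while ListResult,
CollectingFinisher and its event()/finish() methods are hypothetical and
only stand in for a real SAX buffer and serializer:

```java
import java.util.ArrayList;
import java.util.List;

interface Result<T> {

    void setResultObject(T o);

    T getResultObject();
}

// Reduced to a marker interface for this sketch.
interface PipelineComponent {
}

interface Finisher<T> extends PipelineComponent {

    void setResult(Result<T> result);
}

// Hypothetical result carrying a list of event names (stand-in for a SAX buffer).
class ListResult implements Result<List<String>> {

    private List<String> events;

    public void setResultObject(List<String> o) {
        this.events = o;
    }

    public List<String> getResultObject() {
        return events;
    }
}

// Hypothetical finisher: records incoming events and fills the result at the end.
class CollectingFinisher implements Finisher<List<String>> {

    private final List<String> buffer = new ArrayList<String>();
    private Result<List<String>> result;

    public void setResult(Result<List<String>> result) {
        this.result = result;
    }

    void event(String name) {
        buffer.add(name);
    }

    void finish() {
        result.setResultObject(buffer);
    }
}

class FillResultDemo {

    static List<String> run() {
        ListResult result = new ListResult();

        CollectingFinisher finisher = new CollectingFinisher();
        finisher.setResult(result); // the pipeline would do this in setup()

        finisher.event("startDocument");
        finisher.event("endDocument");
        finisher.finish();

        // The client reads from the result object, not from the finisher.
        return result.getResultObject();
    }
}
```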

> There is a slight overlap between the serializer and the result - for
> example if you want to get java objects out of the pipeline, you might
> want to use a special serializer. So maybe we can merge the two?

I was thinking about this too but I preferred the symmetry of defining
the input and output objects at pipeline level and not at component
level. OTOH it's only the serializer which needs access to the output
object, hmmm ...

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [c3] Pipeline results

Posted by Carsten Ziegeler <cz...@apache.org>.

Reinhard Pötz wrote:
> But still there is one limitation that has bugged me several times: The
> result of all pipelines is that they write something into an output stream.
> That's the original use case for pipelines used in servlet environments
> and probably true in many cases but not in all.
> One use case that I have is that I want to get the SAX events written
> into a SAX buffer or some might want to make the pipeline write into
> some content handler directly that they pass to it.
> 
> Another use case is when you want to use pipelines to create business
> objects as their result. (I guess Simone can give more details here.)
> 
> In all these cases you don't need an output stream.
Yes, I recently had the same idea when I wanted to write the output to a
writer instead of an output stream (now this is easy to do with a
wrapper but still it could be easier).

> 
> 
>                                   - o -
> 
> 
> In order to make the pipeline result pluggable, I propose the following
> changes:
> 
> I would like to introduce a generic Result interface which is the
> transport vehicle for the actual result:
Ah, here we have the new interface :)

> 
> public interface Result<T> {
> 
>     void setResultObject(T o);
> 
>     T getResultObject();
> }
> 
> And here is a possible implementation of a Result that carries
> an output stream:
> 
> public class OutputStreamResult implements Result<OutputStream> {
> 
>     private OutputStream os;
> 
>     public OutputStream getResultObject() {
>         return os;
>     }
> 
>     public void setResultObject(OutputStream o) {
>         this.os = o;
>     }
> 
>     public String getContentType() {
>         ...
>     }
> 
>     public long getLastModified() {
>         ...
>     }
> }
> 
> The pipeline interface would have to change accordingly:
> 
> public interface Pipeline<T> {
> 
>     void addComponent(PipelineComponent pipelineComponent);
> 
>     void addComponent(Finisher<T> finisher);
> 
>     void execute() throws Exception;
> 
>     void setup(Result<T> result);
> 
>     void setup(Result<T> result, Map<String, Object> parameters);
> 
>     void setConfiguration(Map<String, ? extends Object> parameters);
> }
> 
> As you can see, the methods getContentType() and getLastModified() were
> moved to the result object where they fit much better IMO because both
> are specific for our output stream use case.
> 
> And thanks to generics, the compiler makes sure that only a Finisher (=
> serializer) can be added to the pipeline if it supports a particular
> result type:
> 
> public interface Finisher<T> extends PipelineComponent {
> 
>     void setResult(Result<T> result);
> }
> 
> 
Hmm, not sure :) How does this actually work - what does a serializer
need to do?
There is a slight overlap between the serializer and the result - for
example if you want to get java objects out of the pipeline, you might
want to use a special serializer. So maybe we can merge the two?

Carsten


-- 
Carsten Ziegeler
cziegeler@apache.org