Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2002/03/05 21:27:13 UTC

Pipe-aware Selectors [was Re: XML-Based Selection (Redirect Serializer?)]

Daniel Fagerstrom wrote:

> The pipe-selector (ideas about a better name?) would look something like
> this:
> <pipe-selector type="xpath">
>   <when test="expr1">
>     <!-- pipeline fragment -->
>   </when>
>   <when test="expr2">
>     <!-- pipeline fragment -->
>   </when>
>     ...
>   <otherwise>
>     <!-- pipeline fragment -->
>   </otherwise>
> </pipe-selector>
> 
> The general idea is that the pipe-selector buffers its input in, e.g., a
> DOM-tree; then the tests can be applied to the buffered input. The pipeline
> fragment in the first when clause whose test succeeds is then fed with
> the buffered input, and its output is sent to the pipeline component after
> the pipe-selector.
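The first-match semantics described above can be sketched in Java with the standard JAXP XPath API. This is a minimal illustration under stated assumptions, not Cocoon code: the class name FirstMatch is hypothetical, the buffered input is simulated by parsing a string, and a return value of -1 stands for the <otherwise> clause.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Cocoon: evaluates <when> tests against buffered input.
final class FirstMatch {

    // Returns the index of the first test that holds, or -1 for <otherwise>.
    static int select(Document buffered, List<String> tests) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (int i = 0; i < tests.size(); i++) {
            Boolean ok = (Boolean) xpath.evaluate(tests.get(i), buffered, XPathConstants.BOOLEAN);
            if (ok) {
                return i; // first success wins; later tests are never evaluated
            }
        }
        return -1; // no test succeeded
    }

    // Stands in for the pipeline before the selector: parses a string into a DOM.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

For the document <order><item/></order>, select(doc, Arrays.asList("/order/cancel", "/order/item")) returns 1, so the fragment of the second when clause would be fed the buffered DOM.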

Ok, I see your point clearly.

> How can this be implemented?
> 
> The pipe-selector is a transformer (i.e. it implements the Transformer
> interface) extended with a method that lets the sitemap constructor hand the
> pipe-selector an object that takes care of the tests and of constructing the
> pipeline fragment for the selected when clause.
> 
> The actual tests can implement the Selector interface if it is OK to put the
> DOM-tree with the buffered input in the objectModel. A possible issue with
> this is that the tests in the selector are performed after the whole
> pipeline has been constructed. This might have the unintuitive effect that
> components later in the pipeline that affect the objectModel, and that are
> executed at sitemap construction time, run before the test in the
> pipe-selector.

Hmmm, I don't think this behavior should extend the Transformer's; it smells
like a bad design choice to me.

> The sitemap stylesheet constructs a class for each pipe-selector instance in
> the sitemap. This class contains a method that returns an EventPipeline. The
> method executes the tests and constructs and returns an EventPipeline for
> the first when clause that succeeds. The EventPipeline starts with a
> generator that takes the DOM-tree from the objectModel and streams it with a
> DOMStreamer.
> 
> class PipeSelectorInternalPipeN01234 {
>   public EventPipeline constructPipe(SitemapRedirector redirector,
>                                      Environment environment,
>                                      List listOfMaps) { ... }
> }
> 
> The code in sitemap_xmap for constructing the pipeline that contains a
> pipe-selector is like that for any pipeline that contains a transformer,
> with the difference that the "pipe-selector transformer" is given an
> instance of its PipeSelectorInternalPipe class.
> 
> The algorithm for the PipeSelector is:
> * Its input is connected to a DOMBuilder.
> * The DOM-tree is put in the objectModel.
> * The constructPipe(...) method of the PipeSelector's
>   PipeSelectorInternalPipe class is executed.
> * The returned EventPipeline is connected to the output of the PipeSelector,
>   and the EventPipeline's process method is called.
> 
> I hope that the description above is comprehensible.
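The first two steps of the algorithm (connect the input to a DOMBuilder, put the DOM-tree in the objectModel) can be approximated with plain JAXP, where an identity TransformerHandler writing to a DOMResult plays the role of Cocoon's DOMBuilder. This is a minimal sketch, assuming the objectModel is a Map<String, Object>; the class SaxBuffer and the key DOM_KEY are hypothetical, and the upstream pipeline is simulated by a SAX parser.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical sketch, not Cocoon code: buffers SAX events into a DOM.
final class SaxBuffer {
    static final String DOM_KEY = "pipe-selector.dom"; // hypothetical objectModel key

    // Step 1: the "input" is connected to a DOM-building SAX handler.
    static Document buffer(String xml) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler builder = stf.newTransformerHandler(); // identity: SAX in, DOM out
        DOMResult result = new DOMResult();
        builder.setResult(result);

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader upstream = spf.newSAXParser().getXMLReader(); // stands in for the pipeline
        upstream.setContentHandler(builder);
        upstream.parse(new InputSource(new StringReader(xml)));
        return (Document) result.getNode();
    }

    // Step 2: the DOM-tree is put in the objectModel.
    static void store(Map<String, Object> objectModel, Document dom) {
        objectModel.put(DOM_KEY, dom);
    }
}
```

The whole input has to be consumed before buffer() returns, which is exactly the buffering cost discussed later in the thread.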

I'm *seriously* worried about the need to buffer the input.

> >  2) how does this impact performance? how does this impact caching?
> >
> > [the first impacts the system usage, the second the interface of
> > Selectors or PipeSelectors]
> 
> The simplest way to implement cacheability is to base the cache key
> generation on _all_ of the pipeline fragments in the when clauses. This is
> fairly unsatisfying, as it implies constructing all of the pipeline
> fragments instead of only the selected one. It also decreases the chance of
> caching the PipeSelector and leads to unnecessary recalculation triggered by
> changes in "when clauses" that were not selected. The problem is that when
> the generateKey() method is called, it is not yet known which "when clause"
> will be chosen. If we instead had a method like generateKey(key), where
> "key" is the hash key for the pipeline fragment before the PipeSelector,
> then "key" would uniquely determine the input to the PipeSelector and thus
> which "when clause" is selected. This information, the map from key to
> "when clause", could be stored and then used to compute the cache key for
> the PipeSelector based only on the pipeline fragment in the _selected_ when
> clause.
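The caching idea can be illustrated with a small, self-contained Java sketch. It assumes, as the text does, that the input key uniquely determines the input, so the chosen clause can be memoized per input key. SelectionKeyCache, the long keys and the 31 * a + b key combination are all hypothetical stand-ins, not Cocoon's cache API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongToIntFunction;

// Hypothetical sketch, not Cocoon's caching API.
final class SelectionKeyCache {
    private final Map<Long, Integer> chosenClause = new HashMap<>(); // input key -> clause index
    private final List<Long> clauseKeys;      // cache key of each when/otherwise fragment
    private final LongToIntFunction runTests; // evaluates the tests on the input, returns a clause index
    private int testRuns = 0;

    SelectionKeyCache(List<Long> clauseKeys, LongToIntFunction runTests) {
        this.clauseKeys = new ArrayList<>(clauseKeys);
        this.runTests = runTests;
    }

    // The key depends only on the input key and the key of the *selected* fragment.
    long generateKey(long inputKey) {
        int clause = chosenClause.computeIfAbsent(inputKey, k -> {
            testRuns++; // tests run once per distinct input key
            return runTests.applyAsInt(k);
        });
        return 31 * inputKey + clauseKeys.get(clause);
    }

    int testRuns() {
        return testRuns;
    }
}
```

A second request with the same input key reuses the stored clause index, so the tests are not re-evaluated, and changes in unselected fragments never reach the key.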

Sorry, I think I lost you here. :/

> Performance: it would of course be better not to be forced to store the
> SAX-events in a DOM-tree, but I do not see much choice.

Well, this is not a problem since XSLT processing works this way anyway,
but an XPath engine can be made much more incremental than an XSLT one.

Also, it might be possible that not much load is given to these pipes
since they are mostly used in data INPUT which is normally much less
than data OUTPUT in any site.

> The main use for the
> PipeSelector will probably be to make selections based on XML input to Cocoon
> and on the output from transformers with side effects; I would guess that
> in these cases the documents will mostly be quite small. Besides
> the need for buffering, I do not think there should be any sources of
> performance bottlenecks in a PipeSelector, but I do not know enough about the
> internals of Cocoon to know for sure.
> 
> What do you think? Would something along these lines work?

Yes, I think it might work, but I still can't see if this requires a
change in the Selector interface or if another sitemap component must be
added.

What do you guys think?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------



---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Re: Pipe-aware Selectors

Posted by Stefano Mazzocchi <st...@apache.org>.
Daniel Fagerstrom wrote:

> I have done some more thinking and have started to build a prototype. I
> decided that it would be easier to build it in the treeprocessor, so I will
> describe the design this far in terms of the treeprocessor interfaces and
> classes.

<snip/>

> Given that the design above actually works, it seems possible to implement
> pipe-aware selection without messing with any interfaces at all in Cocoon.

Ok, good.

> The somewhat implicit transport of the dom-tree is maybe a kludgy solution,
> it might be better to have a new interface that makes that communication
> more explicit. But in any case we should _not_ change the Selector
> interface.

Ok, I think we all agree on this point.

<snip/>
 
> With the design above I think the existing cache mechanisms should be usable
> as is: the EventPipeline before the selection could be cached, then the
> selection mechanism could be applied to the cached data, and the content in
> the chosen "when clause" could be cached in turn. But this scheme would be
> unnecessarily inefficient: it would be better to also store a mapping
> between the cache key for the "input pipeline" and the "when clause" that is
> chosen for that input pipeline, so that the tests do not have to be
> recalculated.

ok

> <snip/>
> 
> > > Performance: it would of course be better not to be forced to store the
> > > SAX-events in a DOM-tree, but I do not see much choice.
> >
> > Well, this is not a problem since XSLT processing works this way anyway,
> > but an XPath engine can be made much more incremental than an XSLT one.
> Might be doable in principle, but I would guess that it could be hard to
> reuse this functionality from an existing XSLT implementation. I have some
> previous (very bad) experience from trying to reuse low-level mechanisms in
> Xalan for an extension element, and I would have to forget about that
> experience before I would repeat that mistake ;)

ok
 
> > Also, it might be possible that not much load is given to these pipes
> > since they are mostly used in data INPUT which is normally much less
> > than data OUTPUT in any site.
> Yes, that is what I believe; IMHO it seems unnecessary to implement
> complicated optimizations before there are clear use cases for them. It
> should also be noted that selection based on validation _cannot_ be done in
> streaming mode: the whole document must be validated before we know that it
> is valid, i.e. buffering is necessary.

Hmmm, I don't know; there are cases where XML validation can be a
sequential process, but the same thoughts on XSLT apply here.

Anyway, I'd love to place your eventual contribution in the scratchpad
and see what people think about it and what performance it gets.

Also, I would love to see how this merges with Vadim's thoughts on
'multiplexers'... but I have the impression that we can work
incrementally, so you shouldn't stop your effort.

Who knows: maybe the new interface you are wondering about above might be
Multiplexer, and Vadim could reuse part of your code to implement it.

Bah, dunno, just thinking out loud...

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------





RE: Pipe-aware Selectors [was Re: XML-Based Selection (Redirect Serializer?)]

Posted by Daniel Fagerstrom <da...@swipnet.se>.
Stefano Mazzocchi wrote:
> Daniel Fagerstrom wrote:
>
> > The pipe-selector (ideas about a better name?) would look something like
> > this:
> > <pipe-selector type="xpath">
> >   <when test="expr1">
> >     <!-- pipeline fragment -->
> >   </when>
> >   <when test="expr2">
> >     <!-- pipeline fragment -->
> >   </when>
> >     ...
> >   <otherwise>
> >     <!-- pipeline fragment -->
> >   </otherwise>
> > </pipe-selector>
> >
> > The general idea is that the pipe-selector buffers its input in, e.g., a
> > DOM-tree; then the tests can be applied to the buffered input. The pipeline
> > fragment in the first when clause whose test succeeds is then fed with
> > the buffered input, and its output is sent to the pipeline component after
> > the pipe-selector.
>
> Ok, I see your point clearly.
Good :)

> > How can this be implemented?
I have done some more thinking and have started to build a prototype. I
decided that it would be easier to build it in the treeprocessor, so I will
describe the design this far in terms of the treeprocessor interfaces and
classes.

<background-info>
For those who have not yet studied the treeprocessor: there are two main
interfaces, ProcessingNodeBuilder, implemented by e.g. SelectNodeBuilder,
and ProcessingNode, implemented by e.g. SelectNode. The node builders are
used to construct a processing tree from the sitemap, and a FooNodeBuilder
typically puts an instance of FooNode in the tree. The tree typically gets
the same structure as the elements in the sitemap. The ProcessingNodes
implement:

  boolean invoke(Environment env, InvokeContext context)

where the InvokeContext, among other things, contains the current
EventPipeline and StreamPipeline; invoke drives the execution of a request.
Generators and Transformers are pushed into the EventPipeline, a Serializer
is put in the StreamPipeline and starts the execution of the pipeline.
Sitemap elements with children decide how to process the children.
</background-info>

To implement pipe-aware selection as in the above sitemap example, we need
four classes: PipeSelectNodeBuilder, PipeSelectNode, DOMGenerator and
XPathSelector.

PipeSelectNodeBuilder: is like SelectNodeBuilder but puts PipeSelectNodes
instead of SelectNodes in the tree, it should possibly also implement
LinkedProcessingNodeBuilder to enable view-labels on a pipe-aware selector.

PipeSelectNode: extracts the current EventPipeline from the InvokeContext,
connects a DOMBuilder to the EventPipeline and executes it. The resulting
dom-tree is stored in the objectModel and a new InvokeContext is created
with a newly created EventPipeline that starts with a DOMGenerator, and the
StreamPipeline from the incoming InvokeContext. After these steps the invoke
method in PipeSelectNode will do exactly the same things as in the
SelectNode but with the new InvokeContext as input.
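The control flow of PipeSelectNode can be sketched with deliberately simplified stand-in types. SketchInvokeContext and SketchPipeSelectNode are hypothetical and are NOT Cocoon's classes: pipelines are modelled as lists of component names, executing the incoming EventPipeline into a DOMBuilder is passed in as a function, each when test is a predicate over the objectModel, and invoke returns the new context instead of a boolean.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Simplified stand-in, not Cocoon's InvokeContext.
final class SketchInvokeContext {
    final List<String> eventPipeline = new ArrayList<>(); // component names, generator first
    final String streamPipeline;                           // stands in for the StreamPipeline

    SketchInvokeContext(String streamPipeline) {
        this.streamPipeline = streamPipeline;
    }
}

// Hypothetical sketch of the PipeSelectNode steps described above.
final class SketchPipeSelectNode {
    static final String DOM_KEY = "pipe-selector.dom";        // hypothetical objectModel key
    private final List<Predicate<Map<String, Object>>> tests;  // one per <when>
    private final List<List<String>> fragments;                // one per <when>, then <otherwise> last

    SketchPipeSelectNode(List<Predicate<Map<String, Object>>> tests, List<List<String>> fragments) {
        this.tests = tests;
        this.fragments = fragments;
    }

    SketchInvokeContext invoke(SketchInvokeContext in, Map<String, Object> objectModel,
                               Function<List<String>, Object> runIntoDomBuilder) {
        // 1. Execute the incoming EventPipeline into a DOMBuilder and store the result.
        objectModel.put(DOM_KEY, runIntoDomBuilder.apply(in.eventPipeline));
        // 2. New context: fresh EventPipeline starting with a DOMGenerator, same StreamPipeline.
        SketchInvokeContext out = new SketchInvokeContext(in.streamPipeline);
        out.eventPipeline.add("DOMGenerator");
        // 3. From here on, behave like SelectNode: the first test that holds selects its fragment.
        for (int i = 0; i < tests.size(); i++) {
            if (tests.get(i).test(objectModel)) {
                out.eventPipeline.addAll(fragments.get(i));
                return out;
            }
        }
        out.eventPipeline.addAll(fragments.get(fragments.size() - 1)); // <otherwise>
        return out;
    }
}
```

Keeping the incoming StreamPipeline is what lets the selected fragment feed the same serializer that the original pipeline would have used.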

DOMGenerator: takes the stored dom-tree from the objectModel and applies a
DOMStreamer on it.
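In plain JAXP, the DOMStreamer step can be approximated by an identity Transformer that turns a DOMSource into SAX events on a ContentHandler. This is a minimal sketch, assuming the DOM was stored in a Map<String, Object> objectModel under some key; the class name DomReplay is hypothetical, and this is not Cocoon's DOMGenerator or DOMStreamer.

```java
import java.util.Map;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import org.w3c.dom.Node;
import org.xml.sax.ContentHandler;

// Hypothetical sketch: replays a buffered DOM as SAX events to the next component.
final class DomReplay {
    static void generate(Map<String, Object> objectModel, String key, ContentHandler next)
            throws TransformerException {
        Node dom = (Node) objectModel.get(key);
        if (dom == null) {
            throw new IllegalStateException("no buffered DOM under key " + key);
        }
        // The identity transform walks the DOM and emits startElement/characters/... events.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(dom), new SAXResult(next));
    }
}
```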

XPathSelector: implements the Selector interface, and its select method
takes the dom-tree from the object model and returns the (boolean) result of
the application of the XPath on it.
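A minimal sketch of the evaluation step of such a selector, using the standard javax.xml.xpath API. It assumes the DOM is stored in a Map<String, Object> objectModel under a hypothetical key; XPathSelectorSketch does not implement Cocoon's actual Selector interface, and treating a missing DOM or a malformed expression as "no match" is a design choice of the sketch.

```java
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

// Hypothetical sketch, not Cocoon's Selector interface.
final class XPathSelectorSketch {
    static final String DOM_KEY = "pipe-selector.dom"; // hypothetical objectModel key
    // XPath objects are not thread-safe; a shared component would need per-thread instances.
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    boolean select(String expression, Map<String, Object> objectModel) {
        Object dom = objectModel.get(DOM_KEY);
        if (!(dom instanceof Node)) {
            return false; // nothing buffered: treat as no match
        }
        try {
            return (Boolean) xpath.evaluate(expression, dom, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            return false; // a malformed test is treated as a non-match
        }
    }
}
```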

Given that the design above actually works, it seems possible to implement
pipe-aware selection without messing with any interfaces at all in Cocoon.
The somewhat implicit transport of the dom-tree is maybe a kludgy solution,
it might be better to have a new interface that makes that communication
more explicit. But in any case we should _not_ change the Selector
interface.

Of course all this should have been said in Java-code and not words ;) I
hope to be able to finish a first prototype soon.

<snip/>
> Hmmm, I don't think this behavior should extend Transformer's, smells
> like a bad design choice to me.
The above design does away with that.

<snip type="some incomprehensible thoughts on caching"/>
> Sorry, I think I lost you here. :/

A new attempt:
With the design above I think the existing cache mechanisms should be usable
as is: the EventPipeline before the selection could be cached, then the
selection mechanism could be applied to the cached data, and the content in
the chosen "when clause" could be cached in turn. But this scheme would be
unnecessarily inefficient: it would be better to also store a mapping
between the cache key for the "input pipeline" and the "when clause" that is
chosen for that input pipeline, so that the tests do not have to be
recalculated.

<snip/>

> > Performance: it would of course be better not to be forced to store the
> > SAX-events in a DOM-tree, but I do not see much choice.
>
> Well, this is not a problem since XSLT processing works this way anyway,
> but an XPath engine can be made much more incremental than an XSLT one.
Might be doable in principle, but I would guess that it could be hard to
reuse this functionality from an existing XSLT implementation. I have some
previous (very bad) experience from trying to reuse low-level mechanisms in
Xalan for an extension element, and I would have to forget about that
experience before I would repeat that mistake ;)

> Also, it might be possible that not much load is given to these pipes
> since they are mostly used in data INPUT which is normally much less
> than data OUTPUT in any site.
Yes, that is what I believe; IMHO it seems unnecessary to implement
complicated optimizations before there are clear use cases for them. It
should also be noted that selection based on validation _cannot_ be done in
streaming mode: the whole document must be validated before we know that it
is valid, i.e. buffering is necessary.
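As a concrete illustration of why validity-based selection needs the whole document, here is a sketch using the standard javax.xml.validation API on a buffered DOM. ValiditySelector and the example schema are hypothetical; an error such as a surplus element can appear at the very end of the input, so a "valid" verdict cannot be reached before the last node has been checked.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: a validity test over a fully buffered document.
final class ValiditySelector {
    private final Schema schema;

    ValiditySelector(String xsd) throws SAXException {
        schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Returns true only after the whole buffered document has been checked.
    boolean isValid(Document buffered) throws IOException {
        try {
            schema.newValidator().validate(new DOMSource(buffered));
            return true;
        } catch (SAXException e) {
            return false; // with no ErrorHandler set, the first error is thrown
        }
    }

    // Namespace-aware parsing, so elements carry local names for the validator.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```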

What do you think?

/Daniel Fagerstrom


