Posted to dev@cocoon.apache.org by Reinhard Poetz <re...@apache.org> on 2008/03/21 18:17:37 UTC

Exploring Corona

Today I have added Corona to our whiteboard section in SVN 
(http://svn.apache.org/repos/asf/cocoon/whiteboard/corona/). It mostly mimics 
the existing concepts of pipelines and sitemaps that you know from Cocoon 2.x. The 
available test cases are a good starting point for exploring the sources. We also 
hope that this email explains some of our ideas.

What already works?
===================

PIPELINE API
°°°°°°°°°°°°

So far we have created a (minimalistic) pipeline API (o.a.c.c.pipeline.Pipeline 
[1]) that works based on two fundamental concepts:

1. The first component of a pipeline is of type
    o.a.c.c.pipeline.component.Starter. The last component is of type
    o.a.c.c.pipeline.component.Finisher.

2. In order to link two components with each other, the first has to be
    an o.a.c.c.pipeline.component.Producer, the second
    an o.a.c.c.pipeline.component.Consumer.


When the pipeline links the components, it merely checks whether the 
above-mentioned interfaces are present. So the pipeline does not know about the 
specific capabilities or the compatibility of the components.
It is the responsibility of the Producer to decide whether a specific Consumer 
can be linked to it or not (that is, whether it can produce output in the 
desired format of the Consumer or not). It is also conceivable that a Producer 
is capable of accepting different types of Consumers and adjusting the output 
format according to the actual Consumer.
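
To illustrate the two linking rules, here is a minimal sketch. It only borrows 
the role names (Starter, Finisher, Producer, Consumer) from the text above; all 
method signatures, the String-based data exchange and the demo components are 
assumptions made for this example and are not taken from the Corona sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: the type names follow the roles described above,
// but every method signature here is an assumption, not Corona's API.
final class PipelineSketch {

    interface Component {}
    interface Starter extends Component { void execute(); }
    interface Finisher extends Component {}
    interface Consumer extends Component { void consume(String data); }
    interface Producer extends Component {
        // The Producer decides whether it can feed this Consumer.
        void setConsumer(Consumer consumer);
    }

    static final class Pipeline {
        private final List<Component> components = new ArrayList<>();

        void addComponent(Component component) {
            components.add(component);
        }

        void execute() {
            if (components.isEmpty()) {
                throw new IllegalStateException("pipeline is empty");
            }
            Component first = components.get(0);
            Component last = components.get(components.size() - 1);
            if (!(first instanceof Starter) || !(last instanceof Finisher)) {
                throw new IllegalStateException("need a Starter first and a Finisher last");
            }
            // Link neighbours; only the presence of the interfaces is checked here.
            for (int i = 0; i + 1 < components.size(); i++) {
                Component from = components.get(i);
                Component to = components.get(i + 1);
                if (!(from instanceof Producer) || !(to instanceof Consumer)) {
                    throw new IllegalStateException("cannot link component " + i);
                }
                ((Producer) from).setConsumer((Consumer) to);
            }
            ((Starter) first).execute();
        }
    }

    // Demo components (hypothetical): they pass plain strings instead of SAX events.
    static final class TextGenerator implements Starter, Producer {
        private final String text;
        private Consumer consumer;
        TextGenerator(String text) { this.text = text; }
        public void setConsumer(Consumer consumer) { this.consumer = consumer; }
        public void execute() { consumer.consume(text); }
    }

    static final class UpperCaseTransformer implements Producer, Consumer {
        private Consumer consumer;
        public void setConsumer(Consumer consumer) { this.consumer = consumer; }
        public void consume(String data) { consumer.consume(data.toUpperCase(Locale.ROOT)); }
    }

    static final class CollectingSerializer implements Consumer, Finisher {
        final StringBuilder out = new StringBuilder();
        public void consume(String data) { out.append(data); }
    }

    static String run(String input) {
        Pipeline pipeline = new Pipeline();
        CollectingSerializer serializer = new CollectingSerializer();
        pipeline.addComponent(new TextGenerator(input));
        pipeline.addComponent(new UpperCaseTransformer());
        pipeline.addComponent(serializer);
        pipeline.execute();
        return serializer.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("hello"));
    }
}
```

Note that the pipeline itself never inspects what flows between the components; 
a Producer that cannot serve a given Consumer would have to reject it in 
setConsumer().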

There are SAX-based components that implement these concepts.

These concepts are more general than the implementation in Cocoon 2.x, which has 
explicit methods to set generators, transformers, serializers and readers on 
pipelines.


SITEMAP
°°°°°°°

The sitemap engine works similarly to that of Cocoon 2.x. The 
o.a.c.c.sitemap.SitemapBuilder reads XML and creates a tree of 
o.a.c.c.sitemap.node.SitemapNode objects that know their parent node, their 
child nodes and their parameters.

The o.a.c.c.sitemap.node.AbstractSitemapNode handles the node relationships and 
parameters in a general way.
However, there are two annotations (@o.a.c.c.sitemap.node.annotations.NodeChild 
and @o.a.c.c.sitemap.node.annotations.Parameter) that make access to specific 
child nodes and parameters more explicit.
The NodeChild annotation can be used to store a certain child node in a separate 
member variable instead of the collection of all children (e.g. 
o.a.c.c.sitemap.node.PipelineNode receives its ErrorNode in the errorNode member 
variable).
The Parameter annotation works the same way, but causes parameters to be stored 
in separate member variables (e.g. o.a.c.c.sitemap.node.MatchNode receives its 
pattern in the pattern member variable).
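
A rough sketch of how such annotations could be evaluated via reflection is 
shown below. The annotation and node names are taken from the description above, 
but the injection logic (matching children by field type and parameters by field 
name) is an assumption for illustration only and may differ from what 
AbstractSitemapNode really does.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of annotation-driven injection; not the Corona implementation.
final class AnnotationSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface NodeChild {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Parameter {}

    abstract static class AbstractNode {
        final List<AbstractNode> children = new ArrayList<>();
        final Map<String, String> parameters = new HashMap<>();

        // Stores the value in the first annotated field that the filter accepts.
        private boolean inject(Class<? extends Annotation> marker,
                               Predicate<Field> accepts, Object value) {
            for (Field field : getClass().getDeclaredFields()) {
                if (field.isAnnotationPresent(marker) && accepts.test(field)) {
                    try {
                        field.setAccessible(true);
                        field.set(this, value);
                        return true;
                    } catch (IllegalAccessException e) {
                        throw new IllegalStateException(e);
                    }
                }
            }
            return false;
        }

        // Assumption: a child goes to a @NodeChild field of a matching type,
        // otherwise into the general collection of children.
        void addChild(AbstractNode child) {
            if (!inject(NodeChild.class, f -> f.getType().isInstance(child), child)) {
                children.add(child);
            }
        }

        // Assumption: a parameter goes to a @Parameter field of the same name,
        // otherwise into the general parameter map.
        void setParameter(String name, String value) {
            if (!inject(Parameter.class, f -> f.getName().equals(name), value)) {
                parameters.put(name, value);
            }
        }
    }

    static final class ErrorNode extends AbstractNode {}

    static final class MatchNode extends AbstractNode {
        @Parameter private String pattern;
        String pattern() { return pattern; }
    }

    static final class PipelineNode extends AbstractNode {
        @NodeChild private ErrorNode errorNode;
        ErrorNode errorNode() { return errorNode; }
    }

    public static void main(String[] args) {
        PipelineNode pipeline = new PipelineNode();
        MatchNode match = new MatchNode();
        match.setParameter("pattern", "*.html");
        pipeline.addChild(match);
        pipeline.addChild(new ErrorNode());
        System.out.println(match.pattern());
        System.out.println(pipeline.children.size());
        System.out.println(pipeline.errorNode() != null);
    }
}
```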


When the sitemap is being executed, the invocation traverses the tree of 
SitemapNodes. Each node returns a o.a.c.c.sitemap.node.InvocationResult that 
indicates the execution state. This is one of NONE, PROCESSED, and COMPLETED:

* NONE means that the node did not do any processing whatsoever (e.g. a 
MatchNode did not match).

* PROCESSED means that the node did some processing, but the traversal should 
continue (e.g. the GenerateNode installed a Generator on the pipeline, but some 
other components might still be pending).

* COMPLETED means that the node did some processing and the traversal should 
stop, since the invocation processing is completed (e.g. the PipelineNode 
executed the pipeline).


Nodes that act as a switch (e.g. MatchNode, ErrorNode, etc.) aggregate the 
individual results of their children.
So a MatchNode will respond with NONE if and only if all of its children return 
NONE, and with COMPLETED otherwise.
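
The aggregation rule can be sketched as follows. The three states come from the 
text above; the SitemapNode interface shape, the boolean "matches" flag and the 
early exit on COMPLETED are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of result aggregation in a switch-like node.
final class InvocationResultSketch {

    enum InvocationResult { NONE, PROCESSED, COMPLETED }

    interface SitemapNode {
        InvocationResult invoke();
    }

    static final class SwitchNode implements SitemapNode {
        private final boolean matches;
        private final List<SitemapNode> children;

        SwitchNode(boolean matches, SitemapNode... children) {
            this.matches = matches;
            this.children = Arrays.asList(children);
        }

        public InvocationResult invoke() {
            if (!matches) {
                return InvocationResult.NONE; // e.g. the pattern did not match
            }
            boolean anyProcessing = false;
            for (SitemapNode child : children) {
                InvocationResult result = child.invoke();
                if (result == InvocationResult.COMPLETED) {
                    return InvocationResult.COMPLETED; // stop the traversal
                }
                if (result != InvocationResult.NONE) {
                    anyProcessing = true;
                }
            }
            // NONE if and only if all children returned NONE, COMPLETED otherwise.
            return anyProcessing ? InvocationResult.COMPLETED : InvocationResult.NONE;
        }
    }

    public static void main(String[] args) {
        SitemapNode idle = () -> InvocationResult.NONE;
        SitemapNode generate = () -> InvocationResult.PROCESSED;
        System.out.println(new SwitchNode(true, idle, idle).invoke());
        System.out.println(new SwitchNode(true, idle, generate).invoke());
        System.out.println(new SwitchNode(false, generate).invoke());
    }
}
```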


EXECUTION CONTEXT
°°°°°°°°°°°°°°°°°

When a pipeline (and, one level up, the sitemap) is invoked, an execution context 
is passed along. Since context has so many different meanings to us, we called this 
execution context o.a.c.c.sitemap.Invocation. It contains input parameters, 
sitemap parameters and a component provider and gives access to the result.

Since the sitemap should be useable in any environment, the Invocation doesn't 
have any environment specific dependencies (e.g. the Servlet API). Hence, the 
input parameters are a general map. However, our idea is that environment 
specific parameters (e.g. the HTTPRequest) can be put into this map too and can 
be made accessible by an accessor helper class.
So if a component needs access to environment-specific parameters, e.g. the 
HTTPRequest, it uses the appropriate accessor helper class. All components and 
accessor helper classes that belong to a certain environment (i.e. are not 
generally available) should be bundled together. This yields a core module that 
is usable in any environment and additional modules for specific purposes.
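
A small sketch of the accessor idea follows. The Invocation class here is 
reduced to a plain parameter map, and FakeHttpRequest and HttpRequestAccessor 
are hypothetical names invented for this example (a real servlet module would 
use the Servlet API types instead).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an environment-neutral invocation plus an accessor helper.
final class InvocationSketch {

    // Core module: knows nothing about servlets, only about a general map.
    static final class Invocation {
        private final Map<String, Object> parameters = new HashMap<>();

        void setParameter(String key, Object value) {
            parameters.put(key, value);
        }

        Object getParameter(String key) {
            return parameters.get(key);
        }
    }

    // Stand-in for an environment-specific object such as an HTTP request.
    static final class FakeHttpRequest {
        final String path;
        FakeHttpRequest(String path) { this.path = path; }
    }

    // Environment-specific module: hides the map key and the cast.
    static final class HttpRequestAccessor {
        private static final String KEY = FakeHttpRequest.class.getName();

        static void store(Invocation invocation, FakeHttpRequest request) {
            invocation.setParameter(KEY, request);
        }

        // Returns null when the invocation does not run in this environment.
        static FakeHttpRequest get(Invocation invocation) {
            Object value = invocation.getParameter(KEY);
            return value instanceof FakeHttpRequest ? (FakeHttpRequest) value : null;
        }
    }

    public static void main(String[] args) {
        Invocation invocation = new Invocation();
        HttpRequestAccessor.store(invocation, new FakeHttpRequest("/index.html"));
        System.out.println(HttpRequestAccessor.get(invocation).path);
    }
}
```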

Since the sitemap shouldn't depend on a specific component container, the 
o.a.c.c.sitemap.ComponentProvider was introduced as an abstraction for specific 
containers. So far we have implemented an o.a.c.c.sitemap.SpringComponentProvider 
that encapsulates all Spring bean lookups. It should be fairly easy to write 
implementations for alternative containers (e.g. OSGi).
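
As an illustration of such an abstraction, here is a sketch with a trivial 
map-backed provider. The createComponent(type, name) signature and the 
MapComponentProvider class are assumptions for this example; the real 
ComponentProvider interface in the whiteboard may look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a container abstraction; not the actual Corona interface.
final class ComponentProviderSketch {

    interface ComponentProvider {
        <T> T createComponent(Class<T> type, String name);
    }

    // Simplest possible "container": named factories in a map.
    static final class MapComponentProvider implements ComponentProvider {
        private final Map<String, Supplier<?>> factories = new HashMap<>();

        void register(String name, Supplier<?> factory) {
            factories.put(name, factory);
        }

        public <T> T createComponent(Class<T> type, String name) {
            Supplier<?> factory = factories.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("no component named " + name);
            }
            // cast() throws ClassCastException if the factory yields another type.
            return type.cast(factory.get());
        }
    }

    public static void main(String[] args) {
        MapComponentProvider provider = new MapComponentProvider();
        provider.register("buffer", () -> new StringBuilder("ready"));
        StringBuilder buffer = provider.createComponent(StringBuilder.class, "buffer");
        System.out.println(buffer);
    }
}
```

A Spring-based provider would delegate the lookup to the bean factory instead of 
a map, and sitemap nodes would only ever see the ComponentProvider interface.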


What needs to be done?
======================

So far the package and module structure doesn't reflect all the ideas from 
above. For now we have preferred to keep things simple.

The sitemap language hasn't been completed so far. We also think that we should 
take this as a chance to tidy up a few things (e.g. map:match vs map:select, 
map:call).

There are still many components missing. For our needs we will work on the 
XSLTTransformer and the IncludeTransformer.

There is no support for expression languages other than the "map" language 
(e.g. JEXL, JXPath).

We want to introduce support for controllers; however, we are not sure if this 
should be an environment specific concept or should only go into a servlet 
specific module. (So far we don't have any use case for it outside of a web 
application.)

Caching pipelines are not supported yet.

As I said (http://marc.info/?l=xml-cocoon-dev&m=120542286521032&w=2), our goal 
is to get the Micro-Cocoon test sitemap[2] working.


We hope that this gives you enough hints in order to understand Corona. We are 
very interested in further discussions. So fire at will!

Reinhard & Steven


[1] o.a.c.c -> org.apache.cocoon.corona
[2] 
http://svn.apache.org/repos/asf/cocoon/whiteboard/micro/misc/cocoon-micro-it-block/src/main/resources/COB-INF/sitemap.xmap

-- 
Reinhard Pötz                            Managing Director, {Indoqa} GmbH
                           http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member, PMC Chair        reinhard@apache.org
_________________________________________________________________________

Re: Exploring Corona

Posted by Sylvain Wallez <sy...@apache.org>.
Carsten Ziegeler wrote:
> Interesting stuff - thanks Reinhard and Steven for starting this and 
> sharing it with us.
>
> Finally I had time to have a *brief* look at it and I have some 
> remarks :)
>
> I think the pipeline api and sitemap api should be separate things. So 
> the invocation should rather be in the pipeline api as the base of 
> executing pipelines. We could then split this into two modules.
>
> I'm not sure if actions belong to the pipeline api; I think they are 
> rather sitemap specific. All they do wrt the pipeline is to change 
> the invocation perhaps. So this could also be done before starting the 
> pipeline and get the action stuff out of the pipeline api.

Yes, actions definitely don't belong to the pipeline API. They are 
sitemap control structures, just like matchers and selectors. The main 
difference between matcher and action (besides the pattern/src 
attribute) is that actions are allowed to have side effects while 
matchers should not.

> The classes should be put into different packages: we should separate 
> between the pure api, helper classes and implementations. This makes 
> it easier to use the stuff in an osgi environment.
>
> Ok, final comment for today, the idea of abstracting the consumer and 
> the producer seems appealing. It's like the javax.xml stuff (Result, 
> Source); the javax.xml stuff has the advantage that the implementation 
> knows which results and sources are possible: there are only a 
> handful of subclasses; adding own results or sources simply is not 
> supported.
> I fear we will have to follow the same path (which might not be bad).

Reminds me of some old thoughts I had about a Cocoon 3. This could be the 
role of a collection of adapters that would convert data for components 
that can't directly talk to each other. This complicates the picture a 
bit, but would allow for advanced things such as non-XML pipelines, 
mixing SAX, DOM and StAX transparently to e.g. perform some 
content-aware construction of the pipeline, etc.

Sylvain

-- 
Sylvain Wallez - http://bluxte.net


Re: Exploring Corona

Posted by Torsten Curdt <tc...@apache.org>.
On Mar 27, 2008, at 19:14, Carsten Ziegeler wrote:
> Interesting stuff - thanks Reinhard and Steven for starting this and  
> sharing it with us.
>
> Finally I had time to have a *brief* look at it and I have some  
> remarks :)
>
> I think the pipeline api and sitemap api should be separate things.

+1

> So the invocation should rather be in the pipeline api as the base  
> of executing pipelines. We could then split this into two modules.
>
> I'm not sure if actions belong to the pipeline api; I think they are  
> rather sitemap specific. All they do wrt the pipeline is to  
> change the invocation perhaps. So this could also be done before  
> starting the pipeline and get the action stuff out of the pipeline  
> api.

+1

cheers
--
Torsten

Re: Exploring Corona

Posted by Steven Dolg <st...@gmx.at>.

Carsten Ziegeler wrote:
> Ralph Goers wrote:
>> Consider this:
>>
>> URL baseUrl = new URL("file:///C:/temp/");
>> Pipeline pipeline = new NonCachingPipeline();
>> pipeline.addComponent(new FileGenerator(new URL(baseUrl, "xyz.xml")));
>> pipeline.addComponent(new XSLTTransformer(new URL(baseUrl, "xyz.xslt")));
>> pipeline.addComponent(new XMLSerializer());
>> pipeline.invoke(new InvocationImpl(System.out));
>>
>> This simple pipeline has these potentially cacheable components; 
>> xyz.xml, xyz.xslt, the result of the XSLT transformation, and the 
>> final result of the pipeline. As it relates to the pipeline I don't 
>> see how the URL.getLastModified() really helps as it could apply to 
>> any of these items, two of which aren't even URLs.
>>
> Hmm, I think this isn't different to what we have today with sources.
> Today: FileGenerator, XSLTTransformer use a source as input
>        For caching: this source provides a validity object
> URLs: FileGenerator, XSLTTransformer use a url as input
>        For caching: this url provides a last modified date
> XMLSerializer in both cases returns a fake (or always valid) validity 
> object/last modified.
Thanks for responding ;-)
This is exactly the way I implemented the simple caching approach for 
Corona.
Patch from me is still due (I know, shame on me) - work load is 
currently quite high...
>
> Now, as I responded to Steven, last modified covers most use cases but 
> not all of the use cases the validity object can handle. This is where 
> we have to think about a good way to have the same.
>
> Carsten

Re: Exploring Corona

Posted by Carsten Ziegeler <cz...@apache.org>.
Ralph Goers wrote:
> Consider this:
> 
> URL baseUrl = new URL("file:///C:/temp/");
> Pipeline pipeline = new NonCachingPipeline();
> pipeline.addComponent(new FileGenerator(new URL(baseUrl, "xyz.xml")));
> pipeline.addComponent(new XSLTTransformer(new URL(baseUrl, "xyz.xslt")));
> pipeline.addComponent(new XMLSerializer());
> pipeline.invoke(new InvocationImpl(System.out));
> 
> This simple pipeline has these potentially cacheable components; 
> xyz.xml, xyz.xslt, the result of the XSLT transformation, and the final 
> result of the pipeline. As it relates to the pipeline I don't see how 
> the URL.getLastModified() really helps as it could apply to any of these 
> items, two of which aren't even URLs.
> 
Hmm, I think this isn't different to what we have today with sources.
Today: FileGenerator, XSLTTransformer use a source as input
        For caching: this source provides a validity object
URLs: FileGenerator, XSLTTransformer use a url as input
        For caching: this url provides a last modified date
XMLSerializer in both cases returns a fake (or always valid) validity 
object/last modified.

Now, as I responded to Steven, last modified covers most use cases but 
not all of the use cases the validity object can handle. This is where 
we have to think about a good way to have the same.
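
A last-modified based validity check along these lines could be sketched as 
below. The CacheableComponent interface, the ALWAYS_VALID convention (0 meaning 
"never invalidates", as for the serializer) and the timestamps are assumptions 
for illustration; this is not the caching code planned for Corona.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a cached pipeline result is valid as long as no
// component reports a modification newer than the time of caching.
final class LastModifiedCacheSketch {

    interface CacheableComponent {
        long lastModified();
    }

    // Assumption: components without an input (e.g. a serializer) report 0.
    static final long ALWAYS_VALID = 0L;

    static boolean isStillValid(long cachedAt, List<CacheableComponent> components) {
        for (CacheableComponent component : components) {
            if (component.lastModified() > cachedAt) {
                return false;
            }
        }
        return true;
    }

    static List<CacheableComponent> demoPipeline() {
        CacheableComponent generator = () -> 1_000L;   // stands for xyz.xml
        CacheableComponent transformer = () -> 2_000L; // stands for xyz.xslt
        CacheableComponent serializer = () -> ALWAYS_VALID;
        return Arrays.asList(generator, transformer, serializer);
    }

    public static void main(String[] args) {
        System.out.println(isStillValid(2_500L, demoPipeline())); // cached after both changes
        System.out.println(isStillValid(1_500L, demoPipeline())); // stylesheet changed later
    }
}
```

This covers only the plain last-modified case; an expiry date or other rules that 
a validity object can express would need a richer contract.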

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Exploring Corona

Posted by Rainer Pruy <Ra...@Acrys.COM>.
It is essential to keep the different layers straight here.

The example is somewhere at the level of the pipeline api or probably the sitemap api implementation.
Here caching is a question of the implementation of the components.
It will actually depend on the different implementations of generators, transformers or serializers (cache-enabled or not).

URL cache support is an issue for implementing the cache support within a component.
E.g. the FileGenerator might use .getLastModified() or similar methods to determine cache control info for its own cacheability.
The transformer might also use such information to determine whether the script used is still valid.

Thus, it is not really surprising that the example will not really benefit from cache parameter info methods provided by URL
implementations - it's a different layer.

However, e.g. when trying to decide whether the "cached" result of the FileGenerator() *component* is still valid, it will come in handy
to have information on whether the file changed in between.

Rainer

Ralph Goers wrote:
> Consider this:
> 
> URL baseUrl = new URL("file:///C:/temp/");
> Pipeline pipeline = new NonCachingPipeline();
> pipeline.addComponent(new FileGenerator(new URL(baseUrl, "xyz.xml")));
> pipeline.addComponent(new XSLTTransformer(new URL(baseUrl, "xyz.xslt")));
> pipeline.addComponent(new XMLSerializer());
> pipeline.invoke(new InvocationImpl(System.out));
> 
> This simple pipeline has these potentially cacheable components;
> xyz.xml, xyz.xslt, the result of the XSLT transformation, and the final
> result of the pipeline. As it relates to the pipeline I don't see how
> the URL.getLastModified() really helps as it could apply to any of these
> items, two of which aren't even URLs.
> 
> Ralph
> 
> Steven Dolg wrote:
>>
>>
>> Carsten Ziegeler wrote:
>>> Steven Dolg wrote:
>>>> How about:
>>>>
>>>> URL url = new URL("some url");
>>>> URLConnection connection = url.openConnection();
>>>> connection.getLastModified();
>>>>
>>>> Not sure if this really works in all cases, but appears to be quite
>>>> suitable and easily extensible.
>>>>
>>> Yes, this works for many cases, but not for cases like where you have
>>> an expiry date etc. What do you mean by "easily extensible"?
>> url.openConnection() actually returns a subclass of URLConnection
>> depending on the protocol of the URL.
>> So own protocol implementations can return own subclasses that
>> implement this (and other methods) accordingly.
>> And - at least theoretically - provide additional methods for handling
>> specific stuff, e.g. expiration dates.
>>>
>>> Carsten
>>>


Re: Exploring Corona

Posted by Ralph Goers <Ra...@dslextreme.com>.
Consider this:

URL baseUrl = new URL("file:///C:/temp/");
Pipeline pipeline = new NonCachingPipeline();
pipeline.addComponent(new FileGenerator(new URL(baseUrl, "xyz.xml")));
pipeline.addComponent(new XSLTTransformer(new URL(baseUrl, "xyz.xslt")));
pipeline.addComponent(new XMLSerializer());
pipeline.invoke(new InvocationImpl(System.out));

This simple pipeline has these potentially cacheable components; 
xyz.xml, xyz.xslt, the result of the XSLT transformation, and the final 
result of the pipeline. As it relates to the pipeline I don't see how 
the URL.getLastModified() really helps as it could apply to any of these 
items, two of which aren't even URLs.

Ralph

Steven Dolg wrote:
>
>
> Carsten Ziegeler wrote:
>> Steven Dolg wrote:
>>> How about:
>>>
>>> URL url = new URL("some url");
>>> URLConnection connection = url.openConnection();
>>> connection.getLastModified();
>>>
>>> Not sure if this really works in all cases, but appears to be quite 
>>> suitable and easily extensible.
>>>
>> Yes, this works for many cases, but not for cases like where you have 
>> an expiry date etc. What do you mean by "easily extensible"?
> url.openConnection() actually returns a subclass of URLConnection 
> depending on the protocol of the URL.
> So own protocol implementations can return own subclasses that 
> implement this (and other methods) accordingly.
> And - at least theoretically - provide additional methods for handling 
> specific stuff, e.g. expiration dates.
>>
>> Carsten
>>

Re: Exploring Corona

Posted by Steven Dolg <st...@gmx.at>.

Carsten Ziegeler wrote:
> Steven Dolg wrote:
>> How about:
>>
>> URL url = new URL("some url");
>> URLConnection connection = url.openConnection();
>> connection.getLastModified();
>>
>> Not sure if this really works in all cases, but appears to be quite 
>> suitable and easily extensible.
>>
> Yes, this works for many cases, but not for cases like where you have 
> an expiry date etc. What do you mean by "easily extensible"?
url.openConnection() actually returns a subclass of URLConnection 
depending on the protocol of the URL.
So own protocol implementations can return own subclasses that implement 
this (and other methods) accordingly.
And - at least theoretically - provide additional methods for handling 
specific stuff, e.g. expiration dates.
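
A sketch of what this could look like: a custom URLStreamHandler returns its own 
URLConnection subclass that answers getLastModified() and getExpiration() (both 
are existing methods of java.net.URLConnection). The "demo" protocol, the class 
names and the fixed timestamps are made up for this example. The three-argument 
URL constructor used here is deprecated in recent JDKs but keeps the sketch 
short.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Hypothetical sketch: a protocol handler whose connections report
// a last-modified date and an expiration date.
final class UrlValiditySketch {

    static final class DemoConnection extends URLConnection {
        private final long lastModified;
        private final long expiration;

        DemoConnection(URL url, long lastModified, long expiration) {
            super(url);
            this.lastModified = lastModified;
            this.expiration = expiration;
        }

        @Override
        public void connect() {
            connected = true; // nothing to open in this sketch
        }

        @Override
        public long getLastModified() {
            return lastModified;
        }

        @Override
        public long getExpiration() {
            return expiration;
        }
    }

    static final class DemoHandler extends URLStreamHandler {
        private final long lastModified;
        private final long expiration;

        DemoHandler(long lastModified, long expiration) {
            this.lastModified = lastModified;
            this.expiration = expiration;
        }

        @Override
        protected URLConnection openConnection(URL url) {
            return new DemoConnection(url, lastModified, expiration);
        }
    }

    // Returns {lastModified, expiration} as seen through the plain URL API.
    static long[] probe(long lastModified, long expiration) {
        try {
            URL url = new URL(null, "demo:resource", new DemoHandler(lastModified, expiration));
            URLConnection connection = url.openConnection();
            return new long[] { connection.getLastModified(), connection.getExpiration() };
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = probe(1_000L, 5_000L);
        System.out.println(result[0] + " " + result[1]);
    }
}
```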
>
> Carsten
>

Re: Exploring Corona

Posted by Carsten Ziegeler <cz...@apache.org>.
Steven Dolg wrote:
> How about:
> 
> URL url = new URL("some url");
> URLConnection connection = url.openConnection();
> connection.getLastModified();
> 
> Not sure if this really works in all cases, but appears to be quite 
> suitable and easily extensible.
> 
Yes, this works for many cases, but not for cases like where you have an 
expiry date etc. What do you mean by "easily extensible"?

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Exploring Corona

Posted by Steven Dolg <st...@gmx.at>.

Carsten Ziegeler wrote:
> Reinhard Poetz wrote:
>> ok. Steven and I will work on Corona next week again so that the code 
>> reflects the "layered design" that we have discussed recently. When 
>> doing this we will also improve the package structure so that it 
>> becomes cleaner in general (and more OSGi friendly in particular).
>>
> Great :) I'll hold my breath till then (and try to get some ideas 
> about the url and caching stuff)
>
How about:

URL url = new URL("some url");
URLConnection connection = url.openConnection();
connection.getLastModified();

Not sure if this really works in all cases, but appears to be quite 
suitable and easily extensible.

> Carsten
>

Re: Exploring Corona

Posted by Carsten Ziegeler <cz...@apache.org>.
Reinhard Poetz wrote:
> ok. Steven and I will work on Corona next week again so that the code 
> reflects the "layered design" that we have discussed recently. When 
> doing this we will also improve the package structure so that it becomes 
> cleaner in general (and more OSGi friendly in particular).
> 
Great :) I'll hold my breath till then (and try to get some ideas about 
the url and caching stuff)

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Exploring Corona

Posted by Reinhard Poetz <re...@apache.org>.
Carsten Ziegeler wrote:
> Interesting stuff - thanks Reinhard and Steven for starting this and 
> sharing it with us.
> 
> Finally I had time to have a *brief* look at it and I have some remarks :)

:-)

> I think the pipeline api and sitemap api should be separate things. So 
> the invocation should rather be in the pipeline api as the base of 
> executing pipelines. We could then split this into two modules.

good idea

> 
> I'm not sure if actions belong to the pipeline api; I think they are 
> rather sitemap specific. All they do wrt the pipeline is to change 
> the invocation perhaps. So this could also be done before starting the 
> pipeline and get the action stuff out of the pipeline api.

Since I wasn't sure if we need actions in the sitemap language at all, we just 
made them work. Maybe we can merge them with the controller integration which 
hasn't been thought through either.

> The classes should be put into different packages: we should separate 
> between the pure api, helper classes and implementations. This makes it 
> easier to use the stuff in an osgi environment.

ok. Steven and I will work on Corona next week again so that the code reflects 
the "layered design" that we have discussed recently. When doing this we will 
also improve the package structure so that it becomes cleaner in general (and 
more OSGi friendly in particular).

-- 
Reinhard Pötz                            Managing Director, {Indoqa} GmbH
                           http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member, PMC Chair        reinhard@apache.org
_________________________________________________________________________

Re: Exploring Corona

Posted by Carsten Ziegeler <cz...@apache.org>.
Interesting stuff - thanks Reinhard and Steven for starting this and 
sharing it with us.

Finally I had time to have a *brief* look at it and I have some remarks :)

I think the pipeline api and sitemap api should be separate things. So 
the invocation should rather be in the pipeline api as the base of 
executing pipelines. We could then split this into two modules.

I'm not sure if actions belong to the pipeline api; I think they are 
rather sitemap specific. All they do wrt the pipeline is to change 
the invocation perhaps. So this could also be done before starting the 
pipeline and get the action stuff out of the pipeline api.

The classes should be put into different packages: we should separate 
between the pure api, helper classes and implementations. This makes it 
easier to use the stuff in an osgi environment.

Ok, final comment for today, the idea of abstracting the consumer and 
the producer seems appealing. It's like the javax.xml stuff (Result, 
Source); the javax.xml stuff has the advantage that the implementation 
knows which results and sources are possible: there are only a handful 
of subclasses; adding own results or sources simply is not supported.
I fear we will have to follow the same path (which might not be bad).

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org