
Posted to dev@sling.apache.org by Juan José Vázquez Delgado <ju...@gmail.com> on 2009/02/05 14:18:05 UTC

Pipeline support

Hi,

In response to this thread [1] on the Apache Cocoon dev list, I have
been working on a minimal sample [2] concerning the resolution of
pipelines in Apache Sling. IMHO, having pipeline support in Sling is
an important feature in terms of separation of concerns.

On the other hand, because it's important not to reinvent the wheel,
IMHO we should take advantage of the Cocoon community's efforts in one
way or another.

Right now, the Cocoon team is working on a new, refactored release of
the framework named Cocoon 3. AFAIK, this release is intended to be a
more minimal version of Cocoon 2.2 and IMHO more suitable for
integration into Sling. For the time being (alpha-1), Cocoon artifacts
are not released as OSGi bundles.

The stuff [2] is just a proof of concept using Cocoon 3 pipelines
inside Sling as it stands today, that is, without changes to the
Sling core.

Nevertheless, IMHO Sling should have more natural pipeline support,
with Cocoon pipeline definitions as Sling scripts. Until now, dynamic
resources have been rendered by two kinds of animals: servlets and
scripts. What about having pipelines as a new kind of animal?

Comments and ideas are welcome.

BR,

Juanjo.

[1] http://markmail.org/message/owefsfj4eqbc4ifq#query:OSGi%20integration%20(again)%20markmail+page:1+mid:owefsfj4eqbc4ifq+state:results
[2] http://svn.apache.org/repos/asf/incubator/sling/whiteboard/jvazquez/pipeline

Re: Pipeline support - update with right link

Posted by Alexander Klimetschek <ak...@day.com>.

On Fri, Feb 20, 2009 at 5:10 AM, paksegu <pa...@yahoo.com> wrote:
> Though I am late to this discussion, taking an excerpt from the previous discussion:
>
> "I could imagine an XML generator that simply does an XML document view of the node in question." [an excerpt from the previous discussion]
>
> then using *something* to process the document into something.
>
> Wouldn't E4X (links below) be a viable alternative in this situation? For example, you could write (output) your links and the names of the links into an HTML document, then use an HTML parser to crawl and check the links... WDYT?
>
> https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Processing_XML_with_E4X
>
> https://developer.mozilla.org/En/E4X

For the generator of the pipeline it should be possible to use any
normal Sling script, because they already run on top of the data (a
node/resource) and generate a stream. If there were a scripting
engine capable of E4X, one could simply use it as the generator.

Regarding scripts for all the parts of a pipeline: for the other
elements of the pipeline (transformers and serializers), the input
interface is a bit more difficult, as they will have to deal with an
input stream. In Cocoon, these pipelines are based on standard SAX
events, which are probably not that easy to "script".
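
To make the SAX point concrete, here is a minimal sketch of a generator, transformer and serializer chain using only the JDK's SAX and JAXP classes. It is not Cocoon or Sling code: the class name SaxPipelineSketch, the RenameFilter stage and the sample markup are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: parser (generator) -> filter (transformer) -> serializer.
class SaxPipelineSketch {

    /** Illustrative transformer stage: renames every <b> element to <strong>. */
    static class RenameFilter extends XMLFilterImpl {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if ("b".equals(qName)) {
                super.startElement(uri, "strong", "strong", atts);
            } else {
                super.startElement(uri, localName, qName, atts);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if ("b".equals(qName)) {
                super.endElement(uri, "strong", "strong");
            } else {
                super.endElement(uri, localName, qName);
            }
        }
    }

    static String run(String xml) throws Exception {
        // Generator: a plain SAX parser producing events from the input.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader generator = parserFactory.newSAXParser().getXMLReader();

        // Transformer: receives the generator's events and forwards modified ones.
        RenameFilter transformer = new RenameFilter();
        transformer.setParent(generator);

        // Serializer: an identity TransformerHandler that writes events as text.
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        transformer.setContentHandler(serializer);
        transformer.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<p>Hello <b>Sling</b></p>"));
    }
}
```

Every stage has to implement callbacks such as startElement and endElement, which is the part that looks awkward to expose to a scripting language.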

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Pipeline support

Posted by Juan José Vázquez Delgado <ju...@gmail.com>.

> However whereas this is one important use case I see another use case
> where I simply want to "run" a pipeline on generated output of some
> script like for doing link checking or doing other general purpose stuff.

Until now, I have just been thinking about XSL-based transformers
operating on generated XML.
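
As a rough illustration of what such an XSL-based step amounts to, here is a sketch using the JDK's javax.xml.transform API. The class name, the sample stylesheet and the node/title input are assumptions for illustration only and are not taken from the prototype.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: apply an XSLT stylesheet to XML produced by an earlier step.
class XslTransformerSketch {

    // Illustrative stylesheet: turns <node><title>..</title></node> into an <h1>.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        + "<xsl:template match='/node'>"
        + "<h1><xsl:value-of select='title'/></h1>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String generatedXml, String stylesheet) throws Exception {
        // Compile the stylesheet, then run it over the generated XML.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(generatedXml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<node><title>Pipeline support</title></node>", XSL));
    }
}
```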

BR,

Juanjo.

Re: Pipeline support

Posted by Alexander Klimetschek <ak...@day.com>.

On Tue, Feb 10, 2009 at 2:52 PM, Felix Meschberger <fm...@gmail.com> wrote:
>> However whereas this is one important use case I see another use case
>> where I simply want to "run" a pipeline on generated output of some
>> script like for doing link checking or doing other general purpose stuff.
>>
>> In this case the sling:resourceType would still point to the original
>> script doing the html representation and the pipeline would take the
>> (html) output and process it. Not sure if we can find a good solution
>> for this as well. But we can have a look at the first use case first and
>> then see where this leads.
>
> This kind of post-processing would probably best be placed in a
> Servlet Filter?

+1

I don't know what the latest state of the discussion is on a generic
filter mechanism that also includes the ability to use scripts as
filters. I think the simplest thing would be if scripts for filters
were resolved just like scripts for the main "rendering". Then the
.pipeline script type could be used for both use cases.

The open question is probably what the "interface" for such a filter
script would look like.
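
Purely as a strawman for that open question, one possible shape is sketched below in plain Java. Nothing here is Sling or Servlet API: OutputFilter, LinkCollector and the regular expression are hypothetical names and choices, and the filter works on a fully buffered string rather than on a stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a post-processing contract; not an existing API.
class PostProcessingSketch {

    /** Assumed contract: takes the rendered output and returns the output to send on. */
    interface OutputFilter {
        String process(String renderedOutput);
    }

    /** Example filter for the link-checking use case: records href values, output unchanged. */
    static class LinkCollector implements OutputFilter {
        private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

        final List<String> links = new ArrayList<>();

        @Override
        public String process(String renderedOutput) {
            Matcher matcher = HREF.matcher(renderedOutput);
            while (matcher.find()) {
                links.add(matcher.group(1));
            }
            return renderedOutput;
        }
    }

    public static void main(String[] args) {
        LinkCollector collector = new LinkCollector();
        collector.process("<a href=\"/a/b/data.html\">data</a> <a href=\"/apps\">apps</a>");
        System.out.println(collector.links);
    }
}
```

A real filter would more likely wrap the response and work on the stream or on SAX events, so this only illustrates the kind of contract that would need to be agreed on.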

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Pipeline support

Posted by Felix Meschberger <fm...@gmail.com>.

Hi,

Carsten Ziegeler wrote:
> Alexander Klimetschek wrote:
>> On Tue, Feb 10, 2009 at 12:13 PM, Felix Meschberger <fm...@gmail.com> wrote:
>>> There is yet another alternative, which also sounds intriguing: We
>>> define a ScriptEngineFactory for the ".pipeline" extension. Files  with
>>> the extension .pipeline would be pipeline configurations, which would be
>>> interpreted by the PipelineScriptEngine. The second part of the
>>> processing -- preparation of the input data -- would be analogous to the
>>> above with the two options :
>>>
>>>         /a/b/data
>>>              +-- sling:resourceType = "sling/pipeline/sample"
>>>
>>>         /apps/sling/pipeline/sample/html.pipeline
>>>              "file with pipeline config"
>> I like this one more.
>>
>> For the question how the initial XML (or whatever stream the pipeline
>> can handle) is generated: that should be part of the pipeline
>> config/script, using standard generators just as in Cocoon for
>> example. I could imagine an XML generator that simply does an XML
>> document view of the node in question.
>>
> Yes, I totally agree here as well :) This sounds like the nicest approach.
> 
> However whereas this is one important use case I see another use case
> where I simply want to "run" a pipeline on generated output of some
> script like for doing link checking or doing other general purpose stuff.
> 
> In this case the sling:resourceType would still point to the original
> script doing the html representation and the pipeline would take the
> (html) output and process it. Not sure if we can find a good solution
> for this as well. But we can have a look at the first use case first and
> then see where this leads.

This kind of post-processing would probably best be placed in a
Servlet Filter?

Regards
Felix

Re: Pipeline support

Posted by paksegu <pa...@yahoo.com>.

Though I am late to this discussion, taking an excerpt from the previous discussion:

"I could imagine an XML generator that simply does an XML document view of the node in question." [an excerpt from the previous discussion]

then using *something* to process the document into something.

Wouldn't E4X be a viable alternative in this situation? For example, you could write (output) your links and the names of the links into an HTML document, then use an HTML parser to crawl and check the links... WDYT?

http://wiki.eclipse.org/E4/JavaScript


Ransford Segu-Baffoe
paksegu@yahoo.com
http://www.noqmx.com/
https://serenade.dev.java.net/

--- On Tue, 2/10/09, Carsten Ziegeler <cz...@apache.org> wrote:
From: Carsten Ziegeler <cz...@apache.org>
Subject: Re: Pipeline support
To: sling-dev@incubator.apache.org
Date: Tuesday, February 10, 2009, 8:46 AM

Alexander Klimetschek wrote:
> On Tue, Feb 10, 2009 at 12:13 PM, Felix Meschberger <fm...@gmail.com> wrote:
>> There is yet another alternative, which also sounds intriguing: We
>> define a ScriptEngineFactory for the ".pipeline" extension. Files with
>> the extension .pipeline would be pipeline configurations, which would be
>> interpreted by the PipelineScriptEngine. The second part of the
>> processing -- preparation of the input data -- would be analogous to the
>> above with the two options :
>>
>>         /a/b/data
>>              +-- sling:resourceType = "sling/pipeline/sample"
>>
>>         /apps/sling/pipeline/sample/html.pipeline
>>              "file with pipeline config"
> 
> I like this one more.
> 
> For the question how the initial XML (or whatever stream the pipeline
> can handle) is generated: that should be part of the pipeline
> config/script, using standard generators just as in Cocoon for
> example. I could imagine an XML generator that simply does an XML
> document view of the node in question.
> 
Yes, I totally agree here as well :) This sounds like the nicest approach.

However whereas this is one important use case I see another use case
where I simply want to "run" a pipeline on generated output of some
script like for doing link checking or doing other general purpose stuff.

In this case the sling:resourceType would still point to the original
script doing the html representation and the pipeline would take the
(html) output and process it. Not sure if we can find a good solution
for this as well. But we can have a look at the first use case first and
then see where this leads.

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Pipeline support - update with right link

Posted by paksegu <pa...@yahoo.com>.

Though I am late to this discussion, taking an excerpt from the previous discussion:

"I could imagine an XML generator that simply does an XML document view of the node in question." [an excerpt from the previous discussion]

then using *something* to process the document into something.

Wouldn't E4X (links below) be a viable alternative in this situation? For example, you could write (output) your links and the names of the links into an HTML document, then use an HTML parser to crawl and check the links... WDYT?

https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Processing_XML_with_E4X

https://developer.mozilla.org/En/E4X


Ransford Segu-Baffoe
paksegu@yahoo.com
http://www.noqmx.com/
https://serenade.dev.java.net/

--- On Tue, 2/10/09, Carsten Ziegeler <cz...@apache.org> wrote:
From: Carsten Ziegeler <cz...@apache.org>
Subject: Re: Pipeline support
To: sling-dev@incubator.apache.org
Date: Tuesday, February 10, 2009, 8:46 AM

Alexander Klimetschek wrote:
> On Tue, Feb 10, 2009 at 12:13 PM, Felix Meschberger <fm...@gmail.com> wrote:
>> There is yet another alternative, which also sounds intriguing: We
>> define a ScriptEngineFactory for the ".pipeline" extension. Files with
>> the extension .pipeline would be pipeline configurations, which would be
>> interpreted by the PipelineScriptEngine. The second part of the
>> processing -- preparation of the input data -- would be analogous to the
>> above with the two options :
>>
>>         /a/b/data
>>              +-- sling:resourceType = "sling/pipeline/sample"
>>
>>         /apps/sling/pipeline/sample/html.pipeline
>>              "file with pipeline config"
> 
> I like this one more.
> 
> For the question how the initial XML (or whatever stream the pipeline
> can handle) is generated: that should be part of the pipeline
> config/script, using standard generators just as in Cocoon for
> example. I could imagine an XML generator that simply does an XML
> document view of the node in question.
> 
Yes, I totally agree here as well :) This sounds like the nicest approach.

However whereas this is one important use case I see another use case
where I simply want to "run" a pipeline on generated output of some
script like for doing link checking or doing other general purpose stuff.

In this case the sling:resourceType would still point to the original
script doing the html representation and the pipeline would take the
(html) output and process it. Not sure if we can find a good solution
for this as well. But we can have a look at the first use case first and
then see where this leads.

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Pipeline support

Posted by Carsten Ziegeler <cz...@apache.org>.

Alexander Klimetschek wrote:
> On Tue, Feb 10, 2009 at 12:13 PM, Felix Meschberger <fm...@gmail.com> wrote:
>> There is yet another alternative, which also sounds intriguing: We
>> define a ScriptEngineFactory for the ".pipeline" extension. Files  with
>> the extension .pipeline would be pipeline configurations, which would be
>> interpreted by the PipelineScriptEngine. The second part of the
>> processing -- preparation of the input data -- would be analogous to the
>> above with the two options :
>>
>>         /a/b/data
>>              +-- sling:resourceType = "sling/pipeline/sample"
>>
>>         /apps/sling/pipeline/sample/html.pipeline
>>              "file with pipeline config"
> 
> I like this one more.
> 
> For the question how the initial XML (or whatever stream the pipeline
> can handle) is generated: that should be part of the pipeline
> config/script, using standard generators just as in Cocoon for
> example. I could imagine an XML generator that simply does an XML
> document view of the node in question.
> 
Yes, I totally agree here as well :) This sounds like the nicest approach.

However whereas this is one important use case I see another use case
where I simply want to "run" a pipeline on generated output of some
script like for doing link checking or doing other general purpose stuff.

In this case the sling:resourceType would still point to the original
script doing the html representation and the pipeline would take the
(html) output and process it. Not sure if we can find a good solution
for this as well. But we can have a look at the first use case first and
then see where this leads.

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: Pipeline support

Posted by Alexander Klimetschek <ak...@day.com>.

On Tue, Feb 10, 2009 at 12:13 PM, Felix Meschberger <fm...@gmail.com> wrote:
> There is yet another alternative, which also sounds intriguing: We
> define a ScriptEngineFactory for the ".pipeline" extension. Files  with
> the extension .pipeline would be pipeline configurations, which would be
> interpreted by the PipelineScriptEngine. The second part of the
> processing -- preparation of the input data -- would be analogous to the
> above with the two options :
>
>         /a/b/data
>              +-- sling:resourceType = "sling/pipeline/sample"
>
>         /apps/sling/pipeline/sample/html.pipeline
>              "file with pipeline config"

I like this one more.

As for the question of how the initial XML (or whatever stream the
pipeline can handle) is generated: that should be part of the pipeline
config/script, using standard generators just as in Cocoon, for
example. I could imagine an XML generator that simply does an XML
document view of the node in question.
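
To sketch what a script engine bound to the ".pipeline" extension could look like, here is a toy version built only on the JDK's javax.script API. It says nothing about how Sling itself would register such an engine, and the config format (one step name per line, with made-up "trim" and "upper" steps working on an "input" binding) is an assumption for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collections;
import java.util.List;
import javax.script.AbstractScriptEngine;
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;
import javax.script.SimpleBindings;

// Hypothetical sketch: a toy engine that treats a ".pipeline" file as a list of steps.
class PipelineEngineSketch {

    static class PipelineScriptEngine extends AbstractScriptEngine {
        private final ScriptEngineFactory factory;

        PipelineScriptEngine(ScriptEngineFactory factory) { this.factory = factory; }

        @Override
        public Object eval(String script, ScriptContext ctx) throws ScriptException {
            return eval(new StringReader(script), ctx);
        }

        @Override
        public Object eval(Reader script, ScriptContext ctx) throws ScriptException {
            // The data to process is passed in through the "input" binding.
            String data = String.valueOf(ctx.getAttribute("input"));
            BufferedReader lines = new BufferedReader(script);
            try {
                for (String step; (step = lines.readLine()) != null; ) {
                    switch (step.trim()) {
                        case "": break;                               // ignore blank lines
                        case "trim": data = data.trim(); break;
                        case "upper": data = data.toUpperCase(); break;
                        default: throw new ScriptException("Unknown step: " + step);
                    }
                }
            } catch (IOException e) {
                throw new ScriptException(e);
            }
            return data;
        }

        @Override public Bindings createBindings() { return new SimpleBindings(); }
        @Override public ScriptEngineFactory getFactory() { return factory; }
    }

    static class PipelineScriptEngineFactory implements ScriptEngineFactory {
        @Override public String getEngineName() { return "Pipeline sketch"; }
        @Override public String getEngineVersion() { return "0.0"; }
        @Override public List<String> getExtensions() { return Collections.singletonList("pipeline"); }
        @Override public List<String> getMimeTypes() { return Collections.emptyList(); }
        @Override public List<String> getNames() { return Collections.singletonList("pipeline"); }
        @Override public String getLanguageName() { return "pipeline"; }
        @Override public String getLanguageVersion() { return "0.0"; }
        @Override public Object getParameter(String key) { return null; }
        @Override public String getMethodCallSyntax(String obj, String m, String... args) { return null; }
        @Override public String getOutputStatement(String toDisplay) { return null; }
        @Override public String getProgram(String... statements) { return String.join("\n", statements); }
        @Override public ScriptEngine getScriptEngine() { return new PipelineScriptEngine(this); }
    }

    static Object run(String pipelineConfig, String input) throws ScriptException {
        ScriptEngineManager manager = new ScriptEngineManager();
        manager.registerEngineExtension("pipeline", new PipelineScriptEngineFactory());
        ScriptEngine engine = manager.getEngineByExtension("pipeline");
        engine.put("input", input);
        return engine.eval(pipelineConfig);
    }

    public static void main(String[] args) throws ScriptException {
        System.out.println(run("trim\nupper", "  sling  "));
    }
}
```

In a real engine the steps would be generators, transformers and serializers rather than string operations, and the input would be the resource being rendered.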

Just my 2 cents...

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Pipeline support

Posted by Juan José Vázquez Delgado <ju...@gmail.com>.

> This has been extracted from the XProc candidate recommendation [1].

Sorry, I forgot the link:

[1] http://www.w3.org/TR/xproc/

Re: Pipeline support

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Mar 10, 2009 at 10:43 AM, Juan José Vázquez Delgado
<ju...@gmail.com> wrote:
>>> ...1. A XML pipeline is expressed as a W3C XProc [2] file with "xpl" extension....
>>
>> Is this "xpl" extension standard?
>> If you're choosing your own I'd prefer not having an L at the end as
>> it's too easy to confuse with an I.
>> Maybe "xpr" or even "xproc", clearer?
>
> This has been extracted from the XProc candidate recommendation [1].
> Literally: "The media type for pipeline documents is application/xml.
> Often, pipeline documents are identified by the extension .xpl."

Fine then, let's go with the standard!
-Bertrand

Re: Pipeline support

Posted by Juan José Vázquez Delgado <ju...@gmail.com>.

>> ...1. A XML pipeline is expressed as a W3C XProc [2] file with "xpl" extension....
>
> Is this "xpl" extension standard?
> If you're choosing your own I'd prefer not having an L at the end as
> it's too easy to confuse with an I.
> Maybe "xpr" or even "xproc", clearer?

This has been extracted from the XProc candidate recommendation [1].
Literally: "The media type for pipeline documents is application/xml.
Often, pipeline documents are identified by the extension .xpl."

I suppose we are not forced to use this extension, but IMHO it would be
a good idea to stay in line with the recommendation. Anyway, we can
support both of them.

>> ...If you agree with this approach, I'd like to move the stuff into trunk
>> after adding some unit testing...
>
> +1
>
>> ...The question is, where?, "bundles/scripting" or "contrib/scripting"?.
>
> We recently said "everything new initially goes under contrib", I
> think that's good in this case.
> Although very useful, this is not core Sling functionality.

Agreed.

BR,

Juanjo.

Re: Pipeline support

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Juanjo,

On Tue, Mar 10, 2009 at 9:47 AM, Juan José Vázquez Delgado
<ju...@gmail.com> wrote:
> ...After some work, the pipeline support prototype [1] has ended up as follows:...

Cool stuff! Still haven't tested it, shame on me...but your
description looks great.
Just two quick comments for now.

> ...1. A XML pipeline is expressed as a W3C XProc [2] file with "xpl" extension....

Is this "xpl" extension standard?

If you're choosing your own I'd prefer not having an L at the end as
it's too easy to confuse with an I.
Maybe "xpr" or even "xproc", clearer?

> ...If you agree with this approach, I'd like to move the stuff into trunk
> after adding some unit testing...

+1

> ...The question is, where?, "bundles/scripting" or "contrib/scripting"?.

We recently said "everything new initially goes under contrib", I
think that's good in this case.
Although very useful, this is not core Sling functionality.

-Bertrand