Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2001/02/20 14:19:47 UTC

[c2] Cocoon XInclude

I've looked at the latest W3C XInclude WD (26 October 2000) and I'm
pretty disappointed: it seems like nobody but the authors is endorsing
it, and we can guess why: it overlaps big time with XSLT (<xsl:include>
and the document() function) and with XML Fragment Interchange for the
entire thing.

Moreover, the WD makes no distinction between server-side and
client-side processing, so XInclude is not powerful enough for Cocoon to
use: either we end up adding something proprietary to the WD, or we have
to lobby the authors until they understand the value of this distinction.

Since:

1) I don't care whether we use XInclude or not, as long as it does what
we want

2) I'm sick and tired of lobbying

3) I want to get stuff done

Here follows my proposal for the Cocoon XInclude mechanism.

                        --------- o ----------

                           Cocoon XInclude
                           ===============

Cocoon requires a way to specify content aggregation behavior.

This is done by making it possible for a generated page to trigger a
Cocoon internal subrequest and to substitute the triggering content with
the content generated by the internal subrequest.

The triggering content could be one of:

 1) PI
 2) comment
 3) text
 4) element
 5) attribute

Since 1, 2 and 3 cannot easily be postprocessed in the XML world, the
choice is between elements and attributes.

To identify their semantic behavior, they must belong to a specific
namespace. I propose to use the namespace

 http://apache.org/cocoon/include/[major.minor]

thus

 http://apache.org/cocoon/include/1.0

for this version. Using a wider namespace (removing /cocoon) would be
too much, since this inclusion mechanism is Cocoon-specific and it
involves sitemap subrequests.

The choice between 'elements' and 'attributes' should be driven by
usability; the two possible choices are

 <page xmlns:include="http://apache.org/cocoon/include/1.0">
  <title>this is an example</title>
  <include:include include:uri="/sidebar/"/>
  ...
 </page>

or

 <page xmlns:include="http://apache.org/cocoon/include/1.0">
  <title>this is an example</title>
  <navigation include:uri="/sidebar/"/>
  ...
 </page>

I propose to use 'attributes' since they preserve the semantic
information of the placeholder, unlike 'elements' which indicate only
the inclusion semantic information.

                             - o -

That said, the Cocoon Include namespace consists of exactly one
attribute:

 uri --> indicates the URI of the resource to include

Some considerations:

1) the URI must be local and internal, therefore it must *NOT* contain a
protocol identifier: this enforces SoC by placing direct resource
control in the sitemap and avoids losing aggregation information around
the system.

I repeat this since it's very important: allowing the aggregation of
resources directly, instead of passing through the sitemap, creates the
same problems that the XSLT document() function generates, making site
administration a nightmare and causing site growth to saturate because
of concern overlap.
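Rule 1 can be enforced with a purely syntactic check before any subrequest is made. The Java sketch below assumes that "contains a protocol identifier" means "starts with an RFC 3986 scheme followed by ':'"; the class InternalUriCheck and its method are hypothetical names for illustration, not part of Cocoon.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rejects include URIs that name a protocol (rule 1).
final class InternalUriCheck {
    // RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*:");

    /** Returns the URI unchanged if it is internal; throws if it carries a protocol. */
    static String requireInternal(String uri) {
        if (SCHEME.matcher(uri).find()) {
            throw new IllegalArgumentException(
                "include:uri must be internal (no protocol): " + uri);
        }
        return uri;
    }
}
```

With this check, "/sidebar/" and "content#xpointer(book/chapter[3])" pass, while "http://www.cnn.com/news/today/headlines.rss" is refused and would have to be mapped in the sitemap instead.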

2) if the URI is relative, the subrequest will be made to the resource
relative to (that is, underneath) the one that was called.

For example, suppose that

 /index

generates

 <sidebar include:uri="sitebar"/>

the subrequest will be made to 

 /index/sitebar

which the sitemap will map to the appropriate pipeline.

3) if the URI is absolute, the subrequest will be made to the absolute
"Cocoon" resource.

For example, suppose Cocoon receives the request

 http://localhost/cocoon/index

and this generates

 <sidebar include:uri="/sitebar"/>

the subrequest will be made to

 /sitebar

and will go directly to the sitemap.
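Rules 2 and 3 can be summarized in a few lines of Java. This is a sketch under the assumption that the "calling" URI is the Cocoon-relative path of the resource being generated (e.g. "/index" for http://localhost/cocoon/index); IncludeUriResolver is a hypothetical name. Note that standard RFC 3986 resolution (java.net.URI.resolve) would turn "/index" plus "sitebar" into "/sitebar", whereas the examples above treat the calling resource as a directory, so plain string joining is used instead.

```java
// Hypothetical helper: resolves include:uri values per rules 2 and 3.
final class IncludeUriResolver {
    /**
     * Rule 3: an absolute URI ("/sitebar") goes to the sitemap unchanged.
     * Rule 2: a relative URI is resolved underneath the calling resource,
     * so "/index" + "sitebar" gives "/index/sitebar".
     */
    static String resolve(String callingUri, String includeUri) {
        if (includeUri.startsWith("/")) {
            return includeUri;
        }
        // Avoid a double slash when the calling URI already ends with one.
        String base = callingUri.endsWith("/") ? callingUri : callingUri + "/";
        return base + includeUri;
    }
}
```

For example, resolve("/index", "sitebar") yields "/index/sitebar" and resolve("/index", "/sitebar") yields "/sitebar", matching the two examples.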

4) if the URI contains an XPointer, the returned content is the result
of the XPointer query to the generated resource. 

For example

 <page xmlns:include="http://apache.org/cocoon/include/1.0">
  <content include:uri="content#xpointer(book/chapter[3])"/>
 </page>

will include the third chapter of the generated book.

[is this really useful/desirable?]

5) if the element that contains the include:uri attribute is not empty,
its content is stripped out and the whole element is replaced by the
included content.

For example

 <page xmlns:include="http://apache.org/cocoon/include/1.0">
  <content include:uri="content">
   <something-to-fill/>
  </content>
 </page>

where the resource 'content' generates

 <blah/>

will generate at the end

 <page>
  <blah/>
 </page>
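The attribute trigger and rule 5 can be sketched as a SAX filter: when an element carries include:uri, the filter hands the URI to a subrequest hook, lets the hook stream its events downstream in place of the element, and drops the element together with its placeholder children. IncludeFilter and the Includer interface are hypothetical stand-ins for Cocoon's component and sitemap subrequest; the sketch only suppresses element and character events (a complete version would also drop processing instructions, ignorable whitespace, and so on) and assumes a namespace-aware parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class IncludeFilter extends XMLFilterImpl {
    static final String NS = "http://apache.org/cocoon/include/1.0";

    /** Hypothetical hook standing in for the sitemap subrequest. */
    interface Includer {
        void include(String uri, ContentHandler out) throws SAXException;
    }

    private final Includer includer;
    private int skipDepth = 0; // > 0 while inside an element being replaced

    IncludeFilter(Includer includer) { this.includer = includer; }

    @Override
    public void startElement(String ns, String local, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0) { skipDepth++; return; } // placeholder content: drop it
        String target = atts.getValue(NS, "uri");
        if (target != null) {
            includer.include(target, getContentHandler()); // included events replace the element
            skipDepth = 1;
            return;
        }
        super.startElement(ns, local, qName, atts);
    }

    @Override
    public void endElement(String ns, String local, String qName) throws SAXException {
        if (skipDepth > 0) { skipDepth--; return; }
        super.endElement(ns, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) super.characters(ch, start, length);
    }

    /** Usage: filters xml, answering every include with <blah/>; returns element names seen downstream. */
    static List<String> demo(String xml, List<String> requested) {
        List<String> seen = new ArrayList<>();
        IncludeFilter filter = new IncludeFilter((uri, out) -> {
            requested.add(uri);
            out.startElement("", "blah", "blah", new AttributesImpl());
            out.endElement("", "blah", "blah");
        });
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String ns, String local, String qName, Attributes atts) {
                seen.add(local);
            }
        });
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```

Run on the example above, the downstream handler sees only page and blah: the content element and its something-to-fill child never get through.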

                             - o -

Since the included resources are local, the sitemap retains its central
administration role and will be responsible for providing transparent
caching of the included resources.

The namespace adaptation of the included content will therefore be
performed at generation time, *NOT* by the include mechanism. 

The same holds for encoding, which will be handled at serialization
time, since SAX events carry already-decoded Unicode characters anyway.

Comments?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------



Re: [c2] Cocoon XInclude

Posted by Stefano Mazzocchi <st...@apache.org>.
Donald Ball wrote:
> 
> On Wed, 21 Feb 2001, Paul Russell wrote:
> 
> > +1, but I'm not sure about having a separate attribute for the xpath. If
> > there's a spec for something, shouldn't we use it, rather than
> > reinventing the wheel?
> 
> we're already reinventing the wheel to some extent. pragmatically, i'd say
> that only a small subset of the xpointer spec is relevant here, and it's
> easier and faster to let the xml parser give us two attribute values than
> give us one attribute value which we then have to parse.

I agree on this....

> otoh, if we're
> planning on playing supporting xpointer ranges and such, we should
> definitely use xpointer.

...still I believe that using xpaths can be dangerous.

Think about it, when you do

 <include uri="/document" xpath="/chapter[3]"/>

you are really saying

 <include uri="/document/interesting-content"/>

where it's the administration's concern to identify what is
'interesting' in that particular request (might not always be the third
chapter).

The only case where your concern is appropriate is when you are
including parts of your own document, but I firmly believe this should
be handled by a different namespace-reacting mechanism!!!!

Why? Well, first of all, performance! Performing an XPath query on your
own document requires us to buffer the SAX events and write a SAX-based
XPath engine, something admittedly pretty complex (even if Xalan2 may
already do this).

The act of performing internal inclusion is clearly overlapping with
XSLT functionality: surely a small subset of its functionality, but it
can be easily performed with XSLT.

Now, I believe that 'content aggregation' means including data which is
*NOT* already contained in the document. In this case, I believe
XPath/XPointer breaks SoC just like direct external access does.

But I fully agree that the ability to manipulate content that is
already in place is very useful: this is called 'tree transformation'
and it is best achieved with XSLT.

So, adding xpointer capabilities to the inclusion method is complex,
hard, slow, harmful and duplicates existing functionality, are you
really sure you want it?

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------



Re: [c2] Cocoon XInclude

Posted by Donald Ball <ba...@webslingerZ.com>.
On Wed, 21 Feb 2001, Paul Russell wrote:

> +1, but I'm not sure about having a separate attribute for the xpath. If
> there's a spec for something, shouldn't we use it, rather than
> reinventing the wheel?

we're already reinventing the wheel to some extent. pragmatically, i'd say
that only a small subset of the xpointer spec is relevant here, and it's
easier and faster to let the xml parser give us two attribute values than
give us one attribute value which we then have to parse. otoh, if we're
planning on playing supporting xpointer ranges and such, we should
definitely use xpointer.

- donald


Re: [c2] Cocoon XInclude

Posted by Paul Russell <pa...@luminas.co.uk>.
* Jeremy Quinn (jeremy@media.demon.co.uk) wrote :
> At 2:19 PM +0100 20/2/01, Stefano Mazzocchi wrote:
> >I propose to use 'attributes' since they preserve the semantic
> >information of the placeholder, unlike 'elements' which indicate only
> >the inclusion semantic information.
> +1 on using attributes

+1 here too.

> >4) if the URI contains an XPointer, the returned content is the result
> >of the XPointer query to the generated resource.
> >
> >For example
> >
> > <page xmlns:include="http://apache.org/cocoon/include/1.0">
> >  <content include:uri="content#xpointer(book/chapter[3])"/>
> > </page>
> >
> >will include the 3-rd chapter of the generated book.
> >
> >[is this really useful/desirable?]
> Yes, +1
> Though I tend to agree with Donald, that the xpath should be in it's own
> attribute.

+1, but I'm not sure about having a separate attribute for the xpath. If
there's a spec for something, shouldn't we use it, rather than
reinventing the wheel?

> I do not understand your logic here, if we are going to use attributes to
> trigger the include mechanism, why are we throwing away the the container
> tag, when this would be useful during XSLT processing to identify included
> content according to your own DTD, without having to resort to namespaces.

But you could achieve this by simply wrapping the include element in
another tag. If the include replaces the tag, it's more flexible.


P.
-- 
Paul Russell                                 Email:   paul@luminas.co.uk
Technical Director                             Tel:  +44 (0)20 8553 6622
Luminas Internet Applications                  Fax:  +44 (0)870 28 47489
This is not an official statement or order.    Web:    www.luminas.co.uk

Re: [c2] Cocoon XInclude

Posted by Jeremy Quinn <je...@media.demon.co.uk>.
At 2:19 PM +0100 20/2/01, Stefano Mazzocchi wrote:

[snip]

>The choice between 'elements' and 'attributes' must be made toward
>usability: the two possible choices are
>
> <page xmlns:include="http://apache.org/cocoon/include/1.0">
>  <title>this is an example</title>
>  <include:include include:uri="/sidebar/"/>
>  ...
> </page>
>
>or
>
> <page xmlns:include="http://apache.org/cocoon/include/1.0">
>  <title>this is an example</title>
>  <navigation include:uri="/sidebar/"/>
>  ...
> </page>
>
>I propose to use 'attributes' since they preserve the semantic
>information of the placeholder, unlike 'elements' which indicate only
>the inclusion semantic information.

+1 on using attributes

[snip]

>4) if the URI contains an XPointer, the returned content is the result
>of the XPointer query to the generated resource.
>
>For example
>
> <page xmlns:include="http://apache.org/cocoon/include/1.0">
>  <content include:uri="content#xpointer(book/chapter[3])"/>
> </page>
>
>will include the 3-rd chapter of the generated book.
>
>[is this really useful/desirable?]

Yes, +1

Though I tend to agree with Donald that the xpath should be in its own
attribute.


>
>5) if the element that contains the include:uri attribute is not empty,
>the content contained is stripped out.
>
>For example
>
> <page xmlns:include="http://apache.org/cocoon/include/1.0">
>  <content include:uri="content">
>   <something-to-fill/>
>  </content>
> </page>
>
>where the resource 'content' generates
>
> <blah/>
>
>will generate at the end
>
> <page>
>  <blah/>
> </page>

I do not understand your logic here: if we are going to use attributes
to trigger the include mechanism, why are we throwing away the container
tag, when it would be useful during XSLT processing to identify included
content according to your own DTD, without having to resort to namespaces?


i.e. I feel the above should output this:

 <page>
  <content>
   <blah/>
  </content>
 </page>

instead of this:

 <page>
  <blah/>
 </page>

Particularly if we can use XPaths with the include.



Great to have you back Stefano!


regards Jeremy
-- 
   ___________________________________________________________________

   Jeremy Quinn                                           Karma Divers
                                                       webSpace Design
                                            HyperMedia Research Centre

   <ma...@mac.com>     		 <http://www.media.demon.co.uk>
    <phone:+44.[0].20.7737.6831>        <pa...@sms.genie.co.uk>

Re: [c2] Cocoon XInclude

Posted by Paul Russell <pa...@luminas.co.uk>.
* Torsten Curdt (tcurdt@dff.st) wrote :
> > I repeat this since it's very important: allowing the aggregation of
> > resources directly instead of passing thru the sitemap, creates the same
> > problems that the document() XPath function generates, making site
> > administration a nightmare and placing site growth saturation with
> > concern overlap.
> So how would you accomplish external aggregation then?

By wrapping the external request in a generator in the sitemap...

  <generate src="http://www.nasdaq.com/some_xml_file.xml"/>

Although Donald thinks we should include the facility for external
includes. I'm +0 either way, I think.


Paul.

-- 
Paul Russell                                 Email:   paul@luminas.co.uk
Technical Director                             Tel:  +44 (0)20 8553 6622
Luminas Internet Applications                  Fax:  +44 (0)870 28 47489
This is not an official statement or order.    Web:    www.luminas.co.uk

RE: [c2] Cocoon XInclude

Posted by Kirk Woerner <ki...@stoneseeker.com>.
huh, well, um, then I guess that begs the question of what's the overall
difference?  Basically, the Element tag is either completely ignored, or
required to be "include"?

> Subject: RE: [c2] Cocoon XInclude
>
> On Tue, 20 Feb 2001, Kirk Woerner wrote:
>
> > I have a comment about the element versus attribute stuff.  There IS a
> > disadvantage to the attribute way, and that is that it implies some
> > foreknowlege of what is being included and is therefore IMO somewhat less
> > useful.
>
> er, no, afaik the two are identical, it's just that one way you have to
> write
>
> <include:include include:uri="./included.xml"/>
>
> and another you can write
>
> <foobar include:uri="./included.xml"/>
>
> i think it's a little easier to write a SAX filter for the element way,
> but neither implies any foreknowledge of the included content. in both
> cases, the including element is replaced by the included content.
>
> - donald


RE: [c2] Cocoon XInclude

Posted by Donald Ball <ba...@webslingerZ.com>.
On Tue, 20 Feb 2001, Kirk Woerner wrote:

> I have a comment about the element versus attribute stuff.  There IS a
> disadvantage to the attribute way, and that is that it implies some
> foreknowlege of what is being included and is therefore IMO somewhat less
> useful.

er, no, afaik the two are identical, it's just that one way you have to
write

<include:include include:uri="./included.xml"/>

and another you can write

<foobar include:uri="./included.xml"/>

i think it's a little easier to write a SAX filter for the element way,
but neither implies any foreknowledge of the included content. in both
cases, the including element is replaced by the included content.

- donald


RE: [c2] Cocoon XInclude

Posted by Kirk Woerner <ki...@stoneseeker.com>.
I have a comment about the element versus attribute stuff.  There IS a
disadvantage to the attribute way, and that is that it implies some
foreknowledge of what is being included and is therefore IMO somewhat less
useful.

For example, suppose you have a structure of Branches and Leaves, where
a Branch can have Branches or Leaves as children while a Leaf cannot.
Using the element version of "include", you can include a file that has
a Branch or a Leaf in it without knowing which it is.

dtd
<!ELEMENT Branch (Branch | Leaf)*>
<!ELEMENT Leaf (#PCDATA)>

file.xml
<Branch xmlns...>
  <Branch>
    <Leaf>a leaf</Leaf>
  </Branch>
  <include include:uri="./included.xml"/>
</Branch>

In one directory you could have
included.xml
<Branch>
  <Leaf>a leaf</Leaf>
  <Leaf>another</Leaf>
</Branch>

while in another you could have
included.xml
<Leaf>a leaf</Leaf>

Sort of a contrived example, but the attribute way makes this sort of thing
impossible.

My $.02 from using xinclude...

Kirk



Re: [c2] Cocoon XInclude

Posted by Donald Ball <ba...@webslingerZ.com>.
On Tue, 20 Feb 2001, Stefano Mazzocchi wrote:

> Since:
>
> 1) I don't care if we used Xinclude or not as long as it does what we
> want
>
> 2) I'm sick and tired of lobbying
>
> 3) I want to get stuff done

+1

> To identify their semantic behavior, they must belong to a specific
> namespace. I propose to use the namespace
>
>  http://apache.org/cocoon/include/[major.minor]
>
> thus
>
>  http://apache.org/cocoon/include/1.0
>
> for this version. Using a wider namespace (removing /cocoon) would be
> too much since this inclusion mechanism is cocoon-specific and it
> involves sitemap subrequests.

+1

> I propose to use 'attributes' since they preserve the semantic
> information of the placeholder, unlike 'elements' which indicate only
> the inclusion semantic information.

+0.

> This said, the Cocoon Include namespace is made entirely by one
> attribute
>
>  uri --> indicates the uri of the resource to include
>
> some considerations:
>
> 1) the URI must be local and internal, therefor it must *NOT* contain a
> protocol identifier: this enforces SoC by placing direct resource
> control on the sitemap and avoid loosing aggregation information around
> the system.

_must_ be local and internal? -1, i'd like the ability to include remote
resources if i choose.

> I repeat this since it's very important: allowing the aggregation of
> resources directly instead of passing thru the sitemap, creates the same
> problems that the document() XPath function generates, making site
> administration a nightmare and placing site growth saturation with
> concern overlap.

so document the potential negative consequences, but don't forbid it.

> 4) if the URI contains an XPointer, the returned content is the result
> of the XPointer query to the generated resource.
>
> For example
>
>  <page xmlns:include="http://apache.org/cocoon/include/1.0">
>   <content include:uri="content#xpointer(book/chapter[3])"/>
>  </page>
>
> will include the 3-rd chapter of the generated book.
>
> [is this really useful/desirable?]

it can be, sure, although i'd be inclined to make the xpath expression be
a separate attribute:

<content include:uri="content" include:xpath="/book/chapter[3]"/>

(note that your xpointer would probably be invalid since there isn't a
context node to evaluate the relative xpath against)
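To make the cost of a separate include:xpath attribute concrete, here is a hedged Java sketch using the standard JAXP XPath API: the whole output of the included resource is first buffered (here as a DOM) and only then can "/book/chapter[3]" be evaluated. XPathInclude and select() are hypothetical names; a streaming SAX-based engine, as discussed in this thread, would avoid building the full tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: applies an include:xpath-style expression to included output.
final class XPathInclude {
    /** Returns the first node matched by xpath, or null if nothing matches. */
    static Node select(String includedXml, String xpath) {
        try {
            // The complete included document must be buffered before evaluation.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(includedXml)));
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```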

although i don't think it's mandatory that we have it right away - i'd
almost rather wait until xalan has a SAX-based xpath API. does anyone know
if one's on the way?

> 5) if the element that contains the include:uri attribute is not empty,
> the content contained is stripped out.
>
> For example
>
>  <page xmlns:include="http://apache.org/cocoon/include/1.0">
>   <content include:uri="content">
>    <something-to-fill/>
>   </content>
>  </page>
>
> where the resource 'content' generates
>
>  <blah/>
>
> will generate at the end
>
>  <page>
>   <blah/>
>  </page>

how about an extension to this - if an error occurs when attempting to
include the resource, the contents of the include element will be included
instead.

> Comments?

this doesn't say anything about circular inclusions.

- donald


Re: [c2] Cocoon XInclude

Posted by Donald Ball <ba...@webslingerZ.com>.
On Wed, 21 Feb 2001, Stefano Mazzocchi wrote:

> So, instead of doing
>
>  <include uri="http://www.cnn.com/news/today/headlines.rss"/>
>
> you should do
>
>  <include uri="/news/today/headlines"/>
>
> and let the administration map this resource to the external one.

that makes sense, i'm +0 now. honestly, i still rather think it's 6 of
one, 1/2 dozen of the other, but i bet this will be easier to program.

> 3) inclusion loops: if we force all included resources to be internal,
> then we have to pass thru the sitemap, this allows to watch over
> inclusion loops by traking the inclusion path of the entering URI.
>
> For example
>
>  [/home] includes [/home/sidebar]
>                   [/news]
>                   [/mail/headers/new]
>                   [/cvs/last-commits]
>
>  [/news] includes [/synd/slashdot.org/]
>                   [/synd/xmlhack.com/]
>                   [/sync/freshmeat.net/]
>                   [/mail/headers/new]
>
> is no problem even if there is multiple inclusion while
>
>  [/home] includes [/home/recursive]
>
>  [/home/recursive] includes [/home]
>
> will generate an error for the second inclusion and avoid it.
>
> The algorithm to check this it's left to the reader as an exercise :)
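One possible answer to the exercise, sketched in Java: keep the current inclusion chain (the ancestors of the resource being generated) and refuse any inclusion whose URI is already on that chain, while still allowing the same resource to be included more than once on different branches. The sketch walks a static map from each resource to the URIs it includes; that map, the class InclusionChecker, and its methods are hypothetical. In the proposal the chain would instead travel with each sitemap subrequest. It compares whole URI strings only, so it does not settle the partial-inclusion case raised in the reply.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical loop detector over a static "resource -> included URIs" map.
final class InclusionChecker {
    private final Map<String, List<String>> includes;

    InclusionChecker(Map<String, List<String>> includes) { this.includes = includes; }

    /** Returns every inclusion refused because it would re-enter the current chain. */
    List<String> rejectedFrom(String entryUri) {
        List<String> rejected = new ArrayList<>();
        walk(entryUri, new HashSet<>(), rejected);
        return rejected;
    }

    private void walk(String uri, Set<String> chain, List<String> rejected) {
        chain.add(uri); // uri is now an ancestor of everything it includes
        for (String child : includes.getOrDefault(uri, List.of())) {
            if (chain.contains(child)) {
                rejected.add(uri + " -> " + child); // loop: report it and skip this inclusion
            } else {
                walk(child, chain, rejected); // repeated inclusion elsewhere is fine
            }
        }
        chain.remove(uri); // leaving uri: other branches may include it again
    }
}
```

With the /home and /news example the list of rejections is empty even though /mail/headers/new is included twice, while the /home and /home/recursive pair yields exactly one rejection, "/home/recursive -> /home".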

the complexity here is if you have documents which recursively include
_portions_ of each other:

/home -> /home/recursive#/root/node1 -> /home#/root/meta

should that toss an error or not? (xinclude spec held that it did not
unless the inclusion chain ended up with two identical uri+xpath
expressions, but i wasn't convinced you couldn't end up with an infinite
loop)

- donald


Re: [c2] Cocoon XInclude

Posted by Stefano Mazzocchi <st...@apache.org>.
Torsten Curdt wrote:
> 
> [snip]
> >                            Cocoon XInclude
> >                            ===============
> >
> > Cocoon requires a way to specify content aggregating behavior.
> 
> Oh, yes!!:)
> 
> > This is defined by making possible for a generated page to trigger a
> > Cocoon internal subrequest and substitute the triggering content with
> > the content generated from the internal subrequest.
> 
> So a generator can trigger another generator via this subrequest.

More or less, yeah, but not directly.

You do something like this

 /resource [ G ---> T -*-> T ---> T ] ---> S
                       |
                [/resource/include]
                       |
                       + [ <--- T <--- G ]

where '*' is the transparent including mechanism that reacts on the
"cocoon include" namespace (element or attribute reaction is yet to be
defined).

> This sounds cool ... so inclusion of serverpages should be possible, correct?

Of course, this allows you to "include into one pipeline result the
result of other pipelines". It's *WAY* more than including other
serverpages :)

> Assuming this: we are talking about generator based inclusion.

No: as I said (and as you can hopefully see from the picture above; if
not, let me know), this is 'pipeline-based', not just generation-based.

> I alway felt XInclude beeing a transformer is a little unnatural.

Yes, I felt exactly the same, and using component cache atomicity as our
metric clearly shows why.

> Isn't including more generating than a transforming issue?

Totally.
 
> [snip]
> 
> > 1) the URI must be local and internal, therefor it must *NOT* contain a
> > protocol identifier: this enforces SoC by placing direct resource
> > control on the sitemap and avoid loosing aggregation information around
> > the system.
> >
> > I repeat this since it's very important: allowing the aggregation of
> > resources directly instead of passing thru the sitemap, creates the same
> > problems that the document() XPath function generates, making site
> > administration a nightmare and placing site growth saturation with
> > concern overlap.
> 
> So how would you accomplish external aggregation then?

Ask yourself this question: how can Cocoon map external resources if the
sitemap processes only requests made to Cocoon?

The answer, in this context, is obvious: go get them. Which translates
into: write a generator to get them (or use one shipped with the
distribution, which is much more likely to happen).

Donald is '-1' on restricting 'cocoon include' to internal resources;
here I try to explain why I believe direct external inclusion would be
harmful.

                                - o -

As usual, I'll use the SoC metric, where a design is 'judged' by how
much overlap it creates between existing 'concern islands' (a term I
borrowed from the 'knowledge management' world). Of course, the value of
a design is inversely proportional to the overlap it creates between
concern islands.

The concern islands map will be the usual 'cocoon pyramid of contracts'.

Let us start by assuming that cocoon implements transparent including as
specified in the picture above: it is sufficient to generate an
element/attribute with the specific namespace for Cocoon to react and
perform a subrequest and include the result, removing the element.

Now, in general, two possible subrequests can be made

 1) internal (no protocol specified in the URI)

 2) external (protocol specified in the URI)

While it is *obvious* that this inclusion mechanism must ultimately be
able to obtain resources created both internally and externally, what we
are judging (using the SoC metric) is the functionality of allowing
*direct* external inclusion, instead of forcing external inclusion to go
through an internal resource map.

To do this, we must understand which concern island generates the
include-triggering instruction: it could be

 - content island:  the trigger is placed directly into the document
(for example, a document fragment (e.g. a license) that is included in
many files and maintained separately)

 - logic island:  the trigger is dynamically generated (for example, in
a portal-like application, where aggregation is stateful and
user-driven)

It must be noted at this point that the presentation island (style)
should never trigger content aggregation. This doesn't mean that
stylesheets cannot be split into multiple files, for example; it means
that style doesn't need to include pipeline results. Otherwise SoC is
broken, since content generation is not its concern.

In the case of direct external inclusion, the content or logic island
overlaps with the administration island, since the administration (i.e.
those who manage the sitemap, not those who manage the web server!)
cannot directly control the behavior of the included resource.

This is the problem: scalability is hurt by the fact that more contracts
need to be created if the responsibility of direct inclusion is
delegated by the administration island to the content/logic islands.

In general, the contracts between these islands should only be schemas
and internal URI spaces (and environment parameters for the logic
island); what is external should be 'mediated' by a central authority,
which is technologically represented by the sitemap.

So, instead of doing

 <include uri="http://www.cnn.com/news/today/headlines.rss"/>

you should do

 <include uri="/news/today/headlines"/>

and let the administration map this resource to the external one.
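Here is a minimal sketch of that mediation. It assumes a plain
dictionary as a hypothetical stand-in for the sitemap. `SITEMAP`,
`Mapping`, `resolve` and `identity` are illustrative names, not Cocoon's
real sitemap API. Documents only ever name the internal URI. The external
URL and any adaptation step live in a single table owned by the
administration.

```python
from typing import Callable, Dict, NamedTuple


class Mapping(NamedTuple):
    source: str                  # where the content really comes from
    adapt: Callable[[str], str]  # transformation into our own schema


def identity(doc: str) -> str:
    """No adaptation needed: the external schema is already ours."""
    return doc


# Hypothetical stand-in for the sitemap: the only place that knows
# about external URLs.
SITEMAP: Dict[str, Mapping] = {
    "/news/today/headlines": Mapping(
        "http://www.cnn.com/news/today/headlines.rss", identity),
}


def resolve(internal_uri: str) -> Mapping:
    """Look up the external source and adapter for an internal URI."""
    try:
        return SITEMAP[internal_uri]
    except KeyError:
        raise LookupError(f"no sitemap entry for {internal_uri!r}") from None
```

Switching the newsfeed to another provider would mean editing this one
entry, and possibly its adapter. Every page containing
`<include uri="/news/today/headlines"/>` stays untouched.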

This has several benefits:

1) changes are localized, and they propagate automatically throughout
the entire site. For example, if the newsfeed is changed from cnn.com
to, say, cnet.com because the management made a specific deal with
them, this doesn't impact the rest of the system in any way... especially
because...

2) resources can be 'adapted' in a central way. For example, the cnn.com
news schema might be semantically equivalent to the cnet.com news
schema, but might require transformation to be adapted to the schema our
system uses. 

Allowing direct external inclusion means that islands other than
administration must be aware of 'adaptation issues' with the external
world, and this creates unnecessary (and harmful) overlap between the
concerns.

This adaptation could also imply the 'namespacization' of the external
resource, or, even more likely, the 'XML-ization' of non-XML resources
(text feeds, email, news, SMS messages, MPEG-7 streams, etc.)
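The following is a minimal sketch of such an XML-ization step. It
assumes a plain-text feed with one headline per line. The namespace
'http://example.org/internal/headlines' and the function name
`text_feed_to_xml` are hypothetical. Under this proposal, code like this
would sit behind a sitemap entry, never inside the documents that
include the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the site's own headline schema.
HEADLINES_NS = "http://example.org/internal/headlines"


def text_feed_to_xml(feed: str) -> str:
    """XML-ize a plain-text feed (one headline per line) into the
    internal, namespaced schema. Markup characters are escaped by
    ElementTree on serialization."""
    ET.register_namespace("h", HEADLINES_NS)
    root = ET.Element(f"{{{HEADLINES_NS}}}headlines")
    for line in feed.splitlines():
        headline = line.strip()
        if headline:  # blank lines carry no content
            ET.SubElement(root, f"{{{HEADLINES_NS}}}headline").text = headline
    return ET.tostring(root, encoding="unicode")
```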

So, while forcing internal inclusion seems limiting, it only imposes a
design pattern that is carefully chosen to minimize abuse, and thus to
reduce overlap and increase separation, and therefore scalability and
return on investment.

The fact that even XML experts on this list don't immediately see the
value of this limitation adds even more value to the concept.

                                - o -

Continuing with Donald's concerns about the proposal:

1) nested content: the idea of using the markup nested inside the
include element as an error message is a very clever one, I love it!
Great idea! Definitely +1!

2) element vs. attribute: yes, element filtering is easier than
attribute filtering in SAX2 and probably more performant. 

From a general perspective, attribute-based inclusion is more
semantically reasonable, even if not immediately obvious; in fact, it's
much easier to understand something like

 <sidebar include:uri="sidebar"/>

than

 <include:include include:uri="sidebar"/>

but then again, since the element is normally ignored, even something
like this

 <sidebar>
  <include:include include:uri="sidebar"/>
 </sidebar>

is meaningful enough.

Well, I'd say we go for the element for performance reasons. Allowing
for attribute behavior in the future is a piece of cake anyway.
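To show what element-based filtering looks like in SAX2 terms, here is
a minimal sketch built on Python's `xml.sax.saxutils.XMLFilterBase`.
Cocoon's own filter would be Java and is not shown here. The filter
watches for include:include in the
'http://apache.org/cocoon/include/1.0' namespace and replays the fetched
fragment in its place. If the fetch fails, it lets the nested markup
through as the error message, as discussed in point 1. `fetch` is a
hypothetical callback that returns the subrequest result as an XML
string. For brevity, the sketch ignores processing instructions and
whitespace-only events.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

INCLUDE_NS = "http://apache.org/cocoon/include/1.0"
INCLUDE = (INCLUDE_NS, "include")


class _Fragment(ContentHandler):
    """Forwards a fetched fragment's events downstream, minus start/endDocument."""

    def __init__(self, target):
        super().__init__()
        self._target = target

    def startPrefixMapping(self, prefix, uri):
        self._target.startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        self._target.endPrefixMapping(prefix)

    def startElementNS(self, name, qname, attrs):
        self._target.startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self._target.endElementNS(name, qname)

    def characters(self, content):
        self._target.characters(content)


class IncludeFilter(XMLFilterBase):
    """Replaces include:include elements with the fetched fragment; if the
    fetch fails, the element's nested markup is emitted instead."""

    def __init__(self, parent, fetch):
        super().__init__(parent)
        self._fetch = fetch  # hypothetical: internal URI -> XML string
        self._skip = 0       # > 0 while discarding nested fallback markup

    def startElementNS(self, name, qname, attrs):
        if self._skip:
            self._skip += 1
        elif name != INCLUDE:
            super().startElementNS(name, qname, attrs)
        else:
            try:
                fragment = self._fetch(attrs.get((INCLUDE_NS, "uri")))
            except Exception:
                return  # drop the include tag, let the fallback through
            self._replay(fragment)
            self._skip = 1  # success: discard the fallback

    def endElementNS(self, name, qname):
        if self._skip:
            self._skip -= 1
        elif name != INCLUDE:  # include end tags are always dropped
            super().endElementNS(name, qname)

    def characters(self, content):
        if not self._skip:
            super().characters(content)

    def _replay(self, fragment):
        parser = xml.sax.make_parser()
        parser.setFeature(feature_namespaces, True)
        parser.setContentHandler(_Fragment(self.getContentHandler()))
        parser.parse(io.BytesIO(fragment.encode("utf-8")))


def expand_includes(document, fetch):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    out = io.StringIO()
    include_filter = IncludeFilter(parser, fetch)
    include_filter.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    include_filter.parse(io.BytesIO(document.encode("utf-8")))
    return out.getvalue()
```

The filter only has to compare one (namespace, local name) pair per
start tag. An attribute-based variant would need to scan every
element's attributes instead, which is the cost difference discussed
above.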

3) inclusion loops: if we force all included resources to be internal,
then every inclusion has to pass through the sitemap, which allows us to
watch for inclusion loops by tracking the inclusion path of the entering
URI.

For example:

 [/home] includes [/home/sidebar] 
                  [/news] 
                  [/mail/headers/new]
                  [/cvs/last-commits]

 [/news] includes [/synd/slashdot.org/] 
                  [/synd/xmlhack.com/]
                  [/synd/freshmeat.net/]
                  [/mail/headers/new]

is no problem, even though [/mail/headers/new] is included twice, while
 
 [/home] includes [/home/recursive]

 [/home/recursive] includes [/home]

will generate an error for the second inclusion and avoid it. 

The algorithm to check this is left to the reader as an exercise :)
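For readers who want to skip the exercise, here is one possible answer
as a Python sketch. It is a depth-first walk that carries the current
inclusion path and refuses any inclusion whose target is already on that
path. It reports the loop and skips that inclusion, but still allows the
same URI to appear in different branches. The `includes` dictionary is a
hypothetical, static stand-in for what each page's subrequests would be.
In a live system, the path would travel with each subrequest instead.

```python
from typing import Dict, List, Tuple


def find_inclusion_loops(root: str,
                         includes: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Walk the inclusion graph from `root`, tracking the inclusion path.

    Returns the path of every inclusion that would close a loop; those
    inclusions are reported and skipped. Including the same URI from
    different branches is allowed, since it is not on the current path.
    """
    loops: List[Tuple[str, ...]] = []

    def walk(uri: str, path: Tuple[str, ...]) -> None:
        for child in includes.get(uri, []):
            if child in path:              # child is an ancestor: loop
                loops.append(path + (child,))
                continue                   # report it and avoid it
            walk(child, path + (child,))

    walk(root, (root,))
    return loops
```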

It's incredible to see how such a simple namespace reaction can lead to
such an incredibly powerful publishing system.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------



RE: [c2] Cocoon XInclude

Posted by Torsten Curdt <tc...@dff.st>.
[snip]
>                            Cocoon XInclude
>                            ===============
> 
> Cocoon requires a way to specify content aggregating behavior. 

Oh, yes!!:)

> This is defined by making it possible for a generated page to trigger a
> Cocoon internal subrequest and substitute the triggering content with
> the content generated from the internal subrequest.

So a generator can trigger another generator via this subrequest.
This sounds cool... so inclusion of server pages should be possible, correct?
Assuming so, we are talking about generator-based inclusion.

I always felt that XInclude being a transformer was a little unnatural.
Isn't inclusion more a generating issue than a transforming one?

[snip]

> 1) the URI must be local and internal, therefore it must *NOT* contain a
> protocol identifier: this enforces SoC by placing direct resource
> control on the sitemap and avoids losing aggregation information around
> the system.
> 
> I repeat this since it's very important: allowing the aggregation of
> resources directly instead of passing thru the sitemap, creates the same
> problems that the document() XPath function generates, making site
> administration a nightmare and placing site growth saturation with
> concern overlap.

So how would you accomplish external aggregation then?
--
Torsten