Posted to dev@forrest.apache.org by Nicola Ken Barozzi <ni...@apache.org> on 2002/08/31 19:58:10 UTC

[VOTE] Usage of file.hint.ext convention

Since our users also want to put all the files in a single dir, and 
since Cocoon needs hints about the contents of the file for easy 
usage, I propose that we formalise the

   file.hint.ext

convention, and keep it also in the output of the files.

We should always make ext either the *target* or the *source* extension, 
so that linking becomes natural for users. Seeing mypage.xml and having 
to link to mypage.html has already confused many.

It could be mypage.html or mypage.xml, but links and filenames need to 
be the same.

IMHO it should be the source extension, to allow multiple-output processing.

This also comes with the implicit need to change the sitemap to handle 
all filetypes.

Files that don't necessarily need to be processed have no hint (javadocs, 
for example, are already html).
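To make the convention concrete, here is a minimal sketch in Python 
(illustrative only; the parse_name helper is hypothetical, not Forrest 
code) of how a name splits into base, optional hint, and extension:

```python
def parse_name(filename):
    """Split 'base.hint.ext' into (base, hint, ext).

    A file with no hint (e.g. pre-rendered javadoc html) yields
    hint=None, meaning Cocoon may pass it through untouched.
    Hypothetical helper, not actual Forrest code.
    """
    parts = filename.split(".")
    if len(parts) >= 3:
        return (".".join(parts[:-2]), parts[-2], parts[-1])
    if len(parts) == 2:
        return (parts[0], None, parts[1])
    return (filename, None, None)

print(parse_name("mypage.xdoc.html"))  # base, source hint, target ext
print(parse_name("index.html"))        # no hint: serve as-is
```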

Finally, I have already demonstrated how hints don't break IoC and SoC, 
since Cocoon can decide what to do with the hint independently from the 
doc writer; it could, for example, ignore it completely.

It's simply a hint.

+1

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [VOTE] Usage of file.hint.ext convention

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Nicola Ken Barozzi wrote:
> A table is a semantic thing.
HTML and XSL-FO tables are commonly used for grid layouts, 
like in newspapers, in the absence of alternatives.

J.Pietschmann


Re: [VOTE] Usage of file.hint.ext convention

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Steven Noels wrote:
> Nicola Ken Barozzi wrote:
> 
>> Exactly why I brought the vote up.
>>
>> The double ext solution is a quick fix for this problem, which Marc 
>> and I regard as a minor hack.
>>
>> If anyone has better suggestions, please bring them on, because we 
>> need to go forward.
> 
> 
> Again: why?
> 
> I trailed off to some recommended reading:
> 
> http://www.w3.org/Provider/Style/URI.html#remove
> http://httpd.apache.org/docs/content-negotiation.html

hehehe:

"
Note on hyperlinks and naming conventions

If you are using language negotiation you can choose between different 
naming conventions, because files can have more than one extension, and 
the order of the extensions is normally irrelevant (see mod_mime 
documentation for details).

A typical file has a MIME-type extension (e.g., html), maybe an encoding 
extension (e.g., gz), and of course a language extension (e.g., en) when 
we have different language variants of this file.

Examples:

     * foo.en.html
     * foo.html.en
     * foo.en.html.gz
"

> and then some
> 
> http://www.w3.org/TR/webarch/ (draft)
> http://www.imc.org/ietf-xml-use/ (general interest reading)
> 
> and came back with the idea that we really need CAPs for grammar-based 
> XML processing,

The downside: speed
The solution: caching between runs

> and ResourceExistActions for avoiding name clashes. 

The downside: user confusion
The solution: enable only one name in a given dir

> Other than that, I see many flaws because we want to generate static 
> HTML, but hey, that's life :-)

That's what users need ;-)

See my latest postings if you find some proposed solutions acceptable.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [VOTE] Usage of file.hint.ext convention

Posted by Steven Noels <st...@outerthought.org>.
Nicola Ken Barozzi wrote:

> Exactly why I brought the vote up.
> 
> The double ext solution is a quick fix for this problem, which Marc 
> and I regard as a minor hack.
> 
> If anyone has better suggestions, please bring them on, because we need 
> to go forward.

Again: why?

I trailed off to some recommended reading:

http://www.w3.org/Provider/Style/URI.html#remove
http://httpd.apache.org/docs/content-negotiation.html

and then some

http://www.w3.org/TR/webarch/ (draft)
http://www.imc.org/ietf-xml-use/ (general interest reading)

and came back with the idea that we really need CAPs for grammar-based 
XML processing, and ResourceExistActions for avoiding name clashes. 
Other than that, I see many flaws because we want to generate static 
HTML, but hey, that's life :-)

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org


Re: [VOTE] Usage of file.hint.ext convention

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Steven Noels wrote:
> Nicola Ken Barozzi wrote:
> 
> <snip/>
> 
>>> But it requires the docwriter to think about the management concern, 
>>> IMO.
>>
>> They still write mydoc.xml or mydoc.gif, no?
>> This isn't a management concern, no?
> 
> 
> No, that's the way silly OS'es bind editing apps to filetypes. 

Simple Minds: "Quit dreaming, this is real life baby..." ;-)

> On Unix 
> and Mac OS, this has been solved in a more robust way, IMHO 
> (/etc/mime-magic and Resource Forks).

Resource forks?
They sound cool; what are they?

>>> I'm still thinking about those content-aware pipelines, and for some 
>>> app we are developing, we actually have been using this technique 
>>> doing a XML Pull Parse on the document to check its root element - 
>>> here, we could check for its DTD identifier.
>>
>>
>>
>> It's neat, but a PITA for many users.
> 
> 
> How come? Using CAPs (content-aware pipelines), the system decides what 
> will be done with their XML files, depending on the editing grammar they 
> used.

Because users are used ;-) to declaring the mimetype in the filename.
And doctypes are difficult to write.

Ok, it *is* a PITA for many users, but not a blocking one.

>>> I'm vigourously opposing the idea of encoding meta-information twice 
>>> and in different places: inside the document, using its filename, and 
>>> in the request URI.
>>
>>
>>
>> Conceptually I agree, the hint is a "hack".
> 
> 
> Yes.
> 
>>> Consider this scenario:
>>>
>>> URI:
>>>
>>> http://somehost/documentnameA.html
>>> http://somehost/documentnameB.pdf
>>>
>>>
>>> source          step 1         |   step 2        step 3      step4
>>>                                |
>>> A.docv11.xml      -            |   web.xsl      (skin.xsl)   serialize
>>> B.docbook.xml   db2xdoc.xsl    |   paper.xsl                 serialize
>>>                                |
>>>                                ^
>>>                             logical
>>>                               view
>>>                            format [1]
>>>
>>>
>>> There's two concepts that could help us here:
>>>
>>> 1) content-aware pipelines, as being articulated in some form in 
>>> http://marc.theaimsgroup.com/?t=102767485200006&r=1&w=2 - the grammar 
>>> of the XML source document as being passed across the pipeline will 
>>> decide what extra preparatory transformation steps need to be done
>>
>>
>>
>> Ok.
>>
>>> 2) views - simple Cocoon views instead of the current skinning 
>>> system, which would oblige us to seriously think of an intermediate 
>>> 'logical' page format that can be fed into a media-specific 
>>> stylesheet (web, paper, mobile, searchindexes, TOC's etc) resulting 
>>> in media-specific markup that can be augmented with a purely visual 
>>> skinning transformation
>>
>>
>>
>> Man, that's what I've been advocating all along.
> 
> I know - it's just that we add hack after hack to get Forrest out of the 
> door ASAP, which brings us further away from the silver bullet solution 
> (knowing very well that those don't exist, only in the mind of their 
> creators - but we should really try).

:-)

>> I think that the document.dtd can be such a step.
>> The switch to using XHTML for it is *exactly* this.
> 
> 
> I resonate with you on some intermediate format, but am struggling myself 
> with what format we should use. Remember XHTML still carries a lot of 
> structure-typographic elements like tables which can be misused in 
> various ways. 

A table is a semantic thing.
People can always abuse things; heck, you can abuse Cocoon in many 
ways, but as Stefano said, that's unavoidable.
The protection you give users must be reasonable, no more.

> So what selection of XHTML elements/atts should we use for 
> that intermediate format then?

See the archives; I did a comparison between the document DTD and XHTML WD2.

> And how will we support the tricks Bert 
> has been applying to the DTD documentation pipelines to have a much 
> better rendition for the element content model description? I don't have 
> the answer right now, but Marc and I are teasing each other to come up 
> with a definitive solution somewhere in time. 

Use div and span tags.

> Time is a bit limited now, 
> unfortunately: I'm also readying the launch of cocoondev.org as I 
> promised in 
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=102881527330596&w=2
> 
>> Users that want to write a generic document use that dtd.
>> All other content that must be "skinned" by forrest must be 
>> pregenerated by other tools to give that dtd.
> 
> 
> +1
> 
>> We still have status.xml... etc files that get automatically 
>> transformed to that format.
> 
> 
> Yep - and I like that.
> 
>> I have been advocating the two step process since I started using 
>> Cocoon (see also mails to the cocoon users for example), so I'm +10000 
>> for it being formalized :-D
>>
>>> Views are currently specified using the cocoon-view request 
>>> parameter, so maybe we could use the request-parameter Selector for 
>>> that purpose:
>>>
>>>       <map:match pattern="**">
>>>         <map:select type="request-parameter">
>>>           <map:parameter name="parameter" value="cocoon-view"/>
>>>           <map:when test="pdf">
>>>             pdf pipeline acting on a 'logical page' view?
>>>           </map:when>
>>>           <map:when test="html"/>
>>>         </map:select>
>>>       </map:match>
>>>
>>> Or we could write some Action which uses the URI to specify the 
>>> chosen view/rendition.
>>
>>
>>
>> *This* -1.
>>
>> The hack of putting the intermediate step in the name is to make URI 
>> space independent from the output space; you say that even that 
>> pollutes the URI (I agree), and this is a step back.
>>
>> The best thing would be to understand something about the client 
>> automatically, but also a request parameter can be ok.
> 
> 
> I was thinking along the lines of choosing a Cocoon view based on the 
> request environment *and* CAPs, but I need to have a serious whiteboard 
> session on this - maybe it is time to host that Forrest hackaton over 
> here RSN.

:-D

>> The point is, can we use them in statically generated documentation?
>>
>> We cannot.  :-/
> 
> 
> I fail to see why, but I might be stubborn ;-)

When I persist the file on disk, I cannot write a file named 
"file.xml?view=mine".
If I encode it in the name, we're back to square one.

>> So we simply should say that the output format is given by the 
>> filename, but this is the output, not the input, and this brings us 
>> back to the problem that writers should concentrate on the input, and 
>> use that for the links to have view independence.
>>
>> See, browser technology constrains us :-/
>>
>>> I know all this is bringing us to a slowdown, but I couldn't care less: 
>>> I feel we are deviating from best practices in favor of quick wins.
>>>
>>> Caveat: I haven't spent enough time thinking and discussing this, and 
>>> perhaps I have different interests (pet peeves) than others on the list.
>>
>>
>>
>> What you propose is the best route, but we need to be faster.
> 
> 
> Why? ;-)

;-)

>> Ok, let's go into it.
>>
>> 1) have two step process standard +1
>> 2) switch documentdtd to be the intermediate format and become akin to 
>> XHTML2 as in previous mails +1
> 
> 
> I need to revise documentv11, as I promised already (too) many times. 
> Can anyone drop the sky on me?

I did it in the XHTML comparison. Please take a look.

>> 3) use content-aware pipelines - see below
>> 4) link the sources, not the results.
> 
> 
> +1
> 
>> This is cool but what gets generated when I have
>>  file.xml -> file.html
>>  file.html -> file.html
>> Both in the same dir?
> 
> 
> Exactly what Marc just told me - I suggested going for a 
> ResourceExistAction there - would that help?

We already discussed this; no, it doesn't.
The user has no real idea that one was chosen over the other.

>> If I link to file.xml, I get the link translated to file.html, but 
>> then what file do I get to see?
>>
>> This is the reason why we need a 1-1 relationship.
>>
>> Now to explain the why of the double ext (again):
>>
>> We have file.xml
>>
>> - user must link using the same filename
>>
>>  link href="file.xml"
>>
>> - the browser needs the filename with the resulting extension:
>>
>>  file.html
> 
> 
> Filename generation is part of the crawler, and we can patch/configure 
> that, if we want.

Yes but... |
            v

>> - the system needs to have unique names
>>
>> So this brings us *necessarily* to having both xml and html included 
>> in the extension.
>> xml for uniqueness, html for the browser.
>>
>> Or maybe use the double extension only for clashing names, but then 
>> how can the user ask for a pdf to be generated if there is only the 
>> .xml extension?
>>
>> You say it shouldn't know, because part of the view?
>> Go tell the users.
>> And how can they do it without breaking the uri?
>>
>> Ha.
> 
> 
> Duh ;-)
> 
> Will think about that some more, at least we have the discussion going 
> again :-)

:-)

Exactly why I brought the vote up.

The double ext solution is a quick fix for this problem, which Marc 
and I regard as a minor hack.

If anyone has better suggestions, please bring them on, because we need 
to go forward.

                          <><><>

In essence, the problem is that the URL should expose both the content 
and the result hints.

Why?

- The content is needed for the writer and 1-1 mapping.
- The result is for the browser *and* to make more results available 
with the same resource (page.html and page.pdf).

Where can we put this in the URL?

In browsers, the result is in the extension (see also the .pdf extension 
problem and accompanying hacks, and the fact that IE uses the extension 
instead of the mime-type).

What remains is the content.

We can encode it in the file or in the path.
But since we want a clean URI that is semantically rich, we put it in 
the filename.

Hence the extension proposal.

           <><><>

Second proposal:

Nobody has brought up the fact that the filename has semantics attached.

So:

  mypage.xml
  ->  /path/to/mypage/xml.html
  ->  /path/to/mypage/xml.pdf

  mypage.pdf
  ->  /path/to/mypage/html.pdf

Standard automatic view from live cocoon:

  mypage.xml
  ->  /path/to/mypage
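A sketch of this mapping in Python (the output_uri helper is hypothetical 
and models only the mypage.xml lines above):

```python
def output_uri(source_name, target_ext):
    """Map a source filename to the proposed output URI: the source
    extension becomes a path element and the target format becomes
    the real extension. Hypothetical helper illustrating the proposal."""
    base, src_ext = source_name.rsplit(".", 1)
    return "%s/%s.%s" % (base, src_ext, target_ext)

print(output_uri("mypage.xml", "html"))  # mypage/xml.html
print(output_uri("mypage.xml", "pdf"))   # mypage/xml.pdf
```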

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


relativising URI's when using cocoon CLI (was Re: [VOTE] Usage of file.hint.ext convention)

Posted by Marc Portier <mp...@outerthought.org>.
>> by the way Nicola: the CLI should also be the one that "relativizes" 
>> all the non-http://-leading hrefs in our generated html as well :-D
>> that is the correct way to produce a bunch of relative interconnected 
>> pages that can be placed wherever (not just in /forrest) on a 
>> webserver. (and detaches the solution from the webapp, which doesn't 
>> have this concern)
> 
> 
> Gee I didn't know they became absolute from relative, are you sure?
> Can you give me an example?
> 

nono, they're not...

but when we insert absolute links now (e.g. with the infamous 
tab-hack: /forrest/) they remain absolute in the end result 
(generated html); because of this we need to publish the generated 
html content under a /forrest URI on a webserver

therefore I propose to have a relativiser inside the cocoon CLI 
process... (maybe a command line switch to set it "on")
it should push as many "../" in front of all the /-leading links 
as there are path elements in the URI that caused the page to be 
generated.

like this, the process of generating off-line content immediately 
takes into account the fact that it could be placed anywhere...
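The rule can be sketched like this in Python (illustrative; the function 
name and the simple /-prefix test are assumptions, not actual CLI code):

```python
def relativize(href, page_uri):
    """Push one '../' per directory element of the generated page's URI
    in front of every /-leading link, as proposed above. Links that are
    already relative or external (http://...) pass through unchanged.
    Sketch only, not the real Cocoon CLI."""
    if not href.startswith("/"):
        return href
    depth = page_uri.count("/")  # directory levels above the page
    return "../" * depth + href.lstrip("/")

# a page generated as community/howto/index.html linking to /forrest/skin.css
print(relativize("/forrest/skin.css", "community/howto/index.html"))
```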

regards,
-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org



Re: [VOTE] Usage of file.hint.ext convention

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Marc Portier wrote:
> 
> Steven Noels wrote:
> 
>> Nicola Ken Barozzi wrote:
>>
>> <snip/>

>>> Users that want to write a generic document use that dtd.
>>> All other content that must be "skinned" by forrest must be 
>>> pregenerated by other tools to give that dtd.
>>
>> +1
>>
> ? we are not saying that two-step view means that the first step is done 
> with ant-style tasks, are we?

Also. Not only.

> "pre-generated" I also like to read as having a pipeline for it.

Yes, also this.

> as for:
> 
>>> We still have status.xml... etc files that get automatically 
>>> transformed to that format.
>>
>> Yep - and I like that.
> 
> 
> And we should allow for more of these, including user-defined ones.
> I mean: solve this in the URI design for our own future types, and make 
> it available as well.

Yes.

> <snip />
> 
>>
>>> The point is, can we use them in statically generated documentation?
>>>
>>> We cannot.  :-/
>>
>>
> 
> I know I made a similar statement on this earlier, however I've been 
> thinking along these lines:
> 
>>
>> I fail to see why, but I might be stubborn ;-)
>>
> 
> someone should maybe just check...
> I think the static generation process of cocoon would
> - in case it finds a link of the kind 
> href="addressing-part/name.rendering-type?hint=source-type"
> 
> - just save the resulting file under name.rendering-type
> - currently the serializer would leave the yucky ?hint=source-type in 
> the href, but the webserver we put it on would survive that anyway 
> (since it is not going to take the parameter into account for delivering 
> static content)
> 
> In future it would be nice if the CLI version would remove the ?-trailer 
> on non-http://-leading URIs, though.
> 
> by the way Nicola: the CLI should also be the one that "relativizes" all 
> the non-http://-leading hrefs in our generated html as well :-D
> that is the correct way to produce a bunch of relative interconnected 
> pages that can be placed wherever (not just in /forrest) on a 
> webserver. (and detaches the solution from the webapp, which doesn't have 
> this concern)

Gee I didn't know they became absolute from relative, are you sure?
Can you give me an example?

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [VOTE] Usage of file.hint.ext convention

Posted by Marc Portier <mp...@outerthought.org>.

Steven Noels wrote:
> Nicola Ken Barozzi wrote:
> 
> <snip/>
> 
>>> But it requires the docwriter to think about the management concern, 
>>> IMO.
>>
>>
>>
>> They still write mydoc.xml or mydoc.gif, no?
>> This isn't a management concern, no?
> 
> 
> No, that's the way silly OS'es bind editing apps to filetypes. On Unix 
> and Mac OS, this has been solved in a more robust way, IMHO 
> (/etc/mime-magic and Resource Forks).
> 

I see this more as a management contract to obey than a 
management concern to take up.
The content editor is expected to give his document a filename 
(incl. extension) by which the rest of the world can address it 
(pick it up later on).

>>> I'm still thinking about those content-aware pipelines, and for some 
>>> app we are developing, we actually have been using this technique 
>>> doing a XML Pull Parse on the document to check its root element - 
>>> here, we could check for its DTD identifier.
>>
>>
>>
>> It's neat, but a PITA for many users.
> 
> 
> Howcome? Using CAPs (content-aware pipelines), the system decides what 
> will be done with their XML files, depending on the editing grammar they 
> used.
> 

yep, this surely sounds easier from an end-user perspective;
however, it will need some config file somewhere that explains 
which xslt or pipeline to use based on the recognised content.

(also note that 'content-aware' is more than doctype-aware.)
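For illustration, a Python sketch of such a check plus a config mapping 
(the grammar-to-stylesheet table is entirely hypothetical; the real 
check could also use the DTD identifier, as Steven suggests):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical mapping from root element (the editing grammar) to the
# preparatory stylesheet; the names are illustrative, not real config.
GRAMMAR_TO_XSLT = {
    "document": None,        # already the intermediate format
    "book": "db2xdoc.xsl",   # e.g. DocBook needs a conversion step
}

def sniff_root(xml_text):
    """Return the root element name by reading only the first start tag,
    roughly what a pull parse of the document head gives you."""
    for _event, elem in ET.iterparse(io.StringIO(xml_text), events=("start",)):
        return elem.tag
    return None

root = sniff_root("<book><chapter/></book>")
print(root, GRAMMAR_TO_XSLT.get(root))
```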

>>> I'm vigourously opposing the idea of encoding meta-information twice 
>>> and in different places: inside the document, using its filename, and 
>>> in the request URI.
>>
>> Conceptually I agree, the hint is a "hack".
> 
> Yes.
> 

it is also a solution ;-)

<snip />

>> Users that want to write a generic document use that dtd.
>> All other content that must be "skinned" by forrest must be 
>> pregenerated by other tools to give that dtd.
> 
> 
> +1
> 
? we are not saying that two-step view means that the first step 
is done with ant-style tasks, are we?
"pre-generated" I also like to read as having a pipeline for it.
as for:

>> We still have status.xml... etc files that get automatically 
>> transformed to that format.
> 
> 
> Yep - and I like that.

And we should allow for more of these, including user-defined ones.
I mean: solve this in the URI design for our own future types, 
and make it available as well.

<snip />

> 
>> The point is, can we use them in statically generated documentation?
>>
>> We cannot.  :-/
> 

I know I made a similar statement on this earlier, however I've been 
thinking along these lines:

> 
> I fail to see why, but I might be stubborn ;-)
> 

someone should maybe just check...
I think the static generation process of cocoon would
- in case it finds a link of the kind 
href="addressing-part/name.rendering-type?hint=source-type"

- just save the resulting file under name.rendering-type
- currently the serializer would leave the yucky 
?hint=source-type in the href, but the webserver we put it on 
would survive that anyway (since it is not going to take the 
parameter into account for delivering static content)

In future it would be nice if the CLI version would remove the 
?-trailer on non-http://-leading URIs, though.
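A sketch of that clean-up step in Python (the function name is assumed; 
the real CLI would do this while serializing links):

```python
def strip_query(href):
    """Drop the '?hint=source-type' trailer from local hrefs when writing
    static content; external http:// links keep their query string.
    Illustrative sketch, not actual CLI code."""
    if href.startswith("http://"):
        return href
    return href.split("?", 1)[0]

print(strip_query("guide/name.html?hint=docbook"))
print(strip_query("http://example.org/page?x=1"))
```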

by the way Nicola: the CLI should also be the one that 
"relativizes" all the non-http://-leading hrefs in our generated 
html as well :-D
that is the correct way to produce a bunch of relative 
interconnected pages that can be placed wherever (not just in 
/forrest) on a webserver. (and detaches the solution from the 
webapp, which doesn't have this concern)

<snip/>

-marc
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org



Re: [VOTE] Usage of file.hint.ext convention

Posted by Steven Noels <st...@outerthought.org>.
Nicola Ken Barozzi wrote:

<snip/>

>> But it requires the docwriter to think about the management concern, IMO.
> 
> 
> They still write mydoc.xml or mydoc.gif, no?
> This isn't a management concern, no?

No, that's the way silly OS'es bind editing apps to filetypes. On Unix 
and Mac OS, this has been solved in a more robust way, IMHO 
(/etc/mime-magic and Resource Forks).

>> I'm still thinking about those content-aware pipelines, and for some 
>> app we are developing, we actually have been using this technique 
>> doing a XML Pull Parse on the document to check its root element - 
>> here, we could check for its DTD identifier.
> 
> 
> It's neat, but a PITA for many users.

How come? Using CAPs (content-aware pipelines), the system decides what 
will be done with their XML files, depending on the editing grammar they 
used.

>> I'm vigourously opposing the idea of encoding meta-information twice 
>> and in different places: inside the document, using its filename, and 
>> in the request URI.
> 
> 
> Conceptually I agree, the hint is a "hack".

Yes.

>> Consider this scenario:
>>
>> URI:
>>
>> http://somehost/documentnameA.html
>> http://somehost/documentnameB.pdf
>>
>>
>> source          step 1         |   step 2        step 3      step4
>>                                |
>> A.docv11.xml      -            |   web.xsl      (skin.xsl)   serialize
>> B.docbook.xml   db2xdoc.xsl    |   paper.xsl                 serialize
>>                                |
>>                                ^
>>                             logical
>>                               view
>>                            format [1]
>>
>>
>> There's two concepts that could help us here:
>>
>> 1) content-aware pipelines, as being articulated in some form in 
>> http://marc.theaimsgroup.com/?t=102767485200006&r=1&w=2 - the grammar 
>> of the XML source document as being passed across the pipeline will 
>> decide what extra preparatory transformation steps need to be done
> 
> 
> Ok.
> 
>> 2) views - simple Cocoon views instead of the current skinning system, 
>> which would oblige us to seriously think of an intermediate 'logical' 
>> page format that can be fed into a media-specific stylesheet (web, 
>> paper, mobile, searchindexes, TOC's etc) resulting in media-specific 
>> markup that can be augmented with a purely visual skinning transformation
> 
> 
> Man, that's what I've been advocating all along.

I know - it's just that we add hack after hack to get Forrest out of the 
door ASAP, which brings us further away from the silver bullet solution 
(knowing very well that those don't exist, only in the mind of their 
creators - but we should really try).

> I think that the document.dtd can be such a step.
> The switch to using XHTML for it is *exactly* this.

I resonate with you on some intermediate format, but am struggling myself 
with what format we should use. Remember XHTML still carries a lot of 
structure-typographic elements like tables which can be misused in 
various ways. So what selection of XHTML elements/atts should we use for 
that intermediate format then? And how will we support the tricks Bert 
has been applying to the DTD documentation pipelines to have a much 
better rendition for the element content model description? I don't have 
the answer right now, but Marc and I are teasing each other to come up 
with a definitive solution somewhere in time. Time is a bit limited now, 
unfortunately: I'm also readying the launch of cocoondev.org as I 
promised in 
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=102881527330596&w=2

> Users that want to write a generic document use that dtd.
> All other content that must be "skinned" by forrest must be pregenerated 
> by other tools to give that dtd.

+1

> We still have status.xml... etc files that get automatically transformed 
> to that format.

Yep - and I like that.

> I have been advocating the two step process since I started using Cocoon 
> (see also mails to the cocoon users for example), so I'm +10000 for it 
> being formalized :-D
> 
>> Views are currently specified using the cocoon-view request parameter, 
>> so maybe we could use the request-parameter Selector for that purpose:
>>
>>       <map:match pattern="**">
>>         <map:select type="request-parameter">
>>           <map:parameter name="parameter" value="cocoon-view"/>
>>           <map:when test="pdf">
>>             pdf pipeline acting on a 'logical page' view?
>>           </map:when>
>>           <map:when test="html"/>
>>         </map:select>
>>       </map:match>
>>
>> Or we could write some Action which uses the URI to specify the 
>> chosen view/rendition.
> 
> 
> *This* -1.
> 
> The hack of putting the intermediate step in the name is to make URI 
> space independent from the output space; you say that even that pollutes 
> the URI (I agree), and this is a step back.
> 
> The best thing would be to understand something about the client 
> automatically, but also a request parameter can be ok.

I was thinking along the lines of choosing a Cocoon view based on the 
request environment *and* CAPs, but I need to have a serious whiteboard 
session on this - maybe it is time to host that Forrest hackaton over 
here RSN.

> The point is, can we use them in statically generated documentation?
> 
> We cannot.  :-/

I fail to see why, but I might be stubborn ;-)

> So we simply should say that the output format is given by the filename, 
> but this is the output, not the input, and this brings us back to the 
> problem that writers should concentrate on the input, and use that for 
> the links to have view independence.
> 
> See, browser technology constrains us :-/
> 
>> I know all this is bringing us to a slowdown, but I couldn't care less: I 
>> feel we are deviating from best practices in favor of quick wins.
>>
>> Caveat: I haven't spent enough time thinking and discussing this, and 
>> perhaps I have different interests (pet peeves) than others on the list.
> 
> 
> What you propose is the best route, but we need to be faster.

Why? ;-)

> Ok, let's go into it.
> 
> 1) have two step process standard +1
> 2) switch documentdtd to be the intermediate format and become akin to 
> XHTML2 as in previous mails +1

I need to revise documentv11, as I promised already (too) many times. 
Can anyone drop the sky on me?

> 3) use content-aware pipelines - see below
> 4) link the sources, not the results.

+1

> This is cool but what gets generated when I have
>  file.xml -> file.html
>  file.html -> file.html
> Both in the same dir?

Exactly what Marc just told me - I suggested going for a 
ResourceExistAction there - would that help?

> If I link to file.xml, I get the link translated to file.html, but then 
> what file do I get to see?
> 
> This is the reason why we need a 1-1 relationship.
> 
> Now to explain the why of the double ext (again):
> 
> We have file.xml
> 
> - user must link using the same filename
> 
>  link href="file.xml"
> 
> - the browser needs the filename with the resulting extension:
> 
>  file.html

Filename generation is part of the crawler, and we can patch/configure 
that, if we want.

> - the system needs to have unique names
> 
> So this brings us *necessarily* to having both xml and html included in 
> the extension.
> xml for uniqueness, html for the browser.
> 
> Or maybe use the double extension only for clashing names, but then how 
> can the user ask for a pdf to be generated if there is only the .xml 
> extension?
> 
> You say it shouldn't know, because part of the view?
> Go tell the users.
> And how can they do it without breaking the uri?
> 
> Ha.

Duh ;-)

Will think about that some more, at least we have the discussion going 
again :-)

Cheers,

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org


Re: Link-Addressing, and breaking up the sitemap

Posted by Marc Portier <mp...@outerthought.org>.

Jeff Turner wrote:
> On Fri, Sep 06, 2002 at 04:03:34PM +0200, Marc Portier wrote:
> ... 
> 
>>>Okay. What happens to the HTML that contained the link? We need to
>>>rewrite the link to point to the 'location' attribute. As the link is
>>>part of the contents, which is rendered by the sitemap, we need an XSLT
>>>or something that rewrites links:
>>>
>>><xsl:template match="link">
>>>  <link>
>>>    <xsl:attribute name="href">
>>>      <xsl:if test="@href = 'doc'">
>>>        <xsl:value-of select="./src/documentation/content/xdocs"/>
>>>      </xsl:if>
>>>      <xsl:if test="@href = 'mail'">
>>>        <xsl:value-of select="..."/>
>>>      </xsl:if>
>>>      ...
>>>    </xsl:attribute>
>>>  </link>
>>></xsl:template>
>>>
>>>Suitably generalized, of course.
>>
>>mmm, the idea would be that you write your documents in such a 
>>way that they use a common ref-prefix '/bar' for say docs 
>>generated by foo...
>>
>>then you mention
>><part name="bar" location="./build/where-foo-dropped-it" />
>>
>>and tzadaam they get merged in
> 
> 
> That's the goal..
> 
> 
>>it is of course a bit naive?
>>in my first attempt at describing this I also thought about 
>>reference-aliases for the case where you can say
>><part name="bar" location="./build/where-foo-dropped-it" >
>>   <ref-alias name="old-bar" />
>>   <ref-alias name="bar2" />
>></part>
>>
>>this would mean that there are still documents that would use the 
>>old or alternative reference-prefix /old-bar resp /bar2 when they 
>>should be using /bar
> 
> 
> Don't see the need for extra prefixes. URIs are meant to uniquely
> identify the resource.
> 

yep, main reason why I dropped it later on...

> 
>>for those I think some tuned transformer could be created
>>suggestions:
>>- SAX filtering on anything that looks like a link-attribute?
> 
> 
> Make LinkSerializer configurable?
> 

surely a little babe I'd like to have a look into... pointers/tips?

also some elaboration on how you'd see the configuration look 
and how it would modify behaviour?

(even if the new line of thinking is not to have multiple 
prefixes aliasing the same kind of thing, it still requires us 
to start thinking on this, no?)

> 
>>- having as config a small file that is generated from the 
>>original one, (or is just the original one where it is 
>>xpath-working out what it needs)
>>
>>
>>still, a 2nd line kind of feature if you ask me
>>
>>
>>>Minor issue: if we're rewriting links, why bother copying the javadocs
>>>inside the Cocoon context? We could just prepend '../javadocs', tell
>>>Cocoon to ignore those links, and keep Javadocs outside. No need for Ant.
>>>
>>
>>I feared that the javadoc example would lead to this....
>>I was supposing that maybe one would want to e.g. start from 
>>xml-javadocs ore something like that?
> 
> 
> Then /xjdoc links get pointed to a different place than /jdoc.
> 
> Btw, how do you feel about a pseudo-protocol notation,
> 'javadoc:someClass'. Since we're going to need link rewriting anyway,
> might as well not confuse users by suggesting that a real '/jdoc'
> directory should exist.
> 

indeed... I like it a lot.
only constraining thing is that treating a 'part of content' as 
a 'protocol' might be confusing as well?

maybe URN notations with their namespace-parts are then the ideal 
go-between? Never done it actually but would we then get 
something like urn:javadoc:className ?

just thinking out loud here...

> 
>>and there is other content to consider, no?
> 
> 
> Link rewriting is useful all over the place. I've got links like: <link
> href="elem:httpRequest">, which means "link to the section documenting
> the httpRequest element". A stylesheet then replaces that with <link
> href="httpRequest.html">. Alternatively, if producing a PDF, it gets
> converted to a <fo:basic-link>.
> 
> Given a suitably smart link rewriter, and some support in the crawler,
> I think we can have links that are genuinely independent of the
> output format and filesystem.
> 

japjap, I hear you...

> 
>>>So if I'm not mistaken, the whole thing boils down to one link-rewriting
>>>stylesheet.
>>>
>>>I'll try implementing it on my own project now.
>>>
>>
>>let us know where it's leading you...
> 
> 
> Javadoc links are working, eg: <link href="javadoc:org.apache.foo.Bar">
> Pretty trivial to implement. With one stylesheet modification I can link
> to either the online javadocs or locally produced javadocs.
> 
yes

> I don't know how "deep" links would work though. Right now I've got an
> xdoc like this:
> 
> <document>
>   <body>
>     <section><title>Anteater tags</title>
>     ...
>       <section><title>httpRequest</title>
>       ...
>       </section>
>     </section>
>   </body>
> </document>
> 
> And I'm trying to figure out how to link to the 'httpRequest' section
> from another page. Probably an id attribute, which gets rendered in HTML
> as <a name=".."> would work.
> 

sounds like DTD-votes and xslt work... on the other hand:

this talk about link-rewrites makes me dream about 
fragment-identifiers with xpath/xpointer expressions...

would be great feature if the link-rewriter could also just 
rewrite the expression into the matching fragment-identifier the 
style-sheet is now throwing in for the inline-document-local TOC?



> --Jeff
> 

this feels like we're getting somewhere. no?

-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org


Re: Link-Addressing, and breaking up the sitemap

Posted by Jeff Turner <je...@apache.org>.
On Fri, Sep 06, 2002 at 04:03:34PM +0200, Marc Portier wrote:
... 
> >
> >Okay. What happens to the HTML that contained the link? We need to
> >rewrite the link to point to the 'location' attribute. As the link is
> >part of the contents, which is rendered by the sitemap, we need an XSLT
> >or something that rewrites links:
> >
> > <xsl:template match="link">
> >   <link>
> >     <xsl:attribute name="href">
> >       <xsl:if test="@href = 'doc'">
> >         <xsl:value-of select="./src/documentation/content/xdocs"/>
> >       </xsl:if>
> >       <xsl:if test="@href = 'mail'">
> >         <xsl:value-of select="..."/>
> >       </xsl:if>
> >       ...
> >     </xsl:attribute>
> >   </link>
> > </xsl:template>
> >
> > Suitably generalized, of course.
> 
> mmm, the idea would be that you write your documents in such a 
> way that they use a common ref-prefix '/bar' for say docs 
> generated by foo...
> 
> then you mention
> <part name="bar" location="./build/where-foo-dropped-it" />
> 
> and tzadaam they get merged in

That's the goal..

> it is of course a bit naive?
> in my first attempt at describing this I also thought about 
> reference-aliases for the case where you can say
> <part name="bar" location="./build/where-foo-dropped-it" >
>    <ref-alias name="old-bar" />
>    <ref-alias name="bar2" />
> </part>
> 
> this would mean that there are still documents that would use the 
> old or alternative reference-prefix /old-bar resp /bar2 when they 
> should be using /bar

Don't see the need for extra prefixes. URIs are meant to uniquely
identify the resource.

> for those I think some tuned transformer could be created
> suggestions:
> - SAX filtering on anything that looks like a link-attribute?

Make LinkSerializer configurable?

> - having as config a small file that is generated from the 
> original one, (or is just the original one where it is 
> xpath-working out what it needs)
> 
> 
> still, a 2nd line kind of feature if you ask me
> 
> >
> >Minor issue: if we're rewriting links, why bother copying the javadocs
> >inside the Cocoon context? We could just prepend '../javadocs', tell
> >Cocoon to ignore those links, and keep Javadocs outside. No need for Ant.
> >
> I feared that the javadoc example would lead to this....
> I was supposing that maybe one would want to e.g. start from 
> xml-javadocs or something like that?

Then /xjdoc links get pointed to a different place than /jdoc.

Btw, how do you feel about a pseudo-protocol notation,
'javadoc:someClass'. Since we're going to need link rewriting anyway,
might as well not confuse users by suggesting that a real '/jdoc'
directory should exist.

> and there is other content to consider, no?

Link rewriting is useful all over the place. I've got links like: <link
href="elem:httpRequest">, which means "link to the section documenting
the httpRequest element". A stylesheet then replaces that with <link
href="httpRequest.html">. Alternatively, if producing a PDF, it gets
converted to a <fo:basic-link>.

Given a suitably smart link rewriter, and some support in the crawler,
I think we can have links that are genuinely independent of the
output format and filesystem.
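
The pseudo-protocol idea can be sketched in a few lines (Python, purely illustrative - in Forrest this would live in a transformer or stylesheet, and the `api/` javadoc base path is an assumption):

```python
# Illustrative sketch of pseudo-protocol link rewriting; the 'api/'
# javadoc base and the .html suffix are assumptions, not Forrest's code.

JAVADOC_BASE = "api/"  # hypothetical: where locally produced javadocs land

def rewrite_href(href: str) -> str:
    """Rewrite pseudo-protocol hrefs into concrete output-format hrefs."""
    if href.startswith("javadoc:"):
        # javadoc:org.apache.foo.Bar -> api/org/apache/foo/Bar.html
        cls = href[len("javadoc:"):]
        return JAVADOC_BASE + cls.replace(".", "/") + ".html"
    if href.startswith("elem:"):
        # elem:httpRequest -> httpRequest.html (page documenting the element)
        return href[len("elem:"):] + ".html"
    return href  # ordinary links pass through untouched

print(rewrite_href("javadoc:org.apache.foo.Bar"))  # api/org/apache/foo/Bar.html
```

Switching between online and locally produced javadocs then amounts to changing JAVADOC_BASE, while ordinary links pass through untouched, so the rewriter is safe to run over every href.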

> >So if I'm not mistaken, the whole thing boils down to one link-rewriting
> >stylesheet.
> >
> >I'll try implementing it on my own project now.
> >
> 
> let us know where it's leading you...

Javadoc links are working, eg: <link href="javadoc:org.apache.foo.Bar">
Pretty trivial to implement. With one stylesheet modification I can link
to either the online javadocs or locally produced javadocs.

I don't know how "deep" links would work though. Right now I've got an
xdoc like this:

<document>
  <body>
    <section><title>Anteater tags</title>
    ...
      <section><title>httpRequest</title>
      ...
      </section>
    </section>
  </body>
</document>

And I'm trying to figure out how to link to the 'httpRequest' section
from another page. Probably an id attribute, which gets rendered in HTML
as <a name=".."> would work.
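
That id-to-anchor idea can be sketched like this (Python with stdlib ElementTree, illustrative only - in Forrest it would be a template in the document2html stylesheet; the id values are hypothetical):

```python
import xml.etree.ElementTree as ET

# The nested-section xdoc from above, with hypothetical id attributes added.
XDOC = """
<document><body>
  <section id="anteater-tags"><title>Anteater tags</title>
    <section id="httpRequest"><title>httpRequest</title></section>
  </section>
</body></document>
"""

def anchors(xml_text: str) -> list:
    """Collect (id, html-anchor) pairs for every section carrying an id."""
    root = ET.fromstring(xml_text)
    return [(sec.get("id"), '<a name="%s"></a>' % sec.get("id"))
            for sec in root.iter("section") if sec.get("id")]

print(anchors(XDOC)[1])  # ('httpRequest', '<a name="httpRequest"></a>')
```

Another page could then deep-link to the nested section as page.html#httpRequest.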

--Jeff

[ot] xanadu,.... (Re: Link-Addressing, and breaking up the sitemap)

Posted by Marc Portier <mp...@outerthought.org>.
> Xanadu:
>   Xanadu is the largest vaporware project in the history of computing.
>   Conceived by Ted Nelson, the project has been delayed and suffered
     ^^^^^^^^^
you have got to be kidding,
this was high LSD time, man
he purely 'envisioned' IIRC, nothing less, please.

>   setbacks since the 60's. It has been overshadowed by the world wide
>   web, a much more primitive (but working) implementation.
> 
> Lojban:
>    Lojban is a carefully constructed spoken language designed in the hope
>    of removing a large portion of the ambiguity from human communication.

anyone that knows the lojban word for 'ambiguity'?

-marc=


Re: Link-Addressing, and breaking up the sitemap

Posted by Jeff Turner <je...@apache.org>.
On Thu, Sep 05, 2002 at 04:32:29PM +0200, Steven Noels wrote:
> Jeff Turner wrote:
> 
> > "Ban All Extensions!! Ban All Extensions!! Ban All Argghhh..
> > getyourhandsoffme, peasant. I'm a knight of the W3C Ivory Tower.."
> 
> Cool! Where can I buy that Member Card? :-D
> 
> BTW, do I get a reduction if I have an installed copy of Udanax on my 
> system? Maybe we should all start discussing General Enfilade Theory and 
> The Ent for that matter!

Steven, you are a sick puppy. Life membership just for knowing about such
scary stuff. Mailing list conversations are to be held in Lojban, English
being for the unwashed masses.


--Jeff

PS: to save a lot of lost productivity:

udanax:
  An attempt in 1999 by Theodor Holm Nelson and the current Xanadu crew
  to resurrect Xanadu by making it open source and putting it on the web.

Xanadu:
  Xanadu is the largest vaporware project in the history of computing.
  Conceived by Ted Nelson, the project has been delayed and suffered
  setbacks since the 60's. It has been overshadowed by the world wide
  web, a much more primitive (but working) implementation.

Lojban:
   Lojban is a carefully constructed spoken language designed in the hope
   of removing a large portion of the ambiguity from human communication.


> Steven Noels                            http://outerthought.org/

Re: Link-Addressing, and breaking up the sitemap

Posted by Marc Portier <mp...@outerthought.org>.

Jeff Turner wrote:
> On Thu, Sep 05, 2002 at 05:36:58PM +0200, Marc Portier wrote:
> 
>>>+1. Just like how JSP tags all treat '/'-leading paths as relative to the
>>>servlet context, not the server root.
>>>
>>
>>yep, glad you like the idea more then the word :-)
> 
> 
> Though it later occurred to me that the sitemap renders the document, so
> it must be responsible for 'relativising' all links contained in it, not
> the crawler.
> 
that was my first idea: some transformer, but it would be active 
also in the webapp case (it wouldn't hurt, except for the elapsed 
time)

> It's all so confusing. I propose we wait until physicists come up with a
> Theory of Everything, and then we work backwards from there to discover
> the Theory of Forrest Linking.
> 
in this case, the physicist is called Nicola, if he thinks it 
would be feasible for the crawler to do this (or someone else 
under his guidance) we could just do it, no?

> 
>>>It would be best to define the goals first:
>>>
>>>- Users need to be able to customize the sitemap with their own
>>>  matchers, for whatever crazy reasons they want.
>>
>>maybe we should find reasons just to make sure this _is_ a valid 
>>goal, AND to make sure that those 'crazy' reasons will be 
>>satisfied by allowing snippets of sitemaps?
> 
> 
> Good point.
> 
> 
>>>- In addition, we'd like to provide quick'n simple ways of doing routine
>>>  customizations, like specifying javadoc prefixes, and adding pipelines
>>>  for new document types (docbook, say). Ie, a siteplan.
>>
>>adding pipelines for the document types should be covered by the 
>>CAPs and the previous discussion no?
>>
>>mmm, provided the end-user can add to that of course, but again: 
>>maybe the CAP could be told in a different way then the sitemap 
>>about possibly new stuff?
>>
>>mmm, lets look at how the CAP turns out first, then we can 
>>discuss on letting it know about newer stuff
> 
> 
> Okay. You're probably right, that the CAP can be configured externally to
> the sitemap. From Steven's mail, I imagine that CAPs would work as
> follows:
> 
> <map:match pattern="*.html">
>   <map:act type="CAPAction">
>     <map:parameter name="config" value="doctypes.properties"/>
>     <map:generate src="{1}.xml"/>
>     <map:transform src="stylesheets/{doctype}2docv11.xsl"/>
>     <map:transform src="stylesheets/document2html.xsl"/>
>     <map:serialize/>
>   </map:act>
> </map:match>
> 
> Where 'doctypes.properties' contains doctype <-> public id mappings:
> 
> docbook=-//OASIS//DTD DocBook XML V4.1.2//EN
> docv11=-//APACHE//DTD Documentation V1.1//EN
> docv10=-//APACHE//DTD Documentation V1.0//EN
> faq=-//APACHE//DTD FAQ V1.1//EN
> 
> 

yep, I hear he had Bruno cook something up... now on his hard 
drive, let us hope for a check-in soon

> 
>>>- Forrest's sitemap needs to be modularized, so users can choose just
>>>  the functionality they need. If they don't want svg2png, don't include
>>>  Batik. If they don't want PDFs, don't include FOP. 
>>>
>>
>>isn't this the kind of challenge that asks cocoon to be 
>>modularized first?
> 
> 
> I don't know. Would having Cocoon blocks give us a more modular sitemap,
> any more than <map:mount> gives us?
> 

beats me, it just sounded like fop and batik blocks to me :-S

> 
>>could be me, but it is probably not the first goal of forrest to 
>>do that cocoon-work?
>>
>>
>>>To meet these goals, I'd like to chop the sitemap into functional
>>>sections:
>>>
>>>- Straight *.xml to *.html
>>>- Site statistics reporting (apachestats)
>>>- todo generation
>>>- changelog generation
>>>- faq
>>>- 'community' section with feedback
>>>- doclist
>>>- DTD documentation
>>>- PDFs-of-every-page
>>>
>>>Each of these is in a sitemap 'snippet'. Users can add project-specific
>>>functionality by adding new snippets.
>>>
>>
>>mmm, most of the ones you mention are about types and the 
>>pipeline to get them through the 2-step-view rendition towards 
>>pdf, html, whadayanameit... so that is the stuff we covered with 
>>the CAPS & hints discussion, no?
> 
> 
> Yes, true.
> 
> 
>>the new thing I want to address is how to find and 
>>cross-reference these documents...
> 
> 
> Isn't the linking issue is completely separate from the issue of how to
> augment, modularize and customize the sitemap?
> 
> Oh, there it is; "Link Addressing" in the subject. I dragged this thread
> waaay off course :P Sorry..
> 

focus my man, and less mind-expanding chemicals :-)

> <snip topic=linking>
> 
>>3. we could possibly aid more...
>>- in all cases the enduser still needs the ant tasks (foreign 
>>processes) to generate the javadoc (or other stuff)
>>- he knows where they are relative to his project.home
>>- we let him tell forrest (1) where that is, (2) how other 
>>documents are referencing the root of this stuff...
> 
> 
> Good.
> 
> 
>>my current guess would be to do that with the XML snippet I 
>>proposed... (but possibly you all feel that letting him write the 
>> mingle-into-build/documentation/ ant is easier?)
> 
> 
> Ignore all my stuff about merging snippets.. it's irrelevant to this
> links topic. Anyway, so we have this XML snippet:
> 
>  <content>
>    <part name="doc"
>          location="./src/documentation/content/xdocs"/>
>    <part name="mail"
>          location="..." />
>    <part name="jdoc"
>          location="..." />
>  </content>
> 
> Now the question is how to process it.
> 

given the bot hack this could see a quick prototype that just 
produces a piece of ant script through xslt to be included...

in the long run it's maybe better to actually make it an ant-task

> 
>>then let some smart ant-task (part of the forrest activity) read 
>>that file, copy the described stuff over into the cocoon context 
>>dir (we stay in charge of location and organization) where the 
>>CAP-hints pipeline deals with it as soon as the webapp or crawler 
>>asks for it
> 
> 
> Okay. What happens to the HTML that contained the link? We need to
> rewrite the link to point to the 'location' attribute. As the link is
> part of the contents, which is rendered by the sitemap, we need an XSLT
> or something that rewrites links:
 >
 > <xsl:template match="link">
 >   <link>
 >     <xsl:attribute name="href">
 >       <xsl:if test="@href = 'doc'">
 >         <xsl:value-of select="./src/documentation/content/xdocs"/>
 >       </xsl:if>
 >       <xsl:if test="@href = 'mail'">
 >         <xsl:value-of select="..."/>
 >       </xsl:if>
 >       ...
 >     </xsl:attribute>
 >   </link>
 > </xsl:template>
 >
 > Suitably generalized, of course.

mmm, the idea would be that you write your documents in such a 
way that they use a common ref-prefix '/bar' for say docs 
generated by foo...

then you mention
<part name="bar" location="./build/where-foo-dropped-it" />

and tzadaam they get merged in


it is of course a bit naive?
in my first attempt at describing this I also thought about 
reference-aliases for the case where you can say
<part name="bar" location="./build/where-foo-dropped-it" >
    <ref-alias name="old-bar" />
    <ref-alias name="bar2" />
</part>

this would mean that there are still documents that would use the 
old or alternative reference-prefix /old-bar resp /bar2 when they 
should be using /bar

for those I think some tuned transformer could be created
suggestions:
- SAX filtering on anything that looks like a link-attribute?
- having as config a small file that is generated from the 
original one, (or is just the original one where it is 
xpath-working out what it needs)


still, a 2nd line kind of feature if you ask me

> 
> Minor issue: if we're rewriting links, why bother copying the javadocs
> inside the Cocoon context? We could just prepend '../javadocs', tell
> Cocoon to ignore those links, and keep Javadocs outside. No need for Ant.
> 
I feared that the javadoc example would lead to this....
I was supposing that maybe one would want to e.g. start from 
xml-javadocs or something like that?

and there is other content to consider, no?

but you _are_ right: we could take option 1: ignore that there is 
anything else

> So if I'm not mistaken, the whole thing boils down to one link-rewriting
> stylesheet.
> 
> I'll try implementing it on my own project now.
> 

let us know where it's leading you...

> 
> --Jeff
> 
> <snip stuff about merging sitemap snippets as it's not relevant>
> 
>>-- 
>>Marc Portier                            http://outerthought.org/
>>Outerthought - Open Source, Java & XML Competence Support Center
>>mpo@outerthought.org                              mpo@apache.org
>>
> 
> 

-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org


Re: Link-Addressing, and breaking up the sitemap

Posted by Jeff Turner <je...@apache.org>.
On Thu, Sep 05, 2002 at 05:36:58PM +0200, Marc Portier wrote:
> >
> >+1. Just like how JSP tags all treat '/'-leading paths as relative to the
> >servlet context, not the server root.
> >
> yep, glad you like the idea more than the word :-)

Though it later occurred to me that the sitemap renders the document, so
it must be responsible for 'relativising' all links contained in it, not
the crawler.

It's all so confusing. I propose we wait until physicists come up with a
Theory of Everything, and then we work backwards from there to discover
the Theory of Forrest Linking.

> >It would be best to define the goals first:
> >
> > - Users need to be able to customize the sitemap with their own
> >   matchers, for whatever crazy reasons they want.
> maybe we should find reasons just to make sure this _is_ a valid 
> goal, AND to make sure that those 'crazy' reasons will be 
> satisfied by allowing snippets of sitemaps?

Good point.

> > - In addition, we'd like to provide quick'n simple ways of doing routine
> >   customizations, like specifying javadoc prefixes, and adding pipelines
> >   for new document types (docbook, say). Ie, a siteplan.
> adding pipelines for the document types should be covered by the 
> CAPs and the previous discussion no?
> 
> mmm, provided the end-user can add to that of course, but again: 
> maybe the CAP could be told in a different way than the sitemap 
> about possibly new stuff?
> 
> mmm, lets look at how the CAP turns out first, then we can 
> discuss on letting it know about newer stuff

Okay. You're probably right, that the CAP can be configured externally to
the sitemap. From Steven's mail, I imagine that CAPs would work as
follows:

<map:match pattern="*.html">
  <map:act type="CAPAction">
    <map:parameter name="config" value="doctypes.properties"/>
    <map:generate src="{1}.xml"/>
    <map:transform src="stylesheets/{doctype}2docv11.xsl"/>
    <map:transform src="stylesheets/document2html.xsl"/>
    <map:serialize/>
  </map:act>
</map:match>

Where 'doctypes.properties' contains doctype <-> public id mappings:

docbook=-//OASIS//DTD DocBook XML V4.1.2//EN
docv11=-//APACHE//DTD Documentation V1.1//EN
docv10=-//APACHE//DTD Documentation V1.0//EN
faq=-//APACHE//DTD FAQ V1.1//EN
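
The lookup behind that action can be sketched as follows (Python, illustrative only - the real CAPAction would be a Cocoon action; the mapping values come from the properties above, the function name is invented):

```python
# Illustrative sketch of the doctype lookup a CAPAction would perform;
# the mapping values come from doctypes.properties, the rest is assumed.

DOCTYPES = {
    "docbook": "-//OASIS//DTD DocBook XML V4.1.2//EN",
    "docv11": "-//APACHE//DTD Documentation V1.1//EN",
    "docv10": "-//APACHE//DTD Documentation V1.0//EN",
    "faq": "-//APACHE//DTD FAQ V1.1//EN",
}

# Invert the table: a parsed document's public id maps back to a doctype name.
PUBLIC_ID_TO_DOCTYPE = {pid: name for name, pid in DOCTYPES.items()}

def stylesheet_for(public_id: str) -> str:
    """Resolve the '{doctype}2docv11.xsl' stylesheet for a public id."""
    doctype = PUBLIC_ID_TO_DOCTYPE[public_id]  # KeyError = unknown doctype
    return "stylesheets/%s2docv11.xsl" % doctype

print(stylesheet_for("-//OASIS//DTD DocBook XML V4.1.2//EN"))
# stylesheets/docbook2docv11.xsl
```

The {doctype} substitution in the pipeline above then reduces to this one dictionary lookup.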


> > - Forrest's sitemap needs to be modularized, so users can choose just
> >   the functionality they need. If they don't want svg2png, don't include
> >   Batik. If they don't want PDFs, don't include FOP. 
> >
> isn't this the kind of challenge that asks cocoon to be 
> modularized first?

I don't know. Would having Cocoon blocks give us a more modular sitemap,
any more than <map:mount> gives us?

> could be me, but it is probably not the first goal of forrest to 
> do that cocoon-work?
> 
> >
> >To meet these goals, I'd like to chop the sitemap into functional
> >sections:
> >
> > - Straight *.xml to *.html
> > - Site statistics reporting (apachestats)
> > - todo generation
> > - changelog generation
> > - faq
> > - 'community' section with feedback
> > - doclist
> > - DTD documentation
> > - PDFs-of-every-page
> >
> >Each of these is in a sitemap 'snippet'. Users can add project-specific
> >functionality by adding new snippets.
> >
> mmm, most of the ones you mention are about types and the 
> pipeline to get them through the 2-step-view rendition towards 
> pdf, html, whadayanameit... so that is the stuff we covered with 
> the CAPS & hints discussion, no?

Yes, true.

> the new thing I want to address is how to find and 
> cross-reference these documents...

Isn't the linking issue completely separate from the issue of how to
augment, modularize and customize the sitemap?

Oh, there it is; "Link Addressing" in the subject. I dragged this thread
waaay off course :P Sorry..

<snip topic=linking>

> 3. we could possibly aid more...
> - in all cases the enduser still needs the ant tasks (foreign 
> processes) to generate the javadoc (or other stuff)
> - he knows where they are relative to his project.home
> - we let him tell forrest (1) where that is, (2) how other 
> documents are referencing the root of this stuff...

Good.

> my current guess would be to do that with the XML snippet I 
> proposed... (but possibly you all feel that letting him write the 
>  mingle-into-build/documentation/ ant is easier?)

Ignore all my stuff about merging snippets.. it's irrelevant to this
links topic. Anyway, so we have this XML snippet:

 <content>
   <part name="doc"
         location="./src/documentation/content/xdocs"/>
   <part name="mail"
         location="..." />
   <part name="jdoc"
         location="..." />
 </content>

Now the question is how to process it.
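
As a sketch, reading the snippet into a name -> location table is straightforward (Python stdlib, illustrative; the 'mail' and 'jdoc' locations below are invented stand-ins for the elided '...'):

```python
import xml.etree.ElementTree as ET

# Hypothetical filled-in version of the snippet; the 'mail' and 'jdoc'
# locations are invented, only the 'doc' one appears in the mail.
SNIPPET = """
<content>
  <part name="doc" location="./src/documentation/content/xdocs"/>
  <part name="mail" location="./build/mail-archives"/>
  <part name="jdoc" location="./build/javadocs"/>
</content>
"""

def read_parts(xml_text: str) -> dict:
    """Map each part name to the location its content comes from."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("location") for p in root.findall("part")}

parts = read_parts(SNIPPET)
print(parts["doc"])  # ./src/documentation/content/xdocs
```

A link rewriter or a copy step could then consult this table to resolve '/jdoc/...'-style references.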

> then let some smart ant-task (part of the forrest activity) read 
> that file, copy the described stuff over into the cocoon context 
> dir (we stay in charge of location and organization) where the 
> CAP-hints pipeline deals with it as soon as the webapp or crawler 
> asks for it

Okay. What happens to the HTML that contained the link? We need to
rewrite the link to point to the 'location' attribute. As the link is
part of the contents, which is rendered by the sitemap, we need an XSLT
or something that rewrites links:

<xsl:template match="link">
  <link>
    <xsl:attribute name="href">
      <xsl:if test="@href = 'doc'">
        <xsl:value-of select="./src/documentation/content/xdocs"/>
      </xsl:if>
      <xsl:if test="@href = 'mail'">
        <xsl:value-of select="..."/>
      </xsl:if>
      ...
    </xsl:attribute>
  </link>
</xsl:template>

Suitably generalized, of course.

Minor issue: if we're rewriting links, why bother copying the javadocs
inside the Cocoon context? We could just prepend '../javadocs', tell
Cocoon to ignore those links, and keep Javadocs outside. No need for Ant.

So if I'm not mistaken, the whole thing boils down to one link-rewriting
stylesheet.

I'll try implementing it on my own project now.


--Jeff

<snip stuff about merging sitemap snippets as it's not relevant>
> -- 
> Marc Portier                            http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> mpo@outerthought.org                              mpo@apache.org
> 

-- 
Hell is a state of mind. And every state of mind, left to itself,
every shutting up of the creature within the dungeon of its own
mind -- is, in the end, Hell.
  C.S. Lewis, _The Great Divorce_

Re: Link-Addressing, and breaking up the sitemap

Posted by Steven Noels <st...@outerthought.org>.
Jeff Turner wrote:

 > "Ban All Extensions!! Ban All Extensions!! Ban All Argghhh..
 > getyourhandsoffme, peasant. I'm a knight of the W3C Ivory Tower.."

Cool! Where can I buy that Member Card? :-D

BTW, do I get a reduction if I have an installed copy of Udanax on my 
system? Maybe we should all start discussing General Enfilade Theory and 
The Ent for that matter!

 > Problem solved :) Technically, all that is required is a way of
 > ordering matcher patterns, and then the Ant task is easy. Then we ask
 > Sylvain if he'll kindly write us another sitemap processor that
 > ignores matcher order, and it's beer and skittles from there on.

I knew we would start rewriting Cocoon after all :-)

Just kidding - must read this first.

Cheers,

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org


Re: Link-Addressing, and breaking up the sitemap

Posted by Marc Portier <mp...@outerthought.org>.

[conceptual on the idea of/resentment about having the sitemap generated]
> 
> Yes,  trying to 'hide' the sitemap could just introduce a layer of
> complexity and get in the way.
> 

although one of my arguments pro-siteplan would have been that 
it could be _easier_ than the cocoon sitemap (reduce 
complexity, rather than add a layer that tries to enable 
everything in different wording...)

given the fact that it would not try to offer everything of the 
underlying cocoon (but as stated before: people knowing it is 
cocoon will not take that)



[practical on the idea of 'relativising' the 'staticalized' html]
>>whadayathink?
>>using the CLI is about "staticalizing" your content.  The webapp 
> 
> 
> That's almost as bad as 'prextensination'! ;)
> 
that was Steven's word for it (although my idea)

> 
>>does not have this concern, so only the CLI needs to be taking up 
>>this concern to 'relativise' all the /-leading references in the 
>>produced static content.
> 
> 
> +1. Just like how JSP tags all treat '/'-leading paths as relative to the
> servlet context, not the server root.
> 
yep, glad you like the idea more than the word :-)

<snip on tabs and default pages />
<snip on snippets of sitemap />

[theoretically on goals]
>>sounds good, but needs some elaboration though...
> 
> It would be best to define the goals first:
> 
>  - Users need to be able to customize the sitemap with their own
>    matchers, for whatever crazy reasons they want.
maybe we should find reasons just to make sure this _is_ a valid 
goal, AND to make sure that those 'crazy' reasons will be 
satisfied by allowing snippets of sitemaps?

> 
>  - In addition, we'd like to provide quick'n simple ways of doing routine
>    customizations, like specifying javadoc prefixes, and adding pipelines
>    for new document types (docbook, say). Ie, a siteplan.
adding pipelines for the document types should be covered by the 
CAPs and the previous discussion no?

mmm, provided the end-user can add to that of course, but again: 
maybe the CAP could be told in a different way than the sitemap 
about possibly new stuff?

mmm, lets look at how the CAP turns out first, then we can 
discuss on letting it know about newer stuff

>  - Forrest's sitemap needs to be modularized, so users can choose just
>    the functionality they need. If they don't want svg2png, don't include
>    Batik. If they don't want PDFs, don't include FOP. 
> 
isn't this the kind of challenge that asks cocoon to be 
modularized first?
could be me, but it is probably not the first goal of forrest to 
do that cocoon-work?

> 
> To meet these goals, I'd like to chop the sitemap into functional
> sections:
> 
>  - Straight *.xml to *.html
>  - Site statistics reporting (apachestats)
>  - todo generation
>  - changelog generation
>  - faq
>  - 'community' section with feedback
>  - doclist
>  - DTD documentation
>  - PDFs-of-every-page
> 
> Each of these is in a sitemap 'snippet'. Users can add project-specific
> functionality by adding new snippets.
> 
mmm, most of the ones you mention are about types and the 
pipeline to get them through the 2-step-view rendition towards 
pdf, html, whadayanameit... so that is the stuff we covered with 
the CAPS & hints discussion, no?

the new thing I want to address is how to find and 
cross-reference these documents...


[stubborn on the issue at hand]
let me try to restate my original posting on this:

Suppose...
I produce an xdoc page that explains how to contribute to my 
project X... so naturally I think about making a link to the 
javadoc of the central class of my API (something like a service 
facade, imagine)... what do I write down in the <link href=""> ?
(same for book.xml links to junit reports e.g.)


and the issue should not be the trailing part any more
- given it's javadoc it could be ready html to read, or maybe 
skinless xml to pull through the pipes... don't care: CAP should 
solve it

what needs to be solved:
where is the javadoc html?
how do I reference it?

1. either we don't care and say:
the link reads: 
http://someserver/path-to-javadoc/package-path-to-class/ClassName 
(of content-type text/html)
and it is up to the organizer of the site to make sure that the 
expected content is there.
+ we forresters can go happily to sleep
- our end-user loses sleep when he needs to update all his links :-(
- end-users that want to sleep now and then are forced to dirty 
copy-with-filter ant tasks or additional xslt stuff before their 
xdoc should be picked up by forrest (being the ignorant PITA, 
although it does nice skinning, sir!)


2. either we expect the end-user to have covered it by not only 
*producing* this stuff in an ant-task that precedes the 
forrest-task... but also: moving it into the cocoon context 
directory!
now the link reads "/jdoc/package-path-to-class/Name.html"
(or a version that he makes relative himself ?)
and he just makes sure the CAP-hint pipeline grabs it there by 
letting ant produce/move the stuff to the cocoon context directory

Great idea, however
- the cocoon context dir is where forrest puts it, sorry
- it is also organized the way forrest does it, sorry
in other words: this is making too much of our internals public 
as features, no?

It could still be possible though, and forrest would never know:
If the content-dir this person provides to the forrest-task were 
not ./src/documentation but ./build/documentation, where he 
first mingles all his stuff himself...


3. we could possibly aid more...
- in all cases the end-user still needs the ant tasks (foreign 
processes) to generate the javadoc (or other stuff)
- he knows where they are relative to his project.home
- we let him tell forrest (1) where that is, (2) how other 
documents are referencing the root of this stuff...

my current guess would be to do that with the XML snippet I 
proposed... (but possibly you all feel that letting him write the 
mingle-into-build/documentation/ ant task is easier?)

then let some smart ant-task (part of the forrest activity) read 
that file, copy the described stuff over into the cocoon context 
dir (we stay in charge of location and organization) where the 
CAP-hints pipeline deals with it as soon as the webapp or crawler 
asks for it

if we go for 3 over 2
then we could also hook in the tabs here (but that would be the 
next issue)
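To make option 3 a little more concrete, here is a rough sketch (in Python rather than Ant, purely for illustration) of the copy step it implies: read the `<content>` config proposed earlier in this thread and move each part into the Cocoon context dir under its part name. The config format, the target layout and the function name are all assumptions taken from this discussion, not an existing Forrest feature.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

def import_parts(config_file, context_dir):
    """Copy each configured content <part> into the Cocoon context dir,
    using the part name as the target subdirectory.
    (Hypothetical helper; the config format is the one sketched in
    this thread, not an existing Forrest file.)"""
    root = ET.parse(config_file).getroot()
    # <tab> elements may wrap parts, so look for <part> at any depth
    for part in root.iter("part"):
        name = part.get("name")
        location = Path(part.get("location"))
        target = Path(context_dir) / name
        if target.exists():
            shutil.rmtree(target)  # forrest stays in charge of this dir
        shutil.copytree(location, target)
```

A page could then reference /jdoc/... while forrest stays in charge of where that content actually lives and how it is organized.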



[analytical on why mounting subsitemaps will not do this]
> Could this breakup be done by mounting subsitemaps? I don't think so,
> because it would be impossible to order the subsitemap mounts correctly,
> without knowing what's in them. Eg, say the "straight *.xml to *.html"
> subsitemap is mounted first. It contains matchers for:
> 
> <map:match pattern="*.html">
> <map:match pattern="**/*.html">
> 
> That's going to match requests that ought to be left for the todo,
> changelog, faq and community snippets. Conclusion: the master sitemap
> writer would need to know in detail, just what mounted snippets contain
> and how they could interact. It's perfectly feasible that two snippets
> could 'deadlock', with A required before B in some circumstances, and B
> before A in others.
> 
> So my suggested alternative to subsitemaps is to simply throw all snippet
> contents together into one big sitemap, and then sort by increasing
> matcher generality. So *.html and **/*.html will end up near the bottom.
> 
> 
>>and again, maybe solving it with less ant and more new cocoon 
>>components would probably also solve it, and be more welcomed?
> 
> 
> The cleanest solution is for Cocoon to not care about matcher order. You
> don't have to order your templates in XSLT, so why should you have to
> order matchers in a sitemap? If the sitemap didn't care about order, then

your principle of the "increasing matcher generality"
<map:match pattern="*.html">
<map:match pattern="**/*.html">
over
<map:match pattern="faq/**/*.html">

assumes that all colliding matchers are of a form that allows 
deduction (like Java does for "unreachable code") of which one 
shadows the other, no?

there could be complex rules to find for 'ordering' these
/prefix/**
*.html
in terms of increasing generality?
and what if we throw in regex matchers: p(.*)l ?

this makes me think about priorities inside xslt :-(


In any case, I think the issue I tried to address does not 
require this... again: pipelines, matchers, renditions should be 
covered... we only look for a (hopefully simple) mechanism to 
slide in different content, with different structure,...

in a way that it can be cross-referenced...

-marc=

> we could just mount subsitemaps for each snippet, just like in an XSLT
> you can <xsl:import> various XSLTs in any order.
> 
> So given this solution, either implemented as an Ant task, or as an
> enhanced Cocoon, how does it meet the goals?
> 
>   - Users need to be able to customize the sitemap with their own
>     matchers, for whatever crazy reasons they want.
> 
> The Ant task just includes ${project.home}/src/documentation/*.xmap,
> which adds user-defined snippets to the mix.
> 
>    - In addition, we'd like to provide quick'n simple ways of doing
>      routine customizations, like specifying javadoc prefixes, and adding
>      pipelines for new document types (docbook, say). Ie, a siteplan.
> 
> The Ant task transforms the siteplan into a number of .xmap snippets,
> which are then included.
> 
>    - Forrest's sitemap needs to be modularized, so users can choose just
>      the functionality they need. If they don't want svg2png, don't
>      include Batik. If they don't want PDFs, don't include FOP. 
> 
> If the user can edit the includes and excludes pattern that the Ant task
> uses to find .xmap snippets, then they can exclude snippets they don't
> want.
> 
> Problem solved :) Technically, all that is required is a way of ordering
> matcher patterns, and then the Ant task is easy. Then we ask Sylvain if he'll
> kindly write us another sitemap processor that ignores matcher order, and
> it's beer and skittles from there on.
> 
> 
> --Jeff
> 
> 
> 
>>-marc=
> 
> 

-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org


Re: Link-Addressing, and breaking up the sitemap

Posted by Jeff Turner <je...@apache.org>.
On Thu, Sep 05, 2002 at 09:51:55AM +0200, Marc Portier wrote:
> Jeff Turner wrote:
...
> >This sounds much like Marc's siteplan idea
...
> it does, I have felt some resent to the idea of generating 
> sitemap contents though, so I'm trying to find solutions that do 
> not require that, yet still solve the issues we are facing...

Yes,  trying to 'hide' the sitemap could just introduce a layer of
complexity and get in the way.

...
> >One issue..
> >
> >I think it should always be possible to use Forrest sites offline or
> >online. For this to work, links to external things like javadocs need to
> >work 'statically' as well. It's no good having links starting with
> >'/jdoc', because in a static site they'll be interpreted relative to the
> >site root, not the forrest root.
> >
> 
> Well hidden in another side-track of this thread I launched 
> another solution for this to Nicola (he has the cocoon CLI knowledge)
> 
> <snip 
> from="http://marc.theaimsgroup.com/?l=forrest-dev&m=103097448015510&w=2">
> 
> >> by the way Nicola: the CLI should also be the one that
> >> "relativizes" all the none http://-leading hrefs in our
> >> generated html as well :-D
> >> that is the correct way to produce a bunch of relative
> >> interconnected pages that can be placed where-ever (not just
> >> in the /forrest) on a webserver. (and detaches the solution
> >> from the webapp that doesn't have this concern)
> 
> </snip>
> 
> whadayathink?
> using the CLI is about "staticalizing" your content.  The webapp 

That's almost as bad as 'prextensination'! ;)

> does not have this concern, so only the CLI needs to be taking up 
> this concern to 'relativise' all the /-leading references in the 
> produced static content.

+1. Just like how JSP tags all treat '/'-leading paths as relative to the
servlet context, not the server root.

> >
> >How about just adding a 'default' attribute to the above XML?
> >
> 
> That was my original plan, only this feature DOES push more into 
> the direction of generation of the sitemap, which is a path I try 
> to avoid ATM.

Yes, generating a whole sitemap is yucky. But generating a sitemap
_snippet_, which then gets combined (perhaps at runtime) into a live
sitemap, is much nicer. See below..

> Only solution I see now, is just to make this kinda 'fix'
> after all the real world web has this thing with index.html as 
> well, right? (I expect you to have some emotions, Jeff, given 
> your recent train of thought on the URI's :-))

 "Ban All Extensions!! Ban All Extensions!! Ban All Argghhh..
 getyourhandsoffme, peasant. I'm a knight of the W3C Ivory Tower.."

:) No, I'm not that bad, yet.

...
<snip useful tab rationale>

> >Sounds cool, as long as users can still edit the sitemap directly. No
> 
> I think this very feature offers a strong reason against 
> generating it?

Not against generating parts of it.....

> unless you want the hastle of merging with user-customizations 
> and the like :-S

Generate snippets, and then merge them with user snippets :)

> There once was the idea to enable end-user sitemap customizations 
> through mounted sitemaps though...
> 
> >matter how much project-specific info the siteplan captures, there will
> >always be users needing to play with the sitemap directly.
> >
> >What I'd like to do is write some code which, given a set of patterns:
> >
> >*.pdf
> >manual/*.pdf
> >**/*.pdf
> >
> >can sort them by 'generality'. Then we can have an Ant task which can
> >build a sitemap from a bunch of sitemap snippets. Currently, the
> >sitemap is very confusing, because matchers cannot be grouped by
> >function; they must be ordered by increasing generality. Getting the
> >order wrong leads to hard-to-debug problems. If, instead of a
> >monolithic sitemap.xmap, we could have a bunch of sitemap snippets
> >which are assembled at runtime, then a) the sitemap's functionality
> >would be modular, b) users could add their snippets without editing
> >(and potentially breaking) a giant confusing sitemap.
> >
> 
> sounds good, but needs some elaboration though...


It would be best to define the goals first:

 - Users need to be able to customize the sitemap with their own
   matchers, for whatever crazy reasons they want.

 - In addition, we'd like to provide quick'n simple ways of doing routine
   customizations, like specifying javadoc prefixes, and adding pipelines
   for new document types (docbook, say). Ie, a siteplan.

 - Forrest's sitemap needs to be modularized, so users can choose just
   the functionality they need. If they don't want svg2png, don't include
   Batik. If they don't want PDFs, don't include FOP. 


To meet these goals, I'd like to chop the sitemap into functional
sections:

 - Straight *.xml to *.html
 - Site statistics reporting (apachestats)
 - todo generation
 - changelog generation
 - faq
 - 'community' section with feedback
 - doclist
 - DTD documentation
 - PDFs-of-every-page

Each of these is in a sitemap 'snippet'. Users can add project-specific
functionality by adding new snippets.

Could this breakup be done by mounting subsitemaps? I don't think so,
because it would be impossible to order the subsitemap mounts correctly,
without knowing what's in them. Eg, say the "straight *.xml to *.html"
subsitemap is mounted first. It contains matchers for:

<map:match pattern="*.html">
<map:match pattern="**/*.html">

That's going to match requests that ought to be left for the todo,
changelog, faq and community snippets. Conclusion: the master sitemap
writer would need to know in detail, just what mounted snippets contain
and how they could interact. It's perfectly feasible that two snippets
could 'deadlock', with A required before B in some circumstances, and B
before A in others.

So my suggested alternative to subsitemaps is to simply throw all snippet
contents together into one big sitemap, and then sort by increasing
matcher generality. So *.html and **/*.html will end up near the bottom.
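Purely as an illustration, a generality ordering could be sketched like this. The scoring rules are guesses ('**' crosses path separators so it weighs heavier than '*', and a longer literal prefix makes a pattern more specific); they are not Cocoon's actual matcher semantics.

```python
def generality(pattern):
    """Rough generality score for a wildcard matcher pattern.
    The rules (weight '**' over '*', prefer longer literal text)
    are illustrative guesses, not Cocoon's real semantics."""
    doubles = pattern.count("**")
    singles = pattern.count("*") - 2 * doubles
    literal = sum(1 for c in pattern if c != "*")  # non-wildcard chars
    return (10 * doubles + singles, -literal)

def order_matchers(patterns):
    """Sort patterns from most specific to most general, so the
    catch-all matchers end up near the bottom of the sitemap."""
    return sorted(patterns, key=generality)
```

With the pdf example above this yields manual/*.pdf, then *.pdf, then **/*.pdf; but for pairs like faq/**/*.html versus *.html no such scoring gives an obviously 'right' answer, which is exactly the objection raised elsewhere in this thread.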

> and again, maybe solving it with less ant and more new cocoon 
> components would probably also solve it, and be more welcomed?

The cleanest solution is for Cocoon to not care about matcher order. You
don't have to order your templates in XSLT, so why should you have to
order matchers in a sitemap? If the sitemap didn't care about order, then
we could just mount subsitemaps for each snippet, just like in an XSLT
you can <xsl:import> various XSLTs in any order.

So given this solution, either implemented as an Ant task, or as an
enhanced Cocoon, how does it meet the goals?

  - Users need to be able to customize the sitemap with their own
    matchers, for whatever crazy reasons they want.

The Ant task just includes ${project.home}/src/documentation/*.xmap,
which adds user-defined snippets to the mix.

   - In addition, we'd like to provide quick'n simple ways of doing
     routine customizations, like specifying javadoc prefixes, and adding
     pipelines for new document types (docbook, say). Ie, a siteplan.

The Ant task transforms the siteplan into a number of .xmap snippets,
which are then included.

   - Forrest's sitemap needs to be modularized, so users can choose just
     the functionality they need. If they don't want svg2png, don't
     include Batik. If they don't want PDFs, don't include FOP. 

If the user can edit the includes and excludes pattern that the Ant task
uses to find .xmap snippets, then they can exclude snippets they don't
want.

Problem solved :) Technically, all that is required is a way of ordering
matcher patterns, and then the Ant task is easy. Then we ask Sylvain if he'll
kindly write us another sitemap processor that ignores matcher order, and
it's beer and skittles from there on.


--Jeff


> -marc=

RE: Link-Addressing, and breaking up the sitemap

Posted by Robert Koberg <ro...@koberg.com>.
Hi,

> -----Original Message-----
> From: Marc Portier [mailto:mpo@outerthought.org]
> Sent: Thursday, September 05, 2002 3:39 AM
> Hi Robert,
>
> Robert Koberg wrote:
> > Hi,
> >
> >
> >>-----Original Message-----
> >>From: Marc Portier [mailto:mpo@outerthought.org]
> >
> >
> >>Still: configuring the load of what forrest should know about
> >>your project all in *one* file seems logic to me.  (as long as it
> >>is all inside the one concern which I would largely define as
> >>"organize the project site" This would be involving: choosing
> >>some skin, and maybe setting some skin-specific customizations,
> >>decide on which parts get hooked up in which tabs, setup the
> >>libre auto-indexing rules, ...
> >>
> >>Even if the concerns would grow out of one role (person) this
> >>could still be the entry config-file that maybe lists where other
> >>files are.
> >
> >
> > I have found that two main config files are a good thing. One that
> describes the
> > site and one that describes the resources. The resources are
> referenced in the
> > site config for easy reuse.
> >
>
> can't say I fully grasp yet what is in there, just some random
> thoughts:
> - they look machine generated rather then hand-made

Users will usually use a browser gui to edit them (they usually don't know they
exist). I tend to do the bulk of my stuff in XMLSpy and some Ant tasks. I
find it easier, and it is extremely clear what is going on.

> - seem to be relating to one document repository that manages all
> 'article-ids' (while here we try to get some mechanism to cross
> ref between parts coming from different locations we still want
> to publish all together)

Yes it is a simple site example. Resources can come from anywhere.

I am noticing most hits to the docs site listed below are going to empty (not
404) pages. What you are seeing is a storyboard that is definitely a work in
progress. First thing I do with a site is storyboard out a wireframe, add
content and play with the site till it is workable.

If you go to:
http://docs.livestoryboard.com/en_us/manual/config/site/site.xml.html
there is a pager that will take you through the site config. Each element and
its attributes are described.

I have also described the XSLT here:
http://docs.livestoryboard.com/en_us/manual/XSL/Overview.html

I separate them into 3 categories (links to further description are on the page
and in the nav):
1. Primary - these are the main XSLTs that setup a page structure and include
the other types
2. Common - these are generic templates to handle common web needs
3. Utility - eeeekkk! business logic :)


>
> > I have some incomplete documentation with Relax NG Schemas (I think
> I generated
> > DTDs and W3 Schema) up at:
> > http://docs.livestoryboard.com/en_us/manual/config/Overview.html
> >
> >
> >
> > </snip>
> >
> >>whadayathink?
> >>using the CLI is about "staticalizing" your content.  The webapp
> >>does not have this concern, so only the CLI needs to be taking up
> >>this concern to 'relativise' all the /-leading references in the
> >>produced static content.
> >>(At first I mentioned some transformer for this, but 'in office'
> >>Steven correctly pointed out that this has only got to do with
> >>the use of the crawler.)
> >>
> >
> >
> > What happens when your static output is PDF or some other binary
> staticalized :)
> > output?
> >
>
> honestly: I have *no* idea.
>
> For static HTML it seems natural to have relative links between
> all that sit in one dir with subdirs so they can be moved around
> in the vritual root of any webserver at will...
>
> For static pdf it seems natural to have only full fledged http://
> links... given the fact that you would maybe download, print,
> mail those around? (even harder to realize)
>
> It would not make a lot of sense to me to start browsing around
> from pdf to pdf... but who am I?
>
>

You are the savior of the forrest :)

Yes, I suppose it does not make much sense unless the PDF comes with other
things that it needs to link to when downloaded to a filesystem.

Another thing might be Flash movies that have relative resources.

I am trying to say that there does not seem to be a universal answer to the
relative-links problem

-Rob


> > -Rob
> >
> >
>
> -marc=
> --
> Marc Portier                            http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> mpo@outerthought.org                              mpo@apache.org
>



Re: Link-Addressing, and breaking up the sitemap

Posted by Marc Portier <mp...@outerthought.org>.
Hi Robert,

Robert Koberg wrote:
> Hi,
> 
> 
>>-----Original Message-----
>>From: Marc Portier [mailto:mpo@outerthought.org]
> 
> 
>>Still: configuring the load of what forrest should know about
>>your project all in *one* file seems logic to me.  (as long as it
>>is all inside the one concern which I would largely define as
>>"organize the project site" This would be involving: choosing
>>some skin, and maybe setting some skin-specific customizations,
>>decide on which parts get hooked up in which tabs, setup the
>>libre auto-indexing rules, ...
>>
>>Even if the concerns would grow out of one role (person) this
>>could still be the entry config-file that maybe lists where other
>>files are.
> 
> 
> I have found that two main config files are a good thing. One that describes the
> site and one that describes the resources. The resources are referenced in the
> site config for easy reuse.
> 

can't say I fully grasp yet what is in there, just some random 
thoughts:
- they look machine-generated rather than hand-made
- they seem to relate to one document repository that manages all 
'article-ids' (while here we try to get some mechanism to 
cross-reference parts coming from different locations that we 
still want to publish all together)

> I have some incomplete documentation with Relax NG Schemas (I think I generated
> DTDs and W3 Schema) up at:
> http://docs.livestoryboard.com/en_us/manual/config/Overview.html
> 
> 
> 
> </snip>
> 
>>whadayathink?
>>using the CLI is about "staticalizing" your content.  The webapp
>>does not have this concern, so only the CLI needs to be taking up
>>this concern to 'relativise' all the /-leading references in the
>>produced static content.
>>(At first I mentioned some transformer for this, but 'in office'
>>Steven correctly pointed out that this has only got to do with
>>the use of the crawler.)
>>
> 
> 
> What happens when your static output is PDF or some other binary staticalized :)
> output?
> 

honestly: I have *no* idea.

For static HTML it seems natural to have relative links between 
all that sits in one dir with subdirs, so they can be moved around 
in the virtual root of any webserver at will...

For static pdf it seems natural to have only full fledged http:// 
links... given the fact that you would maybe download, print, 
mail those around? (even harder to realize)

It would not make a lot of sense to me to start browsing around 
from pdf to pdf... but who am I?


> -Rob
> 
> 

-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org


RE: Link-Addressing, and breaking up the sitemap

Posted by Robert Koberg <ro...@koberg.com>.
Hi,

> -----Original Message-----
> From: Marc Portier [mailto:mpo@outerthought.org]

> Still: configuring the load of what forrest should know about
> your project all in *one* file seems logic to me.  (as long as it
> is all inside the one concern which I would largely define as
> "organize the project site" This would be involving: choosing
> some skin, and maybe setting some skin-specific customizations,
> decide on which parts get hooked up in which tabs, setup the
> libre auto-indexing rules, ...
>
> Even if the concerns would grow out of one role (person) this
> could still be the entry config-file that maybe lists where other
> files are.

I have found that two main config files are a good thing. One that describes the
site and one that describes the resources. The resources are referenced in the
site config for easy reuse.

I have some incomplete documentation with Relax NG Schemas (I think I generated
DTDs and W3 Schema) up at:
http://docs.livestoryboard.com/en_us/manual/config/Overview.html



</snip>

>
> whadayathink?
> using the CLI is about "staticalizing" your content.  The webapp
> does not have this concern, so only the CLI needs to be taking up
> this concern to 'relativise' all the /-leading references in the
> produced static content.
> (At first I mentioned some transformer for this, but 'in office'
> Steven correctly pointed out that this has only got to do with
> the use of the crawler.)
>

What happens when your static output is PDF or some other binary staticalized :)
output?

-Rob



Re: Link-Addressing, and breaking up the sitemap

Posted by Marc Portier <mp...@outerthought.org>.
Jeff Turner wrote:
> On Tue, Sep 03, 2002 at 01:50:57PM +0200, Marc Portier wrote:
> ...
> 
>>for this last part I still think a simple forrest-config file 
>>makes the most sense: (since I don't know how to have ant run 
>>through properties that you don't know about at design time)
>>
>><content>
>>  <part name="doc"
>>        location="./src/documentation/content/xdocs"/>
>>  <part name="mail"
>>        location="..." />
>>  <part name="jdoc"
>>        location="..." />
>></content>
>>
>>with the knowledge of the current bot we could easily build out 
>>of this a dynamic/embedded ant piece that gets to copy over this 
>>stuff to the cocoon context dir in separate dirs that == the 
>>part-name
> 
> 
> This sounds much like Marc's siteplan idea, where an XML file contains
> project-specific details that result in various sitemap modifications.
> 

it does. I have felt some resentment toward the idea of 
generating sitemap contents though, so I'm trying to find 
solutions that do not require that, yet still solve the issues we 
are facing...

Still: configuring the load of what forrest should know about 
your project all in *one* file seems logical to me (as long as it 
is all inside the one concern, which I would largely define as 
"organize the project site"). This would involve: choosing 
some skin, maybe setting some skin-specific customizations, 
deciding which parts get hooked up in which tabs, setting up the 
libre auto-indexing rules, ...

Even if the concerns would grow out of one role (person) this 
could still be the entry config-file that maybe lists where other 
files are.

(ATM you just need to know, or take the hurdle of sitemap 
reading, to learn what enables which effect where, and IMHO 
having better documentation is more a patch than an in-depth 
solution: documenting one config file will also be easier and 
clearer...)


> One issue..
> 
> I think it should always be possible to use Forrest sites offline or
> online. For this to work, links to external things like javadocs need to
> work 'statically' as well. It's no good having links starting with
> '/jdoc', because in a static site they'll be interpreted relative to the
> site root, not the forrest root.
> 

Well hidden in another side-track of this thread I launched 
another solution for this to Nicola (he has the cocoon CLI knowledge)

<snip 
from="http://marc.theaimsgroup.com/?l=forrest-dev&m=103097448015510&w=2">

 >> by the way Nicola: the CLI should also be the one that
 >> "relativizes" all the none http://-leading hrefs in our
 >> generated html as well :-D
 >> that is the correct way to produce a bunch of relative
 >> interconnected pages that can be placed where-ever (not just
 >> in the /forrest) on a webserver. (and detaches the solution
 >> from the webapp that doesn't have this concern)

</snip>

whadayathink?
using the CLI is about "staticalizing" your content.  The webapp 
does not have this concern, so only the CLI needs to be taking up 
this concern to 'relativise' all the /-leading references in the 
produced static content.
(At first I mentioned some transformer for this, but 'in office' 
Steven correctly pointed out that this has only got to do with 
the use of the crawler.)

> We could just have an Ant task that replaces '/jdoc' with the correct
> path, adding extra ..'s to compensate for subdirectories.
> 

yes, but writing that inside the CLI seems more logical, no?
maybe it could be a switch to enable/disable this behaviour,
although IMHO people should not have been exploiting this 
quirk in the current CLI as a feature (I know forrest did, though)

> 
>>what remains though is the knowledge of the "default-page" for 
>>each of the parts... what if we link to just /doc/ ? must there 
>>then be some /content/doc/index.* ?
>>if we make this flexible and possibly different for the various
> 
> 
> How about just adding a 'default' attribute to the above XML?
> 

That was my original plan, only this feature DOES push more into 
the direction of generation of the sitemap, which is a path I try 
to avoid ATM.

Only solution I see now is just to make this kind of 'fix';
after all, the real-world web has this thing with index.html as 
well, right? (I expect you to have some emotions, Jeff, given 
your recent train of thought on the URIs :-))

> 
>>there is one other thing we could mingle in here, and that is the 
>>tab-definitions:
>><content>
>>  <tab label="must reads" default-part="doc">
>>    <part name="doc" location=".." />
>>    <part name="jdoc" location=".." />
>>  </tab>
>>  <tab label="community" defaut-part="mail">
>>    <part name="mail">
>>  </tab>
>>  <part name=".." location=".." />
>></content>
>>
>>so we could also generate a tabs.xml :-)

it would also more clearly define what tabs are (there is some 
history around that discussion)

in fact it would define them as the tabs you find in real 
(paper!) books (cuts or colours on the edges of the sheets):
- they partition the book somewhat: each page is in a 
section (part), and some sections can be grouped in tabs
- each page (referenced by number) knows by itself in which 
section and tab it lives --> you know which one to highlight
- the 'tab' functions as a bookmark (the cut-out ones) in that if 
you open the book at the tab, then you're on the default (first) 
page of the default (first) section (part) of that book
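As a minimal sketch, a tabs.xml along these lines could be derived mechanically from the proposed `<content>`/`<tab>` config. The element and attribute names follow the snippet quoted in this thread; the generated format (one `<tab>` linking to its default part) is an assumed convention, not Forrest's real tabs.xml schema.

```python
import xml.etree.ElementTree as ET

def generate_tabs(content_xml):
    """Derive a tabs.xml document from the <content>/<tab> config
    sketched in this thread.  Both the input and output formats are
    assumptions taken from this discussion."""
    root = ET.fromstring(content_xml)
    tabs = ET.Element("tabs")
    for tab in root.findall("tab"):
        entry = ET.SubElement(tabs, "tab")
        entry.set("label", tab.get("label"))
        # the default part gives the tab its landing page
        entry.set("href", "/%s/" % tab.get("default-part"))
    return ET.tostring(tabs, encoding="unicode")
```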

> 
> 
> From what I gather of the siteplan proposal, the idea was to have one XML
> file that effectively configures the whole of Forrest for the user's

yes. that surely still holds (soc in my book also means grouping 
everything of one concern :-) and there are already enough files 
to attend to, right?)

> project.  From that XML siteplan file, one would generate XML catalogs,
> tab files, book files (integrate libre.xml?), and do lots of tweaking of
> the sitemap, and generate parameters for the skins.
> 

(that was the original idea, but I would lower it today to:) 
possibly, the tweaking could also be in forrest-specific 
components that are addressed from the sitemap; maybe that sounds 
less overwhelming and 'revolutionary' to the more hard-line 
cocooners here
(to some the sitemap is like the Arabic Sharia :-))

> Sounds cool, as long as users can still edit the sitemap directly. No

I think this very feature offers a strong reason against 
generating it?

unless you want the hassle of merging with user-customizations 
and the like :-S

There once was the idea to enable end-user sitemap customizations 
through mounted sitemaps though...

> matter how much project-specific info the siteplan captures, there will
> always be users needing to play with the sitemap directly.
> 
> What I'd like to do is write some code which, given a set of patterns:
> 
> *.pdf
> manual/*.pdf
> **/*.pdf
> 
> can sort them by 'generality'. Then we can have an Ant task which can build a
> sitemap from a bunch of sitemap snippets. Currently, the sitemap is very
> confusing, because matchers cannot be grouped by function; they must be ordered
> by increasing generality. Getting the order wrong leads to hard-to-debug
> problems. If, instead of a monolithic sitemap.xmap, we could have a bunch of
> sitemap snippets which are assembled at runtime, then a) the sitemap's
> functionality would be modular, b) users could add their snippets without
> editing (and potentially breaking) a giant confusing sitemap.
> 

sounds good, but needs some elaboration though...

and again, maybe solving it with less ant and more new cocoon 
components would probably also solve it, and be more welcomed?
- Smarter components could also be a way to clean up the sitemap 
(like the CAP effort?)
- A generated sitemap is probably not something one will like to 
edit manually, and as you mentioned 'some' will always like to do 
that?
- Solving it with components means you get to code (and maintain) 
Java; solving it through sitemap and ant generation (like bot 
does) is more the path of writing (and maintaining) XSLT :-(


or am I misinterpreting and generalizing some senses and guesses 
I just _think_ I have picked up?

-marc=
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org


Re: Link-Addressing, and breaking up the sitemap

Posted by Jeff Turner <je...@apache.org>.
On Tue, Sep 03, 2002 at 01:50:57PM +0200, Marc Portier wrote:
...
> for this last part I still think a simple forrest-config file 
> makes the most sense: (since I don't know how to have ant run 
> through properties that you don't know about at design time)
> 
> <content>
>   <part name="doc"
>         location="./src/documentation/content/xdocs"/>
>   <part name="mail"
>         location="..." />
>   <part name="jdoc"
>         location="..." />
> </content>
>
> with the knowledge of the current bot we could easily build out 
> of this a dynamic/embedded ant piece that gets to copy over this 
> stuff to the cocoon context dir in separate dirs that == the 
> part-name

This sounds much like Marc's siteplan idea, where an XML file contains
project-specific details that result in various sitemap modifications.

One issue..

I think it should always be possible to use Forrest sites offline or
online. For this to work, links to external things like javadocs need to
work 'statically' as well. It's no good having links starting with
'/jdoc', because in a static site they'll be interpreted relative to the
site root, not the forrest root.

We could just have an Ant task that replaces '/jdoc' with the correct
path, adding extra ..'s to compensate for subdirectories.
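A minimal sketch of that replacement rule, in Python for illustration (the depth-counting heuristic and the function name are assumptions, not existing Forrest code):

```python
def relativize(site_href, current_page):
    """Rewrite a site-absolute href like '/jdoc/index.html' so it still
    works when the static site is browsed from the filesystem: prepend
    one '..' per subdirectory of the page containing the link."""
    depth = current_page.count("/")  # subdirectory depth of the linking page
    return "../" * depth + site_href.lstrip("/")

# a page two levels deep needs two '..' segments:
# relativize("/jdoc/index.html", "manual/users/concepts.html")
#   -> "../../jdoc/index.html"
```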

> what remains though is the knowledge of the "default-page" for 
> each of the parts... what if we link to just /doc/ ? must there 
> then be some /content/doc/index.* ?
> if we make this flexible and possibly different for the various

How about just adding a 'default' attribute to the above XML?

> there is one other thing we could mingle in here, and that is the 
> tab-definitions:
> <content>
>   <tab label="must reads" default-part="doc">
>     <part name="doc" location=".." />
>     <part name="jdoc" location=".." />
>   </tab>
>   <tab label="community" default-part="mail">
>     <part name="mail"/>
>   </tab>
>   <part name=".." location=".." />
> </content>
> 
> so we could also generate a tabs.xml :-)

From what I gather of the siteplan proposal, the idea was to have one XML
file that effectively configures the whole of Forrest for the user's
project.  From that XML siteplan file, one would generate XML catalogs,
tab files, book files (integrate libre.xml?), and do lots of tweaking of
the sitemap, and generate parameters for the skins.

Sounds cool, as long as users can still edit the sitemap directly. No
matter how much project-specific info the siteplan captures, there will
always be users needing to play with the sitemap directly.

What I'd like to do is write some code which, given a set of patterns:

*.pdf
manual/*.pdf
**/*.pdf

can sort them by 'generality'. Then we can have an Ant task which can build a
sitemap from a bunch of sitemap snippets. Currently, the sitemap is very
confusing, because matchers cannot be grouped by function; they must be ordered
by increasing generality. Getting the order wrong leads to hard-to-debug
problems. If, instead of a monolithic sitemap.xmap, we could have a bunch of
sitemap snippets which are assembled at runtime, then a) the sitemap's
functionality would be modular, b) users could add their snippets without
editing (and potentially breaking) a giant confusing sitemap.
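One possible generality ordering, sketched in Python (the scoring heuristic is invented here purely to illustrate the idea; real glob semantics would need more care):

```python
def generality(pattern):
    """Heuristic score: higher means more general.  '**' crosses directory
    boundaries so it weighs most; each '*' adds a little; deeper literal
    paths count as more specific."""
    return (pattern.count("**") * 100
            + pattern.count("*") * 10
            - pattern.count("/"))

def order_matchers(patterns):
    """Sort patterns most-specific first, the order a sitemap needs."""
    return sorted(patterns, key=generality)

order_matchers(["**/*.pdf", "*.pdf", "manual/*.pdf"])
# -> ["manual/*.pdf", "*.pdf", "**/*.pdf"]
```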


--Jeff


> any other ideas?
> 
> -marc=

Link-Addressing (Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention))

Posted by Marc Portier <mp...@outerthought.org>.
Babes,

What Steven calls 'intense mind battle' is in fact pure 
'toothbrush torture'... so here is my +1 on the reached consensus
(just to make him stop, aaaargh)

While we are having some momentum in discussing the URI design it 
is maybe not a bad idea to talk about the leading part of the URI 
(we just covered the trailing part)

Looking at the URI concerns I think we covered: 1. deciding on 
rendition and 2. hinting (or autodetecting via CAP) the pipeline to 
produce the IMF for the two-step view.
What remains though is pure 3. addressing (finding).

the easiest one is linking to external documents: we add a 
http://-leading URI to the picture and the crawler leaves it just 
as is in the produced HTML.  In fact even 'external' docs (== docs 
forrest does not know about, not passing through the sitemap) 
that 'accidentally' are served on the same server could be 
handled like this. ("http://[project-server]/javadoc" just means 
that forrest isn't providing help for tighter linking to javadoc 
stuff yet)

next would be links between hand-written xdocs: I would propose 
to make all of these relative, as with the inline or referenced 
resources they bring along.  To me this makes sense, but maybe 
I'm overlooking something...

finally there is the stuff we've been kind of ignoring:
junit-reports, mail archives, javadocs, integration-build-reports 
(a la Gump), 'ze metrix', maybe syndicated news (although 
pulling those through a live generator makes more sense),....

I would be -1 on just saying that all of this will at all times 
be covered (ignored) via http://-leading URLs we don't care 
about.  My -1 however entitles me to a proposed solution :-)

- the different content parts will have completely separate 
relative naming & addressing strategies (copying them all into 
/content/xdocs will ensure name collisions)
- so to avoid name collisions this probably ends up being 
different subdirs of ./content/ next to the xdocs... these should 
be copied over into the cocoon context dir before webapp-wrapping 
  or cli-crawler starts off

somewhere forrest needs to be told, for each of these parts
- where the raw-content is on disk
- the root-ref by which these documents will be referenced

this could be fixed or predefined by us forresters so we can make 
the sitemap look for

href --> drive location <-- {copied in by forrest-ant based on:}
/jdoc/**      --> /content/jdoc <-- ${forrest.content.jdoc.dir}
/news/**      --> /content/news <-- ${forrest.content.news.dir}
/junit/**     --> /content/junit<-- ${forrest.content.junit.dir}
/mail/**      --> /content/mail <-- ${forrest.content.mail.dir}
/doc/** (or not matched previously:)
/**           --> /content/xdocs<-- ${forrest.content-dir}


only I'm not that particularly fond of "fixed and predefined"
if we could agree to move up our current URLs to be in /docs
then the matcher remains a simple /** for all (no need for any 
sitemap-generation I expressed earlier)
(and the CAPs already solve the fact that there could be multiple 
xml-doctypes everywhere)

still, allowing forrest-endusers to assemble their content in 
this way would mean
1. they have their forrest-working ant-task depend on all the 
other content-generating tasks (easy)
2. they have a mechanism to tell forrest where all that stuff is 
around... (favour this over letting them build the cocoon 
context dir themselves?)

for this last part I still think a simple forrest-config file 
makes the most sense: (since I don't know how to have ant run 
through properties that you don't know about at design time)

<content>
   <part name="doc"
         location="./src/documentation/content/xdocs"/>
   <part name="mail"
         location="..." />
   <part name="jdoc"
         location="..." />
</content>

with the knowledge of the current bot we could easily build out 
of this a dynamic/embedded ant piece that gets to copy over this 
stuff to the cocoon context dir in separate dirs that == the 
part-name
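Reading such a config into name/location pairs is trivial; a Python sketch (the mail and jdoc locations are invented here, since the original leaves them as "..."):

```python
import xml.etree.ElementTree as ET

CONFIG = """
<content>
  <part name="doc"  location="./src/documentation/content/xdocs"/>
  <part name="mail" location="./build/mail-archives"/>
  <part name="jdoc" location="./build/javadocs"/>
</content>
"""

def read_parts(xml_text):
    """Map each part name to its on-disk location; the copy step would
    then move each location to /content/<name> in the cocoon context."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.get("location") for p in root.iter("part")}
```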

what remains though is the knowledge of the "default-page" for 
each of the parts... what if we link to just /doc/ ? must there 
then be some /content/doc/index.* ?
if we make this flexible and possibly different for the various

there is one other thing we could mingle in here, and that is the 
tab-definitions:
<content>
   <tab label="must reads" default-part="doc">
     <part name="doc" location=".." />
     <part name="jdoc" location=".." />
   </tab>
   <tab label="community" default-part="mail">
     <part name="mail"/>
   </tab>
   <part name=".." location=".." />
</content>

so we could also generate a tabs.xml :-)
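Deriving a tabs.xml from that tab-aware config might look like this Python sketch (the tabs.xml attribute names below, label and dir, are assumptions, as are the locations):

```python
import xml.etree.ElementTree as ET

CONFIG = """
<content>
  <tab label="must reads" default-part="doc">
    <part name="doc"  location="./src/documentation/content/xdocs"/>
    <part name="jdoc" location="./build/javadocs"/>
  </tab>
  <tab label="community" default-part="mail">
    <part name="mail" location="./build/mail-archives"/>
  </tab>
</content>
"""

def generate_tabs(xml_text):
    """Emit one <tab> element per config tab, pointing at its default part."""
    tabs = ET.Element("tabs")
    for tab in ET.fromstring(xml_text).iter("tab"):
        ET.SubElement(tabs, "tab",
                      label=tab.get("label"),
                      dir="/" + tab.get("default-part") + "/")
    return ET.tostring(tabs, encoding="unicode")
```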

any other ideas?

-marc=
PS: this kind of ties in with the "Working on Forrest" thread 
currently over at krysalis-centipede, no?
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org


Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Steven Noels <st...@outerthought.org>.
Nicola Ken Barozzi wrote:
> Steven Noels wrote:

>> OK - this is only a short summary but I hope it is clear. Do we move 
>> forward with this?
> 
> 
> With the use of an attribute versus extension, yes.

See my other mail. No problem. I'm very glad Jeff is putting some of his 
'spare' time in Forrest :-)

We reached a resolution, folks!

Now, there's crawler- and CAP-hacking awaiting us :-)

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org


Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Steven Noels wrote:
> Jeff Turner wrote:
> 
> <snip/>
> 
>> So, what's the difference between <link href="primer.html"> and <link
>> href="primer" content-type="text/html">? The same difference as between
>> "identifying a resource" and "identifying a resource representation".
>> The gods of the web have deemed that they are separate concerns; that
>> "resource" != "resource representation"; they have separate identifiers;
>> one a URI, the other a MIME type. Trying to identify both in one "href"
>> element is mixing concerns.
>>
>> Ahem. So there you go :) I fondly imagine this sort of thinking was going
>> through Steven's head when he -1'ed extra extensions in URIs.
> 
> 
> (not really answering your mail, just attaching myself to the righteous 
> thought-train in this thread ;-)
> 
> I must have been fondly out of my mind as usual, I guess, but here's a 
> summarization of an hour of intense mind battle in our offices just now, 
> being challenged by Marc (who is actually much smarter than me but has a 
> problem with his hard drive organization, hence his need for two 
> extensions):
> 
> There are three kinds of sources being processed through Forrest's 
> request space:
> 
>                  Name                                   URI
> 
> 1) XML (xdoc, docbook, YourGrammar)                **.{rendition}
> 2) XML-isable non-XML (e.g. DTD documentation)     **.{hint}.{rendition}
> 3) non-XML sources (images, static HTML/PDF/etc)   **.{extension}
>    (detected by wrapping the pipelines
>    in a ResourceExistAction)
> 
> {rendition} being html, pdf, wml, svg, ...
> {hint} being dtdx ...
> 
> Examples:
> 
> 1) manual/users/concepts.html
>    pressreleases/2001-02-06.pdf
> 
> 2) dtdx/document-v11.html
>    /09/11/23.downloadstats.svg
> 
> 3) architecture.png
>    dist/forrest-src.tgz
> 
> That being said, I believe we can set up a sitemap (*the* Forrest 
> sitemap, which is the definitive reference for the URI space being 
> processed by Forrest) that handles these three types of sources with 
> only minimal prextensination [1] of our URI space.
> 
> 1) Using CAPs, we are able to describe how XML sources, dependent on 
> their grammar, must be preprocessed to conform to the intermediate 
> format. People will be able to link to a named XML document, irrespective 
> of the preprocessing required, using <link 
> href="path/name(.{rendition})"/> (and I must still read Jeff's analysis 
> of the merits of having an extension in the href linking attribute).

I don't like this.
Users must link sources, not renditions.

Having another attribute with content-type is much better.

> We were thinking along the lines of a configuration section in the 
> sitemap listing possible identifiers to assign documents to a certain 
> 'document class': public identifier, root element name, 
> xsi:schemaLocation attribute,...
> 
> Configuration of the pipeline would then be done in a CAPAction, setting 
> sitemap parameters, i.e. selecting the correct 
> authoringformat2intermediateformat.xsl - I will expand on this if the 
> dust in my mind has settled (and Bruno has defined his implementation 
> strategy ;-)

Simply put, the link is expanded as Jeff says before processing the 
content; then we should be able to select the source file just given the 
name (no resource-exists needed), and process it from the CAP rules.

> The pipeline is basically divided in two parts: pre- and 
> post-intermediate format. The pre-IMF should not be 'visible' for the 
> document editor: he just authors a document using a certain grammar and 
> stores it on disk. The post-IMF contains the skinning, TOC aggregation, 
> etc...

Ok. (conceptually)

> Rendition finally is specified using the extension, and is part of the 
> post-IMF process (= part of the document author concern when creating a 
> link).

No, as an attribute.

> 2) Given the hint, the pipeline can be especially configured, i.e. 
> setting the Generator type to nekodtd for a DTD source - the rendition 
> is specified using the extension like in 1). The XML originating from 
> those sources can then be subject to CAP-processing.

If the hint is given, yes.
If not the default rendition is used.

> 3) For non-XML sources, there is a ResourceExistAction wrapping all 
> this, checking if the resource being requested already exists on disk, 
> and if so, using its extension, <map:read>'s it to the browser/crawler.

No need for resource-exists: if the names must be unique, it's just the 
pipeline that matches the *only* file and *then*, given the extension, 
decides to read rather than generate.

> OK - this is only a short summary but I hope it is clear. Do we move 
> forward with this?

With the use of an attribute versus extension, yes.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Steven Noels <st...@outerthought.org>.
Jeff Turner wrote:

<snip/>

> So, what's the difference between <link href="primer.html"> and <link
> href="primer" content-type="text/html">? The same difference as between
> "identifying a resource" and "identifying a resource representation".
> The gods of the web have deemed that they are separate concerns; that
> "resource" != "resource representation"; they have separate identifiers;
> one a URI, the other a MIME type. Trying to identify both in one "href"
> element is mixing concerns.
> 
> Ahem. So there you go :) I fondly imagine this sort of thinking was going
> through Steven's head when he -1'ed extra extensions in URIs.

(not really answering your mail, just attaching myself to the righteous 
thought-train in this thread ;-)

I must have been fondly out of my mind as usual, I guess, but here's a 
summarization of an hour of intense mind battle in our offices just now, 
being challenged by Marc (who is actually much smarter than me but has a 
problem with his hard drive organization, hence his need for two 
extensions):

There are three kinds of sources being processed through Forrest's 
request space:

                  Name                                   URI

1) XML (xdoc, docbook, YourGrammar)                **.{rendition}
2) XML-isable non-XML (e.g. DTD documentation)     **.{hint}.{rendition}
3) non-XML sources (images, static HTML/PDF/etc)   **.{extension}
    (detected by wrapping the pipelines
    in a ResourceExistAction)

{rendition} being html, pdf, wml, svg, ...
{hint} being dtdx ...

Examples:

1) manual/users/concepts.html
    pressreleases/2001-02-06.pdf

2) dtdx/document-v11.html
    /09/11/23.downloadstats.svg

3) architecture.png
    dist/forrest-src.tgz

That being said, I believe we can set up a sitemap (*the* Forrest 
sitemap, which is the definitive reference for the URI space being 
processed by Forrest) that handles these three types of sources with 
only minimal prextensination [1] of our URI space.

1) Using CAPs, we are able to describe how XML sources, dependent on 
their grammar, must be preprocessed to conform to the intermediate 
format. People will be able to link to a named XML document, irrespective 
of the preprocessing required, using <link 
href="path/name(.{rendition})"/> (and I must still read Jeff's analysis 
of the merits of having an extension in the href linking attribute).

We were thinking along the lines of a configuration section in the 
sitemap listing possible identifiers to assign documents to a certain 
'document class': public identifier, root element name, 
xsi:schemaLocation attribute,...

Configuration of the pipeline would then be done in a CAPAction, setting 
sitemap parameters, i.e. selecting the correct 
authoringformat2intermediateformat.xsl - I will expand on this if the 
dust in my mind has settled (and Bruno has defined his implementation 
strategy ;-)

The pipeline is basically divided in two parts: pre- and 
post-intermediate format. The pre-IMF should not be 'visible' for the 
document editor: he just authors a document using a certain grammar and 
stores it on disk. The post-IMF contains the skinning, TOC aggregation, 
etc...

Rendition finally is specified using the extension, and is part of the 
post-IMF process (= part of the document author concern when creating a 
link).

2) Given the hint, the pipeline can be especially configured, i.e. 
setting the Generator type to nekodtd for a DTD source - the rendition 
is specified using the extension like in 1). The XML originating from 
those sources can then be subject to CAP-processing.

3) For non-XML sources, there is a ResourceExistAction wrapping all 
this, checking if the resource being requested already exists on disk, 
and if so, using its extension, <map:read>'s it to the browser/crawler.

OK - this is only a short summary but I hope it is clear. Do we move 
forward with this?

Regards,

</Steven>

________
[1] non-existent English word for the use of double extensions to 
identify the type of resources; origin: 'prextension', i.e. an extension 
before the real extension. Also called: hint ;-)

-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org


Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Jeff Turner <je...@apache.org>.
On Mon, Sep 02, 2002 at 05:15:13PM +0200, Nicola Ken Barozzi wrote:
> Jeff, you finally got your brain working again ;-P
> 
> :-D
> 
> I really like this proposal, it's really cool, man.

Wohoo :) It's not a proposal yet, just a RT, and now it's out of my
system, I'll shut up and listen to everyone else's.

--Jeff

Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff, you finally got your brain working again ;-P

:-D

I really like this proposal, it's really cool, man.

+1

Jeff Turner wrote:
> On Mon, Sep 02, 2002 at 02:31:25PM +0200, Nicola Ken Barozzi wrote:
> 
>>Jeff Turner wrote:
>>
>>>I gather that (one of) the problems being addressed in this thread is the
>>>where-to-link-to problem. Eg, in index.xml:
>>>
>>> Read our <link href="primer.html">Forrest Primer</link> ... 
>>>
>>>And apparently that's bad. So my first question: why bad?
>>
> ...
> 
>>>If that's the only reason, why not do "lazy resolution" of links. In the
>>>XML, link to something abstract:
>>>
>>>Read our <link href="primer">Forrest Primer</link> ...
>>
> Interesting, nobody removed the extension altogether yet :-)
> 
> 
> A website is just a little corner of the web, so all the rules of the web
> (REST) should apply.
> 
> Rule #1: A resource Identifier (URI) is not the same as a resource
> Representation (HTTP response w/ content type). A resource may have
> multiple representations, but they will all be identified by the same
> URI.
> 
> (no, not sucked out of my thumb :) see
> http://www.w3.org/TR/webarch/#uri-ref-operations)
> 
> In our context, a link's href should hold an *identifier*. Eg, <link
> href="primer">.
> 
> However, obviously we need a way to indicate a desired resource
> representation too. But that is a separate concern; it's not identifying
> the *resource*, so it doesn't belong in the URI. Web browsers have a
> Content-Type: header where the preferred representation is specified. I
> think links should have something similar:
> 
> <link href="primer" content-type="text/html">
> 
> When resolving that link, Cocoon says "give me the HTML representation of
> the 'primer' resource".
> 
> Just like web browsers, the content type is usually inferred from the
> user's context. Users don't need to say "oh, and give me text/html
> please"; it's inferred from the user agent (browser).
> 
> Likewise, links usually need only specify the URI, and let the content
> type be inferred from the type of document that is doing the linking. Eg,
> if index.xml is rendered to HTML, then <link href="primer">
> gets automatically expanded to <link href="primer"
> content-type="text/html">, at the time index.html is rendered.
> 
> So, what's the difference between <link href="primer.html"> and <link
> href="primer" content-type="text/html">? The same difference as between
> "identifying a resource" and "identifying a resource representation".
> The gods of the web have deemed that they are separate concerns; that
> "resource" != "resource representation"; they have separate identifiers;
> one a URI, the other a MIME type. Trying to identify both in one "href"
> element is mixing concerns.
> 
> Ahem. So there you go :) I fondly imagine this sort of thinking was going
> through Steven's head when he -1'ed extra extensions in URIs.
> 
> 
> So practically, how does one resolve:
> 
>   <link href="primer" content-type="text/html">
>   <link href="primer" content-type="text/plain">
>   <link href="primer" content-type="application/pdf">
> 
> With the 'header' selector I guess:
> 
> <map:match pattern="primer">
>   <map:generate src="content/xdocs/primer.xml"/>
> 
>   <map:select type="header">
>     <map:parameter name="header-name" value="Content-Type"/>
> 
>     <map:when test="text/html">
>       <map:transform src="document2html.xsl"/>
>       <map:serialize/>
>     </map:when>
> 
>     <map:when test="application/pdf">
>       <map:transform src="document2fo.xsl"/>
>       <map:serialize type="fo2pdf"/>
>     </map:when>
> 
>   </map:select>
> </map:match>
> 
> The Cocoon link trawler would also need to set the content type from the
> <link content-type="..."> attribute.
> 
> 
> 
>>>And then in document2html.xsl, just append the ".html":
>>>
>>> <xsl:template match="link">
>>>   <a><xsl:attribute name="href"><xsl:value-of select="concat(@href, 
>>>   '.html')"/></xsl:attribute>
>>>     <xsl:apply-templates/>
>>>   </a>
>>>
>>>In document.fo.xsl, convert it to a <fo:basic-link>.
>>>
>>>So say a user has a PDF saved alongside all the XML files. Then <link
>>>href="mypdf.pdf"> works as expected.
>>>
>>>All the world's problems solved by removing the extension instead of adding
>>>extensions :)
>>>
>>>Please someone tell me where I lost the plot..
>>
>>  I save the files on my hd with an extension usually.
>>  So I can have
>>
>>  myfile.xml
>>  myfile.pdf
>>  myfile.txt
>>
>>What gets used by cocoon to generate myfile.pdf? The rule is not *that* 
>>clear...
> 
> 
> Given the link <link href="myfile">, we'd first examine the 'context', ie
> which file *contains* the link. If it's a HTML file, the link gets
> expanded to <link href="myfile" content-type="text/html">, and therefore
> myfile.html gets linked to.
> 
> 
>>Hmmm...
>>
>>Also, when I link, I want sometimes to link to a specific content-type.
> 
> 
> Then you link explicitly:
> 
> <link href="myfile" content-type="text/html">
> 
>>Part of the problem is in fact in wanting more outputs for one input.
>>You propose one input-one output.
>>
>>Anyway, I like the no-extension link, as it's nearer to my uri proposal...
> 
> 
> I think your "primer.xml" is the same as my "primer". They're both
> identifiers, independent of MIME type. Only difference is, "primer.xml"
> as an identifier would look silly outside a filesystem, eg in an XML db.
> 
> 
>>Then what about simply:
>>
>>- I can have only one file with a certain name in the dir
>>- I can use extensions for my sake but they don't get used by Forrest
> 
> 
> Yes.
> 
> 
>>- Forrest looks inside the file to see what it contains
> 
> 
> Isn't that solving a different problem?
> 
> 
>>- I always link to the name without extension
> 
> 
> Yes, keep the information space clean and semantic.
> 
> 
>>- If I want a particular doctype, the *link* URL is mypage/contenttype
> 
> 
> Yes! :) Except rather than mypage/contenttype, have two separate
> attributes.
> 
> 
> --Jeff
> 
> 
>>- The extensions are created by Cocoon; we leave the 1-1 mapping on 
>>extensions but keep them on the filename.
>>
>>This seems to solve it, right? (fingers crossed)
>>
>>-- 
>>Nicola Ken Barozzi                   nicolaken@apache.org
>>            - verba volant, scripta manent -
>>   (discussions get forgotten, just code remains)
>>---------------------------------------------------------------------
> 
> 
> 

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Steven Noels wrote:
> Jeff Turner wrote:
> 
> <snip/>
> 
>> Rule #1: A resource Identifier (URI) is not the same as a resource
>> Representation (HTTP response w/ content type). A resource may have
>> multiple representations, but they will all be identified by the same
>> URI.
> 
> 
> <snip/>
> 
>> Ahem. So there you go :) I fondly imagine this sort of thinking was going
>> through Steven's head when he -1'ed extra extensions in URIs.
> 
> 
> Naaah. It's just that others are much better at clearly expressing their 
> thoughts than me, so I'm just decorating my mails with 
> important-sounding URLs :-D
> 
> <snip type="applause and +1"/>
>
>> I think your "primer.xml" is the same as my "primer". They're both
>> identifiers, independent of MIME type. Only difference is, "primer.xml"
>> as an identifier would look silly outside a filesystem, eg in an XML db.
> 
> Also: think of rendering raw XML example documents.
> 
> Yep, I think we should stick to the behaviour described in your mail: 
> only name/identifiers in href attrs and optionally overriding the 
> context-dependent mimetype/rendition. 

applause! :-D

> How should we bind mimetypes with 
> rendition extension and URIs?
> 
> If you specify a link like this <link href="myfile" 
> content-type="text/html">, this should be picked up by the crawler, 
> which also needs to create the correct extension - so we need some 
> configuration table for this. 

I think so too.

> Also, I assume those are the links that 
> will be edited at document creation time, and they still will need to be 
> translated to href links for the webapp.

Yes, we need this too.
Add to this the absolute-relative patch that is needed, and we have it 
going nicely :-D

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Jeff Turner <je...@apache.org>.
On Mon, Sep 02, 2002 at 05:43:49PM +0200, Steven Noels wrote:

> >I think your "primer.xml" is the same as my "primer". They're both
> >identifiers, independent of MIME type. Only difference is, "primer.xml"
> >as an identifier would look silly outside a filesystem, eg in an XML db.
> 
> Also: think of rendering raw XML example documents.

Good point. Although of course we could have "urn:forrest:<resourcename>"
URIs if for some reason dots become desirable.

> Yep, I think we should stick to the behaviour described in your mail: 
> only name/identifiers in href attrs and optionally overriding the 
> context-dependent mimetype/rendition.

Well summed up.

> How should we bind mimetypes with rendition extension and URIs?

Traditionally it's done through /etc/mime.types files. IIRC the JavaMail
API includes a parser and classes for playing with mime.types and mailcap
files.
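The mime.types format is simple enough that a parser fits in a few lines; a Python sketch over a small excerpt (entries abridged):

```python
MIME_TYPES = """\
# type                extensions (first listed is preferred)
text/html             html htm
application/pdf       pdf
image/svg+xml         svg
"""

def parse_mime_types(text):
    """Content type -> preferred extension, /etc/mime.types style."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ctype, *exts = line.split()
        if exts:  # types with no extension listed are skipped
            table[ctype] = exts[0]
    return table
```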


2am.. goodnight :)


--Jeff

> Steven Noels                            http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> stevenn@outerthought.org                      stevenn@apache.org

Re: Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Steven Noels <st...@outerthought.org>.
Jeff Turner wrote:

<snip/>

> Rule #1: A resource Identifier (URI) is not the same as a resource
> Representation (HTTP response w/ content type). A resource may have
> multiple representations, but they will all be identified by the same
> URI.

<snip/>

> Ahem. So there you go :) I fondly imagine this sort of thinking was going
> through Steven's head when he -1'ed extra extensions in URIs.

Naaah. It's just that others are much better at clearly expressing their 
thoughts than me, so I'm just decorating my mails with 
important-sounding URLs :-D

<snip type="applause and +1"/>

> I think your "primer.xml" is the same as my "primer". They're both
> identifiers, independent of MIME type. Only difference is, "primer.xml"
> as an identifier would look silly outside a filesystem, eg in an XML db.

Also: think of rendering raw XML example documents.

Yep, I think we should stick to the behaviour described in your mail: 
only name/identifiers in href attrs and optionally overriding the 
context-dependent mimetype/rendition. How should we bind mimetypes with 
rendition extension and URIs?

If you specify a link like this <link href="myfile" 
content-type="text/html">, this should be picked up by the crawler, 
which also needs to create the correct extension - so we need some 
configuration table for this. Also, I assume those are the links that 
will be edited at document creation time, and they still will need to be 
translated to href links for the webapp.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org


Semantic linking (Re: [VOTE] Usage of file.hint.ext convention)

Posted by Jeff Turner <je...@apache.org>.
On Mon, Sep 02, 2002 at 02:31:25PM +0200, Nicola Ken Barozzi wrote:
> 
> Jeff Turner wrote:
> >I gather that (one of) the problems being addressed in this thread is the
> >where-to-link-to problem. Eg, in index.xml:
> >
> >  Read our <link href="primer.html">Forrest Primer</link> ... 
> >
> >And apparently that's bad. So my first question: why bad?
...
> >If that's the only reason, why not do "lazy resolution" of links. In the
> >XML, link to something abstract:
> >
> >Read our <link href="primer">Forrest Primer</link> ...
> 
Interesting, nobody removed the extension altogether yet :-)

A website is just a little corner of the web, so all the rules of the web
(REST) should apply.

Rule #1: A resource Identifier (URI) is not the same as a resource
Representation (HTTP response w/ content type). A resource may have
multiple representations, but they will all be identified by the same
URI.

(no, not sucked out of my thumb :) see
http://www.w3.org/TR/webarch/#uri-ref-operations)

In our context, a link's href should hold an *identifier*. Eg, <link
href="primer">.

However, obviously we need a way to indicate a desired resource
representation too. But that is a separate concern; it's not identifying
the *resource*, so it doesn't belong in the URI. Web browsers have a
Content-Type: header where the preferred representation is specified. I
think links should have something similar:

<link href="primer" content-type="text/html">

When resolving that link, Cocoon says "give me the HTML representation of
the 'primer' resource".

Just like web browsers, the content type is usually inferred from the
user's context. Users don't need to say "oh, and give me text/html
please"; it's inferred from the user agent (browser).

Likewise, links usually need only specify the URI, and let the content
type be inferred from the type of document that is doing the linking. Eg,
if index.xml is rendered to HTML, then <link href="primer">
gets automatically expanded to <link href="primer"
content-type="text/html">, at the time index.html is rendered.

So, what's the difference between <link href="primer.html"> and <link
href="primer" content-type="text/html">? The same difference as between
"identifying a resource" and "identifying a resource representation".
The gods of the web have deemed that they are separate concerns; that
"resource" != "resource representation"; they have separate identifiers;
one a URI, the other a MIME type. Trying to identify both in one "href"
element is mixing concerns.

Ahem. So there you go :) I fondly imagine this sort of thinking was going
through Steven's head when he -1'ed extra extensions in URIs.


So practically, how does one resolve:

  <link href="primer" content-type="text/html">
  <link href="primer" content-type="text/plain">
  <link href="primer" content-type="application/pdf">

With the 'header' selector I guess:

<map:match pattern="primer">
  <map:generate src="content/xdocs/primer.xml"/>

  <map:select type="header">
    <map:parameter name="header-name" value="Content-Type"/>

    <map:when test="text/html">
      <map:transform src="document2html.xsl"/>
      <map:serialize/>
    </map:when>

    <map:when test="application/pdf">
      <map:transform src="document2fo.xsl"/>
      <map:serialize type="fo2pdf"/>
    </map:when>

  </map:select>
</map:match>

The Cocoon link trawler would also need to set the content type from the
<link content-type="..."> attribute.


> >And then in document2html.xsl, just append the ".html":
> >
> >  <xsl:template match="link">
> >    <a><xsl:attribute name="href"><xsl:value-of select="concat(@href, 
> >    '.html')"/></xsl:attribute>
> >      <xsl:apply-templates/>
> >    </a>
> >
> >In document.fo.xsl, convert it to a <fo:basic-link>.
> >
> >So say a user has a PDF saved alongside all the XML files. Then <link
> >href="mypdf.pdf"> works as expected.
> >
> >All the world's problems solved by removing the extension instead of adding
> >extensions :)
> >
> >Please someone tell me where I lost the plot..
> 
>   I save the files on my hd with an extension usually.
>   So I can have
> 
>   myfile.xml
>   myfile.pdf
>   myfile.txt
> 
> What gets used by cocoon to generate myfile.pdf? The rule is not *that 
> clear...

Given the link <link href="myfile">, we'd first examine the 'context', ie
which file *contains* the link. If it's a HTML file, the link gets
expanded to <link href="myfile" content-type="text/html">, and therefore
myfile.html gets linked to.
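For illustration only, the expansion rule described above can be sketched as a small function; the extension table and names here are hypothetical, not Cocoon's actual API:

```python
# Hypothetical sketch of the link-expansion rule described above: an
# abstract identifier plus an explicit or inferred content type resolves
# to a concrete output filename. The type-to-extension table is illustrative.
EXTENSIONS = {
    "text/html": ".html",
    "application/pdf": ".pdf",
    "text/plain": ".txt",
}

def resolve_link(href, content_type=None, containing_type="text/html"):
    """Expand an abstract <link href="..."> target; when no explicit
    content-type is given, inherit it from the containing document."""
    ctype = content_type or containing_type
    return href + EXTENSIONS[ctype]

print(resolve_link("myfile"))                     # myfile.html
print(resolve_link("primer", "application/pdf"))  # primer.pdf
```

The point being that the document only ever names the resource; the representation is decided at render time.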

> Hmmm...
> 
> Also, when I link, I want sometimes to link to a specific content-type.

Then you link explicitly:

<link href="myfile" content-type="text/html">

> Part of the problem is in fact in wanting more outputs for one input.
> You propose one input-one output.
> 
> Anyway, I like the no-extension link, as it's nearer to my uri proposal...

I think your "primer.xml" is the same as my "primer". They're both
identifiers, independent of MIME type. Only difference is, "primer.xml"
as an identifier would look silly outside a filesystem, eg in an XML db.

> Then what about simply:
> 
> - I can have only one file with a certain name in the dir
> - I can use extensions for my sake but they don't get used by Forrest

Yes.

> - Forrest looks inside the file to see what it contains

Isn't that solving a different problem?

> - I always link to the name without extension

Yes, keep the information space clean and semantic.

> - If I want a particular doctype, the *link* URL is mypage/contenttype

Yes! :) Except rather than mypage/contenttype, have two separate
attributes.


--Jeff

> - The extensions are created by Cocoon; we leave the 1-1 mapping on 
> extensions but keep them on the filename.
> 
> This seems to solve it, right? (fingers crossed)
> 
> -- 
> Nicola Ken Barozzi                   nicolaken@apache.org
>             - verba volant, scripta manent -
>    (discussions get forgotten, just code remains)
> ---------------------------------------------------------------------

Re: [VOTE] Usage of file.hint.ext convention

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:
> Sorry, I'm *way* behind :P Putting on my humble newbie hat..
> 
> I gather that (one of) the problems being addressed in this thread is the
> where-to-link-to problem. Eg, in index.xml:
> 
>   Read our <link href="primer.html">Forrest Primer</link> ... 
> 
> And apparently that's bad. So my first question: why bad?
> 
> I'd guess the wrongness is that it introduces the assumption that the
> *containing* XML file will be rendered to HTML. Eg, if index.xml ever
> becomes part of a PDF, the link is broken. 

Right, this is one reason, and a big one.

> Any other reasons?

Unfortunately yes.
It's also about having to specify the content type in the output URL.

> If that's the only reason, why not do "lazy resolution" of links. In the
> XML, link to something abstract:
> 
> Read our <link href="primer">Forrest Primer</link> ...

Interesting, nobody removed the extension altogether yet :-)

> And then in document2html.xsl, just append the ".html":
> 
>   <xsl:template match="link">
>     <a><xsl:attribute name="href"><xsl:value-of select="concat(@href, '.html')"/></xsl:attribute>
>       <xsl:apply-templates/>
>     </a>
> 
> In document.fo.xsl, convert it to a <fo:basic-link>.
> 
> So say a user has a PDF saved alongside all the XML files. Then <link
> href="mypdf.pdf"> works as expected.
> 
> All the world's problems solved by removing the extension instead of adding
> extensions :)
> 
> Please someone tell me where I lost the plot..

   I save the files on my hd with an extension usually.
   So I can have

   myfile.xml
   myfile.pdf
   myfile.txt

What gets used by Cocoon to generate myfile.pdf? The rule is not *that* 
clear...

Hmmm...

Also, when I link, I want sometimes to link to a specific content-type.

Is it bad?
As I said, go tell the doc writer he cannot state that he wants a pdf 
out of it ;-)

Part of the problem is in fact in wanting more outputs for one input.
You propose one input-one output.

Anyway, I like the no-extension link, as it's nearer to my uri proposal...

Then what about simply:

- I can have only one file with a certain name in the dir
- I can use extensions for my sake but they don't get used by Forrest
- Forrest looks inside the file to see what it contains
- I always link to the name without extension
- If I want a particular doctype, the *link* URL is mypage/contenttype
- The extensions are created by Cocoon; we leave the 1-1 mapping on 
extensions but keep them on the filename.

This seems to solve it, right? (fingers crossed)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [VOTE] Usage of file.hint.ext convention

Posted by Jeff Turner <je...@apache.org>.
Sorry, I'm *way* behind :P Putting on my humble newbie hat..

I gather that (one of) the problems being addressed in this thread is the
where-to-link-to problem. Eg, in index.xml:

  Read our <link href="primer.html">Forrest Primer</link> ... 

And apparently that's bad. So my first question: why bad?

I'd guess the wrongness is that it introduces the assumption that the
*containing* XML file will be rendered to HTML. Eg, if index.xml ever
becomes part of a PDF, the link is broken. 

Any other reasons?

If that's the only reason, why not do "lazy resolution" of links. In the
XML, link to something abstract:

Read our <link href="primer">Forrest Primer</link> ...

And then in document2html.xsl, just append the ".html":

  <xsl:template match="link">
    <a><xsl:attribute name="href"><xsl:value-of select="concat(@href, '.html')"/></xsl:attribute>
      <xsl:apply-templates/>
    </a>

In document.fo.xsl, convert it to a <fo:basic-link>.

So say a user has a PDF saved alongside all the XML files. Then <link
href="mypdf.pdf"> works as expected.

All the world's problems solved by removing the extension instead of adding
extensions :)

Please someone tell me where I lost the plot..

thanks,

--Jeff

Re: [VOTE] Usage of file.hint.ext convention

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Steven Noels wrote:
> Nicola Ken Barozzi wrote:
> 
>  > Since our users also want to put all the files in a single dir, and
>  > since Cocoon needs hints about the contents of the file for an easy
>  > usage, I propose that we formalise the
>  >
>  > file.hint.ext
>  >
>  > convention, and keep it also in the output of the files.
> 
> +0 on source files, -1 on URIs
> 
>  > We should always make ext as the *target* or the *source* extension,
>  > so that it becomes natural to link for users. Seeing mypage.xml and
>  > having to link to mypage.html has confused many already.
>  >
>  > It could be mypage.html or mypage.xml, but links and filenames need
>  > to be the same.
>  >
>  > IMHO it should be the sources, for multiple output processing.
>  >
>  > This also comes with the implicit need to change the sitemap to
>  > handle all filetypes.
> 
> Good argument.
> 
>  > Files that don't need to be necessarily processed have no hint
>  > (javadocs are html for example).
>  >
>  > Finally, I have already demonstrated how hints don't break IoC and
>  > SoC, since Cocoon can decide what to do with the hint independently
>  > from the doc writer; it could for example ignore it completely.
> 
> But it requires the docwriter to think about the management concern, IMO.

They still write mydoc.xml or mydoc.gif, no?
This isn't a management concern, no?

> I'm still thinking about those content-aware pipelines, and for some app 
> we are developing, we actually have been using this technique doing an 
> XML pull parse on the document to check its root element - here, we 
> could check for its DTD identifier.

It's neat, but a PITA for many users.

> I'm vigorously opposing the idea of encoding meta-information twice and 
> in different places: inside the document, using its filename, and in the 
> request URI.

Conceptually I agree, the hint is a "hack".

> Consider this scenario:
> 
> URI:
> 
> http://somehost/documentnameA.html
> http://somehost/documentnameB.pdf
> 
> 
> source          step 1         |   step 2        step 3      step4
>                                |
> A.docv11.xml      -            |   web.xsl      (skin.xsl)   serialize
> B.docbook.xml   db2xdoc.xsl    |   paper.xsl                 serialize
>                                |
>                                ^
>                             logical
>                               view
>                            format [1]
> 
> 
> There's two concepts that could help us here:
> 
> 1) content-aware pipelines, as being articulated in some form in 
> http://marc.theaimsgroup.com/?t=102767485200006&r=1&w=2 - the grammar of 
> the XML source document as being passed across the pipeline will decide 
> what extra preparatory transformation steps need to be done

Ok.

> 2) views - simple Cocoon views instead of the current skinning system, 
> which would oblige us to seriously think of an intermediate 'logical' 
> page format that can be fed into a media-specific stylesheet (web, 
> paper, mobile, searchindexes, TOC's etc) resulting in media-specific 
> markup that can be augmented with a purely visual skinning transformation

Man, that's what I've been advocating all along.

I think that the document.dtd can be such a step.
The switch to using XHTML for it is *exactly* this.

Users that want to write a generic document use that dtd.
All other content that must be "skinned" by forrest must be pregenerated 
by other tools to give that dtd.

We still have status.xml... etc files that get automatically transformed 
to that format.

I have been advocating the two step process since I started using Cocoon 
(see also mails to the cocoon users for example), so I'm +10000 for it 
being formalized :-D

> Views are currently specified using the cocoon-view request parameter, 
> so maybe we could use the request-parameter Selector for that purpose:
> 
>       <map:match pattern="**">
>         <map:select type="request-parameter">
>           <map:parameter name="parameter" value="cocoon-view"/>
>           <map:when test="pdf">
>             pdf pipeline acting on a 'logical page' view?
>           </map:when>
>           <map:when test="html"/>
>         </map:select>
>       </map:match>
> 
> Or we could write some Action which uses the URI to specify the chosen 
> view/rendition.

*This* -1.

The hack of putting the intermediate step in the name is to make the URI 
space independent from the output space; you say that even that pollutes 
the URI (I agree), and this is a step back.

The best thing would be to understand something about the client 
automatically, but a request parameter can also be OK.

The point is, can we use them in statically generated documentation?

We cannot.  :-/

So we should simply say that the output format is given by the filename, 
but this is the output, not the input, and this brings us back to the 
problem that writers should concentrate on the input, and use that for 
the links to have view independence.

See, browser technology constrains us :-/

> I know all this is bringing us to a slowdown, but I couldn't care less: I 
> feel we are deviating from best practices in favor of quick wins.
> 
> Caveat: I haven't spent enough time thinking and discussing this, and 
> perhaps I have different interests (pet peeves) than others on the list.

What you propose is the best route, but we need to be faster.

Ok, let's go into it.

1) have two step process standard +1
2) switch document.dtd to be the intermediate format and become akin to 
XHTML2 as in previous mails +1
3) use content-aware pipelines - see below
4) link the sources, not the results.

This is cool but what gets generated when I have
  file.xml -> file.html
  file.html -> file.html
Both in the same dir?

If I link to file.xml, I get the link translated to file.html, but then 
what file do I get to see?

This is the reason why we need a 1-1 relationship.

Now to explain the why of the double ext (again):

We have file.xml

- user must link using the same filename

  link href="file.xml"

- the browser needs the filename with the resulting extension:

  file.html

- the system needs to have unique names

So this brings us *necessarily* to having both xml and html included in 
the extension.
xml for unicity, html for the browser.
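As an illustrative sketch (not Forrest's actual code), the naming rule argued above amounts to keeping the source extension for uniqueness and appending the serializer's extension for the browser:

```python
# Illustrative sketch of the double-extension rule: the source extension
# stays in the name (uniqueness in the dir), and the target extension is
# appended (so the browser gets a recognizable type).
def output_name(source_file, target_ext="html"):
    """'file.xml' + 'html' -> 'file.xml.html'"""
    return f"{source_file}.{target_ext}"

print(output_name("file.xml"))         # file.xml.html
print(output_name("file.xml", "pdf"))  # file.xml.pdf
```

With this rule, file.xml and file.html sources can never collide in the output, since each output name still carries its source extension.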

Or maybe only use the double extension when names clash; but then, how 
can the user ask for a PDF to be generated from it if there is only an 
.xml extension?

You say it shouldn't be known, because it's part of the view?
Go tell the users.
And how can they do it without breaking the URI?

Ha.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [VOTE] Usage of file.hint.ext convention

Posted by Steven Noels <st...@outerthought.org>.
Nicola Ken Barozzi wrote:

 > Since our users also want to put all the files in a single dir, and
 > since Cocoon needs hints about the contents of the file for an easy
 > usage, I propose that we formalise the
 >
 > file.hint.ext
 >
 > convention, and keep it also in the output of the files.

+0 on source files, -1 on URIs

 > We should always make ext as the *target* or the *source* extension,
 > so that it becomes natural to link for users. Seeing mypage.xml and
 > having to link to mypage.html has confused many already.
 >
 > It could be mypage.html or mypage.xml, but links and filenames need
 > to be the same.
 >
 > IMHO it should be the sources, for multiple output processing.
 >
 > This also comes with the implicit need to change the sitemap to
 > handle all filetypes.

Good argument.

 > Files that don't need to be necessarily processed have no hint
 > (javadocs are html for example).
 >
 > Finally, I have already demonstrated how hints don't break IoC and
 > SoC, since Cocoon can decide what to do with the hint indipendently
 > from the doc writer; it could for example ignore it completely.

But it requires the docwriter to think about the management concern, IMO.

I'm still thinking about those content-aware pipelines, and for some app 
we are developing, we actually have been using this technique doing an 
XML pull parse on the document to check its root element - here, we 
could check for its DTD identifier.
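For illustration, a minimal sketch of such sniffing (names hypothetical, not the app's actual code): a streaming parse reads only as far as the first start tag, so the root element can serve as a pipeline hint without parsing the whole file.

```python
import io
import xml.etree.ElementTree as ET

def sniff_root(stream):
    """Return the tag name of the document's root element, reading
    only as much of the stream as needed (a pull-style parse)."""
    for _event, elem in ET.iterparse(stream, events=("start",)):
        return elem.tag  # the very first start tag is the root
    return None

doc = io.BytesIO(b"<document><header/><body/></document>")
print(sniff_root(doc))  # document
```

The same approach could check the DOCTYPE identifier instead of the root element, as suggested above.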

I'm vigorously opposing the idea of encoding meta-information twice and 
in different places: inside the document, using its filename, and in the 
request URI.

Consider this scenario:

URI:

http://somehost/documentnameA.html
http://somehost/documentnameB.pdf


source          step 1         |   step 2        step 3      step4
                                |
A.docv11.xml      -            |   web.xsl      (skin.xsl)   serialize
B.docbook.xml   db2xdoc.xsl    |   paper.xsl                 serialize
                                |
                                ^
                             logical
                               view
                            format [1]


There's two concepts that could help us here:

1) content-aware pipelines, as being articulated in some form in 
http://marc.theaimsgroup.com/?t=102767485200006&r=1&w=2 - the grammar of 
the XML source document as being passed across the pipeline will decide 
what extra preparatory transformation steps need to be done

2) views - simple Cocoon views instead of the current skinning system, 
which would oblige us to seriously think of an intermediate 'logical' 
page format that can be fed into a media-specific stylesheet (web, 
paper, mobile, searchindexes, TOC's etc) resulting in media-specific 
markup that can be augmented with a purely visual skinning transformation

Views are currently specified using the cocoon-view request parameter, 
so maybe we could use the request-parameter Selector for that purpose:

       <map:match pattern="**">
         <map:select type="request-parameter">
           <map:parameter name="parameter" value="cocoon-view"/>
           <map:when test="pdf">
             pdf pipeline acting on a 'logical page' view?
           </map:when>
           <map:when test="html"/>
         </map:select>
       </map:match>

Or we could write some Action which uses the URI to specify the chosen 
view/rendition.

I know all this is bringing us to a slowdown, but I couldn't care less: I 
feel we are deviating from best practices in favor of quick wins.

Caveat: I haven't spent enough time thinking and discussing this, and 
perhaps I have different interests (pet peeves) than others on the list.

</Steven>



----
[1] http://martinfowler.com/isa/htmlRenderer.html

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org


Re: [VOTE] Usage of file.hint.ext convention

Posted by Marc Portier <mp...@outerthought.org>.
Nicola Ken Barozzi wrote:
> Since our users also want to put all the files in a single dir, and 
> since Cocoon needs hints about the contents of the file for an easy 
> usage, I propose that we formalise the
> 
>   file.hint.ext
> 
> convention, and keep it also in the output of the files.
> 


<snip />

> 
> Finally, I have already demonstrated how hints don't break IoC and SoC, 
> since Cocoon can decide what to do with the hint indipendently from the 
> doc writer; it could for example ignore it completely.
> 
> It's simply a hint.
> 
> +1
> 

+1, by lack of better alternative :-)

with 2-3 more votes like this we could actually (and finally) get 
the URL act together.

this gives us something for hinting the correct pipeline to use;
we still lack:

[From the writing URL - side]
- addressing scheme to use in xdocs (and other) to point to other 
resources that _are_ managed by forrest (do pass through the cocoon)
- a mechanism so the resulting HTML contains only two kinds of 
links: "http://"-leading or relative (NOT "/"-leading) - we must do 
this when generating HTML so the output can be placed at any root 
URL on the webserver of choice, even on CD-ROMs etc.

[From the defining/interpreting URL -side]
- define which piece of the URL (and how it) is used to locate 
source-content for the pipelines
- decide how the URL points to xml not inside ./content/xdocs (ie 
outside the control of the documentation editor(s)) and where 
to store/find that content then.

-marc
-- 
Marc Portier                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
mpo@outerthought.org                              mpo@apache.org