Posted to dev@forrest.apache.org by Nicola Ken Barozzi <ni...@apache.org> on 2002/12/11 15:34:59 UTC

URI spaces: source, processing, result

Jeff's last commit about the "link:" usage and my commit about 
resource-exists everywhere are all attempts to resolve an issue that 
is unfortunately still open. Add to that topicmaps, linkmaps, and the 
local dir hierarchy, and we definitely have an issue to solve about URI spaces.

Now, IMHO these things are difficult to resolve because we mix 
problems together, so I'll try here to separate them into different sections.


  SoC
-----------------------

SoC means that users should not tell the system how they want the 
files to be processed. They just put content on the disk and Forrest 
creates a site out of it.


  URI space
-----------------------

URI space is about concepts, not about telling Cocoon what pipeline to 
call. Hence the final URI space should be completely freeform.

For example, we should not require users to have images in **/images/** 
to be able to serve them. This means that we have to understand what 
processing to do to the files before displaying them.

Some think that this only applies to the output URI space (1), not to the 
local directory structure (2). Hence the two local file placement 
approaches look like:

Final:
      ./index.xml
      ./myimage.gif
      ./mycss.css
      ./asis.pdf

(1)
      ./index.xml
      ./images/myimage.gif
      ./css/mycss.css
      ./verbatim/asis.pdf

(2)
      ./index.xml
      ./myimage.gif
      ./mycss.css
      ./asis.pdf


Personally I like (2), but we should also cater for case (1), and even 
make it possible to mix ways.
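
Either way, it is the sitemap and not the directory layout that decides how 
a file gets treated. A minimal sitemap sketch (patterns and paths purely 
illustrative) that would serve images and CSS as-is from either layout:

   <!-- serve any GIF or CSS untouched, wherever it sits under content/ -->
   <map:match pattern="**.gif">
     <map:read src="content/{1}.gif" mime-type="image/gif"/>
   </map:match>
   <map:match pattern="**.css">
     <map:read src="content/{1}.css" mime-type="text/css"/>
   </map:match>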


  Linking
-----------------------

Going down the road of a completely semantic URI space, we also need to 
tackle the file extension issue. So links IMHO should generally be done 
without using file extensions.

When a certain resulting mimetype is needed, it should be specified in 
another attribute of the link.
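
For example (the "type" attribute name here is just a suggestion):

   <link href="specs/whitepaper" type="application/pdf">the whitepaper</link>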

Links can be saved in a general linkmap that associates link shorthands 
with actual links. These shorthands can be used like this (example):

   <link href="linkmap://my/linket">...</link>

And have something like

  <linkmap>
    <my>
      <linket href="http://www.mysite.org/mylink"/>
    </my>
  </linkmap>

This could also take care of a Stylebook feature we miss.


  Source Mounting
-----------------------

We should be able to include external directories in our local contents.
For example, if I have

  ./src/documentation/**

and

  ./build/javadocs/**

I may want to make Forrest work as if the javadocs were in

  ./src/documentation/javadocs/**

without having to actually move the files there.

This is not only about files that must be served as-is, but also about 
files served as if they were in the normal hierarchy (i.e. xdocs).

This means that we should probably make our own source resolver or 
file generator and have it keep a mounting config that tells where 
to get the files.
Or maybe just a SourceMountTranslate action, called before every 
generation, that resolves the real source path given the mount point.

Thus linking to these external resources will be done exactly as if they 
were in the normal dir space.
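
A rough sketch of what such a mounting config and its use from the sitemap 
could look like (the element names and the SourceMountTranslate action are 
hypothetical at this point):

   <mounts>
     <!-- serve ./build/javadocs/** as if it lived under
          ./src/documentation/javadocs/** -->
     <mount point="javadocs/" src="../../build/javadocs/"/>
   </mounts>

   <map:match pattern="**">
     <map:act type="SourceMountTranslate" src="mounts.xml">
       <!-- {real-path} would be returned by the action -->
       <map:read src="{real-path}"/>
     </map:act>
     ...
   </map:match>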


Does this make sense?

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Linkmaps (Re: URI spaces: source, processing, result)

Posted by Jeff Turner <je...@apache.org>.
On Thu, Dec 12, 2002 at 02:58:20AM -0800, Robert Koberg wrote:
> Hi,
> 
> > -----Original Message-----
> > From: Jeff Turner [mailto:jefft@apache.org]
> > Sent: Wednesday, December 11, 2002 10:27 PM
> <snip/>
> >
> > Here is an analogy with the seemingly uncontroversial 'linkmap' scheme.
> > How should 'linkmap' links be implemented?
> >
> > a) Have an explicit prefix, like <link href="site:/primer">
> > b) Have unprefixed links like <link href="primer">, and have the CLI open
> > the linkmap.xml file, and check if a 'primer' entry exists.  If so, treat
> > as a linkmap link.
> 
> I am failing to understand why this is a concern of some post process. Are you
> not trying to transform one representation to another? To me, the 'linkmap.xml'
> should be accessed at transformation time to transform the link.

Yes.  The question is, should the link be <link href="site:/primer"> or
<link href="primer">?  Is the fact that "primer" is a linkmap id
explicit, or must Forrest deduce it?

Though remember, this issue is an analogy for the _real_ issue, which is
whether we should have <link href="file:hello.pdf"> or <link
href="hello.pdf">.

> On the linkmap: I would not like to see a list of URIs (or URLs). Is forrest
> intended to be only for well established projects? That is, those projects that
> have their site architecture set in stone. Should forrest be used for projects
> that might need to rearrange the site structure? If it is for a new site/project
> then it would be nice to be able to easily move things around without having to
> hand edit the linkmap to change the URI/URL string for each changed item. If you
> have a linkmap like:
> 
> <folder name="docroot">
>   <page id="abcd"/>
>   <folder name="folder1">
>     <page id="f1abc"/>
>     <page id="f1bcd"/>
>     <page id="f1cde"/>
>     <folder name="folder11">
>       <page id="f1abc"/>
>       <page id="f1bcd"/>
>       <page id="f1cde"/>
>     </folder>
>   </folder>
> </folder>
> 
> After you have created this initial structure, generate the site, and then some
> people look at it and determine it is not the best, usability-wise. It is
> determined that folder11 would be better served as a child of the docroot. Using
> a structure like the above you simply move the folder11 nodeset to be a child of
> the docroot. There is no need to rewrite strings telling where these things are.

Yes, certainly.

> The transformation finds the ID of the item in question and recursively builds
> the path as it is structured in the linkmap at time of generation.

There is a subtle difference between your site.xml and what I proposed as
a linkmap:

http://marc.theaimsgroup.com/?t=103444042500002&r=1&w=2

In site.xml, the XML structure models the directory structure, and
assigns IDs to nodes.

In linkmap.xml, the XML structure models the _information_ structure of
the site, and then the physical directory structure is mapped onto it.
So a linkmap might be:

<site>
  <index/>
  <primer/>
  <faq>
    <how_can_I_help/>
    <building_own_website/>
    <building_fails_on_subsequent_builds/>
  </faq>
</site>

Given this difference, it makes sense to use XPath (not node IDs) when
naming links.  Eg, <link href="site:/faq/how_can_I_help">.  Because it
_is_ a FAQ entry, and that won't change regardless of where it is
physically located.
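
Concretely (the physical path here is purely illustrative), a linkmap-aware
transform would turn

  <link href="site:/faq/how_can_I_help">How can I help?</link>

into something like

  <a href="faq/how_can_I_help.html">How can I help?</a>

in the generated HTML, and only the linkmap mapping needs to change if that
entry later moves on disk.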

> Now, I think the objection to this is that it is too hard to understand
> or do recursion to build these paths? Is that the problem?

No.. the linkmaps are still vapourware, but when it comes time to
implement them I'm sure a bit of recursion won't hurt.  The current
'problem' is (IMO) largely unrelated to the linkmap concept, and
_totally_ unrelated to their implementation.  The relation can be summed
up as, "if we're going to have site: links all over the place, why not
have file: links?".


--Jeff


> best,
> -Rob

Re: Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Jeff Turner wrote:

> 1) Have lots of time to patiently explain your POV, in multiple emails
> over the coming days?  We can start with "Jeff explaining Nicola's POV"
> and "Nicola explaining Jeff's POV" emails.

Yes. That's why I vetoed your commit: because I saw a usage of Forrest 
that seems to be completely outside my views.
Every -1 has to be accompanied by a technical explanation, and I gave 
one. Then I tried to explain it. If you don't understand what I'm saying, 
it's only my fault, but please do me the favor of asking me more when 
you don't understand instead of snipping. I'm trying to explain myself 
and have never stopped doing it, so I'm more than happy to explain 
things until we understand each other.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:
> On Mon, Dec 16, 2002 at 04:30:47PM +0100, Nicola Ken Barozzi wrote:
> ...
> 
>>>The way it works is a hack. I like the file: approach much better, 
>>
>>Why?
>>
>>I'm a user. I take a file. Put it in the directory. Link to it. See it 
>>in the result.
>>
>>What do you not like of this? Why is it better if I write the link with 
>>file: in it?
> 
> 
> You are perfectly right, this _should_ be how it works.  It is simple and
> intuitive.

Ok then, let's make it work :-)

> But think about it: when you said "I take a file. Put it in the
> directory. Link to it.", you're admitting that you're linking to the
> _Source_ URI.  Which is good, because you shouldn't be relying on the
> destination location.  

Ok, you have my support here. Links are always done relative to the 
current source location. I like this.

> But unfortunately, unprefixed links have a
> 'cocoon:' scheme, so <link href="index.pdf"> will not link to
> src/documentation/content/index.pdf.  That is why we need this file:
> prefix.

This is an implementation problem, not a conceptual one.

It could be that we will be forced by ignorance and impotence to use it 
because we cannot find a technical way of dealing with it.
But IMHO we are not there yet.

resource-exists is not a hack IMHO. If the user can put any file in the 
directory and wants it to be picked up by *name without extension*, we 
cannot do without it, because we don't have enough metadata in the 
filesystem to keep MIME types alongside files; the info is encoded in 
the file itself and in its name. Thus, this info has to be collected via 
*probing*, which is what resource-exists and CAPs do.

I should be able to ask the source to give me a file, without extension, 
and have it tell me what MIME type it is and other info, and process it 
based on that. Not having that, we use resource-exists. If you have a 
better method of probing, I'm all for it.
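
In sitemap terms that probing amounts to something like the following sketch 
(paths, extensions and pipelines purely illustrative):

   <map:match pattern="**">
     <!-- is there a static PDF with this name? then read it as-is -->
     <map:act type="resource-exists">
       <map:parameter name="url" value="content/{1}.pdf"/>
       <map:read src="content/{../1}.pdf" mime-type="application/pdf"/>
     </map:act>
     <!-- otherwise fall back to the XML source and the normal pipeline -->
     <map:generate src="content/xdocs/{1}.xml"/>
     <map:transform src="skins/default/document2html.xsl"/>
     <map:serialize type="html"/>
   </map:match>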

If file systems had proper metadata, we wouldn't need all this, but 
these "hacks" as you call them are necessary given the reality of things.


-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Jeff Turner <je...@apache.org>.
On Mon, Dec 16, 2002 at 04:30:47PM +0100, Nicola Ken Barozzi wrote:
...
> >The way it works is a hack. I like the file: approach much better, 
> 
> Why?
> 
> I'm a user. I take a file. Put it in the directory. Link to it. See it 
> in the result.
> 
> What do you not like of this? Why is it better if I write the link with 
> file: in it?

You are perfectly right, this _should_ be how it works.  It is simple and
intuitive.

But think about it: when you said "I take a file. Put it in the
directory. Link to it.", you're admitting that you're linking to the
_Source_ URI.  Which is good, because you shouldn't be relying on the
destination location.  But unfortunately, unprefixed links have a
'cocoon:' scheme, so <link href="index.pdf"> will not link to
src/documentation/content/index.pdf.  That is why we need this file:
prefix.


--Jeff

Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Jeff Turner <je...@apache.org>.
On Wed, Dec 18, 2002 at 04:57:00PM +0100, Nicola Ken Barozzi wrote:
> 
> Jeff Turner wrote:
> >On Thu, Dec 19, 2002 at 02:17:40AM +1100, Jeff Turner wrote:
> >...
> >
> >>Popping the argument stack a bit, remember that this whole silly example
> >>of index.xml/index.pdf is a pathological case, that won't have the
> >>desired effect no matter what the URI is.  You have ignored my main
> >>argument, that the 'cocoon:' prefix is implicit and _conceptually_ a
> >>file: scheme is required.
> >
> >For your convenience, here is the conceptual justification for 'file:',
> >11 emails ago:
> [...]
> ><<<<<
> >
> >To that, your response started:
> >
> >>First distinction: schemes are not IMV in the source URI space, but in
> >>the destination URI space
> >
> >In the intervening 11 emails, I hope I have at least convinced you of the
> >wrongness of that statement, and hence the position you held back then,
> >based on it.
> 
> I have already said that I have changed my mind on this particular 
> point.

Then do please respond to the snippet, and point out exactly where my
logic fails.  It is a clear set of logical inferences.

> Moreover, there were other comments in the letter, and on the results
> of the discussion about those I haven't changed my mind.
> 
> A part that is still being discussed, for example, started here
> 
> "...since we have decided that link URIs should not end in extensions, 
> because of many reasons one of which is the fact that a URI can 
> reference different formats at different times in history, having a 
> scheme that effectively makes me serve two different versions of the 
> same file is totally off-target.
> "

Extensions describe _what_ the file contents is.  Schemes describe how to
get the resource.  They are not the same.  The "extensions are bad"
argument (which, if you recall, was my answer to your "lets have multiple
extensions") has no relevance here.  I described at length the solution
to "different formats at different times": have multiple output URIs.
However that is an implementation issue; the conceptual issue is the bit
you ignored the first time, and snipped this time.


> Address those. I do change my mind. But I have to be convinced, as 
> everyone here.

Strangely I don't see them -1'ing things.


--Jeff

> Don't try to short-circuit the discussion because it simply doesn't
> work.
> 

Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:
> On Thu, Dec 19, 2002 at 02:17:40AM +1100, Jeff Turner wrote:
> ...
> 
>>Popping the argument stack a bit, remember that this whole silly example
>>of index.xml/index.pdf is a pathological case, that won't have the
>>desired effect no matter what the URI is.  You have ignored my main
>>argument, that the 'cocoon:' prefix is implicit and _conceptually_ a
>>file: scheme is required.
> 
> For your convenience, here is the conceptual justification for 'file:',
> 11 emails ago:
[...]
> <<<<<
> 
> To that, your response started:
> 
>>First distinction: schemes are not IMV in the source URI space, but in
>>the destination URI space
> 
> In the intervening 11 emails, I hope I have at least convinced you of the
> wrongness of that statement, and hence the position you held back then,
> based on it.

I have already said that I have changed my mind on this particular 
point. Moreover, there were other comments in the letter, and on the 
results of the discussion about those I haven't changed my mind.

A part that is still being discussed, for example, started here

"...since we have decided that link URIs should not end in extensions, 
because of many reasons one of which is the fact that a URI can 
reference different formats at different times in history, having a 
scheme that effectively makes me serve two different versions of the 
same file is totally off-target.
"

Address those. I do change my mind. But I have to be convinced, as 
everyone here. Don't try to short-circuit the discussion because it 
simply doesn't work.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Jeff Turner <je...@apache.org>.
On Thu, Dec 19, 2002 at 02:17:40AM +1100, Jeff Turner wrote:
...
> Popping the argument stack a bit, remember that this whole silly example
> of index.xml/index.pdf is a pathological case, that won't have the
> desired effect no matter what the URI is.  You have ignored my main
> argument, that the 'cocoon:' prefix is implicit and _conceptually_ a
> file: scheme is required.

For your convenience, here is the conceptual justification for 'file:',
11 emails ago:

>>>>>
> Why would we need to rewrite "file:"s?

Given the above definition, what do you think the implied scheme for
<link href="hello.pdf"> is?  What syntactic and semantic restrictions are
there?  Can we link to anything?  No: we can only link to URIs defined by
sitemap rules.  Therefore the implied scheme is 'cocoon:'.  I need to
invoke Cocoon to get 'hello.pdf'.  If my editor were written in Java as
an Avalon component, it might really be able to invoke Cocoon and
retrieve 'hello.pdf'.

What about when a file is sitting on my harddisk?  Do I need Cocoon to
view it?  No; I can open it in an editor.  Hence the 'file:' protocol is
implied.  In fact, in vim I can type 'gf' and automatically traverse the
link.  My editor is a 'browser' of the Source URI space, just like
Mozilla browses the Destination URI space.

That is the important concept: the Source URI space is distinct from the
Destination URI space.  In the Source URI space (XML docs + <link>
elems), we have all sorts of schemes (linkmap:, java:, file:, person:
etc), but in the Destination URI space (HTML docs + <a> elems), we have
only one protocol, usually http: or file:.

I described this notion of separating the Source and Destination URI
space in a RT: http://marc.theaimsgroup.com/?t=103959284100002&r=1&w=2

So that is the theory: it is better to have an explicit file: scheme,
because it distinguishes those URIs from the implied 'cocoon:' scheme,
and fits in better in a world where there are schemes everywhere.

<<<<<

To that, your response started:

> First distinction: schemes are not IMV in the source URI space, but in
> the destination URI space

In the intervening 11 emails, I hope I have at least convinced you of the
wrongness of that statement, and hence the position you held back then,
based on it.


--Jeff


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Jeff Turner <je...@apache.org>.
On Wed, Dec 18, 2002 at 04:52:34PM +0100, Nicola Ken Barozzi wrote:
> 
> Jeff Turner wrote:
> >On Wed, Dec 18, 2002 at 03:23:03PM +0100, Nicola Ken Barozzi wrote:
> >...
> >
> >>>Firstly: do you agree that there _are_ two Sources?  That the user
> >>>_could_ create an index.pdf?  In fact, considering that the user isn't
> >>>meant to know that index.xml even *has* a PDF rendition, why shouldn't
> >>>they create an index.pdf?
> >>
> >>I don't agree here. The user creates documents to explain a concept. 
> >>"index" means it's the index.
> >
> >Since when do semantics come into the business of ensuring every source
> >has a URI?
> 
> A source is a piece of information. The name is a token that identifies 
> that piece of information.

Identification has absolutely zippo to do with meaning.  I can create
good URIs and I can create bad URIs.  Forrest should allow both, but
discourage the latter.

> It is placed in a context that is also named (directory). Where you
> place it has a sense -> semantics. The path is a moniker to what the
> piece of information *means*.
> 
> >Fact: users _can_ create an index.pdf.  Whether this is a good idea is
> >irrelevant: as a source of content, it deserves a source URI.
> 
> I'd say that from the discussion it comes out that users should not be 
> allowed to do it, and that a check should be done as part of validation 
> to ensure that double-named files are not there.
>
> >We can
> >then say, "by the way, it's really dumb creating index.pdf when you've
> >got index.xml", but that's a layer above the raw URI space addressing
> >issue.
> 
> Not IMHO. Since we decided to link to "concepts", we have actually IMHO 
> decided that it's the filename that identifies the file, without the 
> extension.

That does not follow at all.  *Only* URIs starting with 'linkmap:' are
semantic URIs.  A linkmap is a map from semantic addresses to source
filenames.  Let's say we have the following linkmap:

<site>
  <welcome src="index.xml"/>
  <product_catalog src="index.pdf"/>
</site>

A contrived example: imagine I have a product cataloging tool that
insists on naming its output 'index.pdf'.  With the above linkmap, I have
mapped two different concepts to two different sources.  Who cares if the
filenames are similar?


--Jeff

Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:
> On Wed, Dec 18, 2002 at 03:23:03PM +0100, Nicola Ken Barozzi wrote:
> ...
> 
>>>Firstly: do you agree that there _are_ two Sources?  That the user
>>>_could_ create an index.pdf?  In fact, considering that the user isn't
>>>meant to know that index.xml even *has* a PDF rendition, why shouldn't
>>>they create an index.pdf?
>>
>>I don't agree here. The user creates documents to explain a concept. 
>>"index" means it's the index.
> 
> Since when do semantics come into the business of ensuring every source
> has a URI?

A source is a piece of information. The name is a token that identifies 
that piece of information. It is placed in a context that is also named 
(directory). Where you place it has a sense -> semantics. The path is a 
moniker to what the piece of information *means*.

> Fact: users _can_ create an index.pdf.  Whether this is a good idea is
> irrelevant: as a source of content, it deserves a source URI.

I'd say that from the discussion it comes out that users should not be 
allowed to do it, and that a check should be done as part of validation 
to ensure that double-named files are not there.

> We can
> then say, "by the way, it's really dumb creating index.pdf when you've
> got index.xml", but that's a layer above the raw URI space addressing
> issue.

Not IMHO. Since we decided to link to "concepts", we have actually IMHO 
decided that it's the filename that identifies the file, without the 
extension.

> Popping the argument stack a bit, remember that this whole silly example
> of index.xml/index.pdf is a pathological case, that won't have the
> desired effect no matter what the URI is.  You have ignored my main
> argument, that the 'cocoon:' prefix is implicit and _conceptually_ a
> file: scheme is required.

I have not ignored it. I keep thinking that conceptually the file scheme 
is not required, for all the reasons I have explained.

Yes, the 'cocoon:' prefix is implicit. No, _conceptually_ it's not 
required *if* we decide that we cannot have more than one source file.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Jeff Turner <je...@apache.org>.
On Wed, Dec 18, 2002 at 03:23:03PM +0100, Nicola Ken Barozzi wrote:
...
> >Firstly: do you agree that there _are_ two Sources?  That the user
> >_could_ create an index.pdf?  In fact, considering that the user isn't
> >meant to know that index.xml even *has* a PDF rendition, why shouldn't
> >they create an index.pdf?
> 
> I don't agree here. The user creates documents to explain a concept. 
> "index" means it's the index.

Since when do semantics come into the business of ensuring every source
has a URI?

Fact: users _can_ create an index.pdf.  Whether this is a good idea is
irrelevant: as a source of content, it deserves a source URI.  We can
then say, "by the way, it's really dumb creating index.pdf when you've
got index.xml", but that's a layer above the raw URI space addressing
issue.

Popping the argument stack a bit, remember that this whole silly example
of index.xml/index.pdf is a pathological case, that won't have the
desired effect no matter what the URI is.  You have ignored my main
argument, that the 'cocoon:' prefix is implicit and _conceptually_ a
file: scheme is required.


--Jeff

> Who cares what the rendition is.
> Imagine the user making an index.xml and index.xhtml file in the same 
> dir. Does it make sense?
> 
> >Secondly, do you agree that conceptually, any source of content should be
> >assigned a Source URI?  _Regardless_ of whether it has a Destination URI?
> >Because Source and Destination URI spaces have no direct relation.  Heck,
> >I could generate a single PDF containing the entire site, thus mapping
> >lots of Source URIs to a single Destination URI.
> 
> Yes, on this I agree. We should always link to source URIs, so that what 
> you explain about a single PDF can be possible. And it's also easier for 
> the user. +1
> 
> -- 
> Nicola Ken Barozzi                   nicolaken@apache.org
>             - verba volant, scripta manent -
>    (discussions get forgotten, just code remains)
> ---------------------------------------------------------------------
> 

Re: PDF transforms (was: Re: File prefix again)

Posted by Jeremias Maerki <de...@greenmail.ch>.
Hi Keiron

On 20.12.2002 09:41:52 Keiron Liddle wrote:
> On Thu, 2002-12-19 at 21:15, Jeremias Maerki wrote:
> > All cool, but how exactly is that better than having a PDF template that
> > is stitched behind or in front of the FOP result using iText or PJ?
> > Works well. Ok, PDF reading with our own library is a bonus as is better
> > XML output for debugging. But I don't see any immediate need for this at
> > the moment given our limited resources. Or do I miss anything?
> 
> Well I'm not really suggesting it is high priority, just an idea.
> One thing is that the XML and the additions can work both in and out of
> Fop.
>
> At least outputting SAX in the XMLRenderer would probably be an
> improvement.

Ok then. Will you put this on the todo list?

Jeremias Maerki




Re: PDF transforms (was: Re: File prefix again)

Posted by Keiron Liddle <ke...@aftexsw.com>.
On Thu, 2002-12-19 at 21:15, Jeremias Maerki wrote:
> All cool, but how exactly is that better than having a PDF template that
> is stitched behind or in front of the FOP result using iText or PJ?
> Works well. Ok, PDF reading with our own library is a bonus as is better
> XML output for debugging. But I don't see any immediate need for this at
> the moment given our limited resources. Or do I miss anything?

Well I'm not really suggesting it is high priority, just an idea.
One thing is that the XML and the additions can work both in and out of
Fop.

At least outputting SAX in the XMLRenderer would probably be an
improvement.






Re: PDF transforms (was: Re: File prefix again)

Posted by Jeremias Maerki <de...@greenmail.ch>.
All cool, but how exactly is that better than having a PDF template that
is stitched behind or in front of the FOP result using iText or PJ?
Works well. Ok, PDF reading with our own library is a bonus as is better
XML output for debugging. But I don't see any immediate need for this at
the moment given our limited resources. Or do I miss anything?

On 19.12.2002 08:05:54 Keiron Liddle wrote:
> On Wed, 2002-12-18 at 15:23, Nicola Ken Barozzi wrote:
> > > I don't get this.  How can PDFs be transformed?
> > 
> > There are Java libraries that read PDFs. What would be really cool is to 
> > have a reader or something like it that uses a PDF as a template.
> > Using FOP for just filling out forms is overkill, we just need templating.
> > 
> > This is a general use case of PDF transformation, and another that I 
> > would really like to see is to generate a "non-controlled copy" stamp on 
> > the PDF for the management of ISO9001 documentation.
> > 
> > Or simply by adding a copyright statement.
> 
> Sounds like some good ideas.
> 
> It would be possible to do some work with Fop so that it can:
> - convert xsl:fo to paged xml
> - convert paged xml to pdf (or other formats)
> - define templates with the paged xml
> - append paged xml to a current document
> 
> So it would be possible to create the paged xml from fo. Then to do a
> transform or directly convert or append the paged xml to pdf.
> Also the extensions and foreign xml can be passed through directly so
> that both formats support the same extensions, such as svg.
> 
> So the changes that would need to be made are:
> - improve and update xml renderer so that it can output SAX
> - improve and update AreaTreeBuilder so that it takes SAX input
> - make some additions to the pdf lib so it can load and read pdf
> documents
> 
> Then it shouldn't be so hard to add in extensions for pdf forms etc.


Jeremias Maerki




AW: PDF transforms (was: Re: File prefix again)

Posted by "J.U. Anderegg" <ha...@bluewin.ch>.
Hi Keiron,

> On Sun, 2002-12-22 at 02:18, Kevin O'Neill wrote:
> > Is the paged XML a new or existing format?
>
> A new format for now at least.
>
> It is possible there will be a w3c defined format.

Please give some pointers to w3c activities in this area. What is this thing
exactly supposed to do? What do externals have to look like? etc...

Hansuli Anderegg





Re: PDF transforms (was: Re: File prefix again)

Posted by Keiron Liddle <ke...@aftexsw.com>.
On Sun, 2002-12-22 at 02:18, Kevin O'Neill wrote:
> > It would be possible to do some work with Fop so that it can:
> > - convert xsl:fo to paged xml
> 
> Is the paged XML a new or existing format?

A new format for now at least.

It is possible there will be a w3c defined format.





Re: PDF transforms (was: Re: File prefix again)

Posted by Kevin O'Neill <ke...@rocketred.com.au>.
On Thu, 2002-12-19 at 18:05, Keiron Liddle wrote:
> On Wed, 2002-12-18 at 15:23, Nicola Ken Barozzi wrote:
> > > I don't get this.  How can PDFs be transformed?
> > 
> > There are Java libraries that read PDFs. What would be really cool is to 
> > have a reader or something like it that uses a PDF as a template.
> > Using FOP for just filling out forms is overkill, we just need templating.
> > 
> > This is a general use case of PDF transformation, and another that I 
> > would really like to see is to generate a "non-controlled copy" stamp on 
> > the PDF for the management of ISO9001 documentation.
> > 
> > Or simply by adding a copyright statement.
> 
> Sounds like some good ideas.
> 
> It would be possible to do some work with Fop so that it can:
> - convert xsl:fo to paged xml

Is the paged XML a new or existing format?

> - convert paged xml to pdf (or other formats)
> - define templates with the paged xml
> - append paged xml to a current document
> 
> So it would be possible to create the paged xml from fo. Then to do a
> transform or directly convert or append the paged xml to pdf.
> Also the extensions and foreign xml can be passed through directly so
> that both formats support the same extensions, such as svg.
> 
> So the changes that would need to be made are:
> - improve and update xml renderer so that it can output SAX
> - improve and update AreaTreeBuilder so that it takes SAX input
> - make some additions to the pdf lib so it can load and read pdf
> documents
> 
> Then it shouldn't be so hard to add in extensions for pdf forms etc.
> 
> 
> 
-- 
If you don't test then your code is only a collection of bugs which 
apparently behave like a working program. 

Website: http://www.rocketred.com.au/blogs/kevin/




PDF transforms (was: Re: File prefix again)

Posted by Keiron Liddle <ke...@aftexsw.com>.
On Wed, 2002-12-18 at 15:23, Nicola Ken Barozzi wrote:
> > I don't get this.  How can PDFs be transformed?
> 
> There are Java libraries that read PDFs. What would be really cool is to 
> have a reader or something like it that uses a PDF as a template.
> Using FOP for just filling out forms is overkill, we just need templating.
> 
> This is a general use case of PDF transformation, and another that I 
> would really like to see is to generate a "non-controlled copy" stamp on 
> the PDF for the management of ISO9001 documentation.
> 
> Or simply by adding a copyright statement.

Sounds like some good ideas.

It would be possible to do some work with Fop so that it can:
- convert xsl:fo to paged xml
- convert paged xml to pdf (or other formats)
- define templates with the paged xml
- append paged xml to a current document

So it would be possible to create the paged xml from fo. Then to do a
transform or directly convert or append the paged xml to pdf.
Also the extensions and foreign xml can be passed through directly so
that both formats support the same extensions, such as svg.

So the changes that would need to be made are:
- improve and update xml renderer so that it can output SAX
- improve and update AreaTreeBuilder so that it takes SAX input
- make some additions to the pdf lib so it can load and read pdf
documents

Then it shouldn't be so hard to add in extensions for pdf forms etc.




Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Jeff Turner wrote:
> On Tue, Dec 17, 2002 at 03:07:57PM +0100, Nicola Ken Barozzi wrote:
> 
>>Jeff Turner wrote:
>>
>>>On Mon, Dec 16, 2002 at 04:08:37PM +0100, Nicola Ken Barozzi wrote:

[...]

>>Static or generated, there is no difference. The user should not even know 
>>if Cocoon does something with it.
> 
> Yes! +1000.  But first, the user needs to identify what "it" is.  Is "it"
> the PDF rendition of index.xml, or the index.pdf file sitting on my
> harddisk?  They are two different Sources, containing completely
> different content, and they deserve different Source URIs.

My point is that there should be just one "index" file, whatever 
extension it has.

>>This is important. This is why I say that you are mixing concerns.
> 
> Identifying the source is the user's concern.  That is the I in URI.  We
> have two different Sources, we need two different URIs.

Exactly the point. I say that we can have only one source with the 
same name. I don't see the need for two.

>>What if the sitemap guy wanted to take the PDF and transform it? 
>>With the file: protocol you are making this impossible. You are taking 
>>away from the sitemap the possibility of doing whatever the heck it wants 
>>with the files.
> 
> I don't get this.  How can PDFs be transformed?

There are Java libraries that read PDFs. What would be really cool is to 
have a reader or something like it that uses a PDF as a template.
Using FOP for just filling out forms is overkill, we just need templating.

This is a general use case of PDF transformation, and another that I 
would really like to see is to generate a "non-controlled copy" stamp on 
the PDF for the management of ISO9001 documentation.

Or simply by adding a copyright statement.

[...]

>Imagine I have
>>
>> ./index.xml
>> ./index.pdf
>>
>>If I link like this
>>
>>  <link href="index"/>
>>
>>Cocoon serves only one, as defined in the sitemap rules.
>>
>>If I introduce the file: protocol, I can do:
>>
>>  <link href="index"/>           ->  serve index.xml
>>  <link href="site:index.pdf"/>  ->  serve index.pdf
>>
>>Problem is, how can the browser ask for
>>
>>  http://domain.ext/path/to/index
>>
>>and have one or other result?
>>
>>What would the above URL yield?
> 
> 
> Excellent point :)  One I completely missed.  So you're saying that
> disambiguating 'cocoon:index.pdf' and 'file:index.pdf' is well and good,
> but it causes a name clash in the Destination URI space.
> 
> Simple enough answer: we need to create two destination URIs, because
> there are two Source URIs.  Eg, generate:
> 
> http://localhost:8888/index.pdf    # The static index.pdf
> http://localhost:8888/index~.pdf   # index.pdf generated from XML   
> 
> But this is an implementation detail.  What I'm concerned about now is
> whether disambiguating the sources makes sense _conceptually_.
> 
> So say we have two distinct Source URIs: a static index.pdf file, and the
> PDF rendition of index.xml.  In "ideal world" syntax, we can write those
> two as:
> 
> <link href="index.pdf">
> <link href="index.xml" type="application/pdf">
> 
> In "real world: Jeff style" syntax, they'd be written as:
> 
> <link href="file:index.pdf">
> <link href="index.pdf">
> 
> In "real world: Nicola style" syntax, there'd just be:
> 
> <link href="index.pdf">
> 
> and you simply can't have an index.pdf file.

Not exactly.
If you have index.xml, that becomes the index.pdf.
If you have index.pdf, that becomes the index.pdf.

One filename, one result.

> Firstly: do you agree that there _are_ two Sources?  That the user
> _could_ create an index.pdf?  In fact, considering that the user isn't
> meant to know that index.xml even *has* a PDF rendition, why shouldn't
> they create an index.pdf?

I don't agree here. The user creates documents to explain a concept. 
"index" means it's the index. Who cares what the rendition is.
Imagine the user making an index.xml and index.xhtml file in the same 
dir. Does it make sense?

> Secondly, do you agree that conceptually, any source of content should be
> assigned a Source URI?  _Regardless_ of whether it has a Destination URI?
> Because Source and Destination URI spaces have no direct relation.  Heck,
> I could generate a single PDF containing the entire site, thus mapping
> lots of Source URIs to a single Destination URI.

Yes, on this I agree. We should always link to source URIs, so that what 
you explain about a single PDF can be possible. And it's also easier for 
the user. +1

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Jeff Turner <je...@apache.org>.
On Tue, Dec 17, 2002 at 03:07:57PM +0100, Nicola Ken Barozzi wrote:
> 
> Jeff Turner wrote:
> >On Mon, Dec 16, 2002 at 04:08:37PM +0100, Nicola Ken Barozzi wrote:
> [...]
> >
> >Your view is perfectly clear and simple: schemes are aliasing mechanisms
> >to simplify linking to the destination URI space.
> >
> >My view only makes sense once you a) buy into the notion that the Source
> >URI space exists and is distinct from the Destination URI space, b)
> >understand that, given a), the implied *source* protocol for links is
> >currently 'cocoon:'.  Only then does the reason for file: become
> >apparent: static links do _not_ have the implied 'cocoon:' scheme.  We
> >need a different scheme to disambiguate, say, a static index.pdf, and an
> >index.pdf generated from index.xml.
> 
> Static or generated, there is no difference. The user should not even know 
> if Cocoon does something with it.

Yes! +1000.  But first, the user needs to identify what "it" is.  Is "it"
the PDF rendition of index.xml, or the index.pdf file sitting on my
harddisk?  They are two different Sources, containing completely
different content, and they deserve different Source URIs.

> This is important. This is why I say that you are mixing concerns.

Identifying the source is the user's concern.  That is the I in URI.  We
have two different Sources, we need two different URIs.

> What if the sitemap guy wanted to take the PDF and transform it? 
> With the file: protocol you are making this impossible. You are taking 
> away from the sitemap the possibility of doing whatever the heck it wants 
> with the files.

I don't get this.  How can PDFs be transformed?

...
> >>>Secondly, introducing a 'file:' prefix fixes the current name clash
> >>>problem.  What if I have a static file called 'index.pdf'?  How do I
> >>>access the index.pdf generated from XML?  I can't, because the
> >>>resource-exists will always choose for me.
> >>
> >>Which is another seemingly good point, but since we have decided that 
> >>link URIs should not end in extensions, because of many reasons one of 
> >>which is the fact that a URI can reference different formats at 
> >>different times in history, having a scheme that effectively makes me 
> >>serve two different versions of the same file is totally off-target.
> >
> >See above.  There is _no way_ that a sitemap, with MIMETypeActions and
> >resource-exists and any other crazy hacks you care to name, can 100%
> >correctly choose between a static index.pdf and one generated from
> >index.xml.  Simply cannot, because there is missing info only the user
> >knows.  That is what the file: prefix adds.
> 
> Reread my point.
> 
> Imagine I have
> 
>  ./index.xml
>  ./index.pdf
> 
> If I link like this
> 
>   <link href="index"/>
> 
> Cocoon serves only one, as defined in the sitemap rules.
> 
> If I introduce the file: protocol, I can do:
> 
>   <link href="index"/>           ->  serve index.xml
>   <link href="site:index.pdf"/>  ->  serve index.pdf
> 
> Problem is, how can the browser ask for
> 
>   http://domain.ext/path/to/index
> 
> and have one or other result?
> 
> What would the above URL yield?

Excellent point :)  One I completely missed.  So you're saying that
disambiguating 'cocoon:index.pdf' and 'file:index.pdf' is well and good,
but it causes a name clash in the Destination URI space.

Simple enough answer: we need to create two destination URIs, because
there are two Source URIs.  Eg, generate:

http://localhost:8888/index.pdf    # The static index.pdf
http://localhost:8888/index~.pdf   # index.pdf generated from XML   

But this is an implementation detail.  What I'm concerned about now is
whether disambiguating the sources makes sense _conceptually_.

So say we have two distinct Source URIs: a static index.pdf file, and the
PDF rendition of index.xml.  In "ideal world" syntax, we can write those
two as:

<link href="index.pdf">
<link href="index.xml" type="application/pdf">

In "real world: Jeff style" syntax, they'd be written as:

<link href="file:index.pdf">
<link href="index.pdf">

In "real world: Nicola style" syntax, there'd just be:

<link href="index.pdf">

and you simply can't have an index.pdf file.

Firstly: do you agree that there _are_ two Sources?  That the user
_could_ create an index.pdf?  In fact, considering that the user isn't
meant to know that index.xml even *has* a PDF rendition, why shouldn't
they create an index.pdf?

Secondly, do you agree that conceptually, any source of content should be
assigned a Source URI?  _Regardless_ of whether it has a Destination URI?
Because Source and Destination URI spaces have no direct relation.  Heck,
I could generate a single PDF containing the entire site, thus mapping
lots of Source URIs to a single Destination URI.

If you agree to both of those, then you'll agree that adding a file:
prefix to address static files makes conceptual sense.  If, in
pathological cases, that causes conflicts in the destination URI space,
well that's too bad; we'll fix it eventually.  Conceptually we did the
right thing.


--Jeff


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:
> On Mon, Dec 16, 2002 at 04:08:37PM +0100, Nicola Ken Barozzi wrote:
[...]
> 
> Your view is perfectly clear and simple: schemes are aliasing mechanisms
> to simplify linking to the destination URI space.
> 
> My view only makes sense once you a) buy into the notion that the Source
> URI space exists and is distinct from the Destination URI space, b)
> understand that, given a), the implied *source* protocol for links is
> currently 'cocoon:'.  Only then does the reason for file: become
> apparent: static links do _not_ have the implied 'cocoon:' scheme.  We
> need a different scheme to disambiguate, say, a static index.pdf, and an
> index.pdf generated from index.xml.

Static or generated, there is no difference. The user should not even know 
if Cocoon does something with it. This is important. This is why I say 
that you are mixing concerns.

What if the sitemap guy wanted to take the PDF and transform it? With 
the file: protocol you are making this impossible. You are taking away 
from the sitemap the possibility of doing whatever the heck it wants 
with the files.

>>>I described this notion of separating the Source and Destination URI
>>>space in a RT: http://marc.theaimsgroup.com/?t=103959284100002&r=1&w=2
>>
>>I read it, and I basically agree with it, except the above distinction 
>>which wasn't clear to me in the first place.
>>
>>
>>>So that is the theory: it is better to have an explicit file: scheme,
>>>because it distinguishes those URIs from the implied 'cocoon:' scheme,
>>>and fits in better in a world where there are schemes everywhere.
>>
>>Please expand on this. Do you mean file scheme=sources and cocoon 
>>scheme=resulting URI space?
> 
> Yes.
> 
> In a perfect world, the default scheme would be file:, not cocoon:.  So
> we could have <link href="primer.xml">, or <link href="hello.pdf">.
> Then, a linkmap would genuinely be an aliasing mechanism, but aliasing in
> the _Source_ URI space.  Eg, <link href="site:/primer"> would be exactly
> equivalent to <link href="primer.xml"> (or ../primer.xml or
> ../../primer.xml etc).  Ignore this paragraph if it doesn't make sense..

It kinda does.
I buy into the idea that I should link only to source files, and have 
the resulting URI space created by the sitemap. But I don't buy that in 
the perfect world I would use the extension to reference the file, 
because of the last comment below.

>>>Practically, right now, what is the difference?
>>>
>>>Well for a start, if we consistently used 'file:' for URIs identifying
>>>static files, we could throw away the current resource-exists action:
>>>
>>> <map:match pattern="**">
>>>
>>>   <map:act type="resource-exists">
>>>    <map:parameter name="url" value="content/{1}"/>
>>>    <map:read src="content/{../1}"/>
>>>   </map:act>
>>>   ....
>>>
>>>And replace it with a simple sitemap rule:
>>>
>>> <map:match pattern="file:**">
>>>   <map:read src="content/{1}"/>
>>> </map:match>
>>
>>Which is something I don't like.
>>
>>Again, you are telling Cocoon how to treat that file, which is not a 
>>concern of the editor.
> 
> The implied URI scheme is 'cocoon:'.  By adding a 'file:' prefix, the
> user is saying "no, this file is local".  There is nothing wrong with
> this, and no other way to distinguish between, say, a static index.pdf
> and one generated from index.xml.  

And there should not be. See below again.

>>>Having to interrogate the filesystem to decide a URI's scheme is a total 
>>>hack.
>>>What happens if our docs are stored in Xindice, or anything other than a
>>>filesystem?  Resource-exists is going to break.
>>
>>Hmmm, this is a good point, but not a resource-exists "conceptual" 
>>problem. I can test if a resource exists also in remote repositories.
>>If the "file:" thing takes care different backends, there is no reason 
>>why a better resource-exists cannot. So seems is more about the 
>>deficiencies of the resource-exists implementation rather than the need 
>>of a site: scheme.
> 
> 
> Say I want to link to a static index.pdf, but I forget to create it.  I
> want that link to break!  I don't want Cocoon to be clever, and create
> one from index.xml.  Resource-exists is an utter hack that doesn't
> (cannot!) meet use-cases like this, because ultimately, only the user can
> know if they are referring to a local file, or one generated by Cocoon.

Given that we have ruled out extensions in the links, there can be only 
one file with the same name in the same dir. Hence there is no ambiguity.

>>>Secondly, introducing a 'file:' prefix fixes the current name clash
>>>problem.  What if I have a static file called 'index.pdf'?  How do I
>>>access the index.pdf generated from XML?  I can't, because the
>>>resource-exists will always choose for me.
>>
>>Which is another seemingly good point, but since we have decided that 
>>link URIs should not end in extensions, because of many reasons one of 
>>which is the fact that a URI can reference different formats at 
>>different times in history, having a scheme that effectively makes me 
>>serve two different versions of the same file is totally off-target.
> 
> See above.  There is _no way_ that a sitemap, with MIMETypeActions and
> resource-exists and any other crazy hacks you care to name, can 100%
> correctly choose between a static index.pdf and one generated from
> index.xml.  Simply cannot, because there is missing info only the user
> knows.  That is what the file: prefix adds.

Reread my point.

Imagine I have

  ./index.xml
  ./index.pdf

If I link like this

   <link href="index"/>

Cocoon serves only one, as defined in the sitemap rules.

If I introduce the file: protocol, I can do:

   <link href="index"/>           ->  serve index.xml
   <link href="site:index.pdf"/>  ->  serve index.pdf

Problem is, how can the browser ask for

   http://domain.ext/path/to/index

and have one or other result?

What would the above URL yield?

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Jeff Turner <je...@apache.org>.
On Mon, Dec 16, 2002 at 04:08:37PM +0100, Nicola Ken Barozzi wrote:
...
> >>Why would we need to rewrite "file:"s?
> >
> >Given the above definition, what do you think the implied scheme for
> ><link href="hello.pdf"> is?  What syntactic and semantic restrictions are
> >there?  Can we link to anything?  No: we can only link to URIs defined by
> >sitemap rules.  Therefore the implied scheme is 'cocoon:'.  I need to
> >invoke Cocoon to get 'hello.pdf'.  If my editor were written in Java as
> >an Avalon component, it might really be able to invoke Cocoon and
> >retrieve 'hello.pdf'.
> >
> >What about when a file is sitting on my harddisk?  Do I need Cocoon to
> >view it?  No; I can open it in an editor.  Hence the 'file:' protocol is
> >implied.  In fact, in vim I can type 'gf' and automatically traverse the
> >link.  My editor is a 'browser' of the Source URI space, just like
> >Mozilla browses the Destination URI space.
> >
> >That is the important concept: the Source URI space is distinct from the
> >Destination URI space.  In the Source URI space (XML docs + <link>
> >elems), we have all sorts of schemes (linkmap:, java:, file:, person:
> >etc), but in the Destination URI space (HTML docs + <a> elems), we have
> >only one protocol, usually http: or file:.
> 
> First distinction: schemes are not IMV in the source URI space, but in 
> the destination URI space

In the destination URI space (HTML files), all our linkmap:, java:,
person:, mail: schemes have vanished.  They only exist in the source URI
space (XML files).

> hence my definition of link rewriting. Links are always seen from the
> outside IMV.

I edit XML files, which are source docs.  I edit the source links.
Currently, most source links are identical to destination links, but that
is what will change completely once we introduce schemes.  There is no
way you can pretend <link href="linkmap:/primer"> is a destination link,
because browsers don't understand the 'linkmap' protocol.  Only Cocoon
can.  Just as Cocoon translates source docs (XML) to destination docs
(HTML), it translates source URIs (link:, java:, etc URIs) to destination
URIs.
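
For instance (the class name and rewritten paths are purely illustrative), a 
source document might contain

  <link href="linkmap:/primer">the primer</link>
  <link href="java:org.apache.foo.SomeClass">its implementation</link>

while in the generated HTML only destination URIs remain:

  <a href="primer.html">the primer</a>
  <a href="apidocs/org/apache/foo/SomeClass.html">its implementation</a>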

> With this in mind, you can infer why I don't see the need for a file:
> scheme.
> 
> Thus I link to the resulting URI space, not the source one.

You do currently.  <link href="primer.html"> is a link to the destination
URI space.  But we have agreed that that is wrong.

> The resulting URI space can be complicated, so to ease the linking I
> use schemes to make linking easier.
> 
> Well, it may well not be the best thing to do, but this is what 
> I've been saying till now, so I see why we didn't really understand each 
> other.

Your view is perfectly clear and simple: schemes are aliasing mechanisms
to simplify linking to the destination URI space.

My view only makes sense once you a) buy into the notion that the Source
URI space exists and is distinct from the Destination URI space, b)
understand that, given a), the implied *source* protocol for links is
currently 'cocoon:'.  Only then does the reason for file: become
apparent: static links do _not_ have the implied 'cocoon:' scheme.  We
need a different scheme to disambiguate, say, a static index.pdf, and an
index.pdf generated from index.xml.

> >I described this notion of separating the Source and Destination URI
> >space in a RT: http://marc.theaimsgroup.com/?t=103959284100002&r=1&w=2
> 
> I read it, and I basically agree with it, except the above distinction 
> which wasn't clear to me in the first place.
> 
> >So that is the theory: it is better to have an explicit file: scheme,
> >because it distinguishes those URIs from the implied 'cocoon:' scheme,
> >and fits in better in a world where there are schemes everywhere.
> 
> Please expand on this. Do you mean file scheme=sources and cocoon 
> scheme=resulting URI space?

Yes.

In a perfect world, the default scheme would be file:, not cocoon:.  So
we could have <link href="primer.xml">, or <link href="hello.pdf">.
Then, a linkmap would genuinely be an aliasing mechanism, but aliasing in
the _Source_ URI space.  Eg, <link href="site:/primer"> would be exactly
equivalent to <link href="primer.xml"> (or ../primer.xml or
../../primer.xml etc).  Ignore this paragraph if it doesn't make sense..

> >Practically, right now, what is the difference?
> >
> >Well for a start, if we consistently used 'file:' for URIs identifying
> >static files, we could throw away the current resource-exists action:
> >
> >  <map:match pattern="**">
> >
> >    <map:act type="resource-exists">
> >     <map:parameter name="url" value="content/{1}"/>
> >     <map:read src="content/{../1}"/>
> >    </map:act>
> >    ....
> >
> >And replace it with a simple sitemap rule:
> >
> >  <map:match pattern="file:**">
> >    <map:read src="content/{1}"/>
> >  </map:match>
> 
> Which is something I don't like.
> 
> Again, you are telling Cocoon how to treat that file, which is not a 
> concern of the editor.

The implied URI scheme is 'cocoon:'.  By adding a 'file:' prefix, the
user is saying "no, this file is local".  There is nothing wrong with
this, and no other way to distinguish between, say, a static index.pdf
and one generated from index.xml.  The sitemap simply takes advantage of
the lexical difference.

> We decided to take the extension away from files, but this file: thing 
> does the same conceptual thing: it selects, inside the link, the sitemap 
> pipeline to use.

The difference is, the file: scheme is not added to make the sitemap
simpler.  That is just a nice side-effect.

> >Having to interrogate the filesystem to decide a URI's scheme is a total 
> >hack.
> >What happens if our docs are stored in Xindice, or anything other than a
> >filesystem?  Resource-exists is going to break.
> 
> Hmmm, this is a good point, but not a resource-exists "conceptual" 
> problem. I can test if a resource exists also in remote repositories.
> If the "file:" thing takes care different backends, there is no reason 
> why a better resource-exists cannot. So seems is more about the 
> deficiencies of the resource-exists implementation rather than the need 
> of a site: scheme.

Say I want to link to a static index.pdf, but I forget to create it.  I
want that link to break!  I don't want Cocoon to be clever, and create
one from index.xml.  Resource-exists is an utter hack that doesn't
(cannot!) meet use-cases like this, because ultimately, only the user can
know if they are referring to a local file, or one generated by Cocoon.

> >Secondly, introducing a 'file:' prefix fixes the current name clash
> >problem.  What if I have a static file called 'index.pdf'?  How do I
> >access the index.pdf generated from XML?  I can't, because the
> >resource-exists will always choose for me.
> 
> Which is another seemingly good point. But since we have decided that 
> link URIs should not end in extensions (one of the many reasons being 
> that a URI can reference different formats at different times in its 
> history), having a scheme that effectively makes me serve two different 
> versions of the same file is totally off-target.

See above.  There is _no way_ that a sitemap, with MIMETypeActions and
resource-exists and any other crazy hacks you care to name, can 100%
correctly choose between a static index.pdf and one generated from
index.xml.  Simply cannot, because there is missing info only the user
knows.  That is what the file: prefix adds.


--Jeff

Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Steven Noels wrote:
> Nicola Ken Barozzi wrote:
> 
>> Jeff Turner wrote:
> 
> 
>>> Having to interrogate the filesystem to decide a URI's scheme is a 
>>> total hack.
>>> What happens if our docs are stored in Xindice, or anything other than a
>>> filesystem?  Resource-exists is going to break.
> 
> 
>> Hmmm, this is a good point, but not a resource-exists "conceptual" 
>> problem. I can test if a resource exists also in remote repositories.
>> If the "file:" thing takes care different backends, there is no reason 
>> why a better resource-exists cannot. So seems is more about the 
>> deficiencies of the resource-exists implementation rather than the 
>> need of a site: scheme.
> 
> 
> The way resource-exists was brought into Forrest was based on a hackish 
> idea. 

Please explain why.

> The way it works is a hack. I like the file: approach much better, 

Why?

I'm a user. I take a file. Put it in the directory. Link to it. See it 
in the result.

What do you not like about this? Why is it better if I write the link with 
file: in it? Because that will be the only difference for the user.

> and I don't feel like I don't understand Cocoon or anything else because 
> of that. It's on the same level as letting the user put hints in his 
> documents, just as we currently inform people about some obscure XLink 
> attribute which can be set to stop crawling. At the very least, file: 
> will have been designed & coded by a community.

I don't get this.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Steven Noels <st...@outerthought.org>.
Nicola Ken Barozzi wrote:

> Jeff Turner wrote:

>> Having to interrogate the filesystem to decide a URI's scheme is a 
>> total hack.
>> What happens if our docs are stored in Xindice, or anything other than a
>> filesystem?  Resource-exists is going to break.

> Hmmm, this is a good point, but not a resource-exists "conceptual" 
> problem. I can test if a resource exists also in remote repositories.
> If the "file:" thing takes care different backends, there is no reason 
> why a better resource-exists cannot. So seems is more about the 
> deficiencies of the resource-exists implementation rather than the need 
> of a site: scheme.

The way resource-exists was brought into Forrest was based on a hackish 
idea. The way it works is a hack. I like the file: approach much better, 
and I don't feel like I don't understand Cocoon or anything else because 
of that. It's on the same level as letting the user put hints in his 
documents, just as we currently inform people about some obscure XLink 
attribute which can be set to stop crawling. At the very least, file: 
will have been designed & coded by a community.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org


Re: File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:
> On Mon, Dec 16, 2002 at 02:01:52PM +0100, Nicola Ken Barozzi wrote:
> 
>>
>>Jeff Turner wrote:
>>
[...]

>>>The file: patch has two effects:
>>>
>>>- Introduce schemes in xdocs, starting with a 'file:' scheme.  I think
>>>  that schemes in general are uncontroversial.  When linkmaps arrive,
>>>  90% of links are going to be linkmap links, so having a scheme prefix
>>>  should be the norm. 
>>
>>I'm totally for the scheme concept. But schemes are IMHO only 
>>link-rewriting rules, and should not address other concerns.
>>A file: scheme would not do any rewriting, so I don't see the need ATM.
> 
> ...
> 
>>>What we really need to agree on is the first point; whether we want to
>>>prefix static links with 'file:'.  When xdocs are swarming with linkmap:,
>>>java:, person:, mail:, etc links, why not have file:?  Conversely, if we
>>>want to "infer" the file: scheme, are we going to try to infer all the
>>>other schemes?
>>
>>Hmmm, I don't see the big problem here, but I may well be wrong.
>>
>>The schemes are link-rewriting systems.
> 
> Schemes are what the URI RFC defines them to be:
> 
>   "The URI scheme (Section 3.1) defines the namespace of the URI, and
>   thus may further restrict the syntax and semantics of identifiers using
>   that scheme.
>     http://www.ietf.org/rfc/rfc2396.txt

Corrected: Forrest schemes IMV are link-rewriting systems. This is to 
make the resulting URI space completely decoupled from the source space.

>>Why would we need to rewrite "file:"s?
> 
> Given the above definition, what do you think the implied scheme for
> <link href="hello.pdf"> is?  What syntactic and semantic restrictions are
> there?  Can we link to anything?  No: we can only link to URIs defined by
> sitemap rules.  Therefore the implied scheme is 'cocoon:'.  I need to
> invoke Cocoon to get 'hello.pdf'.  If my editor were written in Java as
> an Avalon component, it might really be able to invoke Cocoon and
> retrieve 'hello.pdf'.
> 
> What about when a file is sitting on my harddisk?  Do I need Cocoon to
> view it?  No; I can open it in an editor.  Hence the 'file:' protocol is
> implied.  In fact, in vim I can type 'gf' and automatically traverse the
> link.  My editor is a 'browser' of the Source URI space, just like
> Mozilla browses the Destination URI space.
> 
> That is the important concept: the Source URI space is distinct from the
> Destination URI space.  In the Source URI space (XML docs + <link>
> elems), we have all sorts of schemes (linkmap:, java:, file:, person:
> etc), but in the Destination URI space (HTML docs + <a> elems), we have
> only one protocol, usually http: or file:.

First distinction: schemes are not IMV in the source URI space, but in 
the destination URI space, hence my definition of link rewriting. Links 
are always seen from the outside IMV. With this in mind, you can infer 
why I don't see the need for a file: scheme.

Thus I link to the resulting URI space, not the source one. The 
resulting URI space can be complicated, so to ease the linking I use 
schemes to make linking easier.

Well, it may well not be the best thing to do, but this is what 
I've been saying till now, so I see why we didn't really understand each 
other.

> I described this notion of separating the Source and Destination URI
> space in a RT: http://marc.theaimsgroup.com/?t=103959284100002&r=1&w=2

I read it, and I basically agree with it, except the above distinction 
which wasn't clear to me in the first place.

> So that is the theory: it is better to have an explicit file: scheme,
> because it distinguishes those URIs from the implied 'cocoon:' scheme,
> and fits in better in a world where there are schemes everywhere.

Please expand on this. Do you mean file scheme=sources and cocoon 
scheme=resulting URI space?

> Practically, right now, what is the difference?
> 
> Well for a start, if we consistently used 'file:' for URIs identifying
> static files, we could throw away the current resource-exists action:
> 
>   <map:match pattern="**">
> 
>     <map:act type="resource-exists">
>      <map:parameter name="url" value="content/{1}"/>
>      <map:read src="content/{../1}"/>
>     </map:act>
>     ....
> 
> And replace it with a simple sitemap rule:
> 
>   <map:match pattern="file:**">
>     <map:read src="content/{1}"/>
>   </map:match>

Which is something I don't like.

Again, you are telling Cocoon how to treat that file, which is not a 
concern of the editor.

We decided to take the extension away from files, but this file: thing 
does the same conceptual thing: it selects, inside the link, the sitemap 
pipeline to use.

> Having to interrogate the filesystem to decide a URI's scheme is a total hack.
> What happens if our docs are stored in Xindice, or anything other than a
> filesystem?  Resource-exists is going to break.

Hmmm, this is a good point, but not a resource-exists "conceptual" 
problem. I can test if a resource exists also in remote repositories.
If the "file:" thing takes care different backends, there is no reason 
why a better resource-exists cannot. So seems is more about the 
deficiencies of the resource-exists implementation rather than the need 
of a site: scheme.

> Secondly, introducing a 'file:' prefix fixes the current name clash problem.
> What if I have a static file called 'index.pdf'?  How do I access the index.pdf
> generated from XML?  I can't, because the resource-exists will always choose
> for me.

Which is another seemingly good point. But since we have decided that 
link URIs should not end in extensions (one of the many reasons being 
that a URI can reference different formats at different times in its 
history), having a scheme that effectively makes me serve two different 
versions of the same file is totally off-target.

> So there are two practical reasons, and a bunch of theory, as to why we should
> have a 'file:' prefix.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


File prefix again (Re: Cocoon CLI - how to generate the whole site)

Posted by Jeff Turner <je...@apache.org>.
On Mon, Dec 16, 2002 at 02:01:52PM +0100, Nicola Ken Barozzi wrote:
> 
> 
> Jeff Turner wrote:
> >On Mon, Dec 16, 2002 at 08:59:32AM +0100, Nicola Ken Barozzi wrote:
> >
> >>Jeff Turner wrote:
> >
> >...
> >
> >>>>We've established that Cocoon is not going to be invoking Javadoc.  That
> >>>>means that the user could generate the Javadocs _after_ they generate 
> >>>>the
> >>>>Cocoon docs.
> >>>>
> >>>>To handle this possibility, the only course of action is to ignore links
> >>>>to external directories like Javadocs.  What alternative is there?
> >>
> >>Yes, but I don't want this to happen, as I said in other mails.
> >>The fact is that for every URI sub-space we take away from Cocoon, we 
> >>should have something that manages it for Cocoon, and that's for *all* 
> >>the environments Cocoon has to offer, because Forrest is made to run in 
> >>all of them.
> >
> >
> >Ah, gotcha :)
> 
> Phew, it took a long time, didn't it?

;P

> >The file: patch has two effects:
> >
> > - Introduce schemes in xdocs, starting with a 'file:' scheme.  I think
> >   that schemes in general are uncontroversial.  When linkmaps arrive,
> >   90% of links are going to be linkmap links, so having a scheme prefix
> >   should be the norm. 
> 
> I'm totally for the scheme concept. But schemes are IMHO only 
> link-rewriting rules, and should not address other concerns.
> A file: scheme would not do any rewriting, so I don't see the need ATM.
...
> >What we really need to agree on is the first point; whether we want to
> >prefix static links with 'file:'.  When xdocs are swarming with linkmap:,
> >java:, person:, mail:, etc links, why not have file:?  Conversely, if we
> >want to "infer" the file: scheme, are we going to try to infer all the
> >other schemes?
> 
> Hmmm, I don't see the big problem here, but I may well be wrong.
> 
> The schemes are link-rewriting systems.

Schemes are what the URI RFC defines them to be:

  "The URI scheme (Section 3.1) defines the namespace of the URI, and
  thus may further restrict the syntax and semantics of identifiers using
  that scheme.
    http://www.ietf.org/rfc/rfc2396.txt

> Why would we need to rewrite "file:"s?

Given the above definition, what do you think the implied scheme for
<link href="hello.pdf"> is?  What syntactic and semantic restrictions are
there?  Can we link to anything?  No: we can only link to URIs defined by
sitemap rules.  Therefore the implied scheme is 'cocoon:'.  I need to
invoke Cocoon to get 'hello.pdf'.  If my editor were written in Java as
an Avalon component, it might really be able to invoke Cocoon and
retrieve 'hello.pdf'.

What about when a file is sitting on my harddisk?  Do I need Cocoon to
view it?  No; I can open it in an editor.  Hence the 'file:' protocol is
implied.  In fact, in vim I can type 'gf' and automatically traverse the
link.  My editor is a 'browser' of the Source URI space, just like
Mozilla browses the Destination URI space.

That is the important concept: the Source URI space is distinct from the
Destination URI space.  In the Source URI space (XML docs + <link>
elems), we have all sorts of schemes (linkmap:, java:, file:, person:
etc), but in the Destination URI space (HTML docs + <a> elems), we have
only one protocol, usually http: or file:.

I described this notion of separating the Source and Destination URI
space in a RT: http://marc.theaimsgroup.com/?t=103959284100002&r=1&w=2

So that is the theory: it is better to have an explicit file: scheme,
because it distinguishes those URIs from the implied 'cocoon:' scheme,
and fits in better in a world where there are schemes everywhere.


Practically, right now, what is the difference?

Well for a start, if we consistently used 'file:' for URIs identifying
static files, we could throw away the current resource-exists action:

  <map:match pattern="**">

    <map:act type="resource-exists">
     <map:parameter name="url" value="content/{1}"/>
     <map:read src="content/{../1}"/>
    </map:act>
    ....

And replace it with a simple sitemap rule:

  <map:match pattern="file:**">
    <map:read src="content/{1}"/>
  </map:match>


Having to interrogate the filesystem to decide a URI's scheme is a total hack.
What happens if our docs are stored in Xindice, or anything other than a
filesystem?  Resource-exists is going to break.


Secondly, introducing a 'file:' prefix fixes the current name clash problem.
What if I have a static file called 'index.pdf'?  How do I access the index.pdf
generated from XML?  I can't, because the resource-exists will always choose
for me.


So there are two practical reasons, and a bunch of theory, as to why we should
have a 'file:' prefix.


--Jeff

Re: Cocoon CLI - how to generate the whole site (Re: The Mythical Javadoc generator (Re: Conflict resolution))

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Jeff Turner wrote:
> On Mon, Dec 16, 2002 at 08:59:32AM +0100, Nicola Ken Barozzi wrote:
> 
>>Jeff Turner wrote:
> 
> ...
> 
>>>>We've established that Cocoon is not going to be invoking Javadoc.  That
>>>>means that the user could generate the Javadocs _after_ they generate the
>>>>Cocoon docs.
>>>>
>>>>To handle this possibility, the only course of action is to ignore links
>>>>to external directories like Javadocs.  What alternative is there?
>>
>>Yes, but I don't want this to happen, as I said in other mails.
>>The fact is that for every URI sub-space we take away from Cocoon, we 
>>should have something that manages it for Cocoon, and that's for *all* 
>>the environments Cocoon has to offer, because Forrest is made to run in 
>>all of them.
> 
> 
> Ah, gotcha :)

Phew, it took a long time, didn't it?

> Though remember, with the file: patch, the sitemap *did* serve up files,
> through this rule:
> 
> <map:match pattern="**">
>   <map:act type="resource-exists">
>     <map:parameter name="url" value="content/{1}"/>
>     <map:read src="content/{../1}"/>
>   </map:act>
> 
> So it worked in both command-line and webapp.  The command-line solution
> just happened to bypass the Cocoon CLI.

Which is the point :-)

> The file: patch has two effects:
> 
>  - Introduce schemes in xdocs, starting with a 'file:' scheme.  I think
>    that schemes in general are uncontroversial.  When linkmaps arrive,
>    90% of links are going to be linkmap links, so having a scheme prefix
>    should be the norm. 

I'm totally for the scheme concept. But schemes are IMHO only 
link-rewriting rules, and should not address other concerns.
A file: scheme would not do any rewriting, so I don't see the need ATM.

>  - Routes around a CLI bug, by copying static files with Ant, rather than
>    through the CLI.

Yup, that's the major point that I didn't like.

> What we really need to agree on is the first point; whether we want to
> prefix static links with 'file:'.  When xdocs are swarming with linkmap:,
> java:, person:, mail:, etc links, why not have file:?  Conversely, if we
> want to "infer" the file: scheme, are we going to try to infer all the
> other schemes?

Hmmm, I don't see the big problem here, but I may well be wrong.

The schemes are link-rewriting systems. Why would we need to rewrite 
"file:"s? Remember that to get a specific type of "view" on the file we 
have the mime-type attribute in links.
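
For example (illustrative syntax only, using the site: shorthand
discussed elsewhere in this thread):

   <link href="site:/primer" mime-type="application/pdf">the Primer as PDF</link>

The link says *what* is wanted; the sitemap stays free to decide *how*
to produce it.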

>>If we had a CLI-only Forrest, I could say ok, let's do it, let's make 
>>Ant handle that, but I don't want to see different "special cases" of 
>>handling these spaces. Your proposal has IMHO the same drawbacks as it 
>>had before nevertheless.
> 
> Yes I see.  It hacks around a CLI bug, and introduces a mechanism by
> which further potentially-hack-requiring schemes (like java:) could be
> implemented.

I'm quite confident that we won't use "hack-requiring schemes".
At least that's my goal.

>>>>One thing we could do, is record all 'unprocessable' links in an external
>>>>file, and then the Ant script responsible for invoking Cocoon can look at
>>>>that, and ensure that the links won't break.  For example, say Cocoon
>>>>encounters an unprocessable 'java:org.apache.foo' link.  Cocoon records
>>>>that in unprocessed-files.txt, and otherwise ignore it.  Then, after the
>>>><java> task has finished running Cocoon, an Ant task examines
>>>>unprocessed-files.txt, and if any java: links are recorded, it invokes a
>>>>Javadoc task.
>>>>
>>>>So we have a kind of loose coupling between Cocoon and other doc
>>>>generators.  Cocoon isn't _responsible_ for generating Javadocs, but it
>>>>can _cause_ Javadocs to be generated, by recording that fact that it
>>>>encountered a java: link and couldn't handle it.
>>
>>Hmmm... this idea is somewhat new... the problem is that it breaks down 
>>with the Cocoon webapp.
> 
> It doesn't break down.  It makes the CLI solution independent of the
> webapp solution.  In the case of file:, the webapp happened to have
> solved the problem.
> 
> 
>>My point is IMHO simple: if the webapp Cocoon can handle it, the CLI 
>>should similarly handle it. No special cases. If Cocoon has to trigger 
>>some outer system, we already have Generators, Transformers, Actions, 
>>etc, no need to create another system that BTW bypasses all Cocoon 
>>environment abstractions.
> 
> 
> Yes, that's the ideal.
> 
> 
>>IMHO, Cocoon is the last step, the publishing step. This is the only way 
>>I see to keep consistency between the different Cocoon running modes. 
> >Hence I don't think that triggering actions after the Cocoon CLI is going 
> >to solve problems; it will instead create more, since it breaks the sitemap.
> 
> Not break, just doesn't solve the problem with the same mechanism.
> Remember we only have two 'running modes': webapp and CLI.

Not for long. Gianugo is probably gonna work on an EJB environment soon, 
we have an Any one in the works, and in the future an 
Avalon-native-component version.

>>You say that the webapp is the primary Cocoon-Forrest method, and as you 
>>know I agree. the CLI is just a way of recreating the same 
>>user-experience by acting as a user that clicks on all links.
>>
>>BUT the user doesn't necessarily work like this, the user can also type 
>>in a URL in the address field, even if it's not linked, but CLI won't 
>>generate this.
>>Why?
>>Because Cocoon is not an invertible function. That means that given 
>>sources and a sitemap, we *cannot* create all the possible positive 
>>requests. Which in turn means that the Cocoon CLI will never be able to 
>>create a fully equivalent site as the webapp.
>>
>>So we should acknowledge that we need a mechanism that given some rules, 
>>can reasonably create an equivalent site. Crawling is it, and it 
>>generally works well, since usually sites need to be linked from a 
>>homepage to be accessed. Site usage goes through navigation, ie links.
>>
>>Now, Cocoon is not invertible, and this is IMHO a fact. But *parts* of 
>>the sitemap *are* invertible. These parts are basically those where a 
>>complete URI sub-space is mapped to a specific pipeline, and when no 
>>parts of it have been matched before.
>>
>>
>>    <map:match pattern="sub/URI/space/**">
>>       ...
>>    </map:match>
>>
>>
>>This means that we can safely invert Cocoon here, and look at the 
>>sources to know what the result will look like.
>>
>>Conceptually, this gives me the theoretical possibility of doing CLI 
>>optimizations for crawling without changing the Cocoon usage patterns. 
>>It's an optimization inside the CLI, and nothing outside changes.
> 
> Yes!  Today's Mr Clever Award goes to Nicola, for working all this out
> and presenting it so clearly :)
> 
> So really, the CLI could short-cut any URI served with <map:read>.

Not exactly. Also non-reads can be dealt with this way. It's not the read 
part that it short-cuts, but the URI space handling.
I.e., if a pipeline handles all the URI space, it can safely invert that 
*match* (not the pipeline). See below.

> The "how to invert a sitemap" question also pops up when trying to
> auto-generate a linkmap (specifically, link targets), so a general
> solution (insofar as one is possible) would be very useful.
> 
> One thing I don't see: how does the CLI know that when one Javadoc file
> is referenced, it must copy all of them across?  Remember, you stripped
> the 'java:' scheme in step 1.

Actually, it simply would not crawl that URI space.

This is how it could do it as a start (a rough illustration follows the 
steps):

1) get all the "matches" in the sitemap; attention must be paid to nested 
matches.

2) the ones ending in ** are to be taken into account.

3) for each of those matches, it inverts the match and is able to "map" 
the source and output spaces. Basically it scans all the subdirs defined 
in the match, gathers all the filenames, rewrites them as URIs using the 
inverted match, and calls cocoon on them one by one.

3b) [second optimization] *If* the pipeline is a read, it can simply 
copy the files across and change filenames according to the inverted 
match rule.

4) then it can start crawling the docs, remembering not to follow links 
in the spaces already generated.

In essence, we are able to not use crawling to generate parts of a 
website, so it's done much faster.
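
To make that concrete with a made-up match (the pattern and paths are
only an example), inverting

   <map:match pattern="apidocs/**">
     <map:read src="build/javadocs/{1}"/>
   </map:match>

means listing the files under build/javadocs/ and rewriting each name
through the pattern, e.g.

   build/javadocs/index.html           ->  apidocs/index.html
   build/javadocs/org/apache/Foo.html  ->  apidocs/org/apache/Foo.html

and then either requesting those URIs one by one (step 3) or, since the
pipeline is a read, copying the files straight across (step 3b). No
crawling is needed for that URI sub-space.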

>>Now, since the theory is solved, the question slides to how to do it, 
>>especially because the pattern can have sitemap variable substitutions 
>>in it.
> 
> So we have two options:
> 
> 1) Implement a sitemap inverter, use it to create a 'lookup table' of
> shortcuttable URIs, and then integrate this into the CLI.
> 2) Say "life's too short, let's just copy the files with Ant".
> 
> Now, practically, solution 1) is going to take a _long_ time to be
> developed.  If it comes down to me, it will be developed when the linkmap
> needs it.
> 
> So, given that 2) is dead simple and 90% implemented, how about going
> with it for now, and replacing it with 1) when that arrives?  As long as
> the public interface (link syntax) is maintained, we can switch
> implementations without affecting users.

Let's define the syntax then. I don't see the need for a "file:" scheme; 
let's argue about this then.

As for individual files, we should be able to fix it by using a 
MimeTypeAction that defines the actual mime-type of the file and/or 
fixing the CLI so that it doesn't append .html to files of unknown mime type.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Cocoon CLI - how to generate the whole site (Re: The Mythical Javadoc generator (Re: Conflict resolution))

Posted by Jeff Turner <je...@apache.org>.
On Mon, Dec 16, 2002 at 08:59:32AM +0100, Nicola Ken Barozzi wrote:
> 
> Jeff Turner wrote:
...
> >>We've established that Cocoon is not going to be invoking Javadoc.  That
> >>means that the user could generate the Javadocs _after_ they generate the
> >>Cocoon docs.
> >>
> >>To handle this possibility, the only course of action is to ignore links
> >>to external directories like Javadocs.  What alternative is there?
> 
> Yes, but I don't want this to happen, as I said in other mails.
> The fact is that for every URI sub-space we take away from Cocoon, we 
> should have something that manages it for Cocoon, and that's for *all* 
> the environments Cocoon has to offer, because Forrest is made to run in 
> all of them.

Ah, gotcha :)

Though remember, with the file: patch, the sitemap *did* serve up files,
through this rule:

<map:match pattern="**">
  <map:act type="resource-exists">
    <map:parameter name="url" value="content/{1}"/>
    <map:read src="content/{../1}"/>
  </map:act>

So it worked in both command-line and webapp.  The command-line solution
just happened to bypass the Cocoon CLI.

The file: patch has two effects:

 - Introduce schemes in xdocs, starting with a 'file:' scheme.  I think
   that schemes in general are uncontroversial.  When linkmaps arrive,
   90% of links are going to be linkmap links, so having a scheme prefix
   should be the norm. 

 - Routes around a CLI bug, by copying static files with Ant, rather than
   through the CLI.
  
What we really need to agree on is the first point; whether we want to
prefix static links with 'file:'.  When xdocs are swarming with linkmap:,
java:, person:, mail:, etc links, why not have file:?  Conversely, if we
want to "infer" the file: scheme, are we going to try to infer all the
other schemes?

> If we had a CLI-only Forrest, I could say ok, let's do it, let's make 
> Ant handle that, but I don't want to see different "special cases" of 
> handling these spaces. Your proposal has IMHO the same drawbacks as it 
> had before nevertheless.

Yes I see.  It hacks around a CLI bug, and introduces a mechanism by
which further potentially-hack-requiring schemes (like java:) could be
implemented.

> >>One thing we could do, is record all 'unprocessable' links in an external
> >>file, and then the Ant script responsible for invoking Cocoon can look at
> >>that, and ensure that the links won't break.  For example, say Cocoon
> >>encounters an unprocessable 'java:org.apache.foo' link.  Cocoon records
> >>that in unprocessed-files.txt, and otherwise ignore it.  Then, after the
> >><java> task has finished running Cocoon, an Ant task examines
> >>unprocessed-files.txt, and if any java: links are recorded, it invokes a
> >>Javadoc task.
> >>
> >>So we have a kind of loose coupling between Cocoon and other doc
> >>generators.  Cocoon isn't _responsible_ for generating Javadocs, but it
> >>can _cause_ Javadocs to be generated, by recording that fact that it
> >>encountered a java: link and couldn't handle it.
> 
> Hmmm... this idea is somewhat new... the problem is that it breaks down 
> with the Cocoon webapp.

It doesn't break down.  It makes the CLI solution independent of the
webapp solution.  In the case of file:, the webapp happened to have
solved the problem.

> My point is IMHO simple: if the webapp Cocoon can handle it, the CLI 
> should similarly handle it. No special cases. If Cocoon has to trigger 
> some outer system, we already have Generators, Transformers, Actions, 
> etc, no need to create another system that BTW bypasses all Cocoon 
> environment abstractions.

Yes, that's the ideal.

> IMHO, Cocoon is the last step, the publishing step. This is the only way 
> I see to keep consistency between the different Cocoon running modes. 
> Hence I don't think that triggering actions after the Cocoon CLI is going 
> to solve problems; it will instead create more, since it breaks the sitemap.

Not break, just doesn't solve the problem with the same mechanism.
Remember we only have two 'running modes': webapp and CLI.

> You say that the webapp is the primary Cocoon-Forrest method, and as you 
> know I agree. the CLI is just a way of recreating the same 
> user-experience by acting as a user that clicks on all links.
> 
> BUT the user doesn't necessarily work like this, the user can also type 
> in a URL in the address field, even if it's not linked, but CLI won't 
> generate this.
> Why?
> Because Cocoon is not an invertible function. That means that given 
> sources and a sitemap, we *cannot* create all the possible positive 
> requests. Which in turn means that the Cocoon CLI will never be able to 
> create a fully equivalent site as the webapp.
> 
> So we should acknowledge that we need a mechanism that given some rules, 
> can reasonably create an equivalent site. Crawling is it, and it 
> generally works well, since usually sites need to be linked from a 
> homepage to be accessed. Site usage goes through navigation, ie links.
> 
> Now, Cocoon is not invertible, and this is IMHO a fact. But *parts* of 
> the sitemap *are* invertible. These parts are basically those where a 
> complete URI sub-space is mapped to a specific pipeline, and when no 
> parts of it have been matched before.
> 
> 
>     <map:match pattern="sub/URI/space/**">
>        ...
>     </map:match>
> 
> 
> This means that we can safely invert Cocoon here, and look at the 
> sources to know what the result will look like.
> 
> Conceptually, this gives me the theoretical possibility of doing CLI 
> optimizations for crawling without changing the Cocoon usage patterns. 
> It's an optimization inside the CLI, and nothing outside changes.

Yes!  Today's Mr Clever Award goes to Nicola, for working all this out
and presenting it so clearly :)

So really, the CLI could short-cut any URI served with <map:read>.

The "how to invert a sitemap" question also pops up when trying to
auto-generate a linkmap (specifically, link targets), so a general
solution (insofar as one is possible) would be very useful.

One thing I don't see: how does the CLI know that when one Javadoc file
is referenced, it must copy all of them across?  Remember, you stripped
the 'java:' scheme in step 1.

> Now, since the theory is solved, the question slides to how to do it, 
> especially because the pattern can have sitemap variable substitutions 
> in it.

So we have two options:

1) Implement a sitemap inverter, use it to create a 'lookup table' of
shortcuttable URIs, and then integrate this into the CLI.
2) Say "life's too short, let's just copy the files with Ant".

Now, practically, solution 1) is going to take a _long_ time to be
developed.  If it comes down to me, it will be developed when the linkmap
needs it.

So, given that 2) is dead simple and 90% implemented, how about going
with it for now, and replacing it with 1) when that arrives?  As long as
the public interface (link syntax) is maintained, we can switch
implementations without affecting users.


--Jeff

Re: Cocoon CLI - how to generate the whole site (Re: The Mythical Javadoc generator (Re: Conflict resolution))

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:
> Nicola,
> 
> Mind replying to this?  It describes why some links are unprocessable by
> the Cocoon CLI, and proposes a general system for handling these links,
> of which my file: patch was an example.

No problem. These days I have difficulty processing all the mail that 
passes through my inbox; I get more than 300 mails a day, so please do 
draw my attention to important mails like these if I fail to see them.

> --Jeff
> 
> On Sat, Dec 14, 2002 at 04:06:18AM +1100, Jeff Turner wrote:
> 
>>On Fri, Dec 13, 2002 at 05:31:59PM +0100, Nicola Ken Barozzi wrote:
>>
>>>Jeff Turner wrote:
>>>
>>>
>>>>The javadocs are _already_ generated, and <javadoc> has already put them
>>>>in build/site/apidocs/.  Now how is Cocoon (via the CLI) going to
>>>>"publish" them?
>>>
>>>Ok, now we finally get to the actual technical point. I will take this 
>>>discussion in a general way, because the issue is in fact quite general.
>>>
>>>                              -oOo-
>>>
>>>ATM, the Cocoon CLI system is completely crawler based. This means that
>>>it starts from a list of URLs, and "crawles" the site by getting the 
>>>links from these pages, putting them in the list, purging the visited 
>>>ones, and restrting the process with those.
>>>
>>>If we only have XML documents, the system can be made to be very fast 
>>>and semantically rich.
>>>
>>>  - fast
>>>   if we get the links while processing the file, we don't
>>>   have to reparse it later for the crawling
>>>
>>>  - semantically rich
>>>    we get the links not from the output, but from the real source.
>>>    In the sitemap, the source content, with all semantics, is
>>>    tagged and used for the link gathering. So we can even gather
>>>    links from an svg file that will become a jpeg image!
>>>
>>>Things start breaking a bit down when we have to use resources that are 
>>>not transformed to XML. Examples are CSS and massive docs to be included 
>>>like javadocs.
>>>
>>>The problem is not *reading* these files via Cocoon, but getting the 
>>>links from them. In the case of CSS we need the links, in case of 
>>>Javadocs, we know the dir structure and eventually would not need them.
>>>
>>>For the CSS, the best thing is actually parsing them and passing them in 
>>>the SAX pipeline. I see no technical nor conceptual problem with it.
>>>
>>>The problem arises when we need to pass files in "bulk". In this case 
>>>they are javadocs, but what about jars, binaries, images, all things 
>>>that are not necessarily linked in the site, or that we simply want to 
>>>dump in the resulting system?
>>>
>>>This is the answer that I seek.
>>
>>There is only one answer.
>>
>>We've established that Cocoon is not going to be invoking Javadoc.  That
>>means that the user could generate the Javadocs _after_ they generate the
>>Cocoon docs.
>>
>>To handle this possibility, the only course of action is to ignore links
>>to external directories like Javadocs.  What alternative is there?

Yes, but I don't want this to happen, as I said in other mails.
The fact is that for every URI sub-space we take away from Cocoon, we 
should have something that manages it for Cocoon, and that's for *all* 
the environments Cocoon has to offer, because Forrest is made to run in 
all of them.

If we had a CLI-only Forrest, I could say ok, let's do it, let's make 
Ant handle that, but I don't want to see different "special cases" of 
handling these spaces. Your proposal has IMHO the same drawbacks as it 
had before nevertheless.

>>One thing we could do, is record all 'unprocessable' links in an external
>>file, and then the Ant script responsible for invoking Cocoon can look at
>>that, and ensure that the links won't break.  For example, say Cocoon
>>encounters an unprocessable 'java:org.apache.foo' link.  Cocoon records
>>that in unprocessed-files.txt, and otherwise ignore it.  Then, after the
>><java> task has finished running Cocoon, an Ant task examines
>>unprocessed-files.txt, and if any java: links are recorded, it invokes a
>>Javadoc task.
>>
>>So we have a kind of loose coupling between Cocoon and other doc
>>generators.  Cocoon isn't _responsible_ for generating Javadocs, but it
>>can _cause_ Javadocs to be generated, by recording that fact that it
>>encountered a java: link and couldn't handle it.

Hmmm... this idea is somewhat new... the problem is that it breaks down 
with the Cocoon webapp.

My point is IMHO simple: if the webapp Cocoon can handle it, the CLI 
should similarly handle it. No special cases. If Cocoon has to trigger 
some outer system, we already have Generators, Transformers, Actions, 
etc, no need to create another system that BTW bypasses all Cocoon 
environment abstractions.

IMHO, Cocoon is the last step, the publishing step. This is the only way 
I see to keep consistency between the different Cocoon running modes. 
Hence I don't think that triggering actions after the Cocoon CLI is going 
to solve problems; it will instead create more, since it breaks the sitemap.

You say that the webapp is the primary Cocoon-Forrest method, and as you 
know I agree. The CLI is just a way of recreating the same 
user-experience by acting as a user that clicks on all links.

BUT the user doesn't necessarily work like this, the user can also type 
in a URL in the address field, even if it's not linked, but CLI won't 
generate this.
Why?
Because Cocoon is not an invertible function. That means that given 
sources and a sitemap, we *cannot* create all the possible positive 
requests. Which in turn means that the Cocoon CLI will never be able to 
create a site fully equivalent to the webapp's.

So we should acknowledge that we need a mechanism that, given some rules, 
can reasonably create an equivalent site. Crawling is it, and it 
generally works well, since usually sites need to be linked from a 
homepage to be accessed. Site usage goes through navigation, i.e. links.

Now, Cocoon is not invertible, and this is IMHO a fact. But *parts* of 
the sitemap *are* invertible. These parts are basically those where a 
complete URI sub-space is mapped to a specific pipeline, and when no 
parts of it have been matched before.


     <map:match pattern="sub/URI/space/**">
        ...
     </map:match>


This means that we can safely invert Cocoon here, and look at the 
sources to know what the result will look like.

Conceptually, this gives me the theoretical possibility of doing CLI 
optimizations for crawling without changing the Cocoon usage patterns. 
It's an optimization inside the CLI, and nothing outside changes.

Now, since the theory is solved, the question slides to how to do it, 
especially because the pattern can have sitemap variable substitutions 
in it.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Cocoon CLI - how to generate the whole site (Re: The Mythical Javadoc generator (Re: Conflict resolution))

Posted by Jeff Turner <je...@apache.org>.
Nicola,

Mind replying to this?  It describes why some links are unprocessable by
the Cocoon CLI, and proposes a general system for handling these links,
of which my file: patch was an example.


--Jeff

On Sat, Dec 14, 2002 at 04:06:18AM +1100, Jeff Turner wrote:
> On Fri, Dec 13, 2002 at 05:31:59PM +0100, Nicola Ken Barozzi wrote:
> > 
> > Jeff Turner wrote:
> > 
> > >The javadocs are _already_ generated, and <javadoc> has already put them
> > >in build/site/apidocs/.  Now how is Cocoon (via the CLI) going to
> > >"publish" them?
> > 
> > Ok, now we finally get to the actual technical point. I will take this 
> > discussion in a general way, because the issue is in fact quite general.
> > 
> >                               -oOo-
> > 
> > ATM, the Cocoon CLI system is completely crawler based. This means that
> > it starts from a list of URLs, and "crawles" the site by getting the 
> > links from these pages, putting them in the list, purging the visited 
> > ones, and restrting the process with those.
> > 
> > If we only have XML documents, the system can be made to be very fast 
> > and semantically rich.
> > 
> >   - fast
> >    if we get the links while processing the file, we don't
> >    have to reparse it later for the crawling
> > 
> >   - semantically rich
> >     we get the links not from the output, but from the real source.
> >     In the sitemap, the source content, with all semantics, is
> >     tagged and used for the link gathering. So we can even gather
> >     links from an svg file that will become a jpeg image!
> > 
> > Things start breaking a bit down when we have to use resources that are 
> > not transformed to XML. Examples are CSS and massive docs to be included 
> > like javadocs.
> > 
> > The problem is not *reading* these files via Cocoon, but getting the 
> > links from them. In the case of CSS we need the links, in case of 
> > Javadocs, we know the dir structure and eventually would not need them.
> > 
> > For the CSS, the best thing is actually parsing them and passing them in 
> > the SAX pipeline. I see no technical nor conceptual problem with it.
> > 
> > The problem arises when we need to pass files in "bulk". In this case 
> > they are javadocs, but what about jars, binaries, images, all things 
> > that are not necessarily linked in the site, or that we simply want to 
> > dump in the resulting system?
> > 
> > This is the answer that I seek.
> 
> There is only one answer.
> 
> We've established that Cocoon is not going to be invoking Javadoc.  That
> means that the user could generate the Javadocs _after_ they generate the
> Cocoon docs.
> 
> To handle this possibility, the only course of action is to ignore links
> to external directories like Javadocs.  What alternative is there?
> 
> 
> One thing we could do, is record all 'unprocessable' links in an external
> file, and then the Ant script responsible for invoking Cocoon can look at
> that, and ensure that the links won't break.  For example, say Cocoon
> encounters an unprocessable 'java:org.apache.foo' link.  Cocoon records
> that in unprocessed-files.txt, and otherwise ignore it.  Then, after the
> <java> task has finished running Cocoon, an Ant task examines
> unprocessed-files.txt, and if any java: links are recorded, it invokes a
> Javadoc task.
> 
> So we have a kind of loose coupling between Cocoon and other doc
> generators.  Cocoon isn't _responsible_ for generating Javadocs, but it
> can _cause_ Javadocs to be generated, by recording that fact that it
> encountered a java: link and couldn't handle it.
> 
> 
> --Jeff

Re: Cocoon CLI - how to generate the whole site (Re: The Mythical Javadoc generator (Re: Conflict resolution))

Posted by Jeff Turner <je...@apache.org>.
On Fri, Dec 13, 2002 at 05:31:59PM +0100, Nicola Ken Barozzi wrote:
> 
> Jeff Turner wrote:
> 
> >The javadocs are _already_ generated, and <javadoc> has already put them
> >in build/site/apidocs/.  Now how is Cocoon (via the CLI) going to
> >"publish" them?
> 
> Ok, now we finally get to the actual technical point. I will take this 
> discussion in a general way, because the issue is in fact quite general.
> 
>                               -oOo-
> 
> ATM, the Cocoon CLI system is completely crawler based. This means that
> it starts from a list of URLs, and "crawles" the site by getting the 
> links from these pages, putting them in the list, purging the visited 
> ones, and restrting the process with those.
> 
> If we only have XML documents, the system can be made to be very fast 
> and semantically rich.
> 
>   - fast
>    if we get the links while processing the file, we don't
>    have to reparse it later for the crawling
> 
>   - semantically rich
>     we get the links not from the output, but from the real source.
>     In the sitemap, the source content, with all semantics, is
>     tagged and used for the link gathering. So we can even gather
>     links from an svg file that will become a jpeg image!
> 
> Things start breaking a bit down when we have to use resources that are 
> not transformed to XML. Examples are CSS and massive docs to be included 
> like javadocs.
> 
> The problem is not *reading* these files via Cocoon, but getting the 
> links from them. In the case of CSS we need the links, in case of 
> Javadocs, we know the dir structure and eventually would not need them.
> 
> For the CSS, the best thing is actually parsing them and passing them in 
> the SAX pipeline. I see no technical nor conceptual problem with it.
> 
> The problem arises when we need to pass files in "bulk". In this case 
> they are javadocs, but what about jars, binaries, images, all things 
> that are not necessarily linked in the site, or that we simply want to 
> dump in the resulting system?
> 
> This is the answer that I seek.

There is only one answer.

We've established that Cocoon is not going to be invoking Javadoc.  That
means that the user could generate the Javadocs _after_ they generate the
Cocoon docs.

To handle this possibility, the only course of action is to ignore links
to external directories like Javadocs.  What alternative is there?


One thing we could do, is record all 'unprocessable' links in an external
file, and then the Ant script responsible for invoking Cocoon can look at
that, and ensure that the links won't break.  For example, say Cocoon
encounters an unprocessable 'java:org.apache.foo' link.  Cocoon records
that in unprocessed-files.txt, and otherwise ignore it.  Then, after the
<java> task has finished running Cocoon, an Ant task examines
unprocessed-files.txt, and if any java: links are recorded, it invokes a
Javadoc task.

So we have a kind of loose coupling between Cocoon and other doc
generators.  Cocoon isn't _responsible_ for generating Javadocs, but it
can _cause_ Javadocs to be generated, by recording that fact that it
encountered a java: link and couldn't handle it.
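
A rough Ant sketch of that coupling (the target names, paths and the
exact format of unprocessed-files.txt are all just assumptions):

  <target name="check-unprocessed">
    <loadfile property="unprocessed" srcFile="unprocessed-files.txt"
              failonerror="false"/>
    <condition property="javadocs.needed">
      <contains string="${unprocessed}" substring="java:"/>
    </condition>
  </target>

  <target name="javadocs" depends="check-unprocessed" if="javadocs.needed">
    <javadoc sourcepath="src/java" destdir="build/site/apidocs"
             packagenames="org.apache.*"/>
  </target>

If Cocoon never wrote a java: link to the file, the javadocs target is
simply skipped.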


--Jeff

Cocoon CLI - how to generate the whole site (Re: The Mythical Javadoc generator (Re: Conflict resolution))

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:

> The javadocs are _already_ generated, and <javadoc> has already put them
> in build/site/apidocs/.  Now how is Cocoon (via the CLI) going to
> "publish" them?

Ok, now we finally get to the actual technical point. I will take this 
discussion in a general way, because the issue is in fact quite general.

                               -oOo-

ATM, the Cocoon CLI system is completely crawler based. This means that
it starts from a list of URLs, and "crawles" the site by getting the 
links from these pages, putting them in the list, purging the visited 
ones, and restrting the process with those.

If we only have XML documents, the system can be made to be very fast 
and semantically rich.

   - fast
    if we get the links while processing the file, we don't
    have to reparse it later for the crawling

   - semantically rich
     we get the links not from the output, but from the real source.
     In the sitemap, the source content, with all semantics, is
     tagged and used for the link gathering. So we can even gather
     links from an svg file that will become a jpeg image!

Things start to break down a bit when we have to use resources that are 
not transformed to XML. Examples are CSS and massive docs to be included 
like javadocs.

The problem is not *reading* these files via Cocoon, but getting the 
links from them. In the case of CSS we need the links; in the case of 
Javadocs, we know the dir structure and eventually would not need them.

For the CSS, the best thing is actually parsing them and passing them in 
the SAX pipeline. I see no technical nor conceptual problem with it.

The problem arises when we need to pass files in "bulk". In this case 
they are javadocs, but what about jars, binaries, images, all things 
that are not necessarily linked in the site, or that we simply want to 
dump in the resulting system?

This is the answer that I seek.


-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Vadim Gritsenko <va...@verizon.net>.
Jeff Turner wrote:

<snip/>

>Cocoon has got this sitemap,
>trying to own the _whole_ URI space.. even the bits which don't require
>XML processing!  Argh.. and this can't even be prevented, because web.xml
>allows only one url-pattern per servlet.
>

Huh? Aren't you mistaken? Right from Cocoon CVS:

  <servlet-mapping>
    <servlet-name>Cocoon2</servlet-name>
    <url-pattern>*.jsp</url-pattern>
  </servlet-mapping>

  <servlet-mapping>
    <servlet-name>Cocoon2</servlet-name>
    <url-pattern>*.html</url-pattern>
  </servlet-mapping>


Regards,
Vadim



Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Jeff Turner <je...@apache.org>.
On Fri, Dec 13, 2002 at 04:07:53PM +0100, Nicola Ken Barozzi wrote:
> 
> 
> Jeff Turner wrote:
> >On Thu, Dec 12, 2002 at 05:49:03PM +0100, Nicola Ken Barozzi wrote:
> >...
> >
> >>>>Everything must be underneath Cocoon, or we have a hybrid we cannot 
> >>>>easily control. What can't Cocoon do?
> >>>
> >>>Cocoon can't generate Javadocs, which is inherently a batch-processing
> >>>job.  
> >>
> >>It can, it just needs a javadoc Generator.
> >>Give it a start dir, and it will generate a listing with the dir 
> >>Generator, generate all javadocs there, and get on by crawling the 
> >>directory links.
> >
> >
> >That is impossible.  Generating Javadocs is, as I said, "inherently a
> >batch-processing job".  Javadoc pages are highly hyperlinked, and the
> >Javadoc tool needs to have _everything_ loaded into memory, so it can
> >work out whether to write out, say, 'org.apache.foo.MyClass' or '<a
> >href="MyClass.html">MyClass</a>'.
> 
> It just needs to have the compiled jars in the classpath, and this is 
> what qdox does.

qdox works with bytecode. Javadoc works with Java source.

> I'm talking about a javadoc-like tool, not necessarily javadoc.

Well I'm talking about Javadoc ;P  Which is the primary use-case.

> >Besides which, how long do you think it would take for Cocoon to invoke
> >Javadoc once for every single source file?
> 
> It would invoke its own Generator, not the javadoc tool. Besides, the 
> Generator can always preload all the files if it wants to,

For every single file, preload all the OTHER files, then spit out just
one HTML file?  Javadoc isn't crazy enough to let you do this.

Guess I'll stop shooting the fish in the barrel now.

> there is no real technical problem.
>
> >Cocoon is a doc generating tool.  Javadoc is a doc generating tool.  One
> >renders XML, the other renders Java files.  
> 
> Errr, one renders *through* XML, it can get it from anything. Javadoc 
> gets only Java files.

Yes.  One renders XML, the other renders Java files.  Fundamentally
different types of data.

But I see your point.. Cocoon and Javadoc are not equivalent, because
Cocoon isn't just an XML processing tool.  Cocoon has got this sitemap,
trying to own the _whole_ URI space.. even the bits which don't require
XML processing!  Argh.. and this can't even be prevented, because web.xml
allows only one url-pattern per servlet.  So Cocoon gets mapped to '/*',
and that's it: no other servlets, no cgi-bins, not even a directory of
Javadoc HTML can escape.

> >Trying to invoke one from the other is... how do I put it politely...
> >um, I'd better not even try :)
> 
> No need to invoke javadoc, it would just need to have the docs in the 
> classpath and generate. On cocoon-docs they seem to be working on a 
> Generator that does this, we'll see.
> 
> Anyway, the point is not if Cocoon should *generate* these docs, as it 
> doesn't generate the sources of the docs too. But IMO it should serve 
> them all, be they handwritten or pregenerated. In Centipede we generate 
> documentv11 docs out of junit, checkstyle, and other tests, and pass 
> them to Forrest. Of course I don't want Cocoon to *generate* them, but 
> to publish them, yes.

What do you mean by "publish"?  Running Cocoon in a live webapp, serving
Javadocs through Cocoon makes sense, if only because web.xml has such
coarse-grained uri-to-servlet mapping.  But how about for CLI generation?
Do we try to mimic the Cocoon Über Alles approach of the live webapp?
The javadocs are _already_ generated, and <javadoc> has already put them
in build/site/apidocs/.  Now how is Cocoon (via the CLI) going to
"publish" them?


--Jeff

Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Bertrand Delacretaz <bd...@codeconsult.ch>.
Jeff Turner wrote:
 >. . .
> That is impossible.  Generating Javadocs is, as I said, "inherently a
> batch-processing job".  
 >. . .

I agree that generating Javadoc for a _whole tree_ of java source files 
is problematic, but a JavadocGenerator can be very useful in processing 
_single_ java source files, and much easier to write I think (assuming 
the java parser won't attempt to parse the whole source code tree every 
time).

We've been discussing this on cocoon-docs lately (as a way of generating 
parts of Component Reference Pages out of javadoc tags) and someone is 
supposed to work on this as we speak. So hopefully this is going to be 
less mythical soon ;-)


-Bertrand


Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Steven Noels <st...@outerthought.org>.
Jeff Turner wrote:

> There are some schemes which can't be done in XSLT:
> 
> - mail:<Message-Id> being transformed into a URL for marc.theaimsgroup or
>   wherever.
> - linkmap:<link XPath>, which requires opening linkmap.xml.  I currently
>   have a half-written transformer written for this.  But even querying
>   linkmap.xml could be done with XSLT, using the EXSLT dynamic XPath
>   extensions that most processors implement.

Yup. Other examples:

google:
trove:
xlink: (for non simple links - popping up a window with targets)
...

> How would a single configurable transformer be able to do all this?
> Wouldn't having a transformer per scheme be easiest?

This was the sentence I forgot before pressing ctrl-enter :)

> Then users can add
> their own by editing the sitemap.

We will need to provide the common (internal to Forrest) ones.
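
One transformer per scheme, chained in the pipeline, might look roughly
like this -- the transformer type names are invented for illustration:

   <!-- each transformer rewrites only the hrefs of its own scheme and
        passes everything else through untouched -->
   <map:transform type="linkmap-links" src="linkmap.xml"/>
   <map:transform type="javadoc-links"/>
   <map:transform type="mail-links"/>
   <map:transform type="google-links"/>

Adding a project-specific scheme would then be one more <map:transform>
line in the sitemap.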

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org


Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Jeff Turner <je...@apache.org>.
On Fri, Dec 13, 2002 at 05:45:11PM +0100, Steven Noels wrote:
> Jeff Turner wrote:
> 
> >So assuming that the javadocs are already in build/site/apidocs, how do
> >we handle <link href="java:org.apache.Blah">.
> >
> >Seems the simplest approach is to just strip it from the links view of
> >the resource, so Cocoon doesn't even see it.
> 
> If that assumption is true and the Transformer is/can be configured 
> accordingly: yes.
> 
> >That's what my patch did for 'file:' URLs.
> 
> It ain't me who asked to revert it ;-)
> 
> I had my doubts whether XSLT was the way to do it, though.

There are some schemes which can't be done in XSLT:

- mail:<Message-Id> being transformed into a URL for marc.theaimsgroup or
  wherever.
- linkmap:<link XPath>, which requires opening linkmap.xml.  I currently
  have a half-written transformer written for this.  But even querying
  linkmap.xml could be done with XSLT, using the EXSLT dynamic XPath
  extensions that most processors implement.
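
A rough sketch of that XSLT-only route, assuming a processor with EXSLT
dynamic support and a linkmap.xml resolvable from the stylesheet (the
element layout just follows the linkmap idea, nothing more):

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:dyn="http://exslt.org/dynamic"
        extension-element-prefixes="dyn">

      <!-- rewrite linkmap: hrefs by evaluating the path against linkmap.xml -->
      <xsl:template match="link[starts-with(@href, 'linkmap:')]">
        <xsl:variable name="path" select="substring-after(@href, 'linkmap:')"/>
        <xsl:variable name="target">
          <xsl:for-each select="document('linkmap.xml')/linkmap">
            <xsl:value-of select="dyn:evaluate(concat($path, '/@href'))"/>
          </xsl:for-each>
        </xsl:variable>
        <link href="{$target}">
          <xsl:apply-templates/>
        </link>
      </xsl:template>

      <!-- everything else is copied through unchanged -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

    </xsl:stylesheet>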

How would a single configurable transformer be able to do all this?
Wouldn't having a transformer per scheme be easiest?  Then users can add
their own by editing the sitemap.


--Jeff

> I'm sure we can come up with a decent set of 
> link-rewriting/-resolution/view-bypassing rules, so that everybody gets 
> what he wants. If that involves view bypassing for static resources: why 
> not. I know I'll be using that one.
> 
> </Steven>
> -- 
> Steven Noels                            http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> Read my weblog at              http://radio.weblogs.com/0103539/
> stevenn at outerthought.org                stevenn at apache.org
> 

Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Steven Noels <st...@outerthought.org>.
Jeff Turner wrote:

> So assuming that the javadocs are already in build/site/apidocs, how do
> we handle <link href="java:org.apache.Blah">.
> 
> Seems the simplest approach is to just strip it from the links view of
> the resource, so Cocoon doesn't even see it.

If that assumption is true and the Transformer is/can be configured 
accordingly: yes.

> That's what my patch did for 'file:' URLs.

It ain't me who asked to revert it ;-)

I had my doubts whether XSLT was the way to do it, though.

I'm sure we can come up with a decent set of 
link-rewriting/-resolution/view-bypassing rules, so that everybody gets 
what he wants. If that involves view bypassing for static resources: why 
not. I know I'll be using that one.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org


Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Jeff Turner wrote:
> On Fri, Dec 13, 2002 at 04:37:49PM +0100, Steven Noels wrote:
> ...
> 
>>>them to Forrest. Of course I don't want Cocoon to *generate* them, but 
>>>to publish them, yes.
>>
>>OK.
>>
>>Now how are we going to address & locate these non-generated, yet 
>>published resources? Is my FS mail anywhere near a feasible approach 
>>(i.e. LinkResolverTransformer)?
> 
> 
> So assuming that the javadocs are already in build/site/apidocs, how do
> we handle <link href="java:org.apache.Blah">.

No, they are already in src/site/apidocs, or wherever they are placed to
                        ^^^^^^
be mounted.

> Seems the simplest approach is to just strip it from the links view of
> the resource, so Cocoon doesn't even see it.
> 
> That's what my patch did for 'file:' URLs.


-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Jeff Turner <je...@apache.org>.
On Fri, Dec 13, 2002 at 04:37:49PM +0100, Steven Noels wrote:
...
> >them to Forrest. Of course I don't want Cocoon to *generate* them, but 
> >to publish them, yes.
> 
> OK.
> 
> Now how are we going to address & locate these non-generated, yet 
> published resources? Is my FS mail anywhere near a feasible approach 
> (i.e. LinkResolverTransformer)?

So assuming that the javadocs are already in build/site/apidocs, how do
we handle <link href="java:org.apache.Blah">.

Seems the simplest approach is to just strip it from the links view of
the resource, so Cocoon doesn't even see it.

That's what my patch did for 'file:' URLs.


--Jeff

> </Steven>
> -- 
> Steven Noels                            http://outerthought.org/
> Outerthought - Open Source, Java & XML Competence Support Center
> Read my weblog at              http://radio.weblogs.com/0103539/
> stevenn at outerthought.org                stevenn at apache.org
> 

Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Steven Noels wrote:
> Nicola Ken Barozzi wrote:

[...]

>> Anyway, the point is not if Cocoon should *generate* these docs, as it 
>> doesn't generate the sources of the docs too. But IMO it should serve 
>> them all, be they handwritten or pregenerated. In Centipede we 
>> generate documentv11 docs out of junit, checkstyle, and other tests, 
>> and pass them to Forrest. Of course I don't want Cocoon to *generate* 
>> them, but to publish them, yes.
> 
> 
> OK.
> 
> Now how are we going to address & locate these non-generated, yet 
> published resources? Is my FS mail anywhere near a feasible approach 
> (i.e. LinkResolverTransformer)?

http://marc.theaimsgroup.com/?l=forrest-dev&m=103976582923758&w=2

Concern 1 is basically your transformer concept.

Concern 2: we need a resourcemap, with mount points that tell us where 
to find the source (takes care of the "locate these non-generated, yet 
published resources" part -- a sketch follows below).

Concern 3 are CAPs.
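
For concern 2, a sketch of what such a resourcemap could look like -- 
the format and element names are invented here, only the mount-point 
idea matters:

   <resourcemap>
     <!-- serve pregenerated javadocs as if they lived inside the doc tree -->
     <mount uri="apidocs/**" src="../../build/javadocs/"/>
     <!-- same idea for pregenerated junit/checkstyle reports -->
     <mount uri="reports/**" src="../../build/reports/"/>
   </resourcemap>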


-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Steven Noels <st...@outerthought.org>.
Nicola Ken Barozzi wrote:

> It just needs to have the compiled jars in the classpath, and this is 
> what qdox does. I'm talking about a javadoc-like tool, not necessarily 
> javadoc.

We weren't talking about Ant either, but just about a bunch of Java 
classes which appear to implement a build system :-D

Let's not start fussing about terminology. Javadoc is Javadoc, i.e. 
http://java.sun.com/j2se/javadoc/ Anything else is, ... anything else. 
Syntax-highlighting Java source code, with some hyperlinks hopefully 
resolving to extended or referred classes, is quite something else, and 
much simpler to achieve. Of course this can be done using a 
qdoxGenerator.

>> Besides which, how long do you think it would take for Cocoon to invoke
>> Javadoc once for every single source file?
> 
> 
> It would invoke its own Generator, not the javadoc tool. Besides, the 
> Generator can always preload all the files if it wants to, there is no 
> real technical problem.

This isn't serious, Nicola.

>> Cocoon is a doc generating tool.  Javadoc is a doc generating tool.  One
>> renders XML, the other renders Java files.  
> 
> 
> Errr, one renders *through* XML, it can get it from anything. Javadoc 
> gets only Java files.
> 
>> Trying to invoke one from the
>> other is... how do I put it politely... um, I'd better not even try :)
> 
> 
> No need to invoke javadoc, it would just need to have the docs in the 
> classpath and generate. On cocoon-docs they seem to be working on a 
> Generator that does this, we'll see.

They are packaging qdox IIRC. Marc & I coached a final-year CS grad two 
years ago to create a source code browsing system. Trust me - Javadoc is 
quite a different beast: 
http://java.sun.com/j2se/javadoc/faq/index.html#memory and 
http://developer.java.sun.com/developer/bugParade/bugs/4032755.html for 
just two examples.

But as you say, this isn't the problem at hand:

> Anyway, the point is not if Cocoon should *generate* these docs, as it 
> doesn't generate the sources of the docs too. But IMO it should serve 
> them all, be they handwritten or pregenerated. In Centipede we generate 
> documentv11 docs out of junit, checkstyle, and other tests, and pass 
> them to Forrest. Of course I don't want Cocoon to *generate* them, but 
> to publish them, yes.

OK.

Now how are we going to address & locate these non-generated, yet 
published resources? Is my FS mail anywhere near a feasible approach 
(i.e. LinkResolverTransformer)?

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org


Re: The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Jeff Turner wrote:
> On Thu, Dec 12, 2002 at 05:49:03PM +0100, Nicola Ken Barozzi wrote:
> ...
> 
>>>>Everything must be underneath Cocoon, or we have a hybrid we cannot 
>>>>easily control. What can't Cocoon do?
>>>
>>>Cocoon can't generate Javadocs, which is inherently a batch-processing
>>>job.  
>>
>>It can, it just needs a javadoc Generator.
>>Give it a start dir, and it will generate a listing with the dir 
>>Generator, generate all javadocs there, and get on by crawling the 
>>directory links.
> 
> 
> That is impossible.  Generating Javadocs is, as I said, "inherently a
> batch-processing job".  Javadoc pages are highly hyperlinked, and the
> Javadoc tool needs to have _everything_ loaded into memory, so it can
> work out whether to write out, say, 'org.apache.foo.MyClass' or '<a
> href="MyClass.html">MyClass</a>'.

It just needs to have the compiled jars in the classpath, and this is 
what qdox does. I'm talking about a javadoc-like tool, not necessarily 
javadoc.

> Besides which, how long do you think it would take for Cocoon to invoke
> Javadoc once for every single source file?

It would invoke its own Generator, not the javadoc tool. Besides, the 
Generator can always preload all the files if it wants to, there is no 
real technical problem.

> Cocoon is a doc generating tool.  Javadoc is a doc generating tool.  One
> renders XML, the other renders Java files.  

Errr, one renders *through* XML, it can get it from anything. Javadoc 
gets only Java files.

> Trying to invoke one from the
> other is... how do I put it politely... um, I'd better not even try :)

No need to invoke javadoc, it would just need to have the docs in the 
classpath and generate. On cocoon-docs they seem to be working on a 
Generator that does this, we'll see.

Anyway, the point is not if Cocoon should *generate* these docs, as it 
doesn't generate the sources of the docs too. But IMO it should serve 
them all, be they handwritten or pregenerated. In Centipede we generate 
documentv11 docs out of junit, checkstyle, and other tests, and pass 
them to Forrest. Of course I don't want Cocoon to *generate* them, but 
to publish them, yes.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


The Mythical Javadoc generator (Re: Conflict resolution)

Posted by Jeff Turner <je...@apache.org>.
On Thu, Dec 12, 2002 at 05:49:03PM +0100, Nicola Ken Barozzi wrote:
...
> >>Everything must be underneath Cocoon, or we have a hybrid we cannot 
> >>easily control. What can't Cocoon do?
> >
> >Cocoon can't generate Javadocs, which is inherently a batch-processing
> >job.  
> 
> It can, it just needs a javadoc Generator.
> Give it a start dir, and it will generate a listing with the dir 
> Generator, generate all javadocs there, and get on by crawling the 
> directory links.

That is impossible.  Generating Javadocs is, as I said, "inherently a
batch-processing job".  Javadoc pages are highly hyperlinked, and the
Javadoc tool needs to have _everything_ loaded into memory, so it can
work out whether to write out, say, 'org.apache.foo.MyClass' or '<a
href="MyClass.html">MyClass</a>'.

Besides which, how long do you think it would take for Cocoon to invoke
Javadoc once for every single source file?

Cocoon is a doc generating tool.  Javadoc is a doc generating tool.  One
renders XML, the other renders Java files.  Trying to invoke one from the
other is... how do I put it politely... um, I'd better not even try :)


--Jeff

Re: Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Jeff Turner wrote:
> On Thu, Dec 12, 2002 at 03:52:28PM +0100, Nicola Ken Barozzi wrote:
> 
> ...
> 
>>As I have already written, it's about separating the link resolving via 
>>a linkmap, from deciding where the source is, from applying the correct 
>>generation.
>>
>>Let me try to explain it again:
> 
> This time making much more sense.  An example is worth a thousand words
> of theory.
> 
> 
>>Concern 1 : links
>>-------------------
>>
>> href="linkmap:my/site/home"  -(lookup)->   href="http://www.home.it/"
>> href="javadocs:MyClass"      -(resolve)->  href="javadocs/MyClass"
> 
> So, a scheme is just an elaborate alias for a path, right?

Not only that, there is also some resolving in place.
Basically link rewriting.

>>Concern 2 : source finding
>>---------------------------
>>
>> href="javadocs/MyClass"  -(sitemap)-> src="../../javadocs/MyClass"
> 
> I gather this is where the resourcemap is used.  Can you give a sitemap
> snippet to illustrate how this fits in with the bit below?

Done off the top of my head in meta-sitemap speak ;-)

  <map:match pattern="**">
    <!-- concern 1 -->
    <map:act type="linkrewrite" src="{1}">

      <!-- concern 2 -->
      <map:act type="findsource" src="{1}">

       ...<other-matches/>...

       <map:match pattern="**.xml">
         <map:generate src="{foundsource}"/>

         <!-- concern 3 -->
         <map:call resource="transform-to-document">
           <map:parameter name="src" value="{foundsource}"/>
         </map:call>
         <map:call resource="skinit">
           <map:parameter name="type" value="document2html"/>
           <map:parameter name="path" value="{1}/{2}.xml"/>
         </map:call>
       </map:match>

      </map:act>
    </map:act>
  </map:match>

>>Concern 3 : Sitemap selection
>>------------------------------
>>
>> src="../../javadocs/MyClass"  -(CAP)-> execute ReaderPipeline
> 
> And what would be the output of this pipeline?  A single Javadoc HTML
> file?  What about the rest?

Each request outputs one document.
How this becomes a site is a concern of the frontend.

>>>I hate seeing Ant becoming part of the Forrest equation, because it will 
>>>break webapps. At the same time, I have had this REALLY BIG argument 
>>>about external resources in the office that much, that I'm pretty sure 
>>>we can't fit everything underneath Cocoon.
>>
>>Everything must be underneath Cocoon, or we have a hybrid we cannot 
>>easily control. What can't Cocoon do?
> 
> Cocoon can't generate Javadocs, which is inherently a batch-processing
> job.  

It can, it just needs a javadoc Generator.
Give it a start dir, and it will generate a listing with the dir 
Generator, generate all javadocs there, and get on by crawling the 
directory links.

>The best Cocoon can do is pass through untransformed HTML.  Fine
> for webapps, useless for CLI generation.

I don't get this.

>>CLI has problems with crawling?  then let's fix that.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Jeff Turner <je...@apache.org>.
On Thu, Dec 12, 2002 at 03:52:28PM +0100, Nicola Ken Barozzi wrote:
>
...
> As I have already written, it's about separating the link resolving via 
> a linkmap, from deciding where the source is, from applying the correct 
> generation.
> 
> Let me try to explain it again:

This time making much more sense.  An example is worth a thousand words
of theory.

> Concern 1 : links
> -------------------
> 
>  href="linkmap:my/site/home"  -(lookup)->   href="http://www.home.it/"
>  href="javadocs:MyClass"      -(resolve)->  href="javadocs/MyClass"

So, a scheme is just an elaborate alias for a path, right?

> Concern 2 : source finding
> ---------------------------
> 
>  href="javadocs/MyClass"  -(sitemap)-> src="../../javadocs/MyClass"

I gather this is where the resourcemap is used.  Can you give a sitemap
snippet to illustrate how this fits in with the bit below?

> Concern 3 : Sitemap selection
> ------------------------------
> 
>  src="../../javadocs/MyClass"  -(CAP)-> execute ReaderPipeline

And what would be the output of this pipeline?  A single Javadoc HTML
file?  What about the rest?

> >I hate seeing Ant becoming part of the Forrest equation, because it will 
> >break webapps. At the same time, I have had this REALLY BIG argument 
> >about external resources in the office that much, that I'm pretty sure 
> >we can't fit everything underneath Cocoon.
> 
> Everything must be underneath Cocoon, or we have a hybrid we cannot 
> easily control. What can't Cocoon do?

Cocoon can't generate Javadocs, which is inherently a batch-processing
job.  The best Cocoon can do is pass through untransformed HTML.  Fine
for webapps, useless for CLI generation.
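
In sitemap terms that pass-through is little more than a reader match; 
a rough sketch, with the paths assumed:

   <map:match pattern="apidocs/**.html">
     <!-- copy the pregenerated Javadoc HTML through untransformed -->
     <map:read src="build/site/apidocs/{1}.html" mime-type="text/html"/>
   </map:match>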

--Jeff

> CLI has problems with crawling?  then let's fix that.
> 

Re: Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:

>>> Concern 1 : links
>>> -------------------
>>>
>>>  href="linkmap:my/site/home"  -(lookup)->   href="http://www.home.it/"
>>>  href="javadocs:MyClass"      -(resolve)->  href="javadocs/MyClass"
>>
>>
>>
>> I'm not sure I see the difference between 'lookup' and 'resolve'. Can 
>> you elaborate more on this?
> 
> 
> Lookup is simply something like this:
> 
>  <linkmap>
>    <link1 href="blah"/>
>    <link2 href="blah"/>
>    ...
>  </linkmap>
> 
> I ask for link1, and there is a lookup of where it is and the 
> corresponding href is taken.
> 
> As for resolving, it involves more complicated rewriting rules that have 
> to be set in code.
> 
> for example, to be able to do
> 
>   href="javadocs:package.mclass"
>             -(resolve)->href="jdocs/package/MyClass"
> 
> I need to have a JavadocsResolver that is configured with  the base 
> javadocs URI (the resulting one, not the one of the source which is 
> concern 2)
> 
>   so it can do javadocs: -> jdocs/
> 
> and then append the path to the package from the package names
> 
>   package.mclass -> package/MyClass

Gotcha.

>>> Concern 2 : source finding
>>> ---------------------------
>>>
>>>  href="javadocs/MyClass"  -(sitemap)-> src="../../javadocs/MyClass"
>>
>>
>>
>> I see this.
> 
> 
> 
>>> Concern 3 : Sitemap selection
>>> ------------------------------
>>>
>>>  src="../../javadocs/MyClass"  -(CAP)-> execute ReaderPipeline
>>
>>
>>
>> But again, I'm not sure I see your point here.
> 
> 
> Let's say it with xml->html.
> 
> If I got the final link in concern 1, and found the source file in 
> concern 2, I need now to select the correct transformation to apply.
> 
> Since I don't partition the URI space to do this (as Cocoon is normally 
> used), I find that the URIs alone are not enough to match the correct 
> pipeline.
> 
> The extension of the source gives me a first hint, so I can match on 
> that. But then, I have
> 
>  - destination URI with the output info
>      /URI/to/file.html
> 
>  - source file(s)
>      /path/to/file.xml
> 
> but I still don't know what to do with it. The xml can have a 
> document11 DTD, a changes10 DTD, a todo10 DTD, etc.
> 
> So I use the sourcetype action to peek into the file, get the DTD, and 
> select the correct transformation. In our case we would transform DTDs 
> that are not document11 to that format, so we can skin them all later on 
> with the same skin.

Ok, got it.

>>>> I hate seeing Ant becoming part of the Forrest equation, because it 
>>>> will break webapps. At the same time, I have had this REALLY BIG 
>>>> argument about external resources in the office that much, that I'm 
>>>> pretty sure we can't fit everything underneath Cocoon.
>>>
>>>
>>> Everything must be underneath Cocoon, or we have a hybrid we cannot 
>>> easily control. 
>>
>>
>>
>> Nicola, please, try to be less emotional, it's not helping your point 
>> to come across.
>>
>> *must* is a pretty tension-creating word, especially in a discussion 
>> with divergence.
>>
>> While I agree with your point of view, I also see Jeff's.
>>
>> Some of us would sacrifice architectural coherence (and maintenance 
>> ease) for speed, some of us would do the opposite.
>>
>> I'm not sure there is a perfect solution for this problem, but it's 
>> definitely worth seeking in a friendly and open way.
> 
> 
> IMHO premature optimization usually does not solve problems.

Jeff is not trying to do premature optimization; he is, quite validly, 
objecting to the use of the Cocoon CLI, and to the fact that that code is 
maintained by a community which, unlike this one, is not focused on the 
generation of static web sites.

Sure, one of his points is performance, and I believe he vastly 
underestimates the architectural challenges of writing such a CLI 
interface without sacrificing major architectural benefits.

But I'll be very happy to be proven wrong.

>>> What can't Cocoon do? CLI has problems with crawling? then let's fix 
>>> that.
>>
>>
>> I wrote the Cocoon CLI but I didn't optimize it. I'm sure there is 
>> *tons* of room for speed improvement at the algorithmic level.
> 
> 
> And work is being done in the Cocoon scratchpad.

Really? where is it? I couldn't find it.

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------




Re: Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Stefano Mazzocchi wrote:
> Nicola Ken Barozzi wrote:
> 
>>
>>
>> Steven Noels wrote:
>>
>>> Jeff Turner wrote:
[...]
> I think this is a very healthy way of settling disagreement and I would 
> like forrest to keep the attitude of seeking consensus.

+1   Exactly what I want too.

>>> If I see 'SoC breakage', I want it to be defined. It is too much of a 
>>> vague term. We should all trust our own bellies, but not each belly 
>>> is equal. So please let's not use SoC, IoC and FS without 
>>> explaining what problems we will encounter. And I hope we try to be 
>>> as openminded as possible, while keeping F. Brooks in mind: there is 
>>> NO silver bullet. Not even Cocoon.
>>
>> As I have already written, it's about separating the link resolving 
>> via a linkmap, from deciding where the source is, from applying the 
>> correct generation.
>>
>> Let me try to explain it again:
>>
>> Concern 1 : links
>> -------------------
>>
>>  href="linkmap:my/site/home"  -(lookup)->   href="http://www.home.it/"
>>  href="javadocs:MyClass"      -(resolve)->  href="javadocs/MyClass"
> 
> 
> I'm not sure I see the difference between 'lookup' and 'resolve'. Can 
> you elaborate more on this?

Lookup is simply something like this:

  <linkmap>
    <link1 href="blah"/>
    <link2 href="blah"/>
    ...
  </linkmap>

I ask for link1, and there is a lookup of where it is and the 
corresponding href is taken.

As for resolving, it involves more complicated rewriting rules that have 
to be set in code.

for example, to be able to do

   href="javadocs:package.mclass"
             -(resolve)->href="jdocs/package/MyClass"

I need to have a JavadocsResolver that is configured with  the base 
javadocs URI (the resulting one, not the one of the source which is 
concern 2)

   so it can do javadocs: -> jdocs/

and then append the path to the package from the package names

   package.mclass -> package/MyClass
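
Wherever that resolver ends up living, it would presumably be declared 
once with its base URI; a sketch, with the class name and configuration 
element invented:

   <map:transformers>
     <map:transformer name="javadoc-links"
                      src="org.apache.forrest.JavadocLinkRewriter">
       <!-- hypothetical component: rewrites javadocs: hrefs against this base -->
       <base-uri>jdocs/</base-uri>
     </map:transformer>
   </map:transformers>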


>> Concern 2 : source finding
>> ---------------------------
>>
>>  href="javadocs/MyClass"  -(sitemap)-> src="../../javadocs/MyClass"
> 
> 
> I see this.


>> Concern 3 : Sitemap selection
>> ------------------------------
>>
>>  src="../../javadocs/MyClass"  -(CAP)-> execute ReaderPipeline
> 
> 
> But again, I'm not sure I see your point here.

Let's say it with xml->html.

If I got the final link in concern 1, and found the source file in 
concern 2, I need now to select the correct transformation to apply.

Since I don't partition the URI space to do this (as Cocoon is normally 
used), I find that the URIs alone are not enough to match the correct 
pipeline.

The extension of the source gives me a first hint, so I can match on 
that. But then, I have

  - destination URI with the output info
      /URI/to/file.html

  - source file(s)
      /path/to/file.xml

but I still don't know what to do with it. The xml can have a 
document11 DTD, a changes10 DTD, a todo10 DTD, etc.

So I use the sourcetype action to peek into the file, get the DTD, and 
select the correct transformation. In our case we would transform DTDs 
that are not document11 to that format, so we can skin them all later on 
with the same skin.
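
Still in meta-sitemap speak, the selection step could read something 
like this -- the "sourcetype" selector and the stylesheet names are 
assumptions:

   <map:select type="sourcetype">
     <map:when test="changes-v10">
       <map:transform src="stylesheets/changes2document.xsl"/>
     </map:when>
     <map:when test="todo-v10">
       <map:transform src="stylesheets/todo2document.xsl"/>
     </map:when>
     <map:otherwise>
       <!-- already document-v11, nothing to normalise -->
     </map:otherwise>
   </map:select>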

>>> I hate seeing Ant becoming part of the Forrest equation, because it 
>>> will break webapps. At the same time, I have had this REALLY BIG 
>>> argument about external resources in the office that much, that I'm 
>>> pretty sure we can't fit everything underneath Cocoon.
>>
>> Everything must be underneath Cocoon, or we have a hybrid we cannot 
>> easily control. 
> 
> 
> Nicola, please, try to be less emotional, it's not helping your point to 
> come across.
> 
> *must* is a pretty tension-creating word, especially in a discussion 
> with divergence.
> 
> While I agree with your point of view, I also see Jeff's.
> 
> Some of us would sacrifice architectural coherence (and maintenance 
> ease) for speed, some of us would do the opposite.
> 
> I'm not sure there is a perfect solution for this problem, but it's 
> definitely worth seeking in a friendly and open way.

IMHO premature optimization usually does not solve problems.

>> What can't Cocoon do? CLI has problems with crawling? then let's fix 
>> that.
> 
> I wrote the Cocoon CLI but I didn't optimize it. I'm sure there is 
> *tons* of room for speed improvement at the algorithmic level.

And work is being done in the Cocoon scratchpad.

> Instead of seeing people ranting blindly on it, I would love to bring 
> out the details and see if this community can find better ways to do the 
> same things *and* without missing its current functionality.

Yup.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:
> 
> 
> Steven Noels wrote:
> 
>> Jeff Turner wrote:
>>
>>> 2) Want to withdraw your -1 and trust that The World According to 
>>> Jeff is
>>> not a dystopian nightmare of mixed concerns.
>>
>>
>>
>> While Jeff still should prove his world using running code, I do trust 
>> him: he has proven in the past that he is able to live up the 
>> expectations he created himself. This doesn't mean we shouldn't do the 
>> discussion. But we should make it a discussion as Jeff says:
> 
> 
> I want it to be a discussion, and want to come to a solution.
> What I don't want is just to get round issues instead of tackling them.

Agreed, but let's not panic please.

Jeff committed something that he thought was cool, but there was 
disagreement in the community and he immediately rolled it back. Now we 
are looking for a solution thru discussion.

I think this is a very healthy way of settling disagreement and I would 
like forrest to keep the attitude of seeking consensus.

>> If I see 'SoC breakage', I want it to be defined. It is too much of a 
>> vague term. We should all trust our own bellies, but not each belly is 
>> equal. So please let's not use SoC, IoC and FS without explaining 
>> what problems we will encounter. And I hope we try to be as openminded 
>> as possible, while keeping F. Brooks in mind: there is NO silver 
>> bullet. Not even Cocoon.
> 
> 
> As I have already written, it's about separating the link resolving via 
> a linkmap, from deciding where the source is, from applying the correct 
> generation.
> 
> Let me try to explain it again:
> 
> Concern 1 : links
> -------------------
> 
>  href="linkmap:my/site/home"  -(lookup)->   href="http://www.home.it/"
>  href="javadocs:MyClass"      -(resolve)->  href="javadocs/MyClass"

I'm not sure I see the difference between 'lookup' and 'resolve'. Can 
you elaborate more on this?

> Concern 2 : source finding
> ---------------------------
> 
>  href="javadocs/MyClass"  -(sitemap)-> src="../../javadocs/MyClass"

I see this.

> Concern 3 : Sitemap selection
> ------------------------------
> 
>  src="../../javadocs/MyClass"  -(CAP)-> execute ReaderPipeline

But again, I'm not sure I see your point here.

>> I hate seeing Ant becoming part of the Forrest equation, because it 
>> will break webapps. At the same time, I have had this REALLY BIG 
>> argument about external resources in the office that much, that I'm 
>> pretty sure we can't fit everything underneath Cocoon.
> 
> 
> Everything must be underneath Cocoon, or we have a hybrid we cannot 
> easily control. 

Nicola, please, try to be less emotional, it's not helping your point to 
come across.

*must* is a pretty tension-creating word, especially in a discussion 
with divergence.

While I agree with your point of view, I also see Jeff's.

Some of us would sacrifice architectural coherence (and maintenance 
ease) for speed, some of us would do the opposite.

I'm not sure there is a perfect solution for this problem, but it's 
definitely worth seeking in a friendly and open way.

> What can't Cocoon do? CLI has problems with crawling? then let's fix that.

I wrote the Cocoon CLI but I didn't optimize it. I'm sure there is 
*tons* of room for speed improvement at the algorithmic level.

Instead of seeing people ranting blindly on it, I would love to bring 
out the details and see if this community can find better ways to do the 
same things *and* without missing its current functionality.

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------



Re: Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Steven Noels wrote:
> Jeff Turner wrote:
> 
>> 2) Want to withdraw your -1 and trust that The World According to Jeff is
>> not a dystopian nightmare of mixed concerns.
> 
> 
> While Jeff still should prove his world using running code, I do trust 
> him: he has proven in the past that he is able to live up the 
> expectations he created himself. This doesn't mean we shouldn't do the 
> discussion. But we should make it a discussion as Jeff says:

I want it to be a discussion, and want to come to a solution.
What I don't want is just to get round issues instead of tackling them.

> If I see 'SoC breakage', I want it to be defined. It is too much of a 
> vague term. We should all trust our own bellies, but not each belly is 
> equal. So please let's not use SoC, IoC and FS without explaining what 
> problems we will encounter. And I hope we try to be as openminded as 
> possible, while keeping F. Brooks in mind: there is NO silver bullet. 
> Not even Cocoon.

As I have already written, it's about separating the link resolving via 
a linkmap, from deciding where the source is, from applying the correct 
generation.

Let me try to explain it again:

Concern 1 : links
-------------------

  href="linkmap:my/site/home"  -(lookup)->   href="http://www.home.it/"
  href="javadocs:MyClass"      -(resolve)->  href="javadocs/MyClass"



Concern 2 : source finding
---------------------------

  href="javadocs/MyClass"  -(sitemap)-> src="../../javadocs/MyClass"



Concern 3 : Sitemap selection
------------------------------

  src="../../javadocs/MyClass"  -(CAP)-> execute ReaderPipeline


> I hate seeing Ant becoming part of the Forrest equation, because it will 
> break webapps. At the same time, I have had this REALLY BIG argument 
> about external resources in the office that much, that I'm pretty sure 
> we can't fit everything underneath Cocoon.

Everything must be underneath Cocoon, or we have a hybrid we cannot 
easily control. What can't Cocoon do? CLI has problems with crawling? 
then let's fix that.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Steven Noels <st...@outerthought.org>.
Jeff Turner wrote:

> 2) Want to withdraw your -1 and trust that The World According to Jeff is
> not a dystopian nightmare of mixed concerns.

While Jeff still should prove his world using running code, I do trust 
him: he has proven in the past that he is able to live up the 
expectations he created himself. This doesn't mean we shouldn't do the 
discussion. But we should make it a discussion as Jeff says:

If I see 'SoC breakage', I want it to be defined. It is too much of a 
vague term. We should all trust our own bellies, but not each belly is 
equal. So please let's not use SoC, IoC and FS without explaining what 
problems we will encounter. And I hope we try to be as openminded as 
possible, while keeping F. Brooks in mind: there is NO silver bullet. 
Not even Cocoon.

I hate seeing Ant becoming part of the Forrest equation, because it will 
break webapps. At the same time, I have had this REALLY BIG argument 
about external resources in the office that much, that I'm pretty sure 
we can't fit everything underneath Cocoon.

We shouldn't forget the sitemap is a pattern specification language, 
similar to XSLT. I'm not a CS grad at all, but I'm pretty sure there 
exists no immediate, direct relationship between a URI space (a 
collection of URIs) and a sitemap configured as a reactor for that 
'incoming namespace'. So we might need something new for that.

No 'trust' thing here too, please, let's leave that to other mailing 
lists. And no opinions, feelings or judgement that aren't backed by 
code. I have seen quite some FUD lately on various Apache mailing lists, 
some of it based on eloquence and position rather than on facts (that 
was quite an eloquent sentence if I may say so ;-)

Or was this FUD? Dunnow.

</Steven>

ps: I'll be off-list from 17/Dec until Christmas. 'The back' needs 
surgery. No Wi-Fi access in hospitals, I guess ;-)
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org


Conflict resolution (Re: URI spaces: source, processing, result)

Posted by Jeff Turner <je...@apache.org>.
On Thu, Dec 12, 2002 at 09:17:36AM +0100, Nicola Ken Barozzi wrote:
> 
> Jeff Turner wrote:
> >On Wed, Dec 11, 2002 at 09:35:47PM +0100, Steven Noels wrote:
> >
> >>Nicola Ken Barozzi wrote:
> >>
> 
> >>Trying to bring the town of you together, I see there is some general 
> >>tendency to tolerate and even advocate some source:/ or scheme:/ like 
> >>think, if not for the same reason. While I love to KISS, the aspect of 
> >>having to declare my links in my future Forrest docs like <link 
> >>href="protocol:name"/> feels kinda good, protocol being things like
> >>
> >>- javadoc
> >>- code
> >>- keyword
> >>- index
> >>- raw
> >>- href (default)
> >>- linkmap (indirection layer, also to aforementioned protocols)
> >
> >
> >One I'm really keen on is "mail:", for referencing list emails by
> >Message-Id.  For example, <link
> >href="mail:3DF7A1A3.6010109@outerthought.org"> gets translated into <a
> >href="http://marc.theaimsgroup.com/....">.
> >
> >But anyway..
> >
> >Once we have 'linkmap' implemented, that accounts for 95% of relative
> >links in our xdocs.  So eventually, unprefixed links will become an
> >anachronism.  So why try to "guess" if a link is static to preserve the
> >current prefix-less status quo, when we want Forrest to eventually have
> >_all_ links prefixed?
> 
> Ask yourself, what should we use the prefix for?
> 
> In the proposal mail I sent (yes, I do feel mildly offended by your
> massive snips and sarcastic comments), I tried to explain my POV.

Sorry.  You vastly underestimated how deep the misunderstanding runs.
What to you were the core issues of the debate separated out, to me were
a collection of rehashes of previous, divisive arguments with zero
relevance to the current debate.  Hence they got snipped.

Reread in the context of this email, I can see _vaguely_ what you're on
about.

Figuring out what the hell Forrest should look like is a _hard_ job.

Figuring out what someone _else_ thinks Forrest should look like is even
harder.

When two people with completely different, semi-formed ideas start
pushing their POV on the list, it degenerates into point-by-point
bashing, with no hope of a common understanding being reached.

Two possible solutions:

a) Say 'to hell with 100% consensus'; take a majority vote, in which
   mostly bewildered bystanders vote on who sounds more convincing.
b) Both contenders forego chances to push their own POV, until they have
   a complete understanding of the other person's POV.  Then a solution
   naturally emerges.

Most of the time, a) and b) have the same net effect. a) is much faster
but less politically correct.

So, do you:

1) Have lots of time to patiently explain your POV, in multiple emails
over the coming days?  We can start with "Jeff explaining Nicola's POV"
and "Nicola explaining Jeff's POV" emails.
2) Want to withdraw your -1 and trust that The World According to Jeff is
not a dystopian nightmare of mixed concerns.


--Jeff

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Jeff Turner <je...@apache.org>.
On Fri, Dec 13, 2002 at 12:07:50AM -0800, Stefano Mazzocchi wrote:
...
> >I don't think you understood my point.  There should be no need for fancy
> >HTTPd <-> Cocoon interactions.  There should be strict IoC, with the
> >webserver, not Cocoon, in full control of the URI space, delegating small
> >portions of it to Cocoon. 
> 
> I like the fact that I can write my selectors/matchers in a pluggable way.
> 
> Should I throw that ability away to use mod_rewrite? Forget it, dude!
> 
> Should I write a new apache module for every matcher and selector? and 
> then, what about flowscript? and what if my reader is not just a blatant 
> bit-2-bit copier but performs things like image rescaling and maybe has 
> to cooperate with the flow? should I write another module?
>
> Sure, if we had mod_java, then we could do that, but things like 
> flowscript? forget it.
> 
> the HTTPd conf file has not enough semantics to be able to drive cocoon 
> at its full power.

True.  I question whether it would have been better to put all that
effort implementing a Cocoon sitemap, into implementing a HTTPd sitemap.
Rather pointless debating it now, I agree.

> >>NOTE: httpd 2.0 has a pluggable configuration facility. in the future, 
> >>we might be able to use the cocoon sitemap to drive *httpd* directly.
> >
> >
> >Cocoon telling httpd what to do.. isn't that classic subversion of
> >control?
> 
> No, you misunderstood: it's the idea of having HTTPd using a conf file 
> written using the cocoon sitemap markup and using modules as components.

Oh.  Cool :)  Well that's just what I meant; move the sitemap goodness up
one level.  Then we wouldn't be in the ridiculous situation of
contemplating feeding 20MB of static Javadocs through Cocoon.

> But this is *wild* and too many things have to change inside HTTPd to 
> make this possible.
...
> My idea is different: let's remove the unnecessary Servlet API layer and 
> let's glue cocoon directly to httpd's butt. This is what Pier and I have 
> been thinking about in the last year or so.... since next year I'll 
> probably end up living with him, expect something to happen.

Sounds neat.  If Pier shows any signs of wanting to rewrite HTTPd to use
a Cocoon sitemap, please encourage him. :)


--Jeff

> NOTE: I'm not *mandating* this behavior to Cocoon. Just creating another 
> wrapper: CLI, Servlet API and Apache API.
> 
> Last time that Federico Pierpaolo and I lived together, Avalon, James 
> and Cocoon were born. I'm curious to see what will happen now :)
...

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Steven Noels <st...@outerthought.org>.
Miles Elam wrote:

> As a footnote to this thread, let me briefly describe what my group has 
> done with Cocoon.

Very nice and interesting intro-to-self()... Welcome!

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org


Re: [OT] Re: Sitemap woes and semantic linking

Posted by Miles Elam <mi...@geekspeak.org>.
As a footnote to this thread, let me briefly describe what my group has 
done with Cocoon.

Tomcat with Cocoon and no Apache HTTPd.  It's a Linux box with TUX that 
handles the static content on the filesystem far faster than Apache, and 
it has Tomcat/Cocoon on the backend doing the "real" work.

We *needed* Cocoon both for its pipelines and for its sitemap (URI 
mapping).  Cocoon was a no-brainer.  Nothing else comes close.  Then we 
looked at servlet containers.  Tomcat just happened to be what we're 
used to.  The great thing about containers is that they can be swapped. 
 I'm still out looking for a 1.4/nio-based HTTP handler with the ability 
to disable Keep-Alives (I'll explain later...).  Then there's WebDAV as 
we need to add/update content.  This is still a work in progress for us, 
but Slide seems to fit our needs well.  If bandwidth becomes a bigger 
issue than CPU, we can uncomment the line in our web.xml file that 
handles gzip encoding.

We looked briefly at how to get Apache HTTPd working with our setup 
months ago.  Then we looked at our requirements.  The only thing we 
needed Apache for was fast serving of static content -- which TUX does 
better.  All dynamic content is served from Tomcat/Cocoon.

When we looked closely, we found that Cocoon would simply be slower 
without help from an external, static processor.  We also found that 
Apache HTTPd, as robust and mature as it is, lacks significant 
functionality that we find readily available in Cocoon.  When it comes 
down to it, a gzip filter replaces mod_gzip, the PHP generator (which we 
don't use anyway) replaces mod_php, the JSP generator has no analogue 
without mod_jk (or equivalent), mod_rewrite is redundant with the Cocoon 
sitemap, and on and on.

Hmmm...  Now that I think of it, there's no equivalent of mod_speling in 
Tomcat/Cocoon.

But to echo what seems to be an undercurrent, is Apache HTTPd becoming 
redundant?  If speed is your primary concern, wouldn't a few Squid 
servers in front of Tomcat/Cocoon make any speed gains from Apache flat 
file serving get lost in the noise?

-----

The hardest thing about getting our site up was making it fully 
standards-compliant and choosing good URIs.  If I had only used Apache 
(or IIS or iPlanet or just Tomcat), we might have launched faster; 
however, that would only be because the correct solution would have 
been impossible without Cocoon.

Yeah, you could call me a Cocoon cheerleader.  We still have a great 
deal of work to do, but the URL is http://geekspeak.org/.  For all 
intents and purposes, it doesn't use Cocoon;  It is completely run by 
and controlled by Cocoon.  TUX is just window dressing -- just a flat 
file accelerator.

What I begin to wonder is whether Apache HTTPd is truly the most useful 
and flexible architecture for new websites (not already existing sites 
of course).

-----

Cocoon is the reason why I want to help with Forrest.  It is one of the 
only ways I can think of to say thank you for all of the hard work.  So 
far, I've added Nicola's Krysalis layout (imitation as the sincerest 
form of flattery and all of that) as a CSS skin to the existing XHTML 
mockup from before.  It's got a banner size issue with font-resizing and 
the lists aren't handled correctly, but it's a start.  Once finals are 
over (end of next week), I will try to continue the skin work I started 
a couple of months ago.

http://forrest.iguanacharlie.com/
http://forrest.iguanacharlie.com/krysalis.html

- Miles

P.S.  Nicola: Your layout is very elegant.  It reminds me that I am 
basically a web code monkey and not a graphic designer by a long shot. 
 If you can dream the layout up, I can probably retool it for CSS.



Re: [OT] Re: Sitemap woes and semantic linking

Posted by Stefano Mazzocchi <st...@apache.org>.
Jeff Turner wrote:
> On Thu, Dec 12, 2002 at 07:36:24PM -0800, Stefano Mazzocchi wrote:
> 
>>Jeff Turner wrote:
>>
>>>On Thu, Dec 12, 2002 at 12:07:34PM +0000, Andrew Savory wrote:
>>>
>>>
>>>>On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> ...
> 
>>>That's what I'm saying: the sitemap is great, but it should be the
>>>"servlet container sitemap", not the "Cocoon sitemap".  There should be
>>>URI management tools (notably URL rewriting) standardized right in
>>>web.xml.
>>
>>Jeff, if you experienced *years* of fighting over the Servlet API Expert 
>>Group to get exactly what you describe, maybe you wouldn't bash the 
>>Cocoon Sitemap so much.
> 
> 
> I was not bashing the Cocoon sitemap, nor the hard-working people who
> made it a reality.  I'm saying that, in a better world, the web server
> would do all the URI management, and Cocoon would be left with just the
> job of transforming and rendering XML.

Yeah, well, that's highly debatable, but it's pointless to do so.

> This 'better world' does not exist in Java-land, so I cannot criticise
> the route Cocoon took.  But I think it _does_ exist in the non-Java
> world, if you view Apache HTTPd as the webserver, and I _suspect_ (never
> having used it) that this is how AxKit got away with not implementing a
> sitemap.

I don't know enough of AxKit to comment on this.

>>>Here is an analogy: why doesn't AxKit have a sitemap?  Because it doesn't
>>>need it.  It relies on Apache httpd's native URL management ability.  All
>>>AxKit needs are those few pipelines for defining XML transformations.
>>
>>Here, Jeff, you miss another few years of talks between myself, 
>>Pierpaolo and the HTTPd 2.0 layered I/O architects, trying to estimate 
>>the ability to have HTTPd 2.0 using something like a mod_cocoon and 
>>referring back all processing that made sense to APR (thru a JNI interface).
> 
> ...
> 
>>At that point, we *might* try to run Cocoon connected directly to the 
>>Apache module API, thus bypassing all the servlet API limitations and 
>>being able to handle back processing (like map:read, for example) to 
>>where it belongs.
> 
> 
> 'Referring back'..
> 'back processing'..
> 
> I don't think you understood my point.  There should be no need for fancy
> HTTPd <-> Cocoon interactions.  There should be strict IoC, with the
> webserver, not Cocoon, in full control of the URI space, delegating small
> portions of it to Cocoon. 

I like the fact that I can write my selectors/matchers in a pluggable way.

Should I throw that ability away to use mod_rewrite? Forget it, dude!

Should I write a new apache module for every matcher and selector? and 
then, what about flowscript? and what if my reader is not just a blatant 
bit-2-bit copier but performs things like image rescaling and maybe has 
to cooperate with the flow? should I write another module?

Sure, if we had mod_java, then we could do that, but things like 
flowscript? forget it.

the HTTPd conf file has not enough semantics to be able to drive cocoon 
at its full power.

>>NOTE: httpd 2.0 has a pluggable configuration facility. in the future, 
>>we might be able to use the cocoon sitemap to drive *httpd* directly.
> 
> 
> Cocoon telling httpd what to do.. isn't that classic subversion of
> control?

No, you misunderstood: it's the idea of having HTTPd using a conf file 
written using the cocoon sitemap markup and using modules as components.

But this is *wild* and too many things have to change inside HTTPd to 
make this possible.

>>Once again, please, don't underestimate the effort that is put in the 
>>design of a complex software system. You appear disrespectful and 
>>this might bite you back later on.
> 
> 
> As I said, I'm not criticizing _anything_ about the design of Cocoon or
> the Cocoon sitemap.  I am lamenting what seems to be a fundamental
> screw-up in the entire server-side Java processing stack; that the
> webserver has such poor URI management facilities that tools like Cocoon
> feel it necessary to take the job upon themselves.
> 
> I would _love_ to have a Cocoon-like sitemap in Tomcat.  Imagine.. the
> URI space could be completely independent of the filesystem!  I could
> store a whole website in a RDBMS and map it to the URI space.  IIRC,
> Craig McClanahan said that they were considering a JNDI abstraction for
> the filesystem (as Tomcat 4 does internally) in the servlet spec, but
> sadly it didn't happen.

My idea is different: let's remove the unnecessary Servlet API layer and 
let's glue cocoon directly to httpd's butt. This is what Pier and I have 
been thinking about in the last year or so.... since next year I'll 
probably end up living with him, expect something to happen.

NOTE: I'm not *mandating* this behavior to Cocoon. Just creating another 
wrapper: CLI, Servlet API and Apache API.

Last time that Federico Pierpaolo and I lived together, Avalon, James 
and Cocoon were born. I'm curious to see what will happen now :)

>>>*shrug* There's no real solution now.  The only feasible 'URI daemon' is
>>>Apache httpd.  More and more I agree with Pier Fumagalli, who had some
>>>enlightening rants on tomcat-dev about the need to treat httpd as
>>>_central_, and Tomcat as _only_ a servlet container.  Forget this idea
>>>that httpd is optional.  Put it right in the centre, use it for URI
>>>management and static resource handling, and delegate to Cocoon only the
>>>things Cocoon is good at handling.
>>
>>Should I remind you that Pierpaolo is the guy that designed the Cocoon 
>>sitemap with me?
> 
> 
> I know.. back then he was a Tomcat committer too :)

We both still are. :) But we'd rather stay away from it.

>>Believe me, we have spent so much thinking about ways to make httpd and 
>>java talking closer together that I'm sick of it. But the political and 
>>technological inertia is *not* something that should be underestimated. 
>>And I mean on both sides of the fence: servlet *and* httpd!
> 
> 
> Perhaps because you're trying to fix a _major_ architectural flaw by
> breaking IoC between the webserver and Cocoon?

No, you just misunderstood me there.

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------



Re: [OT] Re: Sitemap woes and semantic linking

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Jeff Turner wrote:
[...] /Stefano will reply to the other points/
> I would _love_ to have a Cocoon-like sitemap in Tomcat.  Imagine.. the
> URI space could be completely independent of the filesystem!  I could
> store a whole website in a RDBMS and map it to the URI space. 

That's Cocoon. You see Cocoon as a plugin to other containers, why can't 
you see Cocoon as the container itself?

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [OT] Re: Sitemap woes and semantic linking

Posted by Jeff Turner <je...@apache.org>.
On Thu, Dec 12, 2002 at 07:36:24PM -0800, Stefano Mazzocchi wrote:
> Jeff Turner wrote:
> >On Thu, Dec 12, 2002 at 12:07:34PM +0000, Andrew Savory wrote:
> >
> >>On Thu, 12 Dec 2002, Jeff Turner wrote:
...
> >That's what I'm saying: the sitemap is great, but it should be the
> >"servlet container sitemap", not the "Cocoon sitemap".  There should be
> >URI management tools (notably URL rewriting) standardized right in
> >web.xml.
> 
> Jeff, if you experienced *years* of fighting over the Servlet API Expert 
> Group to get exactly what you describe, maybe you wouldn't bash the 
> Cocoon Sitemap so much.

I was not bashing the Cocoon sitemap, nor the hard-working people who
made it a reality.  I'm saying that, in a better world, the web server
would do all the URI management, and Cocoon would be left with just the
job of transforming and rendering XML.

This 'better world' does not exist in Java-land, so I cannot criticise
the route Cocoon took.  But I think it _does_ exist in the non-Java
world, if you view Apache HTTPd as the webserver, and I _suspect_ (never
having used it) that this is how AxKit got away with not implementing a
sitemap.

...
> >Here is an analogy: why doesn't AxKit have a sitemap?  Because it doesn't
> >need it.  It relies on Apache httpd's native URL management ability.  All
> >AxKit needs are those few pipelines for defining XML transformations.
> 
> Here, Jeff, you miss another few years of talks between myself, 
> Pierpaolo and the HTTPd 2.0 layered I/O architects, trying to estimate 
> the ability to have HTTPd 2.0 using something like a mod_cocoon and 
> referring back all processing that made sense to APR (thru a JNI interface).
...
> At that point, we *might* try to run Cocoon connected directly to the 
> Apache module API, thus bypassing all the servlet API limitations and 
> being able to handle back processing (like map:read, for example) to 
> where it belongs.

'Referring back'..
'back processing'..

I don't think you understood my point.  There should be no need for fancy
HTTPd <-> Cocoon interactions.  There should be strict IoC, with the
webserver, not Cocoon, in full control of the URI space, delegating small
portions of it to Cocoon. 

> NOTE: httpd 2.0 has a pluggable configuration facility. in the future, 
> we might be able to use the cocoon sitemap to drive *httpd* directly.

Cocoon telling httpd what to do.. isn't that classic subversion of
control?

> Once again, please, don't underestimate the effort that is put in the 
> design of a complex software system. You appear disrespectful and 
> this might bite you back later on.

As I said, I'm not criticizing _anything_ about the design of Cocoon or
the Cocoon sitemap.  I am lamenting what seems to be a fundamental
screw-up in the entire server-side Java processing stack; that the
webserver has such poor URI management facilities that tools like Cocoon
feel it necessary to take the job upon themselves.

I would _love_ to have a Cocoon-like sitemap in Tomcat.  Imagine.. the
URI space could be completely independent of the filesystem!  I could
store a whole website in a RDBMS and map it to the URI space.  IIRC,
Craig McClanahan said that they were considering a JNDI abstraction for
the filesystem (as Tomcat 4 does internally) in the servlet spec, but
sadly it didn't happen.

...
> >*shrug* There's no real solution now.  The only feasible 'URI daemon' is
> >Apache httpd.  More and more I agree with Pier Fumagalli, who had some
> >enlightening rants on tomcat-dev about the need to treat httpd as
> >_central_, and Tomcat as _only_ a servlet container.  Forget this idea
> >that httpd is optional.  Put it right in the centre, use it for URI
> >management and static resource handling, and delegate to Cocoon only the
> >things Cocoon is good at handling.
> 
> Should I remind you that Pierpaolo is the guy that designed the Cocoon 
> sitemap with me?

I know.. back then he was a Tomcat committer too :)

> Believe me, we have spent so much time thinking about ways to make httpd 
> and Java talk more closely together that I'm sick of it. But the political and 
> technological inertia is *not* something that should be underestimated. 
> And I mean on both sides of the fence: servlet *and* httpd!

Perhaps because you're trying to fix a _major_ architectural flaw by
breaking IoC between the webserver and Cocoon?


--Jeff


Re: [OT] Re: Sitemap woes and semantic linking

Posted by Stefano Mazzocchi <st...@apache.org>.
Jeff Turner wrote:
> On Thu, Dec 12, 2002 at 12:07:34PM +0000, Andrew Savory wrote:
> 
>>On Thu, 12 Dec 2002, Jeff Turner wrote:
>>
>>
>>>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>>>the sitemap would be really nice.  The overhead of a <map:read> for every
>>>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>>>command-line?  Imagine how long it would take for the crawler to grind
>>>through _every_ Javadoc page, effectively copying it unmodified from A to
>>>B.
>>
>>I guess on the plus side, everything is still controlled in one place, and
>>since it's on the command line, it can be automated. The downside, as you
>>mention, is speed. But is Cocoon significantly slower doing a map:read
>>than, say, a "cp" on the command-line? What sort of factor of trade-off
>>are we talking about?
>>
>>
>>>IMO, the _real_ problem is that the sitemap has been sold as a generic
>>>URI management system, but it works at the level of a specific XML
>>>publishing tool.  Its scope is overly broad.
>>
>>Again, it's a pro/con kind of argument: I *like* that everything is dealt
>>with within the Cocoon sitemap: my httpd/servlet engines are
>>interchangeable, but Cocoon is a constant.
> 
> 
> That's what I'm saying: the sitemap is great, but it should be the
> "servlet container sitemap", not the "Cocoon sitemap".  There should be
> URI management tools (notably URL rewriting) standardized right in
> web.xml.

Jeff, if you experienced *years* of fighting over the Servlet API Expert 
Group to get exactly what you describe, maybe you wouldn't bash the 
Cocoon Sitemap so much.

Cocoon was implemented *way before* the Servlet API EG came up with that 
stupid and useless notion of Servlet Filters. Cocoon was created to show 
how pipelining should happen *inside* the servlet, not *outside* and the 
web.xml should allow servlet componentization.

Of course, that was Cocoon1 and without a stinking JSR with politics 
attached, we were able to get *much* further than their stupid and 
useless web.xml (with hardcoded JSP semantics, yuck!)

> Here is an analogy: why doesn't AxKit have a sitemap?  Because it doesn't
> need it.  It relies on Apache httpd's native URL management ability.  All
> AxKit needs are those few pipelines for defining XML transformations.

Here, Jeff, you miss another few years of talks between myself, 
Pierpaolo and the HTTPd 2.0 layered I/O architects, trying to estimate 
the ability to have HTTPd 2.0 using something like a mod_cocoon and 
referring back all processing that made sense to APR (thru a JNI interface).

Unfortunately, we had to wait until Apache 2.0 was stable enough to try 
to implement a mod_java first (having a JVM running inside the web 
server would make several sys-adm scream and yell and leave the building 
like it was on fire!) and see what happens.

At that point, we *might* try to run Cocoon connected directly to the 
Apache module API, thus bypassing all the servlet API limitations and 
being able to handle back processing (like map:read, for example) to 
where it belongs.

NOTE: httpd 2.0 has a pluggable configuration facility. In the future, 
we might be able to use the Cocoon sitemap to drive *httpd* directly.

Once again, please, don't underestimate the effort that is put in the 
design of a complex software system. You appear disrespectful and 
this might bite you back later on.

>>>So where does Forrest stand?  We have servlet containers with wholly
>>>inadequate URI mapping.  We have Cocoon, trying to handle requests for
>>>binary content which it shouldn't, resulting in hopeless performance.  We
>>>have httpd, with good URI handling (eg mod_rewrite), but whose presence
>>>can't be relied upon.  What is the way out?
>>
>>Well, one solution might be to split the sitemap (URI mapping) from
>>the sitemap (URI handling), and have a separate URI daemon that can run in
>>front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
>>drastic though, and could lead to a tangled mess of rewrites at each
>>stage.
> 
> 
> *shrug* There's no real solution now.  The only feasible 'URI daemon' is
> Apache httpd.  More and more I agree with Pier Fumagalli, who had some
> enlightening rants on tomcat-dev about the need to treat httpd as
> _central_, and Tomcat as _only_ a servlet container.  Forget this idea
> that httpd is optional.  Put it right in the centre, use it for URI
> management and static resource handling, and delegate to Cocoon only the
> things Cocoon is good at handling.

Should I remind you that Pierpaolo is the guy that designed the Cocoon 
sitemap with me?

Believe me, we have spent so much time thinking about ways to make httpd 
and Java talk more closely together that I'm sick of it. But the political and 
technological inertia is *not* something that should be underestimated. 
And I mean on both sides of the fence: servlet *and* httpd!

>>>A more current example of this principle: say we want to link to class
>>>MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
>>>Javadoc, UML and qdox representations of that resource.  Should we invent
>>>three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
>>>attribute specifying a MIME type (inventing one if we have to)?
>>
>>Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
>>javadoc: as a protocol? Come to think of it, why java: as a protocol? If
>>the part of any href before a colon refers to the transport, is it right
>>to effectively overload the transport with additional MIME type
>>information? 
> 
> 
> But it's not a protocol, it's a 'scheme' :)  Everyone makes this mistake
> (thanks to Marc for pointing it out).  A URI is an _identifier_.  Have a
> look at the URI RFC; it makes clear that protocol (transport mechanism)
> != scheme (identifier syntax):
> 
>  "The URI scheme (Section 3.1) defines the namespace of the URI, and thus
>  may further restrict the syntax and semantics of identifiers using that
>  scheme."
> 
> And this.. "many URL schemes are named after protocols":
> 
>   "Although many URL schemes are named after protocols, this does not
>   imply that the only way to access the URL's resource is via the named
>   protocol.  Gateways, proxies, caches, and name resolution services
>   might be used to access some resources, independent of the protocol of
>   their origin, and the resolution of some URL may require the use of
>   more than one protocol (e.g., both DNS and HTTP are typically used to
>   access an "http" URL's resource when it can't be found in a local
>   cache)."
> 
> And again, distinguishing "methods of access" from "schemes for
> identif[ication]":
> 
>  "Just as there are many different methods of access to resources, there
>  are a variety of schemes for identifying such resources.  The URI syntax
>  consists of a sequence of components separated by reserved characters,
>  with the first component defining the semantics for the remainder of the
>  URI string."
> 
> 
> So when you see <link href="java:org.apache.myproj.MyClass">, the 'java:'
> bit is simply telling the link processor that "org.apache.myproj.MyClass"
> is to be interpreted as a Java resource identifier.

I agree with your notion that 'scheme != protocol', just like 'URI != URL'.

But this is another story, I'll reply to that in another email.

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------



Re: [OT] Re: Sitemap woes and semantic linking

Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:
> 
> 
> Andrew Savory wrote:
> 
>> On Fri, 13 Dec 2002, Jeff Turner wrote:
>>
>>
>>> Forget this idea that httpd is optional.  Put it right in the centre,
>>> use it for URI management and static resource handling, and delegate to
>>> Cocoon only the things Cocoon is good at handling.
>>
>>
>> I can see the sense in that ... although it does assume that 
>> everything is
>> going to be coming and going via HTTP. But as I can't think of any sane
>> alternatives, that seems reasonable ;-)
> 
> 
> IMO this should be transparent to the container(s) and not compulsory.
> I should not *rely* on this, but could use it if I wanted to get a speed 
> boost.

This is a cocoon-related discussion, but anyway, at ApacheCON I talked 
*extensively* with the mod_proxy people and they told me that mod_cache 
that ships with 2.0 is perfectly able to do exactly what we ask for: 
avoid processing requests that don't belong to Cocoon.

So, there is *some* truth in saying that map:read is a hack (I don't 
think it is, see the ImageReader I wrote, for an example of a 
not-so-trivial use of the concept), but the use of a transparent cache 
up front and the use of HTTPd 2.0 filtered I/O allows us to

  1) keep the URI-space control in one location
  2) gain lightspeed native performance (thru native up-front caching)
  2) gain distributed caching (with proxy-friendly Cocoon-generated HTTP 
headers)

And I see *no lack of elegance* in such a solution, which is also very 
friendly with respect to the various political frictions that happen to 
exist between the Java and C worlds (and something that, unfortunately, we 
have to deal with!)

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------



Re: [OT] Re: Sitemap woes and semantic linking

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Andrew Savory wrote:
> On Fri, 13 Dec 2002, Jeff Turner wrote:
> 
> 
>>Forget this idea that httpd is optional.  Put it right in the centre,
>>use it for URI management and static resource handling, and delegate to
>>Cocoon only the things Cocoon is good at handling.
> 
> I can see the sense in that ... although it does assume that everything is
> going to be coming and going via HTTP. But as I can't think of any sane
> alternatives, that seems reasonable ;-)

IMO this should be transparent to the container(s) and not compulsory.
I should not *rely* on this, but could use it if I wanted to get a speed 
boost.

>>But it's not a protocol, it's a 'scheme' :)  Everyone makes this mistake
>>(thanks to Marc for pointing it out).
> 
> Ah, gotcha, thanks. I see the point now.
> 
>>So when you see <link href="java:org.apache.myproj.MyClass">, the 'java:'
>>bit is simply telling the link processor that "org.apache.myproj.MyClass"
>>is to be interpreted as a Java resource identifier.
>>
>>>(That's not to say I'm in favour of the +uml notation either...
>>
>>Oh, that 'text/html+javadoc' was a wild guess at what a Javadoc MIME type
>>might be, based on the observation that the SVG MIME type is
>>'text/xml+svg'
> 
> Ok. Again, my misunderstanding of your intention -- I thought you were
> aiming to add bits to MIME types, rather than using a specific "javadoc"
> type.

Yes, we had basically all come to this non-obvious consensus, after a 
long and profitable discussion.
I want the same thing to happen on current open issues.

> Thanks for the explanations!

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [OT] Re: Sitemap woes and semantic linking

Posted by Andrew Savory <an...@luminas.co.uk>.
On Fri, 13 Dec 2002, Jeff Turner wrote:

> Forget this idea that httpd is optional.  Put it right in the centre,
> use it for URI management and static resource handling, and delegate to
> Cocoon only the things Cocoon is good at handling.

I can see the sense in that ... although it does assume that everything is
going to be coming and going via HTTP. But as I can't think of any sane
alternatives, that seems reasonable ;-)

> But it's not a protocol, it's a 'scheme' :)  Everyone makes this mistake
> (thanks to Marc for pointing it out).

Ah, gotcha, thanks. I see the point now.

> So when you see <link href="java:org.apache.myproj.MyClass">, the 'java:'
> bit is simply telling the link processor that "org.apache.myproj.MyClass"
> is to be interpreted as a Java resource identifier.
>
> > (That's not to say I'm in favour of the +uml notation either...
>
> Oh, that 'text/html+javadoc' was a wild guess at what a Javadoc MIME type
> might be, based on the observation that the SVG MIME type is
> 'text/xml+svg'

Ok. Again, my misunderstanding of your intention -- I thought you were
aiming to add bits to MIME types, rather than using a specific "javadoc"
type.

Thanks for the explanations!


Andrew.

-- 
Andrew Savory                                Email: andrew@luminas.co.uk
Managing Director                              Tel:  +44 (0)870 741 6658
Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
This is not an official statement or order.    Web:    www.luminas.co.uk


[OT] Re: Sitemap woes and semantic linking

Posted by Jeff Turner <je...@apache.org>.
On Thu, Dec 12, 2002 at 12:07:34PM +0000, Andrew Savory wrote:
> 
> On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> > Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
> > the sitemap would be really nice.  The overhead of a <map:read> for every
> > Javadoc page probably wouldn't be noticed in a live webapp.  But for the
> > command-line?  Imagine how long it would take for the crawler to grind
> > through _every_ Javadoc page, effectively copying it unmodified from A to
> > B.
> 
> I guess on the plus side, everything is still controlled in one place, and
> since it's on the command line, it can be automated. The downside, as you
> mention, is speed. But is Cocoon significantly slower doing a map:read
> than, say, a "cp" on the command-line? What sort of factor of trade-off
> are we talking about?
> 
> > IMO, the _real_ problem is that the sitemap has been sold as a generic
> > URI management system, but it works at the level of a specific XML
> > publishing tool.  Its scope is overly broad.
> 
> Again, it's a pro/con kind of argument: I *like* that everything is dealt
> with within the Cocoon sitemap: my httpd/servlet engines are
> interchangeable, but Cocoon is a constant.

That's what I'm saying: the sitemap is great, but it should be the
"servlet container sitemap", not the "Cocoon sitemap".  There should be
URI management tools (notably URL rewriting) standardized right in
web.xml.

Here is an analogy: why doesn't AxKit have a sitemap?  Because it doesn't
need it.  It relies on Apache httpd's native URL management ability.  All
AxKit needs are those few pipelines for defining XML transformations.

> > So where does Forrest stand?  We have servlet containers with wholly
> > inadequate URI mapping.  We have Cocoon, trying to handle requests for
> > binary content which it shouldn't, resulting in hopeless performance.  We
> > have httpd, with good URI handling (eg mod_rewrite), but whose presence
> > can't be relied upon.  What is the way out?
> 
> Well, one solution might be to split the sitemap (URI mapping) from
> the sitemap (URI handling), and have a separate URI daemon that can run in
> front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
> drastic though, and could lead to a tangled mess of rewrites at each
> stage.

*shrug* There's no real solution now.  The only feasible 'URI daemon' is
Apache httpd.  More and more I agree with Pier Fumagalli, who had some
enlightening rants on tomcat-dev about the need to treat httpd as
_central_, and Tomcat as _only_ a servlet container.  Forget this idea
that httpd is optional.  Put it right in the centre, use it for URI
management and static resource handling, and delegate to Cocoon only the
things Cocoon is good at handling.

> > A more current example of this principle: say we want to link to class
> > MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
> > Javadoc, UML and qdox representations of that resource.  Should we invent
> > three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
> > attribute specifying a MIME type (inventing one if we have to)?
> 
> Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
> javadoc: as a protocol? Come to think of it, why java: as a protocol? If
> the part of any href before a colon refers to the transport, is it right
> to effectively overload the transport with additional MIME type
> information? 

But it's not a protocol, it's a 'scheme' :)  Everyone makes this mistake
(thanks to Marc for pointing it out).  A URI is an _identifier_.  Have a
look at the URI RFC; it makes clear that protocol (transport mechanism)
!= scheme (identifier syntax):

 "The URI scheme (Section 3.1) defines the namespace of the URI, and thus
 may further restrict the syntax and semantics of identifiers using that
 scheme."

And this.. "many URL schemes are named after protocols":

  "Although many URL schemes are named after protocols, this does not
  imply that the only way to access the URL's resource is via the named
  protocol.  Gateways, proxies, caches, and name resolution services
  might be used to access some resources, independent of the protocol of
  their origin, and the resolution of some URL may require the use of
  more than one protocol (e.g., both DNS and HTTP are typically used to
  access an "http" URL's resource when it can't be found in a local
  cache)."

And again, distinguishing "methods of access" from "schemes for
identif[ication]":

 "Just as there are many different methods of access to resources, there
 are a variety of schemes for identifying such resources.  The URI syntax
 consists of a sequence of components separated by reserved characters,
 with the first component defining the semantics for the remainder of the
 URI string."


So when you see <link href="java:org.apache.myproj.MyClass">, the 'java:'
bit is simply telling the link processor that "org.apache.myproj.MyClass"
is to be interpreted as a Java resource identifier.
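
To make that concrete, here is a minimal sketch of what such a link
processor could do (class and method names are invented for illustration;
this is not Forrest's actual code). The part before the colon is treated
purely as an identifier scheme, and each scheme plugs in its own
translation to a concrete URL:

  import java.util.HashMap;
  import java.util.Map;

  // Illustrative sketch only -- not Forrest's real API.
  public class SchemeLinkResolver {

      /** One handler per scheme; e.g. "java:" maps class names to URLs. */
      public interface Scheme {
          String toUrl(String name);
      }

      private final Map<String, Scheme> schemes = new HashMap<String, Scheme>();

      public void register(String prefix, Scheme handler) {
          schemes.put(prefix, handler);
      }

      /** Resolves e.g. "java:org.apache.myproj.MyClass"; the text before
          the colon names a scheme, not a transport protocol. */
      public String resolve(String href) {
          int colon = href.indexOf(':');
          if (colon < 0) {
              return href;                        // plain relative link, left alone
          }
          Scheme handler = schemes.get(href.substring(0, colon));
          return handler == null ? href : handler.toUrl(href.substring(colon + 1));
      }
  }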

> (That's not to say I'm in favour of the +uml notation either... 

Oh, that 'text/html+javadoc' was a wild guess at what a Javadoc MIME type
might be, based on the observation that the SVG MIME type is
'text/xml+svg'


--Jeff

> 
> Andrew.
> 
> -- 
> Andrew Savory                                Email: andrew@luminas.co.uk
> Managing Director                              Tel:  +44 (0)870 741 6658
> Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
> This is not an official statement or order.    Web:    www.luminas.co.uk
> 

Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Jeff Turner wrote:

>On Thu, Dec 12, 2002 at 10:39:05AM +0000, Andrew Savory wrote:
>  
>
>>On Thu, 12 Dec 2002, Steven Noels wrote:
>>
>>    
>>
>>>could you please comment on my summary, too? Also, I'd like to hear the
>>>opinion of others.
>>>      
>>>
>>Ok, caveat: I've not used Forrest (yet), but I use Cocoon extensively.
>>
>>Jeff Turner wrote:
>>
>>    
>>
>>>Are you really suggesting that requests for Javadoc pages should go
>>>through Cocoon?
>>>
>>>But the problem is real: how do we integrate Javadocs into
>>>the URI space.
>>>
>>>I'd say write out .htaccess files with mod_rewrite rules, and figure out
>>>what the equivalent for Tomcat is.  Perhaps a separate servlet..
>>>_anything_ but Cocoon ;P
>>>      
>>>
>>Whilst I understand your concern about passing 21mb of files through
>>Cocoon untouched, I'm not sure there's a more elegant way of handling URI
>>space issues, without ending up bundling a massive amount of software with
>>Forrest (or making unrealistic software prerequisite installation
>>demands).
>>
>>So, since Cocoon _can_ handle the rewriting concern, and is already in
>>Forrest, why not use it?
>>    
>>
>
>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>the sitemap would be really nice.  The overhead of a <map:read> for every
>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>command-line?  Imagine how long it would take for the crawler to grind
>through _every_ Javadoc page, effectively copying it unmodified from A to
>B.
>
>IMO, the _real_ problem is that the sitemap has been sold as a generic
>URI management system, but it works at the level of a specific XML
>publishing tool.  Its scope is overly broad.  The webserver (Tomcat)
>should be defining the 'site map', and Cocoon should never even _see_
>requests for static resources.  Just like mod_jk only forwards servlet
>and JSP requests on to Tomcat, Tomcat should only forward requests for
>XML processing on to Cocoon.  So <map:read> is a hack to handle requests
>that Cocoon should never have been asked to handle in the first place.
>

No flame intended, but I'd like to explain why I disagree with 
<map:read> being a hack.

It can only be considered so in the specific case where a mod_rewrite 
rule can translate the request URI to a _file_ name. This is very 
restrictive compared to what is possible in Cocoon with and around a 
reader, and there are many more uses that don't fit this model.

For example, I use it on some projects to retrieve binary attachments 
to documents in an SQL database (BLOBs), or to access remote CVS 
repositories. This only uses the standard ResourceReader with specific 
sources, but we can also have some very specialized readers that can 
produce binary content from almost anything.

The world isn't full of XML, and Readers are the way for Cocoon to serve 
content that cannot be defined through XML processing pipelines.
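
As a rough illustration of the concept (plain JDBC with made-up table and
column names, not Cocoon's actual Reader interface), serving such an
attachment just means streaming the stored bytes to the output, with no
XML pipeline in between:

  import java.io.InputStream;
  import java.io.OutputStream;
  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;

  // Sketch: copy a BLOB attachment straight to the response output stream.
  public class BlobStreamer {

      public void stream(Connection con, String docId, OutputStream out)
              throws Exception {
          PreparedStatement ps =
              con.prepareStatement("SELECT data FROM attachments WHERE doc_id = ?");
          ps.setString(1, docId);
          ResultSet rs = ps.executeQuery();
          if (rs.next()) {
              InputStream in = rs.getBinaryStream(1);
              byte[] buf = new byte[8192];
              int n;
              while ((n = in.read(buf)) != -1) {
                  out.write(buf, 0, n);           // bytes are copied as-is
              }
              in.close();
          }
          rs.close();
          ps.close();
      }
  }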

>So where does Forrest stand?  We have servlet containers with wholly
>inadequate URI mapping.  We have Cocoon, trying to handle requests for
>binary content which it shouldn't, resulting in hopeless performance.  We
>have httpd, with good URI handling (eg mod_rewrite), but whose presence
>can't be relied upon.  What is the way out?
>

The way out may be to have equivalent mod_rewrite configuration and 
sitemap snippets for binary source handling. This allows the Cocoon app 
to be self-contained, yet able to be deployed behind a 
mod_rewrite-enabled httpd.

Also, Cocoon's CLI is slow at handling XML-processed content since it 
processes it twice: once to extract the links, and once to produce the 
file. Using the recent work on caching-points in Cocoon 2.1, we can 
envision some significant speed improvement if Cocoon's crawler takes 
care of this.

Ah, and something that Cocoon's crawler can do but wget can't is follow 
links between generated PDFs...

>>I like the idea of link naming schemes, but I'm really worried about the
>>idea of specifying MIME types as link attributes. This seems like a nasty
>>hack: should we be specifying MIME types?
>>    
>>
>
>There is some context you're missing there..
>
>http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2
>
>The theory is that links should _not_ specify MIME type of linked-to docs
>by default.  The MIME type should be inferred by the type of the linking
>document, and what's available.  Eg, <link href="site:/primer"> links to
>"The Forrest Primer" in whatever form it's available.
>
>However it is also sometimes desirable to specify the MIME type
>explicitly.  So rather than corrupt our nice semantic URLs, eg <link
>href="site:/primer.pdf">, we should express the type as a separate
>attribute: <link href="site:/primer" type="application/pdf">.
>
>A more current example of this principle: say we want to link to class
>MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
>Javadoc, UML and qdox representations of that resource.  Should we invent
>three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
>attribute specifying a MIME type (inventing one if we have to)?
>

A positive note to end this post : I find these MIME-typed links a very 
elegant solution to cleanly separate the referred content from its 
presentation.

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }



Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Andrew Savory wrote:
> On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> 
>>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>>the sitemap would be really nice.  The overhead of a <map:read> for every
>>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>>command-line?  Imagine how long it would take for the crawler to grind
>>through _every_ Javadoc page, effectively copying it unmodified from A to
>>B.
> 
> I guess on the plus side, everything is still controlled in one place, and
> since it's on the command line, it can be automated. The downside, as you
> mention, is speed. But is Cocoon significantly slower doing a map:read
> than, say, a "cp" on the command-line? What sort of factor of trade-off
> are we talking about?

The actual problem is the Cocoon CLI, which crawls links.
The server version does not have this problem. So it's a CLI issue, not 
a Cocoon issue.

>>IMO, the _real_ problem is that the sitemap has been sold as a generic
>>URI management system, but it works at the level of a specific XML
>>publishing tool.  Its scope is overly broad.
> 
> Again, it's a pro/con kind of argument: I *like* that everything is dealt
> with within the Cocoon sitemap: my httpd/servlet engines are
> interchangeable, but Cocoon is a constant.

Not only that. Cocoon is *not* a servlet app. It's an XML processing engine. 
So it should manage everything it serves, so that its apps can be ported 
to every environment Cocoon can run in.

>>So where does Forrest stand?  We have servlet containers with wholly
>>inadequate URI mapping.  We have Cocoon, trying to handle requests for
>>binary content which it shouldn't, resulting in hopeless performance.  We
>>have httpd, with good URI handling (eg mod_rewrite), but whose presence
>>can't be relied upon.  What is the way out?
> 
> Well, one solution might be to split the sitemap (URI mapping) from
> the sitemap (URI handling), and have a separate URI daemon that can run in
> front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
> drastic though, and could lead to a tangled mess of rewrites at each
> stage.

Exactly. These problems are not necessarily bad things that Cocoon has, 
but bugs or missing features. We should not circumvent them with hacks, 
but be able to manage them better in Cocoon.

>>There is some context you're missing there..
>>
>>http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2
> 
> 
> Ok, gotcha. That seems fair, apologies for rehashing old discussions.
> 
> 
>>A more current example of this principle: say we want to link to class
>>MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
>>Javadoc, UML and qdox representations of that resource.  Should we invent
>>three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
>>attribute specifying a MIME type (inventing one if we have to)?
> 
> Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
> javadoc: as a protocol? Come to think of it, why java: as a protocol? If
> the part of any href before a colon refers to the transport, is it right
> to effectively overload the transport with additional MIME type
> information? (That's not to say I'm in favour of the +uml notation
> either... do we need another attribute?)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Stefano Mazzocchi <st...@apache.org>.
Andrew Savory wrote:
> On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> 
>>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>>the sitemap would be really nice.  The overhead of a <map:read> for every
>>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>>command-line?  Imagine how long it would take for the crawler to grind
>>through _every_ Javadoc page, effectively copying it unmodified from A to
>>B.
> 
> 
> I guess on the plus side, everything is still controlled in one place, and
> since it's on the command line, it can be automated. The downside, as you
> mention, is speed. But is Cocoon significantly slower doing a map:read
> than, say, a "cp" on the command-line? What sort of factor of trade-off
> are we talking about?

A file copy is a native operation. In a modern operating system with a 
modern JVM it can be performed using DMA. So it's lightspeed compared to 
anything that cocoon will be able to do.

But we are talking about 'bulk copy'.

If we talk about scanning for links (and any wget-like crawler, 
CocoonCLI or others, has to do this), then there is no technical reason 
why the Cocoon CLI has to be slower than, say, a Java wget clone.

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------



Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Andrew Savory <an...@luminas.co.uk>.
On Thu, 12 Dec 2002, Jeff Turner wrote:

> Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
> the sitemap would be really nice.  The overhead of a <map:read> for every
> Javadoc page probably wouldn't be noticed in a live webapp.  But for the
> command-line?  Imagine how long it would take for the crawler to grind
> through _every_ Javadoc page, effectively copying it unmodified from A to
> B.

I guess on the plus side, everything is still controlled in one place, and
since it's on the command line, it can be automated. The downside, as you
mention, is speed. But is Cocoon significantly slower doing a map:read
than, say, a "cp" on the command-line? What sort of factor of trade-off
are we talking about?

> IMO, the _real_ problem is that the sitemap has been sold as a generic
> URI management system, but it works at the level of a specific XML
> publishing tool.  Its scope is overly broad.

Again, it's a pro/con kind of argument: I *like* that everything is dealt
with within the Cocoon sitemap: my httpd/servlet engines are
interchangeable, but Cocoon is a constant.

> So where does Forrest stand?  We have servlet containers with wholly
> inadequate URI mapping.  We have Cocoon, trying to handle requests for
> binary content which it shouldn't, resulting in hopeless performance.  We
> have httpd, with good URI handling (eg mod_rewrite), but whose presence
> can't be relied upon.  What is the way out?

Well, one solution might be to split the sitemap (URI mapping) from
the sitemap (URI handling), and have a separate URI daemon that can run in
front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
drastic though, and could lead to a tangled mess of rewrites at each
stage.

> There is some context you're missing there..
>
> http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2

Ok, gotcha. That seems fair, apologies for rehashing old discussions.

> A more current example of this principle: say we want to link to class
> MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
> Javadoc, UML and qdox representations of that resource.  Should we invent
> three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
> attribute specifying a MIME type (inventing one if we have to)?

Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
javadoc: as a protocol? Come to think of it, why java: as a protocol? If
the part of any href before a colon refers to the transport, is it right
to effectively overload the transport with additional MIME type
information? (That's not to say I'm in favour of the +uml notation
either... do we need another attribute?)


Andrew.

-- 
Andrew Savory                                Email: andrew@luminas.co.uk
Managing Director                              Tel:  +44 (0)870 741 6658
Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
This is not an official statement or order.    Web:    www.luminas.co.uk


Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Jeff Turner <je...@apache.org>.
On Thu, Dec 12, 2002 at 10:39:05AM +0000, Andrew Savory wrote:
> 
> On Thu, 12 Dec 2002, Steven Noels wrote:
> 
> > could you please comment on my summary, too? Also, I'd like to hear the
> > opinion of others.
> 
> Ok, caveat: I've not used Forrest (yet), but I use Cocoon extensively.
> 
> Jeff Turner wrote:
> 
> > Are you really suggesting that requests for Javadoc pages should go
> > through Cocoon?
> >
> > But the problem is real: how do we integrate Javadocs into
> > the URI space.
> >
> > I'd say write out .htaccess files with mod_rewrite rules, and figure out
> > what the equivalent for Tomcat is.  Perhaps a separate servlet..
> > _anything_ but Cocoon ;P
> 
> Whilst I understand your concern about passing 21mb of files through
> Cocoon untouched, I'm not sure there's a more elegant way of handling URI
> space issues, without ending up bundling a massive amount of software with
> Forrest (or making unrealistic software prerequisite installation
> demands).
> 
> So, since Cocoon _can_ handle the rewriting concern, and is already in
> Forrest, why not use it?

Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
the sitemap would be really nice.  The overhead of a <map:read> for every
Javadoc page probably wouldn't be noticed in a live webapp.  But for the
command-line?  Imagine how long it would take for the crawler to grind
through _every_ Javadoc page, effectively copying it unmodified from A to
B.

IMO, the _real_ problem is that the sitemap has been sold as a generic
URI management system, but it works at the level of a specific XML
publishing tool.  Its scope is overly broad.  The webserver (Tomcat)
should be defining the 'site map', and Cocoon should never even _see_
requests for static resources.  Just like mod_jk only forwards servlet
and JSP requests on to Tomcat, Tomcat should only forward requests for
XML processing on to Cocoon.  So <map:read> is a hack to handle requests
that Cocoon should never have been asked to handle in the first place.

So where does Forrest stand?  We have servlet containers with wholly
inadequate URI mapping.  We have Cocoon, trying to handle requests for
binary content which it shouldn't, resulting in hopeless performance.  We
have httpd, with good URI handling (eg mod_rewrite), but whose presence
can't be relied upon.  What is the way out?

> I like the idea of link naming schemes, but I'm really worried about the
> idea of specifying MIME types as link attributes. This seems like a nasty
> hack: should we be specifying MIME types?

There is some context you're missing there..

http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2

The theory is that links should _not_ specify MIME type of linked-to docs
by default.  The MIME type should be inferred by the type of the linking
document, and what's available.  Eg, <link href="site:/primer"> links to
"The Forrest Primer" in whatever form it's available.

However it is also sometimes desirable to specify the MIME type
explicitly.  So rather than corrupt our nice semantic URLs, eg <link
href="site:/primer.pdf">, we should express the type as a separate
attribute: <link href="site:/primer" type="application/pdf">.

A more current example of this principle: say we want to link to class
MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
Javadoc, UML and qdox representations of that resource.  Should we invent
three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
attribute specifying a MIME type (inventing one if we have to)?

HTH,


--Jeff

> 
> Andrew.
> 
> -- 
> Andrew Savory                                Email: andrew@luminas.co.uk
> Managing Director                              Tel:  +44 (0)870 741 6658
> Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
> This is not an official statement or order.    Web:    www.luminas.co.uk
> 

Re: URI spaces: source, processing, result

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Steven Noels wrote:

> could you please comment on my summary, too? Also, I'd like to hear the 
> opinion of others.

Ok, done.
IMHO my answer to Jeff also answered your points; I didn't want to 
ignore your post, sorry.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: URI spaces: source, processing, result

Posted by Andrew Savory <an...@luminas.co.uk>.
On Thu, 12 Dec 2002, Steven Noels wrote:

> could you please comment on my summary, too? Also, I'd like to hear the
> opinion of others.

Ok, caveat: I've not used Forrest (yet), but I use Cocoon extensively.

Jeff Turner wrote:

> Are you really suggesting that requests for Javadoc pages should go
> through Cocoon?
>
> But the problem is real: how do we integrate Javadocs into
> the URI space.
>
> I'd say write out .htaccess files with mod_rewrite rules, and figure out
> what the equivalent for Tomcat is.  Perhaps a separate servlet..
> _anything_ but Cocoon ;P

Whilst I understand your concern about passing 21mb of files through
Cocoon untouched, I'm not sure there's a more elegant way of handling URI
space issues, without ending up bundling a massive amount of software with
Forrest (or making unrealistic software prerequisite installation
demands).

So, since Cocoon _can_ handle the rewriting concern, and is already in
Forrest, why not use it?

I like the idea of link naming schemes, but I'm really worried about the
idea of specifying MIME types as link attributes. This seems like a nasty
hack: should we be specifying MIME types? Why would we want to know that
the PDF I'm linking to is application/pdf? Or even worse,
application/pdf+uml? Seems like it would make my job harder.

I guess I'd better go track down the arguments in favour of this in the
mailing list archive.


Andrew.

-- 
Andrew Savory                                Email: andrew@luminas.co.uk
Managing Director                              Tel:  +44 (0)870 741 6658
Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
This is not an official statement or order.    Web:    www.luminas.co.uk


Re: URI spaces: source, processing, result

Posted by Steven Noels <st...@outerthought.org>.
Nicola Ken Barozzi wrote:

> Ask yourself, what should we use the prefix for?
> 
> In the proposal mail I sent (yes, I do feel mildly offended by your
> massive snips and sarcastic comments), I tried to explain my POV.

Nicola,

could you please comment on my summary, too? Also, I'd like to hear the 
opinion of others.

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org


Re: URI spaces: source, processing, result

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Jeff Turner wrote:
> On Wed, Dec 11, 2002 at 09:35:47PM +0100, Steven Noels wrote:
> 
>>Nicola Ken Barozzi wrote:
>>

>>Trying to bring the two of you together, I see there is some general 
>>tendency to tolerate and even advocate some source:/ or scheme:/-like 
>>thing, if not for the same reason. While I love to KISS, the aspect of 
>>having to declare my links in my future Forrest docs like <link 
>>href="protocol:name"/> feels kinda good, protocol being things like
>>
>> - javadoc
>> - code
>> - keyword
>> - index
>> - raw
>> - href (default)
>> - linkmap (indirection layer, also to aforementioned protocols)
> 
> 
> One I'm really keen on is "mail:", for referencing list emails by
> Message-Id.  For example, <link
> href="mail:3DF7A1A3.6010109@outerthought.org"> gets translated into <a
> href="http://marc.theaimsgroup.com/....">.
> 
> But anyway..
> 
> Once we have 'linkmap' implemented, that accounts for 95% of relative
> links in our xdocs.  So eventually, unprefixed links will become an
> anachronism.  So why try to "guess" if a link is static to preserve the
> current prefix-less status quo, when we want Forrest to eventually have
> _all_ links prefixed?

Ask yourself, what should we use the prefix for?

In the proposal mail I sent (yes, I do feel mildly offended by your
massive snips and sarcastic comments), I tried to explain my POV.
Since it didn't get through, I assume that I wasn't able to convey the
message, so please give me the time and possibility of getting my points
through.

You are mixing three things in the scheme:

   1 - link translation  (scheme -> real uri space)
   2 - source definition (where the source is)
   3 - generation method (what is to be done with it)

I totally agree that we should use it for link translation, but I think
that the other two points are concerns of the sitemap. Hence the use of
a "resource-exists" feature and mount points to address point 2 and CAPs
to address point 3.

Link translation is just a facility that can be used or skipped.
Let's say that translation has been done and I have the resulting URI
ready: what does Cocoon do with it?

It would select based on the source mime-type, inferred from file
extensions and from DTDs via CAPs. Having found what to do with the file,
it then resolves the source of the file, also using mount points.

This gives you the best flexibility and separation between all these
concerns.

                                -  ~  -

What the user sees is that wherever they put the file, they can link to
it without specifying the extension, and Cocoon serves it. I find it
quite straightforward.

If they want to refer to links via shorthands, they define them in the
linkmap; there are special shorthand schemes that help in the definition
of complex links like the mail archives.

Finally, if they want to bring in stuff from outer dirs, they mount
them, and Cocoon treats them as if they were local.
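
As a rough sketch of the mounting idea (the class below is invented for
illustration, not an existing Forrest or Cocoon component), a mount table
maps URI prefixes onto directories outside the normal source tree, so that
links to mounted content look exactly like links to local content:

  import java.io.File;
  import java.util.LinkedHashMap;
  import java.util.Map;

  // Illustrative mount-table resolver for source files.
  public class MountTable {

      private final Map<String, File> mounts = new LinkedHashMap<String, File>();
      private final File defaultRoot;

      public MountTable(File defaultRoot) {
          this.defaultRoot = defaultRoot;
      }

      public void mount(String uriPrefix, File realDirectory) {
          mounts.put(uriPrefix, realDirectory);
      }

      /** e.g. "javadocs/overview.html" -> build/javadocs/overview.html */
      public File resolve(String uri) {
          for (Map.Entry<String, File> entry : mounts.entrySet()) {
              if (uri.startsWith(entry.getKey())) {
                  return new File(entry.getValue(),
                                  uri.substring(entry.getKey().length()));
              }
          }
          return new File(defaultRoot, uri);      // not mounted: normal source dir
      }
  }

So something like table.mount("javadocs/", new File("build/javadocs"))
would make externally generated javadocs appear as ordinary local content.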

> Here is an analogy with the seemingly uncontroversial 'linkmap' scheme.
> How should 'linkmap' links be implemented?
> 
> a) Have an explicit prefix, like <link href="site:/primer">
> b) Have unprefixed links like <link href="primer">, and have the CLI open
> the linkmap.xml file, and check if a 'primer' entry exists.  If so, treat
> as a linkmap link.
> 
> This same choice of implementation; explicit or inferred, needs to be
> made for every potential scheme.  We have a clear fork in the road:
> 
> 1) _All_ schemes are explicit.  Implemented with XSLTs and Transformers
>    in Forrest
> 
> 2) Some schemes (like javadoc:) are explicit, and others (like file:, and
>    perhaps linkmap:?) are inferred.  Implicit schemes are implemented
>    with CLI modifications and 'conditional' sitemap hacks like
>    resource-exists.

I'm comfortable with both, and I think that if we define an implicit
scheme, it should be the only one.

>>and name being a filename or a named resource name depending on the 
>>protocol and the eventual indirection
>>
>>In terms of implementation, some of this will point towards SAXable 
>>information that must be passed across Cocoon pipelines, some of this is 
>>external data, perhaps binary in the sense of not being based on XML and 
>>not serializable as such. So some of this information will require its 
>>own Source implementation or Generator, some will just need to be copied 
>>around, either as a file, or as a collection of files. Some of it will 
>>require link augmentation or resolution, if linkmaps/indirection has 
>>been used.
>>
>>With regards to greater datasets, such as massive Javadoc collections, 
>>I'm not sure whether we would need to try and keep this within the 
>>concern of Forrest, at least in terms of the static generation of it - 
>>there exist tools to do that.
> 
> Yes, I would prefer for Javadoc invocation to be the concern of whatever
> invoked Forrest, eg, an Ant script.

Yes. Forrest, as said many times before, should /eventually/ be
concerned with the skinning of the sources that other programs put
there. For example, I want to use the xjavadoc doclet to make XML
javadocs, and Forrest can skin them.

>>In terms of doing something with tools like qdox, I'm not sure - I
>>think this can be a value for the Forrest user: <link
>>href="src:/org.apache.cocoon.foobar"/> bring up a syntax-highlighted
>>version of that class.
> 
> 
> Remember how we kinda decided that link MIME type should be specified as
> a separate attribute?  Well here is a neat example:
> 
> <link href="java:/org.apache.cocoon.foobar" type="text/html+javadoc"/>
> <link href="java:/org.apache.cocoon.foobar" type="text/html+uml"/>
> <link href="java:/org.apache.cocoon.foobar" type="text/html+qdox"/>
> 
> So the identifier (URI) stays the same, and 'type' specifies different
> representations of that URI.  This illustrates why 'java:' is preferable
> to 'javadoc:'.

I don't like this, because the +xxx stuff is not part of a mime type.
Forrest should not be concerned, as we have previously decided, with
generating the above stuff, but eventually only with skinning it.

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
              - verba volant, scripta manent -
     (discussions get forgotten, just code remains)
---------------------------------------------------------------------



RE: URI spaces: source, processing, result

Posted by Robert Koberg <ro...@koberg.com>.
ooops... I meant for these pages to have unique IDs - sorry for any confusion.
> 
> <folder name="docroot">
>   <page id="abcd"/>
>   <folder name="folder1">
>     <page id="f1abc"/>
>     <page id="f1bcd"/>
>     <page id="f1cde"/>
>     <folder name="folder11">
>       <page id="f1abc"/>
>       <page id="f1bcd"/>
>       <page id="f1cde"/>
>     </folder>
>   </folder>
> </folder>
> 



RE: URI spaces: source, processing, result

Posted by Robert Koberg <ro...@koberg.com>.
Hi,

> -----Original Message-----
> From: Jeff Turner [mailto:jefft@apache.org]
> Sent: Wednesday, December 11, 2002 10:27 PM
<snip/>
>
> Here is an analogy with the seemingly uncontroversial 'linkmap' scheme.
> How should 'linkmap' links be implemented?
>
> a) Have an explicit prefix, like <link href="site:/primer">
> b) Have unprefixed links like <link href="primer">, and have the CLI open
> the linkmap.xml file, and check if a 'primer' entry exists.  If so, treat
> as a linkmap link.

I am failing to understand why this is a concern of some post process. Are you
not trying to transform one representation to another? To me, the 'linkmap.xml'
should be accessed at transformation time to transform the link.

On the linkmap: I would not like to see a list of URIs (or URLs). Is Forrest
intended to be only for well-established projects, that is, those projects that
have their site architecture set in stone? Should Forrest be used for projects
that might need to rearrange the site structure? If it is for a new site/project,
then it would be nice to be able to easily move things around without having to
hand-edit the linkmap to change the URI/URL string for each changed item. If you
have a linkmap like:

<folder name="docroot">
  <page id="abcd"/>
  <folder name="folder1">
    <page id="f1abc"/>
    <page id="f1bcd"/>
    <page id="f1cde"/>
    <folder name="folder11">
      <page id="f1abc"/>
      <page id="f1bcd"/>
      <page id="f1cde"/>
    </folder>
  </folder>
</folder>

Say you have created this initial structure and generated the site, and then some
people look at it and determine it is not the best, usability-wise. It is
determined that folder11 would be better served as a child of the docroot. Using
a structure like the above, you simply move the folder11 nodeset to be a child of
the docroot. There is no need to rewrite strings telling where these things are.
The transformation finds the ID of the item in question and recursively builds
the path as it is structured in the linkmap at the time of generation. Now, is the
objection to this that it is too hard to understand, or to do the recursion to
build these paths? Is that the problem?
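
For what it's worth, the recursion itself is quite small. A sketch along
these lines (hypothetical names, not an existing Forrest component)
derives a page's path from whatever shape the linkmap has at generation
time:

  import java.util.ArrayList;
  import java.util.List;

  // Sketch: derive a page's path from the linkmap tree, so moving a folder
  // around never requires rewriting stored URI strings.
  public class Linkmap {

      public static class Folder {
          final String name;
          final List<Folder> folders = new ArrayList<Folder>();
          final List<String> pageIds = new ArrayList<String>();
          public Folder(String name) { this.name = name; }
      }

      /** Returns e.g. "docroot/folder1/f1abc", or null if the id is not found. */
      public static String pathFor(Folder folder, String pageId) {
          if (folder.pageIds.contains(pageId)) {
              return folder.name + "/" + pageId;
          }
          for (Folder child : folder.folders) {
              String below = pathFor(child, pageId);
              if (below != null) {
                  return folder.name + "/" + below;   // prepend as we unwind
              }
          }
          return null;
      }
  }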

best,
-Rob


<snip/>



Re: URI spaces: source, processing, result

Posted by Jeff Turner <je...@apache.org>.
On Wed, Dec 11, 2002 at 09:35:47PM +0100, Steven Noels wrote:
> Nicola Ken Barozzi wrote:
> 
> >
> >Does this make sense?
> >
> 
> I really don't know. I have been talking offlist to both you and Jeff, 
> and some of this is above my drugged head these days.

Heaven help us if forrest-dev ever becomes as incomprehensible as
avalon-dev :P

> Trying to bring the two of you together, I see there is some general 
> tendency to tolerate and even advocate some source:/ or scheme:/-like 
> thing, if not for the same reason. While I love to KISS, the aspect of 
> having to declare my links in my future Forrest docs like <link 
> href="protocol:name"/> feels kinda good, protocol being things like
> 
>  - javadoc
>  - code
>  - keyword
>  - index
>  - raw
>  - href (default)
>  - linkmap (indirection layer, also to aforementioned protocols)

One I'm really keen on is "mail:", for referencing list emails by
Message-Id.  For example, <link
href="mail:3DF7A1A3.6010109@outerthought.org"> gets translated into <a
href="http://marc.theaimsgroup.com/....">.

But anyway..

Once we have 'linkmap' implemented, that accounts for 95% of relative
links in our xdocs.  So eventually, unprefixed links will become an
anachronism.  So why try to "guess" if a link is static to preserve the
current prefix-less status quo, when we want Forrest to eventually have
_all_ links prefixed?

Here is an analogy with the seemingly uncontroversial 'linkmap' scheme.
How should 'linkmap' links be implemented?

a) Have an explicit prefix, like <link href="site:/primer">
b) Have unprefixed links like <link href="primer">, and have the CLI open
the linkmap.xml file, and check if a 'primer' entry exists.  If so, treat
as a linkmap link.

This same choice of implementation; explicit or inferred, needs to be
made for every potential scheme.  We have a clear fork in the road:

1) _All_ schemes are explicit.  Implemented with XSLTs and Transformers
   in Forrest

2) Some schemes (like javadoc:) are explicit, and others (like file:, and
   perhaps linkmap:?) are inferred.  Implicit schemes are implemented
   with CLI modifications and 'conditional' sitemap hacks like
   resource-exists.

Are we ready for a vote yet?

> and name being a filename or a named resource name depending on the 
> protocol and the eventual indirection
> 
> In terms of implementation, some of this will point towards SAXable 
> information that must be passed across Cocoon pipelines, some of this is 
> external data, perhaps binary in the sense of not being based on XML and 
> not serializable as such. So some of this information will require its 
> own Source implementation or Generator, some will just need to be copied 
> around, either as a file, or as a collection of files. Some of it will 
> require link augmentation or resolution, if linkmaps/indirection has 
> been used.
> 
> With regards to greater datasets, such as massive Javadoc collections, 
> I'm not sure whether we would need to try and keep this within the 
> concern of Forrest, at least in terms of the static generation of it - 
> there exist tools to do that.

Yes, I would prefer for Javadoc invocation to be the concern of whatever
invoked Forrest, eg, an Ant script.

> In terms of doing something with tools like qdox, I'm not sure - I
> think this can be a value for the Forrest user: <link
> href="src:/org.apache.cocoon.foobar"/> bring up a syntax-highlighted
> version of that class.

Remember how we kinda decided that link MIME type should be specified as
a separate attribute?  Well here is a neat example:

<link href="java:/org.apache.cocoon.foobar" type="text/html+javadoc"/>
<link href="java:/org.apache.cocoon.foobar" type="text/html+uml"/>
<link href="java:/org.apache.cocoon.foobar" type="text/html+qdox"/>

So the identifier (URI) stays the same, and 'type' specifies different
representations of that URI.  This illustrates why 'java:' is preferable
to 'javadoc:'.
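
A quick sketch of that dispatch (names invented for illustration): the
identifier stays stable, and the optional 'type' attribute picks among the
available renderings instead of forcing a new scheme per representation:

  import java.util.HashMap;
  import java.util.Map;

  // Illustrative dispatch: one "java:" identifier, several renderings keyed by type.
  public class TypedLinkDispatcher {

      public interface Renderer {
          String render(String className);        // returns the target URL or path
      }

      private final Map<String, Renderer> byType = new HashMap<String, Renderer>();
      private final String defaultType;

      public TypedLinkDispatcher(String defaultType) {
          this.defaultType = defaultType;
      }

      public void register(String type, Renderer renderer) {
          byType.put(type, renderer);
      }

      /** type may be null, meaning "whatever representation is the default". */
      public String dispatch(String className, String type) {
          Renderer r = byType.get(type == null ? defaultType : type);
          if (r == null) {
              throw new IllegalArgumentException("No renderer for type: " + type);
          }
          return r.render(className);
      }
  }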


<snip good implementation summary>

--Jeff

Re: URI spaces: source, processing, result

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Steven Noels wrote:
> Nicola Ken Barozzi wrote:
> 
>> Does this make sense?
>>
> 
> I really don't know. I have been talking offlist to both you and Jeff, 
> and some of this is above my drugged head these days.
> 
> Trying to bring the two of you together, I see there is some general 
> tendency to tolerate and even advocate some source:/ or scheme:/-like 
> thing, if not for the same reason. While I love to KISS, the aspect of 
> having to declare my links in my future Forrest docs like <link 
> href="protocol:name"/> feels kinda good, protocol being things like
> 
>  - javadoc
>  - code
>  - keyword
>  - index
>  - raw
>  - href (default)
>  - linkmap (indirection layer, also to aforementioned protocols)
> 
> and name being a filename or a named resource name depending on the 
> protocol and the eventual indirection
> 
> In terms of implementation, some of this will point towards SAXable 
> information that must be passed across Cocoon pipelines, some of this is 
> external data, perhaps binary in the sense of not being based on XML and 
> not serializable as such. So some of this information will require its 
> own Source implementation or Generator, some will just need to be copied 
> around, either as a file, or as a collection of files. Some of it will 
> require link augmentation or resolution, if linkmaps/indirection has 
> been used.

This mixes concerns.
The schemes should only be a link translating system.

> With regards to greater datasets, such as massive Javadoc collections, 
> I'm not sure whether we would need to try and keep this within the 
> concern of Forrest, at least in terms of the static generation of it - 
> there exist tools to do that. In terms of doing something with tools 
> like qdox, I'm not sure - I think this can be a value for the Forrest 
> user: <link href="src:/org.apache.cocoon.foobar"/> bring up a 
> syntax-highlighted version of that class.

The fact is that these systems should be pluggable, and not concern 
Forrest. But we should be able to easily resolve Java source classes to 
a URI space, and that would call the correct generation stuff, be it 
actual generation, skinning or copying over.

> So maybe we end up with a number of scenarios for these scheme 
> implementations:
> 
>  - triggering a pipeline generating the target of the link, where we 
> need mimetype based extension generation (see, I concur with you, Nicola!)

Yes, taking the extension away and putting in a non-compulsory mime-type 
attribute: +1

>  - triggering some funny SourceWritinglikeTransformer, _moving_ 
> information as-is from source layout to output layout and 'tagging' the 
> link so that it doesn't get traversed anymore by the beloved CLI

We have readers for that. A reader serves the resource as-is.
The issue is not how to get or copy it, but the crawling, which is 
slow and doesn't work on readers.

So if I want to include a directory of stuff, it all has to be parsed 
and put in XML format, so that links can be extracted and the stuff 
can be crawled. It's the crawling that should be addressed.
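
For reference, a reader pipeline is basically this (the pattern and path are
only an example); it never produces XML, so the crawler has no links to
extract from it:

  <map:match pattern="**.pdf">
    <map:read src="content/xdocs/{1}.pdf" mime-type="application/pdf"/>
  </map:match>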

>  - rewriting a link so that, based on some configuration, <link 
> href="javadoc:/org.apache.foobar"/> will be rewritten as <a 
> href="../../../build/javadocs/foobar.html"/> (I forgot how Javadoc 
> generates filenames)

+1 to this.
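
The "some configuration" could be as small as a per-scheme base mapping that
the rewriter consults; this is only a sketch, and the element names are
invented:

  <scheme-map>
    <scheme name="javadoc" base="build/javadocs/"/>
    <scheme name="source"  base="src/java/"/>
  </scheme-map>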

> That's the few scenarios I could come up with now. Do we need separate 
> components for those? Separate <link/> notations? Have these scenarios 
> implemented through passive or active components?...
> 
> IMHO: "as few as possible", "no" and "active".

What do you mean by 'active'?

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: URI spaces: source, processing, result

Posted by Steven Noels <st...@outerthought.org>.
Nicola Ken Barozzi wrote:

> 
> Does this make sense?
> 

I really don't know. I have been talking offlist to both you and Jeff, 
and some of this is above my drugged head these days.

Trying to bring the two of you together, I see there is some general 
tendency to tolerate and even advocate some source:/ or scheme:/-like 
thing, if not for the same reason. While I love to KISS, the idea of 
having to declare my links in my future Forrest docs like <link 
href="protocol:name"/> feels kinda good, protocol being things like

  - javadoc
  - code
  - keyword
  - index
  - raw
  - href (default)
  - linkmap (indirection layer, also to aforementioned protocols)

and name being a filename or a resource name, depending on the 
protocol and the eventual indirection.

In terms of implementation, some of this will point towards SAXable 
information that must be passed across Cocoon pipelines, some of this is 
external data, perhaps binary in the sense of not being based on XML and 
not serializable as such. So some of this information will require its 
own Source implementation or Generator, some will just need to be copied 
around, either as a file, or as a collection of files. Some of it will 
require link augmentation or resolution, if linkmaps/indirection has 
been used.

With regards to greater datasets, such as massive Javadoc collections, 
I'm not sure whether we would need to try and keep this within the 
concern of Forrest, at least in terms of the static generation of it - 
there exist tools to do that. In terms of doing something with tools 
like qdox, I'm not sure - I think this can be of value to the Forrest 
user: <link href="src:/org.apache.cocoon.foobar"/> brings up a 
syntax-highlighted version of that class.

So maybe we end up with a number of scenarios for these scheme 
implementations:

  - triggering a pipeline generating the target of the link, where we 
need mimetype based extension generation (see, I concur with you, Nicola!)
  - triggering some funny SourceWritinglikeTransformer, _moving_ 
information as-is from source layout to output layout and 'tagging' the 
link so that it doesn't get traversed anymore by the beloved CLI
  - rewriting a link so that, based on some configuration, <link 
href="javadoc:/org.apache.foobar"/> will be rewritten as <a 
href="../../../build/javadocs/foobar.html"/> (I forgot how Javadoc 
generates filenames)

That's the few scenarios I could come up with now. Do we need separate 
components for those? Separate <link/> notations? Have these scenarios 
implemented through passive or active components?...

IMHO: "as few as possible", "no" and "active".

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org


Re: URI spaces: source, processing, result

Posted by Jeff Turner <je...@apache.org>.
On Wed, Dec 11, 2002 at 03:34:59PM +0100, Nicola Ken Barozzi wrote:
> 
> Last commit of Jeff about the "link:" usage and my commit about 
> resource-exists everywhere are all tentatives to resolve an issue that 
> is still unfortunately open. Add to that topicmaps, linkmaps, and local 
> dir hierarchy, and we definately have an issue to solve about URI spaces.
> 
> Now, IMHO these things are being difficult to resolve because we mix 
> problems together, so I'll try here to separate them in different sections.
> 
> 
>  SoC
> -----------------------
> 
> SoC means that the user should not tell the system how it wants the 
> files to be processed.

Or rather, no assumptions about the processing tool should creep into the
source.  Before SoC and IoC, this was called "common sense" :)

> It just puts content on the disk and Forrest creates a site out of it.
>

<snip completely unrelated directory layout discussion>

<snip decent description of linkmap>


>  Source Mounting
> -----------------------

I can't see how this relates to the file: question, but anyway..

> We should be able to include external directories in our local contents.
> For example, if I have
> 
>  ./src/documentation/**
> 
> and
> 
>  ./build/javadocs/**
> 
> I may want to make Forrest work as if the javadocs were in
> 
>  ./src/documentation/javadocs/**
> 
> without having to actually move the files there.
>
> This is not only about files that must be served as-is, but also served 
> as if they were in the normal hierarchy (ie xdocs).
> 
> This means that probably we should make our own sourceresolver or 
> filegenerator and have that keep a mounting config that can tell where 
> to get the files.
> Or maybe just a SourceMountTranslate action to be called before every 
> generation, that resolves the real source path, given the mount point.

Are you really suggesting that requests for Javadoc pages should go
through Cocoon?

That is completely crazy!  Cocoon's javadocs are 21mb.  Guess how long it
will take for the CLI to pointlessly filter them through Cocoon,
untransformed... just so we can say "everything goes through Cocoon".

But the problem is real: how do we integrate Javadocs into the URI space?

I'd say write out .htaccess files with mod_rewrite rules, and figure out
what the equivalent for Tomcat is.  Perhaps a separate servlet..
_anything_ but Cocoon ;P

> Thus linking to these external resources will be done exactly as if they 
> were in the normal dir space.
> 
> 
> Does this make sense?

Yes.  I see you've let two schemes in the door, 'javadoc:' and 'linkmap:'
(what I was calling 'site:'), so adding 'file:' or 'source:' to indicate
static content shouldn't cause any conceptual hiccups.
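
Just to illustrate where that would leave link authors (the scheme names are
still up for discussion and the targets are invented):

  <link href="linkmap:primer">the Forrest primer</link>
  <link href="java:org.apache.cocoon.Processor" type="text/html+javadoc">Processor API</link>
  <link href="source:images/project-logo.png">the logo, served as-is</link>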


--Jeff