Posted to dev@forrest.apache.org by Jeff Turner <je...@apache.org> on 2002/12/12 12:32:58 UTC

Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

On Thu, Dec 12, 2002 at 10:39:05AM +0000, Andrew Savory wrote:
> 
> On Thu, 12 Dec 2002, Steven Noels wrote:
> 
> > could you please comment on my summary, too? Also, I'd like to hear the
> > opinion of others.
> 
> Ok, caveat: I've not used Forrest (yet), but I use Cocoon extensively.
> 
> Jeff Turner wrote:
> 
> > Are you really suggesting that requests for Javadoc pages should go
> > through Cocoon?
> >
> > But the problem is real: how do we integrate Javadocs into
> > the URI space.
> >
> > I'd say write out .htaccess files with mod_rewrite rules, and figure out
> > what the equivalent for Tomcat is.  Perhaps a separate servlet..
> > _anything_ but Cocoon ;P
> 
> Whilst I understand your concern about passing 21mb of files through
> Cocoon untouched, I'm not sure there's a more elegant way of handling URI
> space issues, without ending up bundling a massive amount of software with
> Forrest (or making unrealistic software prerequisite installation
> demands).
> 
> So, since Cocoon _can_ handle the rewriting concern, and is already in
> Forrest, why not use it?

Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
the sitemap would be really nice.  The overhead of a <map:read> for every
Javadoc page probably wouldn't be noticed in a live webapp.  But for the
command-line?  Imagine how long it would take for the crawler to grind
through _every_ Javadoc page, effectively copying it unmodified from A to
B.

IMO, the _real_ problem is that the sitemap has been sold as a generic
URI management system, but it works at the level of a specific XML
publishing tool.  Its scope is overly broad.  The webserver (Tomcat)
should be defining the 'site map', and Cocoon should never even _see_
requests for static resources.  Just like mod_jk only forwards servlet
and JSP requests on to Tomcat, Tomcat should only forward requests for
XML processing on to Cocoon.  So <map:read> is a hack to handle requests
that Cocoon should never have been asked to handle in the first place.
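
To make the delegation idea concrete, here is a minimal sketch in Python.
It is not anything Tomcat, httpd or Forrest provides: the rule table, the
route() function and the handler names "static" and "cocoon" are all
hypothetical.  The only point is that the front end decides who sees a
request, and the XML publisher is just one delegate.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: the first matching pattern wins.
# "static" stands for the webserver serving the file itself,
# "cocoon" for handing the request on to the XML publishing layer.
RULES = [
    ("/apidocs/*", "static"),   # Javadocs never reach the publisher
    ("*.png", "static"),
    ("*.css", "static"),
    ("*.html", "cocoon"),
    ("*.pdf", "cocoon"),
]


def route(path, rules=RULES, default="static"):
    """Return the name of the handler that should see this request path."""
    for pattern, handler in rules:
        # fnmatchcase's '*' also matches '/', so "/apidocs/*" covers the
        # whole Javadoc tree.
        if fnmatchcase(path, pattern):
            return handler
    return default
```

With these rules a Javadoc page under /apidocs/ is routed to "static" even
though it ends in .html, because its rule is listed first.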

So where does Forrest stand?  We have servlet containers with wholly
inadequate URI mapping.  We have Cocoon, trying to handle requests for
binary content which it shouldn't, resulting in hopeless performance.  We
have httpd, with good URI handling (eg mod_rewrite), but whose presence
can't be relied upon.  What is the way out?

> I like the idea of link naming schemes, but I'm really worried about the
> idea of specifying MIME types as link attributes. This seems like a nasty
> hack: should we be specifying MIME types?

There is some context you're missing there..

http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2

The theory is that links should _not_ specify MIME type of linked-to docs
by default.  The MIME type should be inferred by the type of the linking
document, and what's available.  Eg, <link href="site:/primer"> links to
"The Forrest Primer" in whatever form it's available.

However it is also sometimes desirable to specify the MIME type
explicitly.  So rather than corrupt our nice semantic URLs, eg <link
href="site:/primer.pdf">, we should express the type as a separate
attribute: <link href="site:/primer" type="application/pdf">.
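
A rough sketch of how a link processor might apply that rule, in Python.
The SITE catalogue, the resolve_site_link() function and the output paths
are hypothetical, made up for illustration; nothing like this exists in
Forrest as far as this thread shows.

```python
# Hypothetical catalogue: semantic name -> renditions keyed by MIME type.
SITE = {
    "/primer": {
        "text/html": "/primer.html",
        "application/pdf": "/primer.pdf",
    },
}


def resolve_site_link(href, linking_type, explicit_type=None, site=SITE):
    """Resolve a 'site:' href to a concrete path.

    With no explicit type, prefer the rendition that matches the linking
    document's own type, then fall back to the first rendition available.
    """
    prefix = "site:"
    if not href.startswith(prefix):
        raise ValueError("not a site: link: %r" % href)
    renditions = site[href[len(prefix):]]
    if explicit_type is not None:
        # KeyError here means the requested type is not published.
        return renditions[explicit_type]
    if linking_type in renditions:
        return renditions[linking_type]
    return next(iter(renditions.values()))
```

So the href stays "site:/primer" in both cases; only the optional type
argument changes which rendition the link ends up pointing at.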

A more current example of this principle: say we want to link to class
MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
Javadoc, UML and qdox representations of that resource.  Should we invent
three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
attribute specifying a MIME type (inventing one if we have to)?
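
For what it's worth, the second option could look something like this
Python sketch.  The MIME types and URL layouts in JAVA_RENDITIONS are
invented for the example (as is resolve_java_link() itself); only the
'java:' href comes from the paragraph above.

```python
from urllib.parse import urlsplit

# Invented MIME types and URL layouts, for illustration only.
JAVA_RENDITIONS = {
    "text/html+javadoc": "/apidocs/{path}.html",
    "application/x-uml": "/uml/{path}.svg",
    "application/x-qdox": "/qdox/{path}.xml",
}


def resolve_java_link(href, mime_type="text/html+javadoc"):
    """Map a 'java:' identifier plus a MIME type to a concrete path."""
    parts = urlsplit(href)
    if parts.scheme != "java":
        raise ValueError("unsupported scheme: %r" % parts.scheme)
    # The scheme only says how to read the rest: as a dotted class name.
    class_path = parts.path.replace(".", "/")
    return JAVA_RENDITIONS[mime_type].format(path=class_path)
```

One scheme identifies the class; the type picks the representation, so a
new representation is a new table entry rather than a new scheme.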

HTH,

--Jeff

> 
> Andrew.
> 
> -- 
> Andrew Savory                                Email: andrew@luminas.co.uk
> Managing Director                              Tel:  +44 (0)870 741 6658
> Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
> This is not an official statement or order.    Web:    www.luminas.co.uk
>

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Jeff Turner <je...@apache.org>.

On Fri, Dec 13, 2002 at 12:07:50AM -0800, Stefano Mazzocchi wrote:
...
> >I don't think you understood my point.  There should be no need for fancy
> >HTTPd <-> Cocoon interactions.  There should be strict IoC, with the
> >webserver, not Cocoon, in full control of the URI space, delegating small
> >portions of it to Cocoon. 
> 
> I like the fact that I can write my selectors/matchers in a pluggable way.
> 
> Should I throw that ability away to use mod_rewrite? forget it, dude!
> 
> Should I write a new apache module for every matcher and selector? and 
> then, what about flowscript? and what if my reader is not just a blatant 
> bit-2-bit copier but performs things like image rescaling and maybe has 
> to cooperate with the flow? should I write another module?
>
> Sure, if we had mod_java, then we could do that, but things like 
> flowscript? forget it.
> 
> the HTTPd conf file has not enough semantics to be able to drive cocoon 
> at its full power.

True.  I wonder whether it would have been better to put all that effort
into implementing an HTTPd sitemap rather than a Cocoon sitemap.
Rather pointless debating it now, I agree.

> >>NOTE: httpd 2.0 has a pluggable configuration facility. in the future, 
> >>we might be able to use the cocoon sitemap to drive *httpd* directly.
> >
> >
> >Cocoon telling httpd what to do.. isn't that classic subversion of
> >control?
> 
> No, you misunderstood: it's the idea of having HTTPd using a conf file 
> written using the cocoon sitemap markup and using modules as components.

Oh.  Cool :)  Well that's just what I meant; move the sitemap goodness up
one level.  Then we wouldn't be in the ridiculous situation of
contemplating feeding 20mb of static Javadocs through Cocoon.

> But this is *wild* and too many things have to change inside HTTPd to 
> make this possible.
...
> My idea is different: let's remove the unnecessary Servlet API layer and 
> let's glue cocoon directly to httpd's butt. This is what Pier and I have 
> been thinking about in the last year or so.... since next year I'll 
> probably end up living with him, expect something to happen.

Sounds neat.  If Pier shows any signs of wanting to rewrite HTTPd to use
a Cocoon sitemap, please encourage him. :)


--Jeff

> NOTE: I'm not *mandating* this behavior to Cocoon. Just creating another 
> wrapper: CLI, Servlet API and Apache API.
> 
> Last time that Federico, Pierpaolo and I lived together, Avalon, James 
> and Cocoon were born. I'm curious to see what will happen now :)
...

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Steven Noels <st...@outerthought.org>.

Miles Elam wrote:

> As a footnote to this thread, let me briefly describe what my group has 
> done with Cocoon.

Very nice and interesting intro-to-self()... Welcome!

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
Read my weblog at              http://radio.weblogs.com/0103539/
stevenn at outerthought.org                stevenn at apache.org

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Miles Elam <mi...@geekspeak.org>.

As a footnote to this thread, let me briefly describe what my group has 
done with Cocoon.

Tomcat with Cocoon and no Apache HTTPd.  It's a Linux box with TUX that 
handles the static content on the filesystem far faster than Apache, and 
it has Tomcat/Cocoon on the backend doing the "real" work.

We *needed* Cocoon both for its pipelines and for its sitemap (URI 
mapping).  Cocoon was a no-brainer.  Nothing else comes close.  Then we 
looked at servlet containers.  Tomcat just happened to be what we're 
used to.  The great thing about containers is that they can be swapped. 
 I'm still out looking for a 1.4/nio-based HTTP handler with the ability 
to disable Keep-Alives (I'll explain later...).  Then there's WebDAV as 
we need to add/update content.  This is still a work in progress for us, 
but Slide seems to fit our needs well.  If bandwidth becomes a bigger 
issue than CPU, we can uncomment the line in our web.xml file that 
handles gzip encoding.

We looked briefly at how to get Apache HTTPd working with our setup 
months ago.  Then we looked at our requirements.  The only thing we 
needed Apache for was fast serving of static content -- which TUX does 
better.  All dynamic content is served from Tomcat/Cocoon.

When we looked closely, we found that Cocoon would simply be slower 
without help from an external, static processor.  We also found that 
Apache HTTPd, as robust and mature as it is, lacks significant 
functionality that we find readily available in Cocoon.  When it comes 
down to it, a gzip filter replaces mod_gzip, the PHP generator (which we 
don't use anyway) replaces mod_php, the JSP generator has no analogue 
without mod_jk (or equivalent), mod_rewrite is redundant with the Cocoon 
sitemap, and on and on.
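
To illustrate the mod_rewrite point, here is a toy wildcard matcher in 
Python, loosely modelled on sitemap-style patterns with {1}-style 
substitution.  It is a sketch, not Cocoon's matcher: the function names 
are made up, and the wildcard semantics ('*' stops at '/', '**' does 
not) are an assumption stated in the code.

```python
import re


def compile_pattern(pattern):
    """Turn a sitemap-style wildcard pattern into a regex.

    Assumed semantics: '**' matches any characters including '/',
    '*' matches any characters except '/'.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append("(.*)")
            i += 2
        elif pattern[i] == "*":
            out.append("([^/]*)")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def rewrite(uri, pattern, target):
    """Return target with {1}, {2}, ... filled from the match, or None."""
    m = compile_pattern(pattern).match(uri)
    if m is None:
        return None
    result = target
    for n, group in enumerate(m.groups(), start=1):
        result = result.replace("{%d}" % n, group)
    return result
```

For example, rewrite("primer.html", "*.html", "content/xdocs/{1}.xml") 
maps a requested page onto a source file, which is the kind of job we 
would otherwise have handed to a rewrite rule.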

Hmmm...  Now that I think of it, there's no equivalent of mod_speling in 
Tomcat/Cocoon.

But to echo what seems to be an undercurrent, is Apache HTTPd becoming 
redundant?  If speed is your primary concern, wouldn't a few Squid 
servers in front of Tomcat/Cocoon make any speed gains from Apache flat 
file serving get lost in the noise?

-----

The hardest thing about getting our site up was making it fully 
standards-compliant and choosing good URIs.  If I had only used Apache 
(or IIS or iPlanet or just Tomcat), we might have launched faster. 
However, that would only be because the correct solution would have 
been impossible without Cocoon.

Yeah, you could call me a Cocoon cheerleader.  We still have a great 
deal of work to do, but the URL is http://geekspeak.org/.  For all 
intents and purposes, it doesn't merely use Cocoon; it is completely run 
and controlled by Cocoon.  TUX is just window dressing -- just a flat 
file accelerator.

What I begin to wonder is whether Apache HTTPd is truly the most useful 
and flexible architecture for new websites (not already existing sites 
of course).

-----

Cocoon is the reason why I want to help with Forrest.  It is one of the 
only ways I can think of to say thank you for all of the hard work.  So 
far, I've added Nicola's Krysalis layout (imitation as the sincerest 
form of flattery and all of that) as a CSS skin to the existing XHTML 
mockup from before.  It's got a banner size issue with font-resizing and 
the lists aren't handled correctly, but it's a start.  Once finals are 
over (end of next week), I will try to continue the skin work I started 
a couple of months ago.

http://forrest.iguanacharlie.com/
http://forrest.iguanacharlie.com/krysalis.html

- Miles

P.S.  Nicola: Your layout is very elegant.  It reminds me that I am 
basically a web code monkey and not a graphic designer by a long shot. 
 If you can dream the layout up, I can probably retool it for CSS.

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Stefano Mazzocchi <st...@apache.org>.

Jeff Turner wrote:
> On Thu, Dec 12, 2002 at 07:36:24PM -0800, Stefano Mazzocchi wrote:
> 
>>Jeff Turner wrote:
>>
>>>On Thu, Dec 12, 2002 at 12:07:34PM +0000, Andrew Savory wrote:
>>>
>>>
>>>>On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> ...
> 
>>>That's what I'm saying: the sitemap is great, but it should be the
>>>"servlet container sitemap", not the "Cocoon sitemap".  There should be
>>>URI management tools (notably URL rewriting) standardized right in
>>>web.xml.
>>
>>Jeff, if you experienced *years* of fighting over the Servlet API Expert 
>>Group to get exactly what you describe, maybe you wouldn't bash the 
>>Cocoon Sitemap so much.
> 
> 
> I was not bashing the Cocoon sitemap, nor the hard-working people who
> made it a reality.  I'm saying that, in a better world, the web server
> would do all the URI management, and Cocoon would be left with just the
> job of transforming and rendering XML.

Yeah, well, that's highly debatable, but it's pointless to do so.

> This 'better world' does not exist in Java-land, so I cannot criticise
> the route Cocoon took.  But I think it _does_ exist in the non-Java
> world, if you view Apache HTTPd as the webserver, and I _suspect_ (never
> having used it) that this is how AxKit got away with not implementing a
> sitemap.

I don't know enough of AxKit to comment on this.

>>>Here is an analogy: why doesn't AxKit have a sitemap?  Because it doesn't
>>>need it.  It relies on Apache httpd's native URL management ability.  All
>>>AxKit needs are those few pipelines for defining XML transformations.
>>
>>Here, Jeff, you miss another few years of talks between myself, 
>>Pierpaolo and the HTTPd 2.0 layered I/O architects, trying to estimate 
>>the ability to have HTTPd 2.0 using something like a mod_cocoon and 
>>referring back all processing that made sense to APR (thru a JNI interface).
> 
> ...
> 
>>At that point, we *might* try to run Cocoon connected directly to the 
>>Apache module API, thus bypassing all the servlet API limitations and 
>>being able to handle back processing (like map:read, for example) to 
>>where it belongs.
> 
> 
> 'Referring back'..
> 'back processing'..
> 
> I don't think you understood my point.  There should be no need for fancy
> HTTPd <-> Cocoon interactions.  There should be strict IoC, with the
> webserver, not Cocoon, in full control of the URI space, delegating small
> portions of it to Cocoon. 

I like the fact that I can write my selectors/matchers in a pluggable way.

Should I throw that ability away to use mod_rewrite? forget it, dude!

Should I write a new apache module for every matcher and selector? and 
then, what about flowscript? and what if my reader is not just a blatant 
bit-2-bit copier but performs things like image rescaling and maybe has 
to cooperate with the flow? should I write another module?

Sure, if we had mod_java, then we could do that, but things like 
flowscript? forget it.

the HTTPd conf file has not enough semantics to be able to drive cocoon 
at its full power.

>>NOTE: httpd 2.0 has a pluggable configuration facility. in the future, 
>>we might be able to use the cocoon sitemap to drive *httpd* directly.
> 
> 
> Cocoon telling httpd what to do.. isn't that classic subversion of
> control?

No, you misunderstood: it's the idea of having HTTPd using a conf file 
written using the cocoon sitemap markup and using modules as components.

But this is *wild* and too many things have to change inside HTTPd to 
make this possible.

>>Once again, please, don't underestimate the effort that is put in the 
>>design of a complex software system. You appear disrespectful and 
>>this might bite you back later on.
> 
> 
> As I said, I'm not criticizing _anything_ about the design of Cocoon or
> the Cocoon sitemap.  I am lamenting what seems to be a fundamental
> screw-up in the entire server-side Java processing stack; that the
> webserver has such poor URI management facilities that tools like Cocoon
> feel it necessary to take the job upon themselves.
> 
> I would _love_ to have a Cocoon-like sitemap in Tomcat.  Imagine.. the
> URI space could be completely independent of the filesystem!  I could
> store a whole website in a RDBMS and map it to the URI space.  IIRC,
> Craig McClanahan said that they were considering a JNDI abstraction for
> the filesystem (as Tomcat 4 does internally) in the servlet spec, but
> sadly it didn't happen.

My idea is different: let's remove the unnecessary Servlet API layer and 
let's glue cocoon directly to httpd's butt. This is what Pier and I have 
been thinking about in the last year or so.... since next year I'll 
probably end up living with him, expect something to happen.

NOTE: I'm not *mandating* this behavior to Cocoon. Just creating another 
wrapper: CLI, Servlet API and Apache API.

Last time that Federico, Pierpaolo and I lived together, Avalon, James 
and Cocoon were born. I'm curious to see what will happen now :)

>>>*shrug* There's no real solution now.  The only feasible 'URI daemon' is
>>>Apache httpd.  More and more I agree with Pier Fumagalli, who had some
>>>enlightening rants on tomcat-dev about the need to treat httpd as
>>>_central_, and Tomcat as _only_ a servlet container.  Forget this idea
>>>that httpd is optional.  Put it right in the centre, use it for URI
>>>management and static resource handling, and delegate to Cocoon only the
>>>things Cocoon is good at handling.
>>
>>Should I remind you that Pierpaolo is the guy that designed the Cocoon 
>>sitemap with me?
> 
> 
> I know.. back then he was a Tomcat committer too :)

We both still are. :) But we'd rather stay away from it.

>>Believe me, we have spent so much time thinking about ways to make httpd and 
>>java talk closer together that I'm sick of it. But the political and 
>>technological inertia is *not* something that should be underestimated. 
>>And I mean on both sides of the fence: servlet *and* httpd!
> 
> 
> Perhaps because you're trying to fix a _major_ architectural flaw by
> breaking IoC between the webserver and Cocoon?

No, you just misunderstood me there.

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Nicola Ken Barozzi <ni...@apache.org>.


Jeff Turner wrote:
[...] /Stefano will reply to the other points/
> I would _love_ to have a Cocoon-like sitemap in Tomcat.  Imagine.. the
> URI space could be completely independent of the filesystem!  I could
> store a whole website in a RDBMS and map it to the URI space. 

That's Cocoon. You see Cocoon as a plugin to other containers, why can't 
you see Cocoon as the container itself?

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Jeff Turner <je...@apache.org>.

On Thu, Dec 12, 2002 at 07:36:24PM -0800, Stefano Mazzocchi wrote:
> Jeff Turner wrote:
> >On Thu, Dec 12, 2002 at 12:07:34PM +0000, Andrew Savory wrote:
> >
> >>On Thu, 12 Dec 2002, Jeff Turner wrote:
...
> >That's what I'm saying: the sitemap is great, but it should be the
> >"servlet container sitemap", not the "Cocoon sitemap".  There should be
> >URI management tools (notably URL rewriting) standardized right in
> >web.xml.
> 
> Jeff, if you experienced *years* of fighting over the Servlet API Expert 
> Group to get exactly what you describe, maybe you wouldn't bash the 
> Cocoon Sitemap so much.

I was not bashing the Cocoon sitemap, nor the hard-working people who
made it a reality.  I'm saying that, in a better world, the web server
would do all the URI management, and Cocoon would be left with just the
job of transforming and rendering XML.

This 'better world' does not exist in Java-land, so I cannot criticise
the route Cocoon took.  But I think it _does_ exist in the non-Java
world, if you view Apache HTTPd as the webserver, and I _suspect_ (never
having used it) that this is how AxKit got away with not implementing a
sitemap.

...
> >Here is an analogy: why doesn't AxKit have a sitemap?  Because it doesn't
> >need it.  It relies on Apache httpd's native URL management ability.  All
> >AxKit needs are those few pipelines for defining XML transformations.
> 
> Here, Jeff, you miss another few years of talks between myself, 
> Pierpaolo and the HTTPd 2.0 layered I/O architects, trying to estimate 
> the ability to have HTTPd 2.0 using something like a mod_cocoon and 
> referring back all processing that made sense to APR (thru a JNI interface).
...
> At that point, we *might* try to run Cocoon connected directly to the 
> Apache module API, thus bypassing all the servlet API limitations and 
> being able to handle back processing (like map:read, for example) to 
> where it belongs.

'Referring back'..
'back processing'..

I don't think you understood my point.  There should be no need for fancy
HTTPd <-> Cocoon interactions.  There should be strict IoC, with the
webserver, not Cocoon, in full control of the URI space, delegating small
portions of it to Cocoon. 

> NOTE: httpd 2.0 has a pluggable configuration facility. in the future, 
> we might be able to use the cocoon sitemap to drive *httpd* directly.

Cocoon telling httpd what to do.. isn't that classic subversion of
control?

> Once again, please, don't underestimate the effort that is put in the 
> design of a complex software system. You appear disrespectful and 
> this might bite you back later on.

As I said, I'm not criticizing _anything_ about the design of Cocoon or
the Cocoon sitemap.  I am lamenting what seems to be a fundamental
screw-up in the entire server-side Java processing stack; that the
webserver has such poor URI management facilities that tools like Cocoon
feel it necessary to take the job upon themselves.

I would _love_ to have a Cocoon-like sitemap in Tomcat.  Imagine.. the
URI space could be completely independent of the filesystem!  I could
store a whole website in a RDBMS and map it to the URI space.  IIRC,
Craig McClanahan said that they were considering a JNDI abstraction for
the filesystem (as Tomcat 4 does internally) in the servlet spec, but
sadly it didn't happen.
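
As a toy illustration of a URI space with no filesystem behind it, here
is a Python sketch using an in-memory SQLite table.  The table layout and
the make_site()/lookup() names are hypothetical; a real container would
need far more than this.

```python
import sqlite3


def make_site(pages):
    """Create an in-memory table mapping URI -> (MIME type, body)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pages (uri TEXT PRIMARY KEY, mime TEXT, body BLOB)"
    )
    db.executemany("INSERT INTO pages VALUES (?, ?, ?)", pages)
    return db


def lookup(db, uri):
    """Return (mime, body) for a URI, or None if nothing is mapped there."""
    return db.execute(
        "SELECT mime, body FROM pages WHERE uri = ?", (uri,)
    ).fetchone()
```

The request handler would call lookup() instead of opening a file, so the
URI space is whatever rows the table happens to hold.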

...
> >*shrug* There's no real solution now.  The only feasible 'URI daemon' is
> >Apache httpd.  More and more I agree with Pier Fumagalli, who had some
> >enlightening rants on tomcat-dev about the need to treat httpd as
> >_central_, and Tomcat as _only_ a servlet container.  Forget this idea
> >that httpd is optional.  Put it right in the centre, use it for URI
> >management and static resource handling, and delegate to Cocoon only the
> >things Cocoon is good at handling.
> 
> Should I remind you that Pierpaolo is the guy that designed the Cocoon 
> sitemap with me?

I know.. back then he was a Tomcat committer too :)

> Believe me, we have spent so much time thinking about ways to make httpd and 
> java talk closer together that I'm sick of it. But the political and 
> technological inertia is *not* something that should be underestimated. 
> And I mean on both sides of the fence: servlet *and* httpd!

Perhaps because you're trying to fix a _major_ architectural flaw by
breaking IoC between the webserver and Cocoon?

--Jeff

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Stefano Mazzocchi <st...@apache.org>.

Jeff Turner wrote:
> On Thu, Dec 12, 2002 at 12:07:34PM +0000, Andrew Savory wrote:
> 
>>On Thu, 12 Dec 2002, Jeff Turner wrote:
>>
>>
>>>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>>>the sitemap would be really nice.  The overhead of a <map:read> for every
>>>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>>>command-line?  Imagine how long it would take for the crawler to grind
>>>through _every_ Javadoc page, effectively copying it unmodified from A to
>>>B.
>>
>>I guess on the plus side, everything is still controlled in one place, and
>>since it's on the command line, it can be automated. The downside, as you
>>mention, is speed. But is Cocoon significantly slower doing a map:read
>>than, say, a "cp" on the command-line? What sort of factor of trade-off
>>are we talking about?
>>
>>
>>>IMO, the _real_ problem is that the sitemap has been sold as a generic
>>>URI management system, but it works at the level of a specific XML
>>>publishing tool.  Its scope is overly broad.
>>
>>Again, it's a pro/con kind of argument: I *like* that everything is dealt
>>with within the Cocoon sitemap: my httpd/servlet engines are
>>interchangeable, but Cocoon is a constant.
> 
> 
> That's what I'm saying: the sitemap is great, but it should be the
> "servlet container sitemap", not the "Cocoon sitemap".  There should be
> URI management tools (notably URL rewriting) standardized right in
> web.xml.

Jeff, if you experienced *years* of fighting over the Servlet API Expert 
Group to get exactly what you describe, maybe you wouldn't bash the 
Cocoon Sitemap so much.

Cocoon was implemented *way before* the Servlet API EG came up with that 
stupid and useless notion of Servlet Filters. Cocoon was created to show 
how pipelining should happen *inside* the servlet, not *outside* and the 
web.xml should allow servlet componentization.

Of course, that was Cocoon1 and without a stinking JSR with politics 
attached, we were able to get *much* further than their stupid and 
useless web.xml (with hardcoded JSP semantics, yuck!)

> Here is an analogy: why doesn't AxKit have a sitemap?  Because it doesn't
> need it.  It relies on Apache httpd's native URL management ability.  All
> AxKit needs are those few pipelines for defining XML transformations.

Here, Jeff, you miss another few years of talks between myself, 
Pierpaolo and the HTTPd 2.0 layered I/O architects, trying to estimate 
the ability to have HTTPd 2.0 using something like a mod_cocoon and 
referring back all processing that made sense to APR (thru a JNI interface).

Unfortunately, we had to wait until Apache 2.0 was stable enough to try 
to implement a mod_java first (having a JVM running inside the web 
server would make several sys-adm scream and yell and leave the building 
like it was on fire!) and see what happens.

At that point, we *might* try to run Cocoon connected directly to the 
Apache module API, thus bypassing all the servlet API limitations and 
being able to handle back processing (like map:read, for example) to 
where it belongs.

NOTE: httpd 2.0 has a pluggable configuration facility. in the future, 
we might be able to use the cocoon sitemap to drive *httpd* directly.

Once again, please, don't underestimate the effort that is put in the 
design of a complex software system. You appear disrespectful and 
this might bite you back later on.

>>>So where does Forrest stand?  We have servlet containers with wholly
>>>inadequate URI mapping.  We have Cocoon, trying to handle requests for
>>>binary content which it shouldn't, resulting in hopeless performance.  We
>>>have httpd, with good URI handling (eg mod_rewrite), but whose presence
>>>can't be relied upon.  What is the way out?
>>
>>Well, one solution might be to split the sitemap (URI mapping) from
>>the sitemap (URI handling), and have a separate URI daemon that can run in
>>front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
>>drastic though, and could lead to a tangled mess of rewrites at each
>>stage.
> 
> 
> *shrug* There's no real solution now.  The only feasible 'URI daemon' is
> Apache httpd.  More and more I agree with Pier Fumagalli, who had some
> enlightening rants on tomcat-dev about the need to treat httpd as
> _central_, and Tomcat as _only_ a servlet container.  Forget this idea
> that httpd is optional.  Put it right in the centre, use it for URI
> management and static resource handling, and delegate to Cocoon only the
> things Cocoon is good at handling.

Should I remind you that Pierpaolo is the guy that designed the Cocoon 
sitemap with me?

Believe me, we have spent so much time thinking about ways to make httpd and 
java talk closer together that I'm sick of it. But the political and 
technological inertia is *not* something that should be underestimated. 
And I mean on both sides of the fence: servlet *and* httpd!

>>>A more current example of this principle: say we want to link to class
>>>MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
>>>Javadoc, UML and qdox representations of that resource.  Should we invent
>>>three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
>>>attribute specifying a MIME type (inventing one if we have to)?
>>
>>Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
>>javadoc: as a protocol? Come to think of it, why java: as a protocol? If
>>the part of any href before a colon refers to the transport, is it right
>>to effectively overload the transport with additional MIME type
>>information? 
> 
> 
> But it's not a protocol, it's a 'scheme' :)  Everyone makes this mistake
> (thanks to Marc for pointing it out).  A URI is an _identifier_.  Have a
> look at the URI RFC; it makes clear that protocol (transport mechanism)
> != scheme (identifier syntax):
> 
>  "The URI scheme (Section 3.1) defines the namespace of the URI, and thus
>  may further restrict the syntax and semantics of identifiers using that
>  scheme."
> 
> And this.. "many URL schemes are named after protocols":
> 
>   "Although many URL schemes are named after protocols, this does not
>   imply that the only way to access the URL's resource is via the named
>   protocol.  Gateways, proxies, caches, and name resolution services
>   might be used to access some resources, independent of the protocol of
>   their origin, and the resolution of some URL may require the use of
>   more than one protocol (e.g., both DNS and HTTP are typically used to
>   access an "http" URL's resource when it can't be found in a local
>   cache)."
> 
> And again, distinguishing "methods of access" from "schemes for
> identif[ication]":
> 
>  "Just as there are many different methods of access to resources, there
>  are a variety of schemes for identifying such resources.  The URI syntax
>  consists of a sequence of components separated by reserved characters,
>  with the first component defining the semantics for the remainder of the
>  URI string."
> 
> 
> So when you see <link href="java:org.apache.myproj.MyClass">, the 'java:'
> bit is simply telling the link processor that "org.apache.myproj.MyClass"
> is to be interpreted as a Java resource identifier.

I agree with your notion that 'scheme != protocol', just like 'URI != URL'.

But this is another story, I'll reply to that in another email.

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Stefano Mazzocchi <st...@apache.org>.

Nicola Ken Barozzi wrote:
> 
> 
> Andrew Savory wrote:
> 
>> On Fri, 13 Dec 2002, Jeff Turner wrote:
>>
>>
>>> Forget this idea that httpd is optional.  Put it right in the centre,
>>> use it for URI management and static resource handling, and delegate to
>>> Cocoon only the things Cocoon is good at handling.
>>
>>
>> I can see the sense in that ... although it does assume that 
>> everything is
>> going to be coming and going via HTTP. But as I can't think of any sane
>> alternatives, that seems reasonable ;-)
> 
> 
> IMO this should be transparent to the container(s) and not compulsory.
> I should not *rely* on this, but could use it if I wanted to get a speed 
> boost.

This is a cocoon-related discussion, but anyway, at ApacheCON I talked 
*extensively* with the mod_proxy people and they told me that mod_cache 
that ships with 2.0 is perfectly able to do exactly what we ask for: 
avoid processing requests that don't belong to Cocoon.

So, there is *some* truth in saying that map:read is a hack (I don't 
think it is, see the ImageReader I wrote, for an example of a 
not-so-trivial use of the concept), but the use of a transparent cache 
up front and the use of HTTPd 2.0 filtered I/O allows us to

  1) keep the URI-space control in one location
  2) gain lightspeed native performance (thru native up-front caching)
  3) gain distributed caching (with proxy-friendly cocoon-generated HTTP 
headers)

And I see *no lack of elegance* in such a solution, which is also very 
friendly with respect to the various political frictions that happen to 
exist between java and C worlds (and something that, unfortunately, we 
have to deal with!)
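
To show the principle only (this is not how mod_cache or mod_proxy is 
configured or implemented), here is a toy up-front cache in Python.  The 
FrontCache class and the backend callable returning (body, max_age) are 
hypothetical stand-ins for the cache and for cocoon's cache headers.

```python
import time


class FrontCache:
    """Toy up-front cache: repeat requests never reach the backend."""

    def __init__(self, backend, clock=time.monotonic):
        self.backend = backend      # callable: uri -> (body, max_age_seconds)
        self.clock = clock          # injectable so expiry can be tested
        self.entries = {}           # uri -> (body, expiry time)
        self.backend_calls = 0

    def get(self, uri):
        now = self.clock()
        hit = self.entries.get(uri)
        if hit is not None and hit[1] > now:
            return hit[0]           # still fresh: served from the cache
        self.backend_calls += 1
        body, max_age = self.backend(uri)
        if max_age > 0:             # the backend decides what is cacheable
            self.entries[uri] = (body, now + max_age)
        return body
```

The backend still owns the URI space and decides what may be cached; the 
front only saves it from answering the same request twice.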

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Nicola Ken Barozzi <ni...@apache.org>.


Andrew Savory wrote:
> On Fri, 13 Dec 2002, Jeff Turner wrote:
> 
> 
>>Forget this idea that httpd is optional.  Put it right in the centre,
>>use it for URI management and static resource handling, and delegate to
>>Cocoon only the things Cocoon is good at handling.
> 
> I can see the sense in that ... although it does assume that everything is
> going to be coming and going via HTTP. But as I can't think of any sane
> alternatives, that seems reasonable ;-)

IMO this should be transparent to the container(s) and not compulsory.
I should not *rely* on this, but could use it if I wanted to get a speed 
boost.

>>But it's not a protocol, it's a 'scheme' :)  Everyone makes this mistake
>>(thanks to Marc for pointing it out).
> 
> Ah, gotcha, thanks. I see the point now.
> 
>>So when you see <link href="java:org.apache.myproj.MyClass">, the 'java:'
>>bit is simply telling the link processor that "org.apache.myproj.MyClass"
>>is to be interpreted as a Java resource identifier.
>>
>>>(That's not to say I'm in favour of the +uml notation either...
>>
>>Oh, that 'text/html+javadoc' was a wild guess at what a Javadoc MIME type
>>might be, based on the observation that the SVG MIME type is
>>'text/xml+svg'
> 
> Ok. Again, my misunderstanding of your intention -- I thought you were
> aiming to add bits to MIME types, rather than using a specific "javadoc"
> type.

Yes, we had basically all come to this non-obvious consensus, after a 
long and profitable discussion.
I want the same thing to happen on current open issues.

> Thanks for the explanations!

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: [OT] Re: Sitemap woes and semantic linking

Posted by Andrew Savory <an...@luminas.co.uk>.

On Fri, 13 Dec 2002, Jeff Turner wrote:

> Forget this idea that httpd is optional.  Put it right in the centre,
> use it for URI management and static resource handling, and delegate to
> Cocoon only the things Cocoon is good at handling.

I can see the sense in that ... although it does assume that everything is
going to be coming and going via HTTP. But as I can't think of any sane
alternatives, that seems reasonable ;-)

> But it's not a protocol, it's a 'scheme' :)  Everyone makes this mistake
> (thanks to Marc for pointing it out).

Ah, gotcha, thanks. I see the point now.

> So when you see <link href="java:org.apache.myproj.MyClass">, the 'java:'
> bit is simply telling the link processor that "org.apache.myproj.MyClass"
> is to be interpreted as a Java resource identifier.
>
> > (That's not to say I'm in favour of the +uml notation either...
>
> Oh, that 'text/html+javadoc' was a wild guess at what a Javadoc MIME type
> might be, based on the observation that the SVG MIME type is
> 'text/xml+svg'

Ok. Again, my misunderstanding of your intention -- I thought you were
aiming to add bits to MIME types, rather than using a specific "javadoc"
type.

Thanks for the explanations!


Andrew.

-- 
Andrew Savory                                Email: andrew@luminas.co.uk
Managing Director                              Tel:  +44 (0)870 741 6658
Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
This is not an official statement or order.    Web:    www.luminas.co.uk

[OT] Re: Sitemap woes and semantic linking

Posted by Jeff Turner <je...@apache.org>.

On Thu, Dec 12, 2002 at 12:07:34PM +0000, Andrew Savory wrote:
> 
> On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> > Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
> > the sitemap would be really nice.  The overhead of a <map:read> for every
> > Javadoc page probably wouldn't be noticed in a live webapp.  But for the
> > command-line?  Imagine how long it would take for the crawler to grind
> > through _every_ Javadoc page, effectively copying it unmodified from A to
> > B.
> 
> I guess on the plus side, everything is still controlled in one place, and
> since it's on the command line, it can be automated. The downside, as you
> mention, is speed. But is Cocoon significantly slower doing a map:read
> than, say, a "cp" on the command-line? What sort of factor of trade-off
> are we talking about?
> 
> > IMO, the _real_ problem is that the sitemap has been sold as a generic
> > URI management system, but it works at the level of a specific XML
> > publishing tool.  Its scope is overly broad.
> 
> Again, it's a pro/con kind of argument: I *like* that everything is dealt
> with within the Cocoon sitemap: my httpd/servlet engines are
> interchangeable, but Cocoon is a constant.

That's what I'm saying: the sitemap is great, but it should be the
"servlet container sitemap", not the "Cocoon sitemap".  There should be
URI management tools (notably URL rewriting) standardized right in
web.xml.

Here is an analogy: why doesn't AxKit have a sitemap?  Because it doesn't
need it.  It relies on Apache httpd's native URL management ability.  All
AxKit needs are those few pipelines for defining XML transformations.

> > So where does Forrest stand?  We have servlet containers with wholly
> > inadequate URI mapping.  We have Cocoon, trying to handle requests for
> > binary content which it shouldn't, resulting in hopeless performance.  We
> > have httpd, with good URI handling (eg mod_rewrite), but whose presence
> > can't be relied upon.  What is the way out?
> 
> Well, one solution might be to split the sitemap (URI mapping) from
> the sitemap (URI handling), and have a separate URI daemon that can run in
> front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
> drastic though, and could lead to a tangled mess of rewrites at each
> stage.

*shrug* There's no real solution now.  The only feasible 'URI daemon' is
Apache httpd.  More and more I agree with Pier Fumagalli, who had some
enlightening rants on tomcat-dev about the need to treat httpd as
_central_, and Tomcat as _only_ a servlet container.  Forget this idea
that httpd is optional.  Put it right in the centre, use it for URI
management and static resource handling, and delegate to Cocoon only the
things Cocoon is good at handling.

> > A more current example of this principle: say we want to link to class
> > MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
> > Javadoc, UML and qdox representations of that resource.  Should we invent
> > three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
> > attribute specifying a MIME type (inventing one if we have to)?
> 
> Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
> javadoc: as a protocol? Come to think of it, why java: as a protocol? If
> the part of any href before a colon refers to the transport, is it right
> to effectively overload the transport with additional MIME type
> information? 

But it's not a protocol, it's a 'scheme' :)  Everyone makes this mistake
(thanks to Marc for pointing it out).  A URI is an _identifier_.  Have a
look at the URI RFC; it makes clear that protocol (transport mechanism)
!= scheme (identifier syntax):

 "The URI scheme (Section 3.1) defines the namespace of the URI, and thus
 may further restrict the syntax and semantics of identifiers using that
 scheme."

And this.. "many URL schemes are named after protocols":

  "Although many URL schemes are named after protocols, this does not
  imply that the only way to access the URL's resource is via the named
  protocol.  Gateways, proxies, caches, and name resolution services
  might be used to access some resources, independent of the protocol of
  their origin, and the resolution of some URL may require the use of
  more than one protocol (e.g., both DNS and HTTP are typically used to
  access an "http" URL's resource when it can't be found in a local
  cache)."

And again, distinguishing "methods of access" from "schemes for
identif[ication]":

 "Just as there are many different methods of access to resources, there
 are a variety of schemes for identifying such resources.  The URI syntax
 consists of a sequence of components separated by reserved characters,
 with the first component defining the semantics for the remainder of the
 URI string."

So when you see <link href="java:org.apache.myproj.MyClass">, the 'java:'
bit is simply telling the link processor that "org.apache.myproj.MyClass"
is to be interpreted as a Java resource identifier.
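
A minimal sketch of that idea, in Python for illustration only: the scheme selects how the rest of the identifier is interpreted, and no transport is implied. The resolver functions and the output paths (an "apidocs/" directory, ".html" suffixes) are hypothetical, not Forrest's actual link processor.

```python
from urllib.parse import urlsplit

def resolve_java(identifier):
    # Hypothetical: map a fully qualified class name to a Javadoc page
    return "apidocs/" + identifier.replace(".", "/") + ".html"

def resolve_site(identifier):
    # Hypothetical: map a semantic site path to a generated HTML page
    return identifier.lstrip("/") + ".html"

# One resolver per scheme; the scheme names an identifier syntax, not a protocol
RESOLVERS = {"java": resolve_java, "site": resolve_site}

def resolve_link(href):
    parts = urlsplit(href)
    resolver = RESOLVERS.get(parts.scheme)
    if resolver is None:
        return href  # unknown or missing scheme: leave the link untouched
    return resolver(parts.path)

print(resolve_link("java:org.apache.myproj.MyClass"))
```

Links with schemes the table doesn't know about (http:, mailto:, or none at all) pass through unchanged.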

> (That's not to say I'm in favour of the +uml notation either... 

Oh, that 'text/html+javadoc' was a wild guess at what a Javadoc MIME type
might be, based on the observation that the SVG MIME type is
'text/xml+svg'

--Jeff

> 
> Andrew.
> 
> -- 
> Andrew Savory                                Email: andrew@luminas.co.uk
> Managing Director                              Tel:  +44 (0)870 741 6658
> Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
> This is not an official statement or order.    Web:    www.luminas.co.uk
>

Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Sylvain Wallez <sy...@anyware-tech.com>.

Jeff Turner wrote:

>On Thu, Dec 12, 2002 at 10:39:05AM +0000, Andrew Savory wrote:
>  
>
>>On Thu, 12 Dec 2002, Steven Noels wrote:
>>
>>    
>>
>>>could you please comment on my summary, too? Also, I'd like to hear the
>>>opinion of others.
>>>      
>>>
>>Ok, caveat: I've not used Forrest (yet), but I use Cocoon extensively.
>>
>>Jeff Turner wrote:
>>
>>    
>>
>>>Are you really suggesting that requests for Javadoc pages should go
>>>through Cocoon?
>>>
>>>But the problem is real: how do we integrate Javadocs into
>>>the URI space.
>>>
>>>I'd say write out .htaccess files with mod_rewrite rules, and figure out
>>>what the equivalent for Tomcat is.  Perhaps a separate servlet..
>>>_anything_ but Cocoon ;P
>>>      
>>>
>>Whilst I understand your concern about passing 21mb of files through
>>Cocoon untouched, I'm not sure there's a more elegant way of handling URI
>>space issues, without ending up bundling a massive amount of software with
>>Forrest (or making unrealistic software prerequisite installation
>>demands).
>>
>>So, since Cocoon _can_ handle the rewriting concern, and is already in
>>Forrest, why not use it?
>>    
>>
>
>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>the sitemap would be really nice.  The overhead of a <map:read> for every
>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>command-line?  Imagine how long it would take for the crawler to grind
>through _every_ Javadoc page, effectively copying it unmodified from A to
>B.
>
>IMO, the _real_ problem is that the sitemap has been sold as a generic
>URI management system, but it works at the level of a specific XML
>publishing tool.  Its scope is overly broad.  The webserver (Tomcat)
>should be defining the 'site map', and Cocoon should never even _see_
>requests for static resources.  Just like mod_jk only forwards servlet
>and JSP requests on to Tomcat, Tomcat should only forward requests for
>XML processing on to Cocoon.  So <map:read> is a hack to handle requests
>that Cocoon should never have been asked to handle in the first place.
>

No flame intended, but I'd like to explain why I disagree with 
<map:read> being a hack.

It can only be considered so in the specific case where a mod_rewrite 
rule can translate the request URI to a _file_ name. This is very 
restrictive compared to what is possible in Cocoon with and around a 
reader, and there are many more uses that don't fit in this.

For example, I use it on some projects to retrieve binary attachments 
to documents in an SQL database (BLOBs), or to access remote CVS 
repositories. This only uses the standard ResourceReader with specific 
sources, but we can also have some very specialized readers that can 
produce binary content from almost anything.

The world isn't full of XML, and Readers are the way for Cocoon to serve 
content that cannot be defined through XML processing pipelines.

>So where does Forrest stand?  We have servlet containers with wholly
>inadequate URI mapping.  We have Cocoon, trying to handle requests for
>binary content which it shouldn't, resulting in hopeless performance.  We
>have httpd, with good URI handling (eg mod_rewrite), but whose presence
>can't be relied upon.  What is the way out?
>

The way out may be to have equivalent mod_rewrite configuration and 
sitemap snippets for binary source handling. This allows the Cocoon app 
to be self-contained, yet still deployable behind a 
mod_rewrite-enabled httpd.
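
One way to keep the two in step would be to generate both from a single table. The sketch below (Python, for illustration only) assumes a hypothetical mapping from a URI prefix to a directory of static files; the prefix "apidocs" and the path "build/javadocs" are made up, and the exact RewriteRule form would depend on where the rule is placed (server config vs. .htaccess).

```python
# Hypothetical mapping: URI prefix -> directory holding the static files
STATIC_MOUNTS = {"apidocs": "build/javadocs"}

def rewrite_rule(prefix, directory):
    """Emit a mod_rewrite rule serving prefix/* from directory."""
    return f"RewriteRule ^/?{prefix}/(.*)$ {directory}/$1 [L]"

def sitemap_snippet(prefix, directory):
    """Emit the equivalent Cocoon sitemap matcher using a reader."""
    return (
        f'<map:match pattern="{prefix}/**">\n'
        f'  <map:read src="{directory}/{{1}}"/>\n'
        "</map:match>"
    )

for prefix, directory in STATIC_MOUNTS.items():
    print(rewrite_rule(prefix, directory))
    print(sitemap_snippet(prefix, directory))
```

When httpd is present its rule answers the request first; when it is absent, the sitemap snippet gives the same URI space through a reader.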

Also, Cocoon's CLI is slow at handling XML-processed content since it 
processes it twice: once to extract the links, and once to produce the 
file. Using the recent work on caching-points in Cocoon 2.1, we can 
envision a significant speed improvement if Cocoon's crawler takes 
advantage of this.
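
The general idea can be sketched as follows (Python, for illustration only). This is not how Cocoon's caching-points are implemented; `render` stands in for a hypothetical, expensive page generator, and the link extraction is a deliberately naive regex.

```python
import re

def crawl(start, render):
    """Generate every page reachable from start, rendering each one once.

    render(uri) -> str is a hypothetical, expensive page generator.
    The rendered output is kept and reused for link extraction, so no
    page goes through the pipeline a second time.
    """
    cache = {}
    queue = [start]
    while queue:
        uri = queue.pop()
        if uri in cache:
            continue
        cache[uri] = render(uri)  # the only render call for this page
        for link in re.findall(r'href="([^"]+)"', cache[uri]):
            if link not in cache:
                queue.append(link)
    return cache

pages = {
    "index.html": '<a href="a.html">a</a>',
    "a.html": '<a href="index.html">home</a>',
}
print(sorted(crawl("index.html", pages.__getitem__)))
```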

Ah, and something that Cocoon's crawler can do but wget can't is follow 
links between generated PDFs...

>>I like the idea of link naming schemes, but I'm really worried about the
>>idea of specifying MIME types as link attributes. This seems like a nasty
>>hack: should we be specifying MIME types?
>>    
>>
>
>There is some context you're missing there..
>
>http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2
>
>The theory is that links should _not_ specify MIME type of linked-to docs
>by default.  The MIME type should be inferred by the type of the linking
>document, and what's available.  Eg, <link href="site:/primer"> links to
>"The Forrest Primer" in whatever form it's available.
>
>However it is also sometimes desirable to specify the MIME type
>explicitly.  So rather than corrupt our nice semantic URLs, eg <link
>href="site:/primer.pdf">, we should express the type as a separate
>attribute: <link href="site:/primer" type="application/pdf">.
>
>A more current example of this principle: say we want to link to class
>MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
>Javadoc, UML and qdox representations of that resource.  Should we invent
>three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
>attribute specifying a MIME type (inventing one if we have to)?
>

A positive note to end this post: I find these MIME-typed links a very 
elegant solution to cleanly separate the referred content from its 
presentation.
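
To make the separation concrete, here is a minimal sketch (Python, for illustration only) of resolving a semantic link with an optional type attribute. The table of available representations and the fallback rule are assumptions; the thread only says the type should be inferred from the linking document and from what is available.

```python
# Hypothetical table: which representations exist for each semantic target
AVAILABLE = {
    "site:/primer": {
        "text/html": "primer.html",
        "application/pdf": "primer.pdf",
    },
}

def resolve(href, linking_type, explicit_type=None):
    """Pick a concrete file for a semantic link.

    An explicit type attribute wins; otherwise the type of the linking
    document is tried; otherwise fall back to whatever is available.
    """
    representations = AVAILABLE[href]
    wanted = explicit_type or linking_type
    if wanted in representations:
        return representations[wanted]
    if explicit_type is not None:
        raise LookupError(f"{href} is not available as {explicit_type}")
    # Assumed fallback: first available type in sorted order
    return representations[sorted(representations)[0]]

print(resolve("site:/primer", "text/html"))
print(resolve("site:/primer", "text/html", "application/pdf"))
```

The href stays the same in both calls; only the requested presentation changes.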

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }

Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Nicola Ken Barozzi <ni...@apache.org>.

Andrew Savory wrote:
> On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> 
>>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>>the sitemap would be really nice.  The overhead of a <map:read> for every
>>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>>command-line?  Imagine how long it would take for the crawler to grind
>>through _every_ Javadoc page, effectively copying it unmodified from A to
>>B.
> 
> I guess on the plus side, everything is still controlled in one place, and
> since it's on the command line, it can be automated. The downside, as you
> mention, is speed. But is Cocoon significantly slower doing a map:read
> than, say, a "cp" on the command-line? What sort of factor of trade-off
> are we talking about?

The actual problem is the Cocoon CLI, which crawls links.
The server version does not have this problem. So it's a CLI issue, not 
a Cocoon issue.

>>IMO, the _real_ problem is that the sitemap has been sold as a generic
>>URI management system, but it works at the level of a specific XML
>>publishing tool.  Its scope is overly broad.
> 
> Again, it's a pro/con kind of argument: I *like* that everything is dealt
> with within the Cocoon sitemap: my httpd/servlet engines are
> interchangeable, but Cocoon is a constant.

Not only that. Cocoon is *not* a servlet app. It's an XML processing engine. 
So it should manage everything it serves, so that its apps can be ported 
to every environment Cocoon can run in.

>>So where does Forrest stand?  We have servlet containers with wholly
>>inadequate URI mapping.  We have Cocoon, trying to handle requests for
>>binary content which it shouldn't, resulting in hopeless performance.  We
>>have httpd, with good URI handling (eg mod_rewrite), but whose presence
>>can't be relied upon.  What is the way out?
> 
> Well, one solution might be to split the sitemap (URI mapping) from
> the sitemap (URI handling), and have a separate URI daemon that can run in
> front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
> drastic though, and could lead to a tangled mess of rewrites at each
> stage.

Exactly. These problems are not necessarily bad things inherent to Cocoon, 
but bugs or missing features. We should not circumvent them with hacks, 
but be able to manage them better in Cocoon.

>>There is some context you're missing there..
>>
>>http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2
> 
> 
> Ok, gotcha. That seems fair, apologies for rehashing old discussions.
> 
> 
>>A more current example of this principle: say we want to link to class
>>MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
>>Javadoc, UML and qdox representations of that resource.  Should we invent
>>three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
>>attribute specifying a MIME type (inventing one if we have to)?
> 
> Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
> javadoc: as a protocol? Come to think of it, why java: as a protocol? If
> the part of any href before a colon refers to the transport, is it right
> to effectively overload the transport with additional MIME type
> information? (That's not to say I'm in favour of the +uml notation
> either... do we need another attribute?)

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------

Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Stefano Mazzocchi <st...@apache.org>.

Andrew Savory wrote:
> On Thu, 12 Dec 2002, Jeff Turner wrote:
> 
> 
>>Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
>>the sitemap would be really nice.  The overhead of a <map:read> for every
>>Javadoc page probably wouldn't be noticed in a live webapp.  But for the
>>command-line?  Imagine how long it would take for the crawler to grind
>>through _every_ Javadoc page, effectively copying it unmodified from A to
>>B.
> 
> 
> I guess on the plus side, everything is still controlled in one place, and
> since it's on the command line, it can be automated. The downside, as you
> mention, is speed. But is Cocoon significantly slower doing a map:read
> than, say, a "cp" on the command-line? What sort of factor of trade-off
> are we talking about?

A file copy is a native operation. In a modern operating system with a 
modern JVM it can be performed using DMA. So it's lightspeed compared to 
anything that Cocoon will be able to do.

But we are talking about 'bulk copy'.

If we talk about scanning for links (and any wget-like crawler, 
the Cocoon CLI or others, has to do this), then there is no technical reason 
why the Cocoon CLI has to be slower than, say, a wget Java clone.

-- 
Stefano Mazzocchi                               <st...@apache.org>
--------------------------------------------------------------------

Re: Sitemap woes and semantic linking (Re: URI spaces: source, processing, result)

Posted by Andrew Savory <an...@luminas.co.uk>.

On Thu, 12 Dec 2002, Jeff Turner wrote:

> Yes I agree.  Having the _whole_ URI space (including javadocs) mapped in
> the sitemap would be really nice.  The overhead of a <map:read> for every
> Javadoc page probably wouldn't be noticed in a live webapp.  But for the
> command-line?  Imagine how long it would take for the crawler to grind
> through _every_ Javadoc page, effectively copying it unmodified from A to
> B.

I guess on the plus side, everything is still controlled in one place, and
since it's on the command line, it can be automated. The downside, as you
mention, is speed. But is Cocoon significantly slower doing a map:read
than, say, a "cp" on the command-line? What sort of factor of trade-off
are we talking about?

> IMO, the _real_ problem is that the sitemap has been sold as a generic
> URI management system, but it works at the level of a specific XML
> publishing tool.  Its scope is overly broad.

Again, it's a pro/con kind of argument: I *like* that everything is dealt
with within the Cocoon sitemap: my httpd/servlet engines are
interchangeable, but Cocoon is a constant.

> So where does Forrest stand?  We have servlet containers with wholly
> inadequate URI mapping.  We have Cocoon, trying to handle requests for
> binary content which it shouldn't, resulting in hopeless performance.  We
> have httpd, with good URI handling (eg mod_rewrite), but whose presence
> can't be relied upon.  What is the way out?

Well, one solution might be to split the sitemap (URI mapping) from
the sitemap (URI handling), and have a separate URI daemon that can run in
front of Cocoon (and in front of httpd, Tomcat, etc too). This seems kinda
drastic though, and could lead to a tangled mess of rewrites at each
stage.

> There is some context you're missing there..
>
> http://marc.theaimsgroup.com/?l=forrest-dev&m=103097808318773&w=2

Ok, gotcha. That seems fair, apologies for rehashing old discussions.

> A more current example of this principle: say we want to link to class
> MyClass:  <link href="java:org.apache.foo.MyClass">.  Now say we have
> Javadoc, UML and qdox representations of that resource.  Should we invent
> three new protocols; javadoc:, uml: and qdox:, or should we add a 'type'
> attribute specifying a MIME type (inventing one if we have to)?

Hrm, ok. But if we have javadoc it is going to be HTTP/HTML, so why
javadoc: as a protocol? Come to think of it, why java: as a protocol? If
the part of any href before a colon refers to the transport, is it right
to effectively overload the transport with additional MIME type
information? (That's not to say I'm in favour of the +uml notation
either... do we need another attribute?)

Andrew.

-- 
Andrew Savory                                Email: andrew@luminas.co.uk
Managing Director                              Tel:  +44 (0)870 741 6658
Luminas Internet Applications                  Fax:  +44 (0)700 598 1135
This is not an official statement or order.    Web:    www.luminas.co.uk