Posted to dev@cocoon.apache.org by Paul Russell <pa...@luminas.co.uk> on 2000/05/25 03:49:24 UTC

[RT] [OT] Content_negotiation++;

Hi all,

I hope I'm not infringing an unwritten copyright of Stefano's
here, but this e-mail *definitely* fits into the category of
a random thought, rather than an immediate proposition. It's
also somewhat off topic; this thought is NOT a suggestion
for something we could do with Cocoon - I just don't think
it makes sense in that environment. It's a random thought
that's been bouncing around my head, and I thought I'd share
it with you.

Before I dive in and start wombling, note that my internet
connection is currently as dead as a dodo, and so I can't
check any namespaces or anything. This also has the side
effect that I have absolutely no idea whatsoever when our
mailserver will actually succeed in delivering it.

I've simplified a lot of this, because (a) I don't want
to cloud matters, if I can avoid it, and (b) it's too
damned late for me to think straight ;)

I've been thinking about content/presentation abstraction for
a *long* time, and even more since I've been involved with
Cocoon.

One thing has always been in the back of my mind..

  "Surely the server should be able to deliver content
   in whatever format the client wants? More to the point,
   surely I shouldn't always have to think about it?"

Let's think about this. Currently in Cocoon we have either
PIs (in Cocoon1.x) or a sitemap (Cocoon2). In each of these,
we explicitly specify how to get from the source document
to the browser. Now, in Cocoon2, we can use matchers to
determine what format to send data to the client in, which
is great. I can have exactly the same URI, and without
changing the source document, I can pump it out in HTML,
text, PDF, or if I'm feeling particularly adventurous (or
plain sick - you decide) SVG, PNG or JPEG. Cocoon1.x offers
similar facilities, albeit in a somewhat under-engineered
fashion.

Real Soon Now, we'll be able to use something called
'content negotiation' to work out what format browsers
would prefer without having to infer it from the URL
or User Agent.

The way this works is that the client sends an 'Accept:'
header to the server specifying what types of data it can
understand. This feature is somewhat underdeveloped in
current browsers, but it will improve, particularly as
technologies such as Cocoon become prevalent.
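
To make the 'Accept:' mechanism concrete, here is a minimal sketch
in Python of how a server might rank what a client says it accepts.
It is an illustration only, not anything Cocoon does: the function
names are made up, the WML MIME type in the usage note is an
assumption, and partial wildcards such as text/* are ignored.

```python
def parse_accept(header):
    """Return the media types from an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # default when the client gives no q parameter
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            entries.append((-quality, position, media_type))
    # Highest quality first; ties keep the order the client used.
    return [media_type for _, _, media_type in sorted(entries)]


def choose_format(header, available):
    """Pick the first type in 'available' that the client accepts, or None."""
    for media_type in parse_accept(header):
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]
    return None
```

For example, choose_format("text/vnd.wap.wml, text/html;q=0.5",
["text/html", "text/vnd.wap.wml"]) picks the WML type, because the
client ranked it above HTML.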

Now, the seed that has been growing in my head (albeit on
the back burner) for a good few months now is that it is
potentially possible to take this concept further.

All XML documents have a 'namespace'. This is a unique URI,
which allows us to be *sure* we're dealing with the set of
semantics we were expecting. As the XML content within 
Cocoon flows from the generator to the serializer, this
namespace changes.

Now, take the following lump of (imaginary) config for a
Cocoon-like system:

<negotiate>
  <translate
    from="http://xmlns.luminas.co.uk/uea/prosp/intro/"
    to="http://xmlns.luminas.co.uk/uea/prosp/layout/"
    filter="xslt">
    <parameter name="stylesheet" value="intro-layout.xsl"/>
  </translate>
  <translate
    from="http://xmlns.luminas.co.uk/uea/prosp/subj/"
    to="http://xmlns.luminas.co.uk/uea/prosp/layout/"
    filter="xslt">
    <parameter name="stylesheet" value="subj-layout.xsl"/>
  </translate>
  <translate
    from="http://xmlns.luminas.co.uk/uea/prosp/layout/"
    to="http://www.w3c.org/1999/xhtml/"
    filter="xslt">
    <parameter name="stylesheet" value="layout-xhtml.xsl"/>
  </translate>
  <translate
    from="http://xmlns.luminas.co.uk/uea/prosp/layout/"
    to="http://www.wapforum.org/[...Iforget...]"
    filter="xslt">
    <parameter name="stylesheet" value="layout-wml.xsl"/>
  </translate>
</negotiate>

The server could then build a node-edge graph of the
translations in memory, and back propagate the target
namespaces to preceding nodes in the graph. For the
above (very simple) config file, the graph would look
like [namespaces shortened]:

[uea/prosp/intro] \                      / [xhtml]
                   |-[uea/prosp/layout]-|
 [uea/prosp/subj] /                      \ [wml]

The wml and xhtml namespaces would be back propagated
towards the left of the diagram, so that the 'layout'
node knows to go 'up' for xhtml, and 'down' for wml,
and so that the 'intro' and 'subj' nodes know that
they can reach 'layout', 'xhtml', and 'wml' by going
'right'.
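
The back propagation step can be sketched roughly as follows, in
Python. This is only an illustration under assumptions: the
namespaces are shortened to the labels used in the diagram, the
TRANSLATIONS list and build_routes function are hypothetical names,
and when several routes exist the first one found wins rather than
the best one.

```python
from collections import defaultdict

# (from, to, stylesheet) triples mirroring the config, namespaces shortened.
TRANSLATIONS = [
    ("intro", "layout", "intro-layout.xsl"),
    ("subj", "layout", "subj-layout.xsl"),
    ("layout", "xhtml", "layout-xhtml.xsl"),
    ("layout", "wml", "layout-wml.xsl"),
]


def build_routes(translations):
    """Return routes[node][target] = (next_node, stylesheet_to_apply)."""
    routes = defaultdict(dict)
    # Seed every node with the targets it can reach in a single step.
    for src, dst, stylesheet in translations:
        routes[src][dst] = (dst, stylesheet)
    # Back propagate: whatever a neighbour can reach, this node can reach
    # by going through that neighbour. Repeat until nothing new turns up.
    changed = True
    while changed:
        changed = False
        for src, dst, stylesheet in translations:
            for target in list(routes.get(dst, {})):
                if target != src and target not in routes[src]:
                    routes[src][target] = (dst, stylesheet)
                    changed = True
    return dict(routes)
```

With the four translations above, build_routes(TRANSLATIONS)["intro"]
holds entries for 'layout', 'xhtml' and 'wml', all of which point
'right' through the layout node.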

When a request comes in, it would be tagged with the
destination namespace (xhtml, wml, svg, whatever...).
When the source XML is parsed/generated, we discover
its namespace from the root element, and go find
ourselves that node in the graph. The node then looks
at the destination namespace and forwards the SAX
events to a filter and on to the destination node.
This process continues until it gets to the destination
node, at which point it's serialized, which brings me
nicely onto the next point...
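
As a rough sketch of that request flow (Python, illustration only):
given the source namespace found in the document and the destination
namespace tagged on the request, work out which stylesheets to chain.
The EDGES table and plan_pipeline function are hypothetical, the
namespaces are shortened as in the diagram, and a breadth-first
search stands in for whatever lookup a real server would use. No SAX
events are pushed anywhere here; the sketch only plans the chain.

```python
from collections import deque

# node -> list of (next_node, stylesheet), namespaces shortened.
EDGES = {
    "intro": [("layout", "intro-layout.xsl")],
    "subj": [("layout", "subj-layout.xsl")],
    "layout": [("xhtml", "layout-xhtml.xsl"), ("wml", "layout-wml.xsl")],
}


def plan_pipeline(edges, source, destination):
    """Return the stylesheets to apply in order, or None if unreachable."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, stylesheets = queue.popleft()
        if node == destination:
            return stylesheets
        for next_node, stylesheet in edges.get(node, []):
            if next_node not in seen:
                seen.add(next_node)
                queue.append((next_node, stylesheets + [stylesheet]))
    return None
```

plan_pipeline(EDGES, "intro", "wml") gives the two stylesheets
intro-layout.xsl and layout-wml.xsl, in that order. Breadth-first
search returns a chain with the fewest steps, which is one possible
reading of "the easiest way" to get to the destination.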

I've got to admit, up to this point, I've made a
rather large simplification. Great. We can transform
from one namespace to another, but how on earth
are we going to get a png or a jpeg out? The obvious
answer is to treat mime types in a similar manner,
and build them into the node graph. This causes
problems, because suddenly we're not dealing with
filters, we're dealing with serializers, which have
a SAX input stream, but a *binary* output stream.
To be frank, I'm still pondering this bit.

I've explained *how* something like this could work
(loosely, I admit), but the question now has to be
"why on earth would you want to?".

The answer is (and I did warn you this was *not* an
immediate proposition) that at the moment, you
*wouldn't* (this thing is *not* in the Cocoon2 target
area, as far as I'm concerned). The sitemap handles
pretty much anything most people are going to throw
at it for the foreseeable future (and a hefty wadge that
most people aren't <grin>).

Where I *can* see it being useful is where you're
dealing with all kinds of different DTDs from a
particular URI space, and matching becomes
cumbersome. For example, imagine you had a project
linked to CORBA. Everything in /objects/* was
linked to a CORBA generator, so that /objects/<iiopID>
retrieved the content of an object. You could
potentially write a matcher, and put entries in the
sitemap for each type of object. This could become
somewhat cumbersome, particularly if you're targeting
WAP and HTML and PDF, for example. Using the directed
graph, you don't worry about it - just let the server
work out the easiest way to translate the document
into what the client wants.

I don't know, maybe this is a Really Bad Idea (tm),
but I think it could be useful in some situations,
particularly inside large application servers where
you've got lots of people administering the system.
If people could
upload a stylesheet, and just specify the source
and destination namespaces and let the server work
out when to use it, we save ourselves a lot of
configuration nightmares.

Anyway, that's my random thought for the evening,
hope reading this e-mail wasn't a *total* waste
of time for those who made it ;)

Cheers,



-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.

Re: [RT] [OT] Content_negotiation++;

Posted by Paul Russell <pa...@luminas.co.uk>.
Hi all,

Sorry I've been so quiet. Bit of work and computer madness going
down (didn't help that I broke Xwindows this morning. whoops).

On Thu, May 25, 2000 at 02:16:15PM +0200, Stefano Mazzocchi wrote:
> Paul Russell wrote:
> > I hope I'm not infringing an unwritten copyright of Stefano's
> > here, 
> No, not at all. I'd love to have a collection of random thoughts from
> everybody on this list, it would make it easier to create a coherent
> model that fits all needs. The power of open source is that it is also
> openly developed. I wish more things were openly developed.

Glad you think that way, same here.

> What you are talking about was already discussed previously and named
> "namespace reaction", this means that you indicate what transformations
> to apply when a namespace is found.

Ahhh. Does this have anything to do with the 'reactor' model
I remember reading about on the website?

> We'll see the problems this reasoning has.

Heh. okay.

> > Real Soon Now, we'll be able to use something called
> > 'content negotiation' to work out what format browsers
> > would prefer without having to infer it from the URL
> > or User Agent.
> I would say: as soon as the sitemap matchers are implemented.

The only problem with this is that currently, browsers have
a habit of not really putting much thought into their accept
headers. I was working with a site which had macromedia flash
content (anyone written a serializer for it yet? <g>), and I
didn't want to do any of the kludgy client side plugin detection
stuff that people recommend. I wrote myself a little webserver
in perl (using LWP, which rocks for that kind of thing)
to see what the clients were doing. From what I can remember,
both netscape had a habit of sending

   Accept: text/html text/plain image/gif image/jpeg */*

... or something similar, regardless of plugins etc. All in all,
about as useful as a chocolate fireguard ;)


> Apache is very good at this, even if most people don't know it. The
> problem is that it's very good on static content only.

Yep, indeed it is. Apache's got a disturbing habit of being
good at everything, well, everything static, anyway.

> > All XML documents have a 'namespace'. 
> First problem: a general well-formed XML document has "no" namespace.
> But I agree that every "good-behaving" XML document should indicate its
> namespace, just like any "good-behaving" document should use xlink for
> links and RDF for metadata.

Yep.

> > [... 'translation map' ...]
> Ok, I see.
> 
> What you are writing is a "translation map", instructions for what
> transformations to apply to come from one namespace to another. This is
> why I call it "namespace reaction": you feed the processing engine with
> instructions to allow it to understand, reacting on the namespace found,
> what transformation to apply.

Indeed. Look the final target up in the connected graph
(mentioned below), and traverse towards your destination.

> Second problem: there are _infinite_ ways to move from one namespace to
> another.
> This is called _styling_ :-)
> What you are proposing is a single-style, hardwired path between
> namespaces, which is good only if there is one and only one
> transformation style applied to move from namespace A to namespace B.

Yep. True.

> you are confusing MIME-types with namespaces. It is very likely that more
> than one namespace participates directly in the creation of one single
> MIME type (fo + svg -> pdf)

Again. True. Sigh.

> > When the source XML is parsed/generated, we discover
> > its namespace from the root element, and go find
> > ourselves that node in the graph. 
> AHHHHH! no way! the root element has nothing to do with the namespaces
> that can be found inside the document. You are confusing namespaces with
> SGML-like doctypes, which, in a true XML world make very little sense.

Yeah, you're right, I'm misunderstanding namespaces. I was thinking
of <docroot xmlns="foo"> meaning that 'foo' was the doctype,
whereas actually it's just the default namespace.

> But I'm wide open to suggestions to integrate namespace reaction in the
> sitemap if you find this simplifies your life in situations I can't
> think of.

Nahh. As I said, I don't really think this is an idea that ties
in with Cocoon. The original idea came about in relation to large
scale distributed application servers (don't panic, I'm not talking
about making Cocoon distributed - that's a really *really*
bad idea). What I was considering, essentially, was how to take
arbitrary data from persistent objects and display it to a user
in some format. I'm still thinking about how to do this side of
things.

> > [... stuff about 'packaging' filters so that staff ]
> > [ don't have to worry about sitemap configuration. ]
> 
> Yes, this is clearly the ideal scenario. I believe, on the other hand,
> that the real scalability power comes from cascading sitemaps. Of
> course, if namespace reaction is included in the sitemap, the cascading
> capability will be inherited, so this is not a clear argument against
> namespace reaction.

Yeah, I agree. The new sitemap ideas rock, although I've
been a bit buried, and so haven't had a chance to pull
them to pieces yet. Tell you what, starting a company
certainly eats vast amounts of time ;). Rest assured I'll
try and find myself a few hours to play over the next
couple of days (between exams and meetings ;)

> Like I said, suggestions for integration are welcome.

Heh. As I said, I don't really think this is something
that we should be adding to Cocoon2 unless someone can
come up with a groundbreaking way of doing it. I think
it would muddy the waters too much, and detract from
the simplicity of Cocoon. I'm (obviously) going to
continue banging the idea around, so if I come up with
anything dramatic that means that it might be a benefit
to cocoon, then obviously I'll let you all know!


-- 
Paul Russell                               <pa...@luminas.co.uk>
Technical Director,                   http://www.luminas.co.uk
Luminas Ltd.

Re: [RT] [OT] Content_negotiation++;

Posted by Stefano Mazzocchi <st...@apache.org>.
Paul Russell wrote:
> 
> Hi all,
> 
> I hope I'm not infringing an unwritten copyright of Stefano's
> here, 

No, not at all. I'd love to have a collection of random thoughts from
everybody on this list, it would make it easier to create a coherent
model that fits all needs. The power of open source is that it is also
openly developed. I wish more things were openly developed.

> but this e-mail *definitely* fits into the category of
> a random thought, rather than an immediate proposition. It's
> also somewhat off topic; this thought is NOT a suggestion
> for something we could do with Cocoon - I just don't think
> it makes sense in that environment. It's a random thought
> that's been bouncing around my head, and I thought I'd share
> it with you.

cool, see my comments below
 
> Before I dive in and start wombling, note that my internet
> connection is currently as dead as a dodo, and so I can't
> check any namespaces or anything. This also has the side
> effect that I have absolutely no idea whatsoever when our
> mailserver will actually succeed in delivering it.
> 
> I've simplified a lot of this, because (a) I don't want
> to cloud matters, if I can avoid it, and (b) it's too
> damned late for me to think straight ;)
> 
> I've been thinking about content/presentation abstraction for
> a *long* time, and even more since I've been involved with
> Cocoon.
> 
> One thing has always been in the back of my mind..
> 
>   "Surely the server should be able to deliver content
>    in whatever format the client wants? More to the point,
>    surely I shouldn't always have to think about it?"

I had the exact same feeling when moving to the sitemap mindset. In
fact, Cocoon1 PIs were created to _reduce_ the load of site management
by allowing each one to indicate _how_ their resource was to be
constructed.

But it failed to provide a central place for administration, thus
_increasing_ the site management workload later on and reducing resource
reuse.

What you are talking about was already discussed previously and named
"namespace reaction", this means that you indicate what transformations
to apply when a namespace is found.

We'll see the problems this reasoning has.
 
> Let's think about this. Currently in Cocoon we have either
> PIs (in Cocoon1.x) or a sitemap (Cocoon2). In each of these,
> we explicitly specify how to get from the source document
> to the browser. Now, in Cocoon2, we can use matchers to
> determine what format to send data to the client in, which
> is great. I can have exactly the same URI, and without
> changing the source document, I can pump it out in HTML,
> text, PDF, or if I'm feeling particularly adventurous (or
> plain sick - you decide) SVG, PNG or JPEG. Cocoon1.x offers
> similar facilities, albeit in a somewhat under-engineered
> fashion.
> 
> Real Soon Now, we'll be able to use something called
> 'content negotiation' to work out what format browsers
> would prefer without having to infer it from the URL
> or User Agent.

I would say: as soon as the sitemap matchers are implemented.
 
> The way this works is that the client sends an 'Accept:'
> header to the server specifying what types of data it can
> understand. This feature is somewhat underdeveloped in
> current browsers, but it will improve, particularly as
> technologies such as Cocoon become prevalent.

Apache is very good at this, even if most people don't know it. The
problem is that it's very good on static content only.
 
> Now, the seed that has been growing in my head (albeit on
> the back burner) for a good few months now is that it is
> potentially possible to take this concept further.
> 
> All XML documents have a 'namespace'. 

First problem: a general well-formed XML document has "no" namespace.
But I agree that every "good-behaving" XML document should indicate its
namespace, just like any "good-behaving" document should use xlink for
links and RDF for metadata.

> This is a unique URI,
> which allows us to be *sure* we're dealing with the set of
> semantics we were expecting. As the XML content within
> Cocoon flows from the generator to the serializer, this
> namespace changes.

Yes.
 
> Now, take the following lump of (imaginary) config for a
> Cocoon-like system:
> 
> <negotiate>
>   <translate
>     from="http://xmlns.luminas.co.uk/uea/prosp/intro/"
>     to="http://xmlns.luminas.co.uk/uea/prosp/layout/"
>     filter="xslt">
>     <parameter name="stylesheet" value="intro-layout.xsl"/>
>   </translate>
>   <translate
>     from="http://xmlns.luminas.co.uk/uea/prosp/subj/"
>     to="http://xmlns.luminas.co.uk/uea/prosp/layout/"
>     filter="xslt">
>     <parameter name="stylesheet" value="subj-layout.xsl"/>
>   </translate>
>   <translate
>     from="http://xmlns.luminas.co.uk/uea/prosp/layout/"
>     to="http://www.w3c.org/1999/xhtml/"
>     filter="xslt">
>     <parameter name="stylesheet" value="layout-xhtml.xsl"/>
>   </translate>
>   <translate
>     from="http://xmlns.luminas.co.uk/uea/prosp/layout/"
>     to="http://www.wapforum.org/[...Iforget...]"
>     filter="xslt">
>     <parameter name="stylesheet" value="layout-wml.xsl"/>
>   </translate>
> </negotiate>

Ok, I see.

What you are writing is a "translation map", instructions for what
transformations to apply to come from one namespace to another. This is
why I call it "namespace reaction": you feed the processing engine with
instructions to allow it to understand, reacting on the namespace found,
what transformation to apply.
 
> The server could then build a node-edge graph of the
> translations in memory, and back propagate the target
> namespaces to preceding nodes in the graph. For the
> above (very simple) config file, the graph would look
> like [namespaces shortened]:
> 
> [uea/prosp/intro] \                      / [xhtml]
>                    |-[uea/prosp/layout]-|
>  [uea/prosp/subj] /                      \ [wml]
> 
> The wml and xhtml namespaces would be back propagated
> towards the left of the diagram, so that the 'layout'
> node knows to go 'up' for xhtml, and 'down' for wml,
> and so that the 'intro' and 'subj' nodes know that
> they can reach 'layout', 'xhtml', and 'wml' by going
> 'right'.

Second problem: there are _infinite_ ways to move from one namespace to
another.

This is called _styling_ :-)

What you are proposing is a single-style, hardwired path between
namespaces, which is good only if there is one and only one
transformation style applied to move from namespace A to namespace B.

For example, suppose you have a document that contains three of your
namespaces: doc: for documents, sql: for sql-generated data, quote: for
SOAP-retrieved stocks information.

So you have

 origin := doc + sql + quote

now you want to generate a PDF report of this. How do I do it? you need

 transform1 := (sql -> doc)
 transform2 := (quote -> doc)
 transform3 := (doc -> docbook)
 transform4 := (docbook -> fo)
 serializer := (fo -> pdf)

which you can create with the above graph.

But now you want to create an SVG table out of your SQL data but still
maintain the PDF output, ok, then you need

 transform1 := (sql -> svg)
 transform2 := (quote -> doc)
 transform3 := (doc -> docbook)
 transform4 := (docbook -> fo)
 serializer := (fo + svg -> pdf)

[NOTE: each transformation _must_ copy all the namespaces it is not
programmed to transform, otherwise the whole thing collapses and order
of namespace application becomes vital]

Ok, cool, but your boss wants big and fancy graphics all over to print
their brochures, all with the same data and with a fancier 3D SVG graph.
You look at the namespace chain (your transformation skeleton) and you
find it's exactly the same, but you just have to apply another _flavor_
of transformation.

In the end, it would be possible to do "namespace reaction" only if you
declared all the possible ways to crawl the namespace trellis, attaching
your own identifiers to every path.
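
To illustrate what attaching identifiers to every path could look
like, here is a small Python sketch. Everything in it is hypothetical:
the TRANSFORMS table, the flavor labels and the stylesheet names are
made up for the example. The point is only that once two stylesheets
connect the same pair of namespaces, the engine cannot choose between
them unless someone names the flavor.

```python
# Hypothetical registry: several stylesheets may connect the same
# pair of namespaces, each under its own flavor identifier.
TRANSFORMS = {
    ("sql", "svg"): {"plain": "sql-table.xsl", "fancy3d": "sql-3d-graph.xsl"},
    ("docbook", "fo"): {"plain": "docbook-fo.xsl", "brochure": "docbook-brochure.xsl"},
    ("doc", "docbook"): {"plain": "doc-docbook.xsl"},
}


def pick_stylesheet(source, target, flavor=None):
    """Resolve one step of the chain, refusing to guess between flavors."""
    flavors = TRANSFORMS.get((source, target))
    if not flavors:
        raise LookupError(f"no transformation from {source} to {target}")
    if flavor is None:
        if len(flavors) == 1:
            # Only one way to go, so the namespace pair alone is enough.
            return next(iter(flavors.values()))
        raise LookupError(
            f"{source} -> {target} is ambiguous, choose one of {sorted(flavors)}"
        )
    return flavors[flavor]  # raises KeyError for an unknown flavor
```

pick_stylesheet("doc", "docbook") works from the namespaces alone,
while pick_stylesheet("sql", "svg") fails until a flavor such as
"fancy3d" is supplied.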

While I believe this would make for a very appealing GUI-based visual
sitemap authoring tool, I don't think this is a good model for sitemaps
where more flavors of the same MIME-type are to be expressed (which is
very likely to be the case in Cocoon, given its power).

> When a request comes in, it would be tagged with the
> destination namespace (xhtml, wml, svg, whatever...).

you are confusing MIME-types with namespaces. It is very likely that more
than one namespace participates directly in the creation of one single
MIME type (fo + svg -> pdf)

> When the source XML is parsed/generated, we discover
> its namespace from the root element, and go find
> ourselves that node in the graph. 

AHHHHH! no way! the root element has nothing to do with the namespaces
that can be found inside the document. You are confusing namespaces with
SGML-like doctypes, which, in a true XML world make very little sense.
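
A quick way to see this, sketched in Python with the standard
library's ElementTree (the sample document and its urn:example:*
namespaces are invented for illustration; only the SVG namespace URI
is a real one): the root element declares one default namespace, yet
the document as a whole uses three.

```python
import xml.etree.ElementTree as ET

# Invented sample: the root is in one namespace, its children use others.
SAMPLE = """<report xmlns="urn:example:doc" xmlns:sql="urn:example:sql">
  <title>Stock levels</title>
  <sql:query>select * from stock</sql:query>
  <svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10"/></svg>
</report>"""


def element_namespace(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        # ElementTree writes qualified names as '{uri}localname'.
        return tag[1:].split("}", 1)[0]
    return ""


def namespaces_used(xml_text):
    """Return the namespace URIs of every element in the document."""
    root = ET.fromstring(xml_text)
    return {element_namespace(element) for element in root.iter()}


def root_namespace(xml_text):
    """Return only the namespace of the root element."""
    return element_namespace(ET.fromstring(xml_text))
```

root_namespace(SAMPLE) reports only urn:example:doc, whereas
namespaces_used(SAMPLE) also finds urn:example:sql and the SVG
namespace, so routing on the root element alone would miss two of
the three vocabularies.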

> The node then looks
> at the destination namespace and forwards the SAX
> events to a filter and on to the destination node.
> This process continues until it gets to the destination
> node, at which point it's serialized, which brings me
> nicely onto the next point...
> 
> I've got to admit, up to this point, I've made a
> rather large simplification. Great. We can transform
> from one namespace to another, but how on earth
> are we going to get a png or a jpeg out? The obvious
> answer is to treat mime types in a similar manner,
> and build them into the node graph. This causes
> problems, because suddenly we're not dealing with
> filters, we're dealing with serializers, which have
> a SAX input stream, but a *binary* output stream.
> To be frank, I'm still pondering this bit.
> 
> I've explained *how* something like this could work
> (loosely, I admit), but the question now has to be
> "why on earth would you want to?".
> 
> The answer is (and I did warn you this was *not* an
> immediate proposition) that at the moment, you
> *wouldn't* (this thing is *not* in the Cocoon2 target
> area, as far as I'm concerned). The sitemap handles
> pretty much anything most people are going to throw
> at it for the foreseeable future (and a hefty wadge that
> most people aren't <grin>).
> 
> Where I *can* see it being useful is where you're
> dealing with all kinds of different DTDs from a
> particular URI space, and matching becomes
> cumbersome. For example, imagine you had a project
> linked to CORBA. Everything in /objects/* was
> linked to a CORBA generator, so that /objects/<iiopID>
> retrieved the content of an object. You could
> potentially write a matcher, and put entries in the
> sitemap for each type of object. This could become
> somewhat cumbersome, particularly if you're targeting
> WAP and HTML and PDF, for example. Using the directed
> graph, you don't worry about it - just let the server
> work out the easiest way to translate the document
> into what the client wants.

I agree, but I've shown how namespace reaction works well only for
single-flavored transformations and I don't think this will be the case
in many situations. And, if it was the case, I'd rather use placeholders
for the transformation chain in the sitemap rather than forcing
namespace reaction and polluting the uri-based declarative model.

But I'm wide open to suggestions to integrate namespace reaction in the
sitemap if you find this simplifies your life in situations I can't
think of.
 
> I don't know, maybe this is a Really Bad Idea (tm),
> but I think it could be useful in some situations,
> particularly inside large application servers where
> you've got lots of people administering the system.
> If people could
> upload a stylesheet, and just specify the source
> and destination namespaces and let the server work
> out when to use it, we save ourselves a lot of
> configuration nightmares.

Yes, this is clearly the ideal scenario. I believe, on the other hand,
that the real scalability power comes from cascading sitemaps. Of
course, if namespace reaction is included in the sitemap, the cascading
capability will be inherited, so this is not a clear argument against
namespace reaction.

Like I said, suggestions for integration are welcome.

> Anyway, that's my random thought for the evening,
> hope reading this e-mail wasn't a *total* waste
> of time for those who made it ;)

Not at all :)

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------