Posted to dev@cocoon.apache.org by Daniel Fagerstrom <da...@nada.kth.se> on 2005/01/06 01:54:09 UTC

[RT] Escaping Sitemap Hell

  (was: Splitting xconf files step 2: the sitemap)

Although the Cocoon sitemap is a really cool innovation, it is not 
entirely without problems:

* Sitemaps for large webapps easily become a mess
* It sucks as a map describing the site [1]
* It doesn't give that much support for "cool URLs" [2]

In this RT I will try to analyze the situation especially with respect 
to URL space design and then move on to discuss a possible solution.

Before you enthusiastically dive into the text:

* It is a long RT, (as my RTs usually are)
* It might contain provoking and hopefully even thought provoking ideas
* No, I will not require that everything in it should be part of 2.2
* No, I don't propose that we should scrap the current sitemap; actually 
I believe that we should support it for the next few millennia ;)

                            --- o0o ---

Peter and I had some discussion:

 Peter Hunsberger wrote:

> On Tue, 04 Jan 2005 13:25:05 +0100, Daniel Fagerstrom 
> <da...@nada.kth.se> wrote: 

<snip/>

>> Anyway, sometimes when I need to refactor or add functionallity to 
>> some of our Cocoon applications, where I or colleagues of mine have 
>> written endless sitemaps, I have felt that it would have been nice if 
>> the sitemap would have been more declarative so that I could have 
>> asked it basic things like geting a list of what URLs or URL pattern 
>> it handles. Also if I have an URL in a large webapp and it doesn't 
>> work as expected it can require quite some work to trace through all 
>> involved sitemaps to see what rule that actually is used. 
>

>> Of course I understand that if I used a set of efficient conventions 
>> about how to structure my URL space and my sitemaps the problem would 
>> be much less. Problem is that I don't have found such a set of 
>> conventions yet. Of course I'm following some kind of principles, but 
>> I don't have anything that I'm completely happy with yet. Anyone 
>> having good design patterns for URL space structuring and sitemap 
>> structuring, that you want to share? 
>
>
>We have conventions that use sort of type extensions on the names: 
>patient.search, patient.list, patient.edit where the search, list,
>edit (and other) screen patterns are common across many different
>metadata sources (in this case patient). We don't do match *.edit
>directly in the sitemap (any more) but I find that if you've got to
>handle orthoganal concerns then x.y.z naming patterns can sometimes
>help.
>
OK, let's look at this in a more abstract setting:

Resource Aspects
================

In the example above we have an object, or better a _resource_: the 
patient that everything else is about. The resource should be 
identifiable in a unique way, in this case e.g. with the social security 
number.

There are a number of _operations_ that can be performed on the patient 
resource: show, edit, list, search etc. (although the search might be on 
the set of patients rather than a single one).

The resource has a _type_, patient, that might affect how we choose to 
show it etc.

There are in general other aspects that will steer how we render the 
response when someone asks for the resource:

* The _format_ of the response: html, pdf, svg, gif etc.
* The _status_ of the resource: old, draft, new etc.
* The _access_ rights of the response: public, team, member etc.

There are plenty of other possible aspect areas as well.

Cool Webapp URLs
================

I searched the web to gain some insights into URL space design. It soon 
became clear that I should re-read Tim Berners-Lee's classic, "Cool URIs 
don't change" [2]. I must say I wasn't prepared for the shock; I had 
completely missed how radical its message was when I read it the last 
time. I can also recommend reading [3], a W3C note that codifies the 
message from [2] and some other good URI practices into a set of guidelines.

So what is a URI? According to [3]:

  A URI is, actually, a _reference to a resource, with fixed and 
independent semantics_.

This means that the URI should reference a specific resource, _always_. 
Independent semantics means that a social security number is not enough; 
it should say that it is a person (from the USA) as well. See [3] for 
the philosophical details.

* The URI should be easy to type
* It should not contain too much meaning, especially not about 
implementation details

Now I will try to apply the ideas from [2] and [3] to the different 
resource aspects mentioned above. When I use words like "should" or 
"should not" without any motivation, it means that I believed the 
motivation from the gurus in the references ;) I will try to motivate my 
own ideas ;)

What I'm going to suggest might be quite far from how you design your 
URL spaces. It is certainly far from the implementation-detail-plagued 
mess that I have created in my own applications.

The Resource
------------

The idea is that a URL identifies a resource. For the patient case 
above it could be:

http://myhospital.com/person/123456789

If we use a hierarchical URI space like /person/123456789, the "parent" 
URIs, e.g. /person, should also refer to a resource. It's in most cases 
not a good idea to put a lot of topic classification effort into the URI 
hierarchy. Classifications are not unique and will change according to 
changing interests and world views.

Operations
----------

What about the operations on the resource: list, search, edit etc.? I 
find the object-oriented style in WebDAV elegant, where you use one URL 
together with different HTTP methods to perform different operations. 
Sam Ruby also has some interesting ideas about using URLs to identify 
"objects" and different SOAP messages for different methods on the 
object in his "REST+SOAP" article [4]. But neither ad hoc HTTP methods 
nor XML posts seem like good candidates for invoking operations on a 
resource in a typical webapp. So maybe something like:

/person/123456789/edit or
/person/123456789.edit or
/person/123456789?operation=edit

is a good idea.
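To make the three candidate styles concrete, here is a minimal sketch of how a dispatcher could extract a (resource, operation) pair from each of them. The function and the set of known operations are illustrative assumptions, not a Cocoon API:

```python
from urllib.parse import urlparse, parse_qs

# Illustrative set of operations; not taken from any real application.
KNOWN_OPERATIONS = {"show", "edit", "list", "search"}

def resource_and_operation(url):
    """Split a URL into (resource path, operation) for the three styles above."""
    parsed = urlparse(url)
    path = parsed.path
    # Style 3: /person/123456789?operation=edit
    query = parse_qs(parsed.query)
    if "operation" in query:
        return path, query["operation"][0]
    # Style 2: /person/123456789.edit
    base, dot, ext = path.rpartition(".")
    if dot and ext in KNOWN_OPERATIONS:
        return base, ext
    # Style 1: /person/123456789/edit
    head, slash, last = path.rpartition("/")
    if last in KNOWN_OPERATIONS:
        return head, last
    return path, "show"  # default operation when none is named
```

All three styles resolve to the same pair, which is the point: the choice between them is a surface-syntax decision, not a semantic one.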

Resource Type
-------------

Should the type of the resource be part of the URI? We probably have to 
include some type info in the URL to give it "independent semantics" 
(person, e.g.). But we should not put types that might change, like 
patient, manager, project-leader etc., in the URL. And we should 
especially avoid types that only have to do with implementation details, 
like what pipeline we want to use for rendering the resource.

Format
------

Cocoon especially shines in handling all the various file name 
extensions: .html, .wml, .pdf, .txt, .doc, .jpg, .png, .svg, etc, etc. 
But I'm sorry, if you want cool URLs you have to kiss them goodbye as well ;)

It might be a good idea to send an HTML page to a browser on a PC and a 
WML page to a PDA user. But you shouldn't require your users to remember 
different URLs for different clients; that's a task for server-driven 
content negotiation.

Using .html is not especially future proof; should all links become 
invalid when you decide to reimplement your site with dynamic SVG?
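Server-driven negotiation means the same cool URL can serve different representations depending on the client's Accept header. A minimal sketch, with a simplified q-value parser and an assumed set of available pipelines (not Cocoon code):

```python
# Map of media types to rendering pipelines; contents are illustrative.
AVAILABLE = {
    "text/html": "html-pipeline",
    "application/pdf": "pdf-pipeline",
    "image/svg+xml": "svg-pipeline",
}

def choose_pipeline(accept_header):
    """Pick the available pipeline with the highest client preference (q value)."""
    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip()
        q = 1.0  # default quality per HTTP content negotiation
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        prefs.append((q, media))
    for q, media in sorted(prefs, reverse=True):
        if media in AVAILABLE and q > 0:
            return AVAILABLE[media]
    return AVAILABLE["text/html"]  # fallback representation
```

The URL itself never changes; only the negotiated representation does, so switching from .html to dynamic SVG breaks no links.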

Often it is good to provide the user with a nice printable version of 
your page. But why should you advertise Adobe's products in your URLs? A 
few years ago it was .ps or .dvi from academic sites and .doc from 
commercial sites. Right now it happens to be .pdf, but will that be forever?

Same thing with images: the user doesn't care about the format as long 
as it can be shown in the browser (content negotiation), and neither 
should you make your content links (or Google's image search) depend on 
a particular compression scheme that happens to be popular right now.

There are of course cases where you really want to give your user the 
ability to choose a specific format. Then a file name extension is a 
good idea. If you happen to maintain 
http://www.adobe.com/products/acrobat/ it's OK to put some .pdf there, e.g. ;)

But in most cases file name extensions are an implementation detail that 
is not relevant for your users.

Status
------

The status will by definition change, and that would make your URL 
uncool if the status was part of the URL.

Access Rights
-------------

Access rights will often change for a document. I know it is easy to 
write path-dependent rules for access rights in most webserver 
configuration files. But you expose irrelevant implementation details, 
and it's not future proof.

Am I Really Serious?
--------------------

Why should a webapp URL be cool and future proof? Well, it's the 
interface to your webapp. We agree that we shouldn't change interfaces 
in Cocoon at a whim; why should we treat the users of our webapps 
differently? And like it or not, useful software sometimes lives for 
decades. If you build useful webapps you should consider planning ahead.

Currently we are all used to webapps that use the most horrible URLs, 
containing tons of implementation details and changing every now and 
then. But it is not a law of nature that it must be like that. It is 
mainly a result of webapp development still being immature and the tools 
being far from perfect. Of course the user should be able to bookmark a 
useful form or wizard.

Also I believe that exposing implementation details in ones URLs is at 
least as bad as making all member variables public in Java classes. It 
makes your webapp monolithic and fragile.

                            --- o0o ---

You might find the views expressed above rather extreme and maybe 
impractical. As indicated above they are also far away from what I 
currently do in my webapps. But I have for quite some time thought about 
how to fight the all too easily increasing entropy in the webapps we 
develop. I have suspected that badly designed URL spaces have been part 
of the trouble. And when I re-read Tim BL's classic I suddenly realized 
that the habit of exposing implementation in the URLs might be at the 
root of the evil.

Whether this realization will survive contact with your comments and 
other parts of reality is of course too early to tell ;)


Does Cocoon Support Cool URLs?
==============================

But how does Cocoon support the above ideas about URL space design?

Well, in some way one could say that it supports them. The sitemap is so 
powerful that you can program most usage patterns in it in some more or 
less elegant way. But AFAICS, writing webapps following the URL space 
design ideas above would be rather tricky. So I would say that Cocoon 
doesn't support them that well. The main reasons are:

* The sitemap is not that useful as a site map
* The sitemap gives excellent support for choosing the resource 
production implementation based on the implementation details coded into 
the URL, but not for avoiding it
* The sitemap mixes site map concerns with resource production 
implementation details

Is it a Map of the Site?
------------------------

The Forrest people don't think that the sitemap is enough as a map of 
the site. They have a special linkmap [1] that gives a map of the site 
and that is used for internal navigation and for creating menu trees. I 
have a similar view. From the sitemap it can be hard to answer basic 
questions like:

* What is the URL space of the application?
* What is the structure of the URL space?
* How is the resource referred to by this URL produced?

The basic view of a URL in the sitemap is that it is just any string. 
Even if there are constructions like mount, the URL space is not 
considered hierarchical. That means that the URLs can be presented as 
patterns in any order in the sitemap, and you have to read through all 
of it to see if there is a rule for a certain URL.

A real map of the site should be tree structured, like the linkmap in 
Forrest. Take a look at the example in [1] (I don't suggest using the 
"free form" XML; something stricter is required). Such a tree model will 
also help in planning the URI space as it gives a good overview of it.

The Forrest linkmap has no notion of wildcards, which is a must in 
Cocoon. I will return to that below.

Choosing Production Pipeline
----------------------------

With the sitemap it is very easy to choose the pipeline used for 
producing the response based on a URL pattern "*.x.y.z". That more or 
less forces the user to encode implementation details, i.e. what 
pipeline to use, into the URL. This is only a problem for wildcard 
patterns; otherwise we just associate the pipeline with the concrete 
"cool URL".

Earlier I suggested that aspects like type, format, status, access 
rights etc. shouldn't be part of the URL, as those aspects might change 
for the resource. OTOH these aspects certainly are necessary for 
choosing the rendering pipeline, so what should we do?

The requested resource will often be based on some content, or 
combination of content, that we can access from Cocoon. The content can 
be a file, data in a DB, the result from a business object etc. Let us 
assume that it resides in some kind of content repository. Now if we 
think about it, isn't it more natural to ask the content that we are 
going to use about its properties, like type, format, status, access 
rights etc., than to encode them in the URL? These properties can be 
encoded in the file name, in metadata in some property file, within the 
file, in a DB etc. Now instead of having the rule:

*.x.y.z ==> XYZPipeline

we have

* where repository:{1} has properties {x, y, z} ==> XYZPipeline

or

* where repository:{1}.x.y.z exists ==> XYZPipeline

We get the right pipeline by querying the repository instead of encoding 
it in the URL. A further advantage is that the rule becomes "listable", 
as the "where" clause in the match expresses what the allowed values for 
the wildcard are.
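A minimal sketch of such a "property aware" rule, with an in-memory dict standing in for the content repository. The repository layout, property names, and rule table are assumptions for illustration only:

```python
# Toy content repository: content name -> set of properties.
REPOSITORY = {
    "report-2004": {"x", "y", "z"},
    "draft-note": {"x"},
}

# (required properties, pipeline) pairs, checked in order like sitemap rules.
RULES = [
    ({"x", "y", "z"}, "XYZPipeline"),
    ({"x"}, "XPipeline"),
]

def match(url):
    """'*' matches any name; the where-clause queries the repository."""
    name = url.lstrip("/")
    properties = REPOSITORY.get(name)
    if properties is None:
        return None  # no such content: no rule applies
    for required, pipeline in RULES:
        if required <= properties:  # where repository:{1} has these properties
            return pipeline
    return None
```

Because the rule ranges over repository contents rather than arbitrary strings, listing the URLs a rule handles reduces to listing the repository entries that satisfy its where-clause.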

Separating the Concerns
-----------------------

The sitemap handles two concerns: it maps a URL to a pipeline that 
produces a response, and it describes how to put together this pipeline 
from sitemap components. The first concern is related to site design and 
the second is more a form of programming. Putting them together makes it 
hard to see the URL structure and also makes it tempting to group URLs 
based on common pipeline implementation instead of on site structure.

Virtual Pipeline Components (VPCs) give us a way out of this. Large 
parts of our sites might be buildable with pipelines already 
constructed in some standard blocks.

I would propose to go even further: in the "real" site map it should 
only be allowed to call VPC pipelines; no pipeline construction is 
allowed, that should be done in the component area.

In the "real" site map the current context is set and the arguments to 
the called VPC are given.


Search Order
------------

>  The problem for us, is as you allude to at the start of this
>thread: Cocoon takes the first match, where what you really want is a
>more XSLT "best match" type of handling; sometimes *.a, *.b, *.c works
>and other times it's m.*, n.*, o.*...
>
>In the past that has lead me to suggest a sort of XSLT flow, but
>thinking about it in this light I wonder if what I really want is just
>XSLT sitemap matching (same thing in the end)...
>  
>
I also believe that a "best match" type of handling is preferable: it 
increases usability IMO, and it also makes it possible to use tree-based 
matching algorithms that are far more efficient than the current linear 
search.
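The difference between the two policies can be sketched in a few lines. Among all patterns that match, "best match" prefers the most specific one, roughly as XSLT prefers more specific templates; the specificity measure here (count of non-wildcard characters) is my own assumption, not a worked-out algorithm:

```python
import fnmatch

# Illustrative rule table; order matters only for first_match.
RULES = [("*.a", "pipeA"), ("special.*", "pipeSpecial"), ("*", "fallback")]

def first_match(url):
    """Current sitemap policy: the first matching rule wins."""
    for pattern, pipeline in RULES:
        if fnmatch.fnmatch(url, pattern):
            return pipeline
    return None

def best_match(url):
    """Among all matching rules, the most specific pattern wins."""
    candidates = [(pattern, pipeline) for pattern, pipeline in RULES
                  if fnmatch.fnmatch(url, pattern)]
    if not candidates:
        return None
    # specificity = number of literal (non-wildcard) characters in the pattern
    return max(candidates, key=lambda c: len(c[0].replace("*", "")))[1]
```

With best match, rule order stops mattering, which is what makes tree-based indexing of the rules possible.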

The new sitemap
===============

To sum up the proposal:

Pipelines:
* Pipeline construction is only done as VPCs in component areas (often 
in blocks).

Sitemap:
* The sitemap follows the tree structure of the URL space (like the 
Forrest linkmap).
* Its responsibility is to map URLs to VPCs
* It can set the current context for each level in the tree (for 
dereferencing relative paths used in the VPC)
* Wildcards can have restrictions based on properties in the content 
repository
* It's best-match based rather than rule-order based
* Of course we have an include construct so that we can reuse sub-sites

It might look like:

<sitemap>
  <path match="person" context="adm/persons" 
pipeline="block:skin:default(search.xml)">
    <path match="*:patient" test="mydb:/patients/{patient} exists" 
context="adm/patients" pipeline="journal-summary({patient})">
      <path match="edit" pipeline="edit({patient})"/>
      <path match="list" pipeline="list({patient})"/>
      <!-- and so on -->
    </path>
  </path>
</sitemap>

Don't care about the syntactical details in the example; it needs much 
more thought, I just wanted to make it a little bit more concrete. The 
path separator "/" is implicitly assumed between the levels. "*:patient" 
means that the content of "*" can be referred to as "patient".
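To show how such a tree sitemap would be evaluated, here is a toy resolver: each level consumes one path segment, "*:name" binds the segment to a named wildcard, and the deepest matched node supplies the pipeline. The dict layout mirrors the XML example above but is otherwise an illustrative assumption:

```python
# Tree mirroring the <sitemap> example; repository tests and contexts omitted.
SITEMAP = {"match": "person", "pipeline": "search", "children": [
    {"match": "*:patient", "pipeline": "journal-summary({patient})",
     "children": [
         {"match": "edit", "pipeline": "edit({patient})", "children": []},
         {"match": "list", "pipeline": "list({patient})", "children": []},
     ]},
]}

def resolve(url):
    """Walk the tree one path segment per level; exact matches beat wildcards."""
    segments = url.strip("/").split("/")
    node, bindings, pipeline = {"children": [SITEMAP]}, {}, None
    for segment in segments:
        # try exact matches before "*:name" wildcards (a simple best-match rule)
        for child in sorted(node["children"],
                            key=lambda c: c["match"].startswith("*:")):
            if child["match"] == segment:
                node, pipeline = child, child["pipeline"]
                break
            if child["match"].startswith("*:"):
                bindings[child["match"][2:]] = segment
                node, pipeline = child, child["pipeline"]
                break
        else:
            return None  # no rule for this URL
    return pipeline.format(**bindings)
```

Note that the tree shape makes the earlier questions answerable by simple traversal: enumerating the URL space is a walk over the nodes, and tracing a URL is a single root-to-leaf descent.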

Much of what I propose can be achieved with VPCs and a new "property 
aware" matcher. But IMO the stricter SoC above, the ability to "query" 
the sitemap and the possible advantages of the "best match" search are 
reasons enough to go further.

WDYT?

/Daniel

[1] "site.xml" http://forrest.apache.org/docs/dev/linking.html
[2] "Cool URIs don't change", http://www.w3.org/Provider/Style/URI.html
[3] "Common HTTP Implementation problems" 
http://www.w3.org/TR/2003/NOTE-chips-20030128/
[4] "REST + SOAP" 
http://www.intertwingly.net/stories/2002/07/20/restSoap.html


Re: [RT] Escaping Sitemap Hell

Posted by Niclas Hedhman <ni...@hedhman.org>.
On Thursday 06 January 2005 19:37, Daniel Fagerstrom wrote:

> >As a sidenote; Tapestry is battling with these types of problems as well
> > (but a somewhat different level), and handles it by allowing an
> > interceptor in the URL encoding/decoding phase.
>
> Do you have any references that sumarize that?

Summarize is perhaps not the word, but here is a complete guide to how it is done:

http://wiki.apache.org/jakarta-tapestry/FriendlyUrls


Cheers
Niclas
-- 
---------------
All those who believe in psychokinesis, raise my hand.
 -  Steven Wright

+---------//-------------------+
|   http://www.dpml.net        |
|  http://niclas.hedhman.org   |
+------//----------------------+


Re: [RT] Escaping Sitemap Hell

Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Niclas Hedhman wrote:

>On Thursday 06 January 2005 08:54, Daniel Fagerstrom wrote:
>
>Good post !
>
Thanks :)

>>But you shouldn't require your user to remember
>>different URLs for different clients, thats a task for server driven
>>content negotiation.
>>
>>Using .html is not especially future proof, should all links become
>>invalid when you decide to reimplement your site with dynamic SVG?
>>    
>>
>
>I have thought about this a million times over the last 5 years, and first 
>concluded "yeah, let's do that", and then back tracked since the file system 
>is not 'negotiating' and having the same stuff working locally is always a 
>big plus.
>
Are you referring to the Forrest kind of situation, where you can 
generate a static "site" on your hard disk and access it directly?

Even in such situations you could, at least in principle, do format 
negotiation by letting the "cool" file URL point to an HTML page that 
has alternate media type links, see 
http://www.w3.org/TR/REC-html40/struct/links.html#h-12.3 and 
http://www.w3.org/TR/REC-html40/types.html#type-media-descriptors. The 
media type links in turn point to the screen, tty, projection, 
handheld, print etc. pages that the browser can use, depending on its 
preferences. The alternate media pages could in turn have a bookmark 
link that points to the cool URL. I don't know if this works well in 
practice with common browsers.

As I'm rather fed up with the URL space hell that I have created in my 
own applications, I chose to take a fundamentalist view concerning the 
"fixed and independent semantics" of URLs to see where it leads. And as 
you could see in the second half of my post, it is not only some file 
systems that have limited support for cool URLs; AFAIK Cocoon's support 
is rather limited as well.

Now in practice I think that URL space design has similarities with API 
design. For some externally used interfaces and URLs, the cost of change 
is very high as many users depend on them; for internal interfaces and 
URLs the cost of change is much lower, so you don't need to care that 
much about the design. But exposing lots of internal implementation 
details in your APIs and URL spaces is often asking for trouble.

>>But it is not a law of nature that it must be like that. It is
>>mainly a result of webapp development still being immature and 
>>the tools being far from perfect. Of course the user should be 
>>able to bookmark a useful form or wizard.
>>    
>>
>
>Now you have the interesting thing of 'temporary URLs' used for session 
>sensitivity. How often doesn't one bookmark a page and later coming back 
>"Session has expired" type of resource not found??
>Would be real cool if the web app system could help dramatically in this 
>field.
>
The W3C note I referred to has some guidelines: 
http://www.w3.org/TR/2003/NOTE-chips-20030128/#gl3.

My view is that we should strive to make it as easy as possible to 
follow good web practices (like those described in 
http://www.w3.org/TR/2003/NOTE-chips-20030128/) when using Cocoon.

>As a sidenote; Tapestry is battling with these types of problems as well (but 
>a somewhat different level), and handles it by allowing an interceptor in the 
>URL encoding/decoding phase.
>
Do you have any references that summarize that?

> The internal URL space remains, and a 
>user-created component translates URLs back and forth between the public 
>space and the internal one.
>
That is similar to what I propose. The new tree-structured sitemap 
translates public URLs to an internal space of (virtual) pipeline calls. 
Due to Cocoon's dynamic nature the internal "URL space" is more abstract 
than file reference URLs, but we can (and probably should) make it a 
URL space anyway.

The XFrames URIs http://www.w3.org/TR/xframes/ can serve as inspiration:

  http://example.org/home.frm#frames(id1=uri1,id2=uri2,...)

We can have:

  block:myblock#bar-chart-view(data=mydata.xml)

Where relative URLs ("mydata.xml" in this case) are relative to the 
current context (actually it is a little bit more complicated, see 
http://marc.theaimsgroup.com/?t=110064560900003&r=1&w=2). The current 
context can be set at each level in the tree sitemap.

Thanks to the declarative nature of my proposed tree sitemap, and the 
fact that the wildcards are "typed" so that their range can be found, 
the sitemap can be inverted into a mapping from internal URLs to 
external ones.
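A small sketch of that inversion idea: because each external pattern maps to a typed internal pipeline call, the same table can be read in both directions. The table contents and the {name} placeholder syntax are illustrative assumptions:

```python
import re

# One (external pattern, internal pipeline call) pair; purely illustrative.
TABLE = [
    ("/person/{patient}/edit", "block:myblock#edit(patient={patient})"),
]

def _bindings(template, string):
    """Match `string` against a template with {name} placeholders."""
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>[^/)]+)", re.escape(template))
    m = re.fullmatch(regex, string)
    return m.groupdict() if m else None

def to_internal(url):
    """Public URL -> internal pipeline call."""
    for external, internal in TABLE:
        found = _bindings(external, url)
        if found is not None:
            return internal.format(**found)
    return None

def to_external(call):
    """Internal pipeline call -> public URL (the inverted sitemap)."""
    for external, internal in TABLE:
        found = _bindings(internal, call)
        if found is not None:
            return external.format(**found)
    return None
```

The inversion only works because every wildcard is named (typed), so a binding recovered on one side can be substituted on the other.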


/Daniel



Re: [RT] Escaping Sitemap Hell

Posted by Niclas Hedhman <ni...@hedhman.org>.
On Thursday 06 January 2005 08:54, Daniel Fagerstrom wrote:

Good post !

> But you shouldn't require your user to remember
> different URLs for different clients, thats a task for server driven
> content negotiation.
>
> Using .html is not especially future proof, should all links become
> invalid when you decide to reimplement your site with dynamic SVG?

I have thought about this a million times over the last 5 years, and first 
concluded "yeah, let's do that", and then back tracked since the file system 
is not 'negotiating' and having the same stuff working locally is always a 
big plus.

> But it is not a law of nature that it must be like that. It is
> mainly a result of webapp development still being immature and 
> the tools being far from perfect. Of course the user should be 
> able to bookmark a useful form or wizard.

Now you have the interesting thing of 'temporary URLs' used for session 
sensitivity. How often doesn't one bookmark a page and later coming back 
"Session has expired" type of resource not found??
Would be real cool if the web app system could help dramatically in this 
field.

As a sidenote; Tapestry is battling with these types of problems as well (but 
a somewhat different level), and handles it by allowing an interceptor in the 
URL encoding/decoding phase. The internal URL space remains, and a 
user-created component translates URLs back and forth between the public 
space and the internal one.


Cheers
Niclas
-- 
---------------
All those who believe in psychokinesis, raise my hand.
 -  Steven Wright

+---------//-------------------+
|   http://www.dpml.net        |
|  http://niclas.hedhman.org   |
+------//----------------------+


Re: [RT] Escaping Sitemap Hell

Posted by Peter Hunsberger <pe...@gmail.com>.
On Thu, 06 Jan 2005 01:54:09 +0100, Daniel Fagerstrom
<da...@nada.kth.se> wrote:
>  (was: Splitting xconf files step 2: the sitemap)
> 

Interesting thoughts. I didn't get much sleep last night, so excuse me
if any of the following comes across as a bit grumpy; no flames or
criticism are intended...

(Excuse the random un-annotated snips, don't have much time today.)

> Although the Cocoon sitemap is a really cool innovation it is not
> entierly without problems:
> 
> * Sitemaps for large webapps easy becomes a mess
> * It sucks as a map describing the site [1]
> * It doesn't give that much support for "cool URLs" [2]

Except for the issue of matching which you get into later, I'm not
sure that I see a big problem here:

- sitemaps for large webapps only become a mess if you design the URL
space badly.  Better support for matching could help, and that issue
might be tied to how the sitemap is managed within Cocoon, but my gut
feel says it's just a matcher issue;

- a Cocoon sitemap is not a Website site map, nor should it be.  Maybe
the name "sitemap" isn't the best? I think having the ability to
produce a "URL Map" from something that crawls the sitemap is maybe
part of the answer here.

- I don't think Cool URLs are supported or hindered by the
sitemap? Maybe more on that below.

<snip/>

> The Resource
> ------------
> 
> The idea is that an URL identifies a resource. For the patient case
> above it could be:
> 
> http://myhospital.com/person/123456789
> 
> If we use a hierarchial URI space like /person/123456789, the "parent"
> URIs e.g. /person should also refer to a resource. Its in most cases not
> a good idea to put a lot of topics classification effort in the URI
> hierarchy. Classifications are not unique and will change according to
> changing interests and world view.

I tend to map my URL hierarchy to an abstract object hierarchy.  In
this case patient can be thought of as an object.  The patient with
the id 123456789 is an instance, and as such that identifier should not
be part of the URL hierarchy.  Instead, object attributes become
request parameters. This may seem like a quibble but I think it's
important as one tries to figure out good rules for hierarchy mapping.

> 
> Operations
> ----------
> 
> What about the operations on the resource: list, search, edit etc? I
> find the object oriented style in WebDAV elegant where you use one URL
> together with different HTTP methods to perform different operations.
>
> Sam Ruby also have some intersting ideas about using URLs to identify
> "objects" and different SOAP messages for different methods on the
> object in his "REST+SOAP" article [4]. But neither adhoc HTTP methods or
> XML posts seem like good candidates for invoking operations on a
> resource in a typical webapp. So maybe something like:
> 
> /person/123456789/edit or
> /person/123456789.edit or
> /person/123456789?operation=edit

Ok, next rule: operations map to methods, not to hierarchy or to
attributes.  As such, that's why I introduce the "." notation; to
separate the concerns.  Thus, so far the general rule is:

location_hierarchy/../resource.operation?attributes

Your mileage may vary, but it works for us...
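The general rule above can be sketched as a small parser. This is purely illustrative, not how their application actually implements it; the default operation is my assumption:

```python
from urllib.parse import urlparse, parse_qs

def parse_convention(url):
    """Split location_hierarchy/.../resource.operation?attributes into parts."""
    parsed = urlparse(url)
    *hierarchy, last = parsed.path.strip("/").split("/")
    resource, _, operation = last.partition(".")
    return {
        "hierarchy": hierarchy,
        "resource": resource,
        "operation": operation or "show",  # assumed default operation
        "attributes": parse_qs(parsed.query),
    }
```

For example, "x/y/z/patient.edit?id=12345" splits into hierarchy x/y/z, resource patient, operation edit, and the id attribute as a request parameter.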

> 
> Resource Type
> -------------
> 
> Should the type of the resource be part of the URI? We probably have to
> contain some type info in the URL to give it "independent sematics"
> (person e.g.). But we should not put types that might change like
> patient, manager, project-leader etc in the URL. And we should
> especially avoid types that only have to do with implementation details
> like what pipeline we want to use for rendering the resource.

Keep the objects abstract :-)...

> 
> Format
> ------
> 
> Cocoon especially shines in handling all the various file name
> extensions: .html, .wml, .pdf, .txt, .doc, .jpg, .png, .svg, etc, etc.
> But I'm sorry, if you want cool URLs you have to kiss them godbye as well ;)

<snip/>
 
> But in most cases file name extensions is an implementation detail that
> not is relevant for your users.

Makes sense; in our case much of the time operations and format are
interchangeable and there may be defaults.  Thus:

   x/y/z/patient?id=12345

might be the same as:

  x/y/z/patient.html?12345

Don't know if it helps, but you can think of format as an object
method to get a specific output format...

<snip/>

> Does Cocoon Support Cool URLs?
> ==============================
> 
> But how does Cocoon support the above ideas about URL space design?
> 
> Well, in some way one could say that it supports it. The sitemap is so
> powerfull that you can program most usage patterns in it in some more or
> less elegant way. But AFAICS, writing webapps following the URL space
> design ideas above would be rather tricky. So I would say that Cocoon
> doesn't support it that well. The main reasons are:
> 
> * The sitemap is not that usefull as a site map
> * The sitemap gives excelent support for choosing resource production
> implementation based on the implementation details coded into the URL,
> but not for avoiding it
> * The sitemap mixes site map concerns with resource production
> implementation details

I think this all goes back to my initial comments: need a "URL Map"
producer and better matching?

<snip/>

> Before I suggested that aspects like: type, format, status, access
> rights etc shouldn't be part of the URL as those aspects might change
> for the resource. OTH these aspects certainly are necessary for choosing
> rendering pipeline, what should we do?

This is where the difference between resource publishing and browser-
based application support starts to matter.  It seems to me that maybe
you are coming at this more from an application support perspective
than a resource publishing perspective? If so, I'll note that you want
to separate the issue of layout from the choice of pipeline.  In
particular, in an application it's better to have the URL behave in a
friendly way than to stay "cool" at all times.  For example:

x/y/z/patient?id=123*

might produce a search layout if no match was found, a list layout if
multiple matches were found, or an edit layout if a single match was
found.  All of these might be considered different renderings of a
single "resource".  One might (perhaps rightfully) think that I'm
stretching TBL's definition of resource, but consider that if there
was no application behind the scenes (and thus the data never changed)
then this URL would turn "cool" since it would always produce the same
output.

<snip/>

> 
> The new sitemap
> ===========
> 
> To sum up the proposal:
> 
> Pipelines:
> * Pipeline construction is only done as VPCs in component areas (often
> in blocks).

As has been pointed out, flow also gives you a way to separate these concerns...

> Sitemap:
> * The sitemap is folow the tree structure of the URL space (like the
> Forrest linkmap).
> * Its responsibillity is to map URLs to VPCs
> * It can set the current context for each level in the tree (for
> derefering relative paths used in the VPC)
> * Wildcards can have restrictions based on properties in the content
> repository
> * Its best match based rather than rule order based
> * Of course we have an include construct so that we can reuse sub sites
> 
> It might look like:
> 
> <sitemap>
>  <path match="person" context="adm/persons"
> pipeline="block:skin:default(search.xml)">
>    <path match="*:patient" test="mydb:/patients/{patient} exists"
> context="adm/patients" pipeline="journal-summary({patient})">
>      <path match="edit" pipeline="edit({patient})"/>
>      <path match="list" pipeline="list({patient})"/>
>      <!-- and so on -->
>    </path>
>  </path>
> </sitemap>

I think you're reinventing flow in XML that isn't even XSLT.  I'd
prefer an XSLT based sitemap :-)
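For what it's worth, an XSLT-based sitemap of the kind Peter hints at
might look something like the following sketch.  The element names and
the idea of feeding the request path in as XML are invented for
illustration; this is not an actual Cocoon proposal:

```xml
<!-- Hypothetical: the request path, split into steps, arrives as XML
     (e.g. <path><step>person</step><step>123</step></path>) and XSLT
     templates play the role of matchers; template priority rules give
     a form of "best match" for free. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="path[step[1] = 'person']">
    <pipeline name="search" src="block:skin:default(search.xml)"/>
  </xsl:template>
  <!-- lower-priority fallback when no more specific template matches -->
  <xsl:template match="path">
    <pipeline name="not-found"/>
  </xsl:template>
</xsl:stylesheet>
```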

<snip>

> 
> Much of what I propose can be achieved with VPCs and a new "property
> aware" matcher. But IMO the stricter SoC above, the ability to "query"
> the sitemap, the possible advantages of the "best match" search, are
> reasons enough to go further.

Better matching would help, but again flow can already do most of what
you describe.  The ability to query the sitemap to get a link map/url
map might be the most important thing.

I'll likely be offline for the next couple of days, small odds of
being able to contribute further to this thread.  Good luck :-)

-- 
Peter Hunsberger

Re: [RT] Escaping Sitemap Hell

Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 12, 2005, at 3:20 PM, Peter Hunsberger wrote:

> On Wed, 12 Jan 2005 13:59:41 -0600, Glen Ezkovich <gl...@hard-bop.com> 
> wrote:
>>
>>
>> I'm just curious as to what type of information is contained in your URLs
>> that enables continuation and what the applications do first when a
>> user continues.
>
> It varies, but basically:
>
>    location/screen.type?parameters
>
> where parameters are request parameters that vary by screen and type
> and qualify the instance of the screen.  We sometimes add a third
> qualifier to type but it's not involved in session recovery that I
> know of.
>
> As mentioned earlier, this essentially maps to:
>
>   package.object.method( parameters )
>
> on the front end, but that's not 100%, flow script can mangle the
> mapping completely in some cases and even in the generic case there's
> a lookup substitution.

Thanks. I should have gone back and reread the thread. Sorry for the 
inconvenience.

Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com



A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to 
worry about answers."
- Thomas Pynchon Gravity's Rainbow


Re: [RT] Escaping Sitemap Hell

Posted by Peter Hunsberger <pe...@gmail.com>.
On Wed, 12 Jan 2005 13:59:41 -0600, Glen Ezkovich <gl...@hard-bop.com> wrote:
> 
> 
> I'm just curious as to what type of information is contained in your URLs
> that enables continuation and what the applications do first when a
> user continues.

It varies, but basically: 

   location/screen.type?parameters 

where parameters are request parameters that vary by screen and type
and qualify the instance of the screen.  We sometimes add a third
qualifier to type but it's not involved in session recovery that I
know of.

As mentioned earlier, this essentially maps to:

  package.object.method( parameters )

on the front end, but that's not 100%: flowscript can mangle the
mapping completely in some cases, and even in the generic case there's
a lookup substitution.
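As a sketch, the mapping Peter describes could be parsed roughly like
this (the URL shape is taken from his description; the function name
and field names are invented):

```javascript
// Hypothetical parser for URLs of the form location/screen.type?parameters,
// yielding the package.object.method(parameters) call they map to.
function toCall(url) {
  var q = url.split("?");
  var params = q[1] || "";
  var parts = q[0].split("/");             // e.g. ["adm", "patient.edit"]
  var screenType = parts.pop().split("."); // ["patient", "edit"]
  return {
    pkg: parts.join("."),    // remaining path segments become the package
    object: screenType[0],   // screen -> object
    method: screenType[1],   // type -> method
    params: params           // request parameters, passed through as-is
  };
}
```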

-- 
Peter Hunsberger

Re: [RT] Escaping Sitemap Hell

Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 12, 2005, at 12:41 PM, Peter Hunsberger wrote:

>
>> At a single session level I think you can keep this information 
>> private
>> by using forms. Users will not see the query that contains the data 
>> you
>> are tracking. Sure the user can't come back after closing their 
>> browser
>> and pick up where they left off, but you really didn't design your
>> application for this.
>
> Sure, if you don't need any adoption/resumption of the state across
> browsers then form variables work fine. However, we _do_ design our
> applications for resumption upon resumed sessions.

I did realize that you do design for resumption. Hence the use of 
anchors instead of forms.

>  In particular, we
> do timed out session recovery/resumption after reauthentication

I'm just curious as to what type of information is contained your URLs 
that enables continuation and what the applications do first when a 
user continues.

Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com



A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to 
worry about answers."
- Thomas Pynchon Gravity's Rainbow


Re: [RT] Escaping Sitemap Hell

Posted by Peter Hunsberger <pe...@gmail.com>.
On Wed, 12 Jan 2005 12:04:38 -0600, Glen Ezkovich <gl...@hard-bop.com> wrote:

> 
> On Jan 12, 2005, at 9:17 AM, Peter Hunsberger wrote:
> 
> > On Sat, 8 Jan 2005 18:48:29 -0600, Glen Ezkovich <gl...@hard-bop.com>
> > wrote:
> >
> > <snip>stuff everyone seems to agree on</snip>
> >
> >>> You've got to allow for variations on
> >>> authorizations, error handling, timeouts, resumed sessions, etc.
> >>
> >> These do not have to be public URLs. All of these things are internal
> >> to the application. If you are providing your users with URLs for
> >> these
> >> things you should ask yourself if there is a better way to handle
> >> this.
> >
> > The problem is that apps often need to track some kind of state.  Any
> > one of the above can affect state.  In some cases you can't rely on
> > session (or cookies) and URL becomes the easiest fall back.
> 
> Easiest, not necessarily the best. As always it depends on your
> resources and use cases as to which solution is best. An application
> designed to handle authorizations, time outs and/or resumed sessions
> probably should not depend on URLs to accomplish these tasks.

Certainly not.  However, URL's can still be a convenient, even good,
way to encode state, in particular if the state representation is
valid across multiple other concerns.

> At a single session level I think you can keep this information private
> by using forms. Users will not see the query that contains the data you
> are tracking. Sure the user can't come back after closing their browser
> and pick up where they left off, but you really didn't design your
> application for this.

Sure, if you don't need any adoption/resumption of the state across
browsers then form variables work fine. However, we _do_ design our
applications for resumption upon resumed sessions.  In particular, we
do timed-out session recovery/resumption after reauthentication.

<snip/>

> >
> > I think there are some overall patterns that can be useful for
> > probably 80% of the use cases. Stefano seems to believe it's not worth
> > trying to build anything directly into Cocoon to handle these.
> 
> I think his argument is stronger than that. If you implement a sitemap
> structure that maps perfectly to 80% of the use cases, there is a good
> chance that the other 20% would be impossible to achieve using that
> structure or at the very least require considerable hacking.

Yes: but that's exactly the current situation.  The reality of the Web
is that it's a graph.  Graph theory tells us that there's no
algorithm for generalized graph traversal that is guaranteed to
finish in finite time for all graphs.  So no single solution can work
for all Cocoon apps.  So far Cocoon has at least three ways to
attack the problem of customized graph traversal:

1) the sitemap with the default matchers;

2) pluggable matchers;

3) flow.

There seems to be very little exploration of 2); as has been kicked
around in this thread, at least part of the solution may lie in some
new types of matchers.

For some reason some people seem to want to avoid 3), even with Java
flow (as opposed to flow script).  The other issue here is that 3)
currently makes it hard to get back a URL map, so there are problems
with debugging (and arguably potential security exposures).

What I'd like is something new that would attack 2) and 3)
simultaneously: an XSLT version of the sitemap.  However, as I said,
I've argued for this before and I still don't think I can make a
convincing case that it would solve anyone's problems other than my own...

<snip/>

> Matching is different from restructuring the sitemap. I don't believe
> Stefano would have a problem with a TreeMatcher. But I could be wrong.
> :-0

It's not clear that a tree matcher can be implemented with the current
sitemap: either you match or you don't, there's no way of "voting" on
how well you match.  I think to get a true TreeMatcher you'd also need
to change some parts of the Cocoon internals (to evaluate what a best
match means or to track past match context). IOW, you'd need something
other than the current sitemap (even if the syntax remains backwards
compatible)...
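To illustrate what "voting" on match quality could mean, here is a
sketch (not existing Cocoon code; the scoring scheme is invented) of a
best-match selection where each pattern reports a specificity score
instead of a plain yes/no:

```javascript
// Hypothetical best-match selection: literal path segments outvote
// wildcard segments, and the highest total wins, instead of the first
// rule in document order as in the current sitemap.
function score(pattern, path) {
  var p = pattern.split("/"), u = path.split("/");
  if (p.length !== u.length) return -1;  // different depth: no match
  var s = 0;
  for (var i = 0; i < p.length; i++) {
    if (p[i] === u[i]) s += 2;           // literal segment: strong vote
    else if (p[i] === "*") s += 1;       // wildcard segment: weak vote
    else return -1;                      // mismatch: disqualified
  }
  return s;
}

function bestMatch(patterns, path) {
  var best = null, bestScore = -1;
  for (var i = 0; i < patterns.length; i++) {
    var s = score(patterns[i], path);
    if (s > bestScore) { bestScore = s; best = patterns[i]; }
  }
  return best;
}
```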

-- 
Peter Hunsberger

Re: [RT] Escaping Sitemap Hell

Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 12, 2005, at 9:17 AM, Peter Hunsberger wrote:

> On Sat, 8 Jan 2005 18:48:29 -0600, Glen Ezkovich <gl...@hard-bop.com> 
> wrote:
>
> <snip>stuff everyone seems to agree on</snip>
>
>>> You've got to allow for variations on
>>> authorizations, error handling, timeouts, resumed sessions, etc.
>>
>> These do not have to be public URLs. All of these things are internal
>> to the application. If you are providing your users with URLs for 
>> these
>> things you should ask yourself if there is a better way to handle 
>> this.
>
> The problem is that apps often need to track some kind of state.  Any
> one of the above can affect state.  In some cases you can't rely on
> session (or cookies) and URL becomes the easiest fall back.

Easiest, not necessarily the best. As always it depends on your 
resources and use cases as to which solution is best. An application 
designed to handle authorizations, time outs and/or resumed sessions 
probably should not depend on URLs to accomplish these tasks.

At a single session level I think you can keep this information private 
by using forms. Users will not see the query that contains the data you 
are tracking. Sure the user can't come back after closing their browser 
and pick up where they left off, but you really didn't design your 
application for this.

> I've done
> some strange hacks to work around this in my life, but a structured
> URL can definitely make life easier.

Always. I don't think anyone will argue against this. :-))

>
>>>> As far as dogmatism and URL structure goes, you can always be 
>>>> dogmatic
>>>> in the way you structure them. ;-)  The problem with dogmatism is 
>>>> that
>>>> it does not always lead to the best solution for a given case. Then
>>>> again sometimes it does.
>>>
>>> And that's this issue Daniel's worried about: what's the best
>>> solution.  (However, I'm not completely sure for what problem space.)
>>
>> As always, it depends on the problem space. ;-) There is not one best
>> way and that's why implementing the sitemap according to the "best way"
>> to design your URL space is probably a bad idea. I do find the idea of
>> a tree structured sitemap appealing (even though it works nicely with
>> my "best way" :-O).
>
> I think there are some overall patterns that can be useful for
> probably 80% of the use cases. Stefano seems to believe it's not worth
> trying to build anything directly into Cocoon to handle these.

I think his argument is stronger than that. If you implement a sitemap 
structure that maps perfectly to 80% of the use cases, there is a good 
chance that the other 20% would be impossible to achieve using that 
structure or at the very least require considerable hacking.

> I'm
> not sure I can codify any rational  proposal well enough to dispel
> this belief so I'll probably drop this for now.
>
> Tree structured URL matching makes sense to me and doing it using XSLT
> makes even more sense.  An XSLT flow handler is something I've been
> pushing for for years now, but for the moment Javascript flow and
> resolving URLs through our internal rules based matcher works fine so
> I can't really justify spending any time on this...

Matching is different from restructuring the sitemap. I don't believe 
Stefano would have a problem with a TreeMatcher. But I could be wrong. 
:-0


Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com



A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to 
worry about answers."
- Thomas Pynchon Gravity's Rainbow


Re: [RT] Escaping Sitemap Hell

Posted by Peter Hunsberger <pe...@gmail.com>.
On Sat, 8 Jan 2005 18:48:29 -0600, Glen Ezkovich <gl...@hard-bop.com> wrote:

<snip>stuff everyone seems to agree on</snip>

> > You've got to allow for variations on
> > authorizations, error handling, timeouts, resumed sessions, etc.
> 
> These do not have to be public URLs. All of these things are internal
> to the application. If you are providing your users with URLs for these
> things you should ask yourself if there is a better way to handle this.

The problem is that apps often need to track some kind of state.  Any
one of the above can affect state.  In some cases you can't rely on
session (or cookies) and URL becomes the easiest fall back.  I've done
some strange hacks to work around this in my life, but a structured
URL can definitely make life easier.

> >> As far as dogmatism and URL structure goes, you can always be dogmatic
> >> in the way you structure them. ;-)  The problem with dogmatism is that
> >> it does not always lead to the best solution for a given case. Then
> >> again sometimes it does.
> >
> > And that's this issue Daniel's worried about: what's the best
> > solution.  (However, I'm not completely sure for what problem space.)
> 
> As always, it depends on the problem space. ;-) There is not one best
> way and that's why implementing the sitemap according to the "best way"
> to design your URL space is probably a bad idea. I do find the idea of
> a tree structured sitemap appealing (even though it works nicely with
> my "best way" :-O).

I think there are some overall patterns that can be useful for
probably 80% of the use cases. Stefano seems to believe it's not worth
trying to build anything directly into Cocoon to handle these.  I'm
not sure I can codify any rational  proposal well enough to dispel
this belief so I'll probably drop this for now.

Tree structured URL matching makes sense to me and doing it using XSLT
makes even more sense.  An XSLT flow handler is something I've been
pushing for for years now, but for the moment Javascript flow and
resolving URLs through our internal rules based matcher works fine so
I can't really justify spending any time on this...

-- 
Peter Hunsberger

Re: [RT] Escaping Sitemap Hell

Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 7, 2005, at 10:30 PM, Peter Hunsberger wrote:

> On Fri, 7 Jan 2005 17:38:35 -0600, Glen Ezkovich <gl...@hard-bop.com> 
> wrote:
>>
>> On Jan 7, 2005, at 1:43 PM, Peter Hunsberger wrote:
>>
>>> On Fri, 07 Jan 2005 14:28:06 -0500, Stefano Mazzocchi
>>> <st...@apache.org> wrote:
>>>>
>>>> See? the problem is that you are partitioning the matching space 
>>>> with
>>>> URL matchers... I strongly feel that most of the problems that 
>>>> Daniel
>>>> (and you) are outlining will just go away if you used non-URL
>>>> matchers.
>>>
>>> Although I agree that 90% of the problem seems to be a matcher issue
>>> I've got to ask; what would the matchers be matching on if it's not a
>>> URL?  I have a couple answers, but I'd like other opinions...
>>>
>>> It seems to me that Daniel might be coming at this from a mostly
>>> application POV.  If so, for such cases, I think you can't _always_ 
>>> be
>>> quite as dogmatic about how a URL is structured; for many apps 
>>> there's
>>> little to no expectation of long-term URI persistence/repeatability.
>>
>> I don't see this. The application is the resource and it is the
>> application that should have a unique identifier. If the application
>> allows a user to perform multiple tasks you may want to consider each
>> task a resource.
>
> An app may be 1000's of resources.  The issue is how to find a
> rational way of getting 1000's of unique identifiers.

Hmmm... This is a question of semantics. To me the app is the only 
resource. Because I use the HTTP protocol I am constrained as to how I 
issue commands and pass arguments. As a result I use constructs that 
look like URIs but are really just an awkward encoding of commands and 
arguments. This does not diminish the problem, but it explains why it 
exists.

>
>> The persistence of the URI in general is not that
>> important from a users perspective since the URI identifies a resource
>> that might be reachable from multiple URLs. What is important is that
>> the URL that a user uses to reach an application persist and not 
>> change
>> as long as users may use the application.
>
> Umm, wasn't that my point?

Sorry, I just missed it. I'm glad we agree.

>
>> We may not expect to see
>> identical results each time we access http://weather.com/neworleans 
>> but
>> we do expect to get the current weather forecast for New Orleans. If
>> weather.com switched the URI/L to http://weather.com?city=neworleans,
>> as a user I would be perturbed to say the least.
>
> You're missing the point:

Probably.

> weather is a published resource, it's not an
> application.

This was too simplistic an example without the details. I should also 
not have used an existing site.  The point was that a single 
application/resource can be accessed by two distinct URLs, and changing 
URLs is unsettling to a user.

> Consider something like a Web hosted SFA app, something
> where you need work flow. The hard point isn't getting to the app, the
> hard part is partitioning the app.  You've got lots of orthogonal
> concerns (and I mean lots). URI, scheme (protocol), and request
> parameters (among others) all give you ways of attacking various parts
> of the problem,

Of course. And this is where problems in structuring a sitemap arise.

> but when you start hosting 1000's of different app
> spaces on the same machine the issue isn't as trivial as
> scheme://host?couple-of-parms.

What else do you have? This is pretty much it X 1000s. An application 
consists of its commands and queries. Each may have parameters. The 
problem is mapping the commands, queries and parameters to something 
that is a valid URI. Your sitemap can only do so much. With the advent 
of flow there is a very direct way to map public URLs to commands and 
queries. If you want to simplify (though not shorten) your sitemap, use 
flow and match using a hierarchical, menu-like pattern:

  <map:match pattern="patients/edit">
    <map:call function="editPatient"/>
  </map:match>

I realize that everything is not this simple. Many problems arise from 
technologies that have been supplanted by flow. Prior to flow there was 
a tendency to use the sitemap as the application controller. Using it 
as such probably leads one to much more inventive URI designs.
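The flow side of such a match might look roughly like the following
flowscript-style sketch.  The cocoon object is stubbed here so the
snippet is self-contained; editPatient's body, the view URI, and the
repository lookup are all invented for illustration:

```javascript
// Stub standing in for Cocoon's flowscript environment, so the sketch
// is self-contained; in real flowscript, cocoon is provided by Cocoon.
var cocoon = {
  sentPage: null,
  sendPage: function (uri, model) {
    this.sentPage = { uri: uri, model: model };
  }
};

// Hypothetical flow function behind <map:call function="editPatient"/>.
// The sitemap stays a flat menu of commands; controller logic lives here.
function editPatient(id) {
  var patient = { id: id, name: "example" }; // stand-in for a repository lookup
  cocoon.sendPage("patient/edit-view", { patient: patient });
}
```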

> You've got to allow for variations on
> authorizations, error handling, timeouts, resumed sessions, etc.

These do not have to be public URLs. All of these things are internal 
to the application. If you are providing your users with URLs for these 
things you should ask yourself if there is a better way to handle this.

>> As far as dogmatism and URL structure goes, you can always be dogmatic
>> in the way you structure them. ;-)  The problem with dogmatism is that
>> it does not always lead to the best solution for a given case. Then
>> again sometimes it does.
>
> And that's this issue Daniel's worried about: what's the best
> solution.  (However, I'm not completely sure for what problem space.)

As always, it depends on the problem space. ;-) There is not one best 
way and that's why implementing the sitemap according to the "best way" 
to design your URL space is probably a bad idea. I do find the idea of 
a tree structured sitemap appealing (even though it works nicely with 
my "best way" :-O).


Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com



A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to 
worry about answers."
- Thomas Pynchon Gravity's Rainbow


Re: [RT] Escaping Sitemap Hell

Posted by Stefano Mazzocchi <st...@apache.org>.
Peter Hunsberger wrote:

> And that's this issue Daniel's worried about: what's the best
> solution.  (However, I'm not completely sure for what problem space.)

There is no best solution, folks.

For some people, even something like http://site.com/29309840938 is not 
persistent enough (because site.com might go away... but even because 
the HTTP protocol might go away!)

I'm all for the creation of some documentation about "best practices in 
URL-space design", but I would be very strongly against the distillation 
of any of these best practices into the sitemap architecture itself, 
unless and until one of those best practices turns out to be used by 
practically everybody and in any given circumstance.

-- 
Stefano.


Re: [RT] Escaping Sitemap Hell

Posted by Peter Hunsberger <pe...@gmail.com>.
On Fri, 7 Jan 2005 17:38:35 -0600, Glen Ezkovich <gl...@hard-bop.com> wrote:
> 
> On Jan 7, 2005, at 1:43 PM, Peter Hunsberger wrote:
> 
> > On Fri, 07 Jan 2005 14:28:06 -0500, Stefano Mazzocchi
> > <st...@apache.org> wrote:
> >>
> >> See? the problem is that you are partitioning the matching space with
> >> URL matchers... I strongly feel that most of the problems that Daniel
> >> (and you) are outlining will just go away if you used non-URL
> >> matchers.
> >
> > Although I agree that 90% of the problem seems to be a matcher issue
> > I've got to ask; what would the matchers be matching on if it's not a
> > URL?  I have a couple answers, but I'd like other opinions...
> >
> > It seems to me that Daniel might be coming at this from a mostly
> > application POV.  If so, for such cases, I think you can't _always_ be
> > quite as dogmatic about how a URL is structured; for many apps there's
> > little to no expectation of long-term URI persistence/repeatability.
> 
> I don't see this. The application is the resource and it is the
> application that should have a unique identifier. If the application
> allows a user to perform multiple tasks you may want to consider each
> task a resource. 

An app may be 1000's of resources.  The issue is how to find a
rational way of getting 1000's of unique identifiers.

> The persistence of the URI in general is not that
> important from a users perspective since the URI identifies a resource
> that might be reachable from multiple URLs. What is important is that
> the URL that a user uses to reach an application persist and not change
> as long as users may use the application. 

Umm, wasn't that my point?

> We may not expect to see
> identical results each time we access http://weather.com/neworleans but
> we do expect to get the current weather forecast for New Orleans. If
> weather.com switched the URI/L to http://weather.com?city=neworleans,
> as a user I would be perturbed to say the least.

You're missing the point: weather is a published resource, it's not an
application.   Consider something like a Web hosted SFA app, something
where you need work flow. The hard point isn't getting to the app, the
hard part is partitioning the app.  You've got lots of orthogonal
concerns (and I mean lots). URI, scheme (protocol), and request
parameters (among others) all give you ways of attacking various parts
of the problem, but when you start hosting 1000's of different app
spaces on the same machine the issue isn't as trivial as
scheme://host?couple-of-parms.  You've got to allow for variations on
authorizations, error handling, timeouts, resumed sessions, etc.

> As far as dogmatism and URL structure goes, you can always be dogmatic
> in the way you structure them. ;-)  The problem with dogmatism is that
> it does not always lead to the best solution for a given case. Then
> again sometimes it does.

And that's this issue Daniel's worried about: what's the best
solution.  (However, I'm not completely sure for what problem space.)

-- 
Peter Hunsberger

Re: [RT] Escaping Sitemap Hell

Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 7, 2005, at 1:43 PM, Peter Hunsberger wrote:

> On Fri, 07 Jan 2005 14:28:06 -0500, Stefano Mazzocchi
> <st...@apache.org> wrote:
>>
>> See? the problem is that you are partitioning the matching space with
>> URL matchers... I strongly feel that most of the problems that Daniel
>> (and you) are outlining will just go away if you used non-URL 
>> matchers.
>
> Although I agree that 90% of the problem seems to be a matcher issue
> I've got to ask; what would the matchers be matching on if it's not a
> URL?  I have a couple answers, but I'd like other opinions...
>
> It seems to me that Daniel might be coming at this from a mostly
> application POV.  If so, for such cases, I think you can't _always_ be
> quite as dogmatic about how a URL is structured; for many apps there's
> little to no expectation of long-term URI persistence/repeatability.

I don't see this. The application is the resource and it is the 
application that should have a unique identifier. If the application 
allows a user to perform multiple tasks you may want to consider each 
task a resource. The persistence of the URI in general is not that 
important from a users perspective since the URI identifies a resource 
that might be reachable from multiple URLs. What is important is that 
the URL that a user uses to reach an application persist and not change 
as long as users may use the application. We may not expect to see 
identical results each time we access http://weather.com/neworleans but 
we do expect to get the current weather forecast for New Orleans. If 
weather.com switched the URI/L to http://weather.com?city=neworleans, 
as a user I would be perturbed to say the least.

As far as dogmatism and URL structure goes, you can always be dogmatic 
in the way you structure them. ;-)  The problem with dogmatism is that 
it does not always lead to the best solution for a given case. Then 
again sometimes it does.



Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com



A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to 
worry about answers."
- Thomas Pynchon Gravity's Rainbow


Re: [RT] Escaping Sitemap Hell

Posted by Peter Hunsberger <pe...@gmail.com>.
On Fri, 07 Jan 2005 14:28:06 -0500, Stefano Mazzocchi
<st...@apache.org> wrote:

<snip/>

> >
> > Forrest is doing it for docs... someone else can do it for apps :-)
> 
> See? the problem is that you are partitioning the matching space with
> URL matchers... I strongly feel that most of the problems that Daniel
> (and you) are outlining will just go away if you used non-URL matchers.

Although I agree that 90% of the problem seems to be a matcher issue
I've got to ask; what would the matchers be matching on if it's not a
URL?  I have a couple answers, but I'd like other opinions...

It seems to me that Daniel might be coming at this from a mostly
application POV.  If so, for such cases, I think you can't _always_ be
quite as dogmatic about how a URL is structured; for many apps there's
little to no expectation of long-term URI persistence/repeatability.

-- 
Peter Hunsberger  

(Won't be able to respond until Wed.)

Re: [RT] Escaping Sitemap Hell

Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:
> Stefano Mazzocchi wrote:
> 
>> Daniel Fagerstrom wrote:
> 
> ...
> 
>>> A real map for the site should be tree structured like the linkmap in 
>>> forrest. Take a look at the example in [1], (I don't suggest using 
>>> the "free form" XML, something stricter is required). Such a tree 
>>> model will also help in planning the URI space as it gives a good 
>>> overview of it.
>>
>>
>> Forrest and cocoon serve different purposes.
>>
>> While I totally welcome the fact that Forrest has such "linkmaps", I 
>> don't think they are general-enough concepts to drive the entire 
>> framework. They are fine as specific cases, especially appealing for a 
>> website generation facility like forrest, but as a general concept is 
>> too weak.
> 
> 
> While I agree with your reply, I think that I understand what problem 
> Daniel thinks he sees.
> 
>  A sitemap is not the _map_ of a _site_.
> 
> That's why we made site.xml.
> 
> But saying that this should drive processing is IMHO not correct. In 
> fact, it's the opposite. We made the site.xml stuff in a different file 
> so that it would *not* interfere with processing.

Right. I see that.

> 
> ...
> 
>>> Choosing Production Pipeline
>>> ----------------------------
> 
> ...
> 
>>> Now instead of having the rule:
>>>
>>> *.x.y.z ==> XYZPipeline
>>>
>>> we have
>>>
>>> * where repository:{1} has properties {x, y, z} ==> XYZPipeline
>>>
>>> or
>>>
>>> * where repository:{1}.x.y.z exists ==> XYZPipeline
>>
>>
>>
>> Oh, a rule system for sitemap!
>>
>> hmmmm, interesting... know what? the above smells a *lot* like you are 
>> querying RDF. hmmmm...
> 
> 
> At Forrest, we have done a similar thing, and are still in the process 
> of finishing it. It states something like this:
> 
>  "Forrest processing should not be tied to URLs."
> 
> IOW, Forrest should not process a file differently just because it's in 
> a particular directory, but based on other characteristics, like 
> mime-type, DTD, schema, etc. For us, a URL is a partitioning decision of 
> the content creator, not of the application creator.
> 
> Many sites fail to do this, and URL matching has become the easiest way 
> of partitioning Cocoon's processing, although not the best.
> 
> You can make your own matcher... and here is where we should 
> concentrate, by defining new blueprints that don't use the URL as a 
> matching system.
> 
> Forrest is doing it for docs... someone else can do it for apps :-)

See? the problem is that you are partitioning the matching space with 
URL matchers... I strongly feel that most of the problems that Daniel 
(and you) are outlining will just go away if you used non-URL matchers.
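A non-URL matcher of the kind Stefano describes might, as a sketch,
decide on content properties instead of path patterns.  The property
names follow the "repository:{1} has properties {x, y, z}" rule quoted
earlier in the thread; the function and the repository shape are
invented:

```javascript
// Hypothetical non-URL matcher: the decision is driven by properties
// of the requested resource (mime type, flags from a repository), not
// by where the resource happens to sit in the URL space.
function selectPipeline(resource) {
  if (resource.mimeType === "application/xml" &&
      resource.properties.indexOf("x") >= 0 &&
      resource.properties.indexOf("y") >= 0 &&
      resource.properties.indexOf("z") >= 0) {
    // corresponds to: * where repository:{1} has properties {x, y, z}
    return "XYZPipeline";
  }
  return "DefaultPipeline";
}
```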

-- 
Stefano.


Re: [RT] Escaping Sitemap Hell

Posted by Nicola Ken Barozzi <ni...@apache.org>.
Stefano Mazzocchi wrote:
> Daniel Fagerstrom wrote:
...
>> A real map for the site should be tree structured like the linkmap in 
>> forrest. Take a look at the example in [1], (I don't suggest using the 
>> "free form" XML, something stricter is required). Such a tree model 
>> will also help in planning the URI space as it gives a good overview 
>> of it.
> 
> Forrest and cocoon serve different purposes.
> 
> While I totally welcome the fact that Forrest has such "linkmaps", I 
> don't think they are general-enough concepts to drive the entire 
> framework. They are fine as specific cases, especially appealing for a 
> website generation facility like forrest, but as a general concept is 
> too weak.

While I agree with your reply, I think that I understand what problem 
Daniel thinks he sees.

  A sitemap is not the _map_ of a _site_.

That's why we made site.xml.

But saying that this should drive processing is IMHO not correct. In 
fact, it's the opposite. We made the site.xml stuff in a different file 
so that it would *not* interfere with processing.

...
>> Choosing Production Pipeline
>> ----------------------------
...
>> Now instead of having the rule:
>>
>> *.x.y.z ==> XYZPipeline
>>
>> we have
>>
>> * where repository:{1} has properties {x, y, z} ==> XYZPipeline
>>
>> or
>>
>> * where repository:{1}.x.y.z exists ==> XYZPipeline
> 
> 
> Oh, a rule system for sitemap!
> 
> hmmmm, interesting... know what? the above smells a *lot* like you are 
> querying RDF. hmmmm...

At Forrest, we have done a similar thing, and are still in the process 
of finishing it. It states something like this:

  "Forrest processing should not be tied to URLs."

IOW, Forrest should not process a file differently just because it's in 
a particular directory, but based on other characteristics, like 
mime-type, DTD, schema, etc. For us, a URL is a partitioning decision of 
the content creator, not of the application creator.

Many sites fail to do this, and URL matching has become the easiest way 
of partitioning Cocoon's processing, although not the best.

You can make your own matcher... and here is where we should 
concentrate, by defining new blueprints that don't use the URL as a 
matching system.

Forrest is doing it for docs... someone else can do it for apps :-)
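The custom-matcher idea can be sketched in code. Note the hedging: in real 
Cocoon a matcher is a Java component plugged into the sitemap, and the names 
below (`mimeTypeMatch`, `resourceMetadata`) are purely hypothetical; this 
flowscript-style JavaScript only illustrates what "matching on something 
other than the URL" could look like.

```javascript
// Hypothetical sketch of a non-URL matcher's selection logic. Real Cocoon
// matchers are Java components; this only illustrates matching on resource
// metadata (here: mime-type) instead of on the request URL.
function mimeTypeMatch(pattern, resourceMetadata) {
  // On success, return a map whose entries play the role of the {name}
  // substitutions available inside a matched sitemap node.
  if (resourceMetadata.mimeType === pattern) {
    return { mimeType: resourceMetadata.mimeType };
  }
  return null; // no match: the sitemap falls through to the next rule
}
```

The sitemap would invoke such a matcher exactly like a wildcard matcher; 
only the matching criterion changes.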

-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: [RT] Escaping Sitemap Hell

Posted by Stefano Mazzocchi <st...@apache.org>.
Daniel Fagerstrom wrote:
>  (was: Splitting xconf files step 2: the sitemap)
> 
> Although the Cocoon sitemap is a really cool innovation it is not 
> entirely without problems:
> 
> * Sitemaps for large webapps easily become a mess
> * It sucks as a map describing the site [1]
> * It doesn't give that much support for "cool URLs" [2]
> 
> In this RT I will try to analyze the situation especially with respect 
> to URL space design and then move on to discuss a possible solution.
> 
> Before you enthusiastically dive into the text:
> 
> * It is a long RT (as my RTs usually are)
> * It might contain provoking and hopefully even thought provoking ideas
> * No, I will not require that everything in it should be part of 2.2
> * No, I don't propose that we should scrap the current sitemap; actually 
> I believe that we should support it for the next few millennia ;)

See my comments intermixed.

>                            --- o0o ---
> 
> Peter and I had some discussion:
> 
> Peter Hunsberger wrote:
> 
>> On Tue, 04 Jan 2005 13:25:05 +0100, Daniel Fagerstrom 
>> <da...@nada.kth.se> wrote: 
> 
> 
> <snip/>
> 
>>> Anyway, sometimes when I need to refactor or add functionality to 
>>> some of our Cocoon applications, where I or colleagues of mine have 
>>> written endless sitemaps, I have felt that it would have been nice if 
>>> the sitemap had been more declarative, so that I could have 
>>> asked it basic things like getting a list of what URLs or URL patterns 
>>> it handles. Also, if I have a URL in a large webapp and it doesn't 
>>> work as expected, it can require quite some work to trace through all 
>>> involved sitemaps to see which rule is actually used. 
>>
>>
> 
>>> Of course I understand that if I used a set of efficient conventions 
>>> about how to structure my URL space and my sitemaps the problem would 
>>> be much less. Problem is that I haven't found such a set of 
>>> conventions yet. Of course I'm following some kind of principles, but 
>>> I don't have anything that I'm completely happy with yet. Anyone 
>>> having good design patterns for URL space structuring and sitemap 
>>> structuring, that you want to share? 
>>
>>
>>
>> We have conventions that use sort of type extensions on the names: 
>> patient.search, patient.list, patient.edit where the search, list,
>> edit (and other) screen patterns are common across many different
>> metadata sources (in this case patient). We don't do match *.edit
>> directly in the sitemap (any more) but I find that if you've got to
>> handle orthogonal concerns then x.y.z naming patterns can sometimes
>> help.
>>
> Ok, let's look at this in a more abstracted setting:
> 
> Resource Aspects
> ================
> 
> In the example above we have an object, or better a _resource_: the 
> patient that everything else is about. The resource should be 
> identifiable in a unique way, in this case e.g. with the social 
> security number.

First big mistake: you think that http-based URIs and http-based URLs 
are the same thing.

Well, WRONG.

There is nothing that says that every http-URI should be automatically 
treated as a URL. This is a very common misconception, but nevertheless 
a big one.

> There are a number of _operations_ that can be performed on the patient 
> resource: show, edit, list, search etc. (although the search might be on 
> the set of patients rather than a single one).
> 
> The resource has a _type_, patient, that might affect how we choose to 
> show it etc.

Second mistake: it is an architectural design principle to *avoid* adding 
a type to a URI. These are three separate issues:

  1) how to resolve a URI into a URL
  2) how to negotiate the content of that URL
  3) how to map that returned URL metadata (the HTTP response headers) 
to a recognized type or format.

Combining them into one is just a really poor way to use the web 
architecture.

> There are in general other aspects that will steer how we render the 
> response when someone asks for the resource:
> 
> * The _format_ of the response: html, pdf, svg, gif etc.
> * The _status_ of the resource: old, draft, new etc.
> * The _access_ rights of the response: public, team, member etc.
> 
> There are plenty of other possible aspect areas as well.

> 
> Cool Webapp URLs
> ================
> 
> I searched the web to gain some insights into URL space design. It soon 
> became clear that I should re-read Tim Berners-Lee's classic, "Cool URIs 
> don't change" [2]. I must say I wasn't prepared for the shock; I had 
> completely missed how radical its message was when I read it the 
> last time. 
> I can also recommend reading [3], a W3C note that codifies the 
> message from [2] and some other good URI practices into a set of 
> guidelines.

I suggest you read

   http://www.w3.org/TR/webarch/

> So what is an URI? According to [3]:
> 
>  A URI is, actually, a /reference to a resource, with fixed and 
> independent semantics/.
> 
> This means that the URI should refer to a specific product, 
> _always_. 

GRRRR! A URI IS NOT A REFERENCE! A URI IS AN IDENTIFIER!

How to get a reference out of an identifier is a totally different thing.

> Independent semantics means that a social security number is 
> not enough; it should say that it is a person (from the USA) as well. See 
> [3] for the philosophical details.

Pfff, independent semantics doesn't mean anything. A perfectly valid URI is

  urn:943098029834098/9829982739487298374

> * The URI should be easy to type

What the hell does this mean?

http://tinyurl.com/5r8kl

is easier to type than

http://www.amazon.com/exec/obidos/tg/detail/-/0465026567/

but which one is "better"? They both locate the same resource, but which 
one of them identifies it better?

> * It should not contain too much meaning, especially not about 
> implementation details
> 
> Now I try to apply the ideas from [2] and [3] to the different resource 
> aspects mentioned above. When I use words like "should" or "should not" 
> without any motivation, it means that I trust the motivation from 
> the gurus in the references ;) I will try to motivate my own ideas ;)
> 
> What I'm going to suggest might be quite far from how you design your 
> URL spaces. It is certainly far from the implementation-detail-plagued 
> mess that I have created in my own applications.
> 
> The Resource
> ------------
> 
> The idea is that a URL identifies a resource. For the patient case 
> above it could be:
> 
> http://myhospital.com/person/123456789
> 
> If we use a hierarchical URI space like /person/123456789, the "parent" 
> URIs, e.g. /person, should also refer to a resource. 

There is *NO SUCH THING* as a parent URI, because URIs do not have the 
notion of paths. It is a *convention*, established by early 
web server implementations (and perpetuated by apache httpd), that the / 
in the paths got automatically mapped to the / in the file system, or in 
a hierarchical system where the / is used as a separator for hierarchy 
identifiers.

There is *NOTHING* in any web spec that says this is the rule or, for 
that matter, that this is a good thing.

/ is a "separator" in fact, from a URI point of view

  http://myhospital.com/123456789/person

and

  http://myhospital.com/person/123456789

show no difference in identification power... which is what URIs do: they 
identify!

> It's in most cases not 
> a good idea to put a lot of topic-classification effort in the URI 
> hierarchy. Classifications are not unique and will change according to 
> changing interests and world views.

This is true. But it is also true that, if you follow this reasoning, 
you should not be using http:// URIs at all!

In fact, what happens to a URI when, say, two hospitals merge and they 
decide that it's in their best interest to get rid of references to the 
previous names, including those in the URIs?

This is the reason why a lot of people prefer URNs over http-URIs, for 
example:

  1) the handle system: http://www.handle.net/
  2) the LSID system: http://www.omg.org/docs/dtc/04-05-01.pdf
  3) the DOI system: http://www.doi.org/

TimBL believes that the above systems are just a different way to skin a 
cat and they don't really solve anything (even if he agrees on the 
problem that the domain part of http-URIs is the weakest part of an 
http-URI, in terms of long-term persistence)

Also, you should take a look at 'Dynamic Delegation Discovery System' 
(DDDS):

   http://uri.net/ddds.html

which aims to become the standard way to translate a URI into a URL.

> Operations
> ----------
> 
> What about the operations on the resource: list, search, edit etc? I 
> find the object-oriented style of WebDAV elegant, where you use one URL 
> together with different HTTP methods to perform different operations. 

It's not the OO style of WebDAV, but the design of HTTP. Here is 
another example of somebody ruining a perfectly great design by not 
getting it: the browsers only allowed people to overload the actions in 
forms, but never in anchor tags, and they never allowed 
javascript to change that either.

> Sam Ruby also has some interesting ideas about using URLs to identify 
> "objects" and different SOAP messages for different methods on the 
> object in his "REST+SOAP" article [4]. But neither ad-hoc HTTP methods nor 
> XML posts seem like good candidates for invoking operations on a 
> resource in a typical webapp. So maybe something like:
> 
> /person/123456789/edit or
> /person/123456789.edit or
> /person/123456789?operation=edit
> 
> is a good idea.
> 
> Resource Type
> -------------
> 
> Should the type of the resource be part of the URI? 

Absolutely not!

> We probably have to 
> include some type info in the URL to give it "independent semantics" 
> (e.g. person). But we should not put types that might change, like 
> patient, manager, project-leader etc in the URL. And we should 
> especially avoid types that only have to do with implementation details 
> like what pipeline we want to use for rendering the resource.
> 
> Format
> ------
> 
> Cocoon especially shines in handling all the various file name 
> extensions: .html, .wml, .pdf, .txt, .doc, .jpg, .png, .svg, etc, etc. 
> But I'm sorry, if you want cool URLs you have to kiss them goodbye as 
> well ;)

This is, again, another one of those major screwups from some browsers 
(mostly IE) where the "extension" of a URL (as if such a thing existed!) 
was used to identify the mime-type instead of the response headers.

> It might be a good idea to send an html page to a browser on a PC and a 
> wml page to a PDA user. But you shouldn't require your user to remember 
> different URLs for different clients; that's a task for server-driven 
> content negotiation.
> 
> Using .html is not especially future proof, should all links become 
> invalid when you decide to reimplement your site with dynamic SVG?
> 
> Often it is good to provide the user with a nice printable version of 
> your page. But why should you advertise Adobe's products in your URLs? 

Unfair: many non-Adobe things produce PDF and it's a royalty-free 
specification to use.

http://partners.adobe.com/public/developer/pdf/index_reference.html

> A 
> few years ago it was .ps or .dvi from academic sites and .doc in 
> commercial sites. Right now it happens to be .pdf, but will that be forever?
> 
> Same thing with images: the user doesn't care about the format as long as 
> it can be shown in the browser (content negotiation), nor should you 
> make your content links (or Google's image search) dependent on a 
> particular compression scheme that happens to be popular right now.
> 
> There are of course cases where you really want to give your user the 
> ability to choose a specific format. Then a file name extension is a 
> good idea. If you happen to maintain 
> http://www.adobe.com/products/acrobat/ it's ok to put some .pdf there, 
> e.g. ;)
> 
> But in most cases file name extensions are an implementation detail that 
> is not relevant to your users.

This is correct. Although a URL that might break in the future but shows 
me a page in my browser today is better than a URL that might not break 
tomorrow but doesn't show me anything at all today ;-)
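The server-driven content negotiation that the Format section appeals to can 
be sketched as follows. This is a minimal, simplified illustration, not 
anything Cocoon ships: it only compares exact media types and q values, 
ignoring wildcards like text/* and other Accept-header parameters.

```javascript
// Minimal sketch of server-driven content negotiation: pick the
// best-supported format from an HTTP Accept header. Simplified:
// no wildcard matching, only the q quality value is honoured.
function negotiate(acceptHeader, supported) {
  var best = null, bestQ = 0;
  acceptHeader.split(",").forEach(function (part) {
    var pieces = part.trim().split(";");
    var type = pieces[0].trim();
    var q = 1.0; // default quality per HTTP
    pieces.slice(1).forEach(function (p) {
      var kv = p.trim().split("=");
      if (kv[0] === "q") q = parseFloat(kv[1]);
    });
    if (supported.indexOf(type) !== -1 && q > bestQ) {
      best = type;
      bestQ = q;
    }
  });
  return best; // null when nothing acceptable is supported
}
```

With this in place the same cool URL can answer with html to a desktop 
browser and wml to a PDA, without the format ever appearing in the URL.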

> Status
> ------
> 
> The status will by definition change, and that makes your URL uncool if 
> the status is part of the URL.
> 
> Access Rights
> -------------
> 
> Access rights will often change for a document. I know it is easy to 
> write path-dependent rules for access rights in most webserver 
> configuration files. But you expose irrelevant implementation details 
> and it's not future-proof.
> 
> Am I Really Serious?
> --------------------
> 
> Why should a webapp URL be cool and future-proof? Well, it's the 
> interface to your webapp. We agree that we shouldn't change interfaces 
> in Cocoon at a whim; why should we treat the users of our webapps 
> differently? And like it or not, useful software sometimes lives for 
> decades. If you build useful webapps you should consider planning ahead.
> 
> Currently we are all used to webapps that use the most horrible URLs 
> containing tons of implementation details and changing every now and 
> then. But it is not a law of nature that it must be like that. It is 
> mainly a result of webapp development still being immature and the tools 
> being far from perfect. Of course the user should be able to bookmark a 
> useful form or wizard.
> 
> Also I believe that exposing implementation details in one's URLs is at 
> least as bad as making all member variables public in Java classes. It 
> makes your webapp monolithic and fragile.

To get this straight: I totally agree that a cool URL scheme is a great 
thing and I also think that the best URL scheme is something like

  http://site.com/342343

and that's it... that's the only way never to change anything because 
those numbers are the only 'semantically neutral' thing that you can do.

But still, my blog news URLs are the form of

  http://www.betaversion.org/~stefano/linotype/news/34/

which have several problems:

  1) we might forget to register the domain and somebody might steal it 
from us

  2) well, my name might change (but that's unlikely)

  3) the company that has a trademark on linotype might sue me

  4) I might decide to add other types of items to my blog, like images 
or articles or whatever else... then news/id/ would seem awkward

but the best part is the number, chosen to be incremental and unique in 
that space.

> 
>                            --- o0o ---
> 
> You might find the views expressed above rather extreme and maybe 
> impractical. As indicated above they are also far away from what I 
> currently do in my webapps. But I have for quite some time thought about 
> how to fight the too easily increasing entropy in the webapps we develop. 
> I have suspected that badly designed URL spaces have been part of the 
> trouble. And when I re-read Tim BL's classic I suddenly realized that the 
> habit of exposing implementation in the URLs might be at the root of the 
> evil.

There is truth in this, but what I found irritating was the lack of 
understanding of the difference between a URI and a URL.

Cocoon's internals show some of this too (and I have to admit that I 
understood what URIs really were only after starting to work on the 
semantic web) but this should not be perpetuated further.

> Whether this realization will survive contact with your comments and 
> other parts of reality is of course too early to tell ;)
> 
> 
> Does Cocoon Support Cool URLs?
> ==============================

Yessir!

> But how does Cocoon support the above ideas about URL space design?
> 
> Well, in some way one could say that it supports it. The sitemap is so 
> powerful that you can program most usage patterns in it in some more or 
> less elegant way. But AFAICS, writing webapps following the URL space 
> design ideas above would be rather tricky. So I would say that Cocoon 
> doesn't support it that well. 

I rather strongly (and probably not surprisingly) disagree with this 
statement.

> The main reasons are:
> 
> * The sitemap is not that useful as a site map

How is this making it worse to support "cool URLs"?

> * The sitemap gives excellent support for choosing resource production 
> implementation based on the implementation details coded into the URL, 
> but not for avoiding it

wrong! that's why we have pluggable matchers! the fact that you choose 
to match by URL is your choice, not an architectural decision!

> * The sitemap mixes site map concerns with resource production 
> implementation details

Yes, the cocoon sitemap describes how resources get produced in the 
pipelines.... but what is the site map you are talking about? a 
collection of all the resources available on the site? or just the URL 
matchers without anything else?

> Is it a Map of the Site?
> ------------------------
> 
> The Forrest people don't think that the sitemap is enough as a map of the 
> site. They have a special linkmap [1] that gives a map over the site and 
> that is used for internal navigation and for creating menu trees. I have 
> a similar view. From the sitemap it can be hard to answer basic 
> questions like:
> 
> * What is the URL space of the application
> * What is the structure of the URL space
> * How is the resource referred to by this URL produced

Hold it right there!

If you think that understanding the URL space of the application from a 
sitemap is hard, then what about PHP? JSP? what about web.xml 
descriptors? are they any better?

Second point: how in hell is "structure of the URL space" different from 
"the URL space of the application"?

Third point: this is *flow* and should *NOT* be part of a sitemap anyway.

> The basic view of a URL in the sitemap is that it is any string. Even if 
> there are constructions like mount, the URL space is not considered as 
> hierarchical. That means that the URLs can be presented as patterns in 
> any order in the sitemap and you have to read through all of it to see 
> if there is a rule for a certain URL.

As I mentioned already, this is a design decision based on the fact that 
it is *arbitrary* to consider the / as a hierarchical separator.

Also, matchers are *NOT* URL-specific and it's a very useful concept. 
Forcing matching to be:

  1) URL-based

and

  2) intrinsically hierarchical

is IMO a *severe* step backward in terms of architectural design.

> A real map for the site should be tree structured like the linkmap in 
> forrest. Take a look at the example in [1], (I don't suggest using the 
> "free form" XML, something stricter is required). Such a tree model will 
> also help in planning the URI space as it gives a good overview of it.

Forrest and cocoon serve different purposes.

While I totally welcome the fact that Forrest has such "linkmaps", I 
don't think they are general-enough concepts to drive the entire 
framework. They are fine as specific cases, especially appealing for a 
website generation facility like forrest, but as a general concept it is 
too weak.

> The Forrest linkmap has no notion of wildcards, which is a must in 
> Cocoon. We continue discussing that.

All right.

> Choosing Production Pipeline
> ----------------------------
> 
> With the sitemap it is very easy to choose the pipeline used for 
> producing the response based on a URL pattern "*.x.y.z". That more or 
> less forces the user to code implementation details, i.e. what pipeline 
> to use, into the URL. This is only a problem for wildcard patterns 
> otherwise we just associate the pipeline to the concrete "cool URL".

At this point I seriously wonder: are you aware that matchers are pluggable?

> Before, I suggested that aspects like type, format, status, access 
> rights etc. shouldn't be part of the URL, as those aspects might change 
> for the resource. OTOH these aspects certainly are necessary for choosing 
> the rendering pipeline; what should we do?

URL-parameter matching.

  <match type="wildcard" pattern="/news/*">
    <match type="param" pattern="edit">
     ....
    </match>
    <match type="param" pattern="delete">
     ....
    </match>
  </match>

or, if you have HTTP action control (as in form actions), you can do

  <match type="wildcard" pattern="/news/*">
    <match type="action" pattern="get">
     ....
    </match>
    <match type="action" pattern="post">
     ....
    </match>
  </match>

and, most of all, you do *NOT* include access control information in the 
URL! nor type! nor status!

> The requested resource will often be based on some content or 
> combination of content that we can access from Cocoon. The content can 
> be a file, data in a db, result from a business object etc. Let us 
> assume that it resides in some kind of content repository. Now if we 
> think about it, isn't it more natural to ask the content that we are 
> going to use about its properties like type, format, status, access 
> rights, etc., than to encode them in the URL? These properties can be 
> encoded in the file name, in metadata in some property file, within the 
> file, in a DB etc. 

Ok, now that the nonsense venting is over, we seem to be getting to your RT.

> Now instead of having the rule:
> 
> *.x.y.z ==> XYZPipeline
> 
> we have
> 
> * where repository:{1} has properties {x, y, z} ==> XYZPipeline
> 
> or
> 
> * where repository:{1}.x.y.z exists ==> XYZPipeline

Oh, a rule system for sitemap!

hmmmm, interesting... know what? the above smells a *lot* like you are 
querying RDF. hmmmm...
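The property-aware rule quoted above can be made concrete with a short 
sketch. Everything here is hypothetical: `repository` stands in for whatever 
metadata store backs the resources (filesystem attributes, a DB, property 
files), and none of the names are Cocoon API.

```javascript
// Hypothetical sketch of the proposed rule style:
//   * where repository:{1} has properties {x, y, z} ==> XYZPipeline
// Rules are tried in order; the first one whose required properties
// all exist on the resource wins, instead of encoding the choice
// of pipeline into the URL itself.
function selectPipeline(resourceId, repository, rules) {
  var props = repository[resourceId] || {};
  for (var i = 0; i < rules.length; i++) {
    var ok = rules[i].requires.every(function (p) { return p in props; });
    if (ok) return rules[i].pipeline; // property test passed
  }
  return null; // no rule applies
}
```

Because each rule's "where" clause is explicit data, the rule set is also 
"listable": the allowed wildcard values can be enumerated by querying the 
repository, which is exactly the advantage Daniel claims below.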

> We get the right pipeline by querying the repository instead of encoding 
> it in the URL. A further advantage is that the rule becomes "listable" 
> as the "where" clause in the match expresses the allowed values 
> for the wildcard.
> 
> Separating the Concerns
> -----------------------
> 
> The sitemap handles two concerns: it maps a URL to a pipeline that 
> produces a response, and it describes how to put together this pipeline 
> from sitemap components.

True.

> The first concern is related to site design and 
> the second is more a form of programming. Putting them together makes it 
> hard to see the URL structure and also makes it tempting to group URLs 
> based on common pipeline implementation instead of on site structure.

Fair enough.

> Virtual Pipeline Components (VPCs) give us a way out of this. Large 
> parts of our sites might be buildable with pipelines already 
> constructed in some standard blocks.

Right.

> I would propose to go even further: in the "real" site map it should 
> only be allowed to call VPC pipelines; no pipeline construction is 
> allowed, that should be done in the component area.
> 
> In the "real" site map the current context is set and the arguments 
> to the called VPC are given.

Hmmm, rather drastic, but let's stick to it for your proposal.

> Search Order
> ------------
> 
>>  The problem for us, is as you allude to at the start of this
>> thread: Cocoon takes the first match, where what you really want is a
>> more XSLT "best match" type of handling; sometimes *.a, *.b, *.c works
>> and other times it's m.*, n.*, o.*...
>>
>> In the past that has lead me to suggest a sort of XSLT flow, but
>> thinking about it in this light I wonder if what I really want is just
>> XSLT sitemap matching (same thing in the end)...
>>  
>>
> I also believe that a "best match" type of handling is preferable; it 
> increases usability IMO, and it also makes it possible to use tree-based 
> matching algorithms that are far more efficient than the current 
> linear-search-based one.

This is a valid point.
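The "best match" idea can be sketched as a walk down a path tree, in 
contrast to the sitemap's first-match linear scan. This is an illustrative 
sketch under stated assumptions, not a Cocoon implementation: paths are 
split on "/", a "*" child plays the role of a wildcard, and the deepest 
node carrying a pipeline wins.

```javascript
// Sketch of "best match" resolution over a path tree: the most
// specific (deepest) matching rule wins, and lookup cost is bounded
// by path depth rather than by the number of rules.
function bestMatch(tree, path) {
  var node = tree, best = tree.pipeline || null;
  var segments = path.split("/").filter(function (s) { return s !== ""; });
  for (var i = 0; i < segments.length; i++) {
    // Prefer an exact child, fall back to the wildcard child.
    var next = node.children && (node.children[segments[i]] || node.children["*"]);
    if (!next) break;              // no deeper rule: keep the best so far
    node = next;
    if (node.pipeline) best = node.pipeline;
  }
  return best;
}
```

Unlike first-match semantics, the order in which rules were declared no 
longer matters, which also makes the rule set easier to inspect as a map of 
the URL space.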

> The New Sitemap
> ===============
> 
> To sum up the proposal:
> 
> Pipelines:
> * Pipeline construction is only done as VPCs in component areas (often 
> in blocks).
> 
> Sitemap:
> * The sitemap follows the tree structure of the URL space (like the 
> Forrest linkmap).
> * Its responsibility is to map URLs to VPCs
> * It can set the current context for each level in the tree (for 
> dereferencing relative paths used in the VPC)
> * Wildcards can have restrictions based on properties in the content 
> repository
> * It's best-match based rather than rule-order based
> * Of course we have an include construct so that we can reuse sub-sites
> 
> It might look like:
> 
> <sitemap>
>  <path match="person" context="adm/persons" 
> pipeline="block:skin:default(search.xml)">
>    <path match="*:patient" test="mydb:/patients/{patient} exists" 
> context="adm/patients" pipeline="journal-summary({patient})">
>      <path match="edit" pipeline="edit({patient})"/>
>      <path match="list" pipeline="list({patient})"/>
>      <!-- and so on -->
>    </path>
>  </path>
> </sitemap>
> 
> Don't care about the syntactical details in the example; it needs much 
> more thought, I just wanted to make it a little bit more concrete. The 
> path separator "/" is implicitly assumed between the levels. "*:patient" 
> means that the content of "*" can be referred to as "patient".
> 
> Much of what I propose can be achieved with VPCs and a new "property 
> aware" matcher. But IMO the stricter SoC above, the ability to "query" 
> the sitemap, the possible advantages of the "best match" search, are 
> reasons enough to go further.

First thing that comes to mind is that the implicit assumption of '/' is 
just bad. I would be against the proposal just for that.

Second, you lose the ability to do non-URL matching, which is, again, 
another reason to vote against this.

Third, conditional matching is just nonsense, it's mixing flow concerns 
with matching.

Fourth, I don't find the above any more readable than a sitemap that uses 
VPCs.

I'll think about the rule-based pipeline resolution (which is an 
interesting concept in itself) but the rest, I'm sorry, it really does 
not resonate with me at all.

-- 
Stefano.


Re: [RT] Escaping Sitemap Hell

Posted by Reinhard Poetz <re...@apache.org>.
Ugo Cei wrote:
> Il giorno 06/gen/05, alle 01:54, Daniel Fagerstrom ha scritto:
> 
>>
>> The requested resource will often be based on some content or 
>> combination of content that we can access from Cocoon. The content can 
>> be a file, data in a db, result from a business object etc. Let us 
>> assume that it resides in some kind of content repository. Now if we 
>> think about it, isn't it more natural to ask the content that we are 
>> going to use about its properties like type, format, status, access 
>> rights, etc., than to encode them in the URL? These properties can be 
>> encoded in the file name, in metadata in some property file, within 
>> the file, in a DB etc. Now instead of having the rule:
>>
>> *.x.y.z ==> XYZPipeline
>>
>> we have
>>
>> * where repository:{1} has properties {x, y, z} ==> XYZPipeline
>>
>> or
>>
>> * where repository:{1}.x.y.z exists ==> XYZPipeline
>>
>> We get the right pipeline by querying the repository instead of 
>> encoding it in the URL. A further advantage is that the rule becomes 
>> "listable" as the "where" clause in the match expresses the 
>> allowed values for the wildcard.
> 
> 
> Unless I misinterpret what you mean, we already can do this:
> 
> <map:match pattern="*">
>   <map:call function="fun">
>     <map:parameter name="par" value="{1}"/>
>   </map:call>
> </map:match>
> 
> function fun() {
>   var entity = repo.query("select * from entities where id = " + 
> cocoon.parameters.par + " and {x, y, z}");
>   cocoon.sendPage("views/XYZPipeline", { entity : entity });
> }
> 
> <map:match pattern="XYZPipeline">
>   <map:generate type="jx" src="xyz.jx.xml"/>
>  ...
> </map:match>
> 
> Apart from the obviously contrived example, isn't the Flowscript just 
> what we need to "get the right pipeline by querying the repository"?
> 
>> I would propose to go even further: in the "real" site map it should 
>> only be allowed to call VPC pipelines; no pipeline construction is 
>> allowed, that should be done in the component area.
> 
> 
> In my sitemaps, public pipelines contain almost only <map:call> and 
> <map:read> (for static resources) elements. All "classical" 
> generator-transformer-serializer pipelines go into an "internal-only" 
> pipeline that can be called from flowscripts only.
> 
> Admittedly, this is fine for webapps, and maybe not so much for 
> publishing-oriented websites. But what I want to point out is that your 
> otherwise very well thought-out RT is incomplete if it doesn't take 
> Flowscript into consideration, IMHO.


I started to write a very similar answer (example + considering flowscript).
Thanks Ugo ;-)

-- 
Reinhard

Re: [RT] Escaping Sitemap Hell

Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 6, 2005, at 11:12 AM, Daniel Fagerstrom wrote:

>
> How much work you should spend on creating "cool" URLs in your webapp 
> varies of course from application to application. I just wanted to 
> point out that we can do more than 123456.cont if we want to. I have 
> used more than one webapp that goes through many "transactions" during 
> use, but forces me to start the navigation from some start screen if my 
> session expires. At least I would find such applications more 
> user-friendly if they had used "cooler" URIs.

What do you mean by "transactions"? If the "transaction" is only 
"committed" to the Session, then once the session expires there is no 
going back, no matter how cool the URI. If the "transaction" is 
committed to persistent storage then continuation is possible. However, 
using flow to achieve this is dubious at best. Once such a transaction 
is completed it seems wiser to use sendPage and if more control is 
needed to pass it to a different function and create a new continuation 
hierarchy. If a user can return to a site days later and continue 
where they left off, it's likely that the same URI from which they 
started will be the one they use to continue, with the business layer 
determining where they should start. The simple case would be a 
single-page form that can be saved at any point. A more complex case 
would be an eLearning or testing application where each request requires 
the user's place to be saved. In both cases we are talking of a single 
resource that is controlled in the business layer and not in flow. In 
such applications the use of flow is simply to mediate between the 
presentation and the model; i.e. validate input or display a reading 
selection and then present questions to be answered.

No matter how cool the URI, it's the application that determines the 
user-friendliness. If your session expired you will have to log in or 
have a cookie with your data to continue where you left off. Whether 
each page has a unique URI is unimportant as long as the application is 
smart enough to know where you left off.

On a related note, the idea of using 123456.cont as part of a URI that 
a user can see is SOOO UNCOOL. Hide these things in forms so a user 
will never see them. Once a user enters a flow that should be the last 
and only URL they see in the address bar until they exit flow and/or 
choose to move to a new resource.

Holding such a view as to how and when to use flow allows cool URLs 
since the resource essentially is a flow function that controls user 
interaction with the model; in other words, a mini-application.

The point is, as things currently stand, cool URLs are possible even 
considering flow.

>
> If you think that reusable URLs are a good idea, not only in 
> publishing-oriented sites but in webapps as well, you will need to have 
> more external URLs and your webapp will get more in common with 
> publishing-oriented sites.

URL wise there is no reason for them to be different, at least from a 
user perspective. Further, there never was. I don't see the need to 
have more external URIs, just better designed web applications. The 
resource is the application, not an individual page the application 
presents.

To my mind, what a tree based sitemap buys is a better model of the 
site as application (publication or webapp). Because of that it just 
might lead to cooler URLs, but it doesn't guarantee them.

>
>

Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com



A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to 
worry about answers."
- Thomas Pynchon Gravity's Rainbow


Re: [RT] Escaping Sitemap Hell

Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Ugo Cei wrote:

> Il giorno 06/gen/05, alle 01:54, Daniel Fagerstrom ha scritto:
>
>>
>> The requested resource will often be based on some content or 
>> combination of content that we can access from Cocoon. The content 
>> can be a file, data in a db, result from a business object etc. Let 
>> us assume that it resides in some kind of content repository. Now if 
>> we think about it, isn't it more natural to ask the content, that we 
>> are going to use, about its properties like type, format, status, 
>> access rights, etc, than to encode it in the URL? These properties can 
>> be encoded in the file name, in metadata in some property file, 
>> within the file, in a DB etc. Now instead of having the rule:
>>
>> *.x.y.z ==> XYZPipeline
>>
>> we have
>>
>> * where repository:{1} have properties {x, y, z} ==> XYZPipeline
>>
>> or
>>
>> * where repository:{1}.x.y.z exists ==> XYZPipeline
>>
>> We get the right pipeline by querying the repository instead of 
>> encoding it in the URL. A further advantage is that the rule becomes 
>> "listable" as the "where" clause in the match expresses what are the 
>> allowed values for the wildcard.
>
>
> Unless I misinterpret what you mean, we already can do this:
>
> <map:match pattern="*">
>   <map:call function="fun">
>     <map:parameter name="par" value="{1}"/>
>   </map:call>
> </map:match>
>
> function fun() {
>   var entity = repo.query("select * from entities where id = " + 
> cocoon.parameters.par + " and {x, y, z}");
>   cocoon.sendPage("views/XYZPipeline", { entity : entity });
> }
>
> <map:match pattern="XYZPipeline">
>   <map:generate type="jx" src="xyz.jx.xml"/>
>  ...
> </map:match> 

My example was a little bit unclear. What I meant was that you could 
have a number of sitemap rules:

* where repository:{1} have properties {x, y, z} ==> XYZPipeline
* where repository:{1} have properties {a, b, c} ==> ABCPipeline
* where repository:{1} have properties {a, b, d} ==> ABDPipeline

Which rather would correspond to:

<map:match pattern="*" where="repository:{1} have properties {x, y, z}" 
type="property">
   <!-- call XYZPipeline -->
</map:match>
<map:match pattern="*" where="repository:{1} have properties {a, b, c}" 
type="property">
   <!-- call ABCPipeline -->
</map:match>
etc

That would be rather inefficient in the current sequential search based 
sitemap, while it could be efficient in a tree based matcher.
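
To illustrate, here is a minimal sketch in plain JavaScript (all names, 
like matchTree and repositoryProps, are made up for the example; none of 
this is existing Cocoon API) of how the guarded wildcard rules above 
could hang off a single tree node, so that matching means walking to the 
node once and then testing only its "where" guards against repository 
metadata, instead of scanning every match rule in document order:

```javascript
// Pretend repository metadata lookup: resource name -> its properties.
var repositoryProps = {
  "report": ["x", "y", "z"],
  "invoice": ["a", "b", "c"]
};

// One node per URL segment; the wildcard node carries all guarded
// rules for that segment, tested in order.
var matchTree = {
  "*": [
    { props: ["x", "y", "z"], pipeline: "XYZPipeline" },
    { props: ["a", "b", "c"], pipeline: "ABCPipeline" }
  ]
};

// True when the repository entry carries every wanted property.
function hasProps(name, wanted) {
  var actual = repositoryProps[name] || [];
  return wanted.every(function (p) { return actual.indexOf(p) >= 0; });
}

// Walk straight to the wildcard node, then test only its guards.
function resolve(url) {
  var rules = matchTree["*"];
  for (var i = 0; i < rules.length; i++) {
    if (hasProps(url, rules[i].props)) return rules[i].pipeline;
  }
  return null;
}
```

The "listable" property falls out of the same structure: enumerating 
the allowed values of the wildcard is just a query over the guards.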

> Apart from the obviously contrived example, isn't the Flowscript just 
> what we need to "get the right pipeline by querying the repository"? 

You could implement the above example by putting the property based 
switch in a flowscript instead, that's true. My own experience with 
using a flowscript as a switchboard has made me believe that it is an 
anti-pattern that should be avoided. But maybe other people have been 
luckier.

One of my aims with the RT was to make the sitemap more usable as a map 
of the site, by making it tree structured and more declarative. From 
that view, using flowscripts instead of sitemaps is a step in the wrong 
direction, IMO.

>> I would propose to go even further, in the "real" site map it should 
>> only be allowed to call VPC pipelines, no pipeline construction is 
>> allowed, that should be done in the component area.
>
>
> In my sitemaps, public pipelines contain almost only <map:call> and 
> <map:read> (for static resources) elements. All "classical" 
> generator-transformer-serializer pipelines go into an "internal-only" 
> pipeline that can be called from flowscripts only.
>
> Admittedly, this is fine for webapps, and maybe not so much for 
> publishing-oriented websites. But what I want to point out is that 
> your otherwise very well thought-out RT is incomplete if it doesn't 
> take Flowscript in consideration, IMHO.

I hoped that no one would notice, so that we could discuss the 
publishing oriented stuff before handling such complications ;)

But you are completely right, flowscripts must also be discussed. This 
breaks down into two parts: how do we design a cool URL space for a 
flowscript driven webapp, and how do we implement such a URL space in 
Cocoon? Does what we have give good support, or do we need new 
mechanisms?

I'll try to say something about cool URLs for flowscripts and leave the 
second part to another time, or as a rather non-trivial exercise for 
the interested reader ;)

Cool URLs for webapps
=====================

Ok, I continue to uncritically assume that the guidelines in 
http://www.w3.org/TR/2003/NOTE-chips-20030128/ are good and try to 
apply them to the current situation. Given that, I don't consider:

  12345.cont

particularly cool. There is no way back at all after your 
web-continuations have expired. Sometimes that might be the most 
reasonable response available. But we could do better than always 
using it.

When planning the URL space for the webapp, all allowed access points 
to the webapp must be given a "cool" URL. Let us say that we have a 
wizard and that users are only allowed to access it from the first 
screen when they don't have a valid session. In this case we could have:

  wizard

as the URL to the start point and

  wizard?cont=12345

as the URL to a screen within the session. If that URL is bookmarked 
and used after the expiration of the continuation, the user will get a 
permanent redirect to the "wizard" URL as response, possibly with a 
"session expired" message in the response page.

If we follow this idea, we can think of our webapps in a transaction 
oriented way. Each time we have "committed a transaction" in our 
flowscript we could give a "cool" URL for the next screen in the flow 
(if it is allowable as a starting point). Now, this is a little bit 
tricky, as the user probably made a post to something like:

  wizard?cont=23456

where the transaction was committed, and the flow could continue to 
"wizard2" or "wizard3" depending on user choice. This could maybe be 
solved by doing a redirect to the new wizard.
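
As a sketch of that redirect-after-commit idea (plain JavaScript with 
hypothetical helper names; `redirect` stands in for whatever the 
framework uses to send the status, and none of this is Cocoon's actual 
flow API): once the transaction posted to wizard?cont=23456 is 
committed, the handler answers with a permanent redirect to the next 
"cool" URL instead of rendering under the continuation URL:

```javascript
// The flow forks on user choice; each branch has its own cool URL.
function nextUrlAfterCommit(choice) {
  return choice === "advanced" ? "wizard3" : "wizard2";
}

// Handle the POST that commits the transaction, then redirect so the
// continuation id never ends up as a bookmarkable address.
function handleCommit(request, redirect) {
  commitTransaction(request);  // the business layer does the real work
  redirect(301, nextUrlAfterCommit(request.choice));
}

function commitTransaction(request) { /* persisted elsewhere */ }
```

The point of the sketch is only the shape: commit first, then hand the 
browser a stable URL for the next screen.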

If we are creating a persistent object in the wizard, we can take it 
further if we want to. When we start the wizard, the object is given 
an id:

  123456/wizard

then we can go back to a wizard initialized from the object later. If 
we want to use some other identification after the object is committed, 
we have to save a mapping from the initial id to the correct one and do 
a permanent redirect when the URL is accessed. If we don't like going 
to a wizard, we do a permanent redirect to something more relevant. We 
could even let the user go back to specific pages for persistent 
objects:

  123456/wizard/3

For such cases there is not much use in having a continuation id in the 
URL at all, except if we want to let the user have multiple instances 
of the same page with different content.
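
A rough sketch of that id-remapping idea (all names are hypothetical, 
invented for the example, and not any Cocoon API): committed ids are 
kept in a saved mapping to their canonical URLs, so a later hit on the 
old 123456/wizard URL (with or without a page number) becomes a 
permanent redirect, while uncommitted ids reopen the wizard:

```javascript
// Saved mapping from provisional wizard ids to canonical URLs,
// recorded when the object is committed.
var committedIds = { "123456": "reports/2005/annual" };

// Returns either the pipeline to run or a permanent-redirect target.
function route(url) {
  var m = /^(\d+)\/wizard(?:\/(\d+))?$/.exec(url);
  if (!m) return { pipeline: "notFound" };
  var finalUrl = committedIds[m[1]];
  if (finalUrl) return { redirect: 301, location: finalUrl };
  // Not yet committed: reopen the wizard, optionally at page m[2].
  return { pipeline: "wizard", id: m[1], page: m[2] || "1" };
}
```

The mapping could of course live in the repository rather than in 
memory; what matters is that the provisional URL stays answerable 
forever via a 301.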

If we follow the URL style without continuation IDs, we use the session 
for distinguishing between users instead.

                                 --- o0o ---

How much work you should spend on creating "cool" URLs in your webapp 
varies of course from application to application. I just wanted to 
point out that we can do more than 123456.cont if we want to. I have 
used more than one webapp that goes through many "transactions" during 
use, but forces me to start the navigation from some start screen if my 
session expires. At least I would find such applications more 
user-friendly if they had used "cooler" URIs.

If you think that reusable URLs are a good idea, not only in publishing 
oriented sites but in webapps as well, you will need to have more 
external URLs and your webapp will get more in common with publishing 
oriented sites.

/Daniel



RE: [RT] Escaping Sitemap Hell

Posted by Conal Tuohy <co...@paradise.net.nz>.
> Il giorno 06/gen/05, alle 01:54, Daniel Fagerstrom ha scritto:

> > * where repository:{1}.x.y.z exists ==> XYZPipeline
> >
> > We get the right pipeline by querying the repository instead of
> > encoding it in the URL. A further advantage is that the
> rule becomes
> > "listable" as the "where" clause in the match expresses
> what are the
> > allowed values for the wildcard.

Ugo Cei wrote:

> Unless I misinterpret what you mean, we already can do this:
>
> <map:match pattern="*">
>    <map:call function="fun">
>      <map:parameter name="par" value="{1}"/>
>    </map:call>
> </map:match>
>
> function fun() {
>    var entity = repo.query("select * from entities where id = " +
> cocoon.parameters.par + " and {x, y, z}");
>    cocoon.sendPage("views/XYZPipeline", { entity : entity });
> }
>
> <map:match pattern="XYZPipeline">
>    <map:generate type="jx" src="xyz.jx.xml"/>
>   ...
> </map:match>
>
> Apart from the obviously contrived example, isn't the Flowscript just
> what we need to "get the right pipeline by querying the repository"?

I was struck by your example because right now we are revising our website
using the same technique you describe, with a single external pipeline
calling a flowscript. (BTW the revised website isn't public yet but should
be ready next month.) We're using a topic map as the metadata repository
(with TM4J). As in Daniel's example it completely decouples the external URL
space from the URLs of internal pipelines.

In the "external" sitemap, we just marshall a couple of parameters out of
the URL and request headers, and pass them to a flowscript. This is the
first time I've used flowscript, but it has been fairly easy to write and
it's worked pretty well.

The flowscript queries the topic map to find the topic to display, and the
appropriate internal pipeline to use. It also looks up other "scope" topics
which define different viewpoints of the other topics. They are such things
as different languages, and (since this is a digital library application) we
also have "simplified" and "scholarly" scopes. The flowscript traverses the
class-instance and superclass-subclass hierarchies between the topics
looking for a jxtemplate to use (in the appropriate scope). Finally it
passes the "content" topic and the "scope" topics (and a basic ontology of
other topics) to the specified jxtemplate pipeline.

> In my sitemaps, public pipelines contain almost only <map:call> and
> <map:read> (for static resources) elements. All "classical"
> generator-transformer-serializer pipelines go into an "internal-only"
> pipeline that can be called from flowscripts only.
>
> Admittedly, this is fine for webapps, and maybe not so much for
> publishing-oriented websites.

Yes, I think for webapps you could do the mapping just in JavaScript, but
for publishing I think you really need a metadata repository of some
sort. You could use an XML document linkbase, as Daniel suggests, or a
CMS, a topic map, an RDF store, a SQL db, or any number of things.

Con


Re: [RT] Escaping Sitemap Hell

Posted by Ugo Cei <ug...@apache.org>.
Il giorno 06/gen/05, alle 01:54, Daniel Fagerstrom ha scritto:

>
> The requested resource will often be based on some content or 
> combination of content that we can access from Cocoon. The content can 
> be a file, data in a db, result from a business object etc. Let us 
> assume that it resides in some kind of content repository. Now if we 
> think about it, isn't it more natural to ask the content, that we are 
> going to use, about its properties like type, format, status, access 
> rights, etc, than to encode it in the URL? These properties can be 
> encoded in the file name, in metadata in some property file, within 
> the file, in a DB etc. Now instead of having the rule:
>
> *.x.y.z ==> XYZPipeline
>
> we have
>
> * where repository:{1} have properties {x, y, z} ==> XYZPipeline
>
> or
>
> * where repository:{1}.x.y.z exists ==> XYZPipeline
>
> We get the right pipeline by querying the repository instead of 
> encoding it in the URL. A further advantage is that the rule becomes 
> "listable" as the "where" clause in the match expresses what are the 
> allowed values for the wildcard.

Unless I misinterpret what you mean, we already can do this:

<map:match pattern="*">
   <map:call function="fun">
     <map:parameter name="par" value="{1}"/>
   </map:call>
</map:match>

function fun() {
   var entity = repo.query("select * from entities where id = " + 
cocoon.parameters.par + " and {x, y, z}");
   cocoon.sendPage("views/XYZPipeline", { entity : entity });
}

<map:match pattern="XYZPipeline">
   <map:generate type="jx" src="xyz.jx.xml"/>
  ...
</map:match>

Apart from the obviously contrived example, isn't the Flowscript just 
what we need to "get the right pipeline by querying the repository"?

> I would propose to go even further, in the "real" site map it should 
> only be allowed to call VPC pipelines, no pipeline construction is 
> allowed, that should be done in the component area.

In my sitemaps, public pipelines contain almost only <map:call> and 
<map:read> (for static resources) elements. All "classical" 
generator-transformer-serializer pipelines go into an "internal-only" 
pipeline that can be called from flowscripts only.

Admittedly, this is fine for webapps, and maybe not so much for 
publishing-oriented websites. But what I want to point out is that your 
otherwise very well thought-out RT is incomplete if it doesn't take 
Flowscript in consideration, IMHO.

	Ugo

-- 
Ugo Cei - http://beblogging.com/blojsom/blog/

Re: [RT] Escaping Sitemap Hell

Posted by Ralph Goers <Ra...@dslextreme.com>.
Daniel Fagerstrom wrote:

>
>>   The layout actually separates the site layout from the sitemap 
>> quite nicely.
>
>
> Care to explain what I should look for, and in what document I find 
> the info?
>
The easiest way to see what I am talking about is to look at the portal 
sample site.  Look at 
build/webapp/samples/blocks/portal/profiles/layout/portal.xml. This is 
the site layout for after you have logged in. portal-user-anonymous.xml 
is for before you have logged in (technically, you are logged in as user 
anonymous).  This is documented at 
http://cocoon.apache.org/2.1/developing/portal/portal-block.html.

If you have any other questions feel free to ask.


Re: [RT] Escaping Sitemap Hell

Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Ralph Goers wrote:

> Daniel Fagerstrom wrote:
>
>>
>> Is it a Map of the Site?
>> ------------------------
>>
>> The Forrest people don't think that the sitemap is enough as map of 
>> the site. They have a special linkmap [1] that gives a map over the 
>> site and that is used for internal navigation and for creating menu 
>> trees. I have a similar view. From the sitemap it can be hard to 
>> answer basic questions like:
>>
>> * What is the URL space of the application
>> * What is the structure of the URL space
>> * How is the resource referred to by this URL produced
>>
>> The basic view of a URL in the sitemap is that it is any string. Even 
>> if there are constructions like mount, the URL space is not 
>> considered as hierarchical. That means that the URLs can be presented 
>> as patterns in any order in the sitemap and you have to read through 
>> all of it to see if there is a rule for a certain URL.
>>
>> A real map for the site should be tree structured like the linkmap in 
>> forrest. Take a look at the example in [1], (I don't suggest using 
>> the "free form" XML, something stricter is required). Such a tree 
>> model will also help in planning the URI space as it gives a good 
>> overview of it.
>
>
> Out of curiosity, have you looked at the Portal block?

Not until now. But I spent a large part of the night writing my RT to 
be able to understand how everything worked together :/

>   The layout actually separates the site layout from the sitemap quite 
> nicely.

Care to explain what I should look for, and in what document I find the 
info?

>   Unfortunately, the Portal URLs are not cool.  I'm in the process 
> right now of attempting to fix that.

Have you looked at XFrames URLs (http://www.w3.org/TR/xframes/)? They 
address a problem area that should be quite similar to the one you have 
in portals, IIUC:

  http://example.org/home.frm#frames(id1=uri1,id2=uri2,...)

You need to write a parser for the frames part, but I could use that 
both in my attribute template language proposal and in the virtual 
pipeline URLs that I proposed in the current thread, so it is well 
invested time for me if I convince you to write such a parser ;)

/Daniel

> We are finding that exposing the Portal's events in the browser causes 
> JSR-168 portlets to fall on their face simply by reloading the page. 
> Not cool at all.
>
> Ralph
>



Re: [RT] Escaping Sitemap Hell

Posted by Ralph Goers <Ra...@dslextreme.com>.
Daniel Fagerstrom wrote:

>
> Is it a Map of the Site?
> ------------------------
>
> The Forrest people don't think that the sitemap is enough as map of 
> the site. They have a special linkmap [1] that gives a map over the 
> site and that is used for internal navigation and for creating menu 
> trees. I have a similar view. From the sitemap it can be hard to 
> answer basic questions like:
>
> * What is the URL space of the application
> * What is the structure of the URL space
> * How is the resource referred to by this URL produced
>
> The basic view of a URL in the sitemap is that it is any string. Even 
> if there are constructions like mount, the URL space is not considered 
> as hierarchical. That means that the URLs can be presented as patterns 
> in any order in the sitemap and you have to read through all of it to 
> see if there is a rule for a certain URL.
>
> A real map for the site should be tree structured like the linkmap in 
> forrest. Take a look at the example in [1], (I don't suggest using the 
> "free form" XML, something stricter is required). Such a tree model 
> will also help in planning the URI space as it gives a good overview 
> of it.

Out of curiosity, have you looked at the Portal block?  The layout 
actually separates the site layout from the sitemap quite nicely.  
Unfortunately, the Portal URLs are not cool.  I'm in the process right 
now of attempting to fix that. We are finding that exposing the Portal's 
events in the browser causes JSR-168 portlets to fall on their face 
simply by reloading the page. Not cool at all.

Ralph