Posted to dev@cocoon.apache.org by Daniel Fagerstrom <da...@nada.kth.se> on 2005/01/06 01:54:09 UTC
[RT] Escaping Sitemap Hell
(was: Splitting xconf files step 2: the sitemap)
Although the Cocoon sitemap is a really cool innovation it is not
entirely without problems:
* Sitemaps for large webapps easily become a mess
* It sucks as a map describing the site [1]
* It doesn't give that much support for "cool URLs" [2]
In this RT I will try to analyze the situation, especially with respect
to URL space design, and then move on to discuss a possible solution.
Before you enthusiastically dive into the text:
* It is a long RT (as my RTs usually are)
* It might contain provoking and hopefully even thought-provoking ideas
* No, I will not require that everything in it should be part of 2.2
* No, I don't propose that we should scrap the current sitemap; actually
I believe that we should support it for the next few millennia ;)
--- o0o ---
Peter and I had some discussion:
Peter Hunsberger wrote:
> On Tue, 04 Jan 2005 13:25:05 +0100, Daniel Fagerstrom
> <da...@nada.kth.se> wrote:
<snip/>
>> Anyway, sometimes when I need to refactor or add functionality to
>> some of our Cocoon applications, where I or colleagues of mine have
>> written endless sitemaps, I have felt that it would have been nice if
>> the sitemap were more declarative so that I could have
>> asked it basic things like getting a list of what URLs or URL patterns
>> it handles. Also if I have a URL in a large webapp and it doesn't
>> work as expected it can require quite some work to trace through all
>> involved sitemaps to see what rule is actually used.
>
>> Of course I understand that if I used a set of efficient conventions
>> about how to structure my URL space and my sitemaps the problem would
>> be much smaller. Problem is that I haven't found such a set of
>> conventions yet. Of course I'm following some kind of principles, but
>> I don't have anything that I'm completely happy with yet. Anyone
>> having good design patterns for URL space structuring and sitemap
>> structuring that you want to share?
>
>
>We have conventions that use sort of type extensions on the names:
>patient.search, patient.list, patient.edit where the search, list,
>edit (and other) screen patterns are common across many different
>metadata sources (in this case patient). We don't do match *.edit
>directly in the sitemap (any more) but I find that if you've got to
>handle orthogonal concerns then x.y.z naming patterns can sometimes
>help.
>
OK, let's look at this in a more abstract setting:
Resource Aspects
================
In the example above we have an object, or better a _resource_, the
patient that everything else is about. The resource should be
identifiable in a unique way, in this case e.g. with the social security
number.
There are a number of _operations_ that can be performed on the patient
resource: show, edit, list, search etc. (although the search might be on
the set of patients rather than a single one).
The resource has a _type_, patient, that might affect how we choose to
show it etc.
There are in general other aspects that will steer how we render the
response when someone asks for the resource:
* The _format_ of the response: html, pdf, svg, gif etc.
* The _status_ of the resource: old, draft, new etc.
* The _access_ rights of the response: public, team, member etc.
There are plenty of other possible aspect areas as well.
Cool Webapp URLs
================
I searched the web to gain some insights into URL space design. It soon
became clear that I should re-read Tim Berners-Lee's classic, "Cool URIs
don't change" [2]. I must say I wasn't prepared for the shock; I had
completely missed how radical its message was when I read it the
last time. I can also recommend reading [3], a W3C note that codifies the
message from [2] and some other good URI practices into a set of guidelines.
So what is a URI? According to [3]:
A URI is, actually, a /reference to a resource, with fixed and
independent semantics/.
This means that the URI should _always_ reference a specific resource.
Independent semantics means that a social security number is
not enough; it should say that it is a person (from the USA) as well. See
[3] for the philosophical details.
* The URI should be easy to type
* It should not contain too much meaning, especially not about
implementation details
Now I will try to apply the ideas from [2] and [3] to the different resource
aspects mentioned above. When I use words like "should" or "should not"
without any motivation it means that I believe in the motivation from
the gurus in the references ;) I will try to motivate my own ideas ;)
What I'm going to suggest might be quite far from how you design your
URL spaces. It is certainly far from the implementation-detail-plagued
mess that I have created in my own applications.
The Resource
------------
The idea is that a URL identifies a resource. For the patient case
above it could be:
http://myhospital.com/person/123456789
If we use a hierarchical URI space like /person/123456789, the "parent"
URIs, e.g. /person, should also refer to a resource. It's in most cases not
a good idea to put a lot of topic classification effort into the URI
hierarchy. Classifications are not unique and will change according to
changing interests and world view.
Operations
----------
What about the operations on the resource: list, search, edit etc.? I
find the object-oriented style in WebDAV elegant, where you use one URL
together with different HTTP methods to perform different operations.
Sam Ruby also has some interesting ideas about using URLs to identify
"objects" and different SOAP messages for different methods on the
object in his "REST+SOAP" article [4]. But neither ad hoc HTTP methods nor
XML posts seem like good candidates for invoking operations on a
resource in a typical webapp. So maybe something like:
/person/123456789/edit or
/person/123456789.edit or
/person/123456789?operation=edit
is a good idea.
Resource Type
-------------
Should the type of the resource be part of the URI? We probably have to
include some type info in the URL to give it "independent semantics"
(person e.g.). But we should not put types that might change, like
patient, manager, project-leader etc., in the URL. And we should
especially avoid types that only have to do with implementation details,
like what pipeline we want to use for rendering the resource.
Format
------
Cocoon especially shines in handling all the various file name
extensions: .html, .wml, .pdf, .txt, .doc, .jpg, .png, .svg, etc, etc.
But I'm sorry, if you want cool URLs you have to kiss them goodbye as well ;)
It might be a good idea to send an html page to a browser on a PC and a
wml page to a PDA user. But you shouldn't require your user to remember
different URLs for different clients; that's a task for server-driven
content negotiation.
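Server-driven content negotiation could be sketched roughly like this. This is a toy Python model, not any real Cocoon API; the media types and quality values are purely illustrative:

```python
# Toy sketch of server-driven content negotiation: the client's Accept
# header, not the URL, decides which representation of the one "cool"
# URL is served. All formats and names here are illustrative.

def parse_accept(header):
    """Parse an Accept header into (media_type, quality) pairs."""
    entries = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip()
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        entries.append((media_type, q))
    return entries

def negotiate(accept_header, available):
    """Return the available media type the client prefers most."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(accept_header):
        for candidate in available:
            # Exact match, full wildcard, or type/* wildcard
            if media_type in (candidate, "*/*", candidate.split("/")[0] + "/*"):
                if q > best_q:
                    best, best_q = candidate, q
    return best

# A PC browser and a PDA can then share the URL /person/123456789:
print(negotiate("text/html;q=0.9, */*;q=0.1", ["text/html", "text/vnd.wap.wml"]))
# -> text/html
```

The point is only that the choice of representation moves out of the URL and into the request headers, so the URL itself stays stable.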
Using .html is not especially future proof, should all links become
invalid when you decide to reimplement your site with dynamic SVG?
Often it is good to provide the user with a nice printable version of
your page. But why should you advertise Adobe's products in your URLs? A
few years ago it was .ps or .dvi from academic sites and .doc on
commercial sites. Right now it happens to be .pdf, but will that be forever?
Same thing with images: the user doesn't care about the format as long as
it can be shown in the browser (content negotiation), and neither should you
make your content links (or Google's image search) depend on a
particular compression scheme that happens to be popular right now.
There are of course cases where you really want to give your user the
ability to choose a specific format. Then a file name extension is a
good idea. If you happen to maintain
http://www.adobe.com/products/acrobat/ it's OK to put some .pdf there e.g. ;)
But in most cases file name extensions are an implementation detail that
is not relevant for your users.
Status
------
The status will by definition change, and that would make your URL uncool
if the status were part of the URL.
Access Rights
-------------
Access rights will often change for a document. I know it is easy to
write path-dependent rules for access rights in most webserver
configuration files. But you expose irrelevant implementation details
and it's not future proof.
Am I Really Serious?
--------------------
Why should a webapp URL be cool and future proof? Well, it's the
interface to your webapp. We agree that we shouldn't change interfaces
in Cocoon at a whim, so why should we treat the users of our webapps
differently? And like it or not, useful software sometimes lives for
decades. If you build useful webapps you should consider planning ahead.
Currently we are all used to webapps that use the most horrible URLs,
containing tons of implementation details and changing every now and
then. But it is not a law of nature that it must be like that. It is
mainly a result of webapp development still being immature and the tools
being far from perfect. Of course the user should be able to bookmark a
useful form or wizard.
Also I believe that exposing implementation details in ones URLs is at
least as bad as making all member variables public in Java classes. It
makes your webapp monolithic and fragile.
--- o0o ---
You might find the views expressed above rather extreme and maybe
impractical. As indicated above they are also far away from what I
currently do in my webapps. But I have for quite some time thought about
how to fight the all too easily increasing entropy in the webapps we
develop. I have suspected that badly designed URL spaces have been part
of the trouble. And when I re-read Tim BL's classic I suddenly realized
that the habit of exposing implementation in URLs might be at the root
of the evil.
Whether this realization will survive contact with your comments and
other parts of reality is of course too early to tell ;)
Does Cocoon Support Cool URLs?
==============================
But how does Cocoon support the above ideas about URL space design?
Well, in some way one could say that it supports it. The sitemap is so
powerful that you can program most usage patterns in it in some more or
less elegant way. But AFAICS, writing webapps following the URL space
design ideas above would be rather tricky. So I would say that Cocoon
doesn't support it that well. The main reasons are:
* The sitemap is not that useful as a site map
* The sitemap gives excellent support for choosing resource production
implementation based on the implementation details coded into the URL,
but not for avoiding it
* The sitemap mixes site map concerns with resource production
implementation details
Is it a Map of the Site?
------------------------
The Forrest people don't think that the sitemap is enough as a map of the
site. They have a special linkmap [1] that gives a map of the site and
that is used for internal navigation and for creating menu trees. I have
a similar view. From the sitemap it can be hard to answer basic
questions like:
* What is the URL space of the application?
* What is the structure of the URL space?
* How is the resource referred to by this URL produced?
The basic view of a URL in the sitemap is that it is any string. Even if
there are constructions like mount, the URL space is not considered
hierarchical. That means that the URLs can be presented as patterns in
any order in the sitemap, and you have to read through all of it to see
if there is a rule for a certain URL.
A real map of the site should be tree structured like the linkmap in
Forrest. Take a look at the example in [1] (I don't suggest using the
"free form" XML, something stricter is required). Such a tree model will
also help in planning the URI space as it gives a good overview of it.
The Forrest linkmap has no notion of wildcards, which are a must in
Cocoon. More on that below.
Choosing Production Pipeline
----------------------------
With the sitemap it is very easy to choose the pipeline used for
producing the response based on a URL pattern "*.x.y.z". That more or
less forces the user to encode implementation details, i.e. what pipeline
to use, into the URL. This is only a problem for wildcard patterns;
otherwise we just associate the pipeline with the concrete "cool URL".
Before, I suggested that aspects like type, format, status, access
rights etc. shouldn't be part of the URL as those aspects might change
for the resource. OTOH these aspects certainly are necessary for choosing
the rendering pipeline, so what should we do?
The requested resource will often be based on some content or
combination of content that we can access from Cocoon. The content can
be a file, data in a db, the result from a business object etc. Let us
assume that it resides in some kind of content repository. Now if we
think about it, isn't it more natural to ask the content that we are
going to use about its properties, like type, format, status, access
rights etc., than to encode them in the URL? These properties can be
encoded in the file name, in metadata in some property file, within the
file, in a DB etc. Now instead of having the rule:
*.x.y.z ==> XYZPipeline
we have
* where repository:{1} has properties {x, y, z} ==> XYZPipeline
or
* where repository:{1}.x.y.z exists ==> XYZPipeline
We get the right pipeline by querying the repository instead of encoding
it in the URL. A further advantage is that the rule becomes "listable"
as the "where" clause in the match expresses what are the allowed values
for the wildcard.
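Such a property-based rule could be sketched roughly like so. This is a toy Python model only; the repository contents, property names and pipeline names are all invented, not part of any real Cocoon matcher:

```python
# Toy sketch of a "property aware" matcher: instead of encoding x.y.z in
# the URL, we look up the matched resource in a repository and choose
# the pipeline from its stored properties. The dict stands in for a
# real content store; every name below is invented.

repository = {
    "report-2005": {"type": "report", "status": "draft"},
    "logo":        {"type": "image"},
}

rules = [
    # (required properties of repository:{1}, pipeline to use)
    ({"type": "report", "status": "draft"}, "DraftReportPipeline"),
    ({"type": "report"},                    "ReportPipeline"),
    ({"type": "image"},                     "ImagePipeline"),
]

def choose_pipeline(url):
    """Match '*' against the URL, then pick a pipeline by content properties."""
    name = url.lstrip("/")           # the '*' wildcard captures the whole path
    props = repository.get(name)
    if props is None:
        return None                  # nothing in the repository: no match at all
    for required, pipeline in rules:
        if all(props.get(k) == v for k, v in required.items()):
            return pipeline
    return None

print(choose_pipeline("/report-2005"))  # -> DraftReportPipeline
print(choose_pipeline("/logo"))         # -> ImagePipeline
```

Note how the "listable" property falls out for free: the set of URLs a rule accepts is exactly the set of repository entries whose properties satisfy its clause.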
Separating the Concerns
-----------------------
The sitemap handles two concerns: it maps a URL to a pipeline that
produces a response, and it describes how to put together this pipeline
from sitemap components. The first concern is related to site design and
the second is more a form of programming. Putting them together makes it
hard to see the URL structure and also makes it tempting to group URLs
based on common pipeline implementation instead of on site structure.
Virtual Pipeline Components (VPCs) give us a way out of this. Large
parts of our sites might be buildable with pipelines already
constructed in some standard blocks.
I would propose to go even further: in the "real" site map it should
only be allowed to call VPC pipelines. No pipeline construction is
allowed there; that should be done in the component area.
In the "real" site map the current context is set and the arguments
to the called VPC are given.
Search Order
------------
> The problem for us, is as you allude to at the start of this
>thread: Cocoon takes the first match, where what you really want is a
>more XSLT "best match" type of handling; sometimes *.a, *.b, *.c works
>and other times it's m.*, n.*, o.*...
>
>In the past that has lead me to suggest a sort of XSLT flow, but
>thinking about it in this light I wonder if what I really want is just
>XSLT sitemap matching (same thing in the end)...
>
>
I also believe that a "best match" type of handling is preferable. It
increases usability IMO, and it also makes it possible to use tree-based
matching algorithms that are far more efficient than the current linear
search.
The new sitemap
===============
To sum up the proposal:
Pipelines:
* Pipeline construction is only done as VPCs in component areas (often
in blocks).
Sitemap:
* The sitemap follows the tree structure of the URL space (like the
Forrest linkmap).
* Its responsibility is to map URLs to VPCs
* It can set the current context for each level in the tree (for
dereferencing relative paths used in the VPC)
* Wildcards can have restrictions based on properties in the content
repository
* It's best-match based rather than rule-order based
* Of course we have an include construct so that we can reuse sub sites
It might look like:
<sitemap>
  <path match="person" context="adm/persons"
        pipeline="block:skin:default(search.xml)">
    <path match="*:patient" test="mydb:/patients/{patient} exists"
          context="adm/patients" pipeline="journal-summary({patient})">
      <path match="edit" pipeline="edit({patient})"/>
      <path match="list" pipeline="list({patient})"/>
      <!-- and so on -->
    </path>
  </path>
</sitemap>
Don't worry about the syntactical details in the example, it needs much
more thought; I just wanted to make it a little bit more concrete. The
path separator "/" is implicitly assumed between the levels. "*:patient"
means that the content of "*" can be referred to as "patient".
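As a toy model of how such a tree could be walked, here is a Python sketch. All names are invented and the "test" clauses are omitted; it only illustrates the segment-by-segment, deepest-match dispatch with named wildcards:

```python
# Toy model of the proposed tree sitemap: each node matches one path
# segment (a literal or a named wildcard) and maps to a pipeline call.
# The structure mirrors the <path> example above; all names are invented.

sitemap = {
    "match": "person", "pipeline": "block:skin:default(search.xml)",
    "children": [{
        "match": "*:patient", "pipeline": "journal-summary({patient})",
        "children": [
            {"match": "edit", "pipeline": "edit({patient})", "children": []},
            {"match": "list", "pipeline": "list({patient})", "children": []},
        ],
    }],
}

def resolve(node, segments, bindings=None):
    """Walk the tree segment by segment; wildcards bind their value."""
    bindings = dict(bindings or {})
    match = node["match"]
    if match.startswith("*:"):
        bindings[match[2:]] = segments[0]    # wildcard: capture the segment
    elif match != segments[0]:
        return None                          # literal segment does not match
    rest = segments[1:]
    if not rest:                             # deepest (best) match wins
        return node["pipeline"].format(**bindings)
    for child in node["children"]:
        result = resolve(child, rest, bindings)
        if result is not None:
            return result
    return None

print(resolve(sitemap, "person/123456789/edit".split("/")))
# -> edit(123456789)
```

Because matching follows the tree rather than a rule list, the URL space can be enumerated by simply traversing the nodes, which is exactly the "listable" sitemap asked for above.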
Much of what I propose can be achieved with VPCs and a new "property
aware" matcher. But IMO the stricter SoC above, the ability to "query"
the sitemap, and the possible advantages of the "best match" search are
reasons enough to go further.
WDYT?
/Daniel
[1] "site.xml" http://forrest.apache.org/docs/dev/linking.html
[2] "Cool URIs don't change", http://www.w3.org/Provider/Style/URI.html
[3] "Common HTTP Implementation problems"
http://www.w3.org/TR/2003/NOTE-chips-20030128/
[4] "REST + SOAP"
http://www.intertwingly.net/stories/2002/07/20/restSoap.html
Re: [RT] Escaping Sitemap Hell
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Thursday 06 January 2005 19:37, Daniel Fagerstrom wrote:
> >As a sidenote; Tapestry is battling with these types of problems as well
> > (but a somewhat different level), and handles it by allowing an
> > interceptor in the URL encoding/decoding phase.
>
> Do you have any references that summarize that?
Summarize is perhaps not the word, but here is a complete guide to how it is done:
http://wiki.apache.org/jakarta-tapestry/FriendlyUrls
Cheers
Niclas
--
---------------
All those who believe in psychokinesis, raise my hand.
- Steven Wright
+---------//-------------------+
| http://www.dpml.net |
| http://niclas.hedhman.org |
+------//----------------------+
Re: [RT] Escaping Sitemap Hell
Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Niclas Hedhman wrote:
>On Thursday 06 January 2005 08:54, Daniel Fagerstrom wrote:
>
>Good post !
>
Thanks :)
>>But you shouldn't require your user to remember
>>different URLs for different clients; that's a task for server-driven
>>content negotiation.
>>
>>Using .html is not especially future proof, should all links become
>>invalid when you decide to reimplement your site with dynamic SVG?
>>
>>
>
>I have thought about this a million times over the last 5 years, and first
>concluded "yeah, let's do that", and then backtracked since the file system
>is not 'negotiating' and having the same stuff working locally is always a
>big plus.
>
Are you referring to the Forrest kind of situation where you can generate
a static "site" on your hard disk and access it directly?
Even in such situations you could, at least in principle, do format
negotiation by letting the "cool" file URL point to an html page that has
alternate media type links,
http://www.w3.org/TR/REC-html40/struct/links.html#h-12.3,
http://www.w3.org/TR/REC-html40/types.html#type-media-descriptors. Then
the media type links in turn point to the screen, tty, projection,
handheld, print etc. page that the browser can use, depending on its
preferences. The alternate media pages could in turn have a bookmark link
that points to the cool URL. I don't know if this works well in practice
with common browsers.
As I'm rather fed up with the URL space hell that I have created in my
own applications, I chose to take a fundamentalist view concerning the
"fixed and independent semantics" of URLs to see where it leads. And as
you could see in the second half of my post, it is not only some file
systems that have limited support for cool URLs; AFAIK Cocoon's support
is rather limited as well.
Now in practice I think that URL space design has similarities with API
design. For some externally used interfaces and URLs, the cost of change
is very high as many users depend on them; for internal interfaces and
URLs the cost of change is much lower, so you don't need to care that
much about the design. But exposing lots of internal implementation
details in your APIs and URL spaces is often asking for trouble.
>>But it is not a law of nature that it must be like that. It is
>>mainly a result of webapp development still being immature and
>>the tools being far from perfect. Of course the user should be
>>able to bookmark a useful form or wizard.
>>
>>
>
>Now you have the interesting thing of 'temporary URLs' used for session
>sensitivity. How often does one bookmark a page and later come back to a
>"Session has expired" type of resource not found?
>It would be really cool if the web app system could help dramatically in
>this field.
>
The W3C note I referred to has some guidelines:
http://www.w3.org/TR/2003/NOTE-chips-20030128/#gl3.
My view is that we should strive to make it as easy as possible to
follow good web practices (like those described in
http://www.w3.org/TR/2003/NOTE-chips-20030128/) when using Cocoon.
>As a sidenote; Tapestry is battling with these types of problems as well (but
>a somewhat different level), and handles it by allowing an interceptor in the
>URL encoding/decoding phase.
>
Do you have any references that summarize that?
> The internal URL space remains, and a
>user-created component translates URLs back and forth between the public
>space and the internal one.
>
That is similar to what I propose. The new tree structured sitemap
translates public URLs to an internal space of (virtual) pipeline calls.
Due to Cocoon's dynamic nature the internal "URL space" is more abstract
than file reference URLs, but we can (and probably should) make it a
URL space anyway.
The XFrames URIs http://www.w3.org/TR/xframes/ can serve as inspiration:
http://example.org/home.frm#frames(id1=uri1,id2=uri2,...)
We can have:
block:myblock#bar-chart-view(data=mydata.xml)
Where relative URLs ("mydata.xml" in this case) are relative to the
current context (actually it is a little bit more complicated,
http://marc.theaimsgroup.com/?t=110064560900003&r=1&w=2). The current
context can be set at each level in the tree sitemap.
Thanks to the declarative nature of my proposed tree sitemap, and since
the wildcards are "typed" so that their range can be found, the sitemap
can be inverted into a mapping from internal URLs to external ones.
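A minimal sketch of such an inversion, in Python. The pattern table, the block URI syntax and the wildcard name are all invented for illustration; the point is only that named wildcards make one declarative table usable in both directions:

```python
# Sketch of inverting a declarative URL mapping. Because every wildcard
# is named ("typed"), one table can translate external URLs to internal
# pipeline calls and back again. All patterns here are invented.

import re

TABLE = [
    # (external pattern, internal pattern); {patient} is the named wildcard
    ("person/{patient}/edit", "block:adm#edit({patient})"),
    ("person/{patient}",      "block:adm#journal-summary({patient})"),
]

def pattern_to_regex(pattern):
    """Compile e.g. "person/{patient}" into a regex with a named group."""
    out = "^"
    for part in re.split(r"(\{\w+\})", pattern):
        if part.startswith("{") and part.endswith("}"):
            out += "(?P<%s>[^/()#]+)" % part[1:-1]
        else:
            out += re.escape(part)
    return out + "$"

def translate(value, src, dst):
    """Match value against column src of the table, emit column dst."""
    for row in TABLE:
        m = re.match(pattern_to_regex(row[src]), value)
        if m:
            return row[dst].format(**m.groupdict())
    return None

def external_to_internal(url):
    return translate(url, 0, 1)

def internal_to_external(call):
    return translate(call, 1, 0)

print(external_to_internal("person/123456789/edit"))
# -> block:adm#edit(123456789)
print(internal_to_external("block:adm#journal-summary(123456789)"))
# -> person/123456789
```

The inverse direction is what would let a webapp emit cool external links for its internal pipeline calls automatically.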
/Daniel
Re: [RT] Escaping Sitemap Hell
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Thursday 06 January 2005 08:54, Daniel Fagerstrom wrote:
Good post !
> But you shouldn't require your user to remember
> different URLs for different clients; that's a task for server-driven
> content negotiation.
>
> Using .html is not especially future proof, should all links become
> invalid when you decide to reimplement your site with dynamic SVG?
I have thought about this a million times over the last 5 years, and first
concluded "yeah, let's do that", and then backtracked since the file system
is not 'negotiating' and having the same stuff working locally is always a
big plus.
> But it is not a law of nature that it must be like that. It is
> mainly a result of webapp development still being immature and
> the tools being far from perfect. Of course the user should be
> able to bookmark a useful form or wizard.
Now you have the interesting thing of 'temporary URLs' used for session
sensitivity. How often does one bookmark a page and later come back to a
"Session has expired" type of resource not found?
It would be really cool if the web app system could help dramatically in
this field.
As a sidenote; Tapestry is battling with these types of problems as well (but
a somewhat different level), and handles it by allowing an interceptor in the
URL encoding/decoding phase. The internal URL space remains, and a
user-created component translates URLs back and forth between the public
space and the internal one.
Cheers
Niclas
--
---------------
All those who believe in psychokinesis, raise my hand.
- Steven Wright
+---------//-------------------+
| http://www.dpml.net |
| http://niclas.hedhman.org |
+------//----------------------+
Re: [RT] Escaping Sitemap Hell
Posted by Peter Hunsberger <pe...@gmail.com>.
On Thu, 06 Jan 2005 01:54:09 +0100, Daniel Fagerstrom
<da...@nada.kth.se> wrote:
> (was: Splitting xconf files step 2: the sitemap)
>
Interesting thoughts. I didn't get much sleep last night, so excuse me
if any of the following comes across as a bit grumpy; no flames or
criticism are intended...
(Excuse the random un-annotated snips, I don't have much time today.)
> Although the Cocoon sitemap is a really cool innovation it is not
> entirely without problems:
>
> * Sitemaps for large webapps easily become a mess
> * It sucks as a map describing the site [1]
> * It doesn't give that much support for "cool URLs" [2]
Except for the issue of matching, which you get into later, I'm not
sure that I see a big problem here:
- sitemaps for large webapps only become a mess if you design the URL
space badly. Better support for matching could help, and that issue
might be tied to how the sitemap is managed within Cocoon, but my gut
feel says it's just a matcher issue;
- a Cocoon sitemap is not a Website site map, nor should it be. Maybe
the name "sitemap" isn't the best? I think having the ability to
produce a "URL Map" from something that crawls the sitemap is maybe
part of the answer here.
- I don't think Cool URLs are supported or hindered by the
sitemap? Maybe more on that below.
<snip/>
> The Resource
> ------------
>
> The idea is that an URL identifies a resource. For the patient case
> above it could be:
>
> http://myhospital.com/person/123456789
>
> If we use a hierarchical URI space like /person/123456789, the "parent"
> URIs, e.g. /person, should also refer to a resource. It's in most cases not
> a good idea to put a lot of topic classification effort into the URI
> hierarchy. Classifications are not unique and will change according to
> changing interests and world view.
I tend to map my URL hierarchy to an abstract object hierarchy. In
this case patient can be thought of as an object. The patient with
the id 123456789 is an instance, and as such that identifier should not
be part of the URL hierarchy. Instead, object attributes become
request parameters. This may seem like a quibble but I think it's
important as one tries to figure out good rules for hierarchy mapping.
>
> Operations
> ----------
>
> What about the operations on the resource: list, search, edit etc.? I
> find the object-oriented style in WebDAV elegant, where you use one URL
> together with different HTTP methods to perform different operations.
>
> Sam Ruby also has some interesting ideas about using URLs to identify
> "objects" and different SOAP messages for different methods on the
> object in his "REST+SOAP" article [4]. But neither ad hoc HTTP methods nor
> XML posts seem like good candidates for invoking operations on a
> resource in a typical webapp. So maybe something like:
>
> /person/123456789/edit or
> /person/123456789.edit or
> /person/123456789?operation=edit
OK, next rule: operations map to methods, not to hierarchy or to
attributes. That's why I introduce the "." notation: to
separate the concerns. Thus, so far the general rule is:
location_hierarchy/../resource.operation?attributes
Your mileage may vary, but it works for us...
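That general rule could be decomposed mechanically, e.g. as in this Python sketch. The example URL, the field names and the hierarchy segments are invented; it only shows how the convention separates the four concerns:

```python
# Sketch of decomposing the naming convention
# location_hierarchy/.../resource.operation?attributes into its parts.
# The example URL and all field names are invented for illustration.

from urllib.parse import urlsplit, parse_qs

def decompose(url):
    """Split a URL into hierarchy, resource, operation and attributes."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    last = segments[-1]
    resource, _, operation = last.partition(".")
    return {
        "hierarchy": segments[:-1],            # location_hierarchy/...
        "resource": resource,                  # e.g. "patient"
        "operation": operation or None,        # e.g. "edit"; None if absent
        "attributes": parse_qs(parts.query),   # request parameters
    }

print(decompose("clinic/records/patient.edit?id=123456789"))
# -> {'hierarchy': ['clinic', 'records'], 'resource': 'patient',
#     'operation': 'edit', 'attributes': {'id': ['123456789']}}
```

The attraction of the convention is precisely that this split is unambiguous: hierarchy, resource, operation and instance attributes each have their own syntactic slot.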
>
> Resource Type
> -------------
>
> Should the type of the resource be part of the URI? We probably have to
> include some type info in the URL to give it "independent semantics"
> (person e.g.). But we should not put types that might change, like
> patient, manager, project-leader etc., in the URL. And we should
> especially avoid types that only have to do with implementation details,
> like what pipeline we want to use for rendering the resource.
Keep the objects abstract :-)...
>
> Format
> ------
>
> Cocoon especially shines in handling all the various file name
> extensions: .html, .wml, .pdf, .txt, .doc, .jpg, .png, .svg, etc, etc.
> But I'm sorry, if you want cool URLs you have to kiss them goodbye as well ;)
<snip/>
> But in most cases file name extensions are an implementation detail that
> is not relevant for your users.
Makes sense; in our case much of the time operations and format are
interchangeable and there may be defaults. Thus:
x/y/z/patient?id=12345
might be the same as:
x/y/z/patient.html?12345
Don't know if it helps, but you can think of format as an object
method to get a specific output format...
<snip/>
> Does Cocoon Support Cool URLs?
> ==============================
>
> But how does Cocoon support the above ideas about URL space design?
>
> Well, in some way one could say that it supports it. The sitemap is so
> powerful that you can program most usage patterns in it in some more or
> less elegant way. But AFAICS, writing webapps following the URL space
> design ideas above would be rather tricky. So I would say that Cocoon
> doesn't support it that well. The main reasons are:
>
> * The sitemap is not that useful as a site map
> * The sitemap gives excellent support for choosing resource production
> implementation based on the implementation details coded into the URL,
> but not for avoiding it
> * The sitemap mixes site map concerns with resource production
> implementation details
I think this all goes back to my initial comments: need a "URL Map"
producer and better matching?
<snip/>
> Before, I suggested that aspects like type, format, status, access
> rights etc. shouldn't be part of the URL as those aspects might change
> for the resource. OTOH these aspects certainly are necessary for choosing
> the rendering pipeline, so what should we do?
This is where the difference between resource publishing and browser
based application support starts to matter. It seems to me that maybe
you are coming at this more from an application support perspective
than a resource publishing perspective? If so, I'll note that you want
to separate the issue of layout from the choice of pipeline. In
particular, in an application it's better to have the URL behave in a
friendly way than to stay "cool" at all times. For example:
x/y/z/patient?id=123*
might produce a search layout if no match was found, a list layout if
multiple matches were found, or an edit layout if a single match was
found. All of these might be considered different renderings of a
single "resource". One might (perhaps rightfully) think that I'm
stretching TBL's definition of resource, but consider that if there
was no application behind the scenes (and thus the data never changed)
then this URL would turn "cool" since it would always produce the same
output.
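That dispatch-by-match-count idea could be sketched as follows. The patient table, ids and layout names are invented; it only illustrates choosing the layout from the number of matching instances rather than from the URL:

```python
# Sketch of the "friendly URL" idea: x/y/z/patient?id=123* picks its
# layout from how many instances match, not from anything in the URL.
# The patient table and the layout names are invented.

from fnmatch import fnmatch

patients = {"1230": "Alice", "1231": "Bob", "9999": "Carol"}

def layout_for(id_pattern):
    """Search, list or edit layout depending on how many ids match."""
    matches = [pid for pid in sorted(patients) if fnmatch(pid, id_pattern)]
    if not matches:
        return ("search", [])        # no match: offer a search form
    if len(matches) > 1:
        return ("list", matches)     # several: show a pick list
    return ("edit", matches)         # exactly one: edit it directly

print(layout_for("123*"))   # -> ('list', ['1230', '1231'])
print(layout_for("9999"))   # -> ('edit', ['9999'])
print(layout_for("555*"))   # -> ('search', [])
```

With static data the function is deterministic, which is exactly the observation above: the URL "turns cool" as soon as the data stops changing.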
<snip/>
>
> The new sitemap
> ===========
>
> To sum up the proposal:
>
> Pipelines:
> * Pipeline construction is only done as VPCs in component areas (often
> in blocks).
As has been pointed out, flow also gives you a way to separate these concerns...
> Sitemap:
> * The sitemap follows the tree structure of the URL space (like the
> Forrest linkmap).
> * Its responsibility is to map URLs to VPCs
> * It can set the current context for each level in the tree (for
> dereferencing relative paths used in the VPC)
> * Wildcards can have restrictions based on properties in the content
> repository
> * It's best-match based rather than rule-order based
> * Of course we have an include construct so that we can reuse sub sites
>
> It might look like:
>
> <sitemap>
> <path match="person" context="adm/persons"
> pipeline="block:skin:default(search.xml)">
> <path match="*:patient" test="mydb:/patients/{patient} exists"
> context="adm/patients" pipeline="journal-summary({patient})">
> <path match="edit" pipeline="edit({patient})"/>
> <path match="list" pipeline="list({patient})"/>
> <!-- and so on -->
> </path>
> </path>
> </sitemap>
I think you're reinventing flow in XML that isn't even XSLT. I'd
prefer an XSLT-based sitemap :-)
<snip>
>
> Much of what I propose can be achieved with VPCs and a new "property
> aware" matcher. But IMO the stricter SoC above, the ability to "query"
> the sitemap, the possible advantages of the "best match" search, are
> reasons enough to go further.
Better matching would help, but again flow can already do most of what
you describe. The ability to query the sitemap to get a link map/url
map might be the most important thing.
I'll likely be offline for the next couple of days, with small odds of
being able to contribute further to this thread. Good luck :-)
--
Peter Hunsberger
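[Editorial note: the tree-structured, best-match sitemap Daniel sketches above could look something like the following. This is a hypothetical sketch in plain JavaScript: the tree and pipeline names mirror his <path> example, but the lookup logic — literal segments beating wildcards, deepest pipeline winning — is an assumption, not anything Cocoon ships.]

```javascript
// Hypothetical sketch: best-match lookup in a tree-structured sitemap.
// The tree mirrors Daniel's <path> example; the traversal rules are assumed.
const sitemap = {
  person: {
    pipeline: "block:skin:default(search.xml)",
    children: {
      "*:patient": {
        pipeline: "journal-summary({patient})",
        children: {
          edit: { pipeline: "edit({patient})" },
          list: { pipeline: "list({patient})" }
        }
      }
    }
  }
};

// Walk the tree, preferring literal segment matches over wildcards,
// and remember the deepest node carrying a pipeline ("best match").
function resolve(url) {
  let node = { children: sitemap };
  let best = null;
  const vars = {};
  for (const segment of url.split("/")) {
    const children = node.children || {};
    let next = children[segment];            // literal match wins
    if (!next) {
      const wild = Object.keys(children).find(k => k.startsWith("*:"));
      if (wild) {
        vars[wild.slice(2)] = segment;       // bind the wildcard variable
        next = children[wild];
      }
    }
    if (!next) break;
    node = next;
    if (node.pipeline) best = node.pipeline;
  }
  if (!best) return null;
  // substitute bound variables such as {patient}
  return best.replace(/\{(\w+)\}/g, (_, name) => vars[name] ?? "");
}
```

For instance, resolve("person/p17/edit") yields "edit(p17)" no matter in which order the rules were declared — which is the point of best-match over rule-order semantics.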
Re: [RT] Escaping Sitemap Hell
Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 12, 2005, at 3:20 PM, Peter Hunsberger wrote:
> On Wed, 12 Jan 2005 13:59:41 -0600, Glen Ezkovich <gl...@hard-bop.com>
> wrote:
>>
>>
>> I'm just curious as to what type of information is contained in your URLs
>> that enables continuation and what the applications do first when a
>> user continues.
>
> It varies, but basically:
>
> location/screen.type?parameters
>
> where parameters are request parameters that vary by screen and type
> and qualify the instance of the screen. We sometimes add a third
> qualifier to type but it's not involved in session recovery that I
> know of.
>
> As mentioned earlier, this essentially maps to:
>
> package.object.method( parameters )
>
> on the front end, but that's not 100%, flow script can mangle the
> mapping completely in some cases and even in the generic case there's
> a lookup substitution.
Thanks. I should have gone back and reread the thread. Sorry for the
inconvenience.
Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com
A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to
worry about answers."
- Thomas Pynchon Gravity's Rainbow
Re: [RT] Escaping Sitemap Hell
Posted by Peter Hunsberger <pe...@gmail.com>.
On Wed, 12 Jan 2005 13:59:41 -0600, Glen Ezkovich <gl...@hard-bop.com> wrote:
>
>
> I'm just curious as to what type of information is contained in your URLs
> that enables continuation and what the applications do first when a
> user continues.
It varies, but basically:
location/screen.type?parameters
where parameters are request parameters that vary by screen and type
and qualify the instance of the screen. We sometimes add a third
qualifier to type but it's not involved in session recovery that I
know of.
As mentioned earlier, this essentially maps to:
package.object.method( parameters )
on the front end, but that's not 100%, flow script can mangle the
mapping completely in some cases and even in the generic case there's
a lookup substitution.
--
Peter Hunsberger
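[Editorial note: Peter's convention — location/screen.type?parameters mapping to package.object.method(parameters) — can be sketched as below. All concrete segment names are invented for illustration; only the shape of the mapping comes from his description.]

```javascript
// Hypothetical sketch of the URL convention described above:
//   location/screen.type?parameters  -->  package.object.method(parameters)
function parseAppUrl(url) {
  const [path, query = ""] = url.split("?");
  const segments = path.split("/");
  const last = segments.pop();                 // e.g. "patient.edit"
  const [screen, type] = last.split(".");
  const params = {};
  for (const pair of query.split("&").filter(Boolean)) {
    const [k, v] = pair.split("=");
    params[k] = decodeURIComponent(v ?? "");
  }
  return {
    location: segments.join("/"),
    screen,
    type,
    params,
    // the front-end mapping: package.object.method(parameters)
    call: `${segments.join(".")}.${screen}.${type}`
  };
}
```

A URL such as "clinic/records/patient.edit?id=17" would then resolve to the call "clinic.records.patient.edit" with {id: "17"} as parameters — which is also what makes timed-out session resumption from the URL alone plausible.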
Re: [RT] Escaping Sitemap Hell
Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 12, 2005, at 12:41 PM, Peter Hunsberger wrote:
>
>> At a single session level I think you can keep this information
>> private
>> by using forms. Users will not see the query that contains the data
>> you
>> are tracking. Sure the user can't come back after closing their
>> browser
>> and pick up where they left off, but you really didn't design your
>> application for this.
>
> Sure, if you don't need any adoption/resumption of the state across
> browsers then form variables work fine. However, we _do_ design our
> applications for resumption upon resumed sessions.
I did realize that you do design for resumption. Hence the use of
anchors instead of forms.
> In particular, we
> do timed out session recovery/resumption after reauthentication
I'm just curious as to what type of information is contained in your URLs
that enables continuation and what the applications do first when a
user continues.
Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com
A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to
worry about answers."
- Thomas Pynchon Gravity's Rainbow
Re: [RT] Escaping Sitemap Hell
Posted by Peter Hunsberger <pe...@gmail.com>.
On Wed, 12 Jan 2005 12:04:38 -0600, Glen Ezkovich <gl...@hard-bop.com> wrote:
>
> On Jan 12, 2005, at 9:17 AM, Peter Hunsberger wrote:
>
> > On Sat, 8 Jan 2005 18:48:29 -0600, Glen Ezkovich <gl...@hard-bop.com>
> > wrote:
> >
> > <snip>stuff everyone seems to agree on</snip>
> >
> >>> You've got to allow for variations on
> >>> authorizations, error handling, timeouts, resumed sessions, etc.
> >>
> >> These do not have to be public URLs. All of these things are internal
> >> to the application. If you are providing your users with URLs for
> >> these
> >> things you should ask yourself if there is a better way to handle
> >> this.
> >
> > The problem is that apps often need to track some kind of state. Any
> > one of the above can affect state. In some cases you can't rely on
> > session (or cookies) and URL becomes the easiest fall back.
>
> Easiest, not necessarily the best. As always it depends on your
> resources and use cases as to which solution is best. An application
> designed to handle authorizations, time outs and/or resumed sessions
> probably should not depend on URLs to accomplish these tasks.
Certainly not. However, URLs can still be a convenient, even good,
way to encode state, in particular if the state representation is
valid across multiple other concerns.
> At a single session level I think you can keep this information private
> by using forms. Users will not see the query that contains the data you
> are tracking. Sure the user can't come back after closing their browser
> and pick up where they left off, but you really didn't design your
> application for this.
Sure, if you don't need any adoption/resumption of the state across
browsers then form variables work fine. However, we _do_ design our
applications for resumption upon resumed sessions. In particular, we
do timed out session recovery/resumption after reauthentication.
<snip/>
> >
> > I think there are some overall patterns that can be useful for
> > probably 80% of the use cases. Stefano seems to believe it's not worth
> > trying to build anything directly into Cocoon to handle these.
>
> I think his argument is stronger than that. If you implement a sitemap
> structure that maps perfectly to 80% of the use cases, there is a good
> chance that the other 20% would be impossible to achieve using that
> structure or at the very least require considerable hacking.
Yes: but that's exactly the current situation. The reality of the Web
is that it's a graph. Graph theory tells us that there's no
algorithm for generalized graph traversal that will be guaranteed to
finish in finite time for all graphs. So, no single solution can work
for all Cocoon apps. So far Cocoon has three ways (at least) to
attack the problem of customized graph traversal:
1) the sitemap with the default matchers;
2) pluggable matchers;
3) flow.
There seems to be very little exploration of 2); as has been kicked
around in this thread, at least part of the solution may lie in some
new types of matchers.
For some reason some people seem to want to avoid 3), even with Java
flow (as opposed to flow script). The other issue here is that 3)
currently makes it hard to get back a URL-map, so there are problems with
debugging (and arguably potential security exposures).
What I'd like is something new that would attack 2) and 3)
simultaneously: an XSLT version of the sitemap. However, as I said,
I've argued for this before and I still don't think I can make a
convincing case that it would solve anyone's problems other than my own...
<snip/>
> Matching is different from restructuring the sitemap. I don't believe
> Stefano would have a problem with a TreeMatcher. But I could be wrong.
> :-0
It's not clear that a tree matcher can be implemented with the current
sitemap: either you match or you don't, there's no way of "voting" on
how well you match. I think to get a true TreeMatcher you'd also need
to change some parts of the Cocoon internals (to evaluate what a best
match means or to track past match context). IOW, you'd need something
other than the current sitemap (even if the syntax remains backwards
compatible)...
--
Peter Hunsberger
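[Editorial note: the "voting" matcher Peter says the current sitemap lacks — where patterns report how well they match rather than a plain yes/no — might be sketched as follows. The scoring rule (literal segments beat wildcards, more specific beats less specific) is an assumption about what "best match" would mean; nothing like this exists in the current sitemap engine.]

```javascript
// Hypothetical sketch of "voting" matchers: instead of the first boolean
// match winning, every pattern reports a score and the best one is chosen.
// Scoring rule (an assumption): literal segments score 2, wildcards 1,
// and a pattern that cannot match at all scores -1.
function score(pattern, url) {
  const p = pattern.split("/");
  const u = url.split("/");
  if (p.length !== u.length) return -1;        // no match at all
  let s = 0;
  for (let i = 0; i < p.length; i++) {
    if (p[i] === u[i]) s += 2;                 // literal segment
    else if (p[i] === "*") s += 1;             // wildcard segment
    else return -1;
  }
  return s;
}

// Pick the highest-scoring pattern, independent of declaration order.
function bestMatch(patterns, url) {
  let best = null;
  let bestScore = -1;
  for (const pattern of patterns) {
    const s = score(pattern, url);
    if (s > bestScore) {
      bestScore = s;
      best = pattern;
    }
  }
  return best;
}
```

With this, "patients/edit" beats both "patients/*" and "*/edit" regardless of where each rule appears — which is exactly the match-quality tracking Peter suggests would require changes to Cocoon's internals.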
Re: [RT] Escaping Sitemap Hell
Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 12, 2005, at 9:17 AM, Peter Hunsberger wrote:
> On Sat, 8 Jan 2005 18:48:29 -0600, Glen Ezkovich <gl...@hard-bop.com>
> wrote:
>
> <snip>stuff everyone seems to agree on</snip>
>
>>> You've got to allow for variations on
>>> authorizations, error handling, timeouts, resumed sessions, etc.
>>
>> These do not have to be public URLs. All of these things are internal
>> to the application. If you are providing your users with URLs for
>> these
>> things you should ask yourself if there is a better way to handle
>> this.
>
> The problem is that apps often need to track some kind of state. Any
> one of the above can affect state. In some cases you can't rely on
> session (or cookies) and URL becomes the easiest fall back.
Easiest, not necessarily the best. As always it depends on your
resources and use cases as to which solution is best. An application
designed to handle authorizations, time outs and/or resumed sessions
probably should not depend on URLs to accomplish these tasks.
At a single session level I think you can keep this information private
by using forms. Users will not see the query that contains the data you
are tracking. Sure the user can't come back after closing their browser
and pick up where they left off, but you really didn't design your
application for this.
> I've done
> some strange hacks to work around this in my life, but a structured
> URL can definitely make life easier.
Always. I don't think anyone will argue against this. :-))
>
>>>> As far as dogmatism and URL structure goes, you can always be
>>>> dogmatic
>>>> in the way you structure them. ;-) The problem with dogmatism is
>>>> that
>>>> it does not always lead to the best solution for a given case. Then
>>>> again sometimes it does.
>>>
>>> And that's this issue Daniel's worried about: what's the best
>>> solution. (However, I'm not completely sure for what problem space.)
>>
>> As always, it depends on the problem space. ;-) There is not one best
>> way and that's why implementing the sitemap according to the "best way"
>> to design your URL space is probably a bad idea. I do find the idea of
>> a tree structured sitemap appealing (even though it works nicely with
>> my "best way" :-O).
>
> I think there are some overall patterns that can be useful for
> probably 80% of the use cases. Stefano seems to believe it's not worth
> trying to build anything directly into Cocoon to handle these.
I think his argument is stronger than that. If you implement a sitemap
structure that maps perfectly to 80% of the use cases, there is a good
chance that the other 20% would be impossible to achieve using that
structure or at the very least require considerable hacking.
> I'm
> not sure I can codify any rational proposal well enough to dispel
> this belief so I'll probably drop this for now.
>
> Tree structured URL matching makes sense to me and doing it using XSLT
> makes even more sense. An XSLT flow handler is something I've been
> pushing for for years now, but for the moment Javascript flow and
> resolving URLs through our internal rules based matcher works fine so
> I can't really justify spending any time on this...
Matching is different from restructuring the sitemap. I don't believe
Stefano would have a problem with a TreeMatcher. But I could be wrong.
:-0
Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com
A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to
worry about answers."
- Thomas Pynchon Gravity's Rainbow
Re: [RT] Escaping Sitemap Hell
Posted by Peter Hunsberger <pe...@gmail.com>.
On Sat, 8 Jan 2005 18:48:29 -0600, Glen Ezkovich <gl...@hard-bop.com> wrote:
<snip>stuff everyone seems to agree on</snip>
> > You've got to allow for variations on
> > authorizations, error handling, timeouts, resumed sessions, etc.
>
> These do not have to be public URLs. All of these things are internal
> to the application. If you are providing your users with URLs for these
> things you should ask yourself if there is a better way to handle this.
The problem is that apps often need to track some kind of state. Any
one of the above can affect state. In some cases you can't rely on
session (or cookies) and URL becomes the easiest fall back. I've done
some strange hacks to work around this in my life, but a structured
URL can definitely make life easier.
> >> As far as dogmatism and URL structure goes, you can always be dogmatic
> >> in the way you structure them. ;-) The problem with dogmatism is that
> >> it does not always lead to the best solution for a given case. Then
> >> again sometimes it does.
> >
> > And that's this issue Daniel's worried about: what's the best
> > solution. (However, I'm not completely sure for what problem space.)
>
> As always, it depends on the problem space. ;-) There is not one best
> way and that's why implementing the sitemap according to the "best way"
> to design your URL space is probably a bad idea. I do find the idea of
> a tree structured sitemap appealing (even though it works nicely with
> my "best way" :-O).
I think there are some overall patterns that can be useful for
probably 80% of the use cases. Stefano seems to believe it's not worth
trying to build anything directly into Cocoon to handle these. I'm
not sure I can codify any rational proposal well enough to dispel
this belief so I'll probably drop this for now.
Tree structured URL matching makes sense to me and doing it using XSLT
makes even more sense. An XSLT flow handler is something I've been
pushing for for years now, but for the moment Javascript flow and
resolving URLs through our internal rules based matcher works fine so
I can't really justify spending any time on this...
--
Peter Hunsberger
Re: [RT] Escaping Sitemap Hell
Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 7, 2005, at 10:30 PM, Peter Hunsberger wrote:
> On Fri, 7 Jan 2005 17:38:35 -0600, Glen Ezkovich <gl...@hard-bop.com>
> wrote:
>>
>> On Jan 7, 2005, at 1:43 PM, Peter Hunsberger wrote:
>>
>>> On Fri, 07 Jan 2005 14:28:06 -0500, Stefano Mazzocchi
>>> <st...@apache.org> wrote:
>>>>
>>>> See? the problem is that you are partitioning the matching space
>>>> with
>>>> URL matchers... I strongly feel that most of the problems that
>>>> Daniel
>>>> (and you) are outlining will just go away if you used non-URL
>>>> matchers.
>>>
>>> Although I agree that 90% of the problem seems to be a matcher issue
>>> I've got to ask; what would the matchers be matching on if it's not a
>>> URL? I have a couple answers, but I'd like other opinions...
>>>
>>> It seems to me that Daniel might be coming at this from a mostly
>>> application POV. If so, for such cases, I think you can't _always_
>>> be
>>> quite as dogmatic about how a URL is structured; for many apps
>>> there's
>>> little to no expectation of long-term URI persistence/repeatability.
>>
>> I don't see this. The application is the resource and it is the
>> application that should have a unique identifier. If the application
>> allows a user to perform multiple tasks you may want to consider each
>> task a resource.
>
> An app may be 1000's of resources. The issue is how to find a
> rational way of getting 1000's of unique identifiers.
Hmmm... This is a question of semantics. To me the app is the only
resource. Because I use the http protocol I am constrained as to how I
issue commands and pass arguments. As a result I use constructs that
appear to be URIs but in reality are just an awkward translation to
something that looks like a URI. This does not diminish the problem but
explains why it exists.
>
>> The persistence of the URI in general is not that
>> important from a users perspective since the URI identifies a resource
>> that might be reachable from multiple URLs. What is important is that
>> the URL that a user uses to reach an application persist and not
>> change
>> as long as users may use the application.
>
> Umm, wasn't that my point?
Sorry, I just missed it. I'm glad we agree.
>
>> We may not expect to see
>> identical results each time we access http://weather.com/neworleans
>> but
>> we do expect to get the current weather forecast for New Orleans. If
>> weather.com switched the URI/L to http://weather.com?city=neworleans,
>> as a user I would be perturbed to say the least.
>
> You're missing the point:
Probably.
> weather is a published resource, it's not an
> application.
This was too simplistic an example without the details. I should also
not have used an existing site. The point was a single
application/resource can be accessed by two distinct URLs and changing
URLs is unsettling to a user.
> Consider something like a Web hosted SFA app, something
> where you need work flow. The hard part isn't getting to the app; the
> hard part is partitioning the app. You've got lots of orthogonal
> concerns (and I mean lots). URI, scheme (protocol), and request
> parameters (among others) all give you ways of attacking various parts
> of the problem,
Of course. And this is where problems in structuring a sitemap arise.
> but when you start hosting 1000's of different app
> spaces on the same machine the issue isn't as trivial as
> scheme://host?couple-of-parms.
What else do you have? This is pretty much it X 1000s. An application
consists of its commands and queries. Each may have parameters. The
problem is mapping the commands, queries and parameters to something
that is a valid URI. Your sitemap can only do so much. With the advent
of flow there is a very direct way to map public URLs to commands and
queries. If you want to simplify (though not shorten) your sitemap, use
flow and match using a hierarchical menu-like pattern (
<map:match pattern="patients/edit">
<map:call function="editPatient"/>
</map:match>
).
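[Editorial note: the flow side of the snippet above could look like the sketch below. Inside Cocoon, flowscript gets a real `cocoon` object with sendPage/sendPageAndWait; here it is stubbed so the control flow can be illustrated outside the container, and the "editPatient" logic and URIs are invented for illustration.]

```javascript
// Stub for Cocoon's flowscript `cocoon` object, for illustration only.
const cocoon = {
  sent: [],
  sendPage(uri, data) { this.sent.push({ uri, data }); }
};

// "patients/edit" in the sitemap calls this function; the sitemap stays
// a thin URL-to-command mapping while the application logic lives in flow.
function editPatient() {
  const patient = { id: 17, name: "Doe" };   // would come from a backend
  cocoon.sendPage("patient/edit-form", { patient });
}
```

The point of Glen's pattern is visible here: the sitemap match carries no logic at all, and everything the URL "commands" happens in the flow function.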
I realize that everything is not this simple. Many problems arise from
technologies that have been supplanted by flow. Prior to flow there was
a tendency to use the sitemap as the application controller. Using it
as such probably leads one to much more inventive URI designs.
> You've got to allow for variations on
> authorizations, error handling, timeouts, resumed sessions, etc.
These do not have to be public URLs. All of these things are internal
to the application. If you are providing your users with URLs for these
things you should ask yourself if there is a better way to handle this.
>> As far as dogmatism and URL structure goes, you can always be dogmatic
>> in the way you structure them. ;-) The problem with dogmatism is that
>> it does not always lead to the best solution for a given case. Then
>> again sometimes it does.
>
> And that's this issue Daniel's worried about: what's the best
> solution. (However, I'm not completely sure for what problem space.)
As always, it depends on the problem space. ;-) There is not one best
way and that's why implementing the sitemap according to the "best way"
to design your URL space is probably a bad idea. I do find the idea of
a tree structured sitemap appealing (even though it works nicely with
my "best way" :-O).
Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com
A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to
worry about answers."
- Thomas Pynchon Gravity's Rainbow
Re: [RT] Escaping Sitemap Hell
Posted by Stefano Mazzocchi <st...@apache.org>.
Peter Hunsberger wrote:
> And that's this issue Daniel's worried about: what's the best
> solution. (However, I'm not completely sure for what problem space.)
There is no best solution, folks.
For some people, even something like http://site.com/29309840938 is not
persistent enough (because site.com might go away... but even because
the HTTP protocol might go away!)
I'm all for the creation of some documentation about "best practices in
URL-space design", but I would be very strongly against the distillation
of any of these best practices into the sitemap architecture itself,
unless and until one of those best practices turns out to be used by
practically everybody and in any given circumstance.
--
Stefano.
Re: [RT] Escaping Sitemap Hell
Posted by Peter Hunsberger <pe...@gmail.com>.
On Fri, 7 Jan 2005 17:38:35 -0600, Glen Ezkovich <gl...@hard-bop.com> wrote:
>
> On Jan 7, 2005, at 1:43 PM, Peter Hunsberger wrote:
>
> > On Fri, 07 Jan 2005 14:28:06 -0500, Stefano Mazzocchi
> > <st...@apache.org> wrote:
> >>
> >> See? the problem is that you are partitioning the matching space with
> >> URL matchers... I strongly feel that most of the problems that Daniel
> >> (and you) are outlining will just go away if you used non-URL
> >> matchers.
> >
> > Although I agree that 90% of the problem seems to be a matcher issue
> > I've got to ask; what would the matchers be matching on if it's not a
> > URL? I have a couple answers, but I'd like other opinions...
> >
> > It seems to me that Daniel might be coming at this from a mostly
> > application POV. If so, for such cases, I think you can't _always_ be
> > quite as dogmatic about how a URL is structured; for many apps there's
> > little to no expectation of long-term URI persistence/repeatability.
>
> I don't see this. The application is the resource and it is the
> application that should have a unique identifier. If the application
> allows a user to perform multiple tasks you may want to consider each
> task a resource.
An app may be 1000's of resources. The issue is how to find a
rational way of getting 1000's of unique identifiers.
> The persistence of the URI in general is not that
> important from a users perspective since the URI identifies a resource
> that might be reachable from multiple URLs. What is important is that
> the URL that a user uses to reach an application persist and not change
> as long as users may use the application.
Umm, wasn't that my point?
> We may not expect to see
> identical results each time we access http://weather.com/neworleans but
> we do expect to get the current weather forecast for New Orleans. If
> weather.com switched the URI/L to http://weather.com?city=neworleans,
> as a user I would be perturbed to say the least.
You're missing the point: weather is a published resource, it's not an
application. Consider something like a Web hosted SFA app, something
where you need work flow. The hard part isn't getting to the app; the
hard part is partitioning the app. You've got lots of orthogonal
concerns (and I mean lots). URI, scheme (protocol), and request
parameters (among others) all give you ways of attacking various parts
of the problem, but when you start hosting 1000's of different app
spaces on the same machine the issue isn't as trivial as
scheme://host?couple-of-parms. You've got to allow for variations on
authorizations, error handling, timeouts, resumed sessions, etc.
> As far as dogmatism and URL structure goes, you can always be dogmatic
> in the way you structure them. ;-) The problem with dogmatism is that
> it does not always lead to the best solution for a given case. Then
> again sometimes it does.
And that's this issue Daniel's worried about: what's the best
solution. (However, I'm not completely sure for what problem space.)
--
Peter Hunsberger
Re: [RT] Escaping Sitemap Hell
Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 7, 2005, at 1:43 PM, Peter Hunsberger wrote:
> On Fri, 07 Jan 2005 14:28:06 -0500, Stefano Mazzocchi
> <st...@apache.org> wrote:
>>
>> See? the problem is that you are partitioning the matching space with
>> URL matchers... I strongly feel that most of the problems that Daniel
>> (and you) are outlining will just go away if you used non-URL
>> matchers.
>
> Although I agree that 90% of the problem seems to be a matcher issue
> I've got to ask; what would the matchers be matching on if it's not a
> URL? I have a couple answers, but I'd like other opinions...
>
> It seems to me that Daniel might be coming at this from a mostly
> application POV. If so, for such cases, I think you can't _always_ be
> quite as dogmatic about how a URL is structured; for many apps there's
> little to no expectation of long-term URI persistence/repeatability.
I don't see this. The application is the resource and it is the
application that should have a unique identifier. If the application
allows a user to perform multiple tasks you may want to consider each
task a resource. The persistence of the URI in general is not that
important from a users perspective since the URI identifies a resource
that might be reachable from multiple URLs. What is important is that
the URL that a user uses to reach an application persist and not change
as long as users may use the application. We may not expect to see
identical results each time we access http://weather.com/neworleans but
we do expect to get the current weather forecast for New Orleans. If
weather.com switched the URI/L to http://weather.com?city=neworleans,
as a user I would be perturbed to say the least.
As far as dogmatism and URL structure goes, you can always be dogmatic
in the way you structure them. ;-) The problem with dogmatism is that
it does not always lead to the best solution for a given case. Then
again sometimes it does.
Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com
A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to
worry about answers."
- Thomas Pynchon Gravity's Rainbow
Re: [RT] Escaping Sitemap Hell
Posted by Peter Hunsberger <pe...@gmail.com>.
On Fri, 07 Jan 2005 14:28:06 -0500, Stefano Mazzocchi
<st...@apache.org> wrote:
<snip/>
> >
> > Forrest is doing it for docs... someone else can do it for apps :-)
>
> See? the problem is that you are partitioning the matching space with
> URL matchers... I strongly feel that most of the problems that Daniel
> (and you) are outlining will just go away if you used non-URL matchers.
Although I agree that 90% of the problem seems to be a matcher issue
I've got to ask; what would the matchers be matching on if it's not a
URL? I have a couple answers, but I'd like other opinions...
It seems to me that Daniel might be coming at this from a mostly
application POV. If so, for such cases, I think you can't _always_ be
quite as dogmatic about how a URL is structured; for many apps there's
little to no expectation of long-term URI persistence/repeatability.
--
Peter Hunsberger
(Won't be able to respond until Wed.)
Re: [RT] Escaping Sitemap Hell
Posted by Stefano Mazzocchi <st...@apache.org>.
Nicola Ken Barozzi wrote:
> Stefano Mazzocchi wrote:
>
>> Daniel Fagerstrom wrote:
>
> ...
>
>>> A real map for the site should be tree structured like the linkmap in
>>> forrest. Take a look at the example in [1], (I don't suggest using
>>> the "free form" XML, something stricter is required). Such a tree
>>> model will also help in planning the URI space as it gives a good
>>> overview of it.
>>
>>
>> Forrest and cocoon serve different purposes.
>>
>> While I totally welcome the fact that Forrest has such "linkmaps", I
>> don't think they are general-enough concepts to drive the entire
>> framework. They are fine as specific cases, especially appealing for a
>> website generation facility like forrest, but as a general concept is
>> too weak.
>
>
> While I agree with your reply, I think that I understand what problem
> Daniel thinks he sees.
>
> A sitemap is not the _map_ of a _site_.
>
> That's why we made site.xml.
>
> But saying that this should drive processing is IMHO not correct. In
> fact, it's the opposite. We made the site.xml stuff in a different file
> so that it would *not* interfere with processing.
Right. I see that.
>
> ...
>
>>> Choosing Production Pipeline
>>> ----------------------------
>
> ...
>
>>> Now instead of having the rule:
>>>
>>> *.x.y.z ==> XYZPipeline
>>>
>>> we have
>>>
>>> * where repository:{1} has properties {x, y, z} ==> XYZPipeline
>>>
>>> or
>>>
>>> * where repository:{1}.x.y.z exists ==> XYZPipeline
>>
>>
>>
>> Oh, a rule system for sitemap!
>>
>> hmmmm, interesting... know what? the above smells a *lot* like you are
>> querying RDF. hmmmm...
>
>
> At Forrest, we have done a similar thing, and are still in the process
> of finishing it. It states something like this:
>
> "Forrest processing should not be tied to URLs."
>
> IOW, Forrest should not process a file differently just because it's in
> a particular directory, but using other characteristics, like mime-type,
> DTD, schema, etc. For us, a URL is a partitioning decision of the
> content creator, not of the application creator.
>
> Many sites fail to do this, and URL matching has become the easiest way
> of partitioning Cocoon's processing, although not the best.
>
> You can make your own matcher... and here is where we should
> concentrate, by defining new blueprints that don't use the URL as a
> matching system.
>
> Forrest is doing it for docs... someone else can do it for apps :-)
See? the problem is that you are partitioning the matching space with
URL matchers... I strongly feel that most of the problems that Daniel
(and you) are outlining will just go away if you used non-URL matchers.
--
Stefano.
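[Editorial note: the "property aware" rule quoted earlier in the thread — "* where repository:{1} has properties {x, y, z} ==> XYZPipeline" — could be sketched as below. The repository is a stub and all names (XYZPipeline, the property keys) come straight from the quoted example; the rule-evaluation code is an assumption about how such a matcher might behave.]

```javascript
// Stub content repository; a real one would query a content store.
const repository = {
  report: { x: true, y: true, z: true },
  note:   { x: true }
};

// A rule matches when the wildcard-bound resource carries all the
// required properties; the pipeline is chosen by the rule, not by
// encoding type information into the URL itself.
const rules = [
  { requires: ["x", "y", "z"], pipeline: "XYZPipeline" },
  { requires: ["x"],           pipeline: "XPipeline" }
];

function choosePipeline(resource) {
  const props = repository[resource] || {};
  const rule = rules.find(r => r.requires.every(p => props[p]));
  return rule ? rule.pipeline : null;
}
```

This is the shift Nicola Ken describes for Forrest: the URL stays a content-creator decision, while processing is selected from characteristics of the resource (mime-type, DTD, properties) rather than from URL patterns.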
Re: [RT] Escaping Sitemap Hell
Posted by Nicola Ken Barozzi <ni...@apache.org>.
Stefano Mazzocchi wrote:
> Daniel Fagerstrom wrote:
...
>> A real map for the site should be tree structured like the linkmap in
>> forrest. Take a look at the example in [1], (I don't suggest using the
>> "free form" XML, something stricter is required). Such a tree model
>> will also help in planning the URI space as it gives a good overview
>> of it.
>
> Forrest and cocoon serve different purposes.
>
> While I totally welcome the fact that Forrest has such "linkmaps", I
> don't think they are general-enough concepts to drive the entire
> framework. They are fine as specific cases, especially appealing for a
> website generation facility like forrest, but as a general concept is
> too weak.
While I agree with your reply, I think that I understand what problem
Daniel thinks he sees.
A sitemap is not the _map_ of a _site_.
That's why we made site.xml.
But saying that this should drive processing is IMHO not correct. In
fact, it's the opposite. We made the site.xml stuff in a different file
so that it would *not* interfere with processing.
...
>> Choosing Production Pipeline
>> ----------------------------
...
>> Now instead of having the rule:
>>
>> *.x.y.z ==> XYZPipeline
>>
>> we have
>>
>> * where repository:{1} has properties {x, y, z} ==> XYZPipeline
>>
>> or
>>
>> * where repository:{1}.x.y.z exists ==> XYZPipeline
>
>
> Oh, a rule system for sitemap!
>
> hmmmm, interesting... know what? the above smells a *lot* like you are
> querying RDF. hmmmm...
At Forrest, we have done a similar thing, and are still in the process
of finishing it. It states something like this:
"Forrest processing should not be tied to URLs."
IOW, Forrest should not process a file differently just because it's in
a particular directory, but using other characteristics, like mime-type,
DTD, schema, etc. For us, a URL is a partitioning decision of the
content creator, not of the application creator.
Many sites fail to do this, and URL matching has become the easiest way
of partitioning Cocoon's processing, although not the best.
You can make your own matcher... and here is where we should
concentrate, by defining new blueprints that don't use the URL as a
matching system.
Forrest is doing it for docs... someone else can do it for apps :-)
--
Nicola Ken Barozzi nicolaken@apache.org
- verba volant, scripta manent -
(discussions get forgotten, just code remains)
---------------------------------------------------------------------
Re: [RT] Escaping Sitemap Hell
Posted by Stefano Mazzocchi <st...@apache.org>.
Daniel Fagerstrom wrote:
> (was: Splitting xconf files step 2: the sitemap)
>
> Although the Cocoon sitemap is a really cool innovation it is not
> entirely without problems:
>
> * Sitemaps for large webapps easily become a mess
> * It sucks as a map describing the site [1]
> * It doesn't give that much support for "cool URLs" [2]
>
> In this RT I will try to analyze the situation especially with respect
> to URL space design and then move on to discuss a possible solution.
>
> Before you enthusiastically dive into the text:
>
> * It is a long RT, (as my RTs usually are)
> * It might contain provoking and hopefully even thought provoking ideas
> * No, I will not require that everything in it should be part of 2.2
> * No, I don't propose that we should scrap the current sitemap, actually
> I believe that we should support it for the next few millennia ;)
See my comments intermixed.
> --- o0o ---
>
> Peter and I had some discussion:
>
> Peter Hunsberger wrote:
>
>> On Tue, 04 Jan 2005 13:25:05 +0100, Daniel Fagerstrom
>> <da...@nada.kth.se> wrote:
>
>
> <snip/>
>
>>> Anyway, sometimes when I need to refactor or add functionality to
>>> some of our Cocoon applications, where I or colleagues of mine have
>>> written endless sitemaps, I have felt that it would have been nice if
>>> the sitemap would have been more declarative so that I could have
>>> asked it basic things like getting a list of what URLs or URL pattern
>>> it handles. Also if I have a URL in a large webapp and it doesn't
>>> work as expected it can require quite some work to trace through all
>>> involved sitemaps to see which rule is actually used.
>>
>>
>
>>> Of course I understand that if I used a set of efficient conventions
>>> about how to structure my URL space and my sitemaps the problem would
>>> be much less. The problem is that I haven't found such a set of
>>> conventions yet. Of course I'm following some kind of principles, but
>>> I don't have anything that I'm completely happy with yet. Anyone
>>> having good design patterns for URL space structuring and sitemap
>>> structuring, that you want to share?
>>
>>
>>
>> We have conventions that use sort of type extensions on the names:
>> patient.search, patient.list, patient.edit where the search, list,
>> edit (and other) screen patterns are common across many different
>> metadata sources (in this case patient). We don't do match *.edit
>> directly in the sitemap (any more) but I find that if you've got to
>> handle orthogonal concerns then x.y.z naming patterns can sometimes
>> help.
>>
> OK, let's look at this in a more abstract setting:
>
> Resource Aspects
> ================
>
> In the example above we have an object or better a _resource_, the
> patient that everything else is about. The resource should be
> identifiable in a unique way, in this case e.g. with the social security
> number.
First big mistake: you think that http-based URIs and http-based URLs
are the same thing.
Well, WRONG.
There is nothing that says that every http-URI should be automatically
treated as a URL. This is a very common misconception, but nevertheless
a big one.
> There are a number of _operations_ that can be performed on the patient
> resource: show, edit, list, search etc. (although the search might be on
> the set of patients rather than a single one).
>
> The resource has a _type_, patient, that might affect how we choose to
> show it etc.
Second mistake: it is an architectural design issue to *avoid* adding a
type to a URI. These are three separate issues:
1) how to resolve a URI into a URL
2) how to negotiate the content of that URL
3) how to map that returned URL metadata (the HTTP response headers)
to a recognized type or format.
combining them into one is just a really poor way to use the web
architecture.
> There are in general other aspects that will steer how we render the
> response when someone asks for the resource:
>
> * The _format_ of the response: html, pdf, svg, gif etc.
> * The _status_ of the resource: old, draft, new etc.
> * The _access_ rights of the response: public, team, member etc.
>
> There are plenty of other possible aspect areas as well.
>
> Cool Webapp URLs
> ================
>
> I searched the web to gain some insights into URL space design. It soon
> became clear that I should re-read Tim Berners-Lee's classic, "Cool URIs
> don't change" [2]. I must say I wasn't prepared for the shock; I had
> completely missed how radical its message was when I read it the
> last time.
> I can also recommend reading [3], a W3C note that codifies the
> message from [2] and some other good URI practices into a set of
> guidelines.
I suggest you to read
http://www.w3.org/TR/webarch/
> So what is an URI? According to [3]:
>
> A URI is, actually, a /reference to a resource, with fixed and
> independent semantics/.
>
> This means that the URI should reference to a specific product,
> _always_.
GRRRR! A URI IS NOT A REFERENCE! A URI IS AN IDENTIFIER!
How to get a reference out of an identifier is a totally different thing.
> Independent semantics means that a social security number is
> not enough, it should say that it is a person (from USA) as well. See
> [3] for the philosophical details.
Pfff, independent semantics doesn't mean anything. A perfectly valid URI is
urn:943098029834098/9829982739487298374
> * The URI should be easy to type
What the hell does this mean?
http://tinyurl.com/5r8kl
is easier to type than
http://www.amazon.com/exec/obidos/tg/detail/-/0465026567/
but which one is "better"? They both locate the same resource, but which
one of them identifies it better?
> * It should not contain too much meaning, especially not about
> implementation details
>
> Now I try to apply the ideas from [2] and [3] on the different resource
> aspects mentioned above. When I use words like "should" or "should not"
> without any motivation it means that I believed in the motivation from
> the gurus in the references ;) I will try to motivate my own ideas ;)
>
> What I'm going to suggest might be quite far from how you design your
> URL spaces. It is certainly far from the implementation-detail-plagued
> mess that I have created in my own applications.
>
> The Resource
> ------------
>
> The idea is that a URL identifies a resource. For the patient case
> above it could be:
>
> http://myhospital.com/person/123456789
>
> If we use a hierarchical URI space like /person/123456789, the "parent"
> URIs e.g. /person should also refer to a resource.
There is *NO SUCH THING* as a parent URI, because URIs do not have the
notion of paths. It is a *convention*, established by early
web server implementations (and perpetuated by Apache httpd), that the /
in the paths got automatically mapped to the / in the file system, or in
a hierarchical system where the / is used as a separator for hierarchy
identifiers.
There is *NOTHING* in any web spec that says this is the rule or, for
that matter, that this is a good thing.
/ is a "separator" in fact, from a URI point of view
http://myhospital.com/123456789/person
and
http://myhospital.com/person/123456789
show no difference in identification power... which is what URIs do: they
identify!
> It's in most cases not
> a good idea to put a lot of topic classification effort into the URI
> hierarchy. Classifications are not unique and will change according to
> changing interests and world view.
This is true. But it is also true that, if you follow this reasoning,
you should not be using http:// URIs at all!
In fact, what happens to a URI when say, two hospitals merge and they
decide that it's in their best interest to get rid of the previous
references of the names, including those in the URIs?
This is the reason why a lot of people prefer URNs over http-URIs, for
example:
1) the handle system: http://www.handle.net/
2) the LSID system: http://www.omg.org/docs/dtc/04-05-01.pdf
3) the DOI system: http://www.doi.org/
TimBL believes that the above systems are just a different way to skin a
cat and they don't really solve anything (even if he agrees on the
problem that the domain part of http-URIs is the weakest part of an
http-URI, in terms of long-term persistence)
Also, you should take a look at 'Dynamic Delegation Discovery System'
(DDDS):
http://uri.net/ddds.html
which aims to become the standard way to translate a URI into a URL.
> Operations
> ----------
>
> What about the operations on the resource: list, search, edit etc? I
> find the object oriented style in WebDAV elegant where you use one URL
> together with different HTTP methods to perform different operations.
It's not the OO style of WebDAV, but it's the design of HTTP. Here is
another example of somebody ruining a perfectly great design by not
getting it: the browsers only allowed people to overload the actions in
forms, but never in anchor tags, and they never allowed
JavaScript to change that either.
> Sam Ruby also has some interesting ideas about using URLs to identify
> "objects" and different SOAP messages for different methods on the
> object in his "REST+SOAP" article [4]. But neither ad hoc HTTP methods nor
> XML posts seem like good candidates for invoking operations on a
> resource in a typical webapp. So maybe something like:
>
> /person/123456789/edit or
> /person/123456789.edit or
> /person/123456789?operation=edit
>
> is a good idea.
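The first of those conventions could be parsed like this (a sketch; the operation names come from the examples above, and the default operation is an assumption):

```javascript
// Split Daniel's "/person/123456789/edit" style URLs into a resource
// identifier plus an operation name. The operation vocabulary and the
// "show" default are illustrative choices, not an established API.
function parseRequest(path) {
  const ops = ["show", "edit", "list", "search"];
  const parts = path.split("/").filter(Boolean);
  const last = parts[parts.length - 1];
  if (ops.includes(last)) {
    return { resource: "/" + parts.slice(0, -1).join("/"), operation: last };
  }
  return { resource: path, operation: "show" }; // default operation
}
```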
>
> Resource Type
> -------------
>
> Should the type of the resource be part of the URI?
Absolutely not!
> We probably have to
> include some type info in the URL to give it "independent semantics"
> (person e.g.). But we should not put types that might change like
> patient, manager, project-leader etc in the URL. And we should
> especially avoid types that only have to do with implementation details
> like what pipeline we want to use for rendering the resource.
>
> Format
> ------
>
> Cocoon especially shines in handling all the various file name
> extensions: .html, .wml, .pdf, .txt, .doc, .jpg, .png, .svg, etc, etc.
> But I'm sorry, if you want cool URLs you have to kiss them goodbye as
> well ;)
This is, again, another one of those major screwups from some browsers
(mostly IE) where the "extension" of a URL (as if such a thing existed!)
was used to identify the mime-type instead of the response headers.
> It might be a good idea to send an html page to a browser on a PC and a
> wml page to a PDA user. But you shouldn't require your users to remember
> different URLs for different clients; that's a task for server-driven
> content negotiation.
>
> Using .html is not especially future proof, should all links become
> invalid when you decide to reimplement your site with dynamic SVG?
>
> Often it is good to provide the user with a nice printable version of
> your page. But why should you advertise Adobe's products in your URLs?
Unfair: many non-adobe things produce PDF and it's a royalty-free
specification to use.
http://partners.adobe.com/public/developer/pdf/index_reference.html
> A
> few years ago it was .ps or .dvi on academic sites and .doc on
> commercial sites. Right now it happens to be .pdf, but will that be forever?
>
> Same thing with images: the user doesn't care about the format as long as
> it can be shown in the browser (content negotiation); neither should you
> make your content links (or Google's image search) dependent on a
> particular compression scheme that happens to be popular right now.
>
> There are of course cases where you really want to give your users the
> ability to choose a specific format. Then a file name extension is a
> good idea. If you happen to maintain
> http://www.adobe.com/products/acrobat/ it's OK to put some .pdf there
> e.g. ;)
>
> But in most cases file name extensions are an implementation detail that
> is not relevant for your users.
This is correct. Although a URL that might break in the future but shows
me a page in my browser today is better than a URL that might not break
tomorrow but doesn't show me anything at all today ;-)
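Server-driven format negotiation, as suggested above, can be sketched as follows (the media types and the default are illustrative):

```javascript
// One cool URL per resource; the delivered format is picked from the
// client's Accept header instead of a file name extension. Sketch only.
function chooseFormat(accept) {
  if (accept.includes("text/vnd.wap.wml")) return "wml"; // PDA client
  if (accept.includes("application/pdf")) return "pdf";  // print request
  if (accept.includes("text/html")) return "html";       // desktop browser
  return "html"; // sensible default for unknown clients
}
```

Reimplementing the site with, say, dynamic SVG then only means adding a branch here; no published link has to change.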
> Status
> ------
>
> The status will by definition change, and that makes your URL uncool if
> the status is part of the URL.
>
> Access Rights
> -------------
>
> Access rights will often change for a document. I know it is easy to
> write path-dependent rules for access rights in most webserver
> configuration files. But you expose irrelevant implementation details
> and it's not future proof.
>
> Am I Really Serious?
> --------------------
>
> Why should a webapp URL be cool and future proof? Well, it's the
> interface to your webapp. We agree that we shouldn't change interfaces
> in Cocoon on a whim, so why should we treat the users of our webapps
> differently? And like it or not, useful software sometimes lives for
> decades. If you build useful webapps you should consider planning ahead.
>
> Currently we are all used to webapps that use the most horrible URLs
> containing tons of implementation details and changing every now and
> then. But it is not a law of nature that it must be like that. It is
> mainly a result of webapp development still being immature and the tools
> being far from perfect. Of course the user should be able to bookmark a
> useful form or wizard.
>
> Also I believe that exposing implementation details in ones URLs is at
> least as bad as making all member variables public in Java classes. It
> makes your webapp monolithic and fragile.
To get this straight: I totally agree that a cool URL scheme is a great
thing and I also think that the best URL scheme is something like
http://site.com/342343
and that's it... that's the only way never to change anything, because
those numbers are the only 'semantically neutral' thing that you can do.
But still, my blog news URLs are the form of
http://www.betaversion.org/~stefano/linotype/news/34/
which have several problems:
1) we might forget to register the domain and somebody might steal it
from us
2) well, my name might change (but that's unlikely)
3) the company that has a trademark on linotype might sue me
4) I might decide to add other types of items to my blog, like images
or articles or whatever else... then news/id/ would seem awkward
but the best part is the number, chosen to be incremental and unique in
that space.
>
> --- o0o ---
>
> You might find the views expressed above rather extreme and maybe
> impractical. As indicated above they are also far away from what I
> currently do in my webapps. But I have for quite some time thought about
> how to fight the all too easily increasing entropy in the webapps we develop.
> I have suspected that badly designed URL spaces have been part of the
> trouble. And when I re-read Tim BL's classic I suddenly realized that the
> habit of exposing implementation in the URLs might be at the root of the
> evil.
There is truth in this, but what I found irritating was the lack of
understanding of the difference between a URI and a URL.
Cocoon's internals show some of this too (and I have to admit that I
understood what URIs really were only after starting to work on the
semantic web) but this should not be perpetuated further.
> Whether this realization will survive contact with your comments and
> other parts of reality is of course too early to tell ;)
>
>
> Does Cocoon Support Cool URLs?
> ==============================
Yessir!
> But how does Cocoon support the above ideas about URL space design?
>
> Well, in some way one could say that it supports it. The sitemap is so
> powerful that you can program most usage patterns in it in some more or
> less elegant way. But AFAICS, writing webapps following the URL space
> design ideas above would be rather tricky. So I would say that Cocoon
> doesn't support it that well.
I rather strongly (and probably not surprisingly) disagree with this
statement.
> The main reasons are:
>
> * The sitemap is not that useful as a site map
How is this making it worse to support "cool URLs"?
> * The sitemap gives excellent support for choosing resource production
> implementation based on the implementation details coded into the URL,
> but not for avoiding it
wrong! that's why we have pluggable matchers! the fact that you choose
to match by URL is your choice, not an architectural decision!
> * The sitemap mixes site map concerns with resource production
> implementation details
Yes, the cocoon sitemap describes how resources get produced in the
pipelines.... but what is the site map you are talking about? a
collection of all the resources available on the site? or just the URL
matchers without anything else?
> Is it a Map of the Site?
> ------------------------
>
> The Forrest people don't think that the sitemap is enough as a map of the
> site. They have a special linkmap [1] that gives a map over the site and
> that is used for internal navigation and for creating menu trees. I have
> a similar view. From the sitemap it can be hard to answer basic
> questions like:
>
> * What is the URL space of the application
> * What is the structure of the URL space
> * How is the resource refered by this URL produced
Hold it right there!
If you think that understanding the URL space of the application for a
sitemap is hard, then what about PHP? JSP? what about web.xml
descriptors? are they any better?
Second point: how in hell is "structure of the URL space" different from
"the URL space of the application"?
Third point: this is *flow*; it should *NOT* be part of a sitemap anyway.
> The basic view of a URL in the sitemap is that it is any string. Even if
> there are constructions like mount, the URL space is not considered as
> hierarchical. That means that the URLs can be presented as patterns in
> any order in the sitemap and you have to read through all of it to see
> if there is a rule for a certain URL.
As I mentioned already, this is a design decision based on the fact that
it is *arbitrary* to consider the / as a hierarchical separator.
Also, matchers are *NOT* URL-specific and it's a very useful concept.
Forcing matching to be:
1) URL-based
and
2) intrinsically hierarchical
is IMO a *severe* step backward in terms of architectural design.
> A real map for the site should be tree structured like the linkmap in
> forrest. Take a look at the example in [1], (I don't suggest using the
> "free form" XML, something stricter is required). Such a tree model will
> also help in planning the URI space as it gives a good overview of it.
Forrest and cocoon serve different purposes.
While I totally welcome the fact that Forrest has such "linkmaps", I
don't think they are general-enough concepts to drive the entire
framework. They are fine as specific cases, especially appealing for a
website generation facility like forrest, but as a general concept it is
too weak.
> The Forrest linkmap has no notion of wildcards, which is a must in
> Cocoon. Let's continue discussing that.
All right.
> Choosing Production Pipeline
> ----------------------------
>
> With the sitemap it is very easy to choose the pipeline used for
> producing the response based on a URL pattern "*.x.y.z". That more or
> less forces the user to code implementation details, i.e. what pipeline
> to use, into the URL. This is only a problem for wildcard patterns;
> otherwise we just associate the pipeline with the concrete "cool URL".
At this point I seriously wonder: are you aware that matchers are pluggable?
> Before I suggested that aspects like: type, format, status, access
> rights etc shouldn't be part of the URL as those aspects might change
> for the resource. OTOH these aspects certainly are necessary for choosing
> rendering pipeline, what should we do?
URL-parameter matching.
<match type="wildcard" pattern="/news/*">
  <match type="param" pattern="edit">
    ....
  </match>
  <match type="param" pattern="delete">
    ....
  </match>
</match>
or, if you have HTTP action control (as in form actions), you can do
<match type="wildcard" pattern="/news/*">
  <match type="action" pattern="get">
    ....
  </match>
  <match type="action" pattern="post">
    ....
  </match>
</match>
and, most of all, you do *NOT* include access control information in the
URL! nor type! nor status!
> The requested resource will often be based on some content or
> combination of content that we can access from Cocoon. The content can
> be a file, data in a db, result from a business object etc. Let us
> assume that it resides in some kind of content repository. Now if we
> think about it, isn't it more natural to ask the content that we are
> going to use about its properties like type, format, status, access
> rights, etc., than to encode them in the URL? These properties can be
> encoded in the file name, in metadata in some property file, within the
> file, in a DB etc.
Ok, now that the nonsense venting is over, we seem to be getting at your RT.
> Now instead of having the rule:
>
> *.x.y.z ==> XYZPipeline
>
> we have
>
> * where repository:{1} have properties {x, y, z} ==> XYZPipeline
>
> or
>
> * where repository:{1}.x.y.z exists ==> XYZPipeline
Oh, a rule system for sitemap!
hmmmm, interesting... know what? the above smells a *lot* like you are
querying RDF. hmmmm...
> We get the right pipeline by querying the repository instead of encoding
> it in the URL. A further advantage is that the rule becomes "listable"
> as the "where" clause in the match expresses the allowed values
> for the wildcard.
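The rules above could be prototyped as a simple table of property tests; the repository here is a plain lookup standing in for whatever metadata store is actually used:

```javascript
// Each rule tests properties of the matched resource instead of tokens
// encoded into the URL. Names and the rule table are illustrative.
const rules = [
  { props: ["x", "y", "z"], pipeline: "XYZPipeline" },
  { props: ["a", "b", "c"], pipeline: "ABCPipeline" },
];
function selectPipeline(id, repository) {
  const have = repository[id] || []; // properties the repository knows
  const rule = rules.find(r => r.props.every(p => have.includes(p)));
  return rule ? rule.pipeline : null;
}
```

Because the rule table is data, it is also "listable": the allowed values for the wildcard can be enumerated by querying the repository against each rule.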
>
> Separating the Concerns
> -----------------------
>
> The sitemap handles two concerns: it maps a URL to a pipeline that
> produces a response, and it describes how to put together this pipeline
> from sitemap components.
True.
> The first concern is related to site design and
> the second is more a form of programming. Putting them together makes it
> hard to see the URL structure and also makes it tempting to group URLs
> based on common pipeline implementation instead of on site structure.
Fair enough.
> Virtual Pipeline Components (VPCs) give us a way out from this. Large
> parts of our sites might be buildable with pipelines already
> constructed in some standard blocks.
Right.
> I would propose to go even further: in the "real" site map it should
> only be allowed to call VPC pipelines; no pipeline construction is
> allowed there, that should be done in the component area.
>
> In the "real" site map the current context is set and the arguments
> to the called VPC are given.
Hmmm, rather drastic, but let's stick to it for your proposal.
> Search Order
> ------------
>
>> The problem for us, is as you allude to at the start of this
>> thread: Cocoon takes the first match, where what you really want is a
>> more XSLT "best match" type of handling; sometimes *.a, *.b, *.c works
>> and other times it's m.*, n.*, o.*...
>>
>> In the past that has lead me to suggest a sort of XSLT flow, but
>> thinking about it in this light I wonder if what I really want is just
>> XSLT sitemap matching (same thing in the end)...
>>
>>
> I also believe that a "best match" type of handling is preferable; it
> increases usability IMO, and it also makes it possible to use tree-based
> matching algorithms that are far more efficient than the current linear
> search.
This is a valid point.
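A "best match" strategy might look like this sketch, which picks the most specific of all matching wildcard patterns; specificity is simply counted as literal characters here, and a real implementation would compile the patterns into a tree:

```javascript
// "Best match" instead of "first match": among all wildcard patterns
// that match the URL, pick the most specific one. Specificity here is
// the count of literal (non-wildcard) characters; a production version
// would compile the patterns into a trie for efficient lookup.
function bestMatch(url, patterns) {
  const toRegExp = p => new RegExp(
    "^" + p.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*") + "$");
  const literalLen = p => p.replace(/\*/g, "").length;
  return patterns
    .filter(p => toRegExp(p).test(url))
    .sort((a, b) => literalLen(b) - literalLen(a))[0] || null;
}
```

Unlike first-match, the result no longer depends on the order in which the rules appear in the sitemap.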
> The new sitemap
> ===============
>
> To sum up the proposal:
>
> Pipelines:
> * Pipeline construction is only done as VPCs in component areas (often
> in blocks).
>
> Sitemap:
> * The sitemap follows the tree structure of the URL space (like the
> Forrest linkmap).
> * Its responsibility is to map URLs to VPCs
> * It can set the current context for each level in the tree (for
> dereferencing relative paths used in the VPC)
> * Wildcards can have restrictions based on properties in the content
> repository
> * It's best-match based rather than rule-order based
> * Of course we have an include construct so that we can reuse sub sites
>
> It might look like:
>
> <sitemap>
>   <path match="person" context="adm/persons"
>         pipeline="block:skin:default(search.xml)">
>     <path match="*:patient" test="mydb:/patients/{patient} exists"
>           context="adm/patients" pipeline="journal-summary({patient})">
>       <path match="edit" pipeline="edit({patient})"/>
>       <path match="list" pipeline="list({patient})"/>
>       <!-- and so on -->
>     </path>
>   </path>
> </sitemap>
>
> Don't care about the syntactical details in the example; it needs much
> more thought, I just wanted to make it a little bit more concrete. The
> path separator "/" is implicitly assumed between the levels. "*:patient"
> means that the content of "*" can be referred to as "patient".
>
> Much of what I propose can be achieved with VPCs and a new "property
> aware" matcher. But IMO the stricter SoC above, the ability to "query"
> the sitemap, the possible advantages of the "best match" search, are
> reasons enough to go further.
First thing that comes to mind is that the implicit assumption of '/' is
just bad. I would be against the proposal just for that.
Second, you lose the ability to do non-URL matching, which is, again
another reason to vote against this.
Third, conditional matching is just nonsense, it's mixing flow concerns
with matching.
Fourth, I don't find the above any more readable than a sitemap that uses
VPCs.
I'll think about the rule-based pipeline resolution (which is an
interesting concept in itself) but the rest, I'm sorry, it really does
not resonate with me at all.
--
Stefano.
Re: [RT] Escaping Sitemap Hell
Posted by Reinhard Poetz <re...@apache.org>.
Ugo Cei wrote:
> Il giorno 06/gen/05, alle 01:54, Daniel Fagerstrom ha scritto:
>
>>
>> The requested resource will often be based on some content or
>> combination of content that we can access from Cocoon. The content can
>> be a file, data in a db, result from a business object etc. Let us
>> assume that it resides in some kind of content repository. Now if we
>> think about it, isn't it more natural to ask the content that we are
>> going to use about its properties like type, format, status, access
>> rights, etc., than to encode them in the URL? These properties can be
>> encoded in the file name, in metadata in some property file, within
>> the file, in a DB etc. Now instead of having the rule:
>>
>> *.x.y.z ==> XYZPipeline
>>
>> we have
>>
>> * where repository:{1} have properties {x, y, z} ==> XYZPipeline
>>
>> or
>>
>> * where repository:{1}.x.y.z exists ==> XYZPipeline
>>
>> We get the right pipeline by querying the repository instead of
>> encoding it in the URL. A further advantage is that the rule becomes
>> "listable" as the "where" clause in the match expresses what are the
>> allowed values for the wildcard.
>
>
> Unless I misinterpret what you mean, we already can do this:
>
> <map:match pattern="*">
>   <map:call function="fun">
>     <map:parameter name="par" value="{1}"/>
>   </map:call>
> </map:match>
>
> function fun() {
>   var entity = repo.query("select * from entities where id = " +
>     cocoon.parameters.par + " and {x, y, z}");
>   cocoon.sendPage("views/XYZPipeline", { entity : entity });
> }
>
> <map:match pattern="XYZPipeline">
>   <map:generate type="jx" src="xyz.jx.xml"/>
>   ...
> </map:match>
>
> Apart from the obviously contrived example, isn't the Flowscript just
> what we need to "get the right pipeline by querying the repository"?
>
>> I would propose to go even further, in the "real" site map it should
>> only be allowed to call VPC pipelines, no pipeline construction is
>> allowed, that should be done in the component area.
>
>
> In my sitemaps, public pipelines contain almost only <map:call> and
> <map:read> (for static resources) elements. All "classical"
> generator-transformer-serializer pipelines go into an "internal-only"
> pipeline that can be called from flowscripts only.
>
> Admittedly, this is fine for webapps, and maybe not so much for
> publishing-oriented websites. But what I want to point out is that your
> otherwise very well thought-out RT is incomplete if it doesn't take
> Flowscript into consideration, IMHO.
I started to write a very similar answer (example + considering flowscript)
Thanks Ugo ;-)
--
Reinhard
Re: [RT] Escaping Sitemap Hell
Posted by Glen Ezkovich <gl...@hard-bop.com>.
On Jan 6, 2005, at 11:12 AM, Daniel Fagerstrom wrote:
>
> How much work you should spend in creating "cool" URLs in your webapp,
> varies of course from application to application. I just wanted to
> point out that we can do more than 123456.cont if we want to. I have
> used more than one webapp that goes through many "transactions" during
> use, but force me to start the navigation from some start screen if my
> session expires. At least I would found such applications more
> userfriendly if they had used "cooler" URIs.
What do you mean by "transactions"? If the "transaction" is only
"committed" to the Session then once the session expires there is no
going back no matter how cool the URI. If the "transaction" is
committed to persistent storage then continuation is possible. However,
using flow to achieve this is dubious at best. Once such a transaction
is completed it seems wiser to use sendPage and if more control is
needed to pass it to a different function and create a new continuation
hierarchy. If a user can return to a site days later and continue
where they left off, it's likely that the same URI from which they
started will be the one they use to continue, with the business layer
determining where they should start. The simple case would be a single
page form that can be saved at any point. A more complex case would be
an eLearning or testing application where each request requires the
users place to be saved. In both cases we are talking of a single
resource that is controlled in the business layer and not in flow. In
such applications the use of flow is simply to mediate between the
presentation and the model; i.e. validate input or display a reading
selection and then present questions to be answered.
No matter how cool the URI, its the application that determines the
user friendliness. If your session expired you will have to log in or
have a cookie with your data to continue where you left off. Whether
each page has a unique URI is unimportant as long as the application is
smart enough to know where you left off.
On a related note, the idea of using 123456.cont as part of a URI that
a user can see is SOOO UNCOOL. Hide these things in forms so a user
will never see them. Once a user enters a flow that should be the last
and only URL they see in the address bar until they exit flow and/or
choose to move to a new resource.
Holding such a view as to how and when to use flow allows cool URLs
since the resource essentially is a flow function that controls user
interaction with the model; in other words a mini-application.
The point is as things currently stand cool URLs are possible even
considering flow.
>
> If you think that reusable URLs are a good idea, not only in publishing
> oriented sites but in webapps as well, you will need to have more
> external URLs and your webapp will get more in common with publishing
> oriented sites.
URL wise there is no reason for them to be different, at least from a
user perspective. Further, there never was. I don't see the need to
have more external URIs, just better designed web applications. The
resource is the application not an individual page the application
presents.
To my mind what a tree based sitemap buys is a better model of the site
as application (publication or webapp). Because of that it just might
lead to cooler URLs but not guarantee it.
>
>
Glen Ezkovich
HardBop Consulting
glen at hard-bop.com
http://www.hard-bop.com
A Proverb for Paranoids:
"If they can get you asking the wrong questions, they don't have to
worry about answers."
- Thomas Pynchon Gravity's Rainbow
Re: [RT] Escaping Sitemap Hell
Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Ugo Cei wrote:
> Il giorno 06/gen/05, alle 01:54, Daniel Fagerstrom ha scritto:
>
>>
>> The requested resource will often be based on some content or
>> combination of content that we can access from Cocoon. The content
>> can be a file, data in a db, result from a business object etc. Let
>> us assume that it resides in some kind of content repository. Now if
>> we think about it, isn't it more natural to ask the content that we
>> are going to use about its properties like type, format, status,
>> access rights, etc., than to encode them in the URL? These properties can
>> be encoded in the file name, in metadata in some property file,
>> within the file, in a DB etc. Now instead of having the rule:
>>
>> *.x.y.z ==> XYZPipeline
>>
>> we have
>>
>> * where repository:{1} have properties {x, y, z} ==> XYZPipeline
>>
>> or
>>
>> * where repository:{1}.x.y.z exists ==> XYZPipeline
>>
>> We get the right pipeline by querying the repository instead of
>> encoding it in the URL. A further advantage is that the rule becomes
>> "listable" as the "where" clause in the match expresses what are the
>> allowed values for the wildcard.
>
>
> Unless I misinterpret what you mean, we already can do this:
>
> <map:match pattern="*">
>   <map:call function="fun">
>     <map:parameter name="par" value="{1}"/>
>   </map:call>
> </map:match>
>
> function fun() {
>   var entity = repo.query("select * from entities where id = " +
>     cocoon.parameters.par + " and {x, y, z}");
>   cocoon.sendPage("views/XYZPipeline", { entity : entity });
> }
>
> <map:match pattern="XYZPipeline">
>   <map:generate type="jx" src="xyz.jx.xml"/>
>   ...
> </map:match>
My example was a little bit unclear; what I meant was that you could
have a number of sitemap rules:
* where repository:{1} has properties {x, y, z} ==> XYZPipeline
* where repository:{1} has properties {a, b, c} ==> ABCPipeline
* where repository:{1} has properties {a, b, d} ==> ABDPipeline
Which rather would correspond to:
<map:match pattern="*" where="repository:{1} has properties {x, y, z}"
           type="property">
  <!-- call XYZPipeline -->
</map:match>
<map:match pattern="*" where="repository:{1} has properties {a, b, c}"
           type="property">
  <!-- call ABCPipeline -->
</map:match>
etc.
That would be rather inefficient in the current sequential, search-based
sitemap, while it could be efficient in a tree-based matcher.
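To make the dispatch idea concrete, here is a minimal sketch as a plain
JavaScript function. The rule table mirrors the three rules above;
`selectPipeline` and the rule-object shape are invented for illustration
and are not part of the Cocoon API:

```javascript
// Hypothetical rule table: each rule pairs a set of required
// repository properties with the internal pipeline to call.
var rules = [
  { required: ["x", "y", "z"], pipeline: "XYZPipeline" },
  { required: ["a", "b", "c"], pipeline: "ABCPipeline" },
  { required: ["a", "b", "d"], pipeline: "ABDPipeline" }
];

// Return the pipeline of the first rule whose required properties are
// all present on the resource, or null when no rule matches.
function selectPipeline(properties, rules) {
  for (var i = 0; i < rules.length; i++) {
    var allPresent = rules[i].required.every(function (p) {
      return properties.indexOf(p) !== -1;
    });
    if (allPresent) {
      return rules[i].pipeline;
    }
  }
  return null;
}
```

A tree-based matcher would replace the linear scan with an index over
the property sets, but the observable behaviour would be the same.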
> Apart from the obviously contrived example, isn't the Flowscript just
> what we need to "get the right pipeline by querying the repository"?
You could implement the above example by putting the property-based
switch in a flowscript instead, that's true. My own experience with using
a flowscript as a switchboard has made me believe that it is an
anti-pattern that should be avoided. But maybe other people have been luckier.
One of my aims with the RT was to make the sitemap more usable as a map
of the site, by making it tree-structured and more declarative. From
that point of view, using flowscripts instead of sitemaps is a step in
the wrong direction IMO.
>> I would propose to go even further: in the "real" site map it should
>> only be allowed to call VPC pipelines; no pipeline construction is
>> allowed, that should be done in the component area.
>
>
> In my sitemaps, public pipelines contain almost only <map:call> and
> <map:read> (for static resources) elements. All "classical"
> generator-transformer-serializer pipelines go into an "internal-only"
> pipeline that can be called from flowscripts only.
>
> Admittedly, this is fine for webapps, and maybe not so much for
> publishing-oriented websites. But what I want to point out is that
> your otherwise very well thought-out RT is incomplete if it doesn't
> take Flowscript into consideration, IMHO.
I hoped that no one would notice, and that we could discuss the
publishing-oriented stuff before handling such complications ;)
But you are completely right, flowscripts must also be discussed. This
breaks down into two parts: how do we design a cool URL space for a
flowscript-driven webapp, and how do we implement such a URL space in
Cocoon? Does what we have give good support, or do we need new mechanisms?
I'll try to say something about cool URLs for flowscripts and leave the
second part to another time, or as a rather non-trivial exercise for the
interested reader ;)
Cool URLs for webapps
=====================
Ok, I continue to uncritically assume that the guidelines in
http://www.w3.org/TR/2003/NOTE-chips-20030128/ are good and try to apply
them to the current situation. Given that, I don't consider:
12345.cont
particularly cool. There is no way back at all after your
web-continuations have expired. Sometimes that might be the most
reasonable response available. But we could do better than always using it.
When planning the URL space for the webapp, all allowed access points to
the webapp must be given a "cool" URL. Let us say that we have a wizard
and that users are only allowed to access it from the first screen when
they don't have a valid session. In this case we could have:
wizard
as URL to the start point and
wizard?cont=12345
as URL to a screen within the session. If that URL is bookmarked and
used after the expiration of the continuation, the user will get a
permanent redirect to the "wizard" URL as response, possibly with a
"session expired" message in the response page.
If we follow this idea, we can think of our webapps in a
transaction-oriented way. Each time we have "committed a transaction" in
our flowscript we could give a "cool" URL for the next screen in the flow
(if it is allowable as a starting point). Now, this is a little bit
tricky, as the user probably made a post to something like:
wizard?cont=23456
where the transaction was committed, and the flow could continue to
"wizard2" or "wizard3" depending on user choice. This could maybe be
solved by doing a redirect to the new wizard URL.
If we are creating a persistent object in the wizard we can take it
further if we want to. When we start the wizard the object is given an id:
123456/wizard
so we can go back to a wizard initialized from the object later. If we
want to use some other identification after the object is committed, we
have to save a mapping from the initial id to the correct one and do a
permanent redirect when the URL is accessed. If we don't like going to a
wizard we do a permanent redirect to something more relevant. We could
even let the user go back to specific pages for persistent objects:
123456/wizard/3
For such cases there is not much use in having a continuation id in the
URL at all, except if we want to let the user have multiple instances
of the same page with different content.
If we follow the URL style without continuation IDs, we use the session
to distinguish between users instead.
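The decision described above (serve the screen when the continuation is
still alive, otherwise send a permanent redirect to the "cool" entry URL
rather than a bare expiry error) can be sketched as a small function.
`resolveWizardRequest`, `isAlive` and the returned shape are invented
names for illustration, not the real Cocoon flow API:

```javascript
// Decide how to answer a request for a wizard screen.
// contId   - continuation id from the URL, or null if absent
// isAlive  - predicate telling whether a continuation still exists
//            (stands in for the real web-continuation machinery)
// entryUrl - the wizard's "cool" entry URL, e.g. "wizard"
function resolveWizardRequest(contId, isAlive, entryUrl) {
  if (contId !== null && isAlive(contId)) {
    // Continuation is valid: serve the screen it points at.
    return { status: 200, serve: entryUrl + "?cont=" + contId };
  }
  // Expired or missing continuation: permanent redirect to the
  // entry point, where a "session expired" message can be shown.
  return { status: 301, location: entryUrl };
}
```

The point of the sketch is only that expiry maps to a redirect with a
stable target, so a bookmarked `wizard?cont=12345` never becomes a dead end.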
--- o0o ---
How much work you should spend on creating "cool" URLs in your webapp
varies, of course, from application to application. I just wanted to point
out that we can do more than 123456.cont if we want to. I have used more
than one webapp that goes through many "transactions" during use, but
forces me to restart the navigation from some start screen if my session
expires. At least I would find such applications more user-friendly if
they had used "cooler" URIs.
If you think that reusable URLs are a good idea, not only in
publishing-oriented sites but in webapps as well, you will need to have
more external URLs and your webapp will have more in common with
publishing-oriented sites.
/Daniel
RE: [RT] Escaping Sitemap Hell
Posted by Conal Tuohy <co...@paradise.net.nz>.
> Il giorno 06/gen/05, alle 01:54, Daniel Fagerstrom ha scritto:
> > * where repository:{1}.x.y.z exists ==> XYZPipeline
> >
> > We get the right pipeline by querying the repository instead of
> > encoding it in the URL. A further advantage is that the rule becomes
> > "listable" as the "where" clause in the match expresses what are the
> > allowed values for the wildcard.
Ugo Cei wrote:
> Unless I misinterpret what you mean, we already can do this:
>
> <map:match pattern="*">
>   <map:call function="fun">
>     <map:parameter name="par" value="{1}"/>
>   </map:call>
> </map:match>
>
> function fun() {
>   var entity = repo.query("select * from entities where id = " +
>     cocoon.parameters.par + " and {x, y, z}");
>   cocoon.sendPage("views/XYZPipeline", { entity : entity });
> }
>
> <map:match pattern="XYZPipeline">
>   <map:generate type="jx" src="xyz.jx.xml"/>
>   ...
> </map:match>
>
> Apart from the obviously contrived example, isn't the Flowscript just
> what we need to "get the right pipeline by querying the repository"?
I was struck by your example because right now we are revising our website
using the same technique you describe, with a single external pipeline
calling a flowscript. (BTW the revised website isn't public yet but should
be ready next month.) We're using a topic map as the metadata repository
(with TM4J). As in Daniel's example it completely decouples the external URL
space from the URLs of internal pipelines.
In the "external" sitemap, we just marshal a couple of parameters out of
the URL and request headers, and pass them to a flowscript. This is the
first time I've used flowscript, but it has been fairly easy to write and
it has worked pretty well.
The flowscript queries the topic map to find the topic to display, and the
appropriate internal pipeline to use. It also looks up other "scope" topics
which define different viewpoints on the other topics. These include
different languages, and (since this is a digital library application) we
also have "simplified" and "scholarly" scopes. The flowscript traverses the
class-instance and superclass-subclass hierarchies between the topics
looking for a jxtemplate to use (in the appropriate scope). Finally it
passes the "content" topic and the "scope" topics (and a basic ontology of
other topics) to the specified jxtemplate pipeline.
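The hierarchy traversal Conal describes could be sketched roughly like
this. The topic-map shape, the field names, and `findTemplate` are all
invented for illustration; the real lookup would go through the TM4J API:

```javascript
// Walk up the superclass chain of a topic until we find one that has a
// jxtemplate registered for the requested scope (e.g. "scholarly" or
// "simplified"). The topics argument is a plain map from topic id to
// { superclass: id or null, templates: { scope: templatePath } }.
function findTemplate(topicId, scope, topics) {
  var current = topicId;
  while (current) {
    var node = topics[current];
    if (node.templates && node.templates[scope]) {
      return node.templates[scope];
    }
    current = node.superclass; // null at the top of the hierarchy
  }
  return null; // no template in this scope anywhere up the chain
}
```

With this shape, a specific topic class can override the template of its
superclass per scope, and the fallback to the superclass's template comes
for free from the traversal.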
> In my sitemaps, public pipelines contain almost only <map:call> and
> <map:read> (for static resources) elements. All "classical"
> generator-transformer-serializer pipelines go into an "internal-only"
> pipeline that can be called from flowscripts only.
>
> Admittedly, this is fine for webapps, and maybe not so much for
> publishing-oriented websites.
Yes, I think for webapps you could do the mapping just in JavaScript, but
for publishing I think you really need a metadata repository of some sort.
You could use an XML document linkbase, as Daniel suggests, or a CMS, or a
topic map or RDF store, or an SQL DB, or any number of things.
Con
Re: [RT] Escaping Sitemap Hell
Posted by Ugo Cei <ug...@apache.org>.
Il giorno 06/gen/05, alle 01:54, Daniel Fagerstrom ha scritto:
>
> The requested resource will often be based on some content or
> combination of content that we can access from Cocoon. The content can
> be a file, data in a db, result from a business object etc. Let us
> assume that it resides in some kind of content repository. Now if we
> think about it, isn't it more natural to ask the content, that we are
> going to use, about its properties like type, format, status, access
> rights, etc, than to encode it in the URL? These properties can be
> encoded in the file name, in metadata in some property file, within
> the file, in a DB etc. Now instead of having the rule:
>
> *.x.y.z ==> XYZPipeline
>
> we have
>
> * where repository:{1} has properties {x, y, z} ==> XYZPipeline
>
> or
>
> * where repository:{1}.x.y.z exists ==> XYZPipeline
>
> We get the right pipeline by querying the repository instead of
> encoding it in the URL. A further advantage is that the rule becomes
> "listable" as the "where" clause in the match expresses what are the
> allowed values for the wildcard.
Unless I misinterpret what you mean, we already can do this:
<map:match pattern="*">
  <map:call function="fun">
    <map:parameter name="par" value="{1}"/>
  </map:call>
</map:match>

function fun() {
  var entity = repo.query("select * from entities where id = " +
    cocoon.parameters.par + " and {x, y, z}");
  cocoon.sendPage("views/XYZPipeline", { entity : entity });
}

<map:match pattern="XYZPipeline">
  <map:generate type="jx" src="xyz.jx.xml"/>
  ...
</map:match>
Apart from the obviously contrived example, isn't the Flowscript just
what we need to "get the right pipeline by querying the repository"?
> I would propose to go even further: in the "real" site map it should
> only be allowed to call VPC pipelines; no pipeline construction is
> allowed, that should be done in the component area.
In my sitemaps, public pipelines contain almost only <map:call> and
<map:read> (for static resources) elements. All "classical"
generator-transformer-serializer pipelines go into an "internal-only"
pipeline that can be called from flowscripts only.
Admittedly, this is fine for webapps, and maybe not so much for
publishing-oriented websites. But what I want to point out is that your
otherwise very well thought-out RT is incomplete if it doesn't take
Flowscript into consideration, IMHO.
Ugo
--
Ugo Cei - http://beblogging.com/blojsom/blog/
Re: [RT] Escaping Sitemap Hell
Posted by Ralph Goers <Ra...@dslextreme.com>.
Daniel Fagerstrom wrote:
>
>> The layout actually separates the site layout from the sitemap
>> quite nicely.
>
>
> Care to explain what I should look for, and in what document I find
> the info?
>
The easiest way to see what I am talking about is to look at the portal
sample site. Look at
build/webapp/samples/blocks/portal/profiles/layout/portal.xml. This is
the site layout for after you have logged in. portal-user-anonymous.xml
is for before you have logged in (technically, you are logged in as user
anonymous). This is documented at
http://cocoon.apache.org/2.1/developing/portal/portal-block.html.
If you have any other questions feel free to ask.
Re: [RT] Escaping Sitemap Hell
Posted by Daniel Fagerstrom <da...@nada.kth.se>.
Ralph Goers wrote:
> Daniel Fagerstrom wrote:
>
>>
>> Is it a Map of the Site?
>> ------------------------
>>
>> The Forrest people don't think that the sitemap is enough as a map of
>> the site. They have a special linkmap [1] that gives a map over the
>> site and that is used for internal navigation and for creating menu
>> trees. I have a similar view. From the sitemap it can be hard to
>> answer basic questions like:
>>
>> * What is the URL space of the application
>> * What is the structure of the URL space
>> * How is the resource referred to by this URL produced
>>
>> The basic view of a URL in the sitemap is that it is any string. Even
>> if there are constructions like mount, the URL space is not
>> considered as hierarchical. That means that the URLs can be presented
>> as patterns in any order in the sitemap and you have to read through
>> all of it to see if there is a rule for a certain URL.
>>
>> A real map for the site should be tree structured like the linkmap in
>> forrest. Take a look at the example in [1], (I don't suggest using
>> the "free form" XML, something stricter is required). Such a tree
>> model will also help in planning the URI space as it gives a good
>> overview of it.
>
>
> Out of curiosity, have you looked at the Portal block?
Not until now. But I spent too large a part of the night writing my RT
to be able to understand how everything worked together :/
> The layout actually separates the site layout from the sitemap quite
> nicely.
Care to explain what I should look for, and in what document I find the
info?
> Unfortunately, the Portal URLs are not cool. I'm in the process
> right now of attempting to fix that.
Have you looked at XFrames URLs (http://www.w3.org/TR/xframes/)? They
address a problem area that should be quite similar to the one you have
in portals, IIUC:
http://example.org/home.frm#frames(id1=uri1,id2=uri2,...)
You need to write a parser for the frames part, but I could use that
both in my attribute template language proposal and in the virtual
pipeline URLs that I proposed in the current thread, so it is well
invested time for me if I convince you to write such a parser ;)
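A parser for the frames part could be as small as the sketch below. It
handles only the simple comma-separated `id=uri` form shown above; real
XFrames fragments may need escaping rules this ignores, and the function
name is of course made up:

```javascript
// Parse an XFrames-style fragment like "frames(id1=uri1,id2=uri2)"
// into an object mapping frame ids to URIs. Returns null when the
// fragment does not have the expected frames(...) shape.
function parseFramesFragment(fragment) {
  var m = /^frames\((.*)\)$/.exec(fragment);
  if (!m) {
    return null;
  }
  var result = {};
  if (m[1] === "") {
    return result; // "frames()" is legal and empty
  }
  var pairs = m[1].split(",");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    if (eq < 0) {
      return null; // malformed pair, reject the whole fragment
    }
    // Split on the first "=" only, so URIs containing "=" survive.
    result[pairs[i].slice(0, eq)] = pairs[i].slice(eq + 1);
  }
  return result;
}
```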
/Daniel
> We are finding that exposing the Portal's events in the browser causes
> JSR-168 portlets to fall on their face simply by reloading the page.
> Not cool at all.
>
> Ralph
>
Re: [RT] Escaping Sitemap Hell
Posted by Ralph Goers <Ra...@dslextreme.com>.
Daniel Fagerstrom wrote:
>
> Is it a Map of the Site?
> ------------------------
>
> The Forrest people don't think that the sitemap is enough as a map of
> the site. They have a special linkmap [1] that gives a map over the
> site and that is used for internal navigation and for creating menu
> trees. I have a similar view. From the sitemap it can be hard to
> answer basic questions like:
>
> * What is the URL space of the application
> * What is the structure of the URL space
> * How is the resource referred to by this URL produced
>
> The basic view of a URL in the sitemap is that it is any string. Even
> if there are constructions like mount, the URL space is not considered
> as hierarchical. That means that the URLs can be presented as patterns
> in any order in the sitemap and you have to read through all of it to
> see if there is a rule for a certain URL.
>
> A real map for the site should be tree structured like the linkmap in
> forrest. Take a look at the example in [1], (I don't suggest using the
> "free form" XML, something stricter is required). Such a tree model
> will also help in planning the URI space as it gives a good overview
> of it.
Out of curiosity, have you looked at the Portal block? The layout
actually separates the site layout from the sitemap quite nicely.
Unfortunately, the Portal URLs are not cool. I'm in the process right
now of attempting to fix that. We are finding that exposing the Portal's
events in the browser causes JSR-168 portlets to fall on their face
simply by reloading the page. Not cool at all.
Ralph