Posted to docs@cocoon.apache.org by do...@cocoon.apache.org on 2005/01/12 22:12:52 UTC

[Cocoon Wiki] New: SimpleContentModel

   Date: 2005-01-12T13:12:51
   Editor: MarkLundquist
   Wiki: Cocoon Wiki
   Page: SimpleContentModel
   URL: http://wiki.apache.org/cocoon/SimpleContentModel

   no comment

New Page:

= A simple content model for pages generated from static XML sources =

In a standard Web server (such as [http://httpd.apache.org Apache httpd]), the default interpretation of the URL path component is as a filesystem pathname relative to the Document``Root directory.  In Cocoon, there's no default; if a request doesn't match in the sitemap, you get the "No pipeline matched" error.

The Cocoon sitemap gives you complete freedom in mapping requests to processing resources (pipelines).  But quite often, it makes sense to organize source documents in the filesystem in a way that maps directly to the request path component, i.e. something very analogous to what a standard Web server does (but not precisely the same, since we're still piping the source XML through some transformations to generate the HTML presentation).

This HOW-TO offers a content model and sitemap implementation that does this, and which also handles the following additional concerns automatically (i.e., without page-specific rules in the sitemap):
 * Co-location of additional page-specific resources
 * External redirects to a "trailing slash" form of the URL (yes, you '''do''' want this...)

I'll explain the content model by way of examples, and discuss these additional aspects in the context of the examples.

You can use this model as a starting point, and modify it according to your needs/tastes/etc.

= The Content Model =

OK, then... first of all, in this implementation all the source documents handled in this way live in a directory structure rooted at a directory named 'content' that is a subdirectory of the webapp context directory.

== The simplest case ==

In the simplest case, we just want to take an XML source document and transform it in the standard way.  In this case, the request

{{{
/path/to/something
}}}

maps to the source document

{{{
content/path/to/something.xml
}}}

The implementation looks for this resource first.

== Co-located resources ==

It seems like a standard practice in implementing websites in traditional web servers is to create some subdirectories in the Document``Root to contain non-HTML resources, like:

{{{
images/
css/
flash/
js/
}}}

I never liked just dumping "all the images" into one big subdirectory, just because they happen to be images.  I'd rather have assets co-located with the documents that use them.

In this content model, a page that includes specific resources is represented by a directory whose name ends with ".page".  In this case, a request such as '/path/to/dogs/Bowser' maps to this directory structure:

{{{
content/path/to/dogs
   Bowser.page/
      source.xml       # the document to be transformed

      # Everything else is ad hoc, whatever the page wants
      # (i.e. the sitemap doesn't know/care about it)
      #
      images/
         bad_dog.jpg
         favorite_bone.jpg
         # etc..
      style.css        # if I had a page-specific external stylesheet
      client.js        # maybe there's some javascript specific to this page...
      flash/
         bowser_chasing_tail.swf
}}}

''[To-Do: document how the page references these resources and how Cocoon or Apache serves them]''

The implementation looks for {{{Bowser.page/source.xml}}} in {{{content/path/to/dogs}}} if the resource {{{Bowser.xml}}} (see "The simplest case" above) does not exist there.
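
The lookup order can be written down as a tiny Python sketch. This is only an illustration of the rule, not code that Cocoon runs; the function name {{{candidate_sources}}} and the {{{content_root}}} parameter are made up for the example.

```python
from typing import List


def candidate_sources(request_path: str, content_root: str = "content") -> List[str]:
    """List the source documents tried for a request path, in lookup order."""
    relative = request_path.strip("/")
    return [
        # 1. the simplest case: a plain XML document
        f"{content_root}/{relative}.xml",
        # 2. a '.page' directory holding source.xml plus co-located resources
        f"{content_root}/{relative}.page/source.xml",
    ]


for candidate in candidate_sources("/path/to/dogs/Bowser"):
    print(candidate)
```

The first candidate that exists is the one that gets transformed.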

== "Trailing slash" resources ==

Sometimes you really need a resource to be served at a URI that ends with a slash, because the trailing slash determines whether a relative link in the page denotes a ''child'' (if the current URL has a trailing slash) or a ''sibling'' (if the current URL has no trailing slash) of the current page.  If the URL of the current page doesn't end with a slash (e.g., {{{path/to/dogs}}}), then we have an annoying situation if we want to include a link on that page to a child resource (e.g., {{{path/to/dogs/Bowser}}}).  Our choices are:

 * Use a rooted URI, e.g. {{{<a href="/path/to/dogs/Bowser">}}}
    This arguably sucks... we'd like to avoid being tightly-coupled to site structure here.
 * ...on the other hand, the page-relative link looks like this: {{{<a href="dogs/Bowser">}}}.
    You may (as do I) find this confusing and clunky, since the link has to repeat the current page's own name.  Moreover, the path component for "this page" might not be as simple as "dogs"; it might be something we have to generate, as in the URI {{{CustomerAccounts/94006748}}}.

So it just seems best to adopt the convention that "a resource with children has a URI ending in '/'".
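
If you want to see the child/sibling difference for yourself, Python's standard {{{urllib.parse.urljoin}}} resolves relative links the same way a browser does. The host name {{{example.org}}} is just a placeholder for this sketch.

```python
from urllib.parse import urljoin

# The same relative link, resolved against the two forms of the page URL.
without_slash = "http://example.org/path/to/dogs"
with_slash = "http://example.org/path/to/dogs/"

sibling = urljoin(without_slash, "Bowser")
child = urljoin(with_slash, "Bowser")

print(sibling)  # http://example.org/path/to/Bowser
print(child)    # http://example.org/path/to/dogs/Bowser
```

Only the trailing-slash form makes the plain link {{{Bowser}}} point at the child page.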

But it would be dumb to require the user to know which resources need to end in '/'!  Only we should need to know this, and then we redirect the user's browser if necessary.

Here's how we handle it.  Given the request

{{{path/to/dogs}}}

if neither {{{dogs.xml}}} nor {{{dogs.page/source.xml}}} exists in {{{content/path/to}}}, but the directory {{{content/path/to/dogs}}} does exist, then the client is redirected to

{{{path/to/dogs/}}}

Request URIs ending in '/' are matched by an internal redirect to a relative resource named 'main'.  So for this example, that would be {{{path/to/dogs/main}}}.  This request then is handled by our selector as already described, so if the resource {{{content/path/to/dogs/main.xml}}} exists, then it is used.  If not, then the selector would next try {{{content/path/to/dogs/main.page/source.xml}}}.  So we can have a structure like this

{{{
content/path/to/dogs/
   main.page/               # URL: 'path/to/dogs/'
      source.xml
      thumbnail-images/
          Bowser.jpg
          Doofus.jpg
   Bowser.page/             # URL: 'path/to/dogs/Bowser'
       source.xml
       # etc. per above
   Doofus.page/             # URL: 'path/to/dogs/Doofus'
       source.xml
       # etc.
   vet_story.xml            # URL: 'path/to/dogs/vet_story'
                            # this one has no page-specific resources
}}}

...and of course this kind of structure can be nested in any arbitrary way.
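
Going the other way, from a source file to the URL it is served at, can be sketched in Python like this. It is a minimal sketch of the naming rules described above, not part of the sitemap; the function name {{{url_for_source}}} and the choice to report the site root as {{{/}}} are assumptions made for the example.

```python
from pathlib import PurePosixPath


def url_for_source(source: str, content_root: str = "content") -> str:
    """Return the URL path at which a source document is served."""
    rel = PurePosixPath(source).relative_to(content_root)
    if rel.name == "source.xml" and rel.parent.suffix == ".page":
        page = rel.parent.with_suffix("")   # strip '.page' from the directory name
    elif rel.suffix == ".xml":
        page = rel.with_suffix("")          # strip '.xml' from the file name
    else:
        raise ValueError(f"not a page source: {source}")
    if page.name == "main":
        # 'main' is served at the trailing-slash URL of its directory
        parent = page.parent
        return "/" if parent == PurePosixPath(".") else f"{parent}/"
    return str(page)


for src in (
    "content/path/to/dogs/main.page/source.xml",
    "content/path/to/dogs/Bowser.page/source.xml",
    "content/path/to/dogs/vet_story.xml",
):
    print(src, "->", url_for_source(src))
```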

The resource "main.xml" or "main.page/source.xml" here sort of corresponds to "index.html" or "index.php" or whatever you might be used to in the Apache Directory``Index directive (except that I always called mine "main.php", since it usually isn't really an "index", is it?).

= The Sitemap =

Now for the sitemap fragment that implements this.  Sorry, I use a default namespace in my sitemaps, so you'll have to add "map:" to everything unless you do the same.

OK, here it is:

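{{{
    <!-- Internal URIs ending in '/': continue with the 'main' resource -->
    <match pattern="content//**/">
      <redirect-to uri="content//{1}/main" />
    </match>

    <!-- All other internal URIs: try each candidate source in turn -->
    <match pattern="content//**">
      <select type="resource-exists">
          <!-- 1. the simplest case: a plain XML document -->
          <when test="content/{1}.xml">
            <generate src="content/{1}.xml" />
          </when>
          <!-- 2. a ".page" directory with co-located resources -->
          <when test="content/{1}.page">
            <generate src="content/{1}.page/source.xml" />
          </when>
          <!-- 3. a plain directory: redirect the client to the "trailing slash" URL -->
          <when test="content/{1}">
            <redirect-to uri="/{1}/" global="true" />
          </when>
          <otherwise>
            <!-- we want an error message, so... -->
            <generate src="content/{1}" /> <!-- (non-existent) -->
          </otherwise>
      </select>
      <serialize type="xml"/>
    </match>

    <!-- Every external request is passed to the internal "content//" matchers above -->
    <match pattern="**">
      <redirect-to uri="cocoon:/content//{0}" />
    </match>
}}}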

That's all there is to it!
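
To make the order of the decisions explicit, here is a small Python model of the sitemap fragment above. It is only a sketch for reasoning about the rules, not something Cocoon runs: the function name {{{route}}}, the action labels, and the treatment of the site root as a request for 'main' are all assumptions of the sketch.

```python
from pathlib import Path
from typing import Tuple


def route(context_dir: Path, request_path: str) -> Tuple[str, str]:
    """Return (action, target) for a request path, following the sitemap's order."""
    content = context_dir / "content"
    path = request_path.lstrip("/")
    if path == "" or path.endswith("/"):
        # trailing slash (or site root, by assumption): use the 'main' resource
        path += "main"
    if (content / f"{path}.xml").exists():
        return ("generate", f"content/{path}.xml")
    if (content / f"{path}.page").exists():
        return ("generate", f"content/{path}.page/source.xml")
    if (content / path).exists():
        # a directory without its own page: send the client to the '/' form
        return ("redirect", f"/{path}/")
    # nothing matched; the sitemap provokes an error by generating from this path
    return ("error", f"content/{path}")
```

With the 'dogs' tree shown earlier, {{{route(context_dir, "/path/to/dogs")}}} yields a redirect to {{{/path/to/dogs/}}}, and that request in turn generates from {{{content/path/to/dogs/main.page/source.xml}}}.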

----
''Author:'' MarkLundquist