Posted to dev@forrest.apache.org by Steven Noels <st...@outerthought.org> on 2002/07/26 23:15:09 UTC

URI namespace management & the sitemap

Hi all,

Happy as I am with the progress we have made on the new forrestbot, 
I'm planning to convert some private sites to Forrest now, and was 
immediately stumped by the fact that we have neither thoroughly discussed 
nor analyzed URI namespace management. So I want to gather a number of 
thoughts I have and throw them at the group, hopefully ending up with 
a checklist and a nice todo for refactoring the sitemap and the 
associated skins and xdocs.

1) Index documents

Currently, there is no matcher set up for URIs ending with a trailing 
slash, which is why the CLI reported quite a few broken links during my 
initial trial with outerthought.net. I believe we should add those 
matchers so that we support book.xml links like this:

   <menu label="Document Samples">
     <menu-item label="DTD documentation" href="dtd-docs.html"/>
     <menu-item label="document-v11 (HTML)" href="document-v11.html"/>
     <menu-item label="document-v11 (PDF)" href="document-v11.pdf"/>
     <menu-item label="How-Tos" href="community/howto/"/>
                                                   ^^^^
     <menu-item label="xml.apache.org" href="xml-site/"/>
                                                   ^^^^
   </menu>

and in normal <link> links also, of course.
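
A rough sketch of such a matcher, modelled on the *.html aggregation 
quoted under point 2. It assumes that the wildcard pattern "**/" matches 
any URI ending in a slash and binds {1} to everything before that slash, 
and that a directory's index page is called "index"; neither has been 
checked against the current sitemap:

```xml
<!-- Hypothetical matcher: serve "community/howto/" as "community/howto/index" -->
<map:match pattern="**/">
  <map:aggregate element="site">
    <map:part src="cocoon:/book-{1}/index.xml"/>
    <map:part src="cocoon:/tab-{1}/index.xml"/>
    <map:part src="cocoon:/body-{1}/index.xml" label="content"/>
  </map:aggregate>
  <map:call resource="skinit">
    <map:parameter name="type" value="site2xhtml"/>
  </map:call>
</map:match>
```

With href="community/howto/", {1} would be "community/howto", so the body 
part resolves to cocoon:/body-community/howto/index.xml, which the 
body-community/*/index.xml pipeline listed under point 5 should pick up. 
How the CLI names the file it writes for such a URI is a separate question.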

2) Filename extensions

I gather some people will want to generate filenames with extensions 
other than the default .html, e.g. if they want to trigger further 
server-side include behaviour at the webserver level. 
I'm +0 on whether we really should support this, but maybe some people 
creating their own skins will want to generate .php3, .shtml or similar 
files. I believe we could support this with an XSLT library template 
that rewrites file extensions for link elements, plus configurable 
matchers, but I'm not sure whether the latter is supported by Cocoon:

    <map:match pattern="*.{ext-parameter}">
     <map:aggregate element="site">
      <map:part src="cocoon:/book-{1}.xml"/>
      <map:part src="cocoon:/tab-{1}.xml"/>
      <map:part src="cocoon:/body-{1}.xml" label="content"/>
     </map:aggregate>
     <map:call resource="skinit">
       <map:parameter name="type" value="site2xhtml"/>
     </map:call>
    </map:match>

and then some general sitemap parameter {ext-parameter} being set to 
"html" to start with (the pattern already supplies the dot), hopefully 
configurable through some CLI input module that can override this 
parameter from the command line.

Is this feasible?
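
Whatever the answer for the matcher, the link-rewriting half could be 
sketched in plain XSLT 1.0 as below. The stylesheet and its "ext" 
parameter (the extension without its leading dot) are hypothetical, and 
it only touches href attributes of <link> elements that end in ".html":

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter; the default keeps today's behaviour -->
  <xsl:param name="ext" select="'html'"/>

  <!-- Copy everything that is not handled below unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Swap a trailing ".html" on link hrefs for "." plus $ext -->
  <xsl:template
      match="link/@href[substring(., string-length(.) - 4) = '.html']">
    <xsl:attribute name="href">
      <xsl:value-of
          select="concat(substring(., 1, string-length(.) - 5), '.', $ext)"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

Links carrying a fragment identifier (foo.html#bar) are not covered by 
this sketch and would need an extra case.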

3) host and project location

With the current issue of the tab-link prefixes being hardcoded in the 
XSLT in mind, I thought we should set those 'host' and 
'project-location' links like this:

  http://{host}/{project-location}/foobaruri

  - host address links created in the XSLT that makes up the skin, 
perhaps also configured from the outside with the aforementioned sitemap 
parameters being fed into the XSLT as XSLT <param>s and some CLI 
inputmodule setting those sitemap parameters.

  - project location can be inherited from the forrestbot.conf.xml (but 
what about sites generated using "./build.{sh|bat} docs", without the 
forrestbot?) and is primarily used in links created in the menu and tabs 
pipeline - also fed in using the same mechanism or a dash of Ant 
filtering ;-)
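
On the XSLT side this could look roughly as follows. The parameter names 
and their defaults are made up for illustration, and the sketch assumes 
tab elements that carry label and dir attributes. The values would be 
handed in from the sitemap through <map:parameter> children of the 
<map:transform> step:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameters, overridable from the sitemap -->
  <xsl:param name="host" select="'localhost'"/>
  <xsl:param name="project-location" select="'forrest'"/>

  <!-- Build http://{host}/{project-location}/{dir} for each tab -->
  <xsl:template match="tab">
    <a href="{concat('http://', $host, '/', $project-location, '/', @dir)}">
      <xsl:value-of select="@label"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```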

4) 'static', pregenerated resources, like downloads, Javadoc et al.

For outerthought.net, I have some PDFs and binary downloads, and also 
the XMLSpy-generated Schema documentation: basically a bunch of 
resources that should not pass through the Forrest pipeline.

Putting them in place on the server can be done from outside the 
forrestbot/forrestprocess, or we could have those handled with readers 
and store them inside the {project:}src/documentation tree, preferably 
in the src/documentation/resources/ directory.
But we cannot enforce this, nor do I want to check in the 100+ files of 
generated XMLSpy docs into my outerthought.net CVS - I'm happy to manage 
those directly as files on my webserver. If I want to link to them 
however, the build fails since it cannot process the link to those files 
(<link href="sitemap.html">). What can we do about this...?

  - having a pipeline set up for static resources, i.e.

   <map:match pattern="static/**.extension">

in the Forrest sitemap - which means we will have to manage some list of 
reader matchers for each and every mimetype (something which should 
already be taken care of by the webserver).

  - make the CLI fail gracefully for unresolvable links

  - have some special link (or attribute set for the <link> element) 
indicating to the CLI that it shouldn't try to traverse that link, even 
though there is a pipeline set up for it
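
For the first option, the reader matchers could look something like the 
sketch below. The src locations are hypothetical (they assume the files 
sit in a resources/ directory relative to the sitemap), and one matcher 
per file type is exactly the bookkeeping mentioned above:

```xml
<!-- One matcher per file type; the src locations are hypothetical -->
<map:match pattern="static/**.pdf">
  <map:read src="resources/{1}.pdf" mime-type="application/pdf"/>
</map:match>

<map:match pattern="static/**.zip">
  <map:read src="resources/{1}.zip" mime-type="application/zip"/>
</map:match>
```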

For people who want to import their static resources into CVS, we could 
make sure they are moved over to the published site if they are not 
processed by the crawler - perhaps configuring some copy-over task in 
forrestbot.conf.xml

5) pipelines

(maybe we are lucky and Sylvain comes up with that DocTypeMatcher right 
away, but I don't think so ;-)

in general, we have externally accessible pipelines set up for:

  - "" (entry page)
  - apachestats (not used currently)
  - *.html
  - **/*.html
  - *.pdf
  - **/*.pdf
  - libre (testing purposes)

and a whole lot of skin- and project-related matchers for css, js, and 
images

the other ones are internal to Forrest (and should be marked as such, IMO):

  - **tab-**.xml
  - **book-**/*.xml
  - **book-**.xml
  - body-todo.xml
  - body-changes.xml
  - body-faq.xml
  - body-community/*/index.xml
  - body-community**revision-*.xml
  - body-community/*/*/**.xml
  - revisions-community/*/*/**
  - doclist/content/xdocs/**book.xml
  - body-doclist.xml
  - body-**.dtdx.xml
  - body-**.xml
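
One way to mark them as internal might be Cocoon's internal-only 
attribute on <map:pipeline>, which should make the enclosed matchers 
reachable only through cocoon:/ requests. A sketch of the split, with 
the matcher bodies left out and the grouping being only a suggestion:

```xml
<map:pipelines>

  <!-- Reachable only via cocoon:/ URIs from other pipelines -->
  <map:pipeline internal-only="true">
    <map:match pattern="**tab-**.xml">
      <!-- existing tab pipeline -->
    </map:match>
    <map:match pattern="body-**.xml">
      <!-- existing body pipeline -->
    </map:match>
  </map:pipeline>

  <!-- Externally accessible entry points -->
  <map:pipeline>
    <map:match pattern="*.html">
      <!-- existing aggregation and skinning -->
    </map:match>
  </map:pipeline>

</map:pipelines>
```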

body-todo/changes/faq could possibly be handled by the virtual 
DocTypeMatcher; I'm not so sure we should keep the community/revision 
stuff as-is (or whether to isolate it in its own sitemap); the nekodtd 
matcher is a specialty thing to Forrest; and the doclist is perhaps a 
candidate for replacement by libre.

Anyway, we don't like overly lengthy mails, and my baby daughter wants 
some attention. So here it is, please comment and discuss until these 
issues are solved. We need a todo for next week ;-)

</Steven>
-- 
Steven Noels                            http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org                      stevenn@apache.org