Posted to dev@forrest.apache.org by je...@apache.org on 2003/01/07 17:16:33 UTC
cvs commit: xml-forrest/src/documentation/content/xdocs linking.xml
jefft 2003/01/07 08:16:33
Added: src/documentation/content/xdocs linking.xml
Log:
cat brain | grep linking > linking.xml
Revision Changes Path
1.1 xml-forrest/src/documentation/content/xdocs/linking.xml
Index: linking.xml
===================================================================
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
"document-v11.dtd" [
<!ENTITY a '<code>index.xml</code>'>
<!ENTITY b '<code>todo.xml</code>'>
<!ENTITY s '<code>site.xml</code>'>
]>
<document>
<header>
<title>Menus and Linking</title>
<version>$Revision: 1.1 $</version>
<authors>
<person name="Jeff Turner" email="jefft@apache.org"/>
</authors>
</header>
<body>
<section id="intro">
<title>Introduction</title>
<p>
This document describes Forrest's internal URI space: how it is managed
with &s;, how menus are generated, and how the various link schemes
(site:, ext:) work.
</p>
</section>
<section>
<title>site.xml</title>
<p>
&s; is what we'd call a 'site map' if Cocoon hadn't already claimed
that term. &s; is a loosely structured XML file, acting as a map of the
site's contents. It provides a unique identifier (an XPath address)
for 'nodes' of information in the website. A 'node' of site information
can be:
</p>
<ul>
<li>A category of information, like 'the user guide'. A category may
correspond to a directory, but that is not required.</li>
<li>A specific page, eg 'the FAQ page'</li>
<li>A specific node (identified by <code>id</code> attribute) in an
XML file.</li>
</ul>
<p>
In addition to providing fine-grained addressing of site info, &s;
allows <em>metadata</em> to be associated with each node, with
attributes or child elements. Most commonly, a <code>label</code>
attribute is used to provide a text description of the node.
</p>
<p>
There are currently two applications of &s;:
</p>
<dl>
<dt><link href="#menu_generation">Menu generation</link></dt>
<dd>&s; is used to generate the menus for the HTML website, replacing
the old <code>book.xml</code> system</dd>
<dt><link href="#semantic_linking">Semantic linking</link></dt>
<dd>&s; provides a basic aliasing mechanism for linking. For example, one
can write &lt;link href="site:changes"&gt; from anywhere in the site, and
link to the 'changes' information node (translated to changes.html).
More on this below.</dd>
</dl>
<p>
Here is a sample site.xml, a stripped-down version from Forrest's
own <link href="ext:forrest">website</link>:
</p>
<source><![CDATA[
<?xml version="1.0"?>
<site label="Forrest" href="" xmlns="http://apache.org/forrest/linkmap/1.0">
<about label="About">
<index label="Index" href="index.html"/>
<license label="License" href="license.html"/>
<your-project label="Using Forrest" href="your-project.html">
<new_content_type href="#adding_new_content_type"/>
</your-project>
<linking label="Linking" href="linking.html"/>
<changes label="Changes" href="changes.html"/>
<todo label="Todo" href="todo.html"/>
<live-sites label="Live sites" href="live-sites.html"/>
</about>
<community label="Community" href="community/">
<index label="About" href="index.html"/>
<howto-samples label="How-To Samples" href="howto/">
<single-page label="Single Page" href="v10/howto-v10.html"/>
<xmlform label="Multi-Page" href="xmlform/">
<intro label="Intro" href="howto-xmlform.html"/>
<step1 label="Step 1" href="step1.html"/>
<step2 label="Step 2" href="step2.html"/>
</xmlform>
</howto-samples>
</community>
<references label="References">
<gump label="Apache Gump" href="http://jakarta.apache.org/gump/"/>
<cocoon label="Apache Cocoon" href="http://xml.apache.org/cocoon/"/>
</references>
<external-refs>
<mail-archive href="http://marc.theaimsgroup.com"/>
<xml.apache.org href="http://xml.apache.org/">
<cocoon href="cocoon/">
<ml href="mail-lists.html"/>
<actions href="userdocs/concepts/actions.html"/>
</cocoon>
<forrest href="forrest/"/>
<xindice href="xindice/"/>
<fop href="fop/"/>
</xml.apache.org>
<mail>
<semantic-linking href="http://marc.theaimsgroup.com/?l=forrest-dev&amp;m=103097808318773&amp;w=2"/>
</mail>
<cool-uris href="http://www.w3.org/Provider/Style/URI.html"/>
<uri-rfc href="http://zvon.org/tmRFC/RFC2396/Output/index.html"/>
</external-refs>
</site>
]]></source>
<p>As you can see, things are pretty free-form. The rules are as follows:</p>
<ul>
<li>The root element must be 'site', and normal content should be in the
namespace <code>http://apache.org/forrest/linkmap/1.0</code>. Feel
free to mix in your own content (RDF, Dublin Core, etc.) under new
namespaces.</li>
<li>Element names are used as identifiers. The <code>foo</code> in
<code>site:foo</code> must therefore be a valid NMTOKEN.</li>
<li>Elements with <code>href</code> attributes can be used as identifiers
in <code>site:</code> URIs.</li>
<li>Relative <code>href</code> attribute values are 'accumulated' by
prepending the hrefs of ancestor nodes.</li>
<li>Elements without <code>label</code> attributes (and their children)
are not displayed in the menu.</li>
<li>Elements below <code>external-refs</code> are mapped to the
<code>ext:</code> scheme, so <code>ext:cocoon/ml</code> becomes
<code>http://xml.apache.org/cocoon/mail-lists.html</code>.</li>
</ul>
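<p>
The href accumulation rule can be illustrated with a small sketch. This is
not Forrest code: it is a minimal Python approximation, assuming a
namespace-free fragment of &s; and a hypothetical helper named
<code>absolutize</code>, and treating any href containing
<code>://</code> as already absolute.
</p>
<source><![CDATA[
```python
import xml.etree.ElementTree as ET

# A namespace-free fragment of the sample site.xml (an assumption, for brevity).
SITE_XML = """
<site href="">
  <external-refs>
    <xml.apache.org href="http://xml.apache.org/">
      <cocoon href="cocoon/">
        <ml href="mail-lists.html"/>
      </cocoon>
    </xml.apache.org>
  </external-refs>
</site>
"""

def absolutize(node, inherited=""):
    """Rewrite each href in place as its ancestors' hrefs followed by its own."""
    own = node.get("href", "")
    # An href with a scheme starts a new base; otherwise prepend what we inherited.
    full = own if "://" in own else inherited + own
    if "href" in node.attrib:
        node.set("href", full)
    for child in node:
        absolutize(child, full)

root = ET.fromstring(SITE_XML)
absolutize(root)
print(root.find(".//ml").get("href"))
# http://xml.apache.org/cocoon/mail-lists.html
```
]]></source>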
</section>
<section id="menu_generation">
<title>Generating Menus</title>
<p>
If the &s; above were placed in
<code>src/documentation/content/xdocs/</code>, the generated website
would have a menu like this:
</p>
<figure src="images/menu.png" alt="Menu generated from site.xml"/>
<p>
As you can see, the elements without labels, like
<code>&lt;new_content_type href="#adding_new_content_type"/&gt;</code>, and the
<code>external-refs</code> section, are not displayed.
</p>
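<p>
The label rule can be sketched as a tree walk that prunes every element
lacking a <code>label</code> attribute, together with its children. The
Python below is only an illustration under that assumption; the function
name <code>menu_labels</code> is hypothetical and the real menu is produced
by Forrest's stylesheets, not by this code.
</p>
<source><![CDATA[
```python
import xml.etree.ElementTree as ET

# A namespace-free fragment of the sample site.xml (an assumption, for brevity).
SITE_XML = """
<site label="Forrest" href="">
  <about label="About">
    <index label="Index" href="index.html"/>
    <your-project label="Using Forrest" href="your-project.html">
      <new_content_type href="#adding_new_content_type"/>
    </your-project>
  </about>
  <external-refs>
    <mail-archive href="http://marc.theaimsgroup.com"/>
  </external-refs>
</site>
"""

def menu_labels(node, depth=0):
    """Return (depth, label) pairs, skipping unlabelled elements and their subtrees."""
    entries = []
    for child in node:
        label = child.get("label")
        if label is None:
            continue  # no label: neither this element nor its children appear
        entries.append((depth, label))
        entries.extend(menu_labels(child, depth + 1))
    return entries

menu = menu_labels(ET.fromstring(SITE_XML))
for depth, label in menu:
    print("  " * depth + label)
```
]]></source>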
<p>
Files in subdirectories are displayed with a menu local to that
subdirectory:
</p>
<figure src="images/menu2.png" alt="Subdirectory menu generated from site.xml"/>
<note>Yes, this truncating of the menu is annoying in many circumstances,
and will be made configurable eventually. For now, to avoid generating
truncated menus, edit sitemap.xmap, line 661 or thereabouts:
<code>&lt;map:generate src="cocoon:/{dir}linkmap/{dir}"/&gt;</code> and remove the
'/{dir}'. See <link href="#menus_from_site">here</link> for more info.
</note>
<section>
<title>Overriding menus with book.xml</title>
<p>
Historically, menus in Forrest have been generated from a
<code>book.xml</code> file, one per directory. This mechanism is
still available, and if a <code>book.xml</code> is found, it will be
used in preference to the &s;-generated menu. Not only does this
preserve backwards-compatibility, it is sometimes necessary for sites
whose content isn't strictly hierarchical, or where the &s;-generated
menu isn't appropriate. <code>book.xml</code> files can use
<code>site:</code> URIs to ease the maintenance burden that led to
book.xml's obsolescence. In general, however, we prefer to enhance the
&s;-based solution rather than rely on <code>book.xml</code> hacks - please
<link href="site:forrest-dev">let us know</link> if the &s; menu isn't
meeting your use-case.
</p>
</section>
</section>
<section id="destination_linking">
<title>The old system: destination linking</title>
<p>
Traditionally in Forrest (and similar systems), there has only been one
URI space: that of the generated site. If &a; wants to link to &b;, &a;
would use
</p>
<source>
&lt;link href="todo.html"&gt;todo.html&lt;/link&gt;
</source>
<p>
The theoretical problem with this is that the content producer should
not know or care how Forrest is going to render the source. A URI
should only <em>identify</em> a resource, not specify its type [<link
href="ext:semantic-linking">mail ref</link>] [<link
href="ext:cool-uris">cool URIs</link>]. In fact, as Forrest
typically renders to multiple output formats (HTML and PDF), links in
one of them (here, the PDF) are going to break.
</p>
</section>
<section id="semantic_linking">
<title>Semantic linking</title>
<p>
Forrest's solution is simple: instead of &lt;link href="todo.html"&gt;, write
&lt;link href="site:todo"&gt;, where:
</p>
<dl>
<dt>site</dt>
<dd>is a URI 'scheme'; a namespace that restricts
the syntax and semantics of the rest of the URI [<link
href="ext:uri-rfc">rfc2396</link>]. The semantics of 'site' are
"this identifier locates something in the site's XML sources".</dd>
<dt>todo</dt>
<dd>identifies the content in <code>todo.xml</code>, by reference to a
'node' of content declared in &s;.</dd>
</dl>
<p>
We call this <em>semantic</em> linking because instead of linking to a
physical representation (todo.html), we've linked to the 'idea' of "the
todo file". It doesn't matter where it physically lives; that will be
sorted out by Forrest.
</p>
<section>
<title>Resolving site: URIs</title>
<p>
How exactly does <code>site:todo</code> get resolved? A full answer
is provided in the <link href="#implementation">implementation</link>
section. Essentially, the <code>todo</code> part has
<code>/site//</code> prepended, and <code>/@href</code> appended, to
form the string <code>/site//todo/@href</code>. This is
then used as an XPath expression in &s; identifying the string
replacement, in this case <code>todo.html</code>.
</p>
<note>
Actually, the XPath is applied to XML generated dynamically from
&s;. The generated XML has @href's fully expanded ('absolutized')
and ..'s added ('relativized') as needed.
</note>
<p>
Thus by modifying the XPath prefix and suffix, just about any XML
format can be accommodated.
</p>
<p>
Notice that the '//' allows us any degree of specificity when linking.
In the sample &s; above, both <code>site:new_content_type</code> and
<code>site:about/your-project/new_content_type</code> identify the
same node. It is up to you to decide how specific to make links. One
nice benefit of link 'ambiguity' is that &s; can be reorganized
without breaking links. For example, 'new_content_type' currently
identifies a node in 'your-project'. By leaving that fact unspecified
in <code>site:new_content_type</code>, we are free to make
'new_content_type' its own XML file, or a node in another file, in
another category.
</p>
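<p>
As a rough illustration of this lookup, the Python sketch below resolves
<code>site:</code> URIs against a namespace-free fragment of &s;. It is an
approximation, not Forrest's implementation: the helper name
<code>resolve_site</code> is hypothetical, ElementTree's
<code>.//</code> stands in for the <code>/site//</code> prefix, and hrefs
are returned as written rather than absolutized.
</p>
<source><![CDATA[
```python
import xml.etree.ElementTree as ET

# A namespace-free fragment of the sample site.xml (an assumption, for brevity).
SITE_XML = """
<site href="">
  <about>
    <your-project href="your-project.html">
      <new_content_type href="#adding_new_content_type"/>
    </your-project>
    <todo href="todo.html"/>
  </about>
</site>
"""

def resolve_site(uri, root):
    """Resolve 'site:<key>' roughly the way /site//<key>/@href would."""
    scheme, _, key = uri.partition(":")
    if scheme != "site":
        raise ValueError("not a site: URI: " + uri)
    node = root.find(".//" + key)  # './/' plays the role of the '/site//' prefix
    if node is None:
        raise KeyError(uri)
    return node.get("href")        # stands in for the '/@href' suffix

root = ET.fromstring(SITE_XML)
short = resolve_site("site:new_content_type", root)
full = resolve_site("site:about/your-project/new_content_type", root)
print(resolve_site("site:todo", root), short == full)
```
]]></source>
<p>
Both the short and the fully qualified key select the same node, which is
the 'ambiguity' described above.
</p>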
</section>
<section>
<title>ext: URIs: linking to external URLs</title>
<p>
The <code>ext:</code> scheme was created partly to demonstrate the
ease with which new schemes can be defined, and partly for practical
use. <code>ext:</code> URIs identify nodes in &s; below the
<code>&lt;external-refs&gt;</code> node. By convention, nodes here link to URLs
outside the website, and are not listed in the menu generated from
&s;.
</p>
<p>Here is a &s; snippet illustrating <code>external-refs</code>:</p>
<source><![CDATA[
<site>
...
<external-refs>
<mail-archive href="http://marc.theaimsgroup.com"/>
<xml.apache.org href="http://xml.apache.org/">
<cocoon href="cocoon/">
<ml href="mail-lists.html"/>
<actions href="userdocs/concepts/actions.html"/>
</cocoon>
<forrest href="forrest/"/>
<xindice href="xindice/"/>
<fop href="fop/"/>
</xml.apache.org>
...
</external-refs>
</site>
]]></source>
<p>
As an example, &lt;link href="ext:cocoon/ml"&gt;
generates the link <link
href="ext:cocoon/ml">http://xml.apache.org/cocoon/mail-lists.html</link>.
</p>
<p>
The general rules of &s; and <code>site:</code> linking apply.
Specifically, the @href aggregation makes defining large numbers of
related URLs easy.
</p>
</section>
<section>
<title>Theory: source URIs</title>
<p>
<code>site:</code> URIs like <code>site:todo</code> are examples of
<em>source</em> URIs, in contrast to the more usual
<code>foo.html</code>-style URIs, which we here call
<em>destination</em> URIs. This introduces an important concept: that
the <em>source</em> URI space exists and is independent of that of the
generated site. Furthermore, URIs (ie, links) are first-class objects,
on par with XML documents, in that just as XML content is transformed,
so are the links. Within the source URI space, we can have all sorts of
interesting schemes (person:, mail:, google:, java:, etc). These will
all be translated into plain old <code>http:</code> or relative URIs
in the destination URI space.
</p>
</section>
<section>
<title>Future schemes</title>
<p>
So far, <code>site:</code> and <code>ext:</code> schemes are defined.
To give you some ideas on other things we'd like to implement (and
we'd welcome help implementing), here are a few possibilities.
</p>
<table>
<tr><th>Scheme</th><th>Example 'From'</th><th>Example 'To'</th><th>Description</th></tr>
<tr>
<td>java</td>
<td>java:org.apache.proj.SomeClass</td>
<td><code>../../apidocs/org/apache/proj/SomeClass.html</code></td>
<td>
Links to documentation for a Java class (typically generated by
<code>javadoc</code>).
</td>
</tr>
<tr>
<td>mail</td>
<td>mail::&lt;Message-Id&gt;</td>
<td><code>http://marc.theaimsgroup.com?t=12345678</code></td>
<td>
Links to an email, identified by its <code>Message-Id</code>
header. Any mail archive website could be used.
</td>
</tr>
<tr>
<td>search</td>
<td>search:&lt;searchterm&gt;</td>
<td><code>http://www.google.com/search?q=searchterm</code></td>
<td>Link to set of results from a search engine</td>
</tr>
<tr>
<td>person</td>
<td>person:JT, person:JT/blog etc</td>
<td><code>mailto:jefft&lt;at&gt;apache.org</code>,
<code>http://www.webweavertech.com/jefft/weblog/</code>, etc.</td>
<td>
A <code>person:</code> scheme could be used, say, to insert an
automatically obfuscated email address, or link to a URI in some
way associated with that person.
</td>
</tr>
</table>
<p>
There are even more possibilities in specific environments. In an
intranet, a <code>project:XYZ</code> scheme could identify company
project pages. In a project like <link href="ext:ant">Apache
Ant</link>, each Task could be identified with
<code>task:&lt;taskname&gt;</code>, eg <code>task:pathconvert</code>.
</p>
</section>
</section>
<section id="implementation">
<title>Implementation</title>
<p>
This section describes how the menu and linking systems are currently
implemented in Forrest. This is primarily of interest to Forrest
developers, and users wishing to implement their own schemes.
</p>
<section>
<title>Concept</title>
<p>
The <code>site:</code> scheme and associated ideas for &s; were
originally described in <link href="ext:linkmaps">the 'linkmap' RT
email</link> to the forrest-dev list (RT means 'random thought'; a
Cocoon invention). Only section 2 has been implemented, and there is
still significant work required to implement the full system
described. In particular, there is much scope for automating the
creation of &s; (section 4). However, what is currently implemented
gains most of the advantages of the system.
</p>
</section>
<section>
<title>Cocoon foundations: Input Modules</title>
<p>
The implementation of <code>site:</code> linking is heavily based on
Cocoon <link href="ext:cocoon/input-modules">Input Modules</link>, a
little known but quite powerful aspect of Cocoon. Input Modules are
generic Components which simply allow you to look up a value with a
key. The value is generally dynamically generated, or obtained by
querying an underlying data source.
</p>
<p>
In particular, Cocoon contains an <code>XMLFileModule</code>, which
lets one look up the value of an XML node, by interpreting the key as
an XPath expression. Cocoon also has a
<code>SimpleMappingMetaModule</code>, which allows the key to be
rewritten before it is used to look up a value.
</p>
<p>
The idea for putting these together to rewrite <code>site:</code>
links was described in <link href="ext:inputmoduletransformer">this
thread</link>. The idea was to write a Cocoon Transformer that
triggers on encountering &lt;link
href="<code>scheme:address</code>"&gt;, and interprets the
<code>scheme:address</code> internal URI as
<code>inputmodule:key</code>. The transformer then uses the named
InputModule to look up the key value. The <code>scheme:address</code>
URI is then rewritten with the found value. This transformer was
implemented as <link
href="ext:linkrewritertransformer">LinkRewriterTransformer</link>.
</p>
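<p>
The rewriting idea can be sketched outside Cocoon. The Python below is not
the LinkRewriterTransformer: it is a minimal stand-in in which plain
dictionaries play the role of input modules, and the table
<code>MODULES</code> and function <code>rewrite_links</code> are
hypothetical names introduced only for this illustration.
</p>
<source><![CDATA[
```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for input modules: scheme name -> key lookup function.
MODULES = {
    "site": {"todo": "todo.html", "changes": "changes.html"}.get,
    "ext": {"cocoon/ml": "http://xml.apache.org/cocoon/mail-lists.html"}.get,
}

def rewrite_links(doc):
    """Replace href="scheme:address" with the value the matching module returns."""
    for link in doc.iter("link"):
        scheme, sep, address = link.get("href", "").partition(":")
        lookup = MODULES.get(scheme)
        if sep and lookup is not None:
            value = lookup(address)
            if value is not None:  # unknown keys are left untouched
                link.set("href", value)
    return doc

doc = ET.fromstring(
    '<p>See the <link href="site:todo">todo list</link>, the '
    '<link href="ext:cocoon/ml">Cocoon lists</link> and '
    '<link href="http://example.org/">an ordinary link</link>.</p>'
)
rewrite_links(doc)
hrefs = [link.get("href") for link in doc.iter("link")]
print(hrefs)
```
]]></source>
<p>
Links whose scheme has no registered module, such as the
<code>http:</code> link here, pass through unchanged.
</p>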
</section>
<section>
<title>Implementing site: rewriting</title>
<p>
Using the above components, <code>site:</code> URI rewriting is
accomplished as follows.
</p>
<section>
<title>cocoon.xconf</title>
<p>
First, we declare an XMLFileModule called 'linkmap'. This is going
to provide access to the contents of &s;; for example,
<code>linkmap:/site/about/index/@href</code> should return the value
'index.html'. We declare this InputModule in
<code>WEB-INF/cocoon.xconf</code> with:
</p>
<source><![CDATA[
<component-instance
class="org.apache.cocoon.components.modules.input.XMLFileModule"
logger="core.modules.xml" name="linkmap">
<file src="cocoon:/linkmap"/>
<reloadable>true</reloadable>
</component-instance>
]]></source>
<p>
An interesting point is that we tell XMLFileModule to use
<em>dynamically generated XML</em> as its source. This allows us to
transform &s; before the XPath is applied. These transformations
are described below. Note that the <code>cocoon:/linkmap</code>
specified here is a static configuration which will be overridden,
as described below.
</p>
<p>
To simplify things for the user, and to hide the structure of our
XML, we now define a <em>mapping</em> module:
</p>
<source><![CDATA[
<!-- Links to URIs within the site -->
<component-instance
class="org.apache.cocoon.components.modules.input.SimpleMappingMetaModule"
logger="core.modules.mapper" name="site">
<input-module name="linkmap"/>
<prefix>/site//</prefix>
<suffix>/@href</suffix>
</component-instance>
]]></source>
<p>
This module rewrites the key, and uses it to query the
<code>linkmap</code> module. This means <code>site:index</code>
is equivalent to <code>linkmap:/site//index/@href</code>.
</p>
<p>The <code>ext</code> module is similarly defined: </p>
<source><![CDATA[
<!-- Links to external URIs, as distinct from 'site' URIs -->
<component-instance
class="org.apache.cocoon.components.modules.input.SimpleMappingMetaModule"
logger="core.modules.mapper" name="ext">
<input-module name="linkmap"/>
<prefix>/site/external-refs//</prefix>
<suffix>/@href</suffix>
</component-instance>
]]></source>
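<p>
The key rewriting performed by these two mapping modules amounts to string
concatenation. The Python sketch below shows that step in isolation, using
the prefixes and suffixes configured above; <code>make_mapper</code> is a
hypothetical helper, not a Cocoon API.
</p>
<source><![CDATA[
```python
def make_mapper(prefix, suffix):
    """Return a function that wraps a key in the configured prefix and suffix."""
    def rewrite(key):
        return prefix + key + suffix
    return rewrite

# Prefix and suffix values copied from the 'site' and 'ext' module declarations.
site_key = make_mapper("/site//", "/@href")
ext_key = make_mapper("/site/external-refs//", "/@href")

print(site_key("index"))     # /site//index/@href
print(ext_key("cocoon/ml"))  # /site/external-refs//cocoon/ml/@href
```
]]></source>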
</section>
<section>
<title>sitemap.xmap</title>
<p>
Now in the sitemap, we have to define the LinkRewriterTransformer,
and insert it into any pipelines dealing with user-editable XML
content:
</p>
<source><![CDATA[
....
<map:transformer name="linkrewriter"
src="org.apache.cocoon.transformation.LinkRewriterTransformer">
<input-module name="linkmap" src="{src}" reloadable="true"/>
<input-module name="site">
<input-module name="linkmap" src="{src}"
reloadable="true"/>
<prefix>/site//</prefix>
<suffix>/@href</suffix>
</input-module>
</map:transformer>
....
<!-- Generates body HTML for files in subdirs -->
<map:match pattern="body-**/*.xml">
<map:generate src="content/xdocs/{1}/{2}.xml"/>
<map:transform type="linkrewriter" src="cocoon:/{1}/linkmap"/>
....
</map:match>
]]></source>
<p>
Why is the LinkRewriterTransformer reconfiguring the InputModules?
Because we only know what XML to feed the XMLFileModule at request
time. The XML is generated by the
<code>cocoon:/{1}/linkmap</code> pipeline, and we don't know
<code>{1}</code> until request time. Thus we need to effectively
reconfigure the InputModule on every request. Fortunately
InputModules are designed for this. They can be configured twice:
once 'statically' in <code>cocoon.xconf</code>, and then
'dynamically' at the point of execution.
</p>
<p>
The end result is that the source XML for sitemap request
<code>body-community/index.xml</code> has its links rewritten by
an XMLFileModule reading XML from
<code>cocoon:/community/linkmap</code>.
</p>
</section>
<section>
<title>Dynamically generating a linkmap</title>
<p>
Why do we need this 'linkmap' pipeline generating dynamic XML from
<code>site.xml</code>? The reasons are described in <link
href="ext:linkmaps">the linkmap RT</link>: we need to concatenate
@hrefs and add ..'s to the paths, depending on which directory the
linkee is in. This is done with the following pipelines:
</p>
<source><![CDATA[
<map:match pattern="abs-linkmap">
<map:generate src="content/xdocs/site.xml"/>
<map:transform src="library/xslt/absolutize-linkmap.xsl"/>
<map:serialize type="xml"/>
</map:match>
<map:match pattern="**linkmap">
<map:generate src="cocoon:/abs-linkmap"/>
<map:transform src="library/xslt/relativize-linkmap.xsl">
<map:parameter name="path" value="{0}"/>
</map:transform>
<map:serialize type="xml"/>
</map:match>
]]></source>
<p>You can try these URIs out directly on a live Forrest to see what
is going on.</p>
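<p>
The 'relativize' step can be pictured with a short sketch. The Python below
is an assumption about the behaviour, not a port of
<code>relativize-linkmap.xsl</code>: the hypothetical function
<code>relativize</code> prepends one <code>../</code> per directory of the
requesting page and leaves hrefs with a scheme alone.
</p>
<source><![CDATA[
```python
def relativize(href, page_dir):
    """Prefix a site-relative href with one '../' per directory in page_dir."""
    if "://" in href or href.startswith("mailto:"):
        return href  # external URLs are not relative to the page
    depth = len([segment for segment in page_dir.split("/") if segment])
    return "../" * depth + href

print(relativize("todo.html", "community/"))
# ../todo.html
print(relativize("community/howto/xmlform/step1.html", "community/howto/"))
# ../../community/howto/xmlform/step1.html
```
]]></source>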
</section>
</section>
<section id="menus_from_site">
<title>Generating menus from site.xml</title>
<p>
The process of generating an HTML menu from &s; is fairly
straightforward Cocoon work. It is currently implemented with these
pipelines:
</p>
<source><![CDATA[
<map:resource name="book">
.... <!-- Stuff for using book.xml if present -->
<!-- If no book.xml, generate it from the linkmap. -->
<map:generate src="cocoon:/{dir}linkmap/{dir}"/>
<!-- The above generates the subset of the linkmap relevant to our
directory. -->
<map:transform src="library/xslt/site2book.xsl"/>
<map:call resource="skinit">
<map:parameter name="type" value="book2menu"/>
<map:parameter name="path" value="{path}"/>
</map:call>
</map:resource>
.....
<map:match pattern="abs-linkmap/**">
<map:generate src="cocoon:/abs-linkmap"/>
<map:transform type="xpath">
<map:parameter name="include" value="//*[@href='{1}']"/>
</map:transform>
<map:serialize type="xml"/>
</map:match>
<map:match pattern="**linkmap/**">
<map:generate src="cocoon:/abs-linkmap/{2}"/>
<map:transform
src="library/xslt/relativize-linkmap.xsl">
<map:parameter name="path"
value="{1}linkmap"/>
</map:transform>
<map:serialize type="xml"/>
</map:match>
]]></source>
<p>As with linking, we need to first 'absolutize' our &s; file by
concatenating all @hrefs. This is done in
<code>cocoon:/abs-linkmap</code>, shown in the previous section.
The twist is that, for subdirectories, we only want to show the part
of the menu relevant to that directory. We achieve this by filtering
out everything except the node with a specific @href value, using an
<link href="ext:xpathtransformer">XPathTransformer</link>. The
<code>include</code> param is <code>//*[@href='{1}']</code>, where
<code>{1}</code> will be replaced with the second <code>{dir}</code>
in the line:
<code>&lt;map:generate src="cocoon:/{dir}linkmap/{dir}"/&gt;</code>.
This is why removing the '/{dir}' stops truncation of menus.
</p>
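<p>
The effect of the <code>include</code> expression can be tried outside
Cocoon. The Python sketch below applies the same
<code>//*[@href='...']</code> test to a small, hand-written absolutized
linkmap; the data and the helper name <code>menu_subset</code> are
assumptions made for the illustration.
</p>
<source><![CDATA[
```python
import xml.etree.ElementTree as ET

# A hand-written stand-in for the output of cocoon:/abs-linkmap (an assumption).
ABS_LINKMAP = """
<site label="Forrest" href="">
  <about label="About">
    <index label="Index" href="index.html"/>
  </about>
  <community label="Community" href="community/">
    <index label="About" href="community/index.html"/>
  </community>
</site>
"""

def menu_subset(root, directory):
    """Return the first descendant whose href equals the directory, or None."""
    return root.find(".//*[@href='%s']" % directory)

root = ET.fromstring(ABS_LINKMAP)
subset = menu_subset(root, "community/")
print(subset.tag, [child.get("label") for child in subset])
```
]]></source>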
<p>
The <code>site2book.xsl</code> stylesheet generates
<code>book.xml</code> XML, as expected by the subsequent
<code>book2menu.xsl</code> stylesheet. In the future, this
intermediate format can be removed.
</p>
</section>
</section>
</body>
</document>