Posted to dev@forrest.apache.org by je...@apache.org on 2003/01/07 17:16:33 UTC
cvs commit: xml-forrest/src/documentation/content/xdocs linking.xml

jefft       2003/01/07 08:16:33

  Added:       src/documentation/content/xdocs linking.xml
  Log:
  cat brain | grep linking > linking.xml
  
  Revision  Changes    Path
  1.1                  xml-forrest/src/documentation/content/xdocs/linking.xml
  
  Index: linking.xml
  ===================================================================
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
  "document-v11.dtd" [
  <!ENTITY a '<code>index.xml</code>'>
  <!ENTITY b '<code>todo.xml</code>'>
  <!ENTITY s '<code>site.xml</code>'>
  ]>
  
  <document>
    <header>
      <title>Menus and Linking</title>
      <version>$Revision: 1.1 $</version>
      <authors>
        <person name="Jeff Turner" email="jefft@apache.org"/>
      </authors>
    </header>
  
    <body>
      <section id="intro">
        <title>Introduction</title>
        <p>
          This document describes Forrest's internal URI space: how it is managed
          with &s;, how menus are generated, and how the various link schemes
          (site:, ext:) work.
        </p>
      </section>
  
      <section>
        <title>site.xml</title>
        <p>
          &s; is what we'd call a 'site map' if Cocoon hadn't already claimed
          that term. &s; is a loosely structured XML file, acting as a map of the
          site's contents.  It provides a unique identifier (an XPath address)
          for 'nodes' of information in the website.  A 'node' of site information
          can be:
        </p>
        <ul>
          <li>A category of information, like 'the user guide'. A category may
            correspond to a directory, but that is not required.</li>
          <li>A specific page, eg 'the FAQ page'</li>
          <li>A specific node (identified by <code>id</code> attribute) in an
            XML file.</li>
        </ul>
        <p>
          In addition to providing fine-grained addressing of site info, &s;
          allows <em>metadata</em> to be associated with each node, with
          attributes or child elements.  Most commonly, a <code>label</code>
          attribute is used to provide a text description of the node.
        </p>
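        <p>
          As an illustration only, a node carrying both kinds of metadata might
          look like the sketch below.  The <code>faq</code> node, the
          <code>dc:</code> prefix and the <code>dc:description</code> child are
          assumptions made for this example (Dublin Core in its own namespace);
          Forrest itself neither defines nor requires them.
        </p>
        <source><![CDATA[
          <!-- 'label' is the usual attribute-style metadata -->
          <faq label="FAQ" href="faq.html"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
            <!-- Hypothetical child-element metadata in a separate namespace -->
            <dc:description>Frequently asked questions</dc:description>
          </faq>
          ]]></source>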
        <p>
          There are currently two applications of &s;:
        </p>
        <dl>
          <dt><link href="#menu_generation">Menu generation</link></dt>
          <dd>&s; is used to generate the menus for the HTML website, replacing
            the old <code>book.xml</code> system</dd>
          <dt><link href="#semantic_linking">Semantic linking</link></dt>
          <dd>&s; provides a basic aliasing mechanism for linking.  Eg, one
            can write &lt;link href="site:changes"> from anywhere in the site, and
            link to the 'changes' information node (translated to changes.html).
            More on this below.</dd>
        </dl>
        <p>
          Here is a sample site.xml, a stripped-down version from Forrest's
          own <link href="ext:forrest">website</link>:
        </p>
        <source><![CDATA[
          <?xml version="1.0"?>
          <site label="Forrest" href="" xmlns="http://apache.org/forrest/linkmap/1.0">
  
            <about label="About">
              <index label="Index" href="index.html"/>
              <license label="License" href="license.html"/>
              <your-project label="Using Forrest" href="your-project.html">
                <new_content_type href="#adding_new_content_type"/>
              </your-project>
              <linking label="Linking" href="linking.html"/>
              <changes label="Changes" href="changes.html"/>
              <todo label="Todo" href="todo.html"/>
              <live-sites label="Live sites" href="live-sites.html"/>
            </about>
  
            <community label="Community" href="community/">
              <index label="About" href="index.html"/>
              <howto-samples label="How-To Samples" href="howto/">
                <single-page label="Single Page" href="v10/howto-v10.html"/>
                <xmlform label="Multi-Page" href="xmlform/">
                  <intro label="Intro" href="howto-xmlform.html"/>
                  <step1 label="Step 1" href="step1.html"/>
                  <step2 label="Step 2" href="step2.html"/>
                </xmlform>
              </howto-samples>
            </community>
  
            <references label="References">
              <gump label="Apache Gump" href="http://jakarta.apache.org/gump/"/>
              <cocoon label="Apache Cocoon" href="http://xml.apache.org/cocoon/"/>
            </references>
  
            <external-refs>
              <mail-archive href="http://marc.theaimsgroup.com"/>
              <xml.apache.org href="http://xml.apache.org/">
                <cocoon href="cocoon/">
                  <ml href="mail-lists.html"/>
                  <actions href="userdocs/concepts/actions.html"/>
                </cocoon>
                <forrest href="forrest/"/>
                <xindice href="xindice/"/>
                <fop href="fop/"/>
              </xml.apache.org>
  
              <mail>
                <semantic-linking href="http://marc.theaimsgroup.com/?l=forrest-dev&amp;m=103097808318773&amp;w=2"/>
              </mail>
              <cool-uris href="http://www.w3.org/Provider/Style/URI.html"/>
              <uri-rfc href="http://zvon.org/tmRFC/RFC2396/Output/index.html"/>
  
            </external-refs>
  
          </site>
          ]]></source>
        <p>As you can see, things are pretty free-form. The rules are as follows:</p>
        <ul>
          <li>The root element must be 'site', and normal content should be in the
            namespace <code>http://apache.org/forrest/linkmap/1.0</code>. Feel
            free to mix in your own content (RDF, Dublin Core, etc) under new
            namespaces.</li>
          <li>Element names are used as identifiers.  The <code>foo</code> in
            <code>site:foo</code> must therefore be a valid NMTOKEN.</li>
          <li>Elements with <code>href</code> attributes can be used as identifiers
            in <code>site:</code> URIs</li>
          <li>Relative href attribute contents are 'accumulated' by prepending hrefs
            from ancestor nodes</li>
          <li>Elements without <code>label</code> attributes (and their children)
            are not displayed in the menu.</li>
          <li>Elements below <code>external-refs</code> are mapped to the
            <code>ext:</code> scheme, so <code>ext:cocoon/ml</code> becomes
            <code>http://xml.apache.org/cocoon/mail-lists.html</code>.</li>
        </ul>
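        <p>
          To make the accumulation rule concrete, here is the chain of nodes
          leading to <code>step1</code> in the sample above.  The expanded value
          shown in the comment is worked out by hand from the rule; it is not
          copied from Forrest output.
        </p>
        <source><![CDATA[
          <site label="Forrest" href="">
            <community label="Community" href="community/">
              <howto-samples label="How-To Samples" href="howto/">
                <xmlform label="Multi-Page" href="xmlform/">
                  <!-- "" + community/ + howto/ + xmlform/ + step1.html
                       = community/howto/xmlform/step1.html -->
                  <step1 label="Step 1" href="step1.html"/>
                </xmlform>
              </howto-samples>
            </community>
          </site>
          ]]></source>
        <p>
          The same reasoning applied below <code>external-refs</code> gives
          <code>http://xml.apache.org/cocoon/userdocs/concepts/actions.html</code>
          for <code>ext:cocoon/actions</code>.
        </p>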
      </section>
  
      <section id="menu_generation">
        <title>Generating Menus</title>
        <p>
          If the &s; above were placed in
          <code>src/documentation/content/xdocs/</code>, the generated website
          would have a menu like this:
        </p>
        <figure src="images/menu.png" alt="Menu generated from site.xml"/>
        <p>
          As you can see, the elements without labels, like &lt;new_content_type
          href="#adding_new_content_type"/&gt;, and the <code>external-refs</code>
          section, are not displayed.
        </p>
        <p>
          Files in subdirectories are displayed with a menu local to that
          subdirectory:
        </p>
        <figure src="images/menu2.png" alt="Subdirectory menu generated from site.xml"/>
        <note>Yes, this truncating of the menu is annoying in many circumstances,
          and will be made configurable eventually. For now, to avoid generating
          truncated menus, edit sitemap.xmap, line 661 or thereabouts:
          &lt;map:generate src="cocoon:/{dir}linkmap/{dir}"/&gt; and remove the
          '/{dir}'. See <link href="#menus_from_site">here</link> for more info.
        </note>
        <section>
          <title>Overriding menus with book.xml</title>
          <p>
            Historically, menus in Forrest have been generated from a
            <code>book.xml</code> file, one per directory.  This mechanism is
            still available, and if a <code>book.xml</code> is found, it will be
            used in preference to the &s;-generated menu.  Not only does this
            preserve backwards-compatibility, it is sometimes necessary for sites
            whose content isn't strictly hierarchical, or where the &s;-generated
            menu isn't appropriate.  <code>book.xml</code> files can use
            <code>site:</code> URIs to ease the maintenance burden that led to
            book.xml's obsolescence.  In general, however, we prefer to enhance the
            &s;-based solution rather than rely on <code>book.xml</code> hacks, so please
            <link href="site:forrest-dev">let us know</link> if the &s; menu isn't
            meeting your use-case.
          </p>
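          <p>
            A minimal sketch of such a <code>book.xml</code> follows.  The
            <code>software</code>, <code>title</code> and <code>copyright</code>
            values are placeholders chosen for the example, and the menu entries
            assume the nodes declared in the sample &s; above.
          </p>
          <source><![CDATA[
            <book software="Forrest" title="Forrest" copyright="(placeholder)">
              <menu label="About">
                <!-- site: URIs keep the menu valid if the files are renamed -->
                <menu-item label="Linking" href="site:linking"/>
                <menu-item label="Changes" href="site:changes"/>
                <menu-item label="Todo" href="site:todo"/>
              </menu>
            </book>
            ]]></source>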
        </section>
      </section>
  
      <section id="destination_linking">
        <title>The old system: destination linking</title>
        <p>
          Traditionally in Forrest (and similar systems), there has only been one
          URI space: that of the generated site.  If &a; wants to link to &b;, &a;
          would use
        </p>
        <source>
          &lt;link href="todo.html">todo.html&lt;/link>
        </source>
        <p>
          The theoretical problem with this is that the content producer should
          not know or care how Forrest is going to render the source.  A URI
          should only <em>identify</em> a resource, not specify its type [<link
            href="ext:semantic-linking">mail ref</link>] [<link
            href="ext:cool-uris">cool URIs</link>]. In fact, as Forrest
          typically renders to multiple output formats (HTML and PDF), links in
          one of them (here, the PDF) are going to break.
        </p>
      </section>
  
      <section id="semantic_linking">
        <title>Semantic linking</title>
        <p>
          Forrest's solution is simple: instead of &lt;link href="todo.html">, write
          &lt;link href="site:todo">, where:
        </p>
        <dl>
          <dt>site</dt>
          <dd>is a URI 'scheme'; a namespace that restricts
            the syntax and semantics of the rest of the URI [<link
              href="ext:uri-rfc">rfc2396</link>].  The semantics of 'site' are
            "this identifier locates something in the site's XML sources".</dd>
          <dt>todo</dt>
          <dd>identifies the content in <code>todo.xml</code>, by reference to a
            'node' of content declared in &s;.</dd>
        </dl>
        <p>
          We call this <em>semantic</em> linking because instead of linking to a
          physical representation (todo.html), we've linked to the 'idea' of "the
          todo file".  It doesn't matter where it physically lives; that will be
          sorted out by Forrest.
        </p>
  
        <section>
          <title>Resolving site: URIs</title>
  
          <p>
            How exactly does <code>site:todo</code> get resolved?  A full answer
            is provided in the <link href="#implementation">implementation</link>
            section.  Essentially, the <code>todo</code> part has
            <code>/site//</code> prepended, and <code>/@href</code> appended, to
            form the string <code>/site//todo/@href</code>.  This is
            then used as an XPath expression on &s; to look up the replacement
            string, in this case <code>todo.html</code>.
          </p>
          <note>
            Actually, the XPath is applied to XML generated dynamically from
            &s;.  The generated XML has @href's fully expanded ('absolutized')
            and ..'s added ('relativized') as needed.
          </note>
          <p>
            Thus by modifying the XPath prefix and suffix, just about any XML
            format can be accommodated.
          </p>
  
          <p>
            Notice that the '//' allows us any degree of specificity when linking.
            In the sample &s; above, both <code>site:new_content_type</code> and
            <code>site:about/your-project/new_content_type</code> identify the
            same node.  It is up to you to decide how specific to make links.  One
            nice benefit of link 'ambiguity' is that &s; can be reorganized
            without breaking links.  For example, 'new_content_type' currently
            identifies a node in 'your-project'.  By leaving that fact unspecified
            in <code>site:new_content_type</code>, we are free to make
            'new_content_type' its own XML file, or a node in another file, in
            another category.
          </p>
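          <p>
            As a worked sketch, consider a page at the root of the site that
            uses both a short and a fully qualified address.  The rewritten
            hrefs are derived by hand from the sample &s; and the rules above,
            so treat them as the expected result rather than captured output.
          </p>
          <source><![CDATA[
            <!-- Source, as written by the author -->
            <link href="site:todo">Todo list</link>
            <link href="site:about/your-project/new_content_type">New content types</link>

            <!-- XPath expressions formed from the addresses:
                   /site//todo/@href
                   /site//about/your-project/new_content_type/@href -->

            <!-- Expected result after rewriting -->
            <link href="todo.html">Todo list</link>
            <link href="your-project.html#adding_new_content_type">New content types</link>
            ]]></source>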
        </section>
  
        <section>
          <title>ext: URIs: linking to external URLs</title>
          <p>
            The <code>ext:</code> scheme was created partly to demonstrate the
            ease with which new schemes can be defined, and partly for practical
            use.  <code>ext:</code> URIs identify nodes in &s; below the
            &lt;external-refs&gt; node.  By convention, nodes here link to URLs
            outside the website, and are not listed in the menu generated from
            &s;.
          </p>
          <p>Here is a &s; snippet illustrating <code>external-refs</code>:</p>
          <source><![CDATA[
            <site>
              ...
              <external-refs>
                <mail-archive href="http://marc.theaimsgroup.com"/>
                <xml.apache.org href="http://xml.apache.org/">
                  <cocoon href="cocoon/">
                    <ml href="mail-lists.html"/>
                    <actions href="userdocs/concepts/actions.html"/>
                  </cocoon>
                  <forrest href="forrest/"/>
                  <xindice href="xindice/"/>
                  <fop href="fop/"/>
                </xml.apache.org>
  
                ...
              </external-refs>
            </site>
            ]]></source>
          <p>
            As an example, &lt;link href="ext:cocoon/ml"&gt;
            generates the link <link
              href="ext:cocoon/ml">http://xml.apache.org/cocoon/mail-lists.html</link>
          </p>
          <p>
            The general rules of &s; and <code>site:</code> linking apply.
            Specifically, the @href aggregation makes defining large numbers of
            related URLs easy.
          </p>
        </section>
  
        <section>
          <title>Theory: source URIs</title>
          <p>
            <code>site:</code> URIs like <code>site:todo</code> are examples of
            <em>source</em> URIs, in contrast to the more usual
            <code>foo.html</code>-style URIs, which we here call
            <em>destination</em> URIs.  This introduces an important concept: that
            the <em>source</em> URI space exists and is independent of that of the
            generated site.  Furthermore, URIs (ie, links) are first-class objects,
            on par with XML documents, in that just as XML content is transformed,
            so are the links.  Within the source URI space, we can have all sorts of
            interesting schemes (person:, mail:, google:, java:, etc). These will
            all be translated into plain old <code>http:</code> or relative URIs
            in the destination URI space.
          </p>
        </section>
  
  
        <section>
          <title>Future schemes</title>
          <p>
            So far, <code>site:</code> and <code>ext:</code> schemes are defined.
            To give you some ideas on other things we'd like to implement (and
            we'd welcome help implementing), here are a few possibilities.
          </p>
          <table>
            <tr><th>Scheme</th><th>Example 'From'</th><th>Example 'To'</th><th>Description</th></tr>
            <tr>
              <td>java</td>
              <td>java:org.apache.proj.SomeClass</td>
              <td><code>../../apidocs/org/apache/proj/SomeClass.html</code></td>
              <td>
                Links to documentation for a Java class (typically generated by
                <code>javadoc</code>).
              </td>
            </tr>
            <tr>
              <td>mail</td>
              <td>mail:&lt;Message-Id></td>
              <td><code>http://marc.theaimsgroup.com?t=12345678</code></td>
              <td>
                Links to an email, identified by its <code>Message-Id</code>
                header. Any mail archive website could be used.
              </td>
            </tr>
            <tr>
              <td>search</td>
              <td>search:&lt;searchterm></td>
              <td><code>http://www.google.com/search?q=searchterm</code></td>
              <td>Link to set of results from a search engine</td>
            </tr>
            <tr>
              <td>person</td>
              <td>person:JT, person:JT/blog etc</td>
              <td><code>mailto:jefft&lt;at&gt;apache.org</code>,
                <code>http://www.webweavertech.com/jefft/weblog/</code>, etc</td>
              <td>
                A <code>person:</code> scheme could be used, say, to insert an
                automatically obfuscated email address, or link to a URI in some
                way associated with that person.
              </td>
            </tr>
          </table>
          <p>
            There are even more possibilities in specific environments.  In an
            intranet, a <code>project:XYZ</code> scheme could identify company
            project pages.  In a project like <link href="ext:ant">Apache
              Ant</link>, each Task could be identified with
            <code>task:&lt;taskname&gt;</code>, eg <code>task:pathconvert</code>.
          </p>
        </section>
      </section>
  
      <section id="implementation">
        <title>Implementation</title>
        <p>
          This section describes how the menu and linking systems are currently
          implemented in Forrest.  This is primarily of interest to Forrest
          developers, and users wishing to implement their own schemes.
        </p>
  
        <section>
          <title>Concept</title>
          <p>
            The <code>site:</code> scheme and associated ideas for &s; were
            originally described in <link href="ext:linkmaps">the 'linkmap' RT
              email</link> to the forrest-dev list (RT means 'random thought'; a
            Cocoon invention).   Only section 2 has been implemented, and there is
            still significant work required to implement the full system
            described.  In particular, there is much scope for automating the
            creation of &s; (section 4).  However, what is currently implemented
            gains most of the advantages of the system.
          </p>
        </section>
  
        <section>
          <title>Cocoon foundations: Input Modules</title>
          <p>
            The implementation of <code>site:</code> linking is heavily based on
            Cocoon <link href="ext:cocoon/input-modules">Input Modules</link>, a
            little known but quite powerful aspect of Cocoon.  Input Modules are
            generic Components which simply allow you to look up a value with a
            key.  The value is generally dynamically generated, or obtained by
            querying an underlying data source.
          </p>
          <p>
            In particular, Cocoon contains an <code>XMLFileModule</code>, which
            lets one look up the value of an XML node, by interpreting the key as
            an XPath expression.  Cocoon also has a
            <code>SimpleMappingMetaModule</code>, which allows the key to be
            rewritten before it is used to look up a value.
          </p>
          <p>
            The idea for putting these together to rewrite <code>site:</code>
            links was described in <link href="ext:inputmoduletransformer">this
              thread</link>. The idea was to write a Cocoon Transformer that
            triggers on encountering &lt;link
            href="<code>scheme:address</code>"&gt;, and interprets the
            <code>scheme:address</code> internal URI as
            <code>inputmodule:key</code>.  The transformer then uses the named
            InputModule to look up the key value. The <code>scheme:address</code>
            URI is then rewritten with the found value.  This transformer was
            implemented as <link
              href="ext:linkrewritertransformer">LinkRewriterTransformer</link>.
          </p>
        </section>
  
        <section>
          <title>Implementing site: rewriting</title>
          <p>
            Using the above components, <code>site:</code> URI rewriting is
            accomplished as follows.
          </p>
          <section>
            <title>cocoon.xconf</title>
            <p>
              First, we declare an XMLFileModule called 'linkmap'.  This is going
              to provide access to the contents of &s;; for example,
              <code>linkmap:/site/about/index/@href</code> should return the value
              'index.html'.  We declare this InputModule in
              <code>WEB-INF/cocoon.xconf</code> with:
            </p>
            <source><![CDATA[
              <component-instance
                class="org.apache.cocoon.components.modules.input.XMLFileModule"
                logger="core.modules.xml" name="linkmap">
                <file src="cocoon:/linkmap"/>
                <reloadable>true</reloadable>
              </component-instance>
              ]]></source>
            <p>
              An interesting point is that we tell XMLFileModule to use
              <em>dynamically generated XML</em> as its source.  This allows us to
              transform &s; before the XPath is applied.  These transformations
              are described below.  Note that the <code>cocoon:/linkmap</code>
              specified here is a static configuration which will be overridden,
              as described below.
            </p>
            <p>
              To simplify things for the user, and to hide the structure of our
              XML, we now define a <em>mapping</em> module:
            </p>
            <source><![CDATA[
              <!-- Links to URIs within the site -->
              <component-instance
                class="org.apache.cocoon.components.modules.input.SimpleMappingMetaModule"
                logger="core.modules.mapper" name="site">
                <input-module name="linkmap"/>
                <prefix>/site//</prefix>
                <suffix>/@href</suffix>
              </component-instance>
              ]]></source>
            <p>
              This module rewrites the key, and uses it to query the
              <code>linkmap</code> module.  This means <code>site:index</code>
              is equivalent to <code>linkmap:/site//index/@href</code>.
            </p>
            <p>The <code>ext</code> module is similarly defined: </p>
            <source><![CDATA[
              <!-- Links to external URIs, as distinct from 'site' URIs -->
              <component-instance
                class="org.apache.cocoon.components.modules.input.SimpleMappingMetaModule"
                logger="core.modules.mapper" name="ext">
                <input-module name="linkmap"/>
                <prefix>/site/external-refs//</prefix>
                <suffix>/@href</suffix>
              </component-instance>
              ]]></source>
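            <p>
              The same pattern could in principle be reused for further
              schemes.  As a sketch only, a <code>task:</code> scheme might be
              declared as below, assuming a hypothetical
              <code>&lt;tasks&gt;</code> element under the root of &s;.  Neither
              the <code>task</code> module nor the <code>tasks</code> element
              exists in Forrest; both are invented here for illustration.
            </p>
            <source><![CDATA[
              <!-- Hypothetical: links to nodes declared below <tasks> in site.xml -->
              <component-instance
                class="org.apache.cocoon.components.modules.input.SimpleMappingMetaModule"
                logger="core.modules.mapper" name="task">
                <input-module name="linkmap"/>
                <!-- task:pathconvert would become /site/tasks//pathconvert/@href -->
                <prefix>/site/tasks//</prefix>
                <suffix>/@href</suffix>
              </component-instance>
              ]]></source>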
          </section>
  
          <section>
            <title>sitemap.xmap</title>
            <p>
              Now in the sitemap, we have to define the LinkRewriterTransformer,
              and insert it into any pipelines dealing with user-editable XML
              content:
            </p>
            <source><![CDATA[
              ....
              <map:transformer name="linkrewriter"
                src="org.apache.cocoon.transformation.LinkRewriterTransformer">
                <input-module name="linkmap" src="{src}" reloadable="true"/>
                <input-module name="site">
                  <input-module name="linkmap" src="{src}"
                    reloadable="true"/>
                  <prefix>/site//</prefix>
                  <suffix>/@href</suffix>
                </input-module>
              </map:transformer>
              ....
  
              <!-- Generates body HTML for files in subdirs -->
              <map:match pattern="body-**/*.xml">
                <map:generate src="content/xdocs/{1}/{2}.xml"/>
                <map:transform type="linkrewriter" src="cocoon:/{1}/linkmap"/>
                ....
              </map:match>
              ]]></source>
  
            <p>
              Why is the LinkRewriterTransformer reconfiguring the InputModules?
              Because we only know what XML to feed the XMLFileModule at request
              time.  The XML is generated by the
              <code>cocoon:/{1}/linkmap</code> pipeline, and we don't know
              <code>{1}</code> until request time.  Thus we need to effectively
              reconfigure the InputModule on every request.  Fortunately
              InputModules are designed for this. They can be configured twice:
              once 'statically' in <code>cocoon.xconf</code>, and then
              'dynamically' at the point of execution.
            </p>
            <p>
              The end result is that the source XML for sitemap request
              <code>body-community/index.xml</code> has its links rewritten by
              an XMLFileModule reading XML from
              <code>cocoon:/community/linkmap</code>.
            </p>
          </section>
          <section>
            <title>Dynamically generating a linkmap</title>
            <p>
              Why do we need this 'linkmap' pipeline generating dynamic XML from
              <code>site.xml</code>?  The reasons are described in <link
                href="ext:linkmaps">the linkmap RT</link>: we need to concatenate
              @hrefs and add ..'s to the paths, depending on which directory the
              linkee is in.  This is done with the following pipelines:
            </p>
            <source><![CDATA[
              <map:match pattern="abs-linkmap">
                <map:generate src="content/xdocs/site.xml"/>
                <map:transform src="library/xslt/absolutize-linkmap.xsl"/>
                <map:serialize type="xml"/>
              </map:match>
  
              <map:match pattern="**linkmap">
                <map:generate src="cocoon:/abs-linkmap"/>
                <map:transform src="library/xslt/relativize-linkmap.xsl">
                  <map:parameter name="path" value="{0}"/>
                </map:transform>
                <map:serialize type="xml"/>
              </map:match>
              ]]></source>
            <p>You can try these URIs out directly on a live Forrest to see what
              is going on.</p>
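            <p>
              For orientation, the fragments below sketch the kind of output
              one would expect for the <code>changes</code> node of the sample
              &s;.  They are hand-written approximations based on the
              description above (concatenate @hrefs, then add ..'s); the exact
              output of the two stylesheets may differ.
            </p>
            <source><![CDATA[
              <!-- cocoon:/abs-linkmap (fragment): hrefs concatenated from the root -->
              <about label="About">
                <changes label="Changes" href="changes.html"/>
              </about>

              <!-- cocoon:/community/linkmap (fragment): the same node as seen
                   from a page in the community/ directory -->
              <about label="About">
                <changes label="Changes" href="../changes.html"/>
              </about>
              ]]></source>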
          </section>
        </section>
  
  
        <section id="menus_from_site">
          <title>Generating menus from site.xml</title>
          <p>
            The process of generating an HTML menu from &s; is fairly
            straightforward Cocoon work.  It is currently implemented with these
            pipelines:
          </p>
          <source><![CDATA[
            <map:resource name="book">
  
              .... <!--  Stuff for using book.xml if present  -->
  
              <!-- If no book.xml, generate it from the linkmap. -->
  
              <map:generate src="cocoon:/{dir}linkmap/{dir}"/>
              <!-- The above generates the subset of the linkmap relevant to our
              directory. -->
              <map:transform src="library/xslt/site2book.xsl"/>
              <map:call resource="skinit">
                <map:parameter name="type" value="book2menu"/>
                <map:parameter name="path" value="{path}"/>
              </map:call>
            </map:resource>
  
            .....
  
            <map:match pattern="abs-linkmap/**">
              <map:generate src="cocoon:/abs-linkmap"/>
              <map:transform type="xpath">
                <map:parameter name="include" value="//*[@href='{1}']"/>
              </map:transform>
              <map:serialize type="xml"/>
            </map:match>
  
            <map:match pattern="**linkmap/**">
              <map:generate src="cocoon:/abs-linkmap/{2}"/>
              <map:transform
                src="library/xslt/relativize-linkmap.xsl">
                <map:parameter name="path"
                  value="{1}linkmap"/>
              </map:transform>
              <map:serialize type="xml"/>
            </map:match>
            ]]></source>
  
          <p>As with linking, we need to first 'absolutize' our &s; file by
            concatenating all @hrefs.  This is done in
            <code>cocoon:/abs-linkmap</code>, shown in the previous section.
            The twist is that, for subdirectories, we only want to show the part
            of the menu relevant to that directory.  We achieve this by filtering
            out everything except the node with a specific @href value, using an
            <link href="ext:xpathtransformer">XPathTransformer</link>.  The
            <code>include</code> param is <code>//*[@href='{1}']</code>, where
            <code>{1}</code> will be replaced with the second <code>{dir}</code>
            in the line:
            <code>&lt;map:generate src="cocoon:/{dir}linkmap/{dir}"/&gt;</code>.
            This is why removing the '/{dir}' stops truncation of menus.
          </p>
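          <p>
            The substitution can be traced step by step for a page in the
            <code>community/</code> directory.  This sketch assumes
            <code>{dir}</code> is <code>community/</code> (with a trailing
            slash); the lines are the pipeline elements shown above with the
            parameters filled in by hand.
          </p>
          <source><![CDATA[
            <!-- 1. In the 'book' resource, {dir} = "community/" -->
            <map:generate src="cocoon:/community/linkmap/community/"/>

            <!-- 2. Matched by pattern="**linkmap/**":
                    {1} = "community/", {2} = "community/" -->
            <map:generate src="cocoon:/abs-linkmap/community/"/>

            <!-- 3. Matched by pattern="abs-linkmap/**": {1} = "community/",
                    so only the node whose absolutized href is "community/"
                    is kept -->
            <map:parameter name="include" value="//*[@href='community/']"/>
            ]]></source>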
          <p>
            The <code>site2book.xsl</code> stylesheet generates
            <code>book.xml</code> XML, as expected by the subsequent
            <code>book2menu.xsl</code> stylesheet.  In the future, this
            intermediate format can be removed.
          </p>
        </section>
  
      </section>
  
    </body>
  </document>