Posted to cvs@cocoon.apache.org by di...@apache.org on 2001/08/28 14:52:24 UTC

cvs commit: xml-cocoon2/xdocs catalog.xml docs-book.xml site-book.xml

dims        01/08/28 05:52:24

  Modified:    .        build.xml
               webapp   sitemap.xmap
               webapp/docs/samples samples.xml
               xdocs    docs-book.xml site-book.xml
  Added:       webapp/docs/samples/catalog style.xsl test.xml testovr.xml
               webapp/resources/entities/catalog-demo catalog-demo-v10.dtd
                        override.xml testpub.xml testsys.xml
               xdocs    catalog.xml
  Log:
  Patch from David Crossley <cr...@indexgeo.com.au> for "entity catalogs - xdocs and samples"
  
  Revision  Changes    Path
  1.54      +1 -0      xml-cocoon2/build.xml
  
  Index: build.xml
  ===================================================================
  RCS file: /home/cvs/xml-cocoon2/build.xml,v
  retrieving revision 1.53
  retrieving revision 1.54
  diff -u -r1.53 -r1.54
  --- build.xml	2001/08/25 13:00:47	1.53
  +++ build.xml	2001/08/28 12:52:23	1.54
  @@ -233,6 +233,7 @@
       <filter token="date"    value="${TODAY}"/>
       <filter token="log"     value="true"/>
       <filter token="verbose" value="true"/>
  +    <filter token="install.war" value="${install.war}"/>
   
       <!-- Add filters for loading database information from database.properties file -->
       <property file="database.properties"/>
  
  
  
  1.45      +14 -0     xml-cocoon2/webapp/sitemap.xmap
  
  Index: sitemap.xmap
  ===================================================================
  RCS file: /home/cvs/xml-cocoon2/webapp/sitemap.xmap,v
  retrieving revision 1.44
  retrieving revision 1.45
  diff -u -r1.44 -r1.45
  --- sitemap.xmap	2001/08/23 17:46:19	1.44
  +++ sitemap.xmap	2001/08/28 12:52:23	1.45
  @@ -401,6 +401,20 @@
       <map:serialize/>
      </map:match>
   
  +   <!-- ==============  Catalog  ========================== -->
  +
  +   <map:match pattern="**/samples/catalog-demo">
  +    <map:generate src="docs/samples/catalog/test.xml"/>
  +    <map:transform src="docs/samples/catalog/style.xsl"/>
  +    <map:serialize type="html"/>
  +   </map:match>
  +
  +   <map:match pattern="catalog-demo">
  +    <map:generate src="docs/samples/catalog/test.xml"/>
  +    <map:transform src="docs/samples/catalog/style.xsl"/>
  +    <map:serialize type="html"/>
  +   </map:match>
  +
      <!-- ======================== C2 Docs ============================== -->
   
      <map:match pattern="documents/*">
  
  
  
  1.11      +8 -0      xml-cocoon2/webapp/docs/samples/samples.xml
  
  Index: samples.xml
  ===================================================================
  RCS file: /home/cvs/xml-cocoon2/webapp/docs/samples/samples.xml,v
  retrieving revision 1.10
  retrieving revision 1.11
  diff -u -r1.10 -r1.11
  --- samples.xml	2001/08/21 12:43:37	1.10
  +++ samples.xml	2001/08/28 12:52:23	1.11
  @@ -36,6 +36,14 @@
      </sample>
     </group>
   
  +  <group name="Entity Catalogs">
  +   <sample name="Entity resolution using catalogs" href="catalog-demo">
  +    external XML entities are resolved to local resources
  +    (ensure that you have configured your CatalogManager.properties file
  +    - see catalog.html)
  +   </sample>
  +  </group>
  +
     <group name="XML-ized web sites">
      <sample name="java.apache.org" href="sites/java.apache.org">
       This page shows a much more complex example that shows how powerful
  
  
  
  1.1                  xml-cocoon2/webapp/docs/samples/catalog/style.xsl
  
  Index: style.xsl
  ===================================================================
  <?xml version='1.0'?>
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  version='1.0'>
  
  <xsl:output method="html"/>
  
  <xsl:template match="catalog-demo">
  <html>
   <head>
    <title>Demonstration of entity resolution using catalogs</title>
   </head>
   <body>
    <xsl:apply-templates/>
   </body>
  </html>
  </xsl:template>
  
  <xsl:template match="section">
   <xsl:apply-templates/>
  <hr/>
  </xsl:template>
  
  <xsl:template match="para">
  <p>
   <xsl:apply-templates/>
  </p>
  </xsl:template>
  
  <xsl:template match="link">
  <a href="{@href}"><xsl:apply-templates/></a>
  </xsl:template>
  
  </xsl:stylesheet>
  
  
  
  1.1                  xml-cocoon2/webapp/docs/samples/catalog/test.xml
  
  Index: test.xml
  ===================================================================
  <?xml version="1.0"?>
  <!DOCTYPE catalog-demo PUBLIC "-//Indexgeo//DTD Catalog Demo v1.0//EN"
    "http://www.indexgeo.com.au/dtd/catalog-demo-v10.dtd"
  [
   <!ENTITY testpub PUBLIC "-//Arbortext//TEXT Test Public Identifier//EN"
     "bogus-system-identifier.xml">
   <!ENTITY testsys SYSTEM "urn:x-arbortext:test-system-identifier">
   <!ENTITY testovr PUBLIC "-//Arbortext//TEXT Test Override//EN"
     "testovr.xml">
   <!ENTITY % ISOnum PUBLIC
     "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML"
     "ISOnum.pen">
   %ISOnum;
   <!ENTITY note "Note:">
  ]>
  
  <catalog-demo>
   <section>
    <para>This sample application demonstrates the use of catalogs for
     entity resolution. &note; see the Apache Cocoon documentation
     <link href="/cocoon/documents/catalog.html">Entity resolution with
     catalogs</link> for the full background and explanation, and the XML
     source of this document (test.xml).
    </para>
  
    <para>This top-level XML instance document is test.xml - it declares
     three other XML sub-documents as external entities and then includes
     them in the sections below. The real system identifiers will be looked
     up in the catalog, to resolve the actual location of the resource.
    </para>
  
    <para>The Document Type Definition (DTD) is declared using both a public
     identifier and a system identifier. The system identifier for the DTD is
     a network-based resource (which is deliberately non-existent). However,
     the catalog overrides that remote DTD to instead use a copy from the
     local filesystem at the location defined by the catalog entry. Note that
     it is via the use of a public identifier that we gain this power.
    </para>
  
    <para>The internal DTD subset of the top-level document instance goes on
     to declare the three external sub-document entities using various means.
     It also declares and includes the ISOnum set of character entities,
     so that we can use entities like &amp;frac12; (to represent &frac12;).
     Finally the internal DTD subset declares an internal general entity
     for &quot;note&quot;.
    </para>
   </section>
  
   <section>
    <para>testpub ... this entity is declared with a PUBLIC identifier and a
     bogus system identifier (which will be overridden by the catalog)
    </para>
    &testpub;
   </section>
  
   <section>
    <para>testsys ... this entity is declared with a SYSTEM identifier
     (which will be resolved by the catalog)
    </para>
    &testsys;
   </section>
  
   <section>
    <para>testovr ... is declared with a PUBLIC identifier and a system
     identifier (the catalog is set to not override this one, so the
     declared system identifier is used)
    </para>
    &testovr;
   </section>
  
  </catalog-demo>
  
  
  
  1.1                  xml-cocoon2/webapp/docs/samples/catalog/testovr.xml
  
  Index: testovr.xml
  ===================================================================
  <para>&note; This paragraph is automatically included from the
   testovr.xml external file.
   The location of this entity was not resolved by the catalog, because
   there is no matching catalog entry for its public identifier or its
   system identifier. So the declared system identifier is used,
   i.e. the file is retrieved relative to the top-level document.
  </para>
  
  
  
  1.1                  xml-cocoon2/webapp/resources/entities/catalog-demo/catalog-demo-v10.dtd
  
  Index: catalog-demo-v10.dtd
  ===================================================================
  <!--
  This is the Document Type Definition for the Apache Cocoon sample
  demonstration "catalog-demo" which explains entity resolution
  using catalogs. See the Apache Cocoon documentation
  "Entity resolution with catalogs" (catalog.html).
  
  Version 1.0 2001-08-09
  -->
  
  <!ELEMENT catalog-demo (section+)>
  <!ELEMENT link (#PCDATA)>
  <!ATTLIST link href CDATA #IMPLIED>
  <!ELEMENT para (#PCDATA | link)*>
  <!ELEMENT section (para+)>
  
  
  
  1.1                  xml-cocoon2/webapp/resources/entities/catalog-demo/override.xml
  
  Index: override.xml
  ===================================================================
  <para>&note; This is content from the override.xml external file.
   This content will not actually be included, because the catalog
   was set with OVERRIDE NO for this public identifier.
  </para>
  
  
  
  1.1                  xml-cocoon2/webapp/resources/entities/catalog-demo/testpub.xml
  
  Index: testpub.xml
  ===================================================================
  <para>&note; This paragraph is automatically included from the
   testpub.xml external file.
   The entity declaration deliberately used a non-existent file
   as the system identifier. The catalog then used the declared
   public identifier to resolve to a specific location on the local
   filesystem.
  </para>
  
  
  
  1.1                  xml-cocoon2/webapp/resources/entities/catalog-demo/testsys.xml
  
  Index: testsys.xml
  ===================================================================
  <para>&note; This paragraph is automatically included from the
   testsys.xml external file.
   The declared SYSTEM identifier was resolved by the catalog to a
   specific location on the local filesystem.
  </para>
  
  
  
  1.25      +1 -0      xml-cocoon2/xdocs/docs-book.xml
  
  Index: docs-book.xml
  ===================================================================
  RCS file: /home/cvs/xml-cocoon2/xdocs/docs-book.xml,v
  retrieving revision 1.24
  retrieving revision 1.25
  diff -u -r1.24 -r1.25
  --- docs-book.xml	2001/08/23 12:54:58	1.24
  +++ docs-book.xml	2001/08/28 12:52:24	1.25
  @@ -68,6 +68,7 @@
     <page id="caching" label="Caching" source="caching.xml"/>
     <page id="mrustore" label="MRU Store" source="mrustore.xml"/>
     <page id="sessions" label="Sessions" source="sessions.xml"/>
  +  <page id="catalog" label="Entity Catalogs" source="catalog.xml"/>
     <page id="datasources" label="Using Databases" source="datasources.xml"/>
     <page id="extending" label="Extending C2" source="extending.xml"/>
     <page id="avalon" label="Avalon" source="avalon.xml"/>
  
  
  
  1.27      +1 -0      xml-cocoon2/xdocs/site-book.xml
  
  Index: site-book.xml
  ===================================================================
  RCS file: /home/cvs/xml-cocoon2/xdocs/site-book.xml,v
  retrieving revision 1.26
  retrieving revision 1.27
  diff -u -r1.26 -r1.27
  --- site-book.xml	2001/08/20 16:14:01	1.26
  +++ site-book.xml	2001/08/28 12:52:24	1.27
  @@ -70,6 +70,7 @@
     <page id="caching" label="Caching" source="caching.xml"/>
     <page id="mrustore" label="MRU Store" source="mrustore.xml"/>
     <page id="sessions" label="Sessions" source="sessions.xml"/>
  +  <page id="catalog" label="Entity Catalogs" source="catalog.xml"/>
     <page id="datasources" label="Using Databases" source="datasources.xml"/>
     <page id="extending" label="Extending C2" source="extending.xml"/>
     <page id="avalon" label="Avalon" source="avalon.xml"/>
  
  
  
  1.1                  xml-cocoon2/xdocs/catalog.xml
  
  Index: catalog.xml
  ===================================================================
  <?xml version="1.0"?>
  
  <!DOCTYPE document SYSTEM "dtd/document-v10.dtd">
  
  <document>
   <header>
    <title>Entity resolution with catalogs</title>
    <subtitle>Resolve entities to local or other resources</subtitle>
  <!-- ??? what use is subtitle - it is not displayed -->
    <version>0.6</version> 
    <type>Technical document</type> 
    <authors>
     <person name="David Crossley" email="crossley@indexgeo.com.au"/>
    </authors>
   </header>
  
   <body>
   <s1 title="Introduction">
    <p>
     @docname@ has the capability to utilise an entity resolution mechanism. This
     assists with entity management and also reduces the necessity for expensive
     and failure-prone network retrieval of the required resources (e.g. DTDs,
     character entity sets, XML sub-documents).
    </p>
  
  <note>To enable catalog support, you need to edit one line of a simple
   properties file (<link href="#imp">discussed below</link>).
   </note>
   </s1>
  
   <s1 title="Overview">
    <p>
     "Entities" represent the physical structure of an XML instance document, whereas "elements" represent the logical structure. The complete entity structure of the document defines which pieces need to be incorporated, so as to build the final document. Those entities are objects from some accessible place, e.g. local file system, local network, remote network, generated from a database. Example entities are: DTDs, XML sub-documents, sets of character entities to represent symbols and other glyphs, image files.
    </p>
  
    <p>
     So how are you going to define the accessible location of all those pieces? How will you ensure that those resources are reliably available? Entity resolution catalogs to the rescue. These are simple standards-based plain-text files to map public identifiers and system identifiers to local or other resources.
    </p>
  
    <p>
     Do you wonder why we cannot use the sitemap to resolve these resources?
     This is because the resolution of all entities that compose the XML
     document is under the direct control of the guts of the parser and the XML
     structure. The parser has no choice - it must incorporate all of the defined
     pieces. If it cannot retrieve them, then it is broken and reports an error.
    </p>
  
    <p>
     With powerful catalog support there are no such problems. This document
     provides the following sections to explain the @docname@ capability for
     resolving entities ...
    </p>
  
    <ul>
     <li>
      <link href="#background">Background</link>
      - explains the need, explains some terminology, describes the solution
     </li>
     <li>
      <link href="#demo1">Demonstration #1</link>
       - explains a remote resource and how it gets resolved
     </li>
     <li>
      <link href="#cat">Catalogs overview</link>
       - briefly explains how catalogs resolve entity declarations
     </li>
     <li>
      <link href="#demo2">Demonstration #2</link>
       - explains more detailed need and use of catalogs
     </li>
     <li>
      <link href="#imp">Implementation</link>
       - describes how support for catalogs is added to @doctitle@ and
       provides the few configuration steps
     </li>
     <li>
      <link href="#dev">Development notes</link>
       - basic catalog support is now in the HEAD branch - needs minor tweaks
     </li>
     <li>
      <link href="#summ">Summary</link>
     </li>
     <li>
      <link href="#info">Further information</link>
       - links to some useful resources
     </li>
    </ul>
   </s1>
  
   <anchor id="background"/>
   <s1 title="Background">
    <p>
     The following article eloquently describes the need for all
     parsers and XML frameworks to be capable of utilising entity
     resolvers:
     "<link href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">If You Can Name It, You Can Claim It!</link>"
     by Norman Walsh. Please read that document, then return here to apply entity catalogs to @docname@.
    </p>
  
    <p>
     (Note: That article (and Java classes) evolved to become the Sun <code>resolver.zip</code> Java package that has been added to @docname@ - a more recent version of the article is available with the Sun download (see below). The API javadocs from your build have further information. However, you do not need to know the gory details to understand catalogs and configure them.)
    </p>
   </s1>
  
   <anchor id="demo1"/>
   <s1 title="Demonstration #1">
    <p>
     This snippet from an XML instance shows the Document Type Declaration. Notice that it declares its ruleset, the Document Type Definition (DTD), as an external entity. Notice also that the resource is network-based.
    </p>
  
  <source><![CDATA[
  <?xml version="1.0"?>
  <!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V4.1.2.5//EN"
    "http://www.oasis-open.org/docbook/xml/simple/4.1.2.5/sdocbook.dtd">
  <article>
   ... content goes here
  </article>
  ]]></source>
  
    <p>
     Now consider what will happen when @docname@ tries to process this XML instance. Whether you have set validation=yes or not, the parser will still want to resolve all of the entities that are required by the XML instance (i.e. the DTD and any other entities that the DTD might declare). So it will happily trundle across the network to get them. It will do this every time that the document is processed. This is obviously a needless overhead. Worse still, what happens if that host is down or the network is congested? Additionally, if your @docname@ is an off-line server, then processing always fails because it cannot retrieve the network-based resources.
    </p>
   </s1>
  
   <anchor id="cat"/>
   <s1 title="Catalogs overview">
    <p>
     As the Walsh document explained, the secrets to entity resolution are the public identifiers, system identifiers, and the catalog to map between them. Here we provide an overview and show an example catalog which we will then use with the
     <link href="#demo2">Demonstration #2</link> below.
    </p>
  
    <s2 title="External entity declarations">
     <p>
      To define an external entity in an XML instance document, you must 
      provide an external declaration consisting of at least a
      <strong>system identifier</strong> and optionally a 
      <strong>public identifier</strong>. The system identifier defines the
      physical location of the external entity. The public identifier is a
      unique symbolic name that can be used to map to a certain physical location.
      Note that if you provide both a public and a system identifier, then the
      public identifier is listed first and the system identifier is not 
      preceded by the keyword <code>SYSTEM</code>.
      Here are four separate examples ...
     </p>
  
  <source><![CDATA[
  <!ENTITY pic SYSTEM "images/pic.gif" NDATA gif>
  <!ENTITY % ISOnum PUBLIC
    "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML" "ISOnum.pen">
  <!DOCTYPE document SYSTEM "dtd/document-v10.dtd">
  <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1//EN"
    "http://www.oasis-open.org/docbook/xml/4.1/docbookx.dtd">
  ]]></source>
  
  <note>TODO: briefly explain each of those declarations</note>
  
    <p>
     (In your XML instance document you would include those entities like
     this ... <code>%ISOnum;</code>)
    </p>
  
    <p>
     None of those system identifiers looks reliable or easily managed.
     Use a catalog to make them so.
    </p>
    </s2>
  
    <s2 title="Simple example catalog">
     <p>
      The <code>catalog</code> maps public identifiers to their corresponding
      physical locations. The catalog entries in an OASIS catalog are a simple
      whitespace-delimited format.
      (The <link href="#info">specification</link> fully defines the format.) 
      There are about a dozen different types of catalog entry - two important
      ones are:
     </p>
  
     <ul>
      <li><strong>PUBLIC</strong> <code>publicId systemId</code>
       <br/>- maps the public identifier <code>publicId</code> to the system
       identifier <code>systemId</code>
      </li>
      <li><strong>SYSTEM</strong> <code>systemId otherSystemId</code>
       <br/>- maps the system identifier <code>systemId</code> to the alternate
       system identifier <code>otherSystemId</code>
      </li>
     </ul>
  
  <source><![CDATA[
  -- this is the default OASIS catalog for Apache Cocoon --
  
  OVERRIDE YES
  
  -- ISO public identifiers for sets of character entities --
  PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//XML"
         "ISOlat1.pen"
  PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML"
         "ISOlat1.pen"
  PUBLIC "ISO 9573-15:1993//ENTITIES Greek Letters//EN//XML"
         "ISOgrk1.pen"
  PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN//XML"
         "ISOpub.pen"
  PUBLIC "ISO 8879:1986//ENTITIES General Technical//EN//XML"
         "ISOtech.pen"
  PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML"
         "ISOnum.pen"
  
  -- these entries are used for the catalog-demo sample application --
  OVERRIDE NO
  PUBLIC "-//Arbortext//TEXT Test Override//EN"
         "catalog-demo/override.xml"
  OVERRIDE YES
  PUBLIC "-//Arbortext//TEXT Test Public Identifier//EN"
         "catalog-demo/testpub.xml"
  SYSTEM "urn:x-arbortext:test-system-identifier"
         "catalog-demo/testsys.xml"
  PUBLIC "-//Indexgeo//DTD Catalog Demo v1.0//EN"
         "catalog-demo/catalog-demo-v10.dtd"
  -- end of entries for the catalog-demo sample application --
  ]]></source>
  
     <p>
      System identifiers can use full pathnames, filenames, relative pathnames,
      or URLs - if it is just a filename or a relative pathname, then the 
      entity resolver will look for the resource relative to the location of
      the catalog.
     </p>
    </s2>
   </s1>
  
   <anchor id="demo2"/>
   <s1 title="Demonstration #2">
  
    <p>
     See catalogs in action with the sample
     <link href="samples/catalog-demo">catalog-demo</link>. The demonstration
     is intended to be self-documenting. The top-level XML instance describes its
     role, and each included external entity reports how it came into being.
     This example builds upon the example provided by the Walsh article.
     (Tip: To see the error message that would result from not using a catalog,
     simply rename the properties file before starting @docname@.)
    </p>
  
  <note>TODO: ensure that the link to samples works OK</note>
  
    <p>Here is the source for the top-level XML instance document
     <code>test.xml</code> ...
    </p>
  
  <source><![CDATA[
  <?xml version="1.0"?>
  <!DOCTYPE catalog-demo PUBLIC "-//Indexgeo//DTD Catalog Demo v1.0//EN"
    "http://www.indexgeo.com.au/dtd/catalog-demo-v10.dtd"
  [
   <!ENTITY testpub PUBLIC "-//Arbortext//TEXT Test Public Identifier//EN"
     "bogus-system-identifier.xml">
   <!ENTITY testsys SYSTEM "urn:x-arbortext:test-system-identifier">
   <!ENTITY testovr PUBLIC "-//Arbortext//TEXT Test Override//EN"
     "testovr.xml">
   <!ENTITY % ISOnum PUBLIC
     "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML"
     "ISOnum.pen">
   %ISOnum;
   <!ENTITY note "Note:">
  ]>
  
  <catalog-demo>
   <section>
    <para>This sample application demonstrates the use of catalogs for
     entity resolution. &note; see the Apache Cocoon documentation
     <link href="/cocoon/documents/catalog.html">Entity resolution with
     catalogs</link> for the full background and explanation, and the XML
     source of this document (test.xml).
    </para>
  
    <para>This top-level XML instance document is test.xml - it declares
     three other XML sub-documents as external entities and then includes
     them in the sections below. The real system identifiers will be looked
     up in the catalog, to resolve the actual location of the resource.
    </para>
  
    <para>The Document Type Definition (DTD) is declared using both a public
     identifier and a system identifier. The system identifier for the DTD is
     a network-based resource (which is deliberately non-existent). However,
     the catalog overrides that remote DTD to instead use a copy from the
     local filesystem at the location defined by the catalog entry. Note that
     it is via the use of a public identifier that we gain this power.
    </para>
  
    <para>The internal DTD subset of the top-level document instance goes on
     to declare the three external sub-document entities using various means.
     It also declares and includes the ISOnum set of character entities,
     so that we can use entities like &amp;frac12; (to represent &frac12;).
     Finally the internal DTD subset declares an internal general entity
     for &quot;note&quot;.
    </para>
   </section>
  
   <section>
    <para>testpub ... this entity is declared with a PUBLIC identifier and a
     bogus system identifier (which will be overridden by the catalog)
    </para>
    &testpub;
   </section>
  
   <section>
    <para>testsys ... this entity is declared with a SYSTEM identifier
     (which will be resolved by the catalog)
    </para>
    &testsys;
   </section>
  
   <section>
    <para>testovr ... is declared with a PUBLIC identifier and a system
     identifier (the catalog is set to not override this one, so the
     declared system identifier is used)
    </para>
    &testovr;
   </section>
  
  </catalog-demo>
  ]]></source>
  
    <p>
     Here is the source for one of the included sub-document external entities
     <code>testpub.xml</code> ...
    </p>
  
  <source><![CDATA[
  <para>&note; This paragraph is automatically included from the
   testpub.xml external file.
   The entity declaration deliberately used a non-existent file
   as the system identifier. The catalog then used the declared
   public identifier to resolve to a specific location on the local
   filesystem.
  </para>
  ]]></source>
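
    <p>
     The way the catalog entries resolve the three demo entities can be sketched in a few lines. This is only an illustration written in Python, not the Sun resolver code that @docname@ uses. It handles just the <code>PUBLIC</code>, <code>SYSTEM</code> and <code>OVERRIDE</code> entries with double-quoted identifiers. The catalog location <code>file:///opt/entities/catalog</code> and all function names are hypothetical, and treating a catalog without any <code>OVERRIDE</code> entry as "no" is an assumption.
    </p>

  <source><![CDATA[
```python
import re
from urllib.parse import urljoin

# A token is a "-- comment --", a double-quoted string, or a bare word.
_TOKEN = re.compile(r'--.*?--|"([^"]*)"|(\S+)', re.S)


def tokenize(text):
    """Split catalog text into tokens, dropping comments and quotes."""
    tokens = []
    for match in _TOKEN.finditer(text):
        if match.lastindex is None:  # no group matched, so this is a comment
            continue
        tokens.append(match.group(match.lastindex))
    return tokens


def parse_catalog(text, catalog_url):
    """Return (public, system) maps; targets are made relative to the catalog."""
    public, system = {}, {}
    override = False  # assumption: no OVERRIDE entry seen yet means "no"
    tokens = tokenize(text)
    i = 0
    while i < len(tokens):
        keyword = tokens[i].upper()
        if keyword == "OVERRIDE":
            override = tokens[i + 1].upper() == "YES"
            i += 2
        elif keyword in ("PUBLIC", "SYSTEM"):
            ident = tokens[i + 1]
            target = urljoin(catalog_url, tokens[i + 2])
            if keyword == "PUBLIC":
                # Remember which OVERRIDE setting was active for this entry.
                public.setdefault(ident, (target, override))
            else:
                system.setdefault(ident, target)
            i += 3
        else:
            i += 1  # entry types that this sketch does not cover
    return public, system


def resolve(public, system, public_id, system_id):
    """Map a declared identifier pair to a location, else keep system_id."""
    if system_id in system:
        return system[system_id]
    if public_id in public:
        target, override = public[public_id]
        # With OVERRIDE NO, a declared system identifier wins over the catalog.
        if override or system_id is None:
            return target
    return system_id


DEMO_CATALOG = '''
-- entries used by the demo sample --
OVERRIDE NO
PUBLIC "-//Arbortext//TEXT Test Override//EN"
       "catalog-demo/override.xml"
OVERRIDE YES
PUBLIC "-//Arbortext//TEXT Test Public Identifier//EN"
       "catalog-demo/testpub.xml"
SYSTEM "urn:x-arbortext:test-system-identifier"
       "catalog-demo/testsys.xml"
'''

PUBLIC_MAP, SYSTEM_MAP = parse_catalog(DEMO_CATALOG, "file:///opt/entities/catalog")
```
  ]]></source>

    <p>
     With these maps, <code>testpub</code> and <code>testsys</code> resolve to files beside the catalog, whereas <code>testovr</code> keeps its declared system identifier <code>testovr.xml</code> because its catalog entry sits under <code>OVERRIDE NO</code>.
    </p>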
   </s1>
  
   <anchor id="imp"/>
   <s1 title="Implementation">
    <p>
     The SAX <code>Parser</code> interface provides an <code>entityResolver</code> hook to allow an application to resolve the external entities. The Sun Microsystems Java code for "<code>resolver.jar</code>" provides a CatalogManager. This is already incorporated into @doctitle@ - all that you need to do is make the
     "<code>CatalogManager.properties</code>" file available on your classpath.
    </p>
  
    <ul>
     <li>An example annotated <code>CatalogManager.properties</code> file is 
      shipped with @doctitle@
  (<code>webapps/cocoon/resources/entities/CatalogManager.properties</code>)
     </li>
     <li>A default catalog and some base entities (e.g. ISO*.pen character
      entity sets) are available at 
      <code>webapps/cocoon/resources/entities/</code> ... the example properties
      file declares this default catalog.
     </li>
     <li>You will need to edit the properties file to provide the full pathname
      to the default catalog (and add your own too). [see dev note below]
     </li>
     <li>Make your properties file available on your classpath before starting
      the servlet engine.</li>
    </ul>
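
    <p>
     For readers who have not met the SAX entity resolver hook, here is a minimal sketch of the idea using the SAX binding in the Python standard library. It is not the Java CatalogManager that @doctitle@ incorporates. The class name <code>MappingResolver</code> and the <code>/opt/entities/</code> pathnames are hypothetical, and the two dictionaries stand in for a real catalog (this sketch always prefers a public identifier mapping and ignores <code>OVERRIDE</code>).
    </p>

  <source><![CDATA[
```python
from xml.sax import make_parser
from xml.sax.handler import EntityResolver, feature_external_ges


class MappingResolver(EntityResolver):
    """Look identifiers up in two dictionaries before using the declared one."""

    def __init__(self, public_map, system_map):
        self.public_map = public_map
        self.system_map = system_map

    def resolveEntity(self, publicId, systemId):
        # The parser calls this hook when it needs an external entity.
        if systemId in self.system_map:
            return self.system_map[systemId]
        if publicId in self.public_map:
            return self.public_map[publicId]
        return systemId  # no mapping: fall back to the declared location


# Hypothetical local locations standing in for real catalog entries.
resolver = MappingResolver(
    public_map={"-//Indexgeo//DTD Catalog Demo v1.0//EN":
                "/opt/entities/catalog-demo/catalog-demo-v10.dtd"},
    system_map={"urn:x-arbortext:test-system-identifier":
                "/opt/entities/catalog-demo/testsys.xml"},
)

parser = make_parser()
# Python 3.7.1 and later do not process external general entities by default.
parser.setFeature(feature_external_ges, True)
parser.setEntityResolver(resolver)
```
  ]]></source>

    <p>
     The application only registers the resolver; the parser decides when to call it. That is why the properties file and its catalogs must be in place before any document is parsed.
    </p>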
  
    <note>TODO: Later we will figure out how to load the default catalog automatically with pathname from cocoon config, thereby leaving the properties file for loading local catalogs.
    </note>
  
    <note>TODO: We need to explain the properties file here in the documentation (the annotation helps for now) ... full documentation is available with the Sun download.
    </note>
   </s1>
  
   <anchor id="dev"/>
   <s1 title="Development notes">
  
    <p>
     We are currently using the tried and true catalog format - plain-text
     OASIS Catalogs (TR 9401:1995 Entity Management). XML-based catalogs can also be used. However, that standard is not yet settled, so OASIS TR 9401 catalogs will suffice for now. See the note below.
    </p>
  
    <p>
     Assistance is required with the following development issues ...
    </p>
  
    <ul>
     <li>Is there any negative impact on performance? I can see one extra
      file read per parse - does that matter? Perhaps there are mostly
      performance improvements instead.
     </li>
    </ul>
  
    <p>
     Platform testing so far ...
    </p>
  
    <ol>
     <li>Linux Red Hat 7.1, java.vm.version=Blackdown-1.3.1-FCS,
      Tomcat 3.2.2 ... OK</li>
     <li>Macintosh ... looking for a test platform</li>
     <li>Windows ... looking for a test platform</li>
     <li>Other UNIX ... looking for a test platform</li>
     <li>Other JDK versions ... looking for a test platform</li>
    </ol>
  
    <p>
     Some core @docname@ FIXME notes can be addressed by catalog ...
    </p>
  
    <ul>
     <li>the first FIXME note in document-1.0.dtd re how to include
      entities without hardwiring
     </li>
     <li>there are various other hard-coded pathnames to XML resources
     </li>
     <li>this needs further investigation after basic catalog support is
      implemented
     </li>
    </ul>
  
    <p>
     Some relevant past discussion on @docname@ mailing lists ...
    </p>
  
    <ul>
     <li>
      <link href="http://mailman.real-time.com/pipermail/cocoon-devel/2000-August/000940.html">Re: DTD PUBLIC ID resolution</link>
  Fri, 04 Aug 2000 09:20:15 +1000
      <br/>
      <link href="http://mailman.real-time.com/pipermail/cocoon-devel/2000-August/000964.html">Re: DTD PUBLIC ID resolution</link>
  Sat, 05 Aug 2000 20:58:50 +0200
     </li>
  
     <li>
      <link href="http://mailman.real-time.com/pipermail/cocoon-devel/2001-May/thread.html#7236">[Fwd: Re: C2: Sitemaps and DTD's]</link>
  Thu, 03 May 2001 09:41:24 -0400
     </li>
  
     <li>
      <link href="http://mailman.real-time.com/pipermail/cocoon-devel/2001-August/thread.html#10365">proposal: entity resolution capability</link>
  Fri, 10 Aug 2001 16:04:27 +1000
     </li>
    </ul>
  
    <p>
     Other notes that still need to be added in to this document somewhere ...
    </p>
  
    <ul>
     <li>there has been a recent flood of XML tools - unfortunately, many do not
      implement entity resolution (other than by brute-force retrieval), so
      those tools are crippled and cannot be used for serious XML processing
     </li>
     <li>OASIS Catalogs (TR 9401:1995 Entity Management) are plain-text files 
      with a simple delimited format. There is also a new standard being
      developed for XML Catalogs, using an xml-based structured plain-text file
      (gee :-). Links to both standards are provided below. Both catalog formats
      can currently be used with this entity resolver.
     </li>
     <li>if there is no mapping for an identifier in the catalog (or in any
      subordinate catalogs), then @docname@ will carry on to retrieve the
      resource using the declared system identifier
     </li>
    </ul>
   </s1>
  
   <anchor id="summ"/>
   <s1 title="Summary">
    <p>
     Most XML documents that we would want to serve with @docname@ are already in existence in another information system. The XML document instances declare their Document Type Definition (DTD) as an external file. This external DTD also includes entity sets such as ISOnum, ISOlat1, etc. The DTD declaration also has a Formal Public Identifier and a System Identifier which points to a remote URL. These XML instance documents cannot be changed.
    </p>
  
    <p>
     Entity management is effected by providing a standards-based mechanism to resolve public identifiers and system identifiers to local filenames or other identifiers or even to other remote network resources. So references to external DTDs, sets of character entities such as mathematical symbols, fragments of XML documents, complete sub-documents, non-xml data chunks (like images), etc. can all be centrally managed and resolved locally.
    </p>
   </s1>
  
   <anchor id="info"/>
   <s1 title="Further information">
    <p>
     Here are some links to documents which extol entity management:
    </p>
  
    <ul>
     <li><link href="http://www.oasis-open.org/committees/entity/">OASIS Entity
      Resolution Technical Committee</link> - see especially the
      <link href="http://www.oasis-open.org/specs/a401.html">specification for OASIS Catalogs</link> (TR 9401:1995 Entity Management)
      and the 
      <link href="http://www.oasis-open.org/committees/entity/spec.html">specification for XML Catalogs</link>
     </li>
     <li><link href="http://www.oasis-open.org/cover/topics.html#entities">SGML/XML Special Topics: Entity Sets and Entity Management</link>
      at the
      <link href="http://www.oasis-open.org/cover/">XML Cover Pages</link></li>
     <li><link href="http://www.oasis-open.org/cover/topics.html#fpi-fsi">SGML/XML Special Topics: Catalogs, Formal Public Identifiers, Formal System Identifiers</link>
      at the
      <link href="http://www.oasis-open.org/cover/">XML Cover Pages</link></li>
     <li>Arbortext column by Norm Walsh
      <link href="http://www.arbortext.com/Think_Tank/think_tank.html">Standard
      Deviations from Norm</link>
      <br/> - Issue Three
      <link href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">If You Can Name It, You Can Claim It!</link></li>
     <li>
      <link href="http://www.sun.com/xml/developers/resolver/">XML Entity and URI Resolvers Java classes</link> and evolution of the Arbortext article.
     </li>
     <li>XML-Deviant article 2000-11-29
        <link href="http://www.xml.com/pub/a/2000/11/29/deviant.html">What's in a
  Name?</link></li>
     <li>Organization for the Advancement of Structured Information Standards
      (<link href="http://www.oasis-open.org/">OASIS</link>)</li>
    </ul>
   </s1>
  
   </body>
  </document>
  
  
  

----------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          cocoon-cvs-unsubscribe@xml.apache.org
For additional commands, e-mail: cocoon-cvs-help@xml.apache.org