Posted to dev@forrest.apache.org by je...@apache.org on 2002/10/31 18:04:21 UTC
cvs commit: xml-forrest/src/documentation/content/xdocs validation.xml
jefft 2002/10/31 09:04:21
Added: src/documentation/content/xdocs validation.xml
Log:
Add doc on Forrest validation
Revision Changes Path
1.1 xml-forrest/src/documentation/content/xdocs/validation.xml
Index: validation.xml
===================================================================
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
"document-v11.dtd" [
<!ENTITY catalog_spec
'http://www.oasis-open.org/committees/entity/background/9401.html'>
<!ENTITY catalog_intro
'http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html'>
]>
<document>
<header>
<title>XML Validation</title>
<subtitle>DTDs, catalogs and whatnot</subtitle>
<version>1.0</version>
<authors>
<person name="Jeff Turner" email="jefft@apache.org"/>
</authors>
</header>
<body>
<section id="xml_validation">
<title>XML validation</title>
<p>
By default, Forrest will try to validate your XML before generating
HTML or a webapp from it, and fail if any XML files are not valid.
Validation can be performed manually by typing 'forrest validate' in
the project root.
</p>
<p>
For an XML file to be valid, it <em>must</em> have a DOCTYPE
declaration at the top, indicating its content type. Hence by
default, all Forrest-processed XML files must have a DOCTYPE
declaration, or the build will break.
</p>
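<p>
As a minimal sketch, a file that satisfies this requirement could look
like the following. The DOCTYPE is the same one this document uses; the
title and section text are placeholders chosen only for illustration.
</p>
<source><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN"
  "document-v11.dtd">
<document>
  <header>
    <!-- placeholder title -->
    <title>Example page</title>
  </header>
  <body>
    <section>
      <title>Example section</title>
      <p>Placeholder content.</p>
    </section>
  </body>
</document>
]]></source>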
<p>
Despite the strict default behavior, Forrest is quite flexible about
validation. Projects can specify exactly which
files they want (and don't want) validated, through the
<code>forrest.validate.excludes</code> and
<code>forrest.validate.includes</code> properties, set in
<code>forrest.properties</code>. Each specifies a comma-separated
list of paths relative to <code>${project.xdocs-dir}</code>. For
example, to avoid validating
<code>${project.xdocs-dir}</code>/slides.xml and everything inside the
<code>${project.xdocs-dir}/manual/</code> directory, add this to
<code>forrest.properties</code>:
</p>
<source>forrest.validate.excludes=slides.xml, manual/**</source>
<p>
XML validation can also be made non-fatal by setting the following in
<code>forrest.properties</code>:
</p>
<source>forrest.validation.failonerror=false</source>
</section>
<section>
<title>Validating new XML types</title>
<p>
Forrest provides an <link href="&catalog_spec;">SGML Catalog</link>
[<link href="&catalog_intro;">tutorial</link>],
<code>xml-forrest/src/resources/schema/catalog</code>, as a means of
associating public identifiers (<code>-//APACHE//DTD Documentation
V1.1//EN</code> above) with DTDs.
If you <link href="http://localhost:8787/forrest/your-project.html#adding_new_content_type">add a new content type</link>, you
should add the DTD to <code>${project.schema-dir}/dtd/</code>, and add
an entry to the <code>${project.schema-dir}/catalog</code> file. This
section describes the details of this process.
</p>
<section>
<title>Creating or extending a DTD</title>
<p>
The main Forrest DTDs are designed to be modular and extensible, so
it is fairly easy to create a new document type that is a superset
of one from Forrest. This is what we'll demonstrate here, using our
earlier <link href="http://localhost:8787/forrest/adding_new_content_type">download format</link>
as an example. Our download format adds a group of new elements to
the standard 'documentv11' format. Our new elements are described
by the following DTD:
</p>
<source> <!ELEMENT release (downloads)>
<!ATTLIST release
version CDATA #REQUIRED
date CDATA #REQUIRED>
<!ELEMENT downloads (file*)>
<!ELEMENT file EMPTY>
<!ATTLIST file
url CDATA #REQUIRED
name CDATA #REQUIRED
size CDATA #IMPLIED>
</source>
<p>
The documentv11 entities are defined in a reusable 'module':
<code>xml-forrest/src/resources/schema/dtd/document-v11.mod</code>.
The
<code>xml-forrest/src/resources/schema/dtd/document-v11.dtd</code>
file provides a full description and basic example of how to pull in
modules. In our example, our DTD reuses modules
<code>common-charents-v10.mod</code> and
<code>document-v11.mod</code>. Here is the full DTD, with
explanation to follow.
</p>
<source><!-- ===================================================================
Download Doc format
PURPOSE:
This DTD provides simple extensions on the Apache DocumentV11 format to link
to a set of downloadable files.
TYPICAL INVOCATION:
<!DOCTYPE document PUBLIC "-//Acme//DTD Download Documentation V1.0//EN"
"download-v11.dtd">
AUTHORS:
Jeff Turner <jefft@apache.org>
COPYRIGHT:
Copyright (c) 2002 The Apache Software Foundation.
Permission to copy in any form is granted provided this notice is
included in all copies. Permission to redistribute is granted
provided this file is distributed untouched in all its parts and
included files.
==================================================================== -->
<!-- =============================================================== -->
<!-- Include the Common ISO Character Entity Sets -->
<!-- =============================================================== -->
<!ENTITY % common-charents PUBLIC
"-//APACHE//ENTITIES Common Character Entity Sets V1.0//EN"
"common-charents-v10.mod">
%common-charents;
<!-- =============================================================== -->
<!-- Document -->
<!-- =============================================================== -->
<!ENTITY % document PUBLIC
"-//APACHE//ENTITIES Documentation V1.1//EN"
"document-v11.mod">
<!-- Override this entity so that 'release' is allowed below 'section' -->
<!ENTITY % local.sections "|release">
%document;
<!ELEMENT release (downloads)>
<!ATTLIST release
version CDATA #REQUIRED
date CDATA #REQUIRED>
<!ELEMENT downloads (file*)>
<!ELEMENT file EMPTY>
<!ATTLIST file
url CDATA #REQUIRED
name CDATA #REQUIRED
size CDATA #IMPLIED>
<!-- =============================================================== -->
<!-- End of DTD -->
<!-- =============================================================== -->
</source>
<p>
The <!ENTITY % ... > blocks declare so-called <link href="http://www.xml.com/axml/target.html#dt-PERef">parameter
entities</link>. They are like macros, whose content is
substituted wherever a parameter-entity reference, like
<code>%common-charents;</code> or <code>%document;</code>,
appears.
</p>
<p>
In our DTD, we first pull in the 'common-charents' entity, which
defines character symbol sets. We then define the 'document'
entity. However, before the <code>%document;</code> PE reference, we
first override the 'local.sections' entity. This is a hook into
document-v11.mod. By setting its value to '|release', we declare
that our <release> element is to be allowed wherever "local
sections" are used. There are 5 or so such hooks for different
areas of the document; see document-v11.dtd for more details. We
then import the %document; contents, and declare the rest of our DTD
elements.
</p>
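<p>
The override works because of a rule in XML itself: when the same
entity is declared more than once, the first declaration is binding and
later ones are ignored. The following is a much-reduced sketch of the
idea. The file name <code>module.mod</code> and its contents are
hypothetical and are not a copy of document-v11.mod.
</p>
<source><![CDATA[<!-- module.mod (hypothetical): declares an empty hook, then uses it -->
<!ENTITY % local.sections "">
<!ENTITY % sections "section %local.sections;">
<!ELEMENT body (%sections;)+>

<!-- driver DTD: declare the hook FIRST, then pull in the module.
     The module's own empty declaration of local.sections is then
     ignored, so body effectively becomes (section |release)+ -->
<!ENTITY % local.sections "|release">
<!ENTITY % module SYSTEM "module.mod">
%module;
]]></source>
<p>
If the override came after the module was pulled in, the module's empty
declaration would be seen first and the hook would stay empty.
</p>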
<p>
We now have a DTD for the 'download' document type.
</p>
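<p>
A document of this type might then look like the sketch below. The
element and attribute names come from the DTD above; the version, date,
URL, file name and size are made-up values for illustration.
</p>
<source><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//Acme//DTD Download Documentation V1.0//EN"
  "download-v11.dtd">
<document>
  <header>
    <title>Downloads</title>
  </header>
  <body>
    <section>
      <title>Current release</title>
      <!-- 'release' is allowed here through the local.sections hook -->
      <release version="1.0" date="2002-10-31">
        <downloads>
          <!-- 'size' is #IMPLIED in the DTD, so it may be omitted -->
          <file url="http://example.org/dist/acme-1.0.tar.gz"
                name="acme-1.0.tar.gz" size="1.2M"/>
          <file url="http://example.org/dist/acme-1.0.zip"
                name="acme-1.0.zip"/>
        </downloads>
      </release>
    </section>
  </body>
</document>
]]></source>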
</section>
<section>
<title>Associating DTDs with document types</title>
<p>
Recall that our DOCTYPE declaration for our download document type
is:
</p>
<source> <!DOCTYPE document PUBLIC "-//Acme//DTD Download Documentation V1.0//EN"
"download-v11.dtd">
</source>
<p>
We only care about the quoted section after <code>PUBLIC</code>, called
the "public identifier", which globally identifies our document type.
We cannot rely on the subsequent "system identifier" part
("download-v11.dtd"), because as a relative reference it is liable to
break. The solution Forrest uses is to ignore the system id, and rely
on a mapping from the public ID to a stable DTD location, via a
Catalog file.</p>
<note>
See <link href="&catalog_intro;">this article</link> for a good
introduction to catalogs.
</note>
<p>
Forrest provides a standard catalog file,
<code>xml-forrest/src/resources/schema/catalog</code>, for document
types it provides. Projects can augment this with their own catalog
file in <code>${project.schema-dir}/catalog</code>. Here is what ours
should look like:
</p>
<source> -- OASIS TR 9401 Catalog for our project --
OVERRIDE YES
-- custom doctype --
PUBLIC "-//Acme//DTD Download Documentation V1.0//EN" "dtd/download-v11.dtd"
</source>
<p>
The format is described in <link href="http://www.oasis-open.org/committees/entity/background/9401.html">the
spec</link>, but is fairly simple. In particular, lines beginning
with PUBLIC map a public identifier to a DTD (relative to the catalog
file).
</p>
<p>
We now have a custom DTD and a catalog mapping which lets Forrest
locate the DTD. Now if we were to run 'forrest validate', our
download file would validate along with all the others.
</p>
</section>
</section>
<section>
<title>Validating in an editor</title>
<p>
If you have an XML editor that understands SGML or XML catalogs, let
it know where the Forrest catalog file is, and you will be able to
validate any Forrest XML file, regardless of location, as you edit
your files.
</p>
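<p>
The catalogs discussed in this document are in the plain-text TR 9401
format. For an editor that reads only OASIS XML Catalogs, the project
mapping for the download document type could be restated roughly as
follows. This is a sketch of an equivalent file, not something Forrest
provides; it assumes the file sits in the same directory as the project
catalog so that the relative path resolves the same way.
</p>
<source><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!-- prefer="public" plays the role of OVERRIDE YES in TR 9401 -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">
  <!-- custom doctype: public identifier mapped to a DTD location -->
  <public publicId="-//Acme//DTD Download Documentation V1.0//EN"
          uri="dtd/download-v11.dtd"/>
</catalog>
]]></source>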
<section>
<title>Case study: setting up xmllint</title>
<p>
On *nix systems, one of the best XML validation tools is
<code>xmllint</code>, which comes as part of the libxml2 package. It is
very fast, can validate whole directories of files at once, and can be
configured to use Forrest's catalog file for validation.
</p>
<p>
To tell xmllint where the Forrest catalog is, add the path to the catalog
file to the <code>SGML_CATALOG_FILES</code> variable. For example:
</p>
<source>export SGML_CATALOG_FILES=$SGML_CATALOG_FILES:\
/home/jeff/apache/xml/xml-forrest/src/resources/schema/catalog
</source>
<p>
Then Forrest XML files can be validated as follows:
</p>
<source> xmllint --valid --noout --catalogs *.xml
</source>
<p>
For users of the vim editor, the following .vimrc entries are useful:
</p>
<source>
au FileType xml set efm=%A%f:%l:\ %.%#error:\ %m,%-Z%p^,%-C%.%#
au FileType xml set makeprg=xmllint\ --noout\ --valid\ --catalogs\ %
</source>
</section>
</section>
</body>
</document>