Posted to dev@forrest.apache.org by Steven Noels <st...@outerthought.org> on 2002/02/28 12:39:27 UTC

RE: rationalise storage of DTDs and other entities

David Crossley wrote:

> Sorry that it is long - i am trying to ensure that we look
> at all issues early. Also, i am providing background so
> that other people on forrest-dev, who are not necessarily
> familiar with Cocoon, can see where we are coming from.
> Parts of that background will have bearing on where we
> decide to store the DTDs in Forrest.

It sums it up very nicely, so the length was appropriate :-)

> I am using the term "entities" to refer to all external bits that
> are required to build an XML instance document, i.e. its DTD,
> any character entity sets that are declared by either the DTD
> or the XML instance, and potential other external entities.
>

<snip>cocoon-history</snip>

>
> Now that the entity resolver is working for Cocoon, the
> storage of DTDs could be at just one directory, probably
> webapp/resources/entities/  Anyway, this issue has not
> yet been raised on cocoon-dev.

We should test that and raise it if necessary. I would keep all
Forrest-related stuff in the same webapp, unless we are planning to make
the DTDs available over HTTP (as Sun is doing for their server.xml and
the like).

> Existing filesystem structure for Cocoon ...
> webapp/resources/entities/*.dtd
> src/documentation/xdocs/dtd/*.dtd
>
> --------------------------------------
> Background - Forrest
> ----------------
> The CVS for xml-forrest has recently been set up and
> is based on Krysalis Centipede. This in turn was based
> on Cocoon, so it brought with it a similar filesystem
> structure for the entities. It also brought similar duplication
> due to the still-standing issue.
>
> Meanwhile Forrest is starting to develop the next version
> of the DTDs. It has them stored at a different location, together
> with new OASIS Catalogs.
>
> By the way, i verified that the catalog entity resolver of
> Cocoon is working properly inside Forrest by raising the
> verbosity level and tweaking the document type declaration
> in index.xml and entries in the OASIS Catalog. Would someone
> on Windows please verify this too? Perhaps Ken has done
> so already for Centipede.

We should make use of the entity resolver a default, which means
cleaning up the docs inside current CVS, and provide some template docs
for each doctype.

> Existing filesystem structure for Forrest ...
> src/resources/entities/*.dtd
> src/documentation/xdocs/dtd/*.dtd
> src/resources/schemas/DTD/*.dtd
>
> --------------------------------------
> Proposed storage of external entities for Forrest
> ---------------------------------
> Here are the alternatives that i see. We may need some
> discussion before we can decide.
>
> A) under src/resources/some-dir-structure/

I would go for A)

> B) under src/documentation/xdocs/some-dir-structure/
>
> "some-dir-structure" either has sub-directories ...
> schemas/dtd/*.dtd
> schemas/entities/*.pen
>
> or it is flat ...
> entities/*.dtd
> entities/*.pen

not flat:

src/resources/schema/dtd
src/resources/schema/entities
src/resources/schema/relax

> By the way, the word "schemata" is actually the plural
> of "schema", if good grammar matters. That is why i chose
> the directory name "entities" for Cocoon - to avoid that
> issue :-)

My colleague once told me that using plurals for directory names
makes explicit what is already implied: directories are 'made' to
contain multiple items of some kind, so the plural is superfluous. Oh
well... ;-)

> I currently lean towards A, because it should be entirely
> independent of Forrest's own documentation. I also prefer
> a flat structure because there are not really all that many
> entities involved.

hm - flat is good for directories containing the same type of
'entities', which is not the case anymore - i would prefer some
structure.

>
> --------------------------------------
> Other issues
> ---------
> 1) Need to decide where to store the ISO*.pen character entity
> sets. Cocoon has them dumped in the same directory as the
> DTDs. Forrest currently has them in a separate directory.

Should stay like that.

I don't like these entities anyhow: documents should be using proper
Unicode encoding, and eventually character references, instead of these
remains from the SGML era. We should avoid entities like the plague:
http://www.textuality.com/xml/xmlSW.html makes no reference to entities
(or even DTDs anymore); see also
http://www.xml.com/pub/a/2002/02/20/deviant.html.

> 2) Other projects, such as Centipede and Cocoon, will still want
> to ship a collection of external entities and an OASIS Catalog.
>
> 3) How should default System Identifiers be expressed for
> the XML instance documents of Forrest's own documentation?

"filename.dtd" unless we are planning to make them public as you
describe underneath.

> 3a) Use filesystem-based URIs with a relative pathname to
> the actual DTD, e.g. ../dtd/document-v10.dtd
> 3b) Use a filename which does not yield the DTD and just
> lets the entity resolver do the work, e.g. "document-v11.dtd"
> 3c) Use URL-based default System Identifiers,
> e.g. "http://xml.apache.org/dtd/document-v11.dtd"
> I prefer 3b because it is the easiest.

3b: enforce usage of the entity resolver
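
A minimal sketch of what 3b implies, assuming a plain SAX
EntityResolver rather than the OASIS Catalog resolver that Cocoon
actually uses. The class name LocalDtdResolver and the directory passed
to it are hypothetical; the point is only that a bare identifier such as
"document-v11.dtd" never has to point at a real file, because the
resolver maps it to a local copy.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: looks up a DTD by file name in one local directory. */
final class LocalDtdResolver implements EntityResolver {
    private final Path dtdDir;

    LocalDtdResolver(Path dtdDir) {
        this.dtdDir = dtdDir;
    }

    /** Keeps only the last path segment of the system identifier. */
    Path localPathFor(String systemId) {
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        return dtdDir.resolve(fileName);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        Path local = localPathFor(systemId);
        if (!Files.isRegularFile(local)) {
            return null; // unknown entity: let the parser use its default lookup
        }
        return new InputSource(local.toUri().toString());
    }
}
```

Because only the file name is used, the same lookup would also catch
the URL form from 3c. A real catalog matches on public and system
identifiers instead, which is more precise than this.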

> 4) Should there be an actual DTD file located at that URL?
> We need to be careful with this, because we do not want
> to encourage crappy xml tools that do not use an entity
> resolver. Instead, such tools are wasting bandwidth by
> retrieving the DTD from xml.apache.org every time that
> the XML instance is parsed. They should resolve the
> request to a local copy.
>
> 5) We need to encourage any project that uses Forrest
> to provide proper document type declarations in their
> XML instance documents. Lead by example is best.

yes indeed

I still am quite unclear on what the difference will be between the CVS
structure and the 'unit of deployment', but am lacking some time to
properly investigate.

The main idea behind committing Centipede as-is was to give us a flying
start: it should be easier now to edit and remove, instead of endlessly
tinkering on the mailing list. So everybody, feel free to attack what we
have in CVS now ;-)

</Steven>


Re: rationalise storage of DTDs and other entities

Posted by Nicola Ken Barozzi <ba...@nicolaken.com>.
From: "Steven Noels" <st...@outerthought.org>

> David Crossley wrote:
>
> > --------------------------------------
> > Background - Forrest
> > ----------------
> > The CVS for xml-forrest has recently been set up and
> > is based on Krysalis Centipede. This in turn was based
> > on Cocoon, so it brought with it a similar filesystem
> > structure for the entities. It also brought similar duplication
> > due to the still-standing issue.
> >
> > Meanwhile Forrest is starting to develop the next version
> > of the DTDs. It has them stored at a different location, together
> > with new OASIS Catalogs.
> >
> > By the way, i verified that the catalog entity resolver of
> > Cocoon is working properly inside Forrest by raising the
> > verbosity level and tweaking the document type declaration
> > in index.xml and entries in the OASIS Catalog. Would someone
> > on Windows please verify this too? Perhaps Ken has done
> > so already for Centipede.

Yes, it works.

> We should make use of the entity resolver a default, which means
> cleaning up the docs inside current CVS, and provide some template docs
> for each doctype.

+1

> > Existing filesystem structure for Forrest ...
> > src/resources/entities/*.dtd
> > src/documentation/xdocs/dtd/*.dtd
> > src/resources/schemas/DTD/*.dtd
> >
> > --------------------------------------
> > Proposed storage of external entities for Forrest
> > ---------------------------------
> > Here are the alternatives that i see. We may need some
> > discussion before we can decide.
> >
> > A) under src/resources/some-dir-structure/
>
> I would go for A)

Me too. In a similar fashion, I've put cocoon.xconf in another dir, and
would like to see sitemap.xmap get out of the way too.

> > B) under src/documentation/xdocs/some-dir-structure/
> >
> > "some-dir-structure" either has sub-directories ...
> > schemas/dtd/*.dtd
> > schemas/entities/*.pen
> >
> > or it is flat ...
> > entities/*.dtd
> > entities/*.pen
>
> not flat:
>
> src/resources/schema/dtd
> src/resources/schema/entities
> src/resources/schema/relax

+1

> > By the way, the word "schemata" is actually the plural
> > of "schema", if good grammar matters. That is why i chose
> > the directory name "entities" for Cocoon - to avoid that
> > issue :-)
>
> My colleague once told me that using plurals for directory names
> makes explicit what is already implied: directories are 'made' to
> contain multiple items of some kind, so the plural is superfluous. Oh
> well... ;-)

When you use English words in Italian, it's good practice not to use
plurals.
For example:
one computer: un computer
many computers: molti computer

So to me, schema makes sense.
+0

> > I currently lean towards A, because it should be entirely
> > independent of Forrest's own documentation. I also prefer
> > a flat structure because there are not really all that many
> > entities involved.

I also lean towards A, but for a different reason: Forrest proper source
code should go in ./src IMHO. The "conceptual" issue is that Forrest is a
product that uses itself, in this case in documentation. This is a very
special case for Centipede, so we need to make a target that synchronizes
the documentation system with source before docs generation.

> hm - flat is good for directories containing the same type of
> 'entities', which is not the case anymore - i would prefer some
> structure.

+1

> >
> > --------------------------------------
> > Other issues
> > ---------
> > 1) Need to decide where to store the ISO*.pen character entity
> > sets. Cocoon has them dumped in the same directory as the
> > DTDs. Forrest currently has them in a separate directory.
>
> Should stay like that.

+1

> I don't like these entities anyhow: documents should be using proper
> Unicode encoding, and eventually character references, instead of these
> remains from the SGML era. We should avoid entities like the plague:
> http://www.textuality.com/xml/xmlSW.html makes no reference to entities
> (or even DTDs anymore); see also
> http://www.xml.com/pub/a/2002/02/20/deviant.html.
>
> > 2) Other projects, such as Centipede and Cocoon, will still want
> > to ship a collection of external entities and an OASIS Catalog.
> >
> > 3) How should default System Identifiers be expressed for
> > the XML instance documents of Forrest's own documentation?
>
> "filename.dtd" unless we are planning to make them public as you
> describe underneath.
>
> > 3a) Use filesystem-based URIs with a relative pathname to
> > the actual DTD, e.g. ../dtd/document-v10.dtd
> > 3b) Use a filename which does not yield the DTD and just
> > lets the entity resolver do the work, e.g. "document-v11.dtd"
> > 3c) Use URL-based default System Identifiers,
> > e.g. "http://xml.apache.org/dtd/document-v11.dtd"
> > I prefer 3b because it is the easiest.
>
> 3b: enforce usage of the entity resolver

+1

> > 4) Should there be an actual DTD file located at that URL?
> > We need to be careful with this, because we do not want
> > to encourage crappy xml tools that do not use an entity
> > resolver. Instead, such tools are wasting bandwidth by
> > retrieving the DTD from xml.apache.org every time that
> > the XML instance is parsed. They should resolve the
> > request to a local copy.

Makes sense, but, correct me if I'm wrong, isn't the URI something that
should in some way describe the real location on the Internet?
As an alternative, we could use a URI with a bogus protocol (resolved by
forrest only) to eliminate this possible inconsistency (forrest://).
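
As a rough sketch of the forrest:// idea, assuming the identifier is
simply a path relative to one local root directory: the class
ForrestScheme and its method are hypothetical, and nothing like this
exists in Forrest or Cocoon today.

```java
import java.nio.file.Path;

/** Hypothetical mapping of forrest:// identifiers to files below one root. */
final class ForrestScheme {
    static final String PREFIX = "forrest://";

    /** Returns the local path, or null if the identifier is not ours. */
    static Path toLocalPath(Path root, String systemId) {
        if (systemId == null || !systemId.startsWith(PREFIX)) {
            return null;
        }
        Path candidate = root.resolve(systemId.substring(PREFIX.length())).normalize();
        // Refuse identifiers that climb out of the root with "..".
        return candidate.startsWith(root.normalize()) ? candidate : null;
    }
}
```

With the root set to src/resources/schema, the identifier
"forrest://dtd/document-v11.dtd" would map to
src/resources/schema/dtd/document-v11.dtd, while any http:// identifier
is left for another resolver to handle.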

> > 5) We need to encourage any project that uses Forrest
> > to provide proper document type declarations in their
> > XML instance documents. Lead by example is best.
>
> yes indeed

and enforce this along with validation during the build (I'm putting it in
Centipede).
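
A build-time check of this kind could look roughly like the following,
assuming plain JAXP/SAX validation. The class DoctypeCheck is
hypothetical and is not the Centipede target itself; for brevity the
resolver here hands back the same DTD text for every external entity,
where a real build would consult the catalog.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical build check: validates one document against a supplied DTD. */
final class DoctypeCheck {
    static boolean isValid(String xml, String dtdText) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // DTD validation
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Simplification: every external entity resolves to dtdText.
            reader.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader(dtdText)));
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // treat validity errors as failures
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The error handler matters: by default a SAX parser reports validity
errors without stopping, so the build would otherwise pass silently.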

> I still am quite unclear on what the difference will be between the CVS
> structure and the 'unit of deployment', but am lacking some time to
> properly investigate.

The way I see it, Forrest could be deployed as a directory in ./tools.
Currently, in Centipede, we have ./tools/ant, ./tools/centipede and
./tools/cocoon.
In the near future, Centipede will ship with ./tools/ant, ./tools/centipede,
./tools/cocoon and ./tools/forrest.

My idea is to make the ./tools dir a kind of "webapps" dir for build tools.
This is what I'm working on for the next Centipede release.
Ant-dev now has the possibility of loading tasks dynamically. If I add to
that the possibility of adding targets and dependencies, it's done. Instead of
.war or .jar, we could have .cent. Just my two .cent ;-)

> The main idea behind committing Centipede as-is was to give us a flying
> start: it should be easier now to edit and remove, instead of endlessly
> tinkering on the mailing list. So everybody, feel free to attack what we
> have in CVS now ;-)

Yup

Just remember that everything in the ./tools dir will change with subsequent
releases of the tools. This means that a tool that has been touched needs
to evolve, while changes in the ./src dir are independent.
So I humbly suggest that we discuss, in any case, changes to the ./tools/**.

--
Nicola Ken Barozzi                 barozzi@nicolaken.com
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)