Posted to dev@forrest.apache.org by David Crossley <cr...@indexgeo.com.au> on 2002/03/01 08:24:44 UTC

rationalise storage of DTDs and other entities

One of the issues that Steven touched in another
thread is where to store the DTDs and other entities
so that they are readily available to all XML instance
documents. We also want to only maintain one copy
of these entities.

Sorry that it is long - i am trying to ensure that we look
at all issues early. Also, i am providing background so
that other people on forrest-dev, who are not necessarily
familiar with Cocoon, can see where we are coming from.
Parts of that background will have bearing on where we
decide to store the DTDs in Forrest.

I am using the term "entities" to refer to all external bits that
are required to build an XML instance document, i.e. its DTD,
any character entity sets that are declared by either the DTD
or the XML instance, and potential other external entities.

--------------------------------------
Background - Cocoon
----------------
Cocoon originally had its DTDs at xdocs/dtd/*.dtd
This was when Cocoon had a flat directory structure.
The DTDs were conveniently directly underneath all
xdocs/*.xml and their document type declarations used
a basic default System Identifier, e.g. dtd/document-v10.dtd

Then the Cocoon xdocs were re-organised to have
a hierarchy, i.e. sub-directories under xdocs/
That meant that the default System Identifiers needed to
start using tricks with ../../ to refer to their DTD. Messy.

Additionally there were other documents that were outside
the xdocs/ directory, e.g. changes.xml at the top-level.
These needed default System Identifiers with hard-coded
pathnames. Even more messy.
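
To make the ../../ problem concrete, here is a minimal Python sketch
(not taken from Cocoon) that computes the relative System Identifier
which a document at a given depth would need in order to reach one
shared DTD. The document paths and the helper name are illustrative
assumptions only.

```python
import posixpath

# Assumed location of the DTD, relative to the top of the source tree.
DTD = "xdocs/dtd/document-v10.dtd"

def default_system_id(document_path):
    """Return the relative System Identifier that a document at
    document_path would need in order to reach DTD directly."""
    doc_dir = posixpath.dirname(document_path) or "."
    return posixpath.relpath(DTD, start=doc_dir)

# Hypothetical documents at different depths of the tree.
for doc in ("xdocs/index.xml",
            "xdocs/userdocs/concepts/catalog.xml",
            "changes.xml"):
    print(doc, "->", default_system_id(doc))
```

Every document depth yields a different identifier for the same DTD,
which is the maintenance burden described above.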

At around the same time the Entity Catalog resolver support
was added to Cocoon [1]. This allowed DTDs and entity sets
to be placed in a centralised location. The XML instance
documents could declare their Public Identifiers and the
entity resolver could ignore the default System Identifiers
and locate the relevant DTDs via their Public Identifiers.
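
The lookup that such a resolver performs can be sketched roughly as
follows. This is a simplified Python illustration, not the resolver
used by Cocoon: the catalog is reduced to a dictionary, and the
Public Identifier, the local path and the function name are all
assumptions made for the example.

```python
# A catalog reduced to its essence: Public Identifier -> local file.
# Both the identifier and the path are illustrative assumptions.
CATALOG = {
    "-//APACHE//DTD Documentation V1.0//EN": "entities/document-v10.dtd",
}

def resolve_entity(public_id, system_id, catalog=CATALOG):
    """Prefer the catalog entry for public_id; otherwise fall back to
    the System Identifier that the document declared."""
    if public_id in catalog:
        return catalog[public_id]
    return system_id

# Known Public Identifier: the declared System Identifier is ignored.
print(resolve_entity("-//APACHE//DTD Documentation V1.0//EN",
                     "../../dtd/document-v10.dtd"))
# Unknown Public Identifier: the parser is left with the System Identifier.
print(resolve_entity("-//EXAMPLE//DTD Unknown//EN", "dtd/unknown.dtd"))
```

The second call shows why a working default System Identifier still
matters whenever the catalog has no matching entry.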

We decided to put the DTDs and entity sets together with
other resources at webapp/resources/entities/
However, we needed to leave a copy of the DTDs at their
original location under xdocs/dtd/ as a belt-and-braces
solution while the entity resolver capability was being
developed. In this way the entity resolver could fail and
yet the parser could still fall-back to using the hard-coded
System Identifiers.

Now that the entity resolver is working for Cocoon, the
storage of DTDs could be at just one directory, probably
webapp/resources/entities/. However, this issue has not
yet been raised on cocoon-dev.

Existing filesystem structure for Cocoon ...
webapp/resources/entities/*.dtd
src/documentation/xdocs/dtd/*.dtd

--------------------------------------
Background - Forrest
----------------
The CVS for xml-forrest has recently been set up and
is based on Krysalis Centipede. This in turn was based
on Cocoon, so it brought with it a similar filesystem
structure for the entities. It also brought similar duplication
due to the still-standing issue.

Meanwhile Forrest is starting to develop the next version
of the DTDs. It has them stored at a different location, together
with new OASIS Catalogs.

By the way, i verified that the catalog entity resolver of
Cocoon is working properly inside Forrest by raising the
verbosity level and tweaking the document type declaration
in index.xml and entries in the OASIS Catalog. Would someone
on Windows please verify this too? Perhaps Ken has done
so already for Centipede.

Existing filesystem structure for Forrest ...
src/resources/entities/*.dtd
src/documentation/xdocs/dtd/*.dtd
src/resources/schemas/DTD/*.dtd

--------------------------------------
Proposed storage of external entities for Forrest
---------------------------------
Here are the alternatives that i see. We may need some
discussion before we can decide.

A) under src/resources/some-dir-structure/

B) under src/documentation/xdocs/some-dir-structure/

"some-dir-structure" either has sub-directories ...
schemas/dtd/*.dtd
schemas/entities/*.pen

or it is flat ...
entities/*.dtd
entities/*.pen

By the way, the word "schemata" is actually the plural
of "schema", if good grammar matters. That is why i chose
the directory name "entities" for Cocoon - to avoid that
issue :-)

I currently lean towards A, because it should be entirely
independent of Forrest's own documentation. I also prefer
a flat structure because there are not really all that many
entities involved.

--------------------------------------
Other issues 
---------
1) Need to decide where to store the ISO*.pen character entity
sets. Cocoon has them dumped in the same directory as the
DTDs. Forrest currently has them in a separate directory.

2) Other projects, such as Centipede and Cocoon, will still want
to ship a collection of external entities and an OASIS Catalog.

3) How should default System Identifiers be expressed for
the XML instance documents of Forrest's own documentation?
3a) Use filesystem-based URIs with a relative pathname to
the actual DTD, e.g. ../dtd/document-v10.dtd
3b) Use a filename which does not yield the DTD and just
lets the entity resolver do the work, e.g. "document-v11.dtd"
3c) Use URL-based default System Identifiers,
e.g. "http://xml.apache.org/dtd/document-v11.dtd"
I prefer 3b because it is the easiest.

4) Should there be an actual DTD file located at that URL?
We need to be careful with this, because we do not want
to encourage crappy xml tools that do not use an entity
resolver. Instead, such tools are wasting bandwidth by
retrieving the DTD from xml.apache.org every time that
the XML instance is parsed. They should resolve the
request to a local copy.

5) We need to encourage any project that uses Forrest
to provide proper document type declarations in their
XML instance documents. Lead by example is best.
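
As a rough aid for issues 3 to 5, the following Python sketch pulls
the Public and System Identifiers out of a document type declaration
and reports which of the styles 3a, 3b or 3c the System Identifier
follows. It is only an illustration under stated assumptions: the
regular expression handles just the double-quoted PUBLIC form, the
function names are hypothetical, and the Public Identifier in the
usage note is an example rather than a confirmed Forrest value.

```python
import re

# Matches only: <!DOCTYPE root PUBLIC "public-id" "system-id"
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<root>[\w.-]+)\s+PUBLIC\s+'
    r'"(?P<public>[^"]*)"\s+"(?P<system>[^"]*)"')

def classify_system_id(system_id):
    """Label a System Identifier with the option numbers used above."""
    if re.match(r"[A-Za-z][A-Za-z0-9+.-]*://", system_id):
        return "3c"  # URL-based: fetched remotely unless a resolver intervenes
    if "/" in system_id:
        return "3a"  # relative pathname to an actual DTD file
    return "3b"      # bare filename: relies on the entity resolver

def inspect_doctype(xml_text):
    """Return (public_id, system_id, style), or None when the document
    has no PUBLIC document type declaration in the expected form."""
    match = DOCTYPE_RE.search(xml_text)
    if match is None:
        return None
    system_id = match.group("system")
    return match.group("public"), system_id, classify_system_id(system_id)
```

For example, inspect_doctype('<!DOCTYPE document PUBLIC
"-//APACHE//DTD Documentation V1.1//EN" "document-v11.dtd">') reports
style "3b", while a document without any declaration returns None and
could be flagged during a build.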

--------------------------------------
[1] Entity resolution with catalogs
http://xml.apache.org/cocoon/userdocs/concepts/catalog.html

Re: rationalise storage of DTDs and other entities

Posted by Nicola Ken Barozzi <ba...@nicolaken.com>.
From: "Steven Noels" <st...@outerthought.org>

> Nicola and David wrote:
>
> >
> > From: "David Crossley" <cr...@indexgeo.com.au>
> > > I have left the old directory src/resources/schemas/
> > > in place for a little while because i am wondering if
> > > anyone had any outstanding commits for the v1.1 DTDs.
>
> > IMHO if you axe it now it will cause less confusion.
>
> What we also could do is import the old catalog for some time (there's
> no deprecation method in XML, however). See my uncommitted .zip.

Since we are starting from scratch, we have no need to deprecate it.
But we will need them for the XSL that will transform the old docs into
the new format.

> > Now for the layout of the docs: is there someone currently
> > working on it?
>
> Not here.

Ok, I'll do a first version.

BTW, any comments on the current layout proposal?

--
Nicola Ken Barozzi                 barozzi@nicolaken.com
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: rationalise storage of DTDs and other entities

Posted by Nicola Ken Barozzi <ba...@nicolaken.com>.
From: "David Crossley" <cr...@indexgeo.com.au>

> I suppose that the looming issue is going to be how to
> keep copies of these schema in sync between various
> projects: Forrest, Cocoon, Centipede, etc.

Centipede's are not needed since we are going to use Forrest to make its own
docs (see [PATCH] New fdoc build target).
Cocoon's is not our problem.
We just need Forrest's and those of the projects that we are going to
transform.

--
Nicola Ken Barozzi                 barozzi@nicolaken.com
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: rationalise storage of DTDs and other entities

Posted by David Crossley <cr...@indexgeo.com.au>.
Steven Noels wrote:
> Nicola and David wrote:
> > David Crossley wrote:
> > > I have now centralised the location of all the external entities,
> > > as we discussed. They have been amalgamated at
> > > src/resources/schema/
> >
> > Good. :-)
> 
> Well, our diffs are pretty similar - hehe. I have found my Australian
> alter ego :-)

Reporting from the antipodes ... i checked your diffs
and it looks like we have it all in CVS now. There is a
different directory name at schema/relax/ instead of
schema/relaxng/ ... do you want to change it to the latter?

I see that you did the same as me with the initial xdocs/*.xml
i.e. declare them to be document-v11.dtd even though the
<s1> element is now gone. We can catch up later - there
is an associated stylesheet job too. I did not feel like getting
involved with that at this stage. 

> > > Ken, some of these changes may be relevant for Centipede.
> >
> > My rule of thumb is: if someone touches ./tools, it's a
> > Centipede update right away.
> > All else can wait for consolidation...
> > ...looking at new code...
> > Oh, you are right: it will have to be put in Centipede also.
> > Thanks.
> 
> Yep.
>
> > > I have also removed the xdocs/dtd/*
> > > So now we are reliant on the entity resolver to find all
> > > external entities. It works for me on Linux. How goes it
> > > for others? ... do build docs.
> >
> > W2000, JDK1.4: yup it works! :-)
> 
> Same here - let's just double-check the possible bug in the resolver when
> there is a comment at the end of a catalog file.

Yes, it happens for me too. An unclosed comment
i.e. -- (without the closing --) causes the build to hang.
The "bug" happens anywhere with an unclosed comment
i.e. not only at the end of the file. I suppose that using an XML Catalog
(schema/catalog.xcat) would help with having a reliable catalog
file ... have not tried that yet.
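
Until that is sorted out, a catalog could be checked before the build
with something like the following Python sketch. It assumes the
plain-text OASIS catalog syntax, where a comment is opened and closed
by a pair of double hyphens and where quoted literals may contain
hyphens of their own. The function name is hypothetical and this is
not part of the resolver.

```python
def find_unclosed_comment(catalog_text):
    """Return the 1-based line number where an unclosed comment
    starts, or None if every comment is closed."""
    i, n = 0, len(catalog_text)
    while i < n:
        ch = catalog_text[i]
        if ch in "\"'":
            # Skip a quoted literal so hyphens inside it are ignored.
            end = catalog_text.find(ch, i + 1)
            if end == -1:
                return None  # unterminated literal: a different problem
            i = end + 1
        elif catalog_text.startswith("--", i):
            # A comment runs from this "--" to the next "--".
            end = catalog_text.find("--", i + 2)
            if end == -1:
                return catalog_text.count("\n", 0, i) + 1
            i = end + 2
        else:
            i += 1
    return None
```

A build target could call this on the catalog file and stop with a
clear message instead of hanging.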

> > > I have left the old directory src/resources/schemas/
> > > in place for a little while because i am wondering if
> > > anyone had any outstanding commits for the v1.1 DTDs.
> 
> > IMHO if you axe it now it will cause less confusion.
>
> What we also could do is import the old catalog for some time (there's
> no deprecation method in XML, however). See my uncommitted .zip.

Ah, you are talking about a different thing here Steven - still
very relevant. I was talking about leaving the recent DTDs that
Stefano is working on for Forrest, until he has checked in
any changes. We can merge any changes into the final set
at src/resources/schema/ then remove the redundant
src/resources/schemas/ directory.

You are talking about the old DTDs that are inherited from
Cocoon via Centipede. There are two possibilities:
A) copy any version 1.0 DTDs over when we need them
B) use a CATALOG directive inside the main Forrest OASIS
catalog to point to the old stuff. 

I was not sure what to do at this stage so i opted for A.
My assumption was that we do not need the old DTDs.
However, if we do need them then the Public Identifiers do
facilitate old xml instance documents declaring the old DTDs.

I suppose that the looming issue is going to be how to
keep copies of these schema in sync between various
projects: Forrest, Cocoon, Centipede, etc.

> > Now for the layout of the docs: is there someone currently
> > working on it?
> 
> Not here.

Nor here. I am working on getting this framework stuff sorted
out first. Yes, layout of Forrest's own xdocs is needed soon.

-- David

RE: rationalise storage of DTDs and other entities

Posted by Steven Noels <st...@outerthought.org>.
Nicola and David wrote:

>
> From: "David Crossley" <cr...@indexgeo.com.au>
>
> > I have now centralised the location of all the external entities,
> > as we discussed. They have been amalgamated at
> > src/resources/schema/
>
> Good. :-)

Well, our diffs are pretty similar - hehe. I have found my Australian
alter ego :-)

> > Ken, some of these changes may be relevant for Centipede.
>
> My rule of thumb is: if someone touches ./tools, it's a
> Centipede update right away.
> All else can wait for consolidation...
> ...looking at new code...
> Oh, you are right: it will have to be put in Centipede also.
> Thanks.

Yep.

> > I have also removed the xdocs/dtd/*
> > So now we are reliant on the entity resolver to find all
> > external entities. It works for me on Linux. How goes it
> > for others? ... do build docs.
>
> W2000, JDK1.4: yup it works! :-)

Same here - let's just double-check the possible bug in the resolver when
there is a comment at the end of a catalog file.

> > I have left the old directory src/resources/schemas/
> > in place for a little while because i am wondering if
> > anyone had any outstanding commits for the v1.1 DTDs.

> IMHO if you axe it now it will cause less confusion.

What we also could do is import the old catalog for some time (there's
no deprecation method in XML, however). See my uncommitted .zip.

> Now for the layout of the docs: is there someone currently
> working on it?

Not here.

</Steven>


Re: rationalise storage of DTDs and other entities

Posted by Nicola Ken Barozzi <ba...@nicolaken.com>.
From: "David Crossley" <cr...@indexgeo.com.au>

> I have now centralised the location of all the external entities,
> as we discussed. They have been amalgamated at
> src/resources/schema/

Good. :-)

> Ken, some of these changes may be relevant for Centipede.

My rule of thumb is: if someone touches ./tools, it's a Centipede update
right away.
All else can wait for consolidation...
...looking at new code...
Oh, you are right: it will have to be put in Centipede also.
Thanks.

> I have also removed the xdocs/dtd/*
> So now we are reliant on the entity resolver to find all
> external entities. It works for me on Linux. How goes it
> for others? ... do build docs.

W2000, JDK1.4: yup it works! :-)

> I have left the old directory src/resources/schemas/
> in place for a little while because i am wondering if
> anyone had any outstanding commits for the v1.1 DTDs.

IMHO if you axe it now it will cause less confusion.

Now for the layout of the docs: is there someone currently working on it?

--
Nicola Ken Barozzi                 barozzi@nicolaken.com
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)
---------------------------------------------------------------------


Re: rationalise storage of DTDs and other entities

Posted by Stefano Mazzocchi <st...@apache.org>.
Steven Noels wrote:
> 
> David Crossley wrote:
> 
> >
> > I have now centralised the location of all the external entities,
> > as we discussed. They have been amalgamated at
> > src/resources/schema/
> >
> > (Strange? ... have not seen the commit emails arrive yet.)
> 
> Cool ;-)
> 
> Maybe we should diff our diffs and find out how many neurons we share
> :-)
> 
> (see my previous mail)
> 
> > Ken, some of these changes may be relevant for Centipede.
> >
> > I have also removed the xdocs/dtd/*
> > So now we are reliant on the entity resolver to find all
> > external entities. It works for me on Linux. How goes it
> > for others? ... do build docs.
> 
> OK, I'll do an update and see how everything goes.
> 
> > I have left the old directory src/resources/schemas/
> > in place for a little while because i am wondering if
> > anyone had any outstanding commits for the v1.1 DTDs.
> 
> Nothing here.

I had to approve them. They should be arriving soon.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------

RE: rationalise storage of DTDs and other entities

Posted by Steven Noels <st...@outerthought.org>.
David Crossley wrote:

>
> I have now centralised the location of all the external entities,
> as we discussed. They have been amalgamated at
> src/resources/schema/
>
> (Strange? ... have not seen the commit emails arrive yet.)

Cool ;-)

Maybe we should diff our diffs and find out how many neurons we share
:-)

(see my previous mail)

> Ken, some of these changes may be relevant for Centipede.
>
> I have also removed the xdocs/dtd/*
> So now we are reliant on the entity resolver to find all
> external entities. It works for me on Linux. How goes it
> for others? ... do build docs.

OK, I'll do an update and see how everything goes.

> I have left the old directory src/resources/schemas/
> in place for a little while because i am wondering if
> anyone had any outstanding commits for the v1.1 DTDs.

Nothing here.

</Steven>


Re: rationalise storage of DTDs and other entities

Posted by David Crossley <cr...@indexgeo.com.au>.
I have now centralised the location of all the external entities,
as we discussed. They have been amalgamated at
src/resources/schema/

(Strange? ... have not seen the commit emails arrive yet.)

Ken, some of these changes may be relevant for Centipede.

I have also removed the xdocs/dtd/*
So now we are reliant on the entity resolver to find all
external entities. It works for me on Linux. How goes it
for others? ... do build docs.

I have left the old directory src/resources/schemas/
in place for a little while because i am wondering if
anyone had any outstanding commits for the v1.1 DTDs.

-- David

David Crossley wrote:
> One of the issues that Steven touched in another
> thread is where to store the DTDs and other entities
> so that they are readily available to all XML instance
> documents. We also want to only maintain one copy
> of these entities.
<skip/>

Re: rationalise storage of DTDs and other entities

Posted by Stefano Mazzocchi <st...@apache.org>.
David Crossley wrote:
> 
> One of the issues that Steven touched in another
> thread is where to store the DTDs and other entities
> so that they are readily available to all XML instance
> documents. We also want to only maintain one copy
> of these entities.
> 
> Sorry that it is long - i am trying to ensure that we look
> at all issues early. Also, i am providing background so
> that other people on forrest-dev, who are not necessarily
> familiar with Cocoon, can see where we are coming from.
> Parts of that background will have bearing on where we
> decide to store the DTDs in Forrest.
> 
> I am using the term "entities" to refer to all external bits that
> are required to build an XML instance document, i.e. its DTD,
> any character entity sets that are declared by either the DTD
> or the XML instance, and potential other external entities.

I'm currently +0 on anything you guys come up with in this realm.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------



Re: rationalise storage of DTDs and other entities

Posted by Nicola Ken Barozzi <ba...@nicolaken.com>.
From: "Steven Noels" <st...@outerthought.org>

> David Crossley wrote:
>
> > --------------------------------------
> > Background - Forrest
> > ----------------
> > The CVS for xml-forrest has recently been set up and
> > is based on Krysalis Centipede. This in turn was based
> > on Cocoon, so it brought with it a similar filesystem
> > structure for the entities. It also brought similar duplication
> > due to the still-standing issue.
> >
> > Meanwhile Forrest is starting to develop the next version
> > of the DTDs. It has them stored at a different location, together
> > with new OASIS Catalogs.
> >
> > By the way, i verified that the catalog entity resolver of
> > Cocoon is working properly inside Forrest by raising the
> > verbosity level and tweaking the document type declaration
> > in index.xml and entries in the OASIS Catalog. Would someone
> > on Windows please verify this too? Perhaps Ken has done
> > so already for Centipede.

Yes, it works.

> We should make use of the entity resolver a default, which means
> cleaning up the docs inside current CVS, and provide some template docs
> for each doctype.

+1

> > Existing filesystem structure for Forrest ...
> > src/resources/entities/*.dtd
> > src/documentation/xdocs/dtd/*.dtd
> > src/resources/schemas/DTD/*.dtd
> >
> > --------------------------------------
> > Proposed storage of external entities for Forrest
> > ---------------------------------
> > Here are the alternatives that i see. We may need some
> > discussion before we can decide.
> >
> > A) under src/resources/some-dir-structure/
>
> I would go for A)

Me too. In a similar fashion, I've put cocoon.xconf in another dir, and
would like to see sitemap.xmap get out of the way too.

> > B) under src/documentation/xdocs/some-dir-structure/
> >
> > "some-dir-structure" either has sub-directories ...
> > schemas/dtd/*.dtd
> > schemas/entities/*.pen
> >
> > or it is flat ...
> > entities/*.dtd
> > entities/*.pen
>
> not flat:
>
> src/resources/schema/dtd
> src/resources/schema/entities
> src/resources/schema/relax

+1

> > By the way, the word "schemata" is actually the plural
> > of "schema", if good grammar matters. That is why i chose
> > the directory name "entities" for Cocoon - to avoid that
> > issue :-)
>
> My colleague told me once that using plurals for directory names is
> making explicit what is already implied: directories are 'made' to
> contain multiple items of some kind, so the plural is superfluous. Oh
> well... ;-)

When you use English words in Italian, it's good practice not to use
plurals.
For example:
one computer: un computer
many computers: molti computer

So to me, schema makes sense.
+0

> > I currently lean towards A, because it should be entirely
> > independent of Forrest's own documentation. I also prefer
> > a flat structure because there are not really all that many
> > entities involved.

I also lean towards A, but for a different reason: Forrest proper source
code should go in ./src IMHO. The "conceptual" issue is that Forrest is a
product that uses itself, in this case in documentation. This is a very
special case for Centipede, so we need to make a target that synchronizes
the documentation system with source before docs generation.

> hm - flat is good for directories containing the same type of
> 'entities', which is not the case anymore - i would prefer some
> structure.

+1

> >
> > --------------------------------------
> > Other issues
> > ---------
> > 1) Need to decide where to store the ISO*.pen character entity
> > sets. Cocoon has them dumped in the same directory as the
> > DTDs. Forrest currently has them in a separate directory.
>
> Should stay like that.

+1

> I don't like these entities anyhow: documents should be using proper
> Unicode encoding, and eventually character references instead of these
> remains from the SGML-era. We should avoid entities like the plague:
> http://www.textuality.com/xml/xmlSW.html makes no reference to entities
> (or even DTD's anymore), and
> http://www.xml.com/pub/a/2002/02/20/deviant.html.
>
> > 2) Other projects, such as Centipede and Cocoon, will still want
> > to ship a collection of external entities and an OASIS Catalog.
> >
> > 3) How should default System Identifiers be expressed for
> > the XML instance documents of Forrest's own documentation?
>
> "filename.dtd" unless we are planning to make them public as you
> describe underneath.
>
> > 3a) Use filesystem-based URIs with a relative pathname to
> > the actual DTD, e.g. ../dtd/document-v10.dtd
> > 3b) Use a filename which does not yield the DTD and just
> > lets the entity resolver do the work, e.g. "document-v11.dtd"
> > 3c) Use URL-based default System Identifiers,
> > e.g. "http://xml.apache.org/dtd/document-v11.dtd"
> > I prefer 3b because it is the easiest.
>
> 3b: enforce usage of the entity resolver

+1

> > 4) Should there be an actual DTD file located at that URL?
> > We need to be careful with this, because we do not want
> > to encourage crappy xml tools that do not use an entity
> > resolver. Instead, such tools are wasting bandwidth by
> > retrieving the DTD from xml.apache.org every time that
> > the XML instance is parsed. They should resolve the
> > request to a local copy.

Makes sense, but, correct me if I'm wrong, isn't the URI something that
should in some way describe the real location on the Internet?
As an alternative, we could use a URI with a bogus protocol (resolved by
forrest only) to eliminate this possible inconsistency (forrest://).
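
A rough Python sketch of how such a bogus protocol might be mapped to
local files. Everything here is an assumption for illustration: the
forrest:// scheme is only an idea at this point, and the schema root
directory and function name are hypothetical.

```python
import posixpath

SCHEME_PREFIX = "forrest://"          # hypothetical scheme
SCHEMA_ROOT = "src/resources/schema"  # assumed local storage directory

def resolve_forrest_uri(system_id):
    """Map a forrest:// System Identifier to a local path, or return
    None so that other resolution mechanisms can take over."""
    if not system_id.startswith(SCHEME_PREFIX):
        return None
    relative = system_id[len(SCHEME_PREFIX):]
    return posixpath.join(SCHEMA_ROOT, relative)

print(resolve_forrest_uri("forrest://dtd/document-v11.dtd"))
print(resolve_forrest_uri("http://xml.apache.org/dtd/document-v11.dtd"))
```

A tool that does not know the scheme would fail outright rather than
fetch anything over the network. The sketch does no checking of ".."
segments, which a real implementation would need.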

> > 5) We need to encourage any project that uses Forrest
> > to provide proper document type declarations in their
> > XML instance documents. Lead by example is best.
>
> yes indeed

and enforce this along with validation during the build (I'm putting it in
Cent.)

> I still am quite unclear on what the difference will be between the CVS
> structure and the 'unit of deployment', but am lacking some time to
> properly investigate.

The way I see it, Forrest could be deployed as a directory in ./tools.
Currently, in Centipede, we have ./tools/ant, ./tools/centipede and
./tools/cocoon.
In the near future, Centipede will ship with ./tools/ant, ./tools/centipede,
./tools/cocoon and ./tools/forrest.

My idea is to make the ./tools dir a kind of "webapps" dir for build tools.
This is what I'm working on for next Centipede release.
Ant-dev has now the possibility of loading tasks dynamically. If I add to
that the possibility of adding targets and dependency, it's done. Instead of
.war or jar, we could have .cent . Just my two .cent ;-)

> The main idea to commit Centipede as-is was to give us some flying
> start: it should be easier now to edit and remove, instead of endless
> tinkering on the mailing list. So feel free everybody to attack what we
> have in CVS now ;-)

Yup

Just remember that everything in the ./tools dir will change with subsequent
releases of the tools. This means that the tool that has been touched needs
to evolve, while changes in the ./src dir are independent.
So I humbly suggest that we discuss, in any case, changes to the ./tools/**.

--
Nicola Ken Barozzi                 barozzi@nicolaken.com
            - verba volant, scripta manent -
   (discussions get forgotten, just code remains)
---------------------------------------------------------------------


RE: rationalise storage of DTDs and other entities

Posted by Steven Noels <st...@outerthought.org>.
David Crossley wrote:

> Sorry that it is long - i am trying to ensure that we look
> at all issues early. Also, i am providing background so
> that other people on forrest-dev, who are not necessarily
> familiar with Cocoon, can see where we are coming from.
> Parts of that background will have bearing on where we
> decide to store the DTDs in Forrest.

It sums it up very nicely, so the length was appropriate :-)

> I am using the term "entities" to refer to all external bits that
> are required to build an XML instance document, i.e. its DTD,
> any character entity sets that are declared by either the DTD
> or the XML instance, and potential other external entities.
>

<snip>cocoon-history</snip>

>
> Now that the entity resolver is working for Cocoon, the
> storage of DTDs could be at just one directory, probably
> webapp/resources/entities/  Anyway, this issue has not
> yet been raised on cocoon-dev.

We should test and raise if necessary - I would stick all Forrest
related stuff in the same webapp, unless we are planning to make the
DTD's available across HTTP (as Sun is doing for their server.xml and
the like).

> Existing filesystem structure for Cocoon ...
> webapp/resources/entities/*.dtd
> src/documentation/xdocs/dtd/*.dtd
>
> --------------------------------------
> Background - Forrest
> ----------------
> The CVS for xml-forrest has recently been set up and
> is based on Krysalis Centipede. This in turn was based
> on Cocoon, so it brought with it a similar filesystem
> structure for the entities. It also brought similar duplication
> due to the still-standing issue.
>
> Meanwhile Forrest is starting to develop the next version
> of the DTDs. It has them stored at a different location, together
> with new OASIS Catalogs.
>
> By the way, i verified that the catalog entity resolver of
> Cocoon is working properly inside Forrest by raising the
> verbosity level and tweaking the document type declaration
> in index.xml and entries in the OASIS Catalog. Would someone
> on Windows please verify this too? Perhaps Ken has done
> so already for Centipede.

We should make use of the entity resolver a default, which means
cleaning up the docs inside current CVS, and provide some template docs
for each doctype.

> Existing filesystem structure for Forrest ...
> src/resources/entities/*.dtd
> src/documentation/xdocs/dtd/*.dtd
> src/resources/schemas/DTD/*.dtd
>
> --------------------------------------
> Proposed storage of external entities for Forrest
> ---------------------------------
> Here are the alternatives that i see. We may need some
> discussion before we can decide.
>
> A) under src/resources/some-dir-structure/

I would go for A)

> B) under src/documentation/xdocs/some-dir-structure/
>
> "some-dir-structure" either has sub-directories ...
> schemas/dtd/*.dtd
> schemas/entities/*.pen
>
> or it is flat ...
> entities/*.dtd
> entities/*.pen

not flat:

src/resources/schema/dtd
src/resources/schema/entities
src/resources/schema/relax

> By the way, the word "schemata" is actually the plural
> of "schema", if good grammar matters. That is why i chose
> the directory name "entities" for Cocoon - to avoid that
> issue :-)

My colleague told me once that using plurals for directory names is
making explicit what is already implied: directories are 'made' to
contain multiple items of some kind, so the plural is superfluous. Oh
well... ;-)

> I currently lean towards A, because it should be entirely
> independent of Forrest's own documentation. I also prefer
> a flat structure because there are not really all that many
> entities involved.

hm - flat is good for directories containing the same type of
'entities', which is not the case anymore - i would prefer some
structure.

>
> --------------------------------------
> Other issues
> ---------
> 1) Need to decide where to store the ISO*.pen character entity
> sets. Cocoon has them dumped in the same directory as the
> DTDs. Forrest currently has them in a separate directory.

Should stay like that.

I don't like these entities anyhow: documents should be using proper
Unicode encoding, and eventually character references instead of these
remains from the SGML-era. We should avoid entities like the plague:
http://www.textuality.com/xml/xmlSW.html makes no reference to entities
(or even DTD's anymore), and
http://www.xml.com/pub/a/2002/02/20/deviant.html.

> 2) Other projects, such as Centipede and Cocoon, will still want
> to ship a collection of external entities and an OASIS Catalog.
>
> 3) How should default System Identifiers be expressed for
> the XML instance documents of Forrest's own documentation?

"filename.dtd" unless we are planning to make them public as you
describe underneath.

> 3a) Use filesystem-based URIs with a relative pathname to
> the actual DTD, e.g. ../dtd/document-v10.dtd
> 3b) Use a filename which does not yield the DTD and just
> lets the entity resolver do the work, e.g. "document-v11.dtd"
> 3c) Use URL-based default System Identifiers,
> e.g. "http://xml.apache.org/dtd/document-v11.dtd"
> I prefer 3b because it is the easiest.

3b: enforce usage of the entity resolver

> 4) Should there be an actual DTD file located at that URL?
> We need to be careful with this, because we do not want
> to encourage crappy xml tools that do not use an entity
> resolver. Instead, such tools are wasting bandwidth by
> retrieving the DTD from xml.apache.org every time that
> the XML instance is parsed. They should resolve the
> request to a local copy.
>
> 5) We need to encourage any project that uses Forrest
> to provide proper document type declarations in their
> XML instance documents. Lead by example is best.

yes indeed

I still am quite unclear on what the difference will be between the CVS
structure and the 'unit of deployment', but am lacking some time to
properly investigate.

The main idea to commit Centipede as-is was to give us some flying
start: it should be easier now to edit and remove, instead of endless
tinkering on the mailing list. So feel free everybody to attack what we
have in CVS now ;-)

</Steven>