Posted to dev@cocoon.apache.org by David Crossley <cr...@indexgeo.com.au> on 2002/10/21 12:31:15 UTC

Re: XML validation during Cocoon build

Colin Paul Adams wrote:
> >>>>> "David" == David Crossley <cr...@indexgeo.com.au> writes:
> 
> David> We could. However, proper support for Entity Catalogs is
> David> not yet in Ant. So we need to use a rudimentary catalog
> David> facility which automatically builds an internal
> David> catalog. This works, but is cumbersome. 
> 
> David> I still think that the Anteater discussion links that i
> David> provided earlier in this thread is the most promising
> David> option. This was building validation facilities for
> David> Anteater which could also be used in Cocoon.
> 
> OK. Has this progressed at all?

I do not know, i just signed on to the aft-devel list
to help out. Perhaps the others can say ... Ivelin, Jeff ...

> What I shall do as an interim measure, is create an optional patch to
> use the ant XMLValidate task (this will take me less than an hour), so
> I can keep track of any necessary sitemap.dtd changes.

Great. Here are the ANT targets from Forrest if it helps:
http://cvs.apache.org/viewcvs/xml-forrest/src/scratchpad/targets/
There is also a task in there to automatically
build an XML Catalog.

> I don't
> necessarily suggest integrating it into CVS, as it will involve adding
> DOCTYPEs to all the sitemap.xmap files, and this might add extra
> overhead during parsing.

I wondered about that too. How often does a sitemap
get parsed? Perhaps the overhead is immaterial.

> But the patch would be available for others
> to use if they wish.

We could put it into CVS, but comment-out its caller.
That makes it easier to try out, and hence to get feedback.

> I shall also continue to work on the DTD. It's
> still far too ad-hoc to serve satisfactorily as a source for
> generating a W3C/Schematron/RelaxNG grammar. I shall try to make it
> "correct" if I can.

Yes, now that we have a good draft DTD. We need to
go through it piece-by-piece and verify the required
and optional attributes of each.

That is a good exercise to go through at this stage
of the 2.1 development. Fine-tuning is what is required.
The pace of development has been phenomenal, and it
now needs some spit-and-polish.

As the discussion of each piece progresses,
some of us can help out by reflecting that in
the docs.

> (Can anyone point me to which java files parse
> sitemap.xmap files?) 

Here is a start ...
[localhost]$ find src/java -type f -exec grep -l "sitemap.xmap" {} \;
src/java/org/apache/cocoon/components/pipeline/impl/CachingPointProcessingPipeline.java
src/java/org/apache/cocoon/components/treeprocessor/sitemap/MountNode.java
src/java/org/apache/cocoon/components/treeprocessor/treeprocessor-builtins.xml
src/java/org/apache/cocoon/transformation/SourceWritingTransformer.java




---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Re: XML validation during Cocoon build

Posted by Colin Paul Adams <co...@colina.demon.co.uk>.
>>>>> "Jeff" == Jeff Turner <je...@apache.org> writes:

    Jeff> tool, http://doctypechanger.sf.net for programmatically
    Jeff> stripping off a DOCTYPE declaration, which is the only way
    Jeff> to prevent DTD parsing.

Not quite. If we are using Xerces, then there is this feature:
http://xml.org/sax/features/external-general-entities
But if any JAXP parser is allowed, then you are correct.
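
For illustration, a minimal Java sketch of switching parser features
off so that a DOCTYPE does not trigger a fetch of the external DTD.
It sets the SAX feature named above and, as an assumption not taken
from this thread, the Xerces-specific "load-external-dtd" feature,
which only Xerces-derived parsers (such as the one bundled with the
JDK) recognise. The class name, the sample document and the
unreachable.invalid URL are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SkipExternalDtd {
    // Standard SAX feature mentioned in the thread.
    static final String EXTERNAL_GENERAL_ENTITIES =
        "http://xml.org/sax/features/external-general-entities";
    // Assumption: Xerces-specific feature; other parsers may reject it.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical sitemap whose system identifier cannot be fetched.
    static final String SAMPLE =
        "<!DOCTYPE sitemap PUBLIC \"-//APACHE//DTD Cocoon Sitemap V1.0//EN\" "
        + "\"http://unreachable.invalid/sitemap.dtd\">"
        + "<sitemap/>";

    /** Parses without loading the external DTD; returns the root element name. */
    static String parseRootName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(EXTERNAL_GENERAL_ENTITIES, false);
        factory.setFeature(LOAD_EXTERNAL_DTD, false);

        final String[] root = new String[1];
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName; // remember only the first element
                    }
                }
            });
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseRootName(SAMPLE));
    }
}
```

A parser that does not recognise the second feature would throw from
setFeature, which fits the point that this only works when the parser
implementation is known.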

    Jeff> In the long term it might be better to abandon DTDs and this
    Jeff> silly idea of parse-time validation altogether.  But since
    Jeff> Colin went to all the trouble of writing a DTD, it would be
    Jeff> good to use it :)

Actually, I haven't created it from scratch. I've just brought the
existing one up to date, as it was one of the todo items.
-- 
Colin Paul Adams
Preston Lancashire



Re: XML validation during Cocoon build

Posted by Jeff Turner <je...@apache.org>.
On Mon, Oct 21, 2002 at 08:31:15PM +1000, David Crossley wrote:
> Colin Paul Adams wrote:
> > >>>>> "David" == David Crossley <cr...@indexgeo.com.au> writes:
> > 
> > David> We could. However, proper support for Entity Catalogs is
> > David> not yet in Ant. So we need to use a rudimentary catalog
> > David> facility which automatically builds an internal
> > David> catalog. This works, but is cumbersome. 
> > 
> > David> I still think that the Anteater discussion links that i
> > David> provided earlier in this thread is the most promising
> > David> option. This was building validation facilities for
> > David> Anteater which could also be used in Cocoon.
> > 
> > OK. Has this progressed at all?
> 
> I do not know, i just signed on to the aft-devel list
> to help out. Perhaps the others can say ... Ivelin, Jeff ...

No progress on the specific topic of that email.  I think JARV [1] might be
a better set of interfaces to standardise on.

In the context of this discussion, Anteater is probably the wrong tool.
I think the best solution would be to add a DOCTYPE declaration to the
sitemap and let the parser validate.  This has the added benefit that
users with catalog-aware editors [2] can validate as they edit.
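
As a rough sketch of what "let the parser validate" could look like
in Java: a JAXP parser with validation switched on reports validity
errors to an ErrorHandler. The DTD below is a toy internal subset
invented for the example, not the real sitemap DTD, and the class
and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DtdValidationSketch {
    // Toy DTD (assumption): a sitemap holds one or more named pipelines.
    static final String DOCTYPE =
        "<!DOCTYPE sitemap [\n"
        + "  <!ELEMENT sitemap (pipeline+)>\n"
        + "  <!ELEMENT pipeline EMPTY>\n"
        + "  <!ATTLIST pipeline name CDATA #REQUIRED>\n"
        + "]>\n";

    /** Parses with DTD validation on; returns the validity errors reported. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the DOCTYPE's DTD
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                // warnings are ignored in this sketch
            }
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // validity errors land here
            }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: give up
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String valid = DOCTYPE + "<sitemap><pipeline name=\"main\"/></sitemap>";
        String invalid = DOCTYPE + "<sitemap><pipeline/></sitemap>";
        System.out.println("valid:   " + validate(valid));
        System.out.println("invalid: " + validate(invalid));
    }
}
```

With a real sitemap the DOCTYPE would point at an external DTD
instead, so the parser would also need some way to locate that file.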

> > I don't necessarily suggest integrating it into CVS, as it will
> > involve adding DOCTYPEs to all the sitemap.xmap files, and this might
> > add extra overhead during parsing.
> 
> I wondered about that too. How often does a sitemap get parsed? Perhaps
> the overhead is immaterial.

If performance becomes a problem, we can add a switch to cocoon.xconf
which turns off sitemap validation.  I wrote a tool,
http://doctypechanger.sf.net for programmatically stripping off a DOCTYPE
declaration, which is the only way to prevent DTD parsing.  If we want to
go this route, I can suggest integrating this 'switch-off-DTDs' flag into
the o.a.e.xml.Parser implementation in Excalibur.  It could then be
exposed in cocoon.xconf as something like:

<xml-parser ...
  <parameter name="validate" value="false"/>
  <strip-doctypes>

    <!-- Don't validate sitemaps -->
    <publicId>-//APACHE//DTD Cocoon Sitemap V1.0//EN</publicId>

    <!-- Don't validate treeprocessor-builtins.xml -->
    <rootElement>tree-processor</rootElement>

    <!-- DO validate Forrest docs.. 
    <publicId>-//APACHE//DTD XML Documentation V1.1//EN</publicId>
    -->
    ....
  </strip-doctypes>
</xml-parser>


In the long term it might be better to abandon DTDs and this silly idea
of parse-time validation altogether.  But since Colin went to all the
trouble of writing a DTD, it would be good to use it :)


--Jeff

[1] http://iso-relax.sourceforge.net/JARV/ 
[2] http://xml.apache.org/forrest/your-project.html#N102AD



Re: XML validation during Cocoon build

Posted by Colin Paul Adams <co...@colina.demon.co.uk>.
>>>>> "David" == David Crossley <cr...@indexgeo.com.au> writes:

    >> I don't necessarily suggest integrating it into CVS, as it will
    >> involve adding DOCTYPEs to all the sitemap.xmap files, and this
    >> might add extra overhead during parsing.

    David> I wondered about that too. How often does a sitemap get
    David> parsed? Perhaps the overhead is immaterial.

Probably. But more significant is that the build no longer works once
you add DOCTYPEs to the sitemaps. That's because XConfToolTask (at
least) doesn't use the resolver. So it can't read the external subset
and falls over.
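
To illustrate the kind of fix this implies, here is a minimal Java
sketch of handing the parser an EntityResolver that maps the sitemap
public identifier to a local copy of the DTD, so that no network
fetch is attempted. This is not how XConfToolTask or the Cocoon
resolver is written; the class name, the stand-in DTD text and the
unreachable.invalid URL are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolverSketch {
    static final String SITEMAP_PUBLIC_ID =
        "-//APACHE//DTD Cocoon Sitemap V1.0//EN";
    // Stand-in DTD text (assumption); a build would read its local sitemap.dtd.
    static final String LOCAL_DTD = "<!ELEMENT sitemap EMPTY>";

    // Hypothetical document whose system identifier cannot be fetched.
    static final String SAMPLE =
        "<!DOCTYPE sitemap PUBLIC \"" + SITEMAP_PUBLIC_ID + "\" "
        + "\"http://unreachable.invalid/sitemap.dtd\">"
        + "<sitemap/>";

    /** Resolves the sitemap public identifier to the local DTD text. */
    static EntityResolver localResolver() {
        return (publicId, systemId) -> {
            if (SITEMAP_PUBLIC_ID.equals(publicId)) {
                InputSource local = new InputSource(new StringReader(LOCAL_DTD));
                local.setSystemId(systemId); // keep the original identifier
                return local;
            }
            return null; // fall back to the parser's default lookup
        };
    }

    /** Parses using the local resolver; returns the root element name. */
    static String parseRootName(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(localResolver());
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTagName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseRootName(SAMPLE));
    }
}
```

Without the setEntityResolver call the parser would try to open the
system identifier itself, which is the failure described above.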

    David> Yes, now that we have a good draft DTD. We need to go
    David> through it piece-by-piece and verify the required and
    David> optional attributes of each.

I'm currently parameterising it, so the bits in common stand out
better.
-- 
Colin Paul Adams
Preston Lancashire
