Posted to j-users@xerces.apache.org by Geoff Waggott <ge...@zergsoft.com> on 2004/12/03 03:45:50 UTC

HTMLSerializer entity reference problem

Hi,

I'm trying to use org.apache.xml.serialize.HTMLSerializer to output
a particular flavour of HTML with the following doctype:
-//W3C//DTD Compact HTML 1.0 Draft //EN

The document I'm trying to render has no doctype of its own, so I'm
specifying that DTD in the OutputFormat I use to construct the
serializer instance.

The serialized result has the correct doctype declaration. However, the
document content contains symbolic entity references that are not
defined in that DTD, for example &rsquo;, which is defined in HTML 4.01.

Does anybody have any idea what I might be doing wrong?
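
One possible explanation, offered as an assumption rather than something the post confirms: Xerces' HTMLSerializer seems to take entity names from a built-in HTML 4.01 entity table and not to consult the DOCTYPE set on the OutputFormat. A common way around that is to write non-ASCII characters as numeric character references such as &#8217;, which don't depend on which entities a DTD declares. The sketch below shows that escaping in isolation; CharRefEscaper is a hypothetical helper, not part of Xerces.

```java
// Hypothetical helper (not a Xerces API): escapes text for HTML output using
// numeric character references instead of named entities, so the result does
// not depend on which entities a particular HTML DTD declares.
final class CharRefEscaper {
    private CharRefEscaper() {}

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // Walk by code point so characters outside the BMP become one reference.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&#34;"); break;   // avoids relying on &quot;
                default:
                    if (cp < 0x80) {
                        out.append((char) cp);            // plain ASCII passes through
                    } else {
                        out.append("&#").append(cp).append(';');  // decimal reference
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+2019 RIGHT SINGLE QUOTATION MARK is the character &rsquo; names.
        System.out.println(escape("Geoff\u2019s <page>"));  // prints Geoff&#8217;s &lt;page&gt;
    }
}
```

Wiring this into the serializer itself isn't shown, since that would depend on Xerces internals.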

Cheers,
Geoff

-- 
===========================================================
Geoff Waggott <ge...@zergsoft.com>
ZergSoft
Tel:   (052) 930-7790
FAX:   (052) 930-7791
email: geoff@zergsoft.com
www.zergsoft.com
===========================================================


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: Problem with grammar pool

Posted by Bob Foster <bo...@objfac.com>.
Jeff Greif wrote:
> From: "Bob Foster" <bo...@objfac.com>
>>My XMLGrammarPool implementation keeps a cache of grammars, invalidating
>>the cache when a grammar is modified. This works quite well when a
>>document directly references a schemaLocation of, say, A.xsd. If A.xsd
>>is modified and the document revalidated, the pool correctly does not
>>return the cached grammar, allowing Xerces to recalculate it.
>>
>>The problem arises when A.xsd includes or imports B.xsd and B.xsd is
>>changed. Even though the cached grammar for B.xsd is refreshed, when the
>>document is revalidated the consolidated schema representing A.xsd,
>>which incorporates information from the previous version of B.xsd, still
>>appears to be up-to-date and is incorrectly used for validation.
> 
> It appears that you know for sure that the grammar in the pool for A
> includes rather than references constructs from B.  I'm curious to know why this
> would be done (as an optimization or to satisfy some functional
> requirement).

The problem is I _don't_ know that A references B, and don't have a good 
way of finding out short of parsing A myself. Since Xerces does know, it 
would seem pretty simple for it to provide some sort of 
beginSchema/endSchema callbacks so others could capture the dependency 
relationships in a grammar pool.
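
A sketch of how such callbacks could be used if Xerces offered them. SchemaLoadListener, beginSchema and endSchema are hypothetical names taken from the suggestion above, not an existing Xerces interface. The listener keeps a stack of schemas currently being loaded; any schema that begins while another is still open is recorded as a dependency of the one on top of the stack.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical callback interface mirroring the beginSchema/endSchema idea;
// Xerces does not provide it.
interface SchemaLoadListener {
    void beginSchema(String location);
    void endSchema(String location);
}

// Records "parent depends on child" whenever a schema starts loading while
// another is still open, i.e. while the parent's include/import of the child
// is being processed.
class DependencyRecorder implements SchemaLoadListener {
    private final Deque<String> open = new ArrayDeque<>();
    private final Map<String, Set<String>> dependsOn = new HashMap<>();

    @Override
    public void beginSchema(String location) {
        String parent = open.peek();          // null for a top-level schema
        if (parent != null) {
            dependsOn.computeIfAbsent(parent, k -> new LinkedHashSet<>()).add(location);
        }
        open.push(location);
    }

    @Override
    public void endSchema(String location) {
        String top = open.pop();
        if (!top.equals(location)) {
            throw new IllegalStateException("unbalanced endSchema for " + location);
        }
    }

    // Direct dependencies recorded for a schema (empty if none).
    Set<String> dependenciesOf(String location) {
        return dependsOn.getOrDefault(location, Collections.emptySet());
    }

    public static void main(String[] args) {
        DependencyRecorder rec = new DependencyRecorder();
        rec.beginSchema("A.xsd");   // schema the document references
        rec.beginSchema("B.xsd");   // A.xsd includes or imports B.xsd
        rec.endSchema("B.xsd");
        rec.endSchema("A.xsd");
        System.out.println(rec.dependenciesOf("A.xsd"));   // prints [B.xsd]
    }
}
```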

> I've noticed something similar.  If you use grammar preparsing to construct
> the grammar for A.xsd, and the import B is not found by the entity resolver
> or any default mechanism, the parsing of A does not stop (whether the
> resolver returns null or throws an IOException). Instead, the grammar for A
> is constructed with all references to types in the namespace for B replaced
> by xsd:anyType.  This seems like incorrect behavior to me.  Is there a
> justification?  I can provide a test case if necessary.

Different question. I believe the schema spec says that failure to 
locate a schema is not an error, or at least leaves it up to the 
implementation. It sounds like something that should be settable by the 
parser user.

Bob

> Jeff




Re: Problem with grammar pool

Posted by Jeff Greif <jg...@alumni.princeton.edu>.
----- Original Message ----- 
From: "Bob Foster" <bo...@objfac.com>
To: <xe...@xml.apache.org>
Sent: Saturday, December 04, 2004 11:10 AM
Subject: Problem with grammar pool


> My XMLGrammarPool implementation keeps a cache of grammars, invalidating
> the cache when a grammar is modified. This works quite well when a
> document directly references a schemaLocation of, say, A.xsd. If A.xsd
> is modified and the document revalidated, the pool correctly does not
> return the cached grammar, allowing Xerces to recalculate it.
>
> The problem arises when A.xsd includes or imports B.xsd and B.xsd is
> changed. Even though the cached grammar for B.xsd is refreshed, when the
> document is revalidated the consolidated schema representing A.xsd,
> which incorporates information from the previous version of B.xsd, still
> appears to be up-to-date and is incorrectly used for validation.

It appears that you know for sure that the grammar in the pool for A
includes rather than references constructs from B.  I'm curious to know why this
would be done (as an optimization or to satisfy some functional
requirement).

I've noticed something similar.  If you use grammar preparsing to construct
the grammar for A.xsd, and the import B is not found by the entity resolver
or any default mechanism, the parsing of A does not stop (whether the
resolver returns null or throws an IOException). Instead, the grammar for A
is constructed with all references to types in the namespace for B replaced
by xsd:anyType.  This seems like incorrect behavior to me.  Is there a
justification?  I can provide a test case if necessary.

Jeff



Problem with grammar pool

Posted by Bob Foster <bo...@objfac.com>.
My XMLGrammarPool implementation keeps a cache of grammars, invalidating 
the cache when a grammar is modified. This works quite well when a 
document directly references a schemaLocation of, say, A.xsd. If A.xsd 
is modified and the document revalidated, the pool correctly does not 
return the cached grammar, allowing Xerces to recalculate it.

The problem arises when A.xsd includes or imports B.xsd and B.xsd is 
changed. Even though the cached grammar for B.xsd is refreshed, when the 
document is revalidated the consolidated schema representing A.xsd, 
which incorporates information from the previous version of B.xsd, still 
appears to be up-to-date and is incorrectly used for validation.
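
A minimal model of why the dependent case slips through, assuming the pool decides freshness by comparing a modification stamp of the top-level schema only (the post doesn't say exactly how the check is made). NaiveGrammarCache, touch, and the string "grammars" are hypothetical stand-ins, not Xerces types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the cache described above: an entry counts as fresh
// while the stamp of the schema the document references directly is
// unchanged. Strings stand in for compiled grammars.
class NaiveGrammarCache {
    private final Map<String, Long> stamps = new HashMap<>();         // simulated file mtimes
    private final Map<String, String> grammars = new HashMap<>();
    private final Map<String, Long> stampAtCompile = new HashMap<>();

    // Simulates editing a schema file on disk.
    void touch(String location, long stamp) {
        stamps.put(location, stamp);
    }

    private long stampOf(String location) {
        return stamps.getOrDefault(location, 0L);
    }

    void put(String location, String grammar) {
        grammars.put(location, grammar);
        stampAtCompile.put(location, stampOf(location));
    }

    // Returns the cached grammar, or null if it is missing or the top-level
    // schema changed since it was compiled. Included/imported schemas are
    // never consulted, which is the gap described in the message.
    String get(String location) {
        Long compiled = stampAtCompile.get(location);
        if (compiled == null || compiled != stampOf(location)) {
            return null;
        }
        return grammars.get(location);
    }

    public static void main(String[] args) {
        NaiveGrammarCache cache = new NaiveGrammarCache();
        cache.touch("A.xsd", 1);
        cache.touch("B.xsd", 1);
        cache.put("A.xsd", "A compiled against B v1");
        cache.touch("B.xsd", 2);                   // B.xsd edited
        System.out.println(cache.get("A.xsd"));    // prints A compiled against B v1 (stale)
    }
}
```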

The solution would seem to be that when a schema for a given namespace 
is replaced in the cache, cache entries for all schemas depending on 
that schema should also be invalidated. Unfortunately, Xerces doesn't 
seem to provide any way to get such dependency information. AFAIK, the 
grammar pool can't tell the difference between two schemas that are used 
sequentially (independent) and one schema included by another (dependent).
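
The invalidation rule in the paragraph above can be sketched as a cache that stores reverse dependency edges and evicts transitively. This is a hypothetical class, not the Xerces XMLGrammarPool interface; keys are plain strings that could be schema locations or target namespaces (an assumption), and the dependency edges would still have to come from somewhere, such as callbacks of the kind Bob suggests elsewhere in the thread.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical cache that evicts every schema which includes or imports,
// directly or indirectly, a schema that has been replaced.
class DependencyAwareCache<V> {
    private final Map<String, V> entries = new HashMap<>();
    // child -> schemas that include or import it
    private final Map<String, Set<String>> dependents = new HashMap<>();

    void addDependency(String parent, String child) {
        dependents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    void put(String key, V value) { entries.put(key, value); }

    V get(String key) { return entries.get(key); }

    // Evicts 'key' and all of its transitive dependents; returns every key visited.
    Set<String> invalidate(String key) {
        Set<String> visited = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(key);
        while (!work.isEmpty()) {
            String current = work.pop();
            if (!visited.add(current)) {
                continue;            // already evicted; also stops include cycles
            }
            entries.remove(current);
            for (String parent : dependents.getOrDefault(current, Collections.emptySet())) {
                work.push(parent);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        DependencyAwareCache<String> cache = new DependencyAwareCache<>();
        cache.addDependency("A.xsd", "B.xsd");     // A includes or imports B
        cache.put("A.xsd", "grammar A");
        cache.put("B.xsd", "grammar B");
        cache.put("D.xsd", "grammar D");           // independent schema
        System.out.println(new TreeSet<>(cache.invalidate("B.xsd")));  // prints [A.xsd, B.xsd]
        System.out.println(cache.get("D.xsd"));   // prints grammar D
    }
}
```

Independent schemas (D.xsd here) survive the eviction, which is exactly the distinction the pool can't make without dependency information.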

Any suggestions welcome.

Bob Foster

