Posted to j-users@xerces.apache.org by Andrew Stevens <at...@hotmail.com> on 2005/02/28 14:47:52 UTC

EntityResolverWrapper

In 
org.apache.xerces.util.EntityResolverWrapper.resolveEntity(XMLResourceIdentifier 
resourceIdentifier), it has the line
    String sysId = resourceIdentifier.getExpandedSystemId();
Is there some particular reason this uses the expanded system ID rather than 
using getLiteralSystemId()?
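To make the distinction concrete: the literal system ID is the string exactly as written in the DOCTYPE, while the expanded system ID is that string resolved against the base URI of the document containing it. The sketch below approximates the expansion with java.net.URI.resolve. Xerces uses its own URI handling internally, so its exact output format may differ. The base URI file:/data/docs/record.xml is a made-up example.

```java
import java.net.URI;

/** Illustrates literal vs. expanded system IDs (an approximation, not Xerces code). */
class SystemIdExpansion {

    /** Resolves a literal system ID against the URI of the referring document. */
    static String expand(String literalSystemId, String baseUri) {
        // Absolute IDs come back unchanged; relative ones are resolved against the base.
        return URI.create(baseUri).resolve(literalSystemId).toString();
    }

    public static void main(String[] args) {
        String literal = "dcr4.5.dtd";                 // as in <!DOCTYPE record SYSTEM "dcr4.5.dtd">
        String base = "file:/data/docs/record.xml";    // hypothetical document location
        System.out.println("literal:  " + literal);
        System.out.println("expanded: " + expand(literal, base)); // file:/data/docs/dcr4.5.dtd
    }
}
```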

I've got a problem with some XML files I'm processing with Cocoon.  The 
files all contain a DOCTYPE that uses a relative path for the system ID, i.e. 
<!DOCTYPE record SYSTEM "dcr4.5.dtd">.  The documents are created by 
another application, and I can't affect what it puts in there.  Trying to 
read the files generates a parser error since the DTD isn't present in the 
directory containing the documents; no problem, I thought, just use a 
suitable entry in the catalog used by Cocoon's EntityResolver.  So, 
following the other entries, I added
    SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd"
and copied the DTD into WEB-INF\entities\interwoven; however, it still 
doesn't find the DTD.  Turning up the logging (and this is where it becomes 
more relevant to Xerces than Cocoon, and why I'm asking here rather than on 
cocoon-user), I discovered that the system ID being passed to the catalog 
resolver already contained the full path to the file, so it doesn't match the 
above entry in the catalog.  Since the path to the documents could be more 
or less anything, I can't use a (prefix-based) rewrite entry in the catalog; 
likewise, it's impractical to include a system entry for every possible path, 
since I don't know in advance what they're going to be.  Digging through the 
Cocoon & Xerces source code, I discovered that the path received by the 
catalog resolver comes from EntityResolverWrapper, i.e. the 
resourceIdentifier.getExpandedSystemId() call I mentioned above.  Presumably, 
if that had used getLiteralSystemId() instead, the catalog resolver would have 
received just "dcr4.5.dtd" as the system ID rather than the full path, and 
would have matched it okay.  But I'm wary of changing it myself, since I 
don't know what else might be affected (and I'd rather avoid using a 
custom-built Xerces in our Cocoon app, to minimise the risk of introducing 
other side-effects).
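The mismatch can be reduced to a simple model: a catalog SYSTEM entry is matched by comparing system identifiers as strings, so an entry keyed on the literal "dcr4.5.dtd" never matches an expanded file: URI. The sketch below uses a HashMap as a deliberately simplified stand-in for a catalog; it is not the XML Commons resolver, and the document path is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of exact-match SYSTEM catalog entries (not a real catalog resolver). */
class ExactMatchCatalog {
    private final Map<String, String> systemEntries = new HashMap<>();

    ExactMatchCatalog() {
        // Equivalent of the catalog line: SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd"
        systemEntries.put("dcr4.5.dtd", "interwoven/dcr4.5.dtd");
    }

    /** Returns the mapped location, or null when no entry matches the ID exactly. */
    String lookup(String systemId) {
        return systemEntries.get(systemId);
    }

    public static void main(String[] args) {
        ExactMatchCatalog catalog = new ExactMatchCatalog();
        System.out.println(catalog.lookup("dcr4.5.dtd"));                 // interwoven/dcr4.5.dtd
        System.out.println(catalog.lookup("file:/data/docs/dcr4.5.dtd")); // null
    }
}
```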

I notice that in the current CVS HEAD there's an EntityResolver2Wrapper class, 
and this one does use getLiteralSystemId(); in fact, the latest CVS log message 
on that class says
"Fixing a bug. The systemId passed to EntityResolver2.resolveEntity may be 
an absolute or relative URI. That is it should be the literal system 
identifier, not the expanded one which resolved from the base URI."
However, I also found an old (> 2 years) mailing list message 
(http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=xerces-j-user@xml.apache.org&msgId=568021) 
which says that
"The reason Xerces now returns fully-expanded URI's to the Entity resolver 
is that SAX quite explicitly states that this is what XML processors are 
supposed to do."
So now I'm twice as confused.  Do the SAX2 Extensions 1.1 say that 
EntityResolver2 should behave differently from EntityResolver?  Or have 
things changed since EntityResolverWrapper switched to using 
getExpandedSystemId(), and should it now be using getLiteralSystemId() after 
all?

In the meantime I can work around my problem by plugging in a custom 
EntityResolver which replaces any system IDs ending with "dcr4.5.dtd" with 
just that string, before passing it on to the XML commons catalog resolver 
as before.  But it'd be nice if it could be clarified how exactly Xerces' 
wrapper classes are supposed to work, so I know if I should be raising a bug 
:-)
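A minimal sketch of that workaround, assuming the delegate is whichever org.xml.sax.EntityResolver is already configured (such as the XML Commons catalog resolver). The class name SuffixNormalizingResolver is hypothetical. It matches on "/dcr4.5.dtd" rather than a bare suffix so that a file such as "xdcr4.5.dtd" is not rewritten by accident; every other system ID is passed through unchanged.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical wrapper: reduces one known DTD's expanded ID to its bare name, then delegates. */
class SuffixNormalizingResolver implements EntityResolver {
    private final String fileName;          // e.g. "dcr4.5.dtd"
    private final EntityResolver delegate;  // e.g. the catalog-backed resolver

    SuffixNormalizingResolver(String fileName, EntityResolver delegate) {
        this.fileName = fileName;
        this.delegate = delegate;
    }

    /** Maps ".../dcr4.5.dtd" back to "dcr4.5.dtd"; leaves any other ID (or null) alone. */
    String normalize(String systemId) {
        if (systemId != null
                && (systemId.equals(fileName) || systemId.endsWith("/" + fileName))) {
            return fileName;
        }
        return systemId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // The delegate sees the bare name, so a SYSTEM "dcr4.5.dtd" catalog entry can match.
        return delegate.resolveEntity(publicId, normalize(systemId));
    }
}
```

Because the wrapper only changes the string handed to the delegate, the existing catalog line SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd" does not need to change.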


Andrew.
--



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org


Re: EntityResolverWrapper

Posted by Andrew Stevens <at...@hotmail.com>.
Thanks, I think I "get it" now.  It looks like EntityResolver2 is the new & 
improved version to fix some of the limitations of EntityResolver (like the 
one I'm running up against).  But given that the Version in 
EntityResolver2's javadocs is "TBD" and the Xerces wrapper isn't in any 
released version yet (at least, I assume the Xerces-J_01052005 CVS tag is a 
between-versions build?), I won't hold my breath waiting for Excalibur and 
then Cocoon to start using the new version.
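For comparison, a resolver written against org.xml.sax.ext.EntityResolver2 receives the literal system ID and the base URI as separate arguments, so a lookup keyed on "dcr4.5.dtd" works without any path stripping. The sketch below is hypothetical: the class name LiteralIdResolver, its map method and the in-memory table are illustrative and are not part of SAX, Xerces, Excalibur or Cocoon.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.InputSource;
import org.xml.sax.ext.EntityResolver2;

/** Hypothetical EntityResolver2 that maps literal system IDs to local copies of DTDs. */
class LiteralIdResolver implements EntityResolver2 {
    private final Map<String, String> byLiteralId = new HashMap<>();

    /** Registers a local location for a system ID exactly as written in documents. */
    void map(String literalSystemId, String location) {
        byLiteralId.put(literalSystemId, location);
    }

    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        // systemId is the literal DOCTYPE value; baseURI arrives as a separate argument.
        String mapped = (systemId == null) ? null : byLiteralId.get(systemId);
        // Returning null lets the parser fall back to its default resolution.
        return (mapped == null) ? null : new InputSource(mapped);
    }

    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        return null; // no synthetic DTD for documents that declare none
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Plain SAX entry point: no base URI is available here.
        return resolveEntity(null, publicId, null, systemId);
    }
}
```

With a parser that honours the http://xml.org/sax/features/use-entity-resolver2 feature (true by default per the SAX documentation), passing an instance to XMLReader.setEntityResolver should be enough for the four-argument method to be used.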

org.apache.xerces.util.XMLCatalogResolver looks interesting - I see it has a 
useLiteralSystemId property, but that only gets used by 
resolveIdentifier/resolveEntity(XMLResourceIdentifier) and not by 
resolveEntity(String,String), i.e. only by the ones that throw XNIException, so 
I'm not sure it would help me in this instance.  Besides, given the way 
Excalibur componentises EntityResolver, I suspect my current plan 
(subclassing their DefaultResolver, which already uses XML Commons' 
CatalogResolver) will be quicker than trying to plug the XMLCatalogResolver 
into it.  Something to play around with if I have time on my hands, though.


Andrew.

>From: Michael Glavassevich <mr...@ca.ibm.com>
>Reply-To: xerces-j-user@xml.apache.org
>To: xerces-j-user@xml.apache.org
>Subject: Re: EntityResolverWrapper
>Date: Mon, 28 Feb 2005 09:16:46 -0500
>
>Hi Andrew,
>
>EntityResolverWrapper is a wrapper for org.xml.sax.EntityResolver. The
>system ID passed to EntityResolver.resolveEntity() is the "expanded system
>ID". Specifically the docs for resolveEntity() [1] say: "if the system
>identifier is a URL, the SAX parser must resolve it fully before reporting
>it to the application" and that's exactly what the parser does. The other
>wrapper is for EntityResolver2 [2], whose resolveEntity() method takes the
>literal system ID along with a base URI, so yes the two resolvers behave
>differently. Xerces has a utility class called
>org.apache.xerces.util.XMLCatalogResolver which uses the XML commons
>catalog resolver. You may want to have a look at it.
>
>Hope that helps.
>
>[1]
>http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
>[2]
>http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html#resolveEntity(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)
>
>Michael Glavassevich
>XML Parser Development
>IBM Toronto Lab
>E-mail: mrglavas@ca.ibm.com
>E-mail: mrglavas@apache.org
--





Re: EntityResolverWrapper

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Andrew,

EntityResolverWrapper is a wrapper for org.xml.sax.EntityResolver. The 
system ID passed to EntityResolver.resolveEntity() is the "expanded system 
ID". Specifically the docs for resolveEntity() [1] say: "if the system 
identifier is a URL, the SAX parser must resolve it fully before reporting 
it to the application" and that's exactly what the parser does. The other 
wrapper is for EntityResolver2 [2], whose resolveEntity() method takes the 
literal system ID along with a base URI, so yes the two resolvers behave 
differently. Xerces has a utility class called 
org.apache.xerces.util.XMLCatalogResolver which uses the XML commons 
catalog resolver. You may want to have a look at it.
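To see the difference described above, the sketch below parses a document with the thread's DOCTYPE twice using the JDK's bundled SAX parser, once with a plain EntityResolver and once with an EntityResolver2, and records the system ID each one receives. The document URI file:/data/docs/record.xml and the one-line DTD are made up for the example. The exact spelling of the expanded URI (for instance file:/ versus file:///) may vary between parser versions, so the comments describe expectations rather than guaranteed output.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.EntityResolver2;

/** Records which system ID each resolver style receives from the JDK's bundled parser. */
class ResolverComparison {
    static final String DOC = "<!DOCTYPE record SYSTEM \"dcr4.5.dtd\"><record>x</record>";
    static final String BASE = "file:/data/docs/record.xml"; // hypothetical document URI
    static final String DTD = "<!ELEMENT record (#PCDATA)>";

    /** System ID seen by a plain org.xml.sax.EntityResolver (expected: expanded). */
    static String seenByEntityResolver() {
        String[] seen = new String[1];
        EntityResolver resolver = (publicId, systemId) -> {
            seen[0] = systemId;
            return new InputSource(new StringReader(DTD)); // serve the DTD from memory
        };
        parseWith(resolver);
        return seen[0];
    }

    /** System ID seen by an EntityResolver2 (expected: literal, as written). */
    static String seenByEntityResolver2() {
        String[] seen = new String[1];
        EntityResolver2 resolver = new EntityResolver2() {
            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                seen[0] = systemId;
                return new InputSource(new StringReader(DTD));
            }

            @Override
            public InputSource getExternalSubset(String name, String baseURI) {
                return null;
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                return resolveEntity(null, publicId, null, systemId);
            }
        };
        parseWith(resolver);
        return seen[0];
    }

    private static void parseWith(EntityResolver resolver) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            InputSource in = new InputSource(new StringReader(DOC));
            in.setSystemId(BASE); // base URI that "dcr4.5.dtd" is expanded against
            reader.parse(in);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("EntityResolver:  " + seenByEntityResolver());
        System.out.println("EntityResolver2: " + seenByEntityResolver2());
    }
}
```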

Hope that helps.

[1] 
http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
[2] 
http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html#resolveEntity(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
