Posted to j-users@xerces.apache.org by Andrew Stevens <at...@hotmail.com> on 2005/02/28 14:47:52 UTC
EntityResolverWrapper
org.apache.xerces.util.EntityResolverWrapper.resolveEntity(XMLResourceIdentifier
resourceIdentifier) contains the line
String sysId = resourceIdentifier.getExpandedSystemId();
Is there some particular reason this uses the expanded system ID rather than
getLiteralSystemId()?
I've got a problem with some XML files I'm processing with Cocoon. The
files all contain a DOCTYPE that uses a relative path for the system ID, i.e.
<!DOCTYPE record SYSTEM "dcr4.5.dtd">
The documents are created by another application, and I can't affect what
it puts in there. Trying to read the files generates a parser error since
the DTD isn't present in the directory containing the documents; no problem,
I thought, just use a suitable entry in the catalog used by Cocoon's
EntityResolver. So, following the other entries, I added
SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd"
and copied the DTD into WEB-INF\entities\interwoven. However, it still
doesn't find the DTD. Turning up the logging (and this is where it becomes
more relevant to Xerces than Cocoon, and why I'm asking here rather than
cocoon-user) I discovered that the system ID being passed in to the catalog
resolver already had the full path to the file, so it's not matching the
above entry in the catalog. Since the path to the documents could be more
or less anything, I can't use a (prefix-based) rewrite entry in the catalog;
likewise it's impractical to include a system entry for every possible path,
since I don't know in advance what they're going to be. Digging through the
Cocoon & Xerces source code, I discovered the path being received by the
catalog resolver has come from the EntityResolverWrapper i.e. the
resourceIdentifier.getExpandedSystemId() I mentioned above. Presumably, if
that had used getLiteralSystemId() instead, the catalog resolver would have
received just "dcr4.5.dtd" for the system ID rather than the full path, and
would have matched it okay. But I'm wary of changing it myself, since I
don't know what else might be affected (and I'd rather avoid using a
custom-built Xerces in our Cocoon app, to minimise the risk of introducing
other side-effects).
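To make the mismatch concrete, here is a minimal sketch of the expansion step and the failed catalog lookup. It is not Xerces or XML Commons code: the class name, the example document location and the map standing in for the catalog are all assumptions made for illustration, and java.net.URI is used only to approximate how a relative system ID is resolved against a base URI.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: not part of Xerces or the XML Commons resolver.
class SystemIdExpansion {

    // Approximates the expansion of a literal system ID against the URI of
    // the document that declares it.
    static String expand(String baseSystemId, String literalSystemId) {
        return URI.create(baseSystemId).resolve(literalSystemId).toString();
    }

    // Stand-in for an exact-match SYSTEM entry in a catalog.
    static Map<String, String> catalog() {
        Map<String, String> entries = new HashMap<>();
        entries.put("dcr4.5.dtd", "interwoven/dcr4.5.dtd");
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical document location; the real paths are not known in advance.
        String base = "file:///data/content/records/item1.xml";
        String literal = "dcr4.5.dtd";
        String expanded = expand(base, literal);

        System.out.println("expanded id:     " + expanded);
        System.out.println("literal lookup:  " + catalog().get(literal));
        System.out.println("expanded lookup: " + catalog().get(expanded));
    }
}
```

Only the literal lookup finds the entry; the expanded ID carries whatever directory the document happens to live in, so an exact-match entry keyed on the bare file name cannot match it.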
I notice in the current CVS HEAD, there's an EntityResolver2Wrapper class;
this one does use getLiteralSystemId(). In fact, the latest CVS log message
on that class says
"Fixing a bug. The systemId passed to EntityResolver2.resolveEntity may be
an absolute or relative URI. That is it should be the literal system
identifier, not the expanded one which resolved from the base URI."
However, I also found an old (> 2 years) mailing list message
(http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=xerces-j-user@xml.apache.org&msgId=568021)
which says that
"The reason Xerces now returns fully-expanded URI's to the Entity resolver
is that SAX quite explicitly states that this is what XML processors are
supposed to do."
So now I'm twice as confused. Do the SAX2 Extensions 1.1 say that
EntityResolver2 should behave differently from EntityResolver? Or have
things changed since EntityResolverWrapper switched to using
getExpandedSystemId(), and should it now be using getLiteralSystemId() after
all?
In the meantime I can work around my problem by plugging in a custom
EntityResolver which replaces any system IDs ending with "dcr4.5.dtd" with
just that string, before passing it on to the XML commons catalog resolver
as before. But it'd be nice if it could be clarified how exactly Xerces'
wrapper classes are supposed to work, so I know if I should be raising a bug
:-)
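The workaround described above might look roughly like the sketch below. The class and method names are hypothetical, and the delegate stands for whatever catalog-backed EntityResolver is already in use. It matches on a whole path segment rather than a bare string suffix, which is a slight tightening of the description above.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical wrapper: class and method names are invented for this sketch.
class LiteralNameResolver implements EntityResolver {
    private final String dtdName;
    private final EntityResolver delegate;

    LiteralNameResolver(String dtdName, EntityResolver delegate) {
        this.dtdName = dtdName;
        this.delegate = delegate;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Hand the shortened ID to the real (e.g. catalog-backed) resolver.
        return delegate.resolveEntity(publicId, trim(systemId));
    }

    // Reduces ".../dcr4.5.dtd" to "dcr4.5.dtd"; anything else passes through.
    String trim(String systemId) {
        if (systemId != null
                && (systemId.equals(dtdName) || systemId.endsWith("/" + dtdName))) {
            return dtdName;
        }
        return systemId;
    }

    // Helper for trying the wrapper out: returns the system ID the delegate sees.
    static String seenByDelegate(String systemId) {
        final String[] seen = new String[1];
        EntityResolver recorder = (pub, sys) -> { seen[0] = sys; return null; };
        try {
            new LiteralNameResolver("dcr4.5.dtd", recorder).resolveEntity(null, systemId);
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

In a real setup the recorder would be replaced by the existing catalog resolver, so that its SYSTEM "dcr4.5.dtd" entry gets a chance to match.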
Andrew.
--
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org
Re: EntityResolverWrapper
Posted by Andrew Stevens <at...@hotmail.com>.
Thanks, I think I "get it" now. It looks like EntityResolver2 is the new &
improved version to fix some of the limitations of EntityResolver (like the
one I'm running up against). But given that the Version in
EntityResolver2's javadocs is "TBD" and the Xerces wrapper isn't in any
released version yet (at least, I assume the Xerces-J_01052005 CVS tag is a
between-versions build?), I won't hold my breath waiting for Excalibur and
then Cocoon to start using the new version.
org.apache.xerces.util.XMLCatalogResolver looks interesting - I see it has a
useLiteralSystemId property, but that only gets used by
resolveIdentifier/resolveEntity(XMLResourceIdentifier) and not by
resolveEntity(String,String) i.e. only the ones that throw XNIException, so
I'm not sure it would help me in this instance. Besides, given the way
Excalibur componentises EntityResolver, I suspect my current plan
(subclassing their DefaultResolver, which already uses XML Commons'
CatalogResolver) will be quicker than trying to plug the XMLCatalogResolver
into it. Something to play around with if I have time on my hands, though.
Andrew.
>From: Michael Glavassevich <mr...@ca.ibm.com>
>Reply-To: xerces-j-user@xml.apache.org
>To: xerces-j-user@xml.apache.org
>Subject: Re: EntityResolverWrapper
>Date: Mon, 28 Feb 2005 09:16:46 -0500
>
>Hi Andrew,
>
>EntityResolverWrapper is a wrapper for org.xml.sax.EntityResolver. The
>system ID passed to EntityResolver.resolveEntity() is the "expanded system
>ID". Specifically the docs for resolveEntity() [1] say: "if the system
>identifier is a URL, the SAX parser must resolve it fully before reporting
>it to the application" and that's exactly what the parser does. The other
>wrapper is for EntityResolver2 [2] whose resolveEntity() method takes the
>literal system ID along with a base URI, so yes the two resolvers behave
>differently. Xerces has a utility class called
>org.apache.xerces.util.XMLCatalogResolver which uses the XML commons
>catalog resolver. You may want to have a look at it.
>
>Hope that helps.
>
>[1]
>http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
>[2]
>http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html#resolveEntity(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)
Re: EntityResolverWrapper
Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Andrew,
EntityResolverWrapper is a wrapper for org.xml.sax.EntityResolver. The
system ID passed to EntityResolver.resolveEntity() is the "expanded system
ID". Specifically the docs for resolveEntity() [1] say: "if the system
identifier is a URL, the SAX parser must resolve it fully before reporting
it to the application" and that's exactly what the parser does. The other
wrapper is for EntityResolver2 [2] whose resolveEntity() method takes the
literal system ID along with a base URI, so yes the two resolvers behave
differently. Xerces has a utility class called
org.apache.xerces.util.XMLCatalogResolver which uses the XML commons
catalog resolver. You may want to have a look at it.
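The difference between the two resolvers can be observed with a small experiment along the following lines. This is only a sketch: the class and method names are invented, it uses whatever SAX parser JAXP supplies (on a stock JDK that is a Xerces derivative, which may not behave identically to the Xerces-J release discussed here), and the resolvers return an empty stand-in DTD so that no file needs to exist.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: class and method names are invented for this sketch.
class ResolverComparison {
    static final String XML = "<!DOCTYPE record SYSTEM \"dcr4.5.dtd\"><record/>";

    // Parses XML with the given resolver installed; failures are rethrown unchecked.
    static void parseWith(EntityResolver resolver) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // System ID as reported through org.xml.sax.EntityResolver.
    static String seenByEntityResolver() {
        final String[] seen = new String[1];
        parseWith(new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                seen[0] = systemId;
                return new InputSource(new StringReader("")); // empty stand-in DTD
            }
        });
        return seen[0];
    }

    // System ID as reported through org.xml.sax.ext.EntityResolver2.
    static String seenByEntityResolver2() {
        final String[] seen = new String[1];
        parseWith(new DefaultHandler2() {
            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                seen[0] = systemId;
                return new InputSource(new StringReader("")); // empty stand-in DTD
            }
        });
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("EntityResolver:  " + seenByEntityResolver());
        System.out.println("EntityResolver2: " + seenByEntityResolver2());
    }
}
```

Going by the SAX documentation, the first call should report an absolute URI ending in dcr4.5.dtd and the second just the literal dcr4.5.dtd, assuming the parser has its use-entity-resolver2 feature enabled.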
Hope that helps.
[1]
http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
[2]
http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html#resolveEntity(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org