Posted to c-dev@xerces.apache.org by Mark Weaver <ma...@npsl.co.uk> on 2003/03/22 16:14:47 UTC

EntityResolver doesn't receive the baseURI

From an EntityResolver, I can't see any way of getting the base URI of the URI that is passed in for resolution.  In some instances I can store it (e.g. I can do this for the document), but if the document references a DTD which has imports, then I can't sensibly resolve these.  Is this simply an oversight?  It looks like a reasonably simple change to make, so I am happy to do this unless I've missed some other way of getting the information I need.

Thanks,

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


RE: EntityResolver doesn't receive the baseURI (repost)

Posted by Mark Weaver <ma...@npsl.co.uk>.
Right, got it this time, thanks :)

> -----Original Message-----
> From: Alberto Massari [mailto:amassari@progress.com]
> Sent: 03 April 2003 15:59
> To: xerces-c-dev@xml.apache.org
> Cc: Mark Weaver
> Subject: RE: EntityResolver doesn't receive the baseURI (repost)
>
>
> At 15.35 03/04/2003 +0100, you wrote:
> >Didn't get any response on this the first time,
>
> Strange, I did answer your e-mail....
> In any case, what I was telling you is that the EntityResolver interface is
> a standard SAX interface, and I guess nobody would like to have it changed.
> The proper way to add this new argument is to implement the SAX2
> Extensions, which define a new EntityResolver2 interface, as follows:
>
> EntityResolver2 : public EntityResolver
> {
>   // Allows applications to provide an external subset for documents that
>   // don't explicitly define one.
>   InputSource getExternalSubset(String name, String baseURI)
>   // Allows applications to map references to external entities into input
>   // sources, or tell the parser it should use conventional URI resolution.
>   InputSource resolveEntity(String name, String publicId, String baseURI,
>                             String systemId)
> }
>
> (see
> http://sax.sourceforge.net/apidoc/org/xml/sax/ext/EntityResolver2.html )
>
> Alberto
>
>
>
>




RE: EntityResolver doesn't receive the baseURI (repost)

Posted by Alberto Massari <am...@progress.com>.
At 15.35 03/04/2003 +0100, you wrote:
>Didn't get any response on this the first time,

Strange, I did answer your e-mail....
In any case, what I was telling you is that the EntityResolver interface is 
a standard SAX interface, and I guess nobody would like to have it changed.
The proper way to add this new argument is to implement the SAX2
Extensions, which define a new EntityResolver2 interface, as follows:

EntityResolver2 : public EntityResolver
{
  // Allows applications to provide an external subset for documents that
  // don't explicitly define one.
  InputSource getExternalSubset(String name, String baseURI)
  // Allows applications to map references to external entities into input
  // sources, or tell the parser it should use conventional URI resolution.
  InputSource resolveEntity(String name, String publicId, String baseURI,
                            String systemId)
}

(see http://sax.sourceforge.net/apidoc/org/xml/sax/ext/EntityResolver2.html )

Alberto





Re: EntityResolver doesn't receive the baseURI

Posted by Alberto Massari <am...@progress.com>.
At 15.14 22/03/2003 +0000, Mark Weaver wrote:
> From an EntityResolver, I can't see any way of getting the base URI of the
> URI that is passed in for resolution.  In some instances I can store it 
> (e.g. I can do this for the document), but if the document references a 
> DTD which has imports, then I can't sensibly resolve these.  Is this 
> simply an oversight?

I guess the reason is that the SAX specs don't have that parameter in
the signature of the callback function, although SAX 2.0.1 does.
Unfortunately, SAX 2.0.1 support is still listed at
http://xml.apache.org/xerces-c/releases_plan.html as not having a volunteer
assigned....

We worked around this problem by adding the argument to the callback; but 
you should be able to get around this problem also by calling 
getScanner().getLastExtLocation() from within the resolver.

Alberto

>It looks like a reasonably simple change to make, so I am happy to do this 
>unless I've missed some other way of getting the information I need.
>
>Thanks,
>
>Mark
>





Re: DOMInputSource (was: EntityResolver doesn't receive the baseURI)

Posted by Colin Paul Adams <co...@colina.demon.co.uk>.
Has any progress been made on this issue?
-- 
Colin Paul Adams
Preston Lancashire



Re: DOMInputSource (was: EntityResolver doesn't receive the baseURI)

Posted by Khaled Noaman <kn...@ca.ibm.com>.
A while back, I posted a note [1] on how the introduction of
DOMInputSource and DOMEntityResolver will affect the
Xerces-C++ internal components. There was not much of a
response. I hope that this note will give people a good overview
of how Xerces-C++ works internally. We currently have
two input source (IS) wrappers (SAX IS->DOM IS) and
(DOM IS->SAX IS). Those wrappers allow users to use the
LocalFile/MemBuf/etc. classes (which are SAX IS) and pass
them to a DOMBuilder->parse method. Since SAX IS does
not have a method to get/set the base URI, the resolve entity
of the DOM entity resolver will be passed a null string. One
change that we can make is to add a get/setbaseURI on SAX
IS which we can then pass to the resolve entity method.

Khaled
[1] http://marc.theaimsgroup.com/?l=xerces-c-dev&m=102086592028855&w=2
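
The proposed change can be pictured with a small self-contained sketch. This is not Xerces-C++ code: SaxInputSource, DomEntityResolver, RecordingResolver and forwardToDomResolver are hypothetical stand-ins, and std::string replaces XMLCh purely to keep the example short. It only illustrates the idea that a base URI stored on the SAX-style source can be forwarded to a DOM-style resolver, and that an unset base URI arrives as an empty string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a SAX-style input source (not the Xerces class).
class SaxInputSource {
public:
    explicit SaxInputSource(const std::string& systemId) : fSystemId(systemId) {}

    const std::string& getSystemId() const { return fSystemId; }

    // The accessor pair suggested in the message above (names assumed).
    const std::string& getBaseURI() const { return fBaseURI; }
    void setBaseURI(const std::string& baseURI) { fBaseURI = baseURI; }

private:
    std::string fSystemId;
    std::string fBaseURI;  // stays empty when never set (the "null string" case)
};

// Hypothetical DOM-style resolver callback that expects a base URI.
class DomEntityResolver {
public:
    virtual ~DomEntityResolver() {}
    virtual void resolveEntity(const std::string& publicId,
                               const std::string& systemId,
                               const std::string& baseURI) = 0;
};

// Example resolver that just records what it was given.
class RecordingResolver : public DomEntityResolver {
public:
    std::string lastSystemId;
    std::string lastBaseURI;

    void resolveEntity(const std::string& /*publicId*/,
                       const std::string& systemId,
                       const std::string& baseURI) override {
        lastSystemId = systemId;
        lastBaseURI = baseURI;
    }
};

// Wrapper step: pass along whatever base URI the SAX-style source carries.
void forwardToDomResolver(DomEntityResolver& resolver, const SaxInputSource& source) {
    assert(!source.getSystemId().empty());  // a system ID is required here
    resolver.resolveEntity("", source.getSystemId(), source.getBaseURI());
}
```

With this shape, a caller that knows the base URI sets it once on the source, and the resolver sees it on every call routed through the wrapper; callers that never set it keep today's behaviour.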

Colin Paul Adams wrote:

> >>>>> "Gareth" == Gareth Reakes <ga...@decisionsoft.com> writes:
>
>     Gareth> Apologies, I did not mean this to sound as if I was
>     Gareth> expecting you to provide anything. I really do mean that I
>
> Oh, I didn't interpret it like that. I was actually intending to write
> the code some time in the future, when I needed it.
>
>     Gareth> can do this in short order if this is helpful to you and
>     Gareth> is all that is required to be helpful. By new spec I was
>     Gareth> referring to the one released at the end of Feb.
>
> It will be helpful.
>
> I have written a complete (just about) interface for Eiffel to the
> DOM, implemented as bridging code to xerces-c 2.2.
>
> Because I have not done any SAX interface, it is currently impossible
> to write an entity resolver.
>
> I know that I will need it some time down the line (not exactly sure
> when yet).
> --
> Colin Paul Adams
> Preston Lancashire
>




Re: DOMInputSource (was: EntityResolver doesn't receive the baseURI)

Posted by Colin Paul Adams <co...@colina.demon.co.uk>.
>>>>> "Gareth" == Gareth Reakes <ga...@decisionsoft.com> writes:

    Gareth> Apologies, I did not mean this to sound as if I was
    Gareth> expecting you to provide anything. I really do mean that I

Oh, I didn't interpret it like that. I was actually intending to write
the code some time in the future, when I needed it.

    Gareth> can do this in short order if this is helpful to you and
    Gareth> is all that is required to be helpful. By new spec I was
    Gareth> referring to the one released at the end of Feb.

It will be helpful.

I have written a complete (just about) interface for Eiffel to the
DOM, implemented as bridging code to xerces-c 2.2.

Because I have not done any SAX interface, it is currently impossible
to write an entity resolver.

I know that I will need it some time down the line (not exactly sure
when yet).
-- 
Colin Paul Adams
Preston Lancashire



Re: DOMInputSource (was: EntityResolver doesn't receive the baseURI)

Posted by Gareth Reakes <ga...@decisionsoft.com>.
Apologies, I did not mean this to sound as if I was expecting you to 
provide anything. I really do mean that I can do this in short order if 
this is helpful to you and is all that is required to be helpful. By new 
spec I was referring to the one released at the end of Feb.

Gareth


On 3 Apr 2003, Colin Paul Adams wrote:

> >>>>> "Gareth" == Gareth Reakes <ga...@decisionsoft.com> writes:
> 
>     Gareth> I can change the interface over to the new spec in fairly
>     Gareth> short order if that would be helpful (or you could provide
>     Gareth> a patch). Is this all that would be required?
> 
> Yes. (new spec? That was the spec way back in April).
> 
> I hadn't intended to raise this until I was ready to provide a patch,
> but as the matter was raised...
> 

-- 
Gareth Reakes, Head of Product Development  +44-1865-203192
DecisionSoft Limited                        http://www.decisionsoft.com
XML Development and Services






Re: DOMInputSource (was: EntityResolver doesn't receive the baseURI)

Posted by Colin Paul Adams <co...@colina.demon.co.uk>.
>>>>> "Gareth" == Gareth Reakes <ga...@decisionsoft.com> writes:

    Gareth> I can change the interface over to the new spec in fairly
    Gareth> short order if that would be helpful (or you could provide
    Gareth> a patch). Is this all that would be required?

Yes. (new spec? That was the spec way back in April).

I hadn't intended to raise this until I was ready to provide a patch,
but as the matter was raised...
-- 
Colin Paul Adams
Preston Lancashire



Re: DOMInputSource (was: EntityResolver doesn't receive the baseURI)

Posted by Gareth Reakes <ga...@decisionsoft.com>.

I can change the interface over to the new spec in fairly short order if 
that would be helpful (or you could provide a patch). Is this all that 
would be required?

Gareth


> I've been meaning to mention this for a long time now. The official (?
> is 3.0 Load/Save live now) DOMEntityResolver requires a
> DOMInputSource, but Xerces-c does not fully implement this. In fact,
> it doesn't implement the bits that would make it usable! (i.e. the
> attributes byteStream/characterStream/stringData). Without one of
> these methods, it is impossible to actually use a DOMInputSource (at
> least, with a pure DOM interface, which is all I can use), as the only
> official method of creating a DOMInputSource creates an empty one. So
> I hope this interface will be rounded off for 2.3.
> 

-- 
Gareth Reakes, Head of Product Development  +44-1865-203192
DecisionSoft Limited                        http://www.decisionsoft.com
XML Development and Services






DOMInputSource (was: EntityResolver doesn't receive the baseURI)

Posted by Colin Paul Adams <co...@colina.demon.co.uk>.
>>>>> "Gareth" == Gareth Reakes <ga...@decisionsoft.com> writes:

    Gareth> Hi, sorry about the lack of response the first time. Is
    Gareth> there a reason why you can't go over to use the official
    Gareth> DOM stuff? If so perhaps we should have 2 constructors so
    Gareth> we don't break backwards compatibility. How does this
    Gareth> sound to you?

I've been meaning to mention this for a long time now. The official (?
is 3.0 Load/Save live now) DOMEntityResolver requires a
DOMInputSource, but Xerces-c does not fully implement this. In fact,
it doesn't implement the bits that would make it usable! (i.e. the
attributes byteStream/characterStream/stringData). Without one of
these methods, it is impossible to actually use a DOMInputSource (at
least, with a pure DOM interface, which is all I can use), as the only
official method of creating a DOMInputSource creates an empty one. So
I hope this interface will be rounded off for 2.3.
-- 
Colin Paul Adams
Preston Lancashire



RE: EntityResolver doesn't receive the baseURI (implementation questions)

Posted by Mark Weaver <ma...@npsl.co.uk>.
Alright, that didn't quite work sadly, as the base class overload is always
called.  Taking a slightly different tack, so far I've:

- Added EntityResolver2 as described at
http://sax.sourceforge.net/apidoc/org/xml/sax/ext/EntityResolver2.html
- Added the feature flag http://xml.org/sax/features/use-entity-resolver2,
defaulting to true
- In order to distinguish between ER2 and ER without RTTI added
"getEntityResolverVersion" to EntityResolver which allows doing the right
thing with a static_cast
- Modified the SAX2 parser to call resolveEntity with the baseURI

Things I'm not clear on:

- Where 'name' should come from.  Description: Identifies the external
entity being resolved. Either "[dtd]" for the external subset, or a name
starting with "%" to indicate a parameter entity, or else the name of a
general entity. This is never null when invoked by a SAX2 parser.  Looks to
me like I need to modify ReaderMgr::createReader to pass this information in
as well, and modify the resolveEntity() function to take name as a parameter.
Is that right?
- Which other parsers require this modification.  There seem to be a lot of
them, and I'm not clear on what they all do.
- Add ER2 derivation to DefaultHandler.  DefaultHandler appears to be a
mishmash of the SAX2 DefaultHandler2 and DefaultHandler implementations.
Could call this DefaultHandler2 or add a DefaultHandler2 that overrides
getEntityResolverVersion (thus preserving backward compatibility).
- How to implement getExternalSubset.  I'm entirely clueless on this.

Obviously lots of pointers required here.  I have got it to the point where
I can do what I want, but I'm sure that's not entirely helpful :)

Mark
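
As a rough illustration of the version-tag idea in the message above (telling ER2 from ER without RTTI and then using static_cast), here is a minimal self-contained sketch. It is not the Xerces-C++ source and not the actual patch: InputSource is a stand-in struct, const char* replaces XMLCh, and callResolver, PlainResolver and BaseAwareResolver are hypothetical names. Only getEntityResolverVersion and the two callback shapes come from the thread.

```cpp
#include <cassert>
#include <string>

// Stand-in for the parser's InputSource; only the resolved system ID is kept.
struct InputSource {
    std::string systemId;
};

class EntityResolver {
public:
    virtual ~EntityResolver() {}
    virtual InputSource* resolveEntity(const char* publicId, const char* systemId) = 0;
    // Version tag as proposed in the thread: 1 means the plain interface.
    virtual int getEntityResolverVersion() const { return 1; }
};

class EntityResolver2 : public EntityResolver {
public:
    using EntityResolver::resolveEntity;  // keep the two-argument overload visible
    virtual InputSource* resolveEntity(const char* name, const char* publicId,
                                       const char* baseURI, const char* systemId) = 0;
    virtual InputSource* getExternalSubset(const char* name, const char* baseURI) = 0;
    int getEntityResolverVersion() const override { return 2; }
};

// Parser-side dispatch (hypothetical): use the richer callback only when the
// feature flag is on and the resolver reports version 2. Caller owns the result.
InputSource* callResolver(EntityResolver* er, bool useEntityResolver2,
                          const char* name, const char* publicId,
                          const char* baseURI, const char* systemId) {
    assert(systemId != nullptr);
    if (er == nullptr)
        return nullptr;
    if (useEntityResolver2 && er->getEntityResolverVersion() >= 2) {
        // Safe only because the version tag vouches for the dynamic type.
        return static_cast<EntityResolver2*>(er)->resolveEntity(name, publicId, baseURI, systemId);
    }
    return er->resolveEntity(publicId, systemId);
}

// Example resolvers, for illustration only.
class PlainResolver : public EntityResolver {
public:
    InputSource* resolveEntity(const char*, const char* systemId) override {
        return new InputSource{systemId};
    }
};

class BaseAwareResolver : public EntityResolver2 {
public:
    InputSource* resolveEntity(const char*, const char* systemId) override {
        return new InputSource{systemId};
    }
    InputSource* resolveEntity(const char*, const char*,
                               const char* baseURI, const char* systemId) override {
        // Naive join: keep everything up to the last '/' of the base URI.
        const std::string base(baseURI ? baseURI : "");
        return new InputSource{base.substr(0, base.rfind('/') + 1) + systemId};
    }
    InputSource* getExternalSubset(const char*, const char*) override { return nullptr; }
};
```

The static_cast is only as trustworthy as the version tag: a class that returns 2 from getEntityResolverVersion without deriving from EntityResolver2 would lead to undefined behaviour, which is the trade-off of avoiding RTTI here.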

> -----Original Message-----
> From: Mark Weaver [mailto:mark@npsl.co.uk]
> Sent: 03 April 2003 19:21
> To: xerces-c-dev@xml.apache.org
> Subject: RE: EntityResolver doesn't receive the baseURI (repost)
>
>
> My reading of it was that you need an overload for setEntityResolver that
> takes an EntityResolver2 and does the right thing dependent on if you have
> ER2, ER, or nothing and if the appropriate features flag is set.  I was
> planning on implementing it in this fashion.  Does this seem OK?
>
> Mark
>
> > -----Original Message-----
> > From: Gareth Reakes [mailto:gareth@decisionsoft.com]
> > Sent: 03 April 2003 16:01
> > To: xerces-c-dev@xml.apache.org
> > Subject: RE: EntityResolver doesn't receive the baseURI (repost)
> >
> >
> > Hi,
> > 	sorry about the lack of response the first time. Is there a reason
> > why you can't go over to use the official DOM stuff? If so perhaps we
> > should have 2 constructors so we don't break backwards
> compatibility. How
> > does this sound to you?
> >
> > Gareth
> >
> > --
> > Gareth Reakes, Head of Product Development  +44-1865-203192
> > DecisionSoft Limited                        http://www.decisionsoft.com
> > XML Development and Services
> >
> >
> >
> >
> >
> >
> >
>
>
>
>
>




RE: EntityResolver doesn't receive the baseURI (repost)

Posted by Mark Weaver <ma...@npsl.co.uk>.
My reading of it was that you need an overload for setEntityResolver that
takes an EntityResolver2 and does the right thing dependent on if you have
ER2, ER, or nothing and if the appropriate features flag is set.  I was
planning on implementing it in this fashion.  Does this seem OK?

Mark

> -----Original Message-----
> From: Gareth Reakes [mailto:gareth@decisionsoft.com]
> Sent: 03 April 2003 16:01
> To: xerces-c-dev@xml.apache.org
> Subject: RE: EntityResolver doesn't receive the baseURI (repost)
>
>
> Hi,
> 	sorry about the lack of response the first time. Is there a reason
> why you can't go over to use the official DOM stuff? If so perhaps we
> should have 2 constructors so we don't break backwards compatibility. How
> does this sound to you?
>
> Gareth
>
> --
> Gareth Reakes, Head of Product Development  +44-1865-203192
> DecisionSoft Limited                        http://www.decisionsoft.com
> XML Development and Services
>
>
>
>
>
>
>




RE: EntityResolver doesn't receive the baseURI (repost)

Posted by Gareth Reakes <ga...@decisionsoft.com>.
Hi,
	sorry about the lack of response the first time. Is there a reason 
why you can't go over to use the official DOM stuff? If so perhaps we 
should have 2 constructors so we don't break backwards compatibility. How 
does this sound to you?

Gareth

-- 
Gareth Reakes, Head of Product Development  +44-1865-203192
DecisionSoft Limited                        http://www.decisionsoft.com
XML Development and Services






RE: EntityResolver doesn't receive the baseURI (repost)

Posted by Mark Weaver <ma...@npsl.co.uk>.
Didn't get any response on this the first time.  To be slightly more
clear, when parsing:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

<head>
<title>A</title>
</head>

<body>A</body>
</html>

my entity resolver will receive calls to resolve:

http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
xhtml-lat1.ent
xhtml-symbol.ent
...

For the subsequent requests, I need these to resolve to:

http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

etc.

Without being given access to the base URI, there are situations under which
this is impossible to resolve correctly.  I would therefore like to change
the EntityResolver to pass the base URI in.  DOMEntityResolver includes the
base URI:

class DOMEntityResolver {
public:
    virtual DOMInputSource*
    resolveEntity(const XMLCh* const publicId, const XMLCh* const systemId,
                  const XMLCh* const baseURI) = 0;
};

however the default entity resolver is declared as:

class EntityResolver {
public:
    virtual InputSource*
    resolveEntity(const XMLCh* const publicId, const XMLCh* const systemId) = 0;
};
 
I would like to make these two interfaces consistent by passing the baseURI
parameter to the default entity resolver as well.  Objections/comments?

Thanks,

Mark
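
To make the XHTML example above concrete, here is a small sketch of what a resolver could do once it is handed the base URI. resolveAgainstBase is a hypothetical helper, not a Xerces-C++ function, and it is deliberately simplified: it treats any system ID containing "://" as absolute, assumes the base URI has a path component, and does not normalise "../" segments as full RFC 3986 reference resolution would.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: resolve a (possibly relative) system ID against the
// URI of the entity that referenced it. Simplified; see the assumptions above.
std::string resolveAgainstBase(const std::string& baseURI, const std::string& systemId) {
    assert(!systemId.empty());  // a system ID is required

    // Anything with a scheme separator is treated as already absolute.
    if (systemId.find("://") != std::string::npos)
        return systemId;

    // Keep the base URI up to and including its last '/', then append.
    const std::string::size_type slash = baseURI.rfind('/');
    if (slash == std::string::npos)
        return systemId;  // no usable base; leave the system ID untouched
    return baseURI.substr(0, slash + 1) + systemId;
}
```

Given the base http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd and the system ID xhtml-lat1.ent, this yields http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent, which is the mapping asked for above. Without the base URI argument the resolver has nothing to join the relative name to.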

> -----Original Message-----
> From: Mark Weaver [mailto:mark@npsl.co.uk]
> Sent: 22 March 2003 15:15
> To: Xerces-C-Dev
> Subject: EntityResolver doesn't receive the baseURI
>
>
> From an EntityResolver, I can't see any way of getting the base
> URI of the URI that is passed in for resolution.  In some
> instances I can store it (e.g. I can do this for the document),
> but if the document references a DTD which has imports, then I
> can't sensibly resolve these.  Is this simply an oversight?  It
> looks like a reasonably simple change to make, so I am happy to
> do this unless I've missed some other way of getting the
> information I need.
>
> Thanks,
>
> Mark
>
>
>
>

