Posted to c-dev@xerces.apache.org by Magnus Strand <ma...@tim.se> on 2003/04/22 10:31:21 UTC

Problems parsing entities

Hi,

I have a problem parsing entities.
When I print the test XML file below with the DOMPrint sample, entity
references (both text and character entities) are expanded incorrectly.
The second time I use an entity, its content is output twice!
The third time I use an entity, it is output three times, and so on.

I also tested with Xerces-J, which does not show the problem.

I spent a day debugging with VC++7 on Win 2000 and also CodeWarrior 8.3
on Mac.

It seems to be a problem in AbstractDOMParser::parse.
I think it was in the docCharacterData method that the entity's content
got appended to the DOMEntityRefImpl node when it was used.
The DOMEntityRefImpl node is changed from read-only to read/write
when the text in the test element is parsed.
I wonder if this is correct?
If the DOMEntityRefImpl node were read-only at this time, it would not
get duplicated content.

Does anyone know the solution to this problem?

------------
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE test
[
<!ELEMENT test (#PCDATA)>
<!ENTITY greeting     "hi">
]
><test>&greeting;|&greeting;|&greeting;</test>
--------

I get this output:
---------
<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?><!DOCTYPE test [
<!ELEMENT test (#PCDATA)>
<!ENTITY greeting "hi">
]><test>hi|hihi|hihihi</test>
------------


Regards,
Magnus Strand

–––––––––––––––––––––––––––––––––––––––––––––––––
System Developer, MSc

Teknik i Media Sverige AB (publ)
Södra Förstadsgatan 2, SE-211 43 Malmö, Sweden
http://www.tim.se
–––––––––––––––––––––––––––––––––––––––––––––––––


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-c-dev-help@xml.apache.org


Re: Problems parsing entities

Posted by Gareth Reakes <ga...@decisionsoft.com>.
Hi,

This is a bug. I am looking into it further and will produce a
Bugzilla report and/or a fix for it.

Gareth





-- 
Gareth Reakes, Head of Product Development  +44-1865-203192
DecisionSoft Limited                        http://www.decisionsoft.com
XML Development and Services






RE: Problems parsing entities (using DOM parser)

Posted by Gareth Reakes <ga...@decisionsoft.com>.

I'll take a look at this today.

Gareth


On Fri, 25 Apr 2003, Magnus Strand wrote:

> Hi,
> 
> Could anyone confirm whether the problem described in my previous
> e-mail is a bug?
> 
> 
> Many thanks,
> Magnus Strand
> 
> PS. Thanks Erik for the code for getTextContent; it works well.
> 

-- 
Gareth Reakes, Head of Product Development  +44-1865-203192
DecisionSoft Limited                        http://www.decisionsoft.com
XML Development and Services






RE: Problems parsing entities (using DOM parser)

Posted by Magnus Strand <ma...@tim.se>.
Hi,

Could anyone confirm whether the problem described in my previous
e-mail is a bug?


Many thanks,
Magnus Strand

PS. Thanks Erik for the code for getTextContent; it works well.

–––––––––––––––––––––––––––––––––––––––––––––––––
System Developer, MSc

Teknik i Media Sverige AB (publ)
Södra Förstadsgatan 2, SE-211 43 Malmö, Sweden
http://www.tim.se
–––––––––––––––––––––––––––––––––––––––––––––––––