Posted to java-dev@axis.apache.org by jayachandra <ja...@gmail.com> on 2005/04/25 15:26:24 UTC

[Axis2] [Update] XMLConformace Testing Report.

Hi all,

Total file count in the W3C XML suite: 2634 (this includes valid, invalid
and ill-formed XMLs)

Of them, valid ones: 960 (i.e. excluding invalid and ill-formed XMLs;
this still includes XMLs of both versions 1.0 and 1.1)

Of them, valid XML 1.0 ones: 832 (i.e. excluding XMLs from the 1.1
version folders, since the MXParser we have underneath is only 1.0
compliant)

On this final set, when OM is tested as is, 335 files got parsed
properly, and 309 files had the serialized XML matching the input file
(comparison test).

I've implemented OMComment and OMPI and added minimal OMDTD support
(without validation etc.). With those changes the number of parsed
files increased to 735 and comparison successes reached 567.
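
For reference, the counts above can be turned into rates against the 832 valid XML 1.0 files. The sketch below only redoes that arithmetic; the class and method names are made up for illustration and are not part of the test harness.

```java
import java.util.Locale;

// Illustrative only: names are invented for this sketch.
class ConformanceStats {
    /** Formats part/total as a percentage with one decimal place. */
    static String percent(int part, int total) {
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * part / total);
    }

    public static void main(String[] args) {
        int validXml10 = 832;      // valid XML 1.0 files in the suite
        int parsedBefore = 335;    // parsed with OM as is
        int matchedBefore = 309;   // round-tripped with OM as is
        int parsedAfter = 735;     // parsed with comment/PI/DTD nodes added
        int matchedAfter = 567;    // round-tripped with those nodes added

        System.out.println("parsed, OM as is:    " + percent(parsedBefore, validXml10));
        System.out.println("matched, OM as is:   " + percent(matchedBefore, validXml10));
        System.out.println("parsed, extended:    " + percent(parsedAfter, validXml10));
        System.out.println("matched, extended:   " + percent(matchedAfter, validXml10));
        System.out.println("parsed but mismatch: " + (parsedAfter - matchedAfter));
        System.out.println("still not parsed:    " + (validXml10 - parsedAfter));
    }
}
```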

The parsing failures can be attributed to one or more of the
following observations. This is not an exhaustive list, though.

1. For files where the XML declaration mentions the 'standalone'
attribute before the 'encoding' attribute, the underlying MXParser threw
an exception with a message reading something like "Expected 'e' in
encoding and not 's' ". Alek! Is this a known issue with StAX? What do
you think?

2. For files in which the DTD declaration has a right square bracket (']')
as the literal value of some entity, MXParser treats it as the end of
the DTD declaration.

3. Some XMLs containing multi-byte characters (the UK pound sign
amongst others) fail to parse, with typical exception messages like
"only whitespace content allowed before start tag and not \ufffd".
I have passed a "UTF-8" aware reader to the builder; do I need
to use something else here?

4. Apart from these, because I couldn't implement the complete DTD
infoset, some more files fail to parse.
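
Observation 2 could be checked in isolation with something like the sketch below. It assumes whatever StAX implementation `XMLInputFactory.newInstance()` finds on the classpath; the sample document and the class name are invented for illustration and are not files from the suite. A conformant parser should return the single character `]` as the element text rather than ending the internal subset early.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical repro helper; the sample document is invented, not a suite file.
class BracketEntityRepro {
    // An internal entity whose literal value is a right square bracket.
    static final String SAMPLE =
        "<!DOCTYPE doc [<!ENTITY rsqb \"]\">]><doc>&rsqb;</doc>";

    /** Parses the document and returns all reported character data. */
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // Surfaces the parser's own message, which is what a bug report needs.
            throw new IllegalStateException("parser rejected the document: " + e.getMessage(), e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println("text = " + textOf(SAMPLE));
    }
}
```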
 

Regarding the comparison, some of the observed reasons for failures are:

1. Many SYSTEM identifiers in DTD declarations use a relative
reference, and so far we haven't considered a 'baseURI' property (does
the StAX parser provide one?) for any of the elements. Hence the XML
comparator (xmlunit) couldn't resolve the system identifiers, leading
to a mismatch between the serialized XML and the original input.

2. Also, since the DTD support is naïve, the declared presentation of
data is completely ignored, leading to scenarios like serializing as
#PCDATA when the DTD says CDATA. This also led to significant comparison
failures.
 

Thanks
Jaya

--
-- Jaya

Re: [Axis2] XML + Namespaces + Base [Re: [Axis2] [Update] XMLConformace Testing Report.

Posted by Sanjiva Weerawarana <sa...@opensource.lk>.

Aleksander Slominski wrote:

> XML 1.0 makes "xml*" names reserved and XML Namespaces spec defines 
> special xmlns* processing and additionally requires that namespace 
> prefix xml is always bound to special namespace.

Yes I know .. but note that it's the XML Namespaces spec that defines a
special prefix for xmlns. (No one defines a special prefix for "xml"!).
So, if you're doing XML 1.0 tests, there is no concept of a prefix - all
you have is that names starting with the characters [xX][mM][lL] are
reserved.

> it is additional specification http://www.w3.org/TR/xmlbase/
> that is used when resolving relative links in XML documents
> and i do not think it is required in pure XML + Namespaces
> http://www.w3.org/TR/REC-xml-names/

Right, and here we're talking about XML 1.0 tests; which means XML Base
doesn't come into consideration either.

> but nonetheless is popular and useful in some situations
> including SOAP 1.2: "SOAP uses XML Base [XML Base] for determining
> a base URI for relative URI references used as values in information
> items defined by this specification (see 6. Use of URIs in SOAP)."
> http://www.w3.org/TR/2003/REC-soap12-part1-20030624/

Yep.

> so in conclusion OM needs to support: XML 1.0 + Namespaces + XML Base
> to support fully SOAP 1.2 though it is rarely used IMHO ...

OM needs to support the subset of XML 1.0 as used in SOAP + XML namespaces
+ XML Base. Other stuff is optional and is only necessary if OM is used
for more general XML processing; which is not our primary concern.

To be clear, I'm not against someone extending OM to have full XML 1.0
support, but IFF it has absolutely no performance or negative API impact
on us.

Sanjiva.

[Axis2] XML + Namespaces + Base [Re: [Axis2] [Update] XMLConformace Testing Report.

Posted by Aleksander Slominski <as...@cs.indiana.edu>.

Sanjiva Weerawarana wrote:

>"jayachandra" <ja...@gmail.com> writes:
>  
>
>>-->A default namespace for 'xml' prefix is supposed to be in
>>the scope of every XML element. I did a work around on my machine
>>as to declaring this namespace inside the OMElementImpl constructor
>>methods itself, before running the tests.
>>    
>>
>
>??? I have no idea what you're saying .. XML 1.0 has no concept
>of namespaces! XML 1.0 *+* Namespaces does but not the base XML
>spec. 
>  
>
XML 1.0 makes "xml*" names reserved and XML Namespaces spec defines 
special xmlns* processing and additionally requires that namespace 
prefix xml is always bound to special namespace.
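
a quick way to see that binding from the StAX side might look like the sketch below. it assumes a namespace-aware StAX implementation on the classpath (the default for `XMLInputFactory`); the class name and the sample document are made up. note the document never declares `xmlns:xml`, yet the attribute is reported in the XML namespace, which `javax.xml.XMLConstants.XML_NS_URI` names.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch; not OM code.
class XmlPrefixDemo {
    /** Returns the namespace URI reported for the first attribute of the root element. */
    static String firstAttributeNamespace(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // Advance to the root element's start tag.
            while (reader.next() != XMLStreamConstants.START_ELEMENT) {
                // skip prolog events
            }
            String namespace = reader.getAttributeNamespace(0);
            reader.close();
            return namespace;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String reported = firstAttributeNamespace("<doc xml:lang=\"en\"/>");
        System.out.println("reported: " + reported);
        System.out.println("constant: " + XMLConstants.XML_NS_URI);
    }
}
```

so an object model sitting on top of such a parser can treat the `xml` prefix as predeclared instead of requiring an explicit declaration on each element.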

>  
>
>>-->The 'baseURI' property support is not provided by OM
>>inside OMElement. If we can keep track of this one thing in OM
>>it can help us reduce the number of parsed tests that fail at
>>comparison phase by a good number (a few fifties).
>>    
>>
>
>Hmmm. I am not certain but it seems to me that XML Base was
>a thing that built on namespaces? Alek you must know the 
>definitive answer (or I guess I could check but .. ;-)).
>  
>
it is additional specification http://www.w3.org/TR/xmlbase/
that is used when resolving relative links in XML documents
and i do not think it is required in pure XML + Namespaces
http://www.w3.org/TR/REC-xml-names/
but nonetheless is popular and useful in some situations
including SOAP 1.2: "SOAP uses XML Base [XML Base] for determining
a base URI for relative URI references used as values in information
items defined by this specification (see 6. Use of URIs in SOAP)."
http://www.w3.org/TR/2003/REC-soap12-part1-20030624/

so in conclusion OM needs to support: XML 1.0 + Namespaces + XML Base
to support fully SOAP 1.2 though it is rarely used IMHO ...

>
>>And Sanjiva, just to be extra cautious that I don't give
>>out wrong signals :-)... so far I tested OM against *only*
>>valid XMLs of 1.0 version that should be parsed and serialized
>>using any infoset implementation. We haven't tested OM against
>>how well it can _reject_ invalid and ill-formed XMLs. They
>>actually form the larger fraction of the XMLsuite about 1800 :-(
>>    
>>
>
>Ah ok - yes we do care about failing on the bad ones!
>  
>
those tests can be done independent of OM. i would not worry too much 
about that as StAX *is* API and you can swap in any parser you need with 
different speed/size/conformance trade-offs.
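
a parser-only rejection check could be as small as the sketch below. it uses only the standard `javax.xml.stream` API, so it runs against whichever implementation `XMLInputFactory.newInstance()` picks up; the class name is made up. it covers ill-formed input only: a non-validating parser will not reject documents that are merely invalid against their DTD.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch; independent of OM.
class WellFormednessCheck {
    /** Returns true if the parser reads the whole document without raising an error. */
    static boolean isWellFormed(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next(); // pull every event so errors anywhere in the document surface
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));  // balanced tags
        System.out.println(isWellFormed("<a><b></a>"));   // <b> is never closed
    }
}
```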

alek

-- 
The best way to predict the future is to invent it - Alan Kay

Re: [Axis2] [Update] XMLConformace Testing Report.

Posted by Sanjiva Weerawarana <sa...@opensource.lk>.

"jayachandra" <ja...@gmail.com> writes:
> Hi!
> Sanjiva, we do have issues with OM as well.
> -->To start with,
> OM lacks PI, comments and DTD support. On my end, I added their
> implementation into OM code base and then ran the test.

These are not problems IMO :-). SOAP doesn't require them.
So I have no problem with us not being able to handle docs that
have stuff we know we won't get in the SOAP world.

> -->A default namespace for 'xml' prefix is supposed to be in
> the scope of every XML element. I did a work around on my machine
> as to declaring this namespace inside the OMElementImpl constructor
> methods itself, before running the tests.

??? I have no idea what you're saying .. XML 1.0 has no concept
of namespaces! XML 1.0 *+* Namespaces does but not the base XML
spec. 

> -->The 'baseURI' property support is not provided by OM
> inside OMElement. If we can keep track of this one thing in OM
> it can help us reduce the number of parsed tests that fail at
> comparison phase by a good number (a few fifties).

Hmmm. I am not certain but it seems to me that XML Base was
a thing that built on namespaces? Alek you must know the 
definitive answer (or I guess I could check but .. ;-)).

> However, getting a 100% success is unlikely without *full*
> DTD implementation built into OM. Alek was saying DTD support
> is not that well implemented in stAX, it seems, and if that
> be the need he suggested to use woodstox.

Again doesn't bother me at all; YAGNI for SOAP.

> And Sanjiva, just to be extra cautious that I don't give
> out wrong signals :-)... so far I tested OM against *only*
> valid XMLs of 1.0 version that should be parsed and serialized
> using any infoset implementation. We haven't tested OM against
> how well it can _reject_ invalid and ill-formed XMLs. They
> actually form the larger fraction of the XMLsuite about 1800 :-(

Ah ok - yes we do care about failing on the bad ones!

> Thanks for all your support
> Bye
> Jaya

Most welcome!

Sanjiva.

RE: [Axis2] [Update] XMLConformace Testing Report.

Posted by Eran Chinthaka <ch...@opensource.lk>.

Hi Jayachandra,

Talking about OM with invalid XMLs: I think this should be handled at the
parser level, meaning if there is any non-conformance to XML 1.0, the parser
should throw an error. Shouldn't it?

Are there any errors that slip past the parser and should be detected at
the OM level? Mind you, pure OM doesn't have any validation code; it's just
an object model. Am I missing something?

BTW: your email seems scrambled to me. Any problem with your mail server,
or mine (ohh !!)

-- Chinthaka 

> -----Original Message-----
> From: jayachandra [mailto:jayachandra@gmail.com]
> Sent: Tuesday, April 26, 2005 12:22 PM
> To: Sanjiva Weerawarana
> Cc: axis-dev@ws.apache.org
> Subject: Re: [Axis2] [Update] XMLConformace Testing Report.

Re: [Axis2] [Update] XMLConformace Testing Report.

Posted by jayachandra <ja...@gmail.com>.

Hi!

Sanjiva, we do have issues with OM as well.
-->To start with, OM lacks PI, comment and DTD support. On my end, I
added their implementation to the OM code base and then ran the tests.
-->A default namespace for the 'xml' prefix is supposed to be in scope
for every XML element. As a workaround on my machine, I declared this
namespace inside the OMElementImpl constructors themselves before
running the tests.
-->'baseURI' property support is not provided by OM inside
OMElement. If we keep track of this one thing in OM, it can help us
reduce the number of parsed tests that fail at the comparison phase by a
good number (a few fifties).

However, getting 100% success is unlikely without a *full* DTD
implementation built into OM. Alek was saying DTD support is not that
well implemented in StAX, it seems, and if that is needed he
suggested using woodstox.

And Sanjiva, just to be extra cautious that I don't give out wrong
signals :-)... so far I have tested OM against *only* valid XML 1.0
documents, which should be parsed and serialized by any infoset
implementation. We haven't tested how well OM can _reject_
invalid and ill-formed XMLs. They actually form the larger fraction of
the XML suite, about 1800 :-(

Thanks for all your support
Bye
Jaya

On 4/25/05, Sanjiva Weerawarana <sa...@opensource.lk> wrote:
> Hi Jaya,
> 
> Wow, thanks for all the hard work on this!
> 
> Do I read your report correctly as this test didn't find any bugs
> in the OM level but rather encountered difficulties in the parser
> level?? If so I'm very happy :-).
> 
> Of the passing ones, what made 735-567 documents not compare
> successfully? Can we fix that?
> 
> Thanks,
> 
> Sanjiva.
> 
> ----- Original Message -----
> From: "jayachandra" <ja...@gmail.com>
> To: <ax...@ws.apache.org>
> Sent: Monday, April 25, 2005 7:26 PM
> Subject: [Axis2] [Update] XMLConformace Testing Report.


-- 
-- Jaya

Re: [Axis2] [Update] XMLConformace Testing Report.

Posted by Sanjiva Weerawarana <sa...@opensource.lk>.

Hi Jaya,

Wow, thanks for all the hard work on this!

Do I read your report correctly as this test didn't find any bugs
in the OM level but rather encountered difficulties in the parser
level?? If so I'm very happy :-).

Of the passing ones, what made 735-567 documents not compare
successfully? Can we fix that?

Thanks,

Sanjiva.

----- Original Message ----- 
From: "jayachandra" <ja...@gmail.com>
To: <ax...@ws.apache.org>
Sent: Monday, April 25, 2005 7:26 PM
Subject: [Axis2] [Update] XMLConformace Testing Report.



Re: [Axis2] [Update] XMLConformace Testing Report.

Posted by Aleksander Slominski <as...@cs.indiana.edu>.

jayachandra wrote:

>1. For files where XML declaration line has a mention of 'standalone'
>attribute prior to 'encoding' attribute, underlying MXParser threw an
>exception with a message reading something like "Expected 'e' in
>encoding and not 's' ". Alek! Is this a known issue with STAX. What do
>you think?
>  
>
please submit description of the problem (including sample XML necessary 
to reproduce it) to StAX issue tracker:
http://www.extreme.indiana.edu/bugzilla/buglist.cgi?product=STAX
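
a repro for such a report might look like the sketch below. it is only a guess at the trigger: the two sample declarations and the class name are invented, and the failing suite files may differ. as far as the XML 1.0 grammar goes (production [23], XMLDecl), 'encoding' has to come before 'standalone', so a declaration with 'standalone' and no 'encoding' is legal, while one with 'standalone' followed by 'encoding' is not well-formed and a parser is right to reject it.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical repro helper; sample documents are invented.
class StandaloneDeclRepro {
    // Legal: 'encoding' is optional and simply omitted here.
    static final String STANDALONE_ONLY =
        "<?xml version=\"1.0\" standalone=\"yes\"?><doc/>";
    // Not well-formed per the XMLDecl production: 'encoding' after 'standalone'.
    static final String STANDALONE_THEN_ENCODING =
        "<?xml version=\"1.0\" standalone=\"yes\" encoding=\"UTF-8\"?><doc/>";

    /** Returns "OK" if the document parses to the end, else "FAILED: " plus the parser message. */
    static String tryParse(String xml) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return "OK";
        } catch (XMLStreamException e) {
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("standalone only:          " + tryParse(STANDALONE_ONLY));
        System.out.println("standalone then encoding: " + tryParse(STANDALONE_THEN_ENCODING));
    }
}
```

if the first document fails on a given parser, that output plus the document is what belongs in the tracker.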

>2. For files in which DTD declaration has right square bracket (']')
>as a literal value of some entity, MXParser is treating it as end of
>DTD declaration.
>  
>
that is the bug that is harder to fix but i plan to look on it (it is 
not important for SOAP as there is no DTD ...)

>3. Some xmls having multi byte characters (UK currency pound sign
>amongst others) are failing to get parsed with typical exception
>messages like only whitespace content allowed before start tag and not
>\ufffd. I have passed a "UTF-8" aware reader to the builder, do I need
>to use something else here?
>  
>
submit how to reproduce it (best if with sample code) to:
http://www.extreme.indiana.edu/bugzilla/buglist.cgi?product=STAX
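
one possible cause worth ruling out before filing: if the file is not actually UTF-8 (the suite has UTF-16 files, for instance) and the bytes are decoded by a Reader fixed to UTF-8, the undecodable bytes turn into \ufffd before the parser ever sees them. the sketch below illustrates that under stated assumptions: the sample document and class name are invented, and it relies on the StAX implementation detecting the encoding itself when handed the raw `InputStream`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative sketch; the sample document is invented.
class EncodingDetectionDemo {
    /** A UTF-16 document (Java prepends a byte order mark) containing the pound sign. */
    static byte[] sampleBytes() {
        return "<?xml version=\"1.0\" encoding=\"UTF-16\"?><p>\u00A3</p>"
            .getBytes(StandardCharsets.UTF_16);
    }

    /** What a Reader hard-wired to UTF-8 would hand to the parser. */
    static String decodedAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Passes raw bytes so the parser can work out the encoding on its own. */
    static String textViaInputStream(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(bytes));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        // The byte order mark is not valid UTF-8, so the first char becomes U+FFFD.
        System.out.println((int) decodedAsUtf8(bytes).charAt(0));
        System.out.println(textViaInputStream(bytes));
    }
}
```

if the failing files really are UTF-8 and still break, then it is a parser problem and the sample belongs in the tracker.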

>4. Apart from these because I couldn't implement the complete DTD info
>set implementation, some more files are failing to get parsed.
>  
>
if need DTDs try different parser - i think woodstox should handle DTDs:
http://woodstox.codehaus.org/

> Regarding the comparison, some of the observed reasons of failures are…
>
>1. Many SYSTEM identifiers in DTD declarations used a relative
>reference and so far we don't have considered 'baseURI' property (does
>STAX parser provide one?) 
>
no

>for any of the elements and hence the XML
>comparator (xmlunit) couldn't resolve the system identifiers thereby
>leading to a mismatch between the serialized xml and the original
>input form.
>  
>
you need to track this in your code (in OM) if it is needed.
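
the resolution step itself is small once the document's own URI is kept around; a sketch using `java.net.URI` follows. the class name, the example.org URLs and the file names are all invented for illustration, and where OM would store the base URI is left open.

```java
import java.net.URI;

// Illustrative sketch; not OM code.
class SystemIdResolver {
    /** Resolves a possibly relative system identifier against the declaring document's URI. */
    static URI resolve(URI documentUri, String systemId) {
        return documentUri.resolve(systemId);
    }

    public static void main(String[] args) {
        URI doc = URI.create("http://example.org/suite/valid/sa/097.xml");
        System.out.println(resolve(doc, "097.ent"));         // sibling of the document
        System.out.println(resolve(doc, "../dtd/doc.dtd"));  // climbs one directory
        System.out.println(resolve(doc, "http://example.org/other.dtd")); // already absolute
    }
}
```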

>2. Also since the DTD support is naïve, the presentation of data is
>completely ignored thereby leading to scenarios like, serializing as
>#PCDATA when DTD says CDATA. This also lead to significant comparison
>failures.
>  
>
not required for non-validating parser (and in particular irrelevant for 
SOAP as DTDs are not allowed)

good work!

alek

-- 
The best way to predict the future is to invent it - Alan Kay

Re: [Axis2] [Update] XMLConformace Testing Report.

Posted by Venkat Reddy <vr...@gmail.com>.

The code, along with the W3C conformance suite files, is checked into
the scratch area under ashu_jaya_venkat.

Good job Jaya !!

-- Venkat


On 4/27/05, Eran Chinthaka <ch...@opensource.lk> wrote:
> Jaya, it seems you have done a tremendous amount of work.
> 
> Where is the code available? I'd like to see it, as it seems
> really interesting. Can you please point me to it?
> 
> Thanks,
> Chinthaka
>

RE: [Axis2] [Update] XMLConformace Testing Report.

Posted by Eran Chinthaka <ch...@opensource.lk>.

Jaya, it seems you have done a tremendous amount of work.

Where is the code available? I'd like to see it, as it seems
really interesting. Can you please point me to it?

Thanks,
Chinthaka

> -----Original Message-----
> From: jayachandra [mailto:jayachandra@gmail.com]
> Sent: Monday, April 25, 2005 7:26 PM
> To: axis-dev@ws.apache.org
> Subject: [Axis2] [Update] XMLConformace Testing Report.