Posted to j-users@xerces.apache.org by cbowditch <bo...@hotmail.com> on 2007/02/06 14:56:38 UTC

Well Formed Checking

Hi,

we are using Xerces in an application to parse very large XML files using
SAX. In an older version of our application we are using Xerces 2.2.1. If
there is an error in the XML being parsed, i.e. it is not well formed, then
Xerces generates SAX events up to the point of the error. However, in a newer
version of our application we use Xerces 2.7.1, and that throws an
exception before generating any SAX events if there is an error a few KB
inside the XML.

Is this behaviour change to be expected? Is there any way I can control
whether Xerces parses up to the point of invalid XML or not? I looked in the
list of features and properties but couldn't see anything that would help.
continue-on-fatal-error isn't quite what I'm looking for.

Thanks,

Chris
-- 
View this message in context: http://www.nabble.com/Well-Formed-Checking-tf3180736.html#a8826373
Sent from the Xerces - J - Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: Well Formed Checking

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Chris,

cbowditch <bo...@hotmail.com> wrote on 02/07/2007 11:44:31 AM:

> Hi Michael,
> 
> thanks for the reply. I am quite sure there isn't an error earlier in the
> file. The error message provided by both versions of Xerces is the same. The
> XML structure has this opening sequence:
> 
> <Letters>
>    <dataset type="sample" effdate="2002-10-01">
>       <channel channeldest="PDF"/>
>       <customer>
>          <name>
> 
> There are several dataset elements with a complex structure which are all
> well formed until later in the XML we have:
> 
>       </customer>
>    </dataset>
>          <name>
> 
> So the opening dataset and customer elements are missing. Xerces reports the
> error
> 
> The element type "Letters" must be terminated by the matching end-tag
> "</Letters>".

I really doubt this error is being reported before all of the document 
information preceding it. Xerces doesn't work like that.

> which is clearly referring to the root element. Maybe that's why the newer
> version stops at the opening tag of the root element, but the well-formedness
> problem is much later in the file.

If you still think you're really seeing this can you post a small 
(complete) test case which demonstrates this behaviour?
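
A test case of the kind requested here could look roughly like the sketch
below. It is only an illustration under stated assumptions: the class name,
the method name and the sample document are made up (the document is modelled
on the fragments quoted above), and the parser is obtained through JAXP's
SAXParserFactory, so it exercises whichever SAX parser is first on the
classpath rather than a specific Xerces release. It counts the startElement
callbacks delivered before the fatal error is thrown.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class EventsBeforeError {

    // Hypothetical sample: one complete dataset, then stray end-tags
    // whose start-tags are missing (as described in the thread).
    static final String XML =
          "<Letters>\n"
        + "  <dataset type=\"sample\" effdate=\"2002-10-01\">\n"
        + "    <channel channeldest=\"PDF\"/>\n"
        + "    <customer><name>A</name></customer>\n"
        + "  </dataset>\n"
        + "    </customer>\n"
        + "  </dataset>\n"
        + "</Letters>\n";

    // Parses the given text and returns how many startElement events
    // were delivered, whether or not parsing ended with a fatal error.
    static int countStartElementsBeforeError(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
            // DefaultHandler.fatalError rethrows the exception, so a
            // well-formedness error ends the parse.
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Report where the parser detected the problem.
            System.out.println("Fatal error at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage());
        }
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        int n = countStartElementsBeforeError(XML);
        System.out.println("startElement events before error: " + n);
    }
}
```

If the count printed is greater than zero, events were delivered before the
error was reported. With the sample above, the five start-tags that precede
the stray end-tag should all be seen by the handler. Running the same class
with each Xerces version on the classpath would show whether the two releases
really differ.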

> Thanks,
> 
> Chris

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


Re: Well Formed Checking

Posted by ke...@us.ibm.com.
Are you sure this isn't just a change in the error message? The document
contains an unexpected end-tag (because the begin-tag is missing, but
there's no way the lexer can determine that). The current message tells you
what end-tag was expected rather than which one was found, but it's
correct.

______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)

Re: Well Formed Checking

Posted by cbowditch <bo...@hotmail.com>.
Hi Michael,

thanks for the reply. I am quite sure there isn't an error earlier in the
file. The error message provided by both versions of Xerces is the same. The
XML structure has this opening sequence:

<Letters>
	<dataset type="sample" effdate="2002-10-01">
		<channel channeldest="PDF"/>
		<customer>
			<name>

There are several dataset elements with a complex structure which are all
well formed until later in the XML we have:

		</customer>
	</dataset>
			<name>

So the opening dataset and customer elements are missing. Xerces reports the
error

The element type "Letters" must be terminated by the matching end-tag
"</Letters>".

which is clearly referring to the root element. Maybe that's why the newer
version stops at the opening tag of the root element, but the well-formedness
problem is much later in the file.

Thanks,

Chris


Michael Glavassevich wrote:
> 
> Hi Chris,
> 
> Xerces will report SAX events up until the point it detects a 
> well-formedness error. Are you sure there isn't a problem earlier in the 
> document which wasn't being detected with the older (buggy) version of 
> Xerces?
> 
> Thanks.
> 
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: mrglavas@ca.ibm.com
> E-mail: mrglavas@apache.org
> 
> cbowditch <bo...@hotmail.com> wrote on 02/06/2007 08:56:38 AM:
> 
>> Hi,
>> 
>> we are using Xerces in an application to parse very large XML files using
>> SAX. In an older version of our application we are using Xerces 2.2.1. If
>> there is an error in the XML being parsed, i.e. it is not well formed, then
>> Xerces generates SAX events up to the point of the error. However, in a
>> newer version of our application we use Xerces 2.7.1, and that throws an
>> exception before generating any SAX events if there is an error a few KB
>> inside the XML.
>> 
>> Is this behaviour change to be expected? Is there any way I can control
>> whether Xerces parses up to the point of invalid XML or not? I looked in the
>> list of features and properties but couldn't see anything that would help.
>> continue-on-fatal-error isn't quite what I'm looking for.
>> 
>> Thanks,
>> 
>> Chris

-- 
View this message in context: http://www.nabble.com/Well-Formed-Checking-tf3180736.html#a8849083
Sent from the Xerces - J - Users mailing list archive at Nabble.com.




Re: Well Formed Checking

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hi Chris,

Xerces will report SAX events up until the point it detects a 
well-formedness error. Are you sure there isn't a problem earlier in the 
document which wasn't being detected with the older (buggy) version of 
Xerces?

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

cbowditch <bo...@hotmail.com> wrote on 02/06/2007 08:56:38 AM:

> Hi,
> 
> we are using Xerces in an application to parse very large XML files using
> SAX. In an older version of our application we are using Xerces 2.2.1. If
> there is an error in the XML being parsed, i.e. it is not well formed, then
> Xerces generates SAX events up to the point of the error. However, in a
> newer version of our application we use Xerces 2.7.1, and that throws an
> exception before generating any SAX events if there is an error a few KB
> inside the XML.
> 
> Is this behaviour change to be expected? Is there any way I can control
> whether Xerces parses up to the point of invalid XML or not? I looked in the
> list of features and properties but couldn't see anything that would help.
> continue-on-fatal-error isn't quite what I'm looking for.
> 
> Thanks,
> 
> Chris