Posted to j-users@xerces.apache.org by Jan Dvorak <ja...@mathan.cz> on 2002/06/20 13:35:43 UTC

Error reporting from XML Schema and from Schematron (long)

Hello all,

I'm facing the following problem, and I'd very much appreciate comments from 
people on this list. I apologize for the long post.

We have constructed an XML Schema and a Schematron schema that both together 
constrain the data we accept into an information system. Technically 
speaking, it works great. We have an extensive suite of XML inputs and 
corresponding expected error reports and we use these to test that the 
checker does what it's supposed to do.

Where this fails is in the error reports from the XML Schema validation. In 
the Schematron reports we can speak in terms of the problem domain (data 
about scientific projects, their participants and the financial support 
thereof) - we write the messages ourselves. So we can e.g. report that a 
project should specify the date when it started. That's understandable to all 
our users and we can provide all useful diagnostics as to where the problem 
is located. However, if we place this constraint in the XML Schema, all we 
get is a cvc-something error report that says that the content of element 
'lifecycle' doesn't match its model. This is accompanied by line and column 
numbers. In this form, our users find it pretty much indigestible.

The first idea I had was to run away from XML Schema, to place all 
constraints in the Schematron schema. There might even be a way to 
automatically generate the Schematron constraints from an XML Schema, 
where we might be able to adjust the violation report texts. If we are sure 
all constraints from the XML Schema are moved to Schematron, we could skip 
the XML Schema validation step.

However, moving all the constraints to Schematron would increase the 
number of assertions from some 800 to some 6000 (est.) and that's a level of 
complexity neither we, nor our customer can afford. We might also face 
performance problems.

The feasible way out seems to be to gradually add checks to the 
Schematron schema and report violations there. We'll start with the most 
frequent ones, and continue with those where the error reports are especially 
cryptic. Throughout the process, we need to be sure at every moment that no 
error goes unreported. We might report an error twice, but then a simple 
correction (suppressing the report from the XML Schema validator) should take 
care of that. In the end, we might find that something like 30% of the 
constraints have moved to Schematron.

Now, we need to selectively suppress those XML Schema violations that will be 
reported by Schematron. We can't move whole XML Schema constraint types one by 
one. Each move will concern a constraint type in a specific context (of an 
element type, or of an XML Schema type).

For that, we could use a common way of locating errors. I'm afraid that 
getting the physical locations from Schematron is too difficult a task and 
the result might not quite match the physical locations by Xerces. On the 
other hand, Schematron can reliably produce 'logical' locations, something 
like 'canonical XPath' to the node where the violation occurred. E.g. 
'/root/a[1]/b[23]' meaning the 23rd 'b' child of the first 'a' child of 
'root'. (Things are more difficult in the presence of namespaces, but still 
tractable.)
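
To illustrate the 'canonical XPath' idea, here is a minimal sketch of a SAX handler that keeps track of the logical location while parsing. It is not part of Xerces; the class name CanonicalPathTracker and its methods are hypothetical. It assumes that the root step carries no index (as in '/root/a[1]/b[23]') and it uses the raw qualified name, so namespace handling is left out.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: tracks the canonical path of the element being parsed. */
class CanonicalPathTracker extends DefaultHandler {
    // One counter map per open node: child name -> occurrences seen so far.
    private final Deque<Map<String, Integer>> childCounts = new ArrayDeque<>();
    // One path step per open element, e.g. "a[1]".
    private final Deque<String> steps = new ArrayDeque<>();
    // Paths of all elements in document order (kept only for the demo).
    final List<String> visited = new ArrayList<>();

    @Override
    public void startDocument() {
        childCounts.clear();
        steps.clear();
        visited.clear();
        childCounts.push(new HashMap<>()); // counters for the document node
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Position of this element among same-named siblings (1-based).
        int index = childCounts.peek().merge(qName, 1, Integer::sum);
        // Assumption: the root step is written without an index.
        steps.addLast(steps.isEmpty() ? qName : qName + "[" + index + "]");
        childCounts.push(new HashMap<>()); // counters for this element's children
        visited.add(currentPath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        childCounts.pop();
        steps.removeLast();
    }

    /** Canonical path of the innermost open element. */
    String currentPath() {
        return "/" + String.join("/", steps);
    }

    /** Parses a document held in a string and returns the path of every element. */
    static List<String> pathsOf(String xml) {
        CanonicalPathTracker tracker = new CanonicalPathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
        return tracker.visited;
    }
}
```

For the input <root><a><b/><b/></a><a/></root>, pathsOf yields /root, /root/a[1], /root/a[1]/b[1], /root/a[1]/b[2] and /root/a[2]. A validator would have to consult currentPath() at the moment it reports an error for this to be useful.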

How difficult would it be to extend Xerces to:
 (i) Produce 'logical' locations in terms of 'canonical' XPaths
     as described above.
 (ii) Pass these locations to XMLErrorReporter.
Then I could set up a filtering XMLErrorReporter that would let me gradually 
move violation reports from XML Schema to Schematron. 
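
The filtering idea could look roughly like the sketch below. It uses the standard SAX ErrorHandler interface as a stand-in, because the XMLErrorReporter extension asked about above does not exist. The class SuppressingErrorHandler, the "code@context" key format, and the path supplier are all hypothetical, and the sketch assumes that the validator's message starts with the constraint code followed by a colon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical filter: drops XML Schema errors that Schematron already reports. */
class SuppressingErrorHandler implements ErrorHandler {
    // Entries look like "cvc-complex-type.2.4.b@/project/lifecycle" (assumed format).
    private final Set<String> coveredBySchematron;
    // Supplies the canonical path of the element being validated (assumed to exist).
    private final Supplier<String> currentPath;
    // Errors that passed the filter, for later presentation to the user.
    final List<String> reported = new ArrayList<>();

    SuppressingErrorHandler(Set<String> coveredBySchematron, Supplier<String> currentPath) {
        this.coveredBySchematron = coveredBySchematron;
        this.currentPath = currentPath;
    }

    /** Assumption: the message has the form "code: text". */
    static String errorCode(String message) {
        int colon = (message == null) ? -1 : message.indexOf(':');
        return colon < 0 ? "" : message.substring(0, colon).trim();
    }

    /** Turns /root/a[1]/b[23] into /root/a/b, i.e. the element-type context. */
    static String context(String path) {
        return path.replaceAll("\\[\\d+\\]", "");
    }

    @Override
    public void warning(SAXParseException e) {
        reported.add(e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        String path = currentPath.get();
        String key = errorCode(e.getMessage()) + "@" + context(path);
        if (!coveredBySchematron.contains(key)) {
            reported.add(path + ": " + e.getMessage());
        }
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        throw e; // never suppress well-formedness problems
    }
}
```

Keying on the constraint code plus the index-free path matches the requirement that a constraint type is always moved in a specific context. An error that Schematron also reports is then dropped here, while everything else is still reported exactly once.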

Is there a better way to achieve our goal?


Jan Dvorak
MathAn Praha

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-user-help@xml.apache.org

Re: Error reporting from XML Schema and from Schematron (long)

Posted by Jan Dvorak <ja...@mathan.cz>.

Hi Eddie/Rick,

I did a few experiments with the locations Xerces gives with errors, 
and I concluded they weren't reliable enough. 
Our users won't peek around in the XML file. 
That's why I'm more inclined towards the path location. 
That one is much closer to Schematron, anyway. 

Jan

Eddie Robertsson wrote:
> >For that, we could use a common way of locating errors. I'm afraid that
> >getting the physical locations from Schematron is too difficult a task and
> >the result might not quite match the physical locations by Xerces.
>
> There are two issues here:
>  1) How accurate are the line numbers generated by Xerces?
>  2) How to get Schematron to report line numbers
>
> For the first, Xerces does not seem to distinguish between
> errors in markup and errors in values enough.  So if an
> element with text content does not match a datatype, the
> error will be reported as occurring at the close of
> the end-tag for that element (something like that, anyway).
>
> Given that problem, merging errors is difficult.
> 
> For the second: if you use XT and schematron-message.xsl,
> you can get line numbers.  We had to customize ours to
> get column numbers as well, but it is straightforward.
>
> Cheers
> Rick Jelliffe

Re: Error reporting from XML Schema and from Schematron (long)

Posted by Eddie Robertsson <er...@allette.com.au>.

>
>
>For that, we could use a common way of locating errors. I'm afraid that 
>getting the physical locations from Schematron is too difficult a task and 
>the result might not quite match the physical locations by Xerces.
>

There are two issues here:
 1) How accurate are the line numbers generated by Xerces?
 2) How to get Schematron to report line numbers

For the first, Xerces does not seem to distinguish between 
errors in markup and errors in values enough.  So if an
element with text content does not match a datatype, the
error will be reported as occurring at the close of
the end-tag for that element (something like that, anyway).

Given that problem, merging errors is difficult. 

For the second: if you use XT and schematron-message.xsl,
you can get line numbers.  We had to customize ours to
get column numbers as well, but it is straightforward. 

Cheers
Rick Jelliffe




Re: Error reporting from XML Schema and from Schematron (long)

Posted by Jan Dvorak <ja...@mathan.cz>.

Great! So there's a real need for the thing!

Representing errors (schema violations) as objects looks like the way it 
should be. Obviously, different errors will have different sets of data (the 
arguments to the XMLErrorReporter.reportError() call). If we had a DOM builder 
(W3C DOM, JDOM, dom4j, ...) integrated in Xerces (via XNI), we could even 
return the node as an object.

Jan

Torsten Curdt wrote:
>
> God loves me - someone else wants exactly the same as I want :-))
> ...almost...
>
> I was always wondering why the xpath isn't passed with errors!!
>
> But I'd like to go even a little further... I'd like to have some kind of
> error object/facet, whatever, which tells what failed not only as a
> human-readable message... so one can process it programmatically....
> --
> Torsten


Re: Error reporting from XML Schema and from Schematron (long)

Posted by Torsten Curdt <tc...@dff.st>.

<snip/>

> For that, we could use a common way of locating errors. I'm afraid that
> getting the physical locations from Schematron is too difficult a task and
> the result might not quite match the physical locations by Xerces. On the
> other hand, Schematron can reliably produce 'logical' locations, something
> like 'canonical XPath' to the node where the violation occurred. E.g.
> '/root/a[1]/b[23]' meaning the 23rd 'b' child of the first 'a' child of
> 'root'. (Things are more difficult in the presence of namespaces, but still
> tractable.)
>
> How difficult would it be to extend Xerces to:
>  (i) Produce 'logical' locations in terms of 'canonical' XPaths
>      as described above.
>  (ii) Pass these locations to XMLErrorReporter.
> Then I could set up a filtering XMLErrorReporter that would let me
> gradually move violation reports from XML Schema to Schematron.

God loves me - someone else wants exactly the same as I want :-))
...almost...

I was always wondering why the xpath isn't passed with errors!!

But I'd like to go even a little further... I'd like to have some kind of error 
object/facet, whatever, which tells what failed not only as a human-readable 
message... so one can process it programmatically....
--
Torsten


How to get a vector structure?

Posted by Marcial Atienzar <ma...@servicom2000.com>.

Hello everybody,

	Many thanks for all your responses. I have another question. I want to
take a vector from an application, put it into the XML, and generate the HTML
with the XSL. Do I have to repeat the element for every entry in the vector, or
is there a structure in Xerces for passing objects to the XML?

	In other words:

		I have this:

			<node>value</node>
			<node>value</node>
			<node>value</node>
			<node>value</node>
			<node>value</node>

		And I want this or something similar:

			<node>Object vector</node>


	Is it possible? Many thanks, and excuse my poor English.

	Marcial



Re: Error reporting from XML Schema and from Schematron (long)

Posted by Jan Dvorak <ja...@mathan.cz>.

Rick,

I agree with you: I think pretty much any application has to rewrite the 
messages of any XML Schema validator, so that they speak the language of the 
problem domain, rather than that of the XML Schema spec.

Plus, applications might have a need to refine the errors. Violating a 
constraint in the context of one element (or type) can mean something quite 
different than violating the same constraint in the context of another 
element (or type), and there should be a way to distinguish these. And there 
can be different causes of a content model mismatch...
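
As a rough illustration of such context-dependent rewriting, the sketch below maps a pair of constraint code and element name to a message in problem-domain terms. The class DomainMessages, the "code@element" key, and the sample texts are hypothetical; how the code and the element name are obtained from the validator is deliberately left open.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup: validator error code + element context -> domain message. */
class DomainMessages {
    // Keys are "code@element"; "code@*" is the fallback for any element.
    private final Map<String, String> texts = new HashMap<>();

    /** Registers a domain-level text for a constraint code in one element context. */
    DomainMessages when(String code, String element, String text) {
        texts.put(code + "@" + element, text);
        return this;
    }

    /** Picks the most specific text; falls back to the validator's own message. */
    String rewrite(String code, String element, String validatorMessage) {
        String specific = texts.get(code + "@" + element);
        if (specific != null) {
            return specific;
        }
        String generic = texts.get(code + "@*");
        return generic != null ? generic : validatorMessage;
    }
}
```

With an entry such as when("cvc-complex-type.2.4.b", "lifecycle", "A project should specify the date when it started."), the same constraint code yields a different text for 'lifecycle' than for any other element, and unmapped errors keep the original wording.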

Rick Jelliffe wrote:

> There has been talk of adding XPath to the locators used in
> SAX.  That would be a great idea.  Line numbers are
> useful sometimes, and paths are useful at others, so IMHO we
> need a SAX infrastructure that can provide either.

Yes.
XNI has the mechanism of augmentations.
Perhaps that's the way for SAX as well?
Is there a revision of SAX planned?

> Personally, I think an error object should be able to provide
>   - file/line/character number
Of the real problem spot, if that can at all be defined.
>   - XPath
Yes!
>   - severity indicator
>   - sender ID
>   - nickname or error-code
>   - single line overview
>   - multiline diagnostic, XML
>   - icon for that error
>   - URL for see also
>   - unique ID for keying a repair method
>   - unique ID for diagnostic generating function
Yes!!!

An XML Schema validator alone will never be able to provide all of this, 
unless it is given very detailed instructions.

> This would support Schematron and XSD well.
Yes, errors from the two should be unified.

> My company has also been using Xerces-J as well as
> Schematron (and also RELAX NG and DTDS) in
> an editor product now in beta testing.
>
> I had to rewrite almost all the Xerces error messages
> because they were incomprehensible to end-users.
> (I don't know if it is worthwhile contributing these,
> because some of them are specific to our system
> or leave out diagnostics of errors that cannot happen
> for us.)

> One improvement that I found useful was to first
> classify all errors as either document errors
> or schema errors. At the moment everything is
> mixed together, and a layman has no way of
> knowing whether the document is bad or the
> schema is bad. So the first thing I did was
> to prefix all schema errors with (Schema error).
> Then I rewrote all the other errors for end-users,
> in product specific terms.

I used the IBM Schema Quality Checker to debug my schema, so I didn't 
experience any schema errors from Xerces. It was of great utility to me, as I 
was learning XML Schema in the process.

> Actually, I do tend to think that one should always
> expect to rewrite error messages for a particular
> system.  But for Xerces' case, it would be nice
> if the messages were a little less programmer
> oriented in the first place.

Or if there were a mechanism to customize them, with a clean interface.
The 'Validation errors' thread on this list has a discussion of exactly these 
matters.

> The two worst offenders are:
>  1) Error messages relating to the DOCTYPE
> declaration.  A missing system identifier in
> the DOCTYPE declaration is diagnosed as
> being caused by a missing space.  If there is
> no entity, then IIRC the user gets a message
> to the effect that there is  an error in "null".
> Problems that occur before the perceived
> start of the document are very off-putting.

Yes, that's a pain.

>  2) The XSD error messages.  These are
> fairly poor: you have to learn to ignore
> the reference to the XSD outcome code
> and the parenthetic content models at
> the end.

... and the rest is not particularly clear either.
It is my impression that Xerces-J-2 moved towards even more technical language.

I'm not saying this is Xerces' fault. 
It's doing its job and is doing it great. 
Just the real life use of the tool takes some adaptation. 

> Finally, on the issue of migrating from
> XSD to Schematron.  One thing that may
> be helpful is Francis Norton's typeTagger.
> This is an XSLT stylesheet that adds
> xsi:type attributes to a document, based
> on an XSD schema.  So you can continue
> to describe your basic structures and
> datatypes using XSD, but give the
> Schematron access to that typing
> information:
>   <rule context="*[@xsi:type='address']">
>     <assert test="*[@xsi:type='street']"
>     >A <name/> is an address, and so
>    it needs some kind of street, for example &lt;strasse>.</assert>
>   </rule>

Sounds interesting. 
Also, type information should be available in XPath 2.0.

> Cheers
> Rick Jelliffe

Jan Dvorak


Re: Error reporting from XML Schema and from Schematron (long)

Posted by Eddie Robertsson <er...@allette.com.au>.

Hi Jan,

Rick Jelliffe asked me to forward the following to this list since he is 
a non-subscriber:

-----------------------------------8<---------------------------------------

There has been talk of adding XPath to the locators used in
SAX.  That would be a great idea.  Line numbers are 
useful sometimes, and paths are useful at others, so IMHO we
need a SAX infrastructure that can provide either.

Personally, I think an error object should be able to provide
  - file/line/character number
  - XPath
  - severity indicator
  - sender ID
  - nickname or error-code
  - single line overview
  - multiline diagnostic, XML
  - icon for that error
  - URL for see also
  - unique ID for keying a repair method
  - unique ID for diagnostic generating function

This would support Schematron and XSD well.
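
A minimal sketch of such an error object is given below, with one field per item in the list above. Nothing like this exists in SAX or Xerces; the class name ValidationIssue, the field names, and the choice of which fields are required are assumptions made for illustration only.

```java
/** Hypothetical error object shared by XSD and Schematron reports. */
class ValidationIssue {
    enum Severity { WARNING, ERROR, FATAL }

    // Required: how bad it is, who reports it, and what it is.
    final Severity severity;
    final String senderId;   // e.g. "xsd" or "schematron"
    final String code;       // nickname or error code
    final String overview;   // single-line overview

    // Optional: null (or -1) means "not available".
    String systemId;              // file
    int line = -1;
    int column = -1;
    String xpath;                 // logical location
    String diagnosticXml;         // multi-line diagnostic, as XML
    String iconName;              // icon for that error
    String seeAlsoUrl;            // URL for "see also"
    String repairId;              // key for a repair method
    String diagnosticFunctionId;  // key for a diagnostic-generating function

    ValidationIssue(Severity severity, String senderId, String code, String overview) {
        this.severity = severity;
        this.senderId = senderId;
        this.code = code;
        this.overview = overview;
    }

    /** Sets the logical location. */
    ValidationIssue at(String xpath) {
        this.xpath = xpath;
        return this;
    }

    /** Sets the physical location. */
    ValidationIssue at(String systemId, int line, int column) {
        this.systemId = systemId;
        this.line = line;
        this.column = column;
        return this;
    }

    /** One-line rendering that prefers the path and falls back to line/column. */
    String oneLine() {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(severity).append("] ")
          .append(senderId).append(' ').append(code).append(": ").append(overview);
        if (xpath != null) {
            sb.append(" (at ").append(xpath).append(')');
        } else if (line >= 0) {
            sb.append(" (line ").append(line).append(", column ").append(column).append(')');
        }
        return sb.toString();
    }
}
```

Because both the path and the line/column position are optional, a Schematron report can fill in only the XPath while an XSD report fills in only the physical position, and both can still be listed together.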

My company has also been using Xerces-J as well as
Schematron (and also RELAX NG and DTDs) in
an editor product now in beta testing.   

I had to rewrite almost all the Xerces error messages
because they were incomprehensible to end-users.
(I don't know if it is worthwhile contributing these,
because some of them are specific to our system
or leave out diagnostics of errors that cannot happen
for us.)

One improvement that I found useful was to first
classify all errors as either document errors
or schema errors. At the moment everything is
mixed together, and a layman has no way of 
knowing whether the document is bad or the 
schema is bad. So the first thing I did was
to prefix all schema errors with (Schema error).
Then I rewrote all the other errors for end-users,
in product specific terms.
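
The classification step might be sketched as follows. This is not the code used in the product described above. It assumes that a message starts with an XML Schema constraint identifier followed by a colon, and it relies on the naming used in the XML Schema specification, where validation rules are named cvc-... and constraints on the schema itself have other prefixes (src-, cos- and so on). The class ErrorClassifier and the sample messages are hypothetical.

```java
/** Hypothetical classifier: is the document bad, or is the schema bad? */
class ErrorClassifier {
    enum Kind { DOCUMENT, SCHEMA }

    /** Returns the leading constraint identifier, or "" if there is none. */
    static String code(String message) {
        int colon = message.indexOf(": ");
        if (colon <= 0) {
            return "";
        }
        String token = message.substring(0, colon);
        // A code is a single token containing at least one of '-', '_' or '.'.
        boolean looksLikeCode = token.matches("[A-Za-z0-9_.-]+") && token.matches(".*[-_.].*");
        return looksLikeCode ? token : "";
    }

    /** Messages without a code, or with a cvc- code, are about the document. */
    static Kind classify(String message) {
        String code = code(message);
        return (code.isEmpty() || code.startsWith("cvc-")) ? Kind.DOCUMENT : Kind.SCHEMA;
    }

    /** Prefixes schema errors so that a layman can tell the two kinds apart. */
    static String label(String message) {
        return classify(message) == Kind.SCHEMA ? "(Schema error) " + message : message;
    }
}
```

A message beginning with "src-resolve: " would come out prefixed with "(Schema error) ", while a cvc- message or a plain well-formedness message is passed through unchanged and can then be rewritten for end users.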

Actually, I do tend to think that one should always
expect to rewrite error messages for a particular
system.  But for Xerces' case, it would be nice
if the messages were a little less programmer
oriented in the first place.

The two worst offenders are:
 1) Error messages relating to the DOCTYPE 
declaration.  A missing system identifier in
the DOCTYPE declaration is diagnosed as
being caused by a missing space.  If there is
no entity, then IIRC the user gets a message
to the effect that there is  an error in "null".  
Problems that occur before the perceived
start of the document are very off-putting.

 2) The XSD error messages.  These are
fairly poor: you have to learn to ignore
the reference to the XSD outcome code
and the parenthetic content models at
the end. 

Finally, on the issue of migrating from
XSD to Schematron.  One thing that may
be helpful is Francis Norton's typeTagger.
This is an XSLT stylesheet that adds 
xsi:type attributes to a document, based
on an XSD schema.  So you can continue
to describe your basic structures and
datatypes using XSD, but give the
Schematron access to that typing 
information:
  <rule context="*[@xsi:type='address']">
    <assert test="*[@xsi:type='street']"
    >A <name/> is an address, and so
   it needs some kind of street, for example &lt;strasse>.</assert>
  </rule>

Cheers
Rick Jelliffe

-----------------------------------8<---------------------------------------
