Posted to dev@cocoon.apache.org by Ivelin Ivanov <iv...@iname.com> on 2002/03/28 06:55:31 UTC

Re: [Schematron-love-in] Re: [Announcement] Fast Schematron Validation Here !

I'm cross-posting this email to the Schematron and Cocoon dev lists because
we're discussing problems of common interest.


----- Original Message -----
From: "Rick Jelliffe" <ri...@allette.com.au>
To: <sc...@lists.sourceforge.net>
Sent: Wednesday, March 27, 2002 11:38 PM
Subject: Re: [Schematron-love-in] Re: [Announcement] Fast Schematron Validation Here !


From: "Ivelin Ivanov" <iv...@iname.com>

>> A question was brought up on the Cocoon dev list.
>>
>> Can the phases tag be kept separate from the schema?

>Sure.

>All specific processing semantics of Schematron are implementation-dependent:
>what happens when an assertion fails, which phases are active, which elements
>are being tested, in which order information items are traversed, etc.

>If you want to have externally specified phases or to dynamically select
>which patterns to run, that is fine.

>Once you get inside a pattern, it is a little different: you cannot
>arbitrarily run rules as they are, because they are lexically related. So if you have
>
>  <pattern>
>    <rule context="c1">
>      ...
>    </rule>
>    <rule context="c2">
>      ...
>    </rule>
>  </pattern>
>
>and you wanted to run the second rule only against a particular information
>item, the actual context is
>
>    not(c1) and c2
>
>which requires more testing than people may expect.
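The combined context follows from how rules in one pattern interact: a node is checked only by the first rule, in lexical order, whose context matches it, so a later rule never sees nodes that an earlier rule has already claimed. A minimal Java sketch of that behavior follows. It assumes a rule context can be reduced to a predicate over an element name; `PatternRule` and `PatternMatcher` are hypothetical helpers, not part of any Schematron implementation, and real contexts are XPath patterns.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical rule model: the XPath context is reduced to a predicate
// over the element name, which is enough to show rule ordering.
final class PatternRule {
    final String id;
    final Predicate<String> context;

    PatternRule(String id, Predicate<String> context) {
        this.id = id;
        this.context = context;
    }
}

final class PatternMatcher {
    private PatternMatcher() {}

    // Rules are tried in lexical order and the first match wins, so at most
    // one rule of the pattern fires for a given node.
    static Optional<String> firingRule(List<PatternRule> pattern, String node) {
        for (PatternRule rule : pattern) {
            if (rule.context.test(node)) {
                return Optional.of(rule.id);
            }
        }
        return Optional.empty();
    }

    // Effective context of rule k when run on its own:
    // not(c_0) and ... and not(c_{k-1}) and c_k.
    static boolean effectiveContext(List<PatternRule> pattern, int k, String node) {
        for (int j = 0; j < k; j++) {
            if (pattern.get(j).context.test(node)) {
                return false; // an earlier rule already claims this node
            }
        }
        return pattern.get(k).context.test(node);
    }
}
```

With a first rule for `title` and a catch-all second rule (like `context="*"`), a `title` node fires only the first rule, which is why running the second rule in isolation needs the extra `not(c1)` test.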

>The lack of semantics is why I try to encourage people to make general
>statements in <assert> elements ("An X should have a Y") rather than
>"Error: you are hopeless, why don't you quit". The <diagnostic> element
>is provided for that.

>> If the underlying model doesn't change and the full set of patterns is the same,
>> then when adding support for new devices, wizards, etc. to build a document
>> instance, the rules for partial validation should be separate from the
>> description of the model.

>Sorry, I don't understand this sentence: what do you mean by "devices" here?

I mean the browser or client. Different browsers (PC, PDA, cell phone, etc.) may
support different human interfaces, and therefore the document may be split
into different pieces which are gathered and put together at the end.
The validation of the pieces at each stage is device/client dependent.
Is the question clearer now?

>> Does this question make sense? What do you suggest?

>I suggest people experiment and do whatever they can to get their jobs done
>and make life simpler and richer :-)  The point of Schematron is not to
>make a monolithic, ultimate validation system, but to provide a toolkit
>and a different vocabulary to help people solve some big practical problems
>with minimal fuss, using technology that places human language at the centre
>(rather than at the periphery, in "documentation" elements). If you come
>up with some nice new way to use the statements in a Schematron schema,
>you will only get respect.

>And please pass on to the Cocoon people that if they have ideas for
>abstractions or hooks that might enhance Schematron, please feel free
>to prototype them and let us know.

We sure will.


Cheers,

Ivelin


>
>Cheers
>Rick Jelliffe



_______________________________________________
Schematron-love-in mailing list
Schematron-love-in@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/schematron-love-in


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Re: Abstract Schemas APIs

Posted by Ivelin Ivanov <iv...@iname.com>.
Rick,

I fully support extensions 1, 2, 3 & 6.

Actually, Torsten is putting together a paper summarizing the extensions
that would be of interest to our group. We plan to send it for review to
representatives of XML Schema, RELAX NG (James Clark), MSV and JARV
(Kohsuke), and Schematron (yourself).

Regards,

Ivelin



----- Original Message -----
From: "Rick Jelliffe" <ri...@allette.com.au>
To: "Ivelin Ivanov" <iv...@sbcglobal.net>
Cc: <co...@xml.apache.org>; <sc...@lists.sourceforge.net>
Sent: Monday, April 01, 2002 8:03 PM
Subject: Abstract Schemas APIs


From: "Ivelin Ivanov" <iv...@iname.com>

> Have you been following the discussion with Kohsuke on a possible JARV
> integration?
> If you have had a chance to look at the JARV API, my source code and probably
> Torsten's API, maybe you can elaborate on a possible higher-level validation
> API which would encompass multiple schemas.

I posted some thoughts to XML-DEV at
  http://lists.xml.org/archives/xml-dev/200203/msg01179.html
which I will be sending to the DOM WG.

In general, Locators or SAXExceptions should be extended to
    1) carry paths as well as file/line/column information
    2) carry HTML (or XML) text for richer messages in addition to plain-text messages
    3) carry some kind of user-defined status constant apart from the basic
    Warning, Error, etc. types
    4) carry enough information to allow repair of the document
    5) carry enough information to interrogate the current parsing context(s)
    of the document at that failure point
    6) tell which parsing/validation system generated the failure
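As a rough illustration of extensions 1, 2, 3 and 6, here is a hypothetical Java subclass of `org.xml.sax.SAXParseException`. The extra fields (`xpath`, `htmlMessage`, `status`, `validatorId`) are assumptions made for this sketch and do not exist in SAX; only the five-argument `SAXParseException` constructor and the inherited line, column and system-id getters are standard.

```java
import org.xml.sax.SAXParseException;

// Hypothetical SAXParseException subclass sketching extensions 1, 2, 3 and 6.
// None of the extra fields below exist in the SAX API.
class RichValidationException extends SAXParseException {
    private static final long serialVersionUID = 1L;

    private final String xpath;        // 1) path to the offending node
    private final String htmlMessage;  // 2) richer message as (X)HTML markup
    private final String status;       // 3) user-defined status beyond warning/error
    private final String validatorId;  // 6) which validator reported the problem

    RichValidationException(String message, String systemId, int line, int column,
                            String xpath, String htmlMessage,
                            String status, String validatorId) {
        // Standard SAX constructor: message, publicId, systemId, line, column.
        super(message, null, systemId, line, column);
        this.xpath = xpath;
        this.htmlMessage = htmlMessage;
        this.status = status;
        this.validatorId = validatorId;
    }

    String getXPath() { return xpath; }
    String getHtmlMessage() { return htmlMessage; }
    String getStatus() { return status; }
    String getValidatorId() { return validatorId; }
}
```

A validator could report such an exception through the usual `ErrorHandler` callbacks, which take a `SAXParseException`; handlers that know nothing about the subclass still see an ordinary SAX error.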

Some of these are pretty easy to do, but some, like 5), would probably need
a ground-up redesign, since I don't expect validators were designed to allow
snapshots of their state!

We have been using Xerces-J in our editor for external validation, and
the problems we have had with it are:
    1) locator positions for XML Schema errors are reported too far from the
    actual incident: for example, if a required element is missing, the
    locator seems to point at the end-tag of the parent.
    2) Xerces does not give the user enough control to turn different kinds of
    validation and WF checking on and off according to the user's interest.
    When checking fragments, it is useless to get error messages relating
    to IDREF and keyref, for example.
    3) This even extends to WF checking. As a parser feature, it would
    be good to allow unrooted documents, to allow truncated documents
    that are missing some end-tags at the end of the document, or to
    match start- and end-tags case-insensitively: this
    would allow much more flexible validation.
    4) The regular expression bug has a known fix, but AFAIK it has never
    been incorporated. I don't see how any XML Schemas datatypes
    can be reliable without it.
    5) When we pass a non-WF document with multiple roots to Xerces with the
    continue-after-error feature enabled, we get an out-of-memory error,
    which is out of proportion to the problem that causes it.
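Until a parser offers an "unrooted document" feature, one common workaround for multi-root fragments (point 3) is to wrap the fragment in a synthetic root element before parsing. The Java sketch below does this with the JDK's standard `javax.xml.parsers.SAXParserFactory`; the wrapper name `fragment-wrapper` is a hypothetical choice, and the approach assumes the fragment has no XML declaration or DOCTYPE. It only addresses well-formedness, not validation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class FragmentChecker {
    private FragmentChecker() {}

    // Hypothetical synthetic root; it must not occur in the fragment itself.
    private static final String WRAPPER = "fragment-wrapper";

    // Parses an unrooted fragment (zero or more top-level elements) by
    // wrapping it in a synthetic root; returns the top-level element names.
    static List<String> topLevelElements(String fragment) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (depth == 1) {
                    names.add(qName); // direct child of the wrapper
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };
        // No newline after the start tag, so Locator line numbers still
        // match the fragment (columns on the first line are shifted).
        String wrapped = "<" + WRAPPER + ">" + fragment + "</" + WRAPPER + ">";
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(wrapped)), handler);
        } catch (SAXException e) {
            // DefaultHandler rethrows fatal errors, so this means "not well-formed".
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

Because the wrapper start tag is added without a newline, line numbers from the parser's `Locator` still match the fragment, although column numbers on the first line are shifted by the wrapper's length.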

In general, there is a design question of whether the technology should
impose a validation checklist on the user, where they have to attend to
earlier problems first, or whether it should allow the user to focus on
particular regions of a document or areas of interest first: for example,
a user might want to get linking correct before the metadata, but the DTD
requires the metadata for validity. For documents-in-progress, users should
be allowed to work to their own agenda and order as much as possible;
this has been a long-running problem with SGML and XML systems.
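One way to let users work to their own agenda is to filter reported problems by the region and kind of issue they are currently working on, instead of presenting them strictly in document order. The Java sketch below assumes each problem carries a path (extension 1 in the earlier list) and a category; `Problem` and `FocusFilter` are hypothetical names, and the category strings are invented for the example.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical problem record; assumes the validator reports a path
// (extension 1) and some category for each problem.
final class Problem {
    final String path;      // e.g. "/doc/body/link[2]"
    final String category;  // e.g. "linking", "metadata", "idref"
    final String message;

    Problem(String path, String category, String message) {
        this.path = path;
        this.category = category;
        this.message = message;
    }
}

final class FocusFilter {
    private FocusFilter() {}

    // Keeps only problems inside the user's current region and in the
    // categories they care about right now; everything else is deferred.
    static List<Problem> focus(List<Problem> all, String region, Set<String> categories) {
        return all.stream()
                .filter(p -> p.path.equals(region) || p.path.startsWith(region + "/"))
                .filter(p -> categories.contains(p.category))
                .collect(Collectors.toList());
    }
}
```

A user fixing links in the body would ask only for `linking` problems under `/doc/body`, and missing metadata in the header would simply not be shown yet.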

For contractual exchange of finished documents, the idea of "validity"
is useful. But for documents-in-progress, it can be counterproductive.
Instead, a more useful idea is "feasibility". For Xerces to be really useful
in document production, it will need more options or features aimed
at this kind of lesser validation. I am presenting a paper on this at
XML 2002 in Barcelona next month, by the way, if anyone is interested:
"When well-formedness is too much and validity is not enough"

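The message does not define "feasibility" precisely. One simple reading, used here purely as an illustration, is that a document in progress is feasible if it could still become valid just by adding the missing elements. For a content model that is a plain sequence of required children, that reduces to a subsequence test, sketched in Java below; `Feasibility` is a hypothetical helper, and real content models (choices, repetition) would need an automaton-based check instead.

```java
import java.util.List;

final class Feasibility {
    private Feasibility() {}

    // Valid: the children are exactly the required sequence.
    static boolean isValid(List<String> required, List<String> children) {
        return required.equals(children);
    }

    // Feasible: every child appears in the required order, so the document
    // could still become valid purely by adding the missing elements.
    static boolean isFeasible(List<String> required, List<String> children) {
        int i = 0; // next candidate position in the required sequence
        for (String child : children) {
            while (i < required.size() && !required.get(i).equals(child)) {
                i++;
            }
            if (i == required.size()) {
                return false; // child is unknown, duplicated or out of order
            }
            i++;
        }
        return true;
    }
}
```

With a required sequence title, metadata, body, a draft containing only title and body is not valid but is still feasible, whereas body followed by title is neither.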
I hope this is some use,
Cheers
Rick Jelliffe
www.topologi.com







Re: [Announcement] Fast Schematron Validation Here ! [Re: Cocoon form handling]

Posted by Ivelin Ivanov <iv...@iname.com>.
Rick,

There's no doubt that the phases concept is essential to Schematron's success
and a definite advantage over XSD and RELAX, which, as you said, require
"proprietary handwaving" when it comes to partial document validation.

Have you been following the discussion with Kohsuke on a possible JARV
integration?
If you have had a chance to look at the JARV API, my source code and probably
Torsten's API, maybe you can elaborate on a possible higher-level validation
API which would encompass multiple schemas.

Regards,

Ivelin



----- Original Message -----
From: "Rick Jelliffe" <ri...@allette.com.au>
To: <sc...@lists.sourceforge.net>
Cc: <co...@xml.apache.org>
Sent: Friday, March 29, 2002 9:33 PM
Subject: Re: [Schematron-love-in] Re: [Announcement] Fast Schematron Validation Here !


From: "Ivelin Ivanov" <iv...@iname.com>

> I mean the browser or client. Different browsers (PC, PDA, cell phone, etc.) may
> support different human interfaces, and therefore the document may be split
> into different pieces which are gathered and put together at the end.
> The validation of the pieces at each stage is device/client dependent.
> Is the question clearer now?

Yes, certainly you could use Schematron <phase>s to validate
device-dependent constraints.

<phase>s reconstruct the conditional-section feature of XML DTDs
("INCLUDE/IGNORE marked sections"), which lets you customize
a DTD to get variants. XML Schemas and RELAX do not have any equivalent.
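To make the analogy with INCLUDE/IGNORE concrete: a <phase> names the patterns that are active (through its <active pattern="..."/> children), and every other pattern is skipped, much as an IGNORE section drops declarations. The Java sketch below models that selection in memory; `PhaseSelector` and the pattern and phase ids used with it are hypothetical, not the API of any Schematron processor.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory view of a schema's phases: each phase id maps to
// the pattern ids named by its <active pattern="..."/> children.
final class PhaseSelector {
    private final List<String> patternsInOrder;  // every pattern, document order
    private final Map<String, Set<String>> phases;

    PhaseSelector(List<String> patternsInOrder, Map<String, Set<String>> phases) {
        this.patternsInOrder = patternsInOrder;
        this.phases = phases;
    }

    // Patterns to run for the chosen phase, in document order. Like an
    // IGNORE marked section, any pattern not listed as active is skipped.
    List<String> activePatterns(String phaseId) {
        Set<String> active = phases.get(phaseId);
        if (active == null) {
            throw new IllegalArgumentException("unknown phase: " + phaseId);
        }
        return patternsInOrder.stream()
                .filter(active::contains)
                .collect(Collectors.toList());
    }
}
```

For device-dependent validation, a caller could keep a separate, equally hypothetical map from client type (for example `pda`) to phase id and pass the chosen phase to `activePatterns`.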

The kinds of uses I imagine for phases include
  * versions (e.g. parallel variants)
  * pipeline processing (e.g. serial variants)
  * variant processing (e.g. device-dependencies and fan-outs)
  * partial processing (e.g. documents under construction)
  * state-dependent processing (e.g. where the results of one
       phase are used by some proprietary system to switch to
       a different phase for further validation)
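For the pipeline and state-dependent uses in the list above, the driving logic can be very small. The Java sketch below runs phases in a fixed order and stops at the first one whose assertions fail, so the application knows which phase to stay in. The `validate` function is a hypothetical stand-in for a real Schematron run restricted to one phase, and the phase names used with it are invented.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical driver for "pipeline" and "state-dependent" use of phases:
// each phase is run in turn, and the next one is attempted only if the
// previous phase reported no failed assertions.
final class PhasePipeline {
    private PhasePipeline() {}

    // 'validate' stands in for a Schematron run restricted to one phase;
    // it returns the messages of the assertions that failed.
    static Optional<String> firstFailingPhase(List<String> phaseOrder,
                                              Function<String, List<String>> validate) {
        for (String phase : phaseOrder) {
            if (!validate.apply(phase).isEmpty()) {
                return Optional.of(phase); // stay in this phase until it passes
            }
        }
        return Optional.empty(); // every phase passed
    }
}
```

The same loop covers serial variants (each phase checks one processing stage) and state-dependent switching (the returned phase decides what the application validates next).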

Critics of phases bleat that one can do the same thing with
different schemas, but the point is that with Schematron
<phase>s they become first-class objects capable of being
manipulated rather than proprietary handwaving :-)

In Topologi's freebie Schematron Validator (and in our
forthcoming Collaborative Markup Editor) we just make
a popup menu for the user to select the particular phase
to run when validating. Very straightforward to use.

All in all, I think phases are a useful mechanism that is
trivial to implement and write, so they fit into
Schematron's 'low-hanging fruit' approach well.

Cheers
Rick Jelliffe




