Posted to dev@daffodil.apache.org by "Beckerle, Mike" <mb...@owlcyberdefense.com> on 2021/01/21 17:13:24 UTC

Re: [apache/incubator-daffodil] Embedded Schematron (#463)

Question on Schematron, or really the new validator system generally.

Can I use both Daffodil's built-in "limited" validation AND Schematron or other validation, or is it either/or?

I need to use limited validation and would still like to use Schematron as well.

It has to do with a strategy for error recovery:

<xs:choice>
    <!-- first branch -->
      .... first branch of choice is the data format

    <!-- second branch is used for error recovery -->
    <xs:element name="malformed" type="tns:invalidByte"/>
</xs:choice>

<xs:simpleType name="invalidByte" dfdl:representation="binary" dfdl:lengthKind="implicit">
    <xs:restriction base="xs:unsignedByte">
        <xs:maxExclusive value="0"/> <!-- can never pass. Always will be invalid. -->
    </xs:restriction>
</xs:simpleType>

So if the data format can't be parsed (it's malformed), Daffodil will backtrack to this malformed element, which will consume 1 byte of data. An infoset containing these <malformed>N</malformed> elements will be considered well-formed, but it will not pass validation, because every value fails the facet on the invalidByte type.

So this validation will fail, explicitly indicating that the data contains "malformed" elements. This validation isn't really about validation at all; it's being used to recognize malformed data in a way that lets us recover from the error and try to parse again, having consumed one byte.

Hence, I want this validation to look for the elements that indicate the data is malformed, but for "real" data validation I'd also like to use Schematron rules.

That's the motivation anyway.
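
The recovery strategy described above can be modelled outside Daffodil with a small sketch. This is a toy, not Daffodil code: the record format (a 0x7E marker followed by one payload byte) and every function name are assumptions made up for illustration. It shows only the control flow: try the real format, and on failure consume exactly one byte as a "malformed" item and try again.

```python
# Toy model of the recovery strategy: NOT Daffodil. The record format
# (0x7E marker followed by one payload byte) is an assumption.
MARKER = 0x7E

def parse_record(data, pos):
    """Try the 'data format' branch; return (item, new_pos) or None on failure."""
    if pos + 1 < len(data) and data[pos] == MARKER:
        return ("record", data[pos + 1]), pos + 2
    return None

def parse_with_recovery(data):
    """Parse all input, falling back to a one-byte 'malformed' item on failure."""
    items, pos = [], 0
    while pos < len(data):
        result = parse_record(data, pos)
        if result is None:
            # Second choice branch: consume exactly one byte and try again.
            items.append(("malformed", data[pos]))
            pos += 1
        else:
            item, pos = result
            items.append(item)
    return items

def is_malformed(items):
    """Stand-in for the always-failing facet check on 'malformed' elements."""
    return any(kind == "malformed" for kind, _ in items)

items = parse_with_recovery(bytes([0x7E, 0x01, 0xFF, 0x7E, 0x02]))
print(items)
print(is_malformed(items))
```

The parse always produces a result, and the separate is_malformed check plays the role that limited validation plays in the schema: it flags the input without aborting the parse.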
________________________________
From: John Wass <no...@github.com>
Sent: Wednesday, January 20, 2021 7:20 AM
To: apache/incubator-daffodil <in...@noreply.github.com>
Cc: Beckerle, Mike <mb...@owlcyberdefense.com>; Push <pu...@noreply.github.com>
Subject: Re: [apache/incubator-daffodil] Embedded Schematron (#463)


@jw3<https://github.com/jw3> pushed 1 commit.

  *   dfb3711<https://github.com/apache/incubator-daffodil/commit/dfb3711f1167173e4f31929e589d9d4ea4fce6cf> Support embedding Schematron rules in DFDL schemas.

—
You are receiving this because you are subscribed to this thread.
View it on GitHub<https://github.com/apache/incubator-daffodil/pull/463/files/bb6073d979f3334be10188bcc3d0d41cdaa2528f..dfb3711f1167173e4f31929e589d9d4ea4fce6cf> or unsubscribe<https://github.com/notifications/unsubscribe-auth/AALUDAZHQFNNJYWEWLHFQXTS23C7RANCNFSM4UBVTVAQ>.

Re: [apache/incubator-daffodil] Embedded Schematron (#463)

Posted by Steve Lawrence <sl...@apache.org>.
Yeah, I don't think we have a way to differentiate the two aside from
scanning the diagnostic messages. We'd likely need a new API to support
that.

On 1/21/21 12:39 PM, Beckerle, Mike wrote:
> So the trick I think is I need 2 independent validation statuses.
> 
> I need to know whether the internal limited validation passed/failed independently of the schematron validation.
> 
> The only way I can tell those apart would be to search the diagnostics and recognize the error message text in some reliable way, and if there are any from the basic Limited validation, then the data is "malformed", whereas if they are all schematron validation messages, then it was well-formed, but invalid.


Re: [apache/incubator-daffodil] Embedded Schematron (#463)

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
So the trick, I think, is that I need two independent validation statuses.

I need to know whether the internal limited validation passed/failed independently of the Schematron validation.

The only way I can tell those apart would be to search the diagnostics and recognize the error message text in some reliable way. If there are any messages from the basic limited validation, then the data is "malformed", whereas if they are all Schematron validation messages, then it was well-formed but invalid.
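
A minimal sketch of the text-scanning workaround follows. The marker substrings and the sample messages are invented for illustration; the actual wording of Daffodil's diagnostics is not given in this thread and may differ, so treat the matching rule as a placeholder.

```python
# Sketch only: the marker substrings and the sample messages below are
# made up for illustration; real Daffodil diagnostic text may differ.
LIMITED_MARKERS = ("facet", "maxExclusive")

def classify(validation_messages, limited_markers=LIMITED_MARKERS):
    """Return 'valid', 'malformed', or 'invalid' from diagnostic text alone."""
    if not validation_messages:
        return "valid"
    for message in validation_messages:
        # Any limited-validation failure means a recovery element was hit.
        if any(marker in message for marker in limited_markers):
            return "malformed"
    # Only non-limited (e.g. Schematron) failures remain.
    return "invalid"

print(classify([]))
print(classify(["rule failed: start must not exceed end"]))
print(classify(["malformed failed facet maxExclusive",
                "rule failed: start must not exceed end"]))
```

Substring matching like this is brittle: a Schematron message that happens to contain one of the markers would be misclassified, which is the reliability concern raised above.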


Re: [apache/incubator-daffodil] Embedded Schematron (#463)

Posted by Steve Lawrence <sl...@apache.org>.
Yes, the two validation methods are completely separate.

"limited" validation is essentially Daffodil calling
dfdl:checkConstraints() immediately after every element is finished
being parsed:

https://github.com/apache/incubator-daffodil/blob/master/daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/ElementCombinator1.scala#L70-L95

And we always perform this validation as long as validation mode is not
Off. So whether it's limited, full, or custom, we will always do our
"limited" checkConstraints() validation.

https://github.com/apache/incubator-daffodil/blob/master/daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/ElementCombinator1.scala#L266-L271

The way "full" validation works is we tee all infoset events to a second
InfosetOutputter (as XML text), and at the end of parsing we send those
XML bytes to Xerces/Schematron/custom validator for additional validation.

https://github.com/apache/incubator-daffodil/blob/master/daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DataProcessor.scala#L703-L718

From what I've seen, the new schematron functionality doesn't change any
of this.

So I think this accomplishes exactly what you want. Restriction facets
will be checked by limited validation during parse and create
appropriate validation errors, and Schematron can be used at the end to
support more complicated validation like co-constraints.
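
The two phases described in this message can be sketched as a toy model. None of this is Daffodil's code or API: the function names, the mode strings, and the "limited"/"end" tags on each error are assumptions for illustration. The sketch shows a per-element check that runs whenever validation is not off, plus a whole-infoset validator that runs once after parsing on a teed copy of the events.

```python
def run_parse(elements, mode, check_constraints, end_validator=None):
    """elements: iterable of (name, value) pairs standing in for infoset events."""
    errors = []
    teed = []  # stands in for the second InfosetOutputter
    for name, value in elements:
        if mode != "off":
            # Per-element check, done as soon as the element is parsed.
            problem = check_constraints(name, value)
            if problem:
                errors.append(("limited", problem))
        if mode in ("full", "custom"):
            teed.append((name, value))
    if mode in ("full", "custom") and end_validator is not None:
        # Whole-infoset validation happens once, after parsing finishes.
        errors.extend(("end", p) for p in end_validator(teed))
    return errors

def facet_check(name, value):
    # Mirrors the always-failing facet on the "malformed" element.
    return f"{name}: facet violation" if name == "malformed" else None

def co_constraint(infoset):
    # A rule spanning two elements, the kind of check left to Schematron.
    values = dict(infoset)
    if values.get("start", 0) > values.get("end", 0):
        return ["start must not exceed end"]
    return []

events = [("start", 9), ("end", 3), ("malformed", 255)]
print(run_parse(events, "custom", facet_check, co_constraint))
print(run_parse(events, "limited", facet_check, co_constraint))
print(run_parse(events, "off", facet_check, co_constraint))
```

The phase tags exist only in this sketch, to make visible which check produced which error.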


