Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/04/15 11:30:13 UTC

Don't use DFDL to do validation ... do you agree?

Hello DFDL community,

Do you agree with the following?

DFDL is a parsing language, not a validation language.

While it is possible to do validation in DFDL, it is not recommended. That is, it is possible to design a DFDL schema to validate data as the data is being parsed, but that is not recommended.

Instead, use DFDL just for parsing. Once input data has been converted to XML, bring to bear all the XML tools to process the XML, including validation tools such as XML Schema, Schematron, and/or RelaxNG. That is, do validation on the XML, not on the native file format.

But you might argue: Hold on there, in the most recent discussion wasn't validation performed in the DFDL schema:

<xs:sequence>
    <xs:element name="value" type="xs:string"
           dfdl:lengthKind="pattern"
           dfdl:lengthPattern=".*?(?=(\x0D-|\x0D\x0A-|-|\)$))">
        <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
                <!-- Isn't the following validation? -->
                <dfdl:assert message="empty value" >
                    {fn:string-length(.) gt 0}
                </dfdl:assert>
            </xs:appinfo>
        </xs:annotation>
    </xs:element>
    <xs:choice>
        ...
    </xs:choice>
</xs:sequence>

That validates the length of the input is greater than zero, yes?

Yes. However, it is being used strictly for signaling to Daffodil when to abandon this sequence, back up, and proceed down the next path. In other words, it is being strictly used as a parsing device, not a validation device. [Am I expressing this correctly? Is there a better, richer, more correct way to express this?]
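To make the "parsing device" reading concrete, here is a minimal sketch of an assert steering a choice. The element names withValue and empty and the simplified length pattern are hypothetical, and the fragment assumes a surrounding schema that supplies the usual dfdl:format defaults. When the assert in the first branch fails, that is a processing error, so the parser discards the branch, backs up, and tries the next one.

```xml
<xs:choice>
    <!-- Branch 1: kept only if a non-empty value is present -->
    <xs:element name="withValue">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="value" type="xs:string"
                            dfdl:lengthKind="pattern"
                            dfdl:lengthPattern="[^-]*">
                    <xs:annotation>
                        <xs:appinfo source="http://www.ogf.org/dfdl/">
                            <!-- A failed assert is a processing error, so the parser backtracks -->
                            <dfdl:assert message="empty value">
                                {fn:string-length(.) gt 0}
                            </dfdl:assert>
                        </xs:appinfo>
                    </xs:annotation>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <!-- Branch 2 (hypothetical): tried after branch 1 fails, consumes no data -->
    <xs:element name="empty" type="xs:string"
                dfdl:lengthKind="explicit" dfdl:length="0"/>
</xs:choice>
```

Nothing here reports the data as invalid; the assert only decides which branch of the choice ends up in the infoset.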

/Roger


Re: Don't use DFDL to do validation ... do you agree?

Posted by "Beckerle, Mike" <mb...@tresys.com>.
One further thought I should have started with in my reply.


It is still important to keep "well formed" separate from "valid". You often want to accept well-formed data and produce a parse of it. You don't want to make validity a criterion for creating an infoset at all.


I think there is this hierarchy:

malformed, well-formed, valid, correct


Obviously you can't parse malformed data, so you don't get an infoset.

The next 3 all are about infosets.


The difference between valid and correct is that validity checks can't (and often don't) check everything. The ultimate test of correctness of data is whether it in fact is suitable for purpose.


________________________________
From: Beckerle, Mike
Sent: Monday, April 15, 2019 9:30:09 AM
To: users@daffodil.apache.org
Subject: Re: Don't use DFDL to do validation ... do you agree?


I don't agree.


DFDL has a dfdl:assert with a recoverable flag. (Daffodil doesn't implement this yet, but it's easy.)


This can be used to emit warnings. A dfdl:assert of this type can be used as an explicit validation check much like a schematron rule.
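As a sketch of what such a check could look like: in the DFDL 1.0 specification the "recoverable flag" is the failureType property of dfdl:assert, whose default is processingError. The element below reuses the value element from the earlier example with a simplified, hypothetical length pattern; as noted above, Daffodil did not implement this property at the time of writing.

```xml
<xs:element name="value" type="xs:string"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern="[^-]*">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <!-- recoverableError: record a diagnostic and keep parsing, no backtracking -->
            <dfdl:assert failureType="recoverableError"
                         message="value should not be empty">
                {fn:string-length(.) gt 0}
            </dfdl:assert>
        </xs:appinfo>
    </xs:annotation>
</xs:element>
```

With the default failureType, the same assert would instead cause a processing error and affect which parse path is taken.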


In addition Daffodil has DFDL schema validation built in. If invoked with the validation flag it will do validation of XSD facets and min/max occurs on the fly as it parses, and accumulate these validation warnings. This is MUCH more efficient than running a separate XSD validation.
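For illustration, the kind of constraint this built-in validation covers is ordinary XSD facets declared on the element's simple type. The sketch below is an assumption-laden fragment: the length pattern is simplified and the maxLength of 64 is a made-up limit. The facets do not influence how the data is parsed; they are only checked when validation is enabled.

```xml
<xs:element name="value"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern="[^-]*">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <!-- Checked only when validation is on; a violation does not fail the parse -->
            <xs:minLength value="1"/>
            <xs:maxLength value="64"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```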


The only thing you don't get from this is checking of key and unique constraints. Those you'd have to put in an XML Schema and check with a regular XSD validator.
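A minimal sketch of such a constraint in plain XML Schema, to be checked against the XML infoset after parsing. The element names records, record, and id are hypothetical and are not taken from the schema discussed in this thread.

```xml
<xs:element name="records">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="record" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="id" type="xs:string"/>
                        <xs:element name="value" type="xs:string"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
    <!-- Every record/id must be distinct within records -->
    <xs:unique name="uniqueRecordId">
        <xs:selector xpath="record"/>
        <xs:field xpath="id"/>
    </xs:unique>
</xs:element>
```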


Well, one other thing too. DFDL's expressions are a little less expressive than full XPath (and some of the more powerful DFDL expression features aren't implemented in Daffodil), so the asserts can't be quite as expressive as full XPath. But most of it is there.


At the API level, one can then check the diagnostics output by the parse to see if there were any validation errors or recoverable asserts. If so, one could indicate that the data is invalid.


