Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/04/15 17:16:31 UTC

3 possible things that could happen when parsing flawed input data

Hello DFDL community,

As I see it, there are 3 possible things that could happen when parsing flawed input data:

(a) The parser silently gobbles up the data and outputs XML containing the flawed data.

(b) The parser generates a warning but nonetheless continues onward and gobbles up the data and outputs XML containing the flawed data.

(c) The parser generates an error and halts processing. No XML is output.
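To make the three outcomes concrete, here is a toy Python model. `Policy`, `parse_fields`, and `looks_bad` are hypothetical names invented for illustration. This is not Daffodil's API and says nothing about how Daffodil is implemented. It only shows how (a), (b), and (c) differ in what the caller gets back.

```python
import warnings
from enum import Enum


class Policy(Enum):
    SILENT = "a"  # (a) keep flawed data without comment
    WARN = "b"    # (b) warn, but keep going and keep the data
    HALT = "c"    # (c) raise an error; no output is produced


def parse_fields(raw_fields, is_flawed, policy):
    """Hypothetical toy model of outcomes (a)-(c); not Daffodil's API."""
    output = []
    for value in raw_fields:
        if is_flawed(value):
            if policy is Policy.HALT:
                # Raising discards the partial output list entirely.
                raise ValueError(f"flawed field {value!r}: halting, no output")
            if policy is Policy.WARN:
                warnings.warn(f"flawed field {value!r}: keeping it anyway")
        output.append(value)
    return output


def looks_bad(value):
    # Hypothetical flaw check: anything that is not all digits.
    return not value.isdigit()


print(parse_fields(["12", "7ytb33"], looks_bad, Policy.SILENT))  # ['12', '7ytb33']
```

With `Policy.WARN` the same list comes back plus one warning; with `Policy.HALT` the caller gets an exception and nothing else.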

Assertion: Daffodil only supports (a) and (c). Do you agree?

Here is the rationale for my assertion. In the following 2 schemas, Daffodil silently gobbles up flawed data and outputs XML containing the flawed data:

[Image not preserved in this text version: the two schemas]

In the following schema, Daffodil throws an error and stops processing. No XML is output.

[Image not preserved in this text version: the schema]

Re: 3 possible things that could happen when parsing flawed input data

Posted by "Beckerle, Mike" <mb...@tresys.com>.
Let me correct my earlier reply. I've edited the text below to make it accurate.


From: Beckerle, Mike <mb...@tresys.com>
Sent: Monday, April 15, 2019 5:27 PM
To: users@daffodil.apache.org
Subject: Re: 3 possible things that could happen when parsing flawed input data



I think you are missing a case because there are different kinds of "flawed" data.


1) Not well-formed: e.g., the field has to be an xs:int, but the data looks like 7ytb33, which is not suitable for an int. Data that is missing delimiters is also not well-formed.

2) Invalid: the data is well-formed but violates XML Schema facets and/or minOccurs/maxOccurs constraints.

3) Invalid with respect to other rules that require an XPath-like expression to check (e.g., so-called co-constraints).
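A minimal Python sketch of how the three kinds of flaw differ, assuming a single integer field with range facets and a made-up record rule. All function names and the `count`/`items` rule are hypothetical. Kind (1) fails before any value exists; kinds (2) and (3) are checks on values that were already parsed successfully.

```python
import re


def parse_int(text):
    """Kind (1), well-formedness: text that is not integer-shaped
    (e.g. 7ytb33) cannot be parsed at all. (xs:int's 32-bit range is
    not checked in this sketch.)"""
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError(f"not well-formed: {text!r} is not an int")
    return int(text)


def facet_violations(value, min_inclusive, max_inclusive):
    """Kind (2), validity: the value parsed fine but breaks range facets."""
    problems = []
    if value < min_inclusive:
        problems.append(f"{value} < minInclusive {min_inclusive}")
    if value > max_inclusive:
        problems.append(f"{value} > maxInclusive {max_inclusive}")
    return problems


def co_constraint_violations(record):
    """Kind (3): a hypothetical rule relating two fields of one record."""
    if record["count"] != len(record["items"]):
        return [f"count is {record['count']} but there are {len(record['items'])} items"]
    return []
```

Note that kind (3) needs to look at more than one field at once, which is why it calls for an XPath-like expression rather than a per-element facet.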


Daffodil can handle (1) and (2) above. For (3) we need to implement a DFDL feature we have not yet done: recoverable errors. So for this third kind, yes, Daffodil cannot do this today.

So Daffodil can do your (a) and (c), and can do (b) for point (2) above, but cannot yet handle (3).
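One way to see why validation problems can land in (b) while well-formedness problems land in (c) is a hypothetical sketch in which validation is a separate check over data that was already parsed. `ParseResult` and `parse_then_validate` are invented names, not Daffodil's API, and the "validate after parsing" structure is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ParseResult:
    infoset: dict
    diagnostics: list = field(default_factory=list)


def parse_then_validate(text, max_inclusive=100):
    # Kind (1): int() raises ValueError for text like "7ytb33", so no
    # result object is ever built. This is outcome (c).
    value = int(text)
    result = ParseResult(infoset={"value": value})
    # Kind (2): the value parsed, so a facet violation is only recorded
    # and the infoset is still returned. This is outcome (b).
    if value > max_inclusive:
        result.diagnostics.append(f"{value} > maxInclusive {max_inclusive}")
    return result
```

Because the diagnostics travel alongside the infoset instead of replacing it, the caller decides whether a validation problem is fatal.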


Behavior (b) for case (1) above is a matter of schema design. Some schemas are designed to detect some kinds of non-well-formed data, capture it anyway, and carry it in elements in the style of <unrecognized>someNotWellFormedDataHereAsHexOrText</unrecognized>. That is, the schema's intention is to keep going if possible and to exhibit the data that caused problems. Sometimes this is possible, especially in text formats with nested delimiters (e.g., CSV). But for binary data you often can't keep the parser on the rails.
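The tolerant style can be mimicked in a short Python sketch. In a real DFDL schema this is expressed declaratively in the schema itself; the Python below only imitates the effect. `parse_tolerant`, the `id,name,city` line format, and the element names are hypothetical. A line that does not fit is kept verbatim in an `<unrecognized>` element and parsing continues at the next newline delimiter.

```python
import xml.etree.ElementTree as ET


def parse_tolerant(text, field_names=("id", "name", "city")):
    """Hypothetical tolerant parser for comma-separated lines.

    Lines with the expected number of fields and a numeric id become
    <record> elements; anything else is kept verbatim in <unrecognized>
    so the parse can continue past it.
    """
    root = ET.Element("file")
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) == len(field_names) and fields[0].isdecimal():
            record = ET.SubElement(root, "record")
            for name, value in zip(field_names, fields):
                ET.SubElement(record, name).text = value
        else:
            ET.SubElement(root, "unrecognized").text = line
    return root


doc = parse_tolerant("1,Ann,Boston\n2;Bob;Paris\n3,Cy,Rome")
print(ET.tostring(doc, encoding="unicode"))
```

The newline delimiter is what lets the parser resynchronize after a bad line. Binary formats often have no such delimiter to fall back on, which is the "off the rails" problem described above.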


This tolerant style is particularly important for large files containing many items. It's not reasonable to refuse to parse a huge file because of a small error somewhere. Schemas often have to be written to be somewhat tolerant of erroneous data.



Re: 3 possible things that could happen when parsing flawed input data

Posted by "Beckerle, Mike" <mb...@tresys.com>.
I think you are missing a case because there are different kinds of "flawed" data.


1) Not well-formed: e.g., the field has to be an xs:int, but the data looks like 7ytb33, which is not suitable for an int. Data that is missing delimiters is also not well-formed.

2) Invalid: the data is well-formed but violates XML Schema facets and/or minOccurs/maxOccurs constraints.

3) Invalid with respect to other rules that require an XPath-like expression to check (e.g., so-called co-constraints).


Daffodil can handle (1) and (2) above. For (3) we need to implement a DFDL feature we have not yet done: recoverable errors. So for this third kind, yes, Daffodil cannot do this today.

So Daffodil can do your (a) and (c), and can do (b) for points (1) and (2) above, but cannot yet handle (3).
