Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/07/11 11:26:20 UTC

The properties included in DFDL are driven by real-world data formats ... correct?

Hello DFDL community,



Question #1: Is the following statement true or false?



The decisions on what properties to include in DFDL were driven by real-world use cases - that is, by real-world data formats.



Question #2: Is the following statement true or false?



Upon examining the real-world data formats, the DFDL working group discovered that it is rare to have anything other than an empty string as the in-band nil value in a complex type; because of its rarity (yesterday Mike used the word "obscure"), the DFDL working group decided to restrict in-band nil values on complex types to the empty string.



Question #3: Is the following statement true or false?



For every DFDL property there are multiple real-world data formats which require that DFDL property.



Question #4: Is there a document which shows the mapping between DFDL property and real-world data formats requiring that DFDL property?



/Roger

Re: The properties included in DFDL are driven by real-world data formats ... correct?

Posted by "Beckerle, Mike" <mb...@tresys.com>.
Q1: False.


The decisions were driven by the collective memory of the DFDL Workgroup members, and by the experience of many others as captured in the implementations of many data integration tools. In some cases properties were added because a popular, widely used data integration tool had the property, and the group's consensus was that the property was likely needed in real examples because it would not otherwise exist in that popular tool.


Case in point: dfdl:separatorPosition with values prefix, infix, postfix. Do we really know whether this property is needed? Can all formats that use it be modeled using just "infix" behavior, plus some extra sequence groups and dfdl:terminator or dfdl:initiator? The answer is "maybe", because some data integration tools didn't have separator position properties. But figuring that out would be an academic exercise. This property was introduced (to my knowledge) in the Mercator EII tool, and widely copied, e.g., it also appears in Microsoft BizTalk, and I believe also in an IBM message broker. So there was already plenty of precedent for the need for such a property. Furthermore, Mercator representatives on the DFDL workgroup contributed greatly, and there was simply no point in trying to negotiate minimization of features they had found a need for.
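
To make the three separator positions concrete, here is a minimal Python sketch. It is not DFDL and not how Daffodil implements anything; the function names split_fields and split_infix_with_terminator are hypothetical, and the data is assumed to be a simple delimited string with no escaping. The second function illustrates the "maybe" above: for this simple case, postfix data can be read as infix data followed by a terminator that happens to equal the separator.

```python
def split_fields(data, sep, position):
    """Split delimited text into fields.

    position mirrors the three values discussed above:
      "infix"   - separator between fields:     a,b,c
      "prefix"  - separator before every field: ,a,b,c
      "postfix" - separator after every field:  a,b,c,
    """
    if not sep:
        raise ValueError("separator must be non-empty")
    if position == "prefix":
        if not data.startswith(sep):
            raise ValueError("expected a leading separator")
        data = data[len(sep):]
    elif position == "postfix":
        if not data.endswith(sep):
            raise ValueError("expected a trailing separator")
        data = data[:-len(sep)]
    elif position != "infix":
        raise ValueError("unknown separator position: %r" % position)
    return data.split(sep)


def split_infix_with_terminator(data, sep, terminator):
    """Model postfix-style data as infix fields plus a terminator."""
    if not terminator:
        raise ValueError("terminator must be non-empty")
    if not data.endswith(terminator):
        raise ValueError("expected the terminator at the end")
    return split_fields(data[:-len(terminator)], sep, "infix")
```

Whether the same substitution works for every real format (nested sequences, optional trailing fields, and so on) is exactly the open question raised above; the sketch only covers the trivial case.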


Effectively, DFDL is the union of all properties from a wide array of such tools - with some renaming for uniformity.


Q2: false


XSD provided the opportunity to allow nilled complex types. I do not recall if prior systems had this or not. The ones I am most familiar with did not. Somehow, a decision was made to allow this feature in DFDL. At some point there was a (misguided in hindsight) effort to provide a use in DFDL for most constructions available in XSD, so this may have been allowed by way of that rationale. Subsequently, the complexity of it became clear: the nil representation of a complex type would need a bunch of properties in addition to the properties for the non-nilled representation. That added too much complexity, and there was no precedent for such properties in existing data integration systems. So the restriction to the empty string (ES) was put in to eliminate this need. Alternatively, we could have dropped the entire feature, but that is not what the workgroup decided.


Q3: false


Some properties were added because there are formats that need them, but whether anyone will ever parse those formats with DFDL is unclear. In many cases these properties or property values are not implemented by any DFDL implementation as yet. Example: dfdl:lengthKind='endOfParent'. This property value handles the case where a structure of some specified length contains child elements such as strings, each of which has a way to determine its length except the last one, which is assumed to extend to the end of the enclosing parent object. This concept definitely exists in data formats I have seen described. Nobody has needed it as yet. There are, quite possibly, other ways to work around the need for this element.
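
The end-of-parent idea can be illustrated with a small Python sketch. The record layout here is invented for illustration only (a 2-byte ASCII code, a 1-byte length followed by a name of that length, then a remark with no length of its own); parse_record is a hypothetical name, and nothing here reflects how a DFDL processor would implement dfdl:lengthKind='endOfParent'.

```python
def parse_record(data, parent_len):
    """Parse one fixed-length parent record from the front of data.

    Assumed layout inside the parent:
      code   - 2 ASCII bytes (fixed length)
      name   - 1 length byte, then that many ASCII bytes
      remark - no length information: runs to the end of the parent
    Returns (fields, remaining_bytes).
    """
    if len(data) < parent_len:
        raise ValueError("not enough data for the parent record")
    parent = data[:parent_len]
    if parent_len < 3:
        raise ValueError("parent too short for code and name length")

    code = parent[:2].decode("ascii")
    name_len = parent[2]
    name_end = 3 + name_len
    if name_end > parent_len:
        raise ValueError("name extends past the end of the parent")
    name = parent[3:name_end].decode("ascii")

    # The last child has no length of its own; it takes whatever
    # is left of the enclosing parent.
    remark = parent[name_end:].decode("ascii")

    return {"code": code, "name": name, "remark": remark}, data[parent_len:]
```

For example, with an 11-byte parent, parse_record(b"US\x03BobhelloNEXT", 11) yields code "US", name "Bob", remark "hello", and leaves b"NEXT" unconsumed.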


Q4: No, there is no such document. What does exist is a test suite for DFDL which exercises every implemented property. This is part of Daffodil.
