Posted to dev@daffodil.apache.org by "Mike Beckerle (Jira)" <ji...@apache.org> on 2019/09/26 20:44:00 UTC

[jira] [Commented] (DAFFODIL-2208) Empty strings never allowed as optional repeats - not compliant with DFDL spec.

    [ https://issues.apache.org/jira/browse/DAFFODIL-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16938957#comment-16938957 ] 

Mike Beckerle commented on DAFFODIL-2208:
-----------------------------------------

In addition, this fix is a behavior change that will likely affect how some existing schemas operate, particularly those that use separators.

Do we need a compatibility mode given that 2.4.0 went out with this bug?

Note that some tests will no longer need to be two-pass. Where a test needed two passes only because empty elements were not created on parse, and hence their separators were omitted on unparse, it can become one-pass now that the empty string elements are created.
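
To illustrate why such tests stop needing a second pass, here is a minimal Python sketch of the round trip for the "/"-separated example quoted below. It is a toy model, not Daffodil code: the function names and the keep_empty flag are made up for this illustration and only mimic whether empty string elements are put into the infoset.

```python
SEP = "/"  # separator used in the example format from the issue


def parse(line, keep_empty):
    """Split a line into field values (a stand-in for the infoset).

    keep_empty is a hypothetical switch for this sketch:
    True  -> empty strings become field elements (behavior after the fix)
    False -> empty strings are dropped (behavior with the bug)
    """
    fields = line.split(SEP)
    if not keep_empty:
        fields = [f for f in fields if f != ""]
    return fields


def unparse(fields):
    """Write the fields back out, one separator between adjacent fields."""
    return SEP.join(fields)


def one_pass_ok(line, keep_empty):
    """One-pass test: parse then unparse must reproduce the original data."""
    return unparse(parse(line, keep_empty)) == line


def two_pass_ok(line, keep_empty):
    """Two-pass test: the first unparse may differ from the original data,
    but re-parsing that output must reproduce the first infoset."""
    first = parse(line, keep_empty)
    second = parse(unparse(first), keep_empty)
    return first == second
```

In this model, "data//data" fails the one-pass check when empty fields are dropped (it unparses as "data/data") but still passes the two-pass check, and it passes the one-pass check once the empty field is kept.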

Schemas that ran into this problem of empty elements being dropped may have worked around the bug either (a) by using two-pass tests or (b) by using elements with lengthKind 'pattern' and regexes that absorb things that look like adjacent delimiters into a prior element (usmtf-generic does this). Fixing this bug won't break those workarounds, but they are no longer necessary once it is fixed. However, the infoset created will be different, so tests have to be adjusted if the workarounds are removed.
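
As a rough illustration of workaround (b), the Python sketch below uses a regex that lets a field swallow any "/" that is immediately followed by another "/". The pattern is invented for this example and is an assumption; the actual regexes in usmtf-generic are not shown here and may look quite different.

```python
import re

SEP = "/"

# Hypothetical field pattern: the field text, then any separator that is
# immediately followed by another separator (so the "extra" separators
# become part of the preceding field's value).
FIELD = re.compile(r"[^/]*(?:/(?=/))*")


def parse_with_workaround(line):
    """Parse a line into fields, absorbing adjacent separators into the prior field."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)  # always matches, possibly the empty string
        fields.append(m.group(0))
        pos = m.end()
        if pos >= len(line):
            break
        pos += len(SEP)  # consume the one real separator before the next field
    return fields


def unparse(fields):
    """Write the fields back out, one separator between adjacent fields."""
    return SEP.join(fields)
```

With this pattern "data//data" round trips in one pass, but the infoset holds the values "data/" and "data" rather than "data", "", "data". That difference is why expected infosets in tests have to change if such a workaround is removed.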

 

> Empty strings never allowed as optional repeats - not compliant with DFDL spec.
> -------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2208
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2208
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End
>    Affects Versions: 2.4.0
>            Reporter: Mike Beckerle
>            Assignee: Mike Beckerle
>            Priority: Major
>             Fix For: 2.5.0
>
>
> Excerpts here from emails on the [dfdl-wg@ogf.org|mailto:dfdl-wg@ogf.org] mailing list.
> {noformat}
> Problem: simple format that is impossible to model
> Mike Beckerle <mb...@gmail.com> 1:47 PM (35 minutes ago)
> to DFDL-WG 
> I have a dead-simple little format:
>     data/data/data/data
>     data/data/data/data
> it is lines of "/" separated strings. All elements are optional. 
> I simply want this:
>    data//data
> to round trip. For that to happen I need it to parse into    <field>data</field><field></field><field>data</field>
> That is, I require that empty field element in the middle to be created and put into the infoset.
> I can find no way to do this. 
> The strings have no initiator/terminator, so dfdl:emptyValueDelimiterPolicy is not relevant. All the elements are optional, so default values aren't relevant.
> The spec states:
> 9.4.2.2      Simple element (xs:string or xs:hexBinary)
> Required occurrence: If the element has a default value then an item is added to the infoset using the default value, otherwise an item is added to the Infoset using empty string (type xs:string) or empty hexBinary (type xs:hexBinary) as the value.
> Optional occurrence: If dfdl:emptyValueDelimiterPolicy is not 'none' then an item is added to the Infoset using empty string (type xs:string) or empty hexBinary (type xs:hexBinary) as the value, otherwise nothing is added to the Infoset.
> There are errata/actions to clarify wording here around dfdl:emptyValueDelimiterPolicy being in effect or not (because there is no initiator/terminator for it to use as opposed to the property in isolation just being 'none').
> But that doesn't change anything about this issue.
> If this very simple format is not possible, then we need a property or new property enum value that makes it possible. 
> Thoughts?{noformat}
> Subsequently, I figured out what I believe is the spec flaw.
>  
> {noformat}
> To start discussion on my own issue.....
> The problem here may be that for a string (or hexBinary), if there is no initiator/terminator, there is no way to distinguish EmptyRep from NormalRep.
> I.e., an empty string is a "normal" value for a string.
> Sections 9.2.3 and 9.2.4 seem to define EmptyRep and NormalRep such that an empty string will be an EmptyRep, not a NormalRep.
> However section 9.2.5 on zero-length says:
>    "The normal representation can be a zero-length representation if the type is xs:string or xs:hexBinary and there is no framing."
> That suggests that when there is no framing, a zero-length string is NormalRep, not EmptyRep, which is the opposite conclusion from what is in sections 9.2.3 and 9.2.4.
> If this latter clarification is correct, then my format *should* work as I expect, because the empty string elements will be considered NormalRep and infoset values will be created for them.
> It simply doesn't work because of a bug in Daffodil, which has not interpreted this correctly.{noformat}
> That's the bug to fix: Strings and HexBinary with no framing are NormalRep, not EmptyRep.
>  
> Note that some tests in our test suite will have to be revised to take this into account.
> Behavior for public schemas should not change, as the above behavior is all subject to the new property (still a proposal) dfdlx:emptyElementParsePolicy being "treatAsEmpty" (the enum names are subject to change).
> The IBM-created schemas for EDIFACT and others depend on a behavior in IBM DFDL that we call dfdlx:emptyElementParsePolicy='treatAsMissing' (again enums subject to change). That behavior doesn't allow empty strings to be distinguished from absent strings. Under that policy the behavior of daffodil shouldn't change, so those schemas should still interoperate.
> This bug fix is needed in order to implement a generic schema for a format called USMTF, which is, unfortunately, not public. But the simplified examples above illustrate the issue.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)