Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2018/12/28 18:49:54 UTC

Every row is separated by one newline, some rows have a string of length zero ... is it possible to implement this in DFDL?

Hello DFDL community,

I have this input file:

[Input file attached as an image (image003.png); not included in this plain-text archive]

One way to characterize that file is every row is separated by exactly one newline, every row contains a string, and some rows contain a string of length zero.

Is it possible to express that characterization in DFDL?

I tried to implement it this way:

<xs:element name="label-message">
    <xs:complexType>
        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
            <xs:element name="row" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:choice>
                        <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
                            <xs:element name="label" type="xs:string" />
                            <xs:element name="message" type="xs:string" />
                        </xs:sequence>
                        <xs:sequence>
                        </xs:sequence>
                    </xs:choice>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>

But the output XML contained just the first two rows.

If there is a way to implement this characterization, would you show me the way please?

/Roger



Re: Every row is separated by one newline, some rows have a string of length zero ... is it possible to implement this in DFDL?

Posted by "Beckerle, Mike" <mb...@tresys.com>.
Well, I'm not exactly sure whether or why this should be required, but I think you need to put an element inside the second sequence inside the choice.


<xs:element name="empty" type="xs:string"/>
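
For concreteness, here is a sketch of how the row element from the original schema might look with that element added. It assumes the rest of the format (for example, delimited-length strings) is defined elsewhere in the schema, since the original post does not show it.

<xs:element name="row" maxOccurs="unbounded">
    <xs:complexType>
        <xs:choice>
            <!-- First branch: a "label:message" row -->
            <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
                <xs:element name="label" type="xs:string" />
                <xs:element name="message" type="xs:string" />
            </xs:sequence>
            <!-- Second branch: no longer empty; "empty" takes whatever is on the line -->
            <xs:sequence>
                <xs:element name="empty" type="xs:string" />
            </xs:sequence>
        </xs:choice>
    </xs:complexType>
</xs:element>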


I think without that, Daffodil is actually removing the empty sequence, which leaves a choice with only one branch, so the choice goes away too.


Arguably that's not correct behavior. In this case, as a branch of a choice, a sequence with absolutely no framing, no elements, no nothing, cannot be optimized out, because there is a separator.


Which would make this a bug.


As a workaround, if you don't want this empty element in the infoset, you can put it in a global group definition and make the second sequence of the choice just be


<xs:sequence dfdl:hiddenGroupRef="tns:emptyGroup"/>


This will still parse it, but will discard it from the infoset.
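
A sketch of what that global group definition might look like follows. The name emptyGroup and the tns prefix come from the reference above. The dfdl:outputValueCalc attribute is an assumption, added because elements in a hidden group generally need some way to get a value when unparsing.

<xs:group name="emptyGroup">
    <xs:sequence>
        <!-- Parsed like any other string, but hidden from the infoset. -->
        <!-- Assumption: outputValueCalc supplies an empty value when unparsing. -->
        <xs:element name="empty" type="xs:string" dfdl:outputValueCalc="{ '' }" />
    </xs:sequence>
</xs:group>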


An advantage of having this empty element is that it can absorb things like lines containing whitespace, which look just like empty lines but actually contain spaces or tabs. In fact, it will absorb any string that doesn't parse as your colon-separated group syntax.

If you want to insist that it is actually an empty string, you can add a dfdl:assert statement insisting that it has zero length.
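
A sketch of such an assert, assuming the element is named "empty" as above and that the dfdl prefix is bound to the usual DFDL namespace. The message text is only illustrative.

<xs:element name="empty" type="xs:string">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <!-- Fails the parse of this branch unless the line is truly empty -->
            <dfdl:assert test="{ fn:string-length(.) eq 0 }"
                         message="Expected an empty row" />
        </xs:appinfo>
    </xs:annotation>
</xs:element>

With this in place, a line that is neither a colon-separated row nor empty should fail both branches of the choice instead of being silently absorbed.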

