Posted to users@daffodil.apache.org by Patrick Grandjean <p....@gmail.com> on 2020/03/02 18:11:49 UTC

Re: How to remove empty elements in the output XML?

Hi all,

Sorry for the late reply. Thank you for having a look at this!

Patrick.

On Fri, Feb 21, 2020 at 6:54 PM Beckerle, Mike <mb...@tresys.com> wrote:

> Ah. I missed the nilKind 'literalCharacter' entirely.
>
> So, yeah, we need to test the interaction here. I believe for nilKind
> literalCharacter, one must check for the nil value before trimming pad chars.
> I know we have some tests for literalCharacter, but I doubt we have one where
> the nil character is the same as the pad character with textTrimKind="padChar".
>
> When unparsing, it's simpler. If the element is nilled, produce the
> nilValue character repeating for the field width.
>
> If the value is present and empty string when unparsing, then padding will
> fill in the spaces via the textStringPadCharacter.
>
> This is another example of a format that doesn't "round trip", because an
> empty string value would be written out as all spaces, and would be parsed
> back in as a nilled element. I.e., the canonical interpretation is nilled,
> not empty string.
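>
> As an illustration of that round trip (a sketch only: it assumes the element
> is named "field" as in the declaration quoted below, that xsi is bound to the
> XML Schema instance namespace, and that parsing treats nine spaces as nil):
>
> <!-- infoset handed to unparse: element present, value is the empty string -->
> <field></field>
> <!-- data written: nine pad characters (all spaces) -->
> <!-- infoset produced by parsing that data again: nilled, not empty -->
> <field xsi:nil="true" />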
>
> ------------------------------
> *From:* Steve Lawrence <st...@gmail.com>
> *Sent:* Friday, February 21, 2020 4:38 PM
> *To:* Beckerle, Mike <mb...@tresys.com>; users@daffodil.apache.org <
> users@daffodil.apache.org>
> *Subject:* Re: How to remove empty elements in the output XML?
>
> nilValue is %SP; but nilKind="literalCharacter". So it is nil if the
> characters in the "NilLiteralCharacters" region are all spaces. If I'm
> reading the spec correctly, padding isn't applied with
> NilLiteralCharacters, so all the spaces should be part of it.
>
> I guess an alternative would be to use nilKind="literalValue" and
> nilValue="%WSP*;" but that doesn't seem to work either.
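>
> Spelled out, that alternative would look roughly like this (a sketch that
> assumes the length and padding properties of the declaration quoted below stay
> unchanged, with only nilKind and nilValue swapped; as noted, it does not
> appear to work either):
>
> <xs:element name="field" type="xs:string"
>    dfdl:lengthKind="explicit" dfdl:length="9"
>    dfdl:textPadKind="padChar"
>    dfdl:textTrimKind="padChar"
>    dfdl:textStringPadCharacter="%SP;"
>    dfdl:textStringJustification="left"
>    dfdl:nilKind="literalValue"
>    dfdl:nilValue="%WSP*;"
>    nillable="true" />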
>
>
> On 2/21/20 5:27 PM, Beckerle, Mike wrote:
> > I don't think this:
> >
> > <xs:element name="field" type="xs:string"
> >    dfdl:lengthKind="explicit" dfdl:length="9"
> >    dfdl:textPadKind="padChar"
> >    dfdl:textTrimKind="padChar"
> >    dfdl:textStringPadCharacter="%SP;"
> >    dfdl:textStringJustification="left"
> >    dfdl:nilKind="literalCharacter"
> >    dfdl:nilValue="%SP;"
> >    nillable="true" />
> >
> > should produce nil when the input is all spaces. The nilValue is not WSP* or
> > WSP+, it's SP, and the value is fixed length 9 chars, so it can *never* be just
> > one space, which is the nilValue. So I think this can never produce a nil value.
> >
> > The rest of the email seems right tho. I just want to add this sort of motivation.
> >
> > This is an interesting thing in DFDL. Sometimes you can't get out what you want
> > as XML because in DFDL, the primary requirement is to describe the format of the
> > data as it is. In your data representation, the field isn't optional. It's
> > mandatory. It has to be there and occupies 9 characters of fixed length. So DFDL
> > doesn't let you model this as an optional field on purpose. The physical format
> > often constrains the logical model in DFDL.
> >
> > DFDL's job is to describe the input format. Not so much to describe how to
> > transform it to what your preference is. That's really a job for other tools.
> >
> > That said, tricks like what Steve suggested where you use choices to model
> > something as not an optional element, but an alternative of two things, one of
> > which is just syntax, the other of which is an element.... that's the kind of
> > thing you have to do to force it to produce what you prefer. This sort of thing
> > isn't really a trick. It's extensively used in many formats.
> >
> > ------------------------------
> > *From:* Steve Lawrence <st...@gmail.com>
> > *Sent:* Friday, February 21, 2020 9:15 AM
> > *To:* users@daffodil.apache.org <us...@daffodil.apache.org>
> > *Subject:* Re: How to remove empty elements in the output XML?
> >
> > I think there might be a bug with nillable strings and padding when
> > padChar is the same as the nilValue. The following should result in a
> > nilled element when the data is all spaces, but doesn't.
> >
> > <xs:element name="field" type="xs:string"
> >    dfdl:lengthKind="explicit" dfdl:length="9"
> >    dfdl:textPadKind="padChar"
> >    dfdl:textTrimKind="padChar"
> >    dfdl:textStringPadCharacter="%SP;"
> >    dfdl:textStringJustification="left"
> >    dfdl:nilKind="literalCharacter"
> >    dfdl:nilValue="%SP;"
> >    nillable="true" />
> >
> > However, if you want the element to not be in the infoset at all when
> > the data is all spaces, as opposed to a nilled element, you need a
> > different method. Keep in mind that something needs to parse those empty
> > strings if the element is missing.
> >
> > A technique that seems to work well (although it is maybe a bit messy)
> > is something like this:
> >
> > <xs:choice>
> >    <xs:sequence dfdl:initiator="%SP;%SP;%SP;%SP;%SP;%SP;%SP;%SP;%SP;" />
> >    <xs:element name="field" ... />
> > </xs:choice>
> >
> > So in this case we first try to parse 9 spaces via an empty sequence
> > with an initiator. If that fails, then we try to parse those 9
> > characters as "field". So if 9 spaces were found then the field element
> > will not be in the infoset. Note that field should have minOccurs="1"
> > now--it's not optional since the optionality is handled by the sequence.
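> >
> > Filled in with the length properties from the original question, the choice
> > might look like this (a sketch: the attributes on field1 are assumed from the
> > declaration quoted below, minus minOccurs="0"):
> >
> > <xs:choice>
> >    <!-- first branch: nine spaces consumed as syntax, no element in the infoset -->
> >    <xs:sequence dfdl:initiator="%SP;%SP;%SP;%SP;%SP;%SP;%SP;%SP;%SP;" />
> >    <!-- second branch: otherwise the same nine characters are parsed as the element -->
> >    <xs:element name="field1" type="xs:string"
> >       dfdl:lengthKind="explicit" dfdl:length="9" />
> > </xs:choice>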
> >
> > This is a little messy, so I'd recommend defining some formats to make
> > it more clear, e.g.:
> >
> >    <xs:annotation>
> >      <xs:appinfo source="http://www.ogf.org/dfdl/">
> >        <dfdl:defineFormat name="empty9">
> >          <dfdl:format initiator="%SP;%SP;%SP;%SP;%SP;%SP;%SP;%SP;%SP;" />
> >        </dfdl:defineFormat>
> >      </xs:appinfo>
> >    </xs:annotation>
> >
> >    <xs:choice>
> >      <xs:sequence dfdl:ref="empty9" />
> >      <xs:element name="field" ... />
> >    </xs:choice>
> >
> >
> >
> >
> > On 2/20/20 1:03 PM, Patrick Grandjean wrote:
> >> Hi,
> >>
> >> I am parsing a text format and some optional elements are encoded as empty
> >> strings or strings with space characters only. How can I have these elements
> >> omitted in the output XML?
> >>
> >> The element is declared as:
> >>
> >> <xs:element name="field1" type="xs:string" minOccurs="0" dfdl:length="9"
> >> dfdl:lengthKind="explicit" />
> >>
> >> Thanks to the properties textStringJustification="center" and
> >> textTrimKind="padChar", the parsed string is trimmed and the output XML looks like:
> >>
> >> <field1></field1>
> >>
> >> I have tried specifying properties emptyValueDelimiterPolicy, nilKind,
> >> nilValueDelimiterPolicy and nilValue but can't find a combination to have this
> >> element removed in the output XML.
> >>
> >> Is it possible? If yes, could you please show me how?
> >>
> >> Thanks,
> >> Patrick.
> >>
> >
>
>