Posted to users@daffodil.apache.org by "Beckerle, Mike" <mb...@owlcyberdefense.com> on 2021/09/17 19:12:10 UTC

Re: optional int and unparse formatting

Sorry for the late response on this. Turns out Outlook 365 was spam-filtering some Apache emails. It's a known issue with Microsoft's spam filters.

The sequence wrapped around elem5 doesn't need a dfdl:separator, because elem5 has maxOccurs="1", so that sequence never holds more than one item and there is nothing to separate.
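
As a rough sketch of what that looks like (untested; it only drops dfdl:separator from the nested sequence in the schema quoted below, and assumes the enclosing sequence and the default format stay exactly as they are):

```xml
<!-- Sketch only: nested sequence around elem5, without dfdl:separator.
     The dfdl:terminator="/" is what absorbs the final "/" in the data,
     whether or not elem5 is present. -->
<xs:sequence dfdl:terminator="/"
             dfdl:separatorSuppressionPolicy="anyEmpty">
  <xs:element name="elem5" minOccurs="0" maxOccurs="1"
              dfdl:textNumberPattern="000">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:sequence>
```

The separator on the outer sequence still applies between elem4 and this nested sequence, so the "/" before the optional elem5 position remains required.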

Otherwise yeah, this looks like what I was suggesting.

I agree that the DFDL spec is quite painful in numerous areas. Unfortunately I have to take the blame for some of that. Someday I hope some sections will get refactored and rewritten.


________________________________
From: Theodore Toth <te...@sage.northcom.mil>
Sent: Tuesday, August 31, 2021 12:21 AM
To: users@daffodil.apache.org <us...@daffodil.apache.org>
Subject: Re: optional int and unparse formatting

The following worked for me although I don't know if it's the 'right'
way to do it. Reading the spec can give you a headache.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="default-dfdl-properties" />
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="FOO"
              dfdl:initiator="FOO/"
              dfdl:lengthKind="implicit"
              dfdl:terminator="%NL;%WSP*;">

    <xs:complexType>
      <xs:sequence dfdl:sequenceKind="ordered"
                   dfdl:separator="/"
                   dfdl:separatorPosition="infix">

        <xs:element name="elem1">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1"/>
              <xs:maxLength value="14"/>
              <xs:pattern value="[A-Z0-9,:%#*\- ]+"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

        <xs:element name="elem2">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="CAT|DOG|HORSE"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

        <xs:element name="elem3" dfdl:textNumberPattern="#0000">
          <xs:simpleType>
            <xs:restriction base="xs:int">
              <xs:minInclusive value="1"/>
              <xs:maxInclusive value="99999"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

        <xs:element name="elem4" minOccurs="0" maxOccurs="1">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1"/>
              <xs:maxLength value="20"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>

        <xs:sequence dfdl:separator="/" dfdl:terminator="/"
                     dfdl:separatorSuppressionPolicy="anyEmpty">
          <xs:element name="elem5" minOccurs="0" maxOccurs="1"
                      dfdl:textNumberPattern="000">
            <xs:simpleType>
              <xs:restriction base="xs:int">
                <xs:minInclusive value="1"/>
                <xs:maxInclusive value="999"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:element>
        </xs:sequence>

      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

On Tue, Aug 31, 2021 at 9:31 AM Theodore Toth
<te...@sage.northcom.mil> wrote:
>
> Thanks for the response.
>
> On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
> <mb...@owlcyberdefense.com> wrote:
> >
> > Good question.
> >
> > I think what is happening is this. elem5 fails to parse because it is an empty string, but then the parse backtracks, and here's the trick: that means it is putting back the separator before this array/optional element. Then your schema has nothing to absorb the final separator.
> >
> > Your schema has expressed an optional element, but what you want is a required separator, then an optional element after it.
> >
> > I think wrapping an xs:sequence around elem5 will fix this.
>
> So the required separator goes on the sequence?
>
> >
> > To be sure, I need to see the occursCountKind property, lengthKind property, etc. Basically I need to be able to reproduce your run.
> > I would need your default-dfdl-properties/defaults.dfdl.xsd file.
> >
> Here's my defaults that I pulled from the DFDL-part1 presentation:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> <schema xmlns="http://www.w3.org/2001/XMLSchema"
>         xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>         xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:defineFormat name="default-dfdl-properties">
>         <dfdl:format
>             alignment="1"
>             alignmentUnits="bytes"
>             binaryFloatRep="ieee"
>             binaryNumberRep="binary"
>             bitOrder="mostSignificantBitFirst"
>             byteOrder="bigEndian"
>             calendarPatternKind="implicit"
>             documentFinalTerminatorCanBeMissing="yes"
>             emptyValueDelimiterPolicy="none"
>             encoding="ISO-8859-1"
>             encodingErrorPolicy="replace"
>             escapeSchemeRef=""
>             fillByte="f"
>             floating="no"
>             ignoreCase="no"
>             initiator=""
>             initiatedContent="no"
>             leadingSkip="0"
>             lengthKind="delimited"
>             lengthUnits="characters"
>             nilKind="literalValue"
>             nilValueDelimiterPolicy="none"
>             occursCountKind="implicit"
>             outputNewLine="%CR;%LF;"
>             representation="text"
>             separator=""
>             separatorPosition="infix"
>             separatorSuppressionPolicy="never"
>             sequenceKind="ordered"
>             terminator=""
>             textBidi="no"
>             textNumberCheckPolicy="strict"
>             textNumberPattern="#,##0.###;-#,##0.###"
>             textNumberRep="standard"
>             textNumberRounding="explicit"
>             textNumberRoundingIncrement="0"
>             textNumberRoundingMode="roundUnnecessary"
>             textOutputMinLength="0"
>             textPadKind="none"
>             textStandardBase="10"
>             textStandardExponentRep="E"
>             textStandardInfinityRep="Inf"
>             textStandardNaNRep="NaN"
>             textStandardZeroRep="0"
>             textStandardDecimalSeparator="."
>             textStandardGroupingSeparator=","
>             textTrimKind="none"
>             trailingSkip="0"
>             truncateSpecifiedLengthString="no"
>             utf16Width="fixed"/>
>       </dfdl:defineFormat>
>     </xs:appinfo>
>   </xs:annotation>
> </schema>
>
>
> > w.r.t your 0001 issue....
> >
> > Text number formatting, such as leading zeros, is controlled by the dfdl:textNumberPattern property. I think you want different values for this property for your two integer-type elements if they are supposed to have different numbers of digits, as evidenced by their max values of 999 and 99999.
> >
> > However, your request that 0001 be preserved is not consistent with either 999 or 99999 as max values. So I'm not sure what you are trying to achieve in this format.
>
> Just trying to teach an old dog some new tricks.
>
> >
> > DFDL does not "remember how the integer was presented". It parses it according to rules, creates an xs:int in the infoset, and at that point the leading zero information is gone. It then unparses according to rules. If you want 0001 to parse and unparse as 0001, you want dfdl:textNumberPattern="#0000". That will always produce at least 4 digits, with a fifth if the value needs it.
> >
> > But in this case, if you are first parsing, then unparsing data, then incoming "01" will also unparse as "0001". Using dfdl:textNumberPattern="#0000" means "canonical form for this data is at least 4 digits". If you parse the data using dfdl:lengthKind='delimited', then your schema has expressed "tolerate any number of digits, but always canonicalize to at least 4 digits".
>
> I'll play with this.
>
> >
> > If you want the text of these numbers preserved, not canonicalized, and your application does both parse and unparse, like data security apps often do, then you need to use strings, not numbers.
>
> If I were to use strings how would I then validate that the value was
> in some range?
>
> >
> > Note, however, that preserving leading/trailing non-numerically significant zeros is a security hole - they can be used to carry covert channel data.
> > Canonicalization of data is fundamentally more secure.
> >
> > The usual reason people want preservation of data exactly, character for character, is to make test/QA easier. That's ok so long as you understand that some data security is lost when non-information-carrying things like leading/trailing zeros are preserved.
> >
> >
> >
> > ________________________________
> > From: Theodore Toth <te...@sage.northcom.mil>
> > Sent: Sunday, August 29, 2021 2:45 AM
> > To: users@daffodil.apache.org <us...@daffodil.apache.org>
> > Subject: optional int and unparse formatting
> >
> > I just started looking at daffodil and have a few questions about my
> > first experiment:
> > Here's my dfdl:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xs:schema
> >     xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
> >
> >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
> >   <xs:annotation>
> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> >       <dfdl:format ref="default-dfdl-properties" />
> >     </xs:appinfo>
> >   </xs:annotation>
> >
> >   <xs:element name="FOO"
> >               dfdl:initiator="FOO/"
> >               dfdl:lengthKind="implicit">
> > <!--
> >               dfdl:terminator="//%NL;%WSP*;">
> > -->
> >     <xs:complexType>
> >       <xs:sequence dfdl:sequenceKind="ordered"
> >                    dfdl:separator="/"
> >                    dfdl:separatorPosition="infix">
> >
> >         <xs:element name="elem1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:minLength value="1"/>
> >               <xs:maxLength value="14"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem2">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:pattern value="CAT|DOG|HORSE"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem3">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:int">
> >               <xs:minInclusive value="1"/>
> >               <xs:maxInclusive value="99999"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:minLength value="1"/>
> >               <xs:maxLength value="20"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:int">
> >               <xs:minInclusive value="1"/>
> >               <xs:maxInclusive value="999"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >       </xs:sequence>
> >     </xs:complexType>
> >   </xs:element>
> >
> > </xs:schema>
> >
> > Here's some test data:
> > FOO/GONE FISHIN/DOG/0001///
> >
> > The parse fails with:
> > [error] Parse Error: Unable to parse xs:int from empty string
> > Schema context: elem5 Location line 59 column 10 in
> > file:/home/tedx/dfdl-test/test.dfdl.xsd
> > Data location was preceding byte 26
> >
> > Why does it fail when elem5 has minOccurs="0"? elem5 is optional.
> >
> > Then if I put a 0 before the last slash it generates:
> > <?xml version="1.0" encoding="UTF-8"?>
> > <FOO>
> >   <elem1>GONE FISHIN</elem1>
> >   <elem2>DOG</elem2>
> >   <elem3>1</elem3>
> >   <elem4></elem4>
> >   <elem5>0</elem5>
> > </FOO>
> >
> > and when I unparse it generates:
> > FOO/GONE FISHIN/DOG/1//0
> >
> > but I'd like it to output 0001 for elem3, how do I do that?
> >
> > Ted

Re: optional int and unparse formatting

Posted by Theodore Toth <te...@sage.northcom.mil>.
Yes I'm looking at the DI2E USMTF ATO/ACO schemas, thanks.
Unfortunately OTH-G doesn't define optional values settings and 'end
of set' like USMTF which is what I'm struggling with :(

Ted

On Thu, Oct 7, 2021 at 9:58 PM Mike Beckerle <mb...@apache.org> wrote:
>
> Ted,
>
> If you have access to the DI2E.net system, then this USMTF DFDL schema (partial. Mostly just ATO) may help you as OTH-G has similarities.
>
> https://bitbucket.di2e.net/projects/DFDL/repos/usmtf/browse
>
> If you don't have that access, then please get in contact privately and we'll arrange to get you a copy by other means.
>
> Of possible interest: I am currently adding features to Daffodil that will support OTH-G style check-digits i.e., verifying them, computing them on unparse.
> This will come out in release 3.2.0 later this year.
>
> -mikeb
>
>
>
>
> On Thu, Oct 7, 2021 at 6:35 AM Theodore Toth <te...@sage.northcom.mil> wrote:
>>
>> I'm still struggling with optional subelements at the end of an
>> element, this time for a complex type. The approach that worked for a
>> simpleType doesn't work for a complex type. I'm getting "[error]
>> Parse Error: Terminator '%NL;%WSP*;' not found". I'm not sure yet but
>> a newline might not be a valid terminator for an OTH-GOLD message line
>> :(
>> Also how would you specify an optional literal like '//' at the end of
>> an element when there can be other optional subelements separated by '/'
>> prior to it?

Re: optional int and unparse formatting

Posted by Mike Beckerle <mb...@apache.org>.
Ted,

If you have access to the DI2E.net system, then this USMTF DFDL schema
(partial. Mostly just ATO) may help you as OTH-G has similarities.

https://bitbucket.di2e.net/projects/DFDL/repos/usmtf/browse

If you don't have that access, then please get in contact privately and
we'll arrange to get you a copy by other means.

Of possible interest: I am currently adding features to Daffodil that will
support OTH-G style check-digits i.e., verifying them, computing them on
unparse.
This will come out in release 3.2.0 later this year.

-mikeb




On Thu, Oct 7, 2021 at 6:35 AM Theodore Toth <te...@sage.northcom.mil>
wrote:

> I'm still struggling with optional subelements at the end of an
> element, this time for a complex type. The approach that worked for a
> simpleType doesn't work for a complex type. I'm getting "[error]
> Parse Error: Terminator '%NL;%WSP*;' not found". I'm not sure yet but
> a newline might not be a valid terminator for an OTH-GOLD message line
> :(
> Also how would you specify an optional literal like '//' at the end of
> an element when there can be other optional subelements separated by '/'
> prior to it?
>
> On Sat, Sep 18, 2021 at 4:12 AM Beckerle, Mike
> <mb...@owlcyberdefense.com> wrote:
> >
> > Sorry for the late response on this. Turns out Outlook 365 was spam
> filtering some Apache emails. It's a known issue with Microsoft's spam
> filters.
> >
> > The sequence wrapped around elem5 doesn't need a dfdl:separator because
> elem5 has maxOccurs="1", so there will never be enough things to separate.
> >
> > Otherwise yeah, this looks like what I was suggesting.
> >
> > I agree that the DFDL spec is quite painful in numerous areas.
> Unfortunately I have to take the blame for some of that. Someday I hope
> some sections will get refactored and rewritten.
> >
> >
> > ________________________________
> > From: Theodore Toth <te...@sage.northcom.mil>
> > Sent: Tuesday, August 31, 2021 12:21 AM
> > To: users@daffodil.apache.org <us...@daffodil.apache.org>
> > Subject: Re: optional int and unparse formatting
> >
> > The following worked for me although I don't know if it's the 'right'
> > way to do it. Reading the spec can give you a headache.
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xs:schema
> >     xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
> >
> >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd"
> />
> >   <xs:annotation>
> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> >       <dfdl:format ref="default-dfdl-properties" />
> >     </xs:appinfo>
> >   </xs:annotation>
> >
> >   <xs:element name="FOO"
> >               dfdl:initiator="FOO/"
> >               dfdl:lengthKind="implicit"
> >               dfdl:terminator="%NL;%WSP*;">
> >
> >     <xs:complexType>
> >       <xs:sequence dfdl:sequenceKind="ordered"
> >                    dfdl:separator="/"
> >                    dfdl:separatorPosition="infix">
> >
> >         <xs:element name="elem1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:minLength value="1"/>
> >               <xs:maxLength value="14"/>
> >               <xs:pattern value="[A-Z0-9,:%#*\- ]+"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem2">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:pattern value="CAT|DOG|HORSE"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem3" dfdl:textNumberPattern="#0000">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:int">
> >               <xs:minInclusive value="1"/>
> >               <xs:maxInclusive value="99999"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:minLength value="1"/>
> >               <xs:maxLength value="20"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:sequence dfdl:separator="/" dfdl:terminator="/"
> >                      dfdl:separatorSuppressionPolicy="anyEmpty">
> >           <xs:element name="elem5" minOccurs="0" maxOccurs="1"
> >                       dfdl:textNumberPattern="000">
> >             <xs:simpleType>
> >               <xs:restriction base="xs:int">
> >                 <xs:minInclusive value="1"/>
> >                 <xs:maxInclusive value="999"/>
> >               </xs:restriction>
> >             </xs:simpleType>
> >           </xs:element>
> >         </xs:sequence>
> >
> >       </xs:sequence>
> >     </xs:complexType>
> >   </xs:element>
> >
> > </xs:schema>
> >
> > On Tue, Aug 31, 2021 at 9:31 AM Theodore Toth
> > <te...@sage.northcom.mil> wrote:
> > >
> > > Thanks for the response.
> > >
> > > On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
> > > <mb...@owlcyberdefense.com> wrote:
> > > >
> > > > Good question.
> > > >
> > > > > I think what is happening is this. elem5 fails to parse because it is an empty string, but then the parse backtracks, and here's the trick: that means it is putting back the separator before this array/optional element. Then your schema has nothing to absorb the final separator.
> > > >
> > > > > Your schema has expressed an optional element, but what you want is a required separator, then an optional element after it.
> > > >
> > > > I think wrapping an xs:sequence around elem5 will fix this.
> > >
> > > So the required separator goes on the sequence?
> > >
> > > >
> > > > > To be sure, I need to see the occursCountKind property, lengthKind property, etc. Basically I need to be able to reproduce your run.
> > > > I would need your default-dfdl-properties/defaults.dfdl.xsd file.
> > > >
> > > Here's my defaults that I pulled from the DFDL-part1 presentation:
> > >
> > > > <?xml version="1.0" encoding="UTF-8"?>
> > >
> > > <schema xmlns="http://www.w3.org/2001/XMLSchema"
> > >         xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
> > >         xmlns:xs="http://www.w3.org/2001/XMLSchema">
> > >
> > >   <xs:annotation>
> > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >       <dfdl:defineFormat name="default-dfdl-properties">
> > >         <dfdl:format
> > >             alignment="1"
> > >             alignmentUnits="bytes"
> > >             binaryFloatRep="ieee"
> > >             binaryNumberRep="binary"
> > >             bitOrder="mostSignificantBitFirst"
> > >             byteOrder="bigEndian"
> > >             calendarPatternKind="implicit"
> > >             documentFinalTerminatorCanBeMissing="yes"
> > >             emptyValueDelimiterPolicy="none"
> > >             encoding="ISO-8859-1"
> > >             encodingErrorPolicy="replace"
> > >             escapeSchemeRef=""
> > >             fillByte="f"
> > >             floating="no"
> > >             ignoreCase="no"
> > >             initiator=""
> > >             initiatedContent="no"
> > >             leadingSkip="0"
> > >             lengthKind="delimited"
> > >             lengthUnits="characters"
> > >             nilKind="literalValue"
> > >             nilValueDelimiterPolicy="none"
> > >             occursCountKind="implicit"
> > >             outputNewLine="%CR;%LF;"
> > >             representation="text"
> > >             separator=""
> > >             separatorPosition="infix"
> > >             separatorSuppressionPolicy="never"
> > >             sequenceKind="ordered"
> > >             terminator=""
> > >             textBidi="no"
> > >             textNumberCheckPolicy="strict"
> > >             textNumberPattern="#,##0.###;-#,##0.###"
> > >             textNumberRep="standard"
> > >             textNumberRounding="explicit"
> > >             textNumberRoundingIncrement="0"
> > >             textNumberRoundingMode="roundUnnecessary"
> > >             textOutputMinLength="0"
> > >             textPadKind="none"
> > >             textStandardBase="10"
> > >             textStandardExponentRep="E"
> > >             textStandardInfinityRep="Inf"
> > >             textStandardNaNRep="NaN"
> > >             textStandardZeroRep="0"
> > >             textStandardDecimalSeparator="."
> > >             textStandardGroupingSeparator=","
> > >             textTrimKind="none"
> > >             trailingSkip="0"
> > >             truncateSpecifiedLengthString="no"
> > >             utf16Width="fixed"/>
> > >           </dfdl:defineFormat>
> > >         </xs:appinfo>
> > >       </xs:annotation>
> > >     </schema>
> > >
> > >
> > > > w.r.t your 0001 issue....
> > > >
> > > > > Text number formats, such as leading zeros, are controlled by the dfdl:textNumberPattern property. I think you want different values for this property for your two integer-type elements if they are supposed to have different numbers of digits, as evidenced by their max values of 999 and 99999.
> > > >
> > > > > However, your request that 0001 be preserved is not consistent with either 999 or 99999 as max values. So I'm not sure what you are trying to achieve in this format.
> > >
> > > Just trying to teach an old dog some new tricks.
> > >
> > > >
> > > > > DFDL does not "remember how the integer was presented". It parses it according to rules, creates an xs:int in the infoset, and at that point the leading zero information is gone. It then unparses according to rules. If you want 0001 to parse and unparse as 0001, you want dfdl:textNumberPattern="#0000". That will give you 4 digits, optionally a fifth if needed, but will always produce at least 4.
> > > >
> > > > > But in this case, if you are first parsing, then unparsing data, then incoming "01" will also unparse as "0001". Using dfdl:textNumberPattern="#0000" means "canonical form for this data is at least 4 digits". If you parse the data using dfdl:lengthKind='delimited', then your schema has expressed "tolerate any number of digits, but always canonicalize to at least 4 digits".
> > >
> > > I'll play with this.
> > >
> > > >
> > > > > If you want the text of these numbers preserved, not canonicalized, and your application does both parse and unparse, like data security apps often do, then you need to use strings, not numbers.
> > >
> > > If I were to use strings how would I then validate that the value was
> > > in some range?
> > >
> > > >
> > > > > Note, however, that preserving leading/trailing non-numerically significant zeros is a security hole - they can be used to carry covert channel data.
> > > > Canonicalization of data is fundamentally more secure.
> > > >
> > > > > The usual reason people want preservation of data exactly, character for character, is to make test/QA easier. That's ok so long as you get that there is a loss of some data security when non-information-carrying things like leading/trailing zeros are preserved.
> > > >
> > > >
> > > >
> > > > ________________________________
> > > > From: Theodore Toth <te...@sage.northcom.mil>
> > > > Sent: Sunday, August 29, 2021 2:45 AM
> > > > To: users@daffodil.apache.org <us...@daffodil.apache.org>
> > > > Subject: optional int and unparse formatting
> > > >
> > > > I just started looking at daffodil and have a few questions about my
> > > > first experiment:
> > > > Here's my dfdl:
> > > >
> > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > <xs:schema
> > > >     xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > >     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
> > > >
> > > > >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
> > > >   <xs:annotation>
> > > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> > > >       <dfdl:format ref="default-dfdl-properties" />
> > > >     </xs:appinfo>
> > > >   </xs:annotation>
> > > >
> > > >   <xs:element name="FOO"
> > > >               dfdl:initiator="FOO/"
> > > >               dfdl:lengthKind="implicit">
> > > > <!--
> > > >               dfdl:terminator="//%NL;%WSP*;">
> > > > -->
> > > >     <xs:complexType>
> > > >       <xs:sequence dfdl:sequenceKind="ordered"
> > > >                    dfdl:separator="/"
> > > >                    dfdl:separatorPosition="infix">
> > > >
> > > >         <xs:element name="elem1">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:string">
> > > >               <xs:minLength value="1"/>
> > > >               <xs:maxLength value="14"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >
> > > >         <xs:element name="elem2">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:string">
> > > >               <xs:pattern value="CAT|DOG|HORSE"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >
> > > >         <xs:element name="elem3">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:int">
> > > >               <xs:minInclusive value="1"/>
> > > >               <xs:maxInclusive value="99999"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >
> > > >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:string">
> > > >               <xs:minLength value="1"/>
> > > >               <xs:maxLength value="20"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >
> > > >         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:int">
> > > >               <xs:minInclusive value="1"/>
> > > >               <xs:maxInclusive value="999"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >       </xs:sequence>
> > > >     </xs:complexType>
> > > >   </xs:element>
> > > >
> > > > </xs:schema>
> > > >
> > > > Here's some test data:
> > > > FOO/GONE FISHIN/DOG/0001///
> > > >
> > > > The parse fails with:
> > > > [error] Parse Error: Unable to parse xs:int from empty string
> > > > Schema context: elem5 Location line 59 column 10 in
> > > > file:/home/tedx/dfdl-test/test.dfdl.xsd
> > > > Data location was preceding byte 26
> > > >
> > > > Why does it fail when elem5 has minOccurs="0"? elem5 is optional.
> > > >
> > > > Then if I put a 0 before the last slash it generates:
> > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > <FOO>
> > > >   <elem1>GONE FISHIN</elem1>
> > > >   <elem2>DOG</elem2>
> > > >   <elem3>1</elem3>
> > > >   <elem4></elem4>
> > > >   <elem5>0</elem5>
> > > > </FOO>
> > > >
> > > > and when I unparse it generates:
> > > > FOO/GONE FISHIN/DOG/1//0
> > > >
> > > > but I'd like it to output 0001 for elem3, how do I do that?
> > > >
> > > > Ted
>

Re: optional int and unparse formatting

Posted by Theodore Toth <te...@sage.northcom.mil>.
I'm still struggling with optional subelements at the end of an
element, this time for a complex type. The approach that worked for a
simpleType doesn't work for a complex type. I'm getting "[error]
Parse Error: Terminator '%NL;%WSP*;' not found". I'm not sure yet, but
a newline might not be a valid terminator for an OTH-GOLD message line
:(
Also, how would you specify an optional literal like '//' at the end of
an element when there can be other optional subelements separated by '/'
prior to it?

On Sat, Sep 18, 2021 at 4:12 AM Beckerle, Mike
<mb...@owlcyberdefense.com> wrote:
>
> Sorry for the late response on this. Turns out Outlook 365 was spam filtering some Apache emails. It's a known issue with Microsoft's spam filters.
>
> The sequence wrapped around elem5 doesn't need a dfdl:separator because the elem5 has maxOccurs 1, so there will never be enough things to separate.
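
A sketch of what that suggestion might look like, using the inner sequence from the schema quoted below with dfdl:separator removed. This is untested. Dropping dfdl:separatorSuppressionPolicy along with it is an assumption (that property only concerns separators); everything else is copied unchanged.

```xml
<!-- Inner sequence around elem5, without dfdl:separator.
     Assumption: separatorSuppressionPolicy is dropped too, since it only
     matters when a separator is defined. Untested sketch. -->
<xs:sequence dfdl:terminator="/">
  <xs:element name="elem5" minOccurs="0" maxOccurs="1"
              dfdl:textNumberPattern="000">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:sequence>
```
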
>
> Otherwise yeah, this looks like what I was suggesting.
>
> I agree that the DFDL spec is quite painful in numerous areas. Unfortunately I have to take the blame for some of that. Someday I hope some sections will get refactored and rewritten.
>
>
> ________________________________
> From: Theodore Toth <te...@sage.northcom.mil>
> Sent: Tuesday, August 31, 2021 12:21 AM
> To: users@daffodil.apache.org <us...@daffodil.apache.org>
> Subject: Re: optional int and unparse formatting
>
> The following worked for me although I don't know if it's the 'right'
> way to do it. Reading the spec can give you a headache.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>
>   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:format ref="default-dfdl-properties" />
>     </xs:appinfo>
>   </xs:annotation>
>
>   <xs:element name="FOO"
>               dfdl:initiator="FOO/"
>               dfdl:lengthKind="implicit"
>               dfdl:terminator="%NL;%WSP*;">
>
>     <xs:complexType>
>       <xs:sequence dfdl:sequenceKind="ordered"
>                    dfdl:separator="/"
>                    dfdl:separatorPosition="infix">
>
>         <xs:element name="elem1">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:minLength value="1"/>
>               <xs:maxLength value="14"/>
>               <xs:pattern value="[A-Z0-9,:%#*\- ]+"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem2">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:pattern value="CAT|DOG|HORSE"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem3" dfdl:textNumberPattern="#0000">
>           <xs:simpleType>
>             <xs:restriction base="xs:int">
>               <xs:minInclusive value="1"/>
>               <xs:maxInclusive value="99999"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:minLength value="1"/>
>               <xs:maxLength value="20"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:sequence dfdl:separator="/" dfdl:terminator="/"
>                      dfdl:separatorSuppressionPolicy="anyEmpty">
>           <xs:element name="elem5" minOccurs="0" maxOccurs="1"
>                       dfdl:textNumberPattern="000">
>             <xs:simpleType>
>               <xs:restriction base="xs:int">
>                 <xs:minInclusive value="1"/>
>                 <xs:maxInclusive value="999"/>
>               </xs:restriction>
>             </xs:simpleType>
>           </xs:element>
>         </xs:sequence>
>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
> </xs:schema>
>
> On Tue, Aug 31, 2021 at 9:31 AM Theodore Toth
> <te...@sage.northcom.mil> wrote:
> >
> > Thanks for the response.
> >
> > On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
> > <mb...@owlcyberdefense.com> wrote:
> > >
> > > Good question.
> > >
> > > I think what is happening is this. elem5 fails to parse because it is an empty string, but then the parse backtracks, and here's the trick: that means it is putting back the separator before this array/optional element. Then your schema has nothing to absorb the final separator.
> > >
> > > Your schema has expressed an optional element, but what you want is a required separator, then an optional element after it.
> > >
> > > I think wrapping an xs:sequence around elem5 will fix this.
> >
> > So the required separator goes on the sequence?
> >
> > >
> > > To be sure, I need to see the occursCountKind property, lengthKind property, etc. Basically I need to be able to reproduce your run.
> > > I would need your default-dfdl-properties/defaults.dfdl.xsd file.
> > >
> > Here's my defaults that I pulled from the DFDL-part1 presentation:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> > <schema xmlns="http://www.w3.org/2001/XMLSchema"
> >         xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
> >         xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >
> >   <xs:annotation>
> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> >       <dfdl:defineFormat name="default-dfdl-properties">
> >         <dfdl:format
> >             alignment="1"
> >             alignmentUnits="bytes"
> >             binaryFloatRep="ieee"
> >             binaryNumberRep="binary"
> >             bitOrder="mostSignificantBitFirst"
> >             byteOrder="bigEndian"
> >             calendarPatternKind="implicit"
> >             documentFinalTerminatorCanBeMissing="yes"
> >             emptyValueDelimiterPolicy="none"
> >             encoding="ISO-8859-1"
> >             encodingErrorPolicy="replace"
> >             escapeSchemeRef=""
> >             fillByte="f"
> >             floating="no"
> >             ignoreCase="no"
> >             initiator=""
> >             initiatedContent="no"
> >             leadingSkip="0"
> >             lengthKind="delimited"
> >             lengthUnits="characters"
> >             nilKind="literalValue"
> >             nilValueDelimiterPolicy="none"
> >             occursCountKind="implicit"
> >             outputNewLine="%CR;%LF;"
> >             representation="text"
> >             separator=""
> >             separatorPosition="infix"
> >             separatorSuppressionPolicy="never"
> >             sequenceKind="ordered"
> >             terminator=""
> >             textBidi="no"
> >             textNumberCheckPolicy="strict"
> >             textNumberPattern="#,##0.###;-#,##0.###"
> >             textNumberRep="standard"
> >             textNumberRounding="explicit"
> >             textNumberRoundingIncrement="0"
> >             textNumberRoundingMode="roundUnnecessary"
> >             textOutputMinLength="0"
> >             textPadKind="none"
> >             textStandardBase="10"
> >             textStandardExponentRep="E"
> >             textStandardInfinityRep="Inf"
> >             textStandardNaNRep="NaN"
> >             textStandardZeroRep="0"
> >             textStandardDecimalSeparator="."
> >             textStandardGroupingSeparator=","
> >             textTrimKind="none"
> >             trailingSkip="0"
> >             truncateSpecifiedLengthString="no"
> >             utf16Width="fixed"/>
> >           </dfdl:defineFormat>
> >         </xs:appinfo>
> >       </xs:annotation>
> >     </schema>
> >
> >
> > > w.r.t your 0001 issue....
> > >
> > > Text number formats, such as leading zeros, are controlled by the dfdl:textNumberPattern property. I think you want different values for this property for your two integer-type elements if they are supposed to have different numbers of digits, as evidenced by their max values of 999 and 99999.
> > >
> > > However, your request that 0001 be preserved is not consistent with either 999 or 99999 as max values. So I'm not sure what you are trying to achieve in this format.
> >
> > Just trying to teach an old dog some new tricks.
> >
> > >
> > > DFDL does not "remember how the integer was presented". It parses it according to rules, creates an xs:int in the infoset, and at that point the leading zero information is gone. It then unparses according to rules. If you want 0001 to parse and unparse as 0001, you want dfdl:textNumberPattern="#0000". That will give you 4 digits, optionally a fifth if needed, but will always produce at least 4.
> > >
> > > But in this case, if you are first parsing, then unparsing data, then incoming "01" will also unparse as "0001". Using dfdl:textNumberPattern="#0000" means "canonical form for this data is at least 4 digits". If you parse the data using dfdl:lengthKind='delimited', then your schema has expressed "tolerate any number of digits, but always canonicalize to at least 4 digits".
> >
> > I'll play with this.
> >
> > >
> > > If you want the text of these numbers preserved, not canonicalized, and your application does both parse and unparse, like data security apps often do, then you need to use strings, not numbers.
> >
> > If I were to use strings how would I then validate that the value was
> > in some range?
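
One possible approach, as a sketch only: keep the element as xs:string and approximate the numeric range with XSD facets. The pattern and length values here are assumptions chosen to mirror elem3's range of 1 to 99999 with at least four digits. Facets like these are only checked when validation is enabled, or when a dfdl:assert calls dfdl:checkConstraints(.).

```xml
<!-- Hypothetical string version of elem3; the facet values are assumptions. -->
<xs:element name="elem3">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- digits only, with at least one non-zero digit (rules out 0000) -->
      <xs:pattern value="[0-9]*[1-9][0-9]*"/>
      <!-- 4 or 5 characters, so 0001 through 99999 -->
      <xs:minLength value="4"/>
      <xs:maxLength value="5"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

Because the value stays a string, 0001 would parse and unparse as 0001. A range that does not line up with a digit count would need a more elaborate pattern.
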
> >
> > >
> > > Note, however, that preserving leading/trailing non-numerically significant zeros is a security hole - they can be used to carry covert channel data.
> > > Canonicalization of data is fundamentally more secure.
> > >
> > > The usual reason people want preservation of data exactly, character for character, is to make test/QA easier. That's ok so long as you get that there is a loss of some data security when non-information-carrying things like leading/trailing zeros are preserved.
> > >
> > >
> > >
> > > ________________________________
> > > From: Theodore Toth <te...@sage.northcom.mil>
> > > Sent: Sunday, August 29, 2021 2:45 AM
> > > To: users@daffodil.apache.org <us...@daffodil.apache.org>
> > > Subject: optional int and unparse formatting
> > >
> > > I just started looking at daffodil and have a few questions about my
> > > first experiment:
> > > Here's my dfdl:
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <xs:schema
> > >     xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > >     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
> > >
> > >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
> > >   <xs:annotation>
> > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >       <dfdl:format ref="default-dfdl-properties" />
> > >     </xs:appinfo>
> > >   </xs:annotation>
> > >
> > >   <xs:element name="FOO"
> > >               dfdl:initiator="FOO/"
> > >               dfdl:lengthKind="implicit">
> > > <!--
> > >               dfdl:terminator="//%NL;%WSP*;">
> > > -->
> > >     <xs:complexType>
> > >       <xs:sequence dfdl:sequenceKind="ordered"
> > >                    dfdl:separator="/"
> > >                    dfdl:separatorPosition="infix">
> > >
> > >         <xs:element name="elem1">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:string">
> > >               <xs:minLength value="1"/>
> > >               <xs:maxLength value="14"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >
> > >         <xs:element name="elem2">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:string">
> > >               <xs:pattern value="CAT|DOG|HORSE"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >
> > >         <xs:element name="elem3">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:int">
> > >               <xs:minInclusive value="1"/>
> > >               <xs:maxInclusive value="99999"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >
> > >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:string">
> > >               <xs:minLength value="1"/>
> > >               <xs:maxLength value="20"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >
> > >         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:int">
> > >               <xs:minInclusive value="1"/>
> > >               <xs:maxInclusive value="999"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >       </xs:sequence>
> > >     </xs:complexType>
> > >   </xs:element>
> > >
> > > </xs:schema>
> > >
> > > Here's some test data:
> > > FOO/GONE FISHIN/DOG/0001///
> > >
> > > The parse fails with:
> > > [error] Parse Error: Unable to parse xs:int from empty string
> > > Schema context: elem5 Location line 59 column 10 in
> > > file:/home/tedx/dfdl-test/test.dfdl.xsd
> > > Data location was preceding byte 26
> > >
> > > Why does it fail when elem5 has minOccurs="0"? elem5 is optional.
> > >
> > > Then if I put a 0 before the last slash it generates:
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <FOO>
> > >   <elem1>GONE FISHIN</elem1>
> > >   <elem2>DOG</elem2>
> > >   <elem3>1</elem3>
> > >   <elem4></elem4>
> > >   <elem5>0</elem5>
> > > </FOO>
> > >
> > > and when I unparse it generates:
> > > FOO/GONE FISHIN/DOG/1//0
> > >
> > > but I'd like it to output 0001 for elem3, how do I do that?
> > >
> > > Ted