Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/08/12 18:34:43 UTC

Modeling input data that is an integer with a fixed number of digits?

Hello DFDL community,

I want to confirm my understanding of the following DFDL:

<xs:element name="DataEntry"
    type="xs:int"
    dfdl:lengthKind="explicit"
    dfdl:length="2"
    dfdl:lengthUnits="characters"   />

That says the input is an integer that has exactly 2 digits.

Right?

It seems kind of strange to say the length units are characters. It would be less strange if I could say the length units are "digits" but that is, of course, not legal. Any explanation of why length units of characters makes sense?

I reckon the above DFDL is kind of equivalent to the following XML Schema, right?

<xs:element name="DataEntry">
    <xs:simpleType>
        <xs:restriction base="xs:int">
            <xs:length value="2" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>





Re: Modeling input data that is an integer with a fixed number of digits?

Posted by Steve Lawrence <sl...@apache.org>.
> That says the input is an integer that has exactly 2 digits.

Not quite. That DFDL snippet says that Daffodil will consume two
characters in the current encoding (so it treats multi-byte encodings
how you would expect), and then converts that two-character string to an
integer based on the dfdl:textNumberPattern and other textNumber
properties. So there's really no concept of parsing a certain number of
digits.

Part of the reason for this is that when representation is text, the
fundamental unit really is a single character. Groupings of these
characters might be interpreted as numbers, or delimiters, or other
things, but at its core the basic unit is a single character (or a byte
or bit in some formats).

Additionally, thinking about numbers as just digits is pretty limiting
in many cases, since numbers are often more than just the 0-9 digits.
They can often include positive and negative signs, grouping separators,
decimal separators, exponent characters, infinity characters, null
representations, prefixes/suffixes, and on and on. As a simple example,
the string "-1" is a two-character number that has only a single digit.
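
To make the "-1" case concrete, here is a minimal sketch of the element
from the question with an explicit number pattern added. The pattern
"#0" is an assumption chosen for illustration, the xs: and dfdl:
prefixes are assumed to be bound to the usual XML Schema and DFDL
namespaces, and all other required text-number properties are assumed to
come from an enclosing dfdl:format. The behavior noted in the comments
is what I would expect, not something verified against a particular
Daffodil release.

```xml
<!-- Sketch only: consume exactly two characters, then convert that
     two-character string to xs:int using the text number properties. -->
<xs:element name="DataEntry"
    type="xs:int"
    dfdl:representation="text"
    dfdl:lengthKind="explicit"
    dfdl:length="2"
    dfdl:lengthUnits="characters"
    dfdl:textNumberRep="standard"
    dfdl:textNumberPattern="#0" />
<!-- Expected (assumption): the input "42" yields 42 (two digits), and
     the input "-1" should yield negative one: still two characters,
     but only one of them is a digit. -->
```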

So the use of lengthUnits="digits" would really only be useful with
unsigned integers with no grouping separators/exponents/prefixes/etc.
Although that might be somewhat common, it's really just
lengthUnits="characters" with type="xs:unsignedInt" and a
dfdl:textNumberPattern that only accepts digits and nothing else.
lengthUnits="digits" is really just a restriction of the more general
case using existing DFDL properties.
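
As a rough illustration of that restriction, the sketch below combines
lengthUnits="characters" with xs:unsignedInt and a digits-only pattern.
The pattern "00" and the strict check policy are assumptions picked for
the example, the xs: and dfdl: prefixes are assumed to be bound to the
usual namespaces, and the remaining required properties are assumed to
be supplied by an enclosing dfdl:format.

```xml
<!-- Sketch only: a hypothetical "exactly two digits" field expressed
     with existing DFDL properties. -->
<xs:element name="DataEntry"
    type="xs:unsignedInt"
    dfdl:representation="text"
    dfdl:lengthKind="explicit"
    dfdl:length="2"
    dfdl:lengthUnits="characters"
    dfdl:textNumberRep="standard"
    dfdl:textNumberPattern="00"
    dfdl:textNumberCheckPolicy="strict" />
<!-- Expected (assumption): "07" and "42" are accepted, while "-1"
     should be rejected because it is not a valid xs:unsignedInt. -->
```

Whether a given input is rejected at parse time or only at validation
can depend on the check policy and the implementation, so it is worth
testing with the actual data.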


On 8/12/19 2:34 PM, Costello, Roger L. wrote:
> Hello DFDL community,
> 
> I want to confirm my understanding of the following DFDL:
> 
> <xs:element name="DataEntry"
>      type="xs:int"
>      dfdl:lengthKind="explicit"
>      dfdl:length="2"
>      dfdl:lengthUnits="characters"/>
> 
> That says the input is an integer that has exactly 2 digits.
> 
> Right?
> 
> It seems kind of strange to say the length units are characters. It would be 
> less strange if I could say the length units are “digits” but that is, of course, 
> not legal. Any explanation of why length units of characters makes sense?
> 
> I reckon the above DFDL is kind of equivalent to the following XML Schema, right?
> 
> <xs:element name="DataEntry">
>     <xs:simpleType>
>         <xs:restriction base="xs:int">
>             <xs:length value="2"/>
>         </xs:restriction>
>     </xs:simpleType>
> </xs:element>
>