Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/08/12 18:34:43 UTC
Modeling input data that is an integer with a fixed number of digits?
Hello DFDL community,
I want to confirm my understanding of the following DFDL:
<xs:element name="DataEntry"
type="xs:int"
dfdl:lengthKind="explicit"
dfdl:length="2"
dfdl:lengthUnits="characters" />
That says the input is an integer that has exactly 2 digits.
Right?
It seems kind of strange to say the length units are characters. It would be less strange if I could say the length units are "digits", but that is, of course, not legal. Is there an explanation of why length units of characters make sense?
I reckon the above DFDL is kind of equivalent to the following XML Schema, right?
<xs:element name="DataEntry">
<xs:simpleType>
<xs:restriction base="xs:int">
<xs:length value="2" />
</xs:restriction>
</xs:simpleType>
</xs:element>
Re: Modeling input data that is an integer with a fixed number of digits?
Posted by Steve Lawrence <sl...@apache.org>.
> That says the input is an integer that has exactly 2 digits.
Not quite. That DFDL snippet says that Daffodil will consume two
characters in the current encoding (so it treats multi-byte encodings
as you would expect), and then convert that two-character string to an
integer based on the dfdl:textNumberPattern and other textNumber
properties. So there's really no concept of parsing a certain number of
digits.
Part of the reason for this is that when representation is text, the
fundamental unit really is a single character. Groupings of these
characters might be interpreted as numbers, or delimiters, or other
things, but at its core the basic unit is a single character (or a byte
or bit in some formats).
Additionally, thinking about numbers as just digits is pretty limiting
in many cases, since numbers are often more than just the 0-9 digits.
They can often include positive and negative signs, grouping separators,
decimal separators, exponent characters, infinity characters, null
representations, prefixes/suffixes, and on and on. As a simple example,
the string "-1" is a two-character number that has only a single digit.
So the use of lengthUnits="digits" would really only be useful with
unsigned integers with no grouping separators/exponents/prefixes/etc.
Although that might be somewhat common, it's really just
lengthUnits="characters" with type="xs:unsignedInt" and a
dfdl:textNumberPattern that only accepts digits and nothing else.
lengthUnits="digits" is really just a restriction of the more general
case using existing DFDL properties.
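
As a rough, untested sketch of that restriction: the fragment below keeps
the original element but switches the type to xs:unsignedInt and adds a
digits-only number pattern. The "00" pattern is an assumption about what
a suitable digits-only pattern looks like, and the fragment assumes that
every other required text number property (separators, rounding, check
policy, and so on) is supplied by the enclosing dfdl:format.

```xml
<!-- Sketch only: assumes the dfdl and xs prefixes are bound as usual and
     that a surrounding dfdl:format supplies the remaining properties. -->
<xs:element name="DataEntry"
            type="xs:unsignedInt"
            dfdl:representation="text"
            dfdl:lengthKind="explicit"
            dfdl:length="2"
            dfdl:lengthUnits="characters"
            dfdl:textNumberRep="standard"
            dfdl:textNumberPattern="00" />
<!-- dfdl:length still counts characters, not digits. The pattern and the
     unsigned type are what should reject a two-character input such as
     "-1"; that behavior is worth confirming with a test file. -->
```

Even here the length is measured in characters; it is the combination of
the type and the pattern that narrows those two characters down to
digits.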
On 8/12/19 2:34 PM, Costello, Roger L. wrote:
> Hello DFDL community,
>
> I want to confirm my understanding of the following DFDL:
>
> <xs:element name="DataEntry"
> type="xs:int"
> dfdl:lengthKind="explicit"
> dfdl:length="2"
> dfdl:lengthUnits="characters"/>
>
> That says the input is an integer that has exactly 2 digits.
>
> Right?
>
> It seems kind of strange to say the length units are characters. It would be
> less strange if I could say the length units are “digits”, but that is, of course,
> not legal. Is there an explanation of why length units of characters make sense?
>
> I reckon the above DFDL is kind of equivalent to the following XML Schema, right?
>
> <xs:element name="DataEntry">
> <xs:simpleType>
> <xs:restriction base="xs:int">
> <xs:length value="2"/>
> </xs:restriction>
> </xs:simpleType>
> </xs:element>
>