Posted to users@daffodil.apache.org by Roger L Costello <co...@mitre.org> on 2021/11/08 17:40:56 UTC

Two approaches to specifying NumStudents ... a side by side comparison of the approaches

Hi Folks,

My input consists of a single positive integer representing the number of students in a classroom, e.g.,

	30

One way to specify that data item is to use the DFDL integer properties. Here's how to do it. You must provide all these DFDL properties or you will get an error.

<xs:element name="NumStudents" type="xs:integer"
    dfdl:textNumberPattern="#"
    dfdl:textNumberRep="standard"
    dfdl:textStandardBase="10"
    dfdl:textStandardExponentRep="E"
    dfdl:textStandardZeroRep="0"
/>

Actually, Daffodil has bugs in it and currently requires even more properties than those shown (it requires the rounding properties), but let's assume the bugs are fixed. 

This first approach allows the input to contain things that I don't want. For example, it allows the input to contain an exponential number:

	3E1

Also, it allows the input to contain zero students:

	0

You cannot prohibit these (illegal) values when using the DFDL integer properties!

An alternate approach is to specify the number of students using the string regex properties:

<xs:element name="NumStudents" type="xs:string"
    dfdl:lengthKind="pattern"
    dfdl:lengthUnits="characters"
    dfdl:lengthPattern="[1-9][0-9]*">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert test='{ . ne "" }' />
        </xs:appinfo>
    </xs:annotation>
</xs:element>

This second approach allows exactly the desired input (no exponential numbers, no classrooms with zero students). 
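A quick way to sanity-check a candidate length pattern outside Daffodil is to run it against a few sample inputs. The sketch below is Python, not DFDL, and it assumes the element must consume the entire input (a DFDL length pattern only measures a matching prefix, so trailing data is a separate matter). The names accepts, AT_LEAST_TWO_DIGITS and ONE_OR_MORE_DIGITS are made up for illustration. Two quantifiers are compared because the choice matters: [1-9][0-9]+ requires at least two digits, whereas [1-9][0-9]* also admits a single-digit class size.

```python
import re

# Two candidate patterns; these names are illustrative, not part of DFDL.
AT_LEAST_TWO_DIGITS = re.compile(r"[1-9][0-9]+")
ONE_OR_MORE_DIGITS = re.compile(r"[1-9][0-9]*")


def accepts(pattern, text):
    """Return True if the pattern matches the whole of text.

    Assumption: no trailing data is allowed after the number.
    """
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # Sample inputs: a normal value, an exponential form, zero,
    # a single digit, and a value with a leading zero.
    for sample in ["30", "3E1", "0", "5", "030"]:
        print(sample,
              accepts(AT_LEAST_TWO_DIGITS, sample),
              accepts(ONE_OR_MORE_DIGITS, sample))
```

Both patterns reject 3E1, 0 and 030; they differ only on single-digit input such as 5.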

You might argue that the second approach specifies number of students as a string, whereas number of students is an integer. But both approaches produce this XML:

<NumStudents>30</NumStudents>

Who cares if the DFDL schema specifies the number of students as a string (actually, as a string composed exclusively of digits)? It is up to the application that processes the XML to interpret 30 as an integer. So I reject the argument.
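To make that concrete, here is a minimal sketch of a consuming application, assuming it receives the XML shown above as a string. It is Python using only the standard library; the function name read_num_students is hypothetical. The conversion to an integer happens in the application, regardless of which type the DFDL schema declared.

```python
import xml.etree.ElementTree as ET

# The XML produced by either approach, as shown above.
INFOSET = "<NumStudents>30</NumStudents>"


def read_num_students(xml_text):
    """Parse the XML and interpret the element content as an integer.

    Hypothetical helper: raises ValueError if the content is not a
    plain decimal integer.
    """
    root = ET.fromstring(xml_text)
    return int(root.text)


if __name__ == "__main__":
    print(read_num_students(INFOSET))
```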

Regexes have a deep theoretical basis (finite automata theory). The DFDL integer properties (the first approach) have no theoretical basis.

I know what Mike Beckerle thinks about the second approach (he doesn't like it) but what do other people think? Which of the above two approaches is simpler, more appealing to you?

/Roger