Posted to users@daffodil.apache.org by Roger L Costello <co...@mitre.org> on 2020/08/17 15:57:52 UTC

Example of input that fails due to grouping separators when textNumberCheckPolicy="strict"?

What is an example of input that will cause parsing to fail due to the grouping separators when textNumberCheckPolicy="strict"? Why isn't the example below such a case, i.e., why is no error generated for it?

Why is it that with this input

1234

no error is raised when textNumberCheckPolicy="strict" and textNumberPattern="#,###" are specified:

<xs:element name="SimpleDataFormat">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="NumStudents" type="xs:nonNegativeInteger" 
                dfdl:textNumberCheckPolicy="strict"
                dfdl:textNumberPattern="#,###"
                dfdl:textStandardGroupingSeparator=","
                dfdl:textStandardDecimalSeparator="."
            />
        </xs:sequence>
    </xs:complexType>
</xs:element>


Re: Example of input that fails due to grouping separators when textNumberCheckPolicy="strict"?

Posted by Steve Lawrence <sl...@apache.org>.
We use the ICU library for parsing numbers based on the
textNumberPattern. This library has this to say about strict parsing of
numbers:


> The following conditions cause a parse failure relative to [lax] mode
> (examples use the pattern "#,##0.#"):
> 
> * The presence and position of special symbols, including currency, 
> must match the pattern.
> 
>   '+123' fails (there is no plus sign in the pattern)
> 
> * Leading or doubled grouping separators
> 
>   ',123' and '1,,234' fail
> 
> * Groups of incorrect length when grouping is used
> 
>   '1,23' and '1234,567' fail, but '1234' passes
> 
> * Grouping separators used in numbers followed by exponents
> 
>   '1,234E5' fails, but '1234E5' and '1,234E' pass ('E' is not an 
>    exponent when not followed by a number)

So based on ICU's description of strict, this is the expected behavior.
It doesn't say anything about missing grouping separators causing an
error. Only that if they do exist then they must be in the right spot.
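
To make the grouping rules quoted above concrete, here is a minimal Python sketch. It is not ICU's parser and not what Daffodil runs; the helper name strict_grouping_ok is hypothetical, and it assumes an integer pattern with a group size of 3 (such as "#,###") and "," as the grouping separator. It only mirrors the idea that separators may be absent, but if present they must be correctly placed.

```python
import re

# Hypothetical helper: NOT ICU's implementation, only a sketch of the
# grouping rules quoted above, assuming group size 3 and "," as separator.
_GROUPED = re.compile(r"\d{1,3}(,\d{3})+")  # separators present and well placed
_UNGROUPED = re.compile(r"\d+")             # no separators at all

def strict_grouping_ok(text: str) -> bool:
    """Return True if the digits and separators satisfy the sketched rules."""
    return bool(_GROUPED.fullmatch(text) or _UNGROUPED.fullmatch(text))

if __name__ == "__main__":
    for sample in ["1234", "1,234", ",123", "1,,234", "1,23", "1234,567"]:
        print(repr(sample), "passes" if strict_grouping_ok(sample) else "fails")
```

Under these assumptions, "1234" and "1,234" pass, while ",123", "1,,234", "1,23" and "1234,567" fail, which lines up with the ICU examples quoted above.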

The only thing the DFDL specification mentions regarding strict numbers
is this:

> If 'strict' and dfdl:textNumberRep is 'standard' then the data must 
> follow the pattern with the exceptions that digits 0-9, decimal 
> separator and exponent separator are always recognised and parsed

To me, that reads like the decimal separator should always be required
in strict mode, so this feels like the ICU behavior and the behavior
described in the DFDL specification do not match. And I believe the DFDL
behavior was intended to match ICU behavior, so it's possible the
DFDL specification needs to be updated.

I've created DAFFODIL-2384 [1] to track this issue.

- Steve

[1] https://issues.apache.org/jira/browse/DAFFODIL-2384
