Posted to commits@daffodil.apache.org by "Steve Lawrence (Jira)" <ji...@apache.org> on 2021/02/18 20:19:00 UTC

[jira] [Resolved] (DAFFODIL-2218) ICU behavior incompatible - textNumberCheckPolicy lax is lax about "+" signs. Was not before.

     [ https://issues.apache.org/jira/browse/DAFFODIL-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Lawrence resolved DAFFODIL-2218.
--------------------------------------
    Resolution: Won't Fix

The language in the specification regarding this property and lax mode has been weakened.

> ICU behavior incompatible - textNumberCheckPolicy lax is lax about "+" signs. Was not before. 
> ----------------------------------------------------------------------------------------------
>
>                 Key: DAFFODIL-2218
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2218
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Back End, ICU, Libraries
>            Reporter: Mike Beckerle
>            Priority: Minor
>
> ICU libraries changed behavior and now strict behavior is being lax about + signs.
> Daffodil should revert to the latest ICU version that doesn't have this problem.
> We likely have to determine which ICU version introduced this change and back out to a prior one, since the new behavior does not implement the behavior required by the DFDL spec.
> See also https://issues.apache.org/jira/browse/DAFFODIL-845
> This from a DFDL Workgroup email thread on this subject:
> {code:java}
> Re: [DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy
> Steve Hanson <sm...@uk.ibm.com> Fri, Aug 30, 10:56 AM
> to me, slawrence, DFDL-WG, Liam
> ICU changing behaviour in an incompatible
> way is not good. 
> IBM DFDL is way behind, and is still
> on ICU 51.2.  We are limited in what we can do as we try to keep the
> same level as IBM Integration Bus & WTX as we have had C namespacing
> issues in the past.
> Looking at the links, there are other
> changes that have crept in when lenient. 
> - The string must contain a complete prefix and suffix.
>   For example, if the pattern is "{#};(#)", then
>   "{123}" or "(123)" would match, but "{123",
>   "123}", and "123" would all fail.
>   (The latter strings would be accepted in lenient mode.)
> - Minus and plus signs can only appear if specified in the pattern.
>   In lenient mode, a plus or minus sign can always precede
>   a number.
> In typical ICU fashion, even this is
> not complete. It says nothing about what happens if the pattern has a sign
> and the data doesn't.
> I suggest you test all the combos with
> Daffodil and establish the truth.
> Then we need to decide what to do. If
> there is no way of controlling this (eg, parameter or env var) then the
> safest option is to back off Daffodil to the latest ICU release that matches
> the DFDL 1.0 spec, and change the spec so that the link to ICU is specific
> rather than the generic link which is in the spec today (http://www.icu-project.org/apiref/icu4c/classDecimalFormat.html#_details)
> and which floats to the latest release. We can't have a moving target.
> Regards
>  
> Steve Hanson
> IBM Hybrid Integration, Hursley, UK
> Architect, IBM
> DFDL
> Co-Chair, OGF
> DFDL Working Group
> smh@uk.ibm.com
> tel:+44-1962-815848
> mob:+44-7717-378890
> Note: I work Tuesday to Friday 
> From: Mike Beckerle <mb...@gmail.com>
> To: DFDL-WG <df...@ogf.org>
> Date: 29/08/2019 19:49
> Subject: [DFDL-WG] Action 313: Plus '+' sign and lax textNumberCheckPolicy
> Sent by: "dfdl-wg" <df...@ogf.org>
> Looks like ICU changed behavior....
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, August 29, 2019 1:30 PM
> To: users@daffodil.apache.org
> Subject: Re: Plus '+' sign and lax textNumberCheckPolicy - was: Re: How
> to model a fixed-length integer that may be padded with space on the left?
> I think this is a difference in ICU version?
> A little grepping through ICU source, I found a change [1] to their
> number parsing logic in Dec 2017:
> +        if (!isStrict) {
> +            parser.addMatcher(WhitespaceMatcher.getInstance());
> +            parser.addMatcher(new PlusSignMatcher());
> +        }
> That looks to me like a change to make it so plus signs are always
> matched in lax/lenient mode regardless of the pattern (Daffodil's current
> behavior). A couple of minor changes have been made to that section, but
> nothing that allows you to turn it off if lenient is on.
> It's hard to tell in the git history what release that was in, but it
> looks like around version 61, which is relatively new (Daffodil is on
> version 62).
> Also, the latest version of DecimalFormatProperties.java (looks to be an
> internal implementation, so no online javadocs) has javadocs that
> state that plus signs are always allowed in lenient/lax mode [2].
> I think this is a change in ICU behavior in newer versions.
> - Steve
> [1]
> https://github.com/unicode-org/icu/commit/68340c8464bd988477d6c88f46f9dfe4562a6d02#diff-565b07c255337881b4e06f766691667cR119-R122
> [2]
> https://github.com/unicode-org/icu/blob/master/icu4j/main/classes/core/src/com/ibm/icu/impl/number/DecimalFormatProperties.java#L53-L54
> --
>   dfdl-wg mailing list
>   dfdl-wg@ogf.org
>   https://www.ogf.org/mailman/listinfo/dfdl-wg
> {code}
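
The thread suggests testing all the combinations of pattern, data, and strict/lenient mode. The following is a minimal sketch of such a matrix, not ICU's parser: the class SignRuleSketch, the method accepts, and the flag patternHasSign are hypothetical names introduced here, and the logic encodes only the sign rule as quoted in the thread. The case where the pattern has a sign and the data does not is left open in the thread, so the sketch treats it like a missing prefix, which is an assumption. A real check would run the same combinations against ICU4J itself, for example by toggling DecimalFormat.setParseStrict.

```java
// Illustrative model only: this is NOT ICU's parser. It encodes just the
// sign rule quoted in the thread so the combinations can be enumerated.
class SignRuleSketch {

    // Hypothetical helper: would a number string be accepted?
    //   patternHasSign: the pattern declares an explicit '+' prefix (e.g. "+#").
    //   lenient: true models textNumberCheckPolicy "lax", false models "strict".
    static boolean accepts(String data, boolean patternHasSign, boolean lenient) {
        boolean dataHasPlus = data.startsWith("+");
        String digits = dataHasPlus ? data.substring(1) : data;

        // Reject anything that is not a plain run of digits after the optional sign.
        if (digits.isEmpty() || !digits.chars().allMatch(Character::isDigit)) {
            return false;
        }
        if (dataHasPlus) {
            // Quoted rule: strict allows a sign only if the pattern specifies it;
            // lenient always allows a sign before the number.
            return lenient || patternHasSign;
        }
        if (patternHasSign) {
            // ASSUMPTION: the thread notes this case is not documented. Here it is
            // treated like a missing prefix: rejected when strict, accepted when lenient.
            return lenient;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] inputs = {"123", "+123"};
        boolean[] flags = {false, true};
        // Print every combination so the results can be compared against a real parser.
        for (boolean lenient : flags) {
            for (boolean patternHasSign : flags) {
                for (String input : inputs) {
                    System.out.printf("lenient=%b patternHasSign=%b data=%s -> %b%n",
                            lenient, patternHasSign, input,
                            accepts(input, patternHasSign, lenient));
                }
            }
        }
    }
}
```

Replacing the body of accepts with a call into the actual number parser would turn this into the comparison the thread asks for: any row where the real result differs from the modeled rule marks a behavior change.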



--
This message was sent by Atlassian Jira
(v8.3.4#803005)