You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by "tuxji (via GitHub)" <gi...@apache.org> on 2023/03/08 17:56:04 UTC

[GitHub] [daffodil] tuxji commented on a diff in pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

tuxji commented on code in PR #982:
URL: https://github.com/apache/daffodil/pull/982#discussion_r1129828685


##########
daffodil-test/src/test/resources/org/apache/daffodil/section13/text_number_props/TextNumberProps.tdml:
##########
@@ -4039,16 +4081,13 @@
       <tdml:documentPart type="text"><![CDATA[NOTANUMBER]]]></tdml:documentPart>
     </tdml:document>
 
-    <tdml:warnings>
-      <tdml:warning>DFDL property was ignored</tdml:warning>
-      <tdml:warning>textStandardNaNRep</tdml:warning>
-    </tdml:warnings>
+    <tdml:warnings />

Review Comment:
   Can we just delete the empty tdml:warnings element without changing any semantics?



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/ConvertTextStandardNumberParser.scala:
##########
@@ -171,47 +185,91 @@ case class ConvertTextStandardNumberParser(
       case Some(_) => primNumeric.fromNumber(0)
       case None => {
         val df = textNumberFormatEv.evaluate(start)
-        val strCheckPolicy = if (df.isParseStrict) str else str.trim
+        val strToParse = if (df.isParseStrict) str else str.trim
         val pos = new ParsePosition(0)
-        val icuNum: Number = df.parse(strCheckPolicy, pos) match {
+        val icuNum: JNumber = df.parse(strToParse, pos) match {
           case null => {
-            PE(
-              start,
-              "Unable to parse %s from text: %s",
-              context.optPrimType.get.globalQName,
-              str,
-            )
-            return
+            val infNaN: JDouble =
+              if (df.isDecimalPatternMatchRequired) {
+                // ICU failed to parse. But there is a bug in ICU4J (ICU-22303) that if there is
+                // a decimal in the pattern and we've set that decimal to be required (due to
+                // strict mode), then it will fail to parse Inf/NaN representations. As a
+                // workaround, we clone the DecimalFormat, disable requiring the decimal, and
+                // reparse. We only accept successful Inf/NaN parses though--everything else is
+                // considered a parse error since it meant the decimal point was missing or
+                // wasn't either inf/nan or a valid number. If ICU fixes this bug, we should
+                // remove this infNan variable and its use, as it is likely pretty expensive to
+                // clone, change a setting, and reparse. Fortunately, it is only in the error
+                // case of strict parsing so should be rare.
+                pos.setIndex(0)
+                val newDF = df.clone().asInstanceOf[DecimalFormat]
+                newDF.setDecimalPatternMatchRequired(false)
+                newDF.parse(strToParse, pos) match {
+                  case d: JDouble => {
+                    Assert.invariant(d.isNaN || d.isInfinite)
+                    d
+                  }
+                  case _ => null
+                }
+              } else {
+                null
+              }
+
+            if (infNaN != null) {
+              infNaN
+            } else {
+              PE(
+                start,
+                "Unable to parse %s from text: %s",
+                context.optPrimType.get.globalQName,
+                str,
+              )
+              return
+            }
           }
-          case d: JDouble if primNumeric.isInteger => {
-            // If ICU returns a Double when only integers are expected, it means the
-            // string must have been NaN, -Infinity, or Infinity using the locales
-            // default symbols. There does not seem to be a way to disable this even
-            // with setParseBigDecimal to false and setParseIntegerOnly set to true.
-            // So just create the same PE as if it failed to parse it, which is what
-            // we really want ICU to do
+          case d: JDouble => {
+            // ICU returns a Double only if it parsed NaN, Infinity, or -Infinity. We will later
+            // pass this value in primNumber.fromNumber, which will fail if the primitive type
+            // does not allow NaN/Infinity
             Assert.invariant(d.isNaN || d.isInfinite)
-            PE(
-              start,
-              "Unable to parse %s from text: %s",
-              context.optPrimType.get.globalQName,
-              str,
-            )
-            return
+            d
           }
           case bd: ICUBigDecimal => {
-            // sometimes ICU will return their own custom BigDecimal, even if the
-            // value could be represented as a BigInteger. We only want Java types,
-            // so detect this and convert it to the appropriate type
+            // ICU will return their own custom BigDecimal if the value cannot fit in a Long and
+            // isn't infinity/NaN. We only want Java types, so detect this and convert it to the

Review Comment:
   Asking only for curiosity.  Why does ICU return a custom BigDecimal instead of a BigDecimal?  Why does the difference matter to Daffodil, that is, why isn't the custom BigDecimal compatible enough?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org