You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by "stevedlawrence (via GitHub)" <gi...@apache.org> on 2023/03/08 16:51:12 UTC

[GitHub] [daffodil] stevedlawrence opened a new pull request, #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

stevedlawrence opened a new pull request, #982:
URL: https://github.com/apache/daffodil/pull/982

   Currently, if the primitive type is an integer then text number parsing disallows parsing decimal points, even if the pattern contains a decimal point. Instead, when parsing integers, we should allow decimals as long the fractional part is zero. And when unparsing, we should unparse a decimal point with a zero fractional part according to the pattern.
   
   This changes the behavior so integer parsing uses the same DecimalFormat configuration as non-integer parsing (i.e. decimals are allowed), but we throw a parse error if the fractional part that was parsed is non-zero. This also means that unparsing integers now outputs decimal points according to the pattern.
   
   Additionally, if textNumberCheckPolicy is strict, we enable ICU setDecimalPatternMatchRequired to true so that we allow or disallow decimal points in the data depending on if the pattern does or does not have a decimal point. Note that lax parsing always allows decimal points regardless of the pattern. For this reason, we now always require the grouping/decimal separator DFDL properties in lax mode.
   
   One bug was discovered in ICU (ICU-22303) where if we require the decimal point due to strict mode enabled, then ICU never parses the infinity/NaN representation. A workaround is added to manually check for these representations until this bug is fixed. ICU unit tests are also added which should fail if ICU fixes this bug so we can remove this workaround.
   
   Make sure we always specify infinity and NaN representations from the DFDL schema for all primitives, not just for xs:double/xs:float. There is no way to disable infinity/NaN ICU parsing, so when if we do not specify these values ICU just uses the locale values, which could lead to unwanted locale specific behavior. Related, this modifies NodeInfo types so that fromNumber fails for types that do not support infinity/NaN (i.e. everything except Double/Float) and creates a parse error.
   
   Modifies virtual decimal logic to ensure we handle cases for numbers that do not fit in a Long (should work) or contain decimal points (should be a parse error).
   
   Tests are updated so if they want to differentiate between int and decimal depending on if a decimal exists in the data, then they must specify a pattern with or without a decimal and enable strict mode--lax mode allows a decimal regardless of type so cannot differentiate the types.
   
   DAFFODIL-2158


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil] mbeckerle commented on a diff in pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Posted by "mbeckerle (via GitHub)" <gi...@apache.org>.
mbeckerle commented on code in PR #982:
URL: https://github.com/apache/daffodil/pull/982#discussion_r1129895851


##########
daffodil-test/src/test/resources/org/apache/daffodil/section13/text_number_props/TextNumberProps.tdml:
##########
@@ -4039,16 +4081,13 @@
       <tdml:documentPart type="text"><![CDATA[NOTANUMBER]]]></tdml:documentPart>
     </tdml:document>
 
-    <tdml:warnings>
-      <tdml:warning>DFDL property was ignored</tdml:warning>
-      <tdml:warning>textStandardNaNRep</tdml:warning>
-    </tdml:warnings>
+    <tdml:warnings />

Review Comment:
   I wasn't aware that empty tdml:warnings meant that. Does that also work for empty tdml:validationErrors?
   
   I think there is a JIRA ticket [about adding this feature for expecting/precluding warnings.](https://issues.apache.org/jira/browse/DAFFODIL-864) Perhaps this feature exists now. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil] stevedlawrence merged pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Posted by "stevedlawrence (via GitHub)" <gi...@apache.org>.
stevedlawrence merged PR #982:
URL: https://github.com/apache/daffodil/pull/982


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil] stevedlawrence commented on a diff in pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Posted by "stevedlawrence (via GitHub)" <gi...@apache.org>.
stevedlawrence commented on code in PR #982:
URL: https://github.com/apache/daffodil/pull/982#discussion_r1129861918


##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/ConvertTextStandardNumberParser.scala:
##########
@@ -171,47 +185,91 @@ case class ConvertTextStandardNumberParser(
       case Some(_) => primNumeric.fromNumber(0)
       case None => {
         val df = textNumberFormatEv.evaluate(start)
-        val strCheckPolicy = if (df.isParseStrict) str else str.trim
+        val strToParse = if (df.isParseStrict) str else str.trim
         val pos = new ParsePosition(0)
-        val icuNum: Number = df.parse(strCheckPolicy, pos) match {
+        val icuNum: JNumber = df.parse(strToParse, pos) match {
           case null => {
-            PE(
-              start,
-              "Unable to parse %s from text: %s",
-              context.optPrimType.get.globalQName,
-              str,
-            )
-            return
+            val infNaN: JDouble =
+              if (df.isDecimalPatternMatchRequired) {
+                // ICU failed to parse. But there is a bug in ICU4J (ICU-22303) that if there is
+                // a decimal in the pattern and we've set that decimal to be required (due to
+                // strict mode), then it will fail to parse Inf/NaN representations. As a
+                // workaround, we clone the DecimalFormat, disable requiring the decimal, and
+                // reparse. We only accept successful Inf/NaN parses though--everything else is
+                // considered a parse error since it meant the decimal point was missing or
+                // wasn't either inf/nan or a valid number. If ICU fixes this bug, we should
+                // remove this infNan variable and its use, as it is likely pretty expensive to
+                // clone, change a setting, and reparse. Fortunately, it is only in the error
+                // case of strict parsing so should be rare.
+                pos.setIndex(0)
+                val newDF = df.clone().asInstanceOf[DecimalFormat]
+                newDF.setDecimalPatternMatchRequired(false)
+                newDF.parse(strToParse, pos) match {
+                  case d: JDouble => {
+                    Assert.invariant(d.isNaN || d.isInfinite)
+                    d
+                  }
+                  case _ => null
+                }
+              } else {
+                null
+              }
+
+            if (infNaN != null) {
+              infNaN
+            } else {
+              PE(
+                start,
+                "Unable to parse %s from text: %s",
+                context.optPrimType.get.globalQName,
+                str,
+              )
+              return
+            }
           }
-          case d: JDouble if primNumeric.isInteger => {
-            // If ICU returns a Double when only integers are expected, it means the
-            // string must have been NaN, -Infinity, or Infinity using the locales
-            // default symbols. There does not seem to be a way to disable this even
-            // with setParseBigDecimal to false and setParseIntegerOnly set to true.
-            // So just create the same PE as if it failed to parse it, which is what
-            // we really want ICU to do
+          case d: JDouble => {
+            // ICU returns a Double only if it parsed NaN, Infinity, or -Infinity. We will later
+            // pass this value in primNumber.fromNumber, which will fail if the primitive type
+            // does not allow NaN/Infinity
             Assert.invariant(d.isNaN || d.isInfinite)
-            PE(
-              start,
-              "Unable to parse %s from text: %s",
-              context.optPrimType.get.globalQName,
-              str,
-            )
-            return
+            d
           }
           case bd: ICUBigDecimal => {
-            // sometimes ICU will return their own custom BigDecimal, even if the
-            // value could be represented as a BigInteger. We only want Java types,
-            // so detect this and convert it to the appropriate type
+            // ICU will return their own custom BigDecimal if the value cannot fit in a Long and
+            // isn't infinity/NaN. We only want Java types, so detect this and convert it to the

Review Comment:
   > Asking only for curiosity. Why does ICU return a custom BigDecimal instead of a BigDecimal? 
   
   Why does ICU do anything? It's a mystery wrapped inside an enigma. We just do our best to make it work :)
   
   > Why does the difference matter to Daffodil, that is, why isn't the custom BigDecimal compatible enough?
   
   I'm not really sure what's different and it's not obvious reading the ICU BigDecimal documentation. But we try to use Java types everywhere (avoiding both Scala types and custom ICU types) just so we're consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil] tuxji commented on a diff in pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Posted by "tuxji (via GitHub)" <gi...@apache.org>.
tuxji commented on code in PR #982:
URL: https://github.com/apache/daffodil/pull/982#discussion_r1129828685


##########
daffodil-test/src/test/resources/org/apache/daffodil/section13/text_number_props/TextNumberProps.tdml:
##########
@@ -4039,16 +4081,13 @@
       <tdml:documentPart type="text"><![CDATA[NOTANUMBER]]]></tdml:documentPart>
     </tdml:document>
 
-    <tdml:warnings>
-      <tdml:warning>DFDL property was ignored</tdml:warning>
-      <tdml:warning>textStandardNaNRep</tdml:warning>
-    </tdml:warnings>
+    <tdml:warnings />

Review Comment:
   Can we just delete the empty tdml:warnings element without changing any semantics?



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/ConvertTextStandardNumberParser.scala:
##########
@@ -171,47 +185,91 @@ case class ConvertTextStandardNumberParser(
       case Some(_) => primNumeric.fromNumber(0)
       case None => {
         val df = textNumberFormatEv.evaluate(start)
-        val strCheckPolicy = if (df.isParseStrict) str else str.trim
+        val strToParse = if (df.isParseStrict) str else str.trim
         val pos = new ParsePosition(0)
-        val icuNum: Number = df.parse(strCheckPolicy, pos) match {
+        val icuNum: JNumber = df.parse(strToParse, pos) match {
           case null => {
-            PE(
-              start,
-              "Unable to parse %s from text: %s",
-              context.optPrimType.get.globalQName,
-              str,
-            )
-            return
+            val infNaN: JDouble =
+              if (df.isDecimalPatternMatchRequired) {
+                // ICU failed to parse. But there is a bug in ICU4J (ICU-22303) that if there is
+                // a decimal in the pattern and we've set that decimal to be required (due to
+                // strict mode), then it will fail to parse Inf/NaN representations. As a
+                // workaround, we clone the DecimalFormat, disable requiring the decimal, and
+                // reparse. We only accept successful Inf/NaN parses though--everything else is
+                // considered a parse error since it meant the decimal point was missing or
+                // wasn't either inf/nan or a valid number. If ICU fixes this bug, we should
+                // remove this infNan variable and its use, as it is likely pretty expensive to
+                // clone, change a setting, and reparse. Fortunately, it is only in the error
+                // case of strict parsing so should be rare.
+                pos.setIndex(0)
+                val newDF = df.clone().asInstanceOf[DecimalFormat]
+                newDF.setDecimalPatternMatchRequired(false)
+                newDF.parse(strToParse, pos) match {
+                  case d: JDouble => {
+                    Assert.invariant(d.isNaN || d.isInfinite)
+                    d
+                  }
+                  case _ => null
+                }
+              } else {
+                null
+              }
+
+            if (infNaN != null) {
+              infNaN
+            } else {
+              PE(
+                start,
+                "Unable to parse %s from text: %s",
+                context.optPrimType.get.globalQName,
+                str,
+              )
+              return
+            }
           }
-          case d: JDouble if primNumeric.isInteger => {
-            // If ICU returns a Double when only integers are expected, it means the
-            // string must have been NaN, -Infinity, or Infinity using the locales
-            // default symbols. There does not seem to be a way to disable this even
-            // with setParseBigDecimal to false and setParseIntegerOnly set to true.
-            // So just create the same PE as if it failed to parse it, which is what
-            // we really want ICU to do
+          case d: JDouble => {
+            // ICU returns a Double only if it parsed NaN, Infinity, or -Infinity. We will later
+            // pass this value in primNumber.fromNumber, which will fail if the primitive type
+            // does not allow NaN/Infinity
             Assert.invariant(d.isNaN || d.isInfinite)
-            PE(
-              start,
-              "Unable to parse %s from text: %s",
-              context.optPrimType.get.globalQName,
-              str,
-            )
-            return
+            d
           }
           case bd: ICUBigDecimal => {
-            // sometimes ICU will return their own custom BigDecimal, even if the
-            // value could be represented as a BigInteger. We only want Java types,
-            // so detect this and convert it to the appropriate type
+            // ICU will return their own custom BigDecimal if the value cannot fit in a Long and
+            // isn't infinity/NaN. We only want Java types, so detect this and convert it to the

Review Comment:
   Asking only for curiosity.  Why does ICU return a custom BigDecimal instead of a BigDecimal?  Why does the difference matter to Daffodil, that is, why isn't the custom BigDecimal compatible enough?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil] stevedlawrence commented on a diff in pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Posted by "stevedlawrence (via GitHub)" <gi...@apache.org>.
stevedlawrence commented on code in PR #982:
URL: https://github.com/apache/daffodil/pull/982#discussion_r1129900394


##########
daffodil-test/src/test/resources/org/apache/daffodil/section13/text_number_props/TextNumberProps.tdml:
##########
@@ -4039,16 +4081,13 @@
       <tdml:documentPart type="text"><![CDATA[NOTANUMBER]]]></tdml:documentPart>
     </tdml:document>
 
-    <tdml:warnings>
-      <tdml:warning>DFDL property was ignored</tdml:warning>
-      <tdml:warning>textStandardNaNRep</tdml:warning>
-    </tdml:warnings>
+    <tdml:warnings />

Review Comment:
   Looks like I was confused. It looks like we have that logic for `tdml:validationErrors` but not `tdml:warnings`. So you can ensure there are no validation errors, but can't do the same for warnings.
   
   So this change does nothing at the moment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil] stevedlawrence commented on a diff in pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Posted by "stevedlawrence (via GitHub)" <gi...@apache.org>.
stevedlawrence commented on code in PR #982:
URL: https://github.com/apache/daffodil/pull/982#discussion_r1129865660


##########
daffodil-test/src/test/resources/org/apache/daffodil/section13/text_number_props/TextNumberProps.tdml:
##########
@@ -4039,16 +4081,13 @@
       <tdml:documentPart type="text"><![CDATA[NOTANUMBER]]]></tdml:documentPart>
     </tdml:document>
 
-    <tdml:warnings>
-      <tdml:warning>DFDL property was ignored</tdml:warning>
-      <tdml:warning>textStandardNaNRep</tdml:warning>
-    </tdml:warnings>
+    <tdml:warnings />

Review Comment:
   The empty `tdml:warnings` explicitly says "there should be no warnings", whereas not having a `tdml:warnings` element just says "we don't care if there are warnings or not". The idea with changing it like this was that the test used to expect a warning saying this property was ignored, but the property is now used so I wanted to make sure it's not ignored and there's no warning. We do get an error mentioning "NaN" so that should be enough to make it clear the property is used, but I wanted to still keep the intention of the test regarding warnings.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil] tuxji commented on a diff in pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Posted by "tuxji (via GitHub)" <gi...@apache.org>.
tuxji commented on code in PR #982:
URL: https://github.com/apache/daffodil/pull/982#discussion_r1148438851


##########
daffodil-test/src/test/resources/org/apache/daffodil/section13/text_number_props/TextNumberProps.tdml:
##########
@@ -4039,16 +4081,13 @@
       <tdml:documentPart type="text"><![CDATA[NOTANUMBER]]]></tdml:documentPart>
     </tdml:document>
 
-    <tdml:warnings>
-      <tdml:warning>DFDL property was ignored</tdml:warning>
-      <tdml:warning>textStandardNaNRep</tdml:warning>
-    </tdml:warnings>
+    <tdml:warnings />

Review Comment:
   @stevedlawrence, you can leave the empty `tdml:warnings` since it might gain the expected semantics later on.
   
   @mbeckerle and @mike-mcgann, this PR needs a second +1 from one of you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil] mbeckerle commented on pull request #982: Ensure all primitives use textNumberPattern and infinfity/NaN correctly

Posted by "mbeckerle (via GitHub)" <gi...@apache.org>.
mbeckerle commented on PR #982:
URL: https://github.com/apache/daffodil/pull/982#issuecomment-1485080719

   A test with float instead of double would eliminate all the code-cov gaps. However, I don't actually care about them as they are correct by inspection. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org