You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2022/12/29 22:18:07 UTC

[GitHub] [daffodil] tuxji commented on a diff in pull request #900: Implement textNumberPattern 'V' virtual decimal point.

tuxji commented on code in PR #900:
URL: https://github.com/apache/daffodil/pull/900#discussion_r1059145342


##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -46,8 +46,226 @@ case class ConvertTextCombinator(e: ElementBase, value: Gram, converter: Gram)
   override lazy val unparser = new ConvertTextCombinatorUnparser(e.termRuntimeData, value.unparser, converter.unparser)
 }
 
-case class ConvertTextNumberPrim(e: ElementBase)
-  extends Terminal(e, true) {
+// This is a separate object for unit testing purposes.
+private[primitives]
+object TextNumberPatternUtils {
+
+  /**
+   * A regex which matches textNumberPatterns that legally use the V
+   * (implied decimal point) character.
+   */
+   private[primitives] lazy val vregexStandard = {
+    //  DFDL v1.0 Spec says
+    //  It is a Schema Definition Error if any symbols other than "0", "1" through "9" or #
+    //  are used in the vpinteger region of the pattern.
+    //
+    // The prefix and suffix chars can surround the vpinteger region, and there can be
+    // a positive and a negative pattern.
+    //
+    // The prefix and suffix cannot be digits, # or P or V. No quoted chars in prefix either.
+    //
+    val prefix = """([^0-9#PV']?)"""
+    val suffix = prefix // same syntax as prefix
+    val vpinteger = """(#*[0-9]+)V([0-9]+)"""
+    val subpattern = s"""${prefix}${vpinteger}${suffix}"""
+     // don't forget the ^ and $ (start of data, end of data) because we want
+     // the match to consume all the characters, starting at the beginning.
+    val vPattern = s"""^${subpattern}(?:;${subpattern})?$$"""
+    val re = vPattern.r
+    re
+  }
+
+  def textDecimalVirtualPointForStandard(
+    e: ImplementsThrowsOrSavesSDE,
+    originalPattern: String,
+    patternStripped: String): Int = {
+    val r = TextNumberPatternUtils.vregexStandard
+    r.findFirstMatchIn(patternStripped) match {
+      case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
+      case None =>
+        e.SDE(
+          s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal point).
+             | Other than the sign indicators, it can contain only
+             | '#', then digits 0-9 then 'V' then digits 0-9.
+             | The positive part of the dfdl:textNumberPattern is mandatory.""".stripMargin('|'),
+          originalPattern)
+    }
+  }
+
+  private[primitives] lazy val vregexZoned= {
+    // Note: for zoned, can only have a positive pattern
+    // Prefix or suffix can only be '+' character, to
+    // indicate leading or trailing sign, and can only
+    // one of those.
+    //
+    // Also we're not allowing the # character, since I think that
+    // makes no sense for zoned with virtual decimal point.
+    //
+    val vPattern = """^(\+?)([0-9]+)V([0-9]+)(\+?)$""" // only positive pattern allowed, only + for prefix/suffix
+    val re = vPattern.r

Review Comment:
   Likewise, I would define each capturing group here as val prefix, val beforeV, val afterV, val suffix, and interpolate them into vPattern at line 104.



##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -46,8 +46,226 @@ case class ConvertTextCombinator(e: ElementBase, value: Gram, converter: Gram)
   override lazy val unparser = new ConvertTextCombinatorUnparser(e.termRuntimeData, value.unparser, converter.unparser)
 }
 
-case class ConvertTextNumberPrim(e: ElementBase)
-  extends Terminal(e, true) {
+// This is a separate object for unit testing purposes.
+private[primitives]
+object TextNumberPatternUtils {
+
+  /**
+   * A regex which matches textNumberPatterns that legally use the V
+   * (implied decimal point) character.
+   */
+   private[primitives] lazy val vregexStandard = {
+    //  DFDL v1.0 Spec says
+    //  It is a Schema Definition Error if any symbols other than "0", "1" through "9" or #
+    //  are used in the vpinteger region of the pattern.
+    //
+    // The prefix and suffix chars can surround the vpinteger region, and there can be
+    // a positive and a negative pattern.
+    //
+    // The prefix and suffix cannot be digits, # or P or V. No quoted chars in prefix either.
+    //
+    val prefix = """([^0-9#PV']?)"""
+    val suffix = prefix // same syntax as prefix
+    val vpinteger = """(#*[0-9]+)V([0-9]+)"""
+    val subpattern = s"""${prefix}${vpinteger}${suffix}"""
+     // don't forget the ^ and $ (start of data, end of data) because we want
+     // the match to consume all the characters, starting at the beginning.
+    val vPattern = s"""^${subpattern}(?:;${subpattern})?$$"""
+    val re = vPattern.r
+    re
+  }
+
+  def textDecimalVirtualPointForStandard(
+    e: ImplementsThrowsOrSavesSDE,
+    originalPattern: String,
+    patternStripped: String): Int = {
+    val r = TextNumberPatternUtils.vregexStandard
+    r.findFirstMatchIn(patternStripped) match {
+      case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
+      case None =>
+        e.SDE(
+          s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal point).
+             | Other than the sign indicators, it can contain only
+             | '#', then digits 0-9 then 'V' then digits 0-9.
+             | The positive part of the dfdl:textNumberPattern is mandatory.""".stripMargin('|'),
+          originalPattern)
+    }
+  }
+
+  private[primitives] lazy val vregexZoned= {
+    // Note: for zoned, can only have a positive pattern
+    // Prefix or suffix can only be '+' character, to
+    // indicate leading or trailing sign, and can only
+    // one of those.
+    //
+    // Also we're not allowing the # character, since I think that
+    // makes no sense for zoned with virtual decimal point.
+    //
+    val vPattern = """^(\+?)([0-9]+)V([0-9]+)(\+?)$""" // only positive pattern allowed, only + for prefix/suffix
+    val re = vPattern.r
+    re
+  }
+
+  /**
+   * Checks a pattern for suitability with V (virtual decimal point) in the pattern
+   * in the context of zoned textNumberRep.
+   *
+   * This is broken out separately for unit testing purposes.
+   *
+   * @param pattern the dfdl:textNumberPattern pattern
+   * @return -1 if the pattern is illegal syntax for use with V (virtual decimal point)
+   *         otherwise the number of digits to the right of the V character.
+   */
+  private[primitives] def textDecimalVirtualPointForZoned(pattern: String): Option[Int] = {

Review Comment:
   Please add a similar scaladoc to textDecimalVirtualPointForStandard at line 78 too.  Also, should textDecimalVirtualPointForStandard be split into 2 overloaded functions or should the 2 overloaded textDecimalVirtualPointForZoned functions be merged together so that both textDecimalVirtualPointForStandard and textDecimalVirtualPointForZoned are defined consistently?



##########
daffodil-lib/src/main/scala/org/apache/daffodil/util/Numbers.scala:
##########
@@ -311,4 +311,28 @@ object Numbers {
   def asAnyRef(n: Any): AnyRef = {
     n.asInstanceOf[AnyRef]
   }
+
+  /**
+   * This is the official way to tell if a JBigDecimal has an integer value.
+   * @param bd A Java BigDecimal
+   * @return true if the value is an integer (has fraction part == 0).
+   */
+  def isIntegerValue (bd: JBigDecimal) = {
+     bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0;

Review Comment:
   I don't see any code in this PR calling this new function and it has a coverage warning which seems to indicate this function is not used anywhere.



##########
daffodil-lib/src/main/scala/org/apache/daffodil/util/Numbers.scala:
##########
@@ -311,4 +311,28 @@ object Numbers {
   def asAnyRef(n: Any): AnyRef = {
     n.asInstanceOf[AnyRef]
   }
+
+  /**
+   * This is the official way to tell if a JBigDecimal has an integer value.
+   * @param bd A Java BigDecimal
+   * @return true if the value is an integer (has fraction part == 0).
+   */
+  def isIntegerValue (bd: JBigDecimal) = {
+     bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() <= 0;
+  }
+
+  /** For any standard predefined subtype of JNumber, is the value numerically zero. */
+  def isZero(n1: JNumber) : Boolean = {
+    n1 match {
+      case d: JDouble => d.doubleValue() == 0.0
+      case f: JFloat => f.floatValue() == 0.0
+      case bd: JBigDecimal => bd.signum() == 0
+      case bi: JBigInt => bi.signum() == 0
+      case l: JLong => l.longValue == 0
+      case i: JInt => i.intValue == 0
+      case s: JShort => s.shortValue == 0
+      case b: JByte => b.byteValue == 0
+    }

Review Comment:
   Out of curiosity, does the Scala compiler or your IDE consider this match expression to be an exhaustive match or why didn't either of them complain about there being no default or case _ branch?  I found JNumber's javadoc page in Java 8 and it's publicly extensible (not final or sealed in any way I can tell) and has more direct subclasses than the above list ([AtomicInteger](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicInteger.html), [AtomicLong](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/AtomicLong.html), [BigDecimal](https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html), [BigInteger](https://docs.oracle.com/javase/8/docs/api/java/math/BigInteger.html), [Byte](https://docs.oracle.com/javase/8/docs/api/java/lang/Byte.html), [Double](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html), [DoubleAccumulator](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/DoubleAccumulator.html
 ), [DoubleAdder](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/DoubleAdder.html), [Float](https://docs.oracle.com/javase/8/docs/api/java/lang/Float.html), [Integer](https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html), [Long](https://docs.oracle.com/javase/8/docs/api/java/lang/Long.html), [LongAccumulator](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/LongAccumulator.html), [LongAdder](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/atomic/LongAdder.html), [Short](https://docs.oracle.com/javase/8/docs/api/java/lang/Short.html)).



##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/ElementBaseGrammarMixin.scala:
##########
@@ -642,7 +642,7 @@ trait ElementBaseGrammarMixin
 
   private lazy val textConverter = {
     primType match {
-      case _: NodeInfo.Numeric.Kind => ConvertTextNumberPrim(this)
+      case _: NodeInfo.Numeric.Kind => ConvertTextNumberStandardPrim(this)

Review Comment:
   What is the reason for renaming ConvertTextNumberPrim to ConvertTextNumberStandardPrim?  That is, what does the "Standard" word mean in this context?  I notice "Standard" occurs in a lot of places within the codebase: 
   
   - java.nio.charset.StandardCharsets
   - java.nio.file.StandardOpenOption
   - org.apache.daffodil.processors.charset.StandardBitsCharsets
   - requireTextStandardBaseProperty
   - textStandardBase="10"
   - textStandardGroupingSeparator=","
   - textStandardDecimalSeparator="."
   - textStandardExponentRep="E"
   - textStandardExponentCharacter (deprecated)
   - textStandardInfinityRep="Inf"
   - textStandardNaNRep="NaN"
   - textStandardZeroRep=""
   - TextStandardBaseMixin
   - AsciiStandard
   (This wasn't a complete list, by the way.)
   
   If the intention is to say the converter uses these textStandard* properties, should the new name be ConvertTextStandardNumberPrim instead?
   



##########
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/parsers/PrimitivesTextNumber1.scala:
##########
@@ -54,11 +56,58 @@ case class ConvertTextCombinatorParser(
   }
 }
 
+trait TextDecimalVirtualPointMixin {
+  def textDecimalVirtualPoint: Int
+
+  final protected val divisor = scala.math.pow(10.0, textDecimalVirtualPoint)
+  final protected val multiplier = scala.math.pow(10.0, textDecimalVirtualPoint)

Review Comment:
   I'm confused, how come you are defining both divisor and multiplier as the same value?  Please add a comment explaining why.



##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesZoned.scala:
##########
@@ -47,71 +45,61 @@ case class ConvertZonedCombinator(e: ElementBase, value: Gram, converter: Gram)
 }
 
 case class ConvertZonedNumberPrim(e: ElementBase)
-  extends Terminal(e, true) {
+  extends Terminal(e, true)
+  with ConvertTextNumberMixin {
 
-  val textNumberFormatEv: TextNumberFormatEv = {
-    val pattern = {
-      val p = e.textNumberPattern
+  final override protected val pattern =  e.textNumberPattern
 
-      val noEscapedTicksRegex = """''""".r
-      val patternNoEscapedTicks = noEscapedTicksRegex.replaceAllIn(p, "")
-      val noQuotedRegex = """'[^']+'""".r
-      val patternNoQuoted = noQuotedRegex.replaceAllIn(patternNoEscapedTicks, "")
-      val zonedPatternRegex = "\\+?#*[0-9]+\\+?".r
+  lazy val textDecimalVirtualPoint = computeTextDecimalVirtualPoint(e)
 
-      if (patternNoQuoted.contains("V")) {
-        e.notYetImplemented("textNumberPattern with V symbol")
-      }
+  lazy val textNumberFormatEv: TextNumberFormatEv = {
 
-      if (patternNoQuoted.contains("P")) {
-        e.notYetImplemented("textNumberPattern with P symbol")
-      }
+    if (patternStripped.contains("@")) {
+      e.SDE("The '@' symbol may not be used in textNumberPattern for textNumberRep='zoned'")
+    }
 
-      if (patternNoQuoted.contains("@")) {
-        e.SDE("The '@' symbol may not be used in textNumberPattern for textNumberRep='zoned'")
-      }
+    if (patternStripped.contains("E")) {
+      e.SDE("The 'E' symbol may not be used in textNumberPattern for textNumberRep='zoned'")
+    }
 
-      if (patternNoQuoted.contains(";")) {
-        e.SDE("Negative patterns may not be used in textNumberPattern for textNumberRep='zoned'")
-      }
+    e.schemaDefinitionWhen(patternStripped.contains(";"),
+      "Negative patterns may not be used in textNumberPattern for textNumberRep='zoned'")
 
-      e.primType match {
-        case PrimType.Double | PrimType.Float => e.SDE("textZonedFormat='zoned' does not support Doubles/Floats")
-        case PrimType.UnsignedLong | PrimType.UnsignedInt | PrimType.UnsignedShort | PrimType.UnsignedByte => {
-          if (e.textNumberCheckPolicy == TextNumberCheckPolicy.Lax) {
-            if ((patternNoQuoted(0) != '+') && (patternNoQuoted(patternNoQuoted.length - 1) != '+'))
-              e.SDE("textNumberPattern must have '+' at the beginning or the end of the pattern when textZonedFormat='zoned' and textNumberPolicy='lax' for unsigned numbers")
-          }
-        }
-        case _ => {
-          if ((patternNoQuoted(0) != '+') && (patternNoQuoted(patternNoQuoted.length - 1) != '+'))
-            e.SDE("textNumberPattern must have '+' at the beginning or the end of the pattern when textZonedFormat='zoned' for signed numbers")
+    e.primType match {
+      case PrimType.Double | PrimType.Float => e.SDE("textNumberRep='zoned' does not support Doubles/Floats")
+      case PrimType.UnsignedLong | PrimType.UnsignedInt | PrimType.UnsignedShort | PrimType.UnsignedByte => {
+        if (e.textNumberCheckPolicy == TextNumberCheckPolicy.Lax) {
+          if ((patternStripped(0) != '+') && (patternStripped(patternStripped.length - 1) != '+'))
+            e.SDE("textNumberPattern must have '+' at the beginning or the end of the pattern when textNumberRep='zoned' and textNumberPolicy='lax' for unsigned numbers")
         }
       }
+      case _ => {
+        if ((patternStripped(0) != '+') && (patternStripped(patternStripped.length - 1) != '+'))
+          e.SDE("textNumberPattern must have '+' at the beginning or the end of the pattern when textNumberRep='zoned' for signed numbers")
+      }
+    }
 
-      if ((patternNoQuoted(0) == '+') && (patternNoQuoted(patternNoQuoted.length - 1) == '+'))
-        e.SDE("The textNumberPattern may either begin or end with a '+', not both.")
-
-      if (!zonedPatternRegex.pattern.matcher(patternNoQuoted).matches)
-        e.SDE("Invalid characters used in textNubmerPattern for zoned number. Only the following characters may be used with zoned numbers: '+', 'V', 'P', '0-9', and '#'")
+    if ((patternStripped(0) == '+') && (patternStripped(patternStripped.length - 1) == '+'))
+      e.SDE("The textNumberPattern may either begin or end with a '+', not both.")
 
-      // Load the pattern to make sure it is valid
-      try {
-        new DecimalFormat(p.replace("+", ""))
-      } catch {
-        case ex: IllegalArgumentException => e.SDE("Invalid textNumberPattern: " + ex.getMessage())
+    if (textDecimalVirtualPoint > 0) {
+      e.primType match {
+        case PrimType.Decimal => // ok
+        case _ => e.SDE(
+          """The dfdl:textNumberPattern has a virtual decimal point 'V' and dfdl:textNumberRep='zoned'.
+            | The type must be xs:decimal, but was: %s.""".stripMargin, e.primType.globalQName.toPrettyString)
       }
-
-      p
     }
 
+    checkPatternWithICU(e)
+
     /* Need to remove the '+' from the number pattern as '+' is only
     *  used to indicate which digit of the number is overpunched when
     *  dealing with zoned decimal formats. If the '+' is not removed
     *  DecimalFormat will attempt to use it as an indicator for exponent
     *  numbers, which will most likely not match the zoned number being
     *  parsed */
-    val zonedPattern = pattern.replace("+", "")
+    val zonedPattern = runtimePattern.replace("+", "")

Review Comment:
   This comment mentions DecimalFormat but we've removed import DecimalFormat from this file which is confusing.  It might be better to say we're removing '+' to make ConvertZonedNumberParser's job easier (we construct textNumberFormatEv with zonedPattern, pass textNumberFormatEv to ConvertZonedNumberParser, and call textNumberFormatEv.evaluate(start) there).



##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -46,8 +46,226 @@ case class ConvertTextCombinator(e: ElementBase, value: Gram, converter: Gram)
   override lazy val unparser = new ConvertTextCombinatorUnparser(e.termRuntimeData, value.unparser, converter.unparser)
 }
 
-case class ConvertTextNumberPrim(e: ElementBase)
-  extends Terminal(e, true) {
+// This is a separate object for unit testing purposes.
+private[primitives]
+object TextNumberPatternUtils {
+
+  /**
+   * A regex which matches textNumberPatterns that legally use the V
+   * (implied decimal point) character.
+   */
+   private[primitives] lazy val vregexStandard = {
+    //  DFDL v1.0 Spec says
+    //  It is a Schema Definition Error if any symbols other than "0", "1" through "9" or #
+    //  are used in the vpinteger region of the pattern.
+    //
+    // The prefix and suffix chars can surround the vpinteger region, and there can be
+    // a positive and a negative pattern.
+    //
+    // The prefix and suffix cannot be digits, # or P or V. No quoted chars in prefix either.
+    //
+    val prefix = """([^0-9#PV']?)"""
+    val suffix = prefix // same syntax as prefix
+    val vpinteger = """(#*[0-9]+)V([0-9]+)"""
+    val subpattern = s"""${prefix}${vpinteger}${suffix}"""
+     // don't forget the ^ and $ (start of data, end of data) because we want
+     // the match to consume all the characters, starting at the beginning.
+    val vPattern = s"""^${subpattern}(?:;${subpattern})?$$"""
+    val re = vPattern.r
+    re
+  }
+
+  def textDecimalVirtualPointForStandard(
+    e: ImplementsThrowsOrSavesSDE,
+    originalPattern: String,
+    patternStripped: String): Int = {
+    val r = TextNumberPatternUtils.vregexStandard
+    r.findFirstMatchIn(patternStripped) match {
+      case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
+      case None =>
+        e.SDE(
+          s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal point).
+             | Other than the sign indicators, it can contain only
+             | '#', then digits 0-9 then 'V' then digits 0-9.
+             | The positive part of the dfdl:textNumberPattern is mandatory.""".stripMargin('|'),
+          originalPattern)
+    }
+  }
+
+  private[primitives] lazy val vregexZoned= {
+    // Note: for zoned, can only have a positive pattern
+    // Prefix or suffix can only be '+' character, to
+    // indicate leading or trailing sign, and can only
+    // one of those.
+    //
+    // Also we're not allowing the # character, since I think that
+    // makes no sense for zoned with virtual decimal point.
+    //
+    val vPattern = """^(\+?)([0-9]+)V([0-9]+)(\+?)$""" // only positive pattern allowed, only + for prefix/suffix
+    val re = vPattern.r
+    re
+  }
+
+  /**
+   * Checks a pattern for suitability with V (virtual decimal point) in the pattern
+   * in the context of zoned textNumberRep.
+   *
+   * This is broken out separately for unit testing purposes.
+   *
+   * @param pattern the dfdl:textNumberPattern pattern
+   * @return -1 if the pattern is illegal syntax for use with V (virtual decimal point)
+   *         otherwise the number of digits to the right of the V character.
+   */
+  private[primitives] def textDecimalVirtualPointForZoned(pattern: String): Option[Int] = {
+    val r = TextNumberPatternUtils.vregexZoned
+    r.findFirstMatchIn(pattern) match {
+      // note: cannot have both a prefix and suffix. Only one of them.
+      case Some(r(pre, _, afterV, suf)) if (pre.length + suf.length <= 1) => Some(afterV.length)
+      case _ => None
+    }
+  }
+
+  def textDecimalVirtualPointForZoned(
+    e: ImplementsThrowsOrSavesSDE,
+    originalPattern: String,
+    patternStripped: String): Int = {
+    textDecimalVirtualPointForZoned(patternStripped).getOrElse {
+      e.SDE(
+        s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal point).
+           |Other than the leading or trailing '+' sign indicator,
+           |it can contain only digits 0-9.""".stripMargin('|'),
+        originalPattern)
+    }
+  }
+
+  /**
+   * The apos character (') is the quoting character in ICU patterns (and DFDL textNumberPattern).
+   * This removes any unquoted P or V characters (which are not implemented by ICU)
+   * leaving a pattern string that is suitable for use with ICU.
+   * @param pattern the textNumberPattern string
+   * @return the pattern string with all unquoted P and unquoted V removed.
+   */
+  def removeUnquotedPV(pattern: String) : String = {
+    val uniqueQQString = "alsdflslkjskjslkkjlkkjfppooiipsldsflj"
+    // A single regex that matches an unquoted character
+    // where the quoting char is self quoting requires
+    // a zero-width look behind that matches a potentially
+    // unbounded number of quote chars. That's not allowed.
+    //
+    // Consider ''''''P. We want a regex that matches only the P here
+    // because it is preceded by an even number of quotes.
+    //
+    // I did some tests, and the formulations you think might work
+    // such as from stack-overflow, don't work.
+    // (See test howNotToUseRegexLookBehindWithReplaceAll)
+    //
+    // So we use this brute force technique of replacing all ''
+    // first, so we have only single quotes to deal with.
+    pattern.replaceAll("''", uniqueQQString).
+      replaceAll("(?<!')P", "").
+      replaceAll("(?<!')V", "").
+      replaceAll(uniqueQQString, "''")
+  }
+}
+
+trait ConvertTextNumberMixin {
+
+  protected def pattern: String
+
+
+  // The pattern with escaped characters removed.
+  // This can be reliably searched for pattern characters,
+  // but cannot be used as an operational pattern given that
+  // potentially lots of things have been removed from it.
+  final protected lazy val patternStripped = {
+    // note: tick == apos == ' == single quote.
+    // First remove entirely all escaped ticks ie., ''
+    val noEscapedTicksRegex = """''""".r
+    val patternNoEscapedTicks = noEscapedTicksRegex.replaceAllIn(pattern, "")
+    // Next remove all tick-escaped characters entirely
+    val noQuotedRegex = """'[^']+'""".r
+    val patternNoQuoted = noQuotedRegex.replaceAllIn(patternNoEscapedTicks, "")
+    // the remaining string contains only pattern special characters
+    // and regular non-pattern characters
+    patternNoQuoted
+  }
+
+  protected final lazy val hasV = patternStripped.contains("V")
+  protected final lazy val hasP = patternStripped.contains("P")
+
+  // analogous to the property dfdl:binaryDecimalVirtualPoint
+  protected final def computeTextDecimalVirtualPoint(e: ElementBase) = {
+    if (!hasV && !hasP) 0 // no virtual point
+    else {
+      if (hasP) {
+        e.notYetImplemented("textNumberPattern with P symbol")
+      }
+      Assert.invariant(hasV)
+      //
+      // check for things incompatible with "V"
+      //
+      val virtualPoint = e.textNumberRep match {
+        case TextNumberRep.Standard =>
+          TextNumberPatternUtils.textDecimalVirtualPointForStandard(e, pattern, patternStripped)
+        case TextNumberRep.Zoned =>
+          TextNumberPatternUtils.textDecimalVirtualPointForZoned(e, pattern, patternStripped)
+      }
+
+      Assert.invariant(virtualPoint >= 1) // if this fails the regex is broken.
+      virtualPoint
+    }
+  }
+
+  //
+  // We need 2 patterns
+  // 1. The original textNumberPattern string from the schema
+  // 2. Same but with the P or V removed.
+  //
+  // We need to use the original textNumberPattern in diagnostic messages
+  // since that's what the schema author wrote and expects to see.
+  // But if the pattern has P or V, then we need ICU at runtime to use the original
+  // but with the P or V removed since the P and V don't actually represent anything in the data.
+  final protected lazy val runtimePattern = {
+    if (hasV || hasP) {
+      val patternWithoutPV = TextNumberPatternUtils.removeUnquotedPV(pattern)
+      patternWithoutPV
+    }
+    else pattern
+  }
+
+  final protected def checkPatternWithICU(e: ElementBase) = {
+    // Load the pattern to make sure it is valid
+    try {
+      if (hasV || hasP) {
+        new DecimalFormat(runtimePattern)
+      } else {
+        new DecimalFormat(pattern)
+      }
+    } catch {
+      case ex: IllegalArgumentException =>
+        if (hasV || hasP) {
+          // we don't know what the diagnostic message will say here
+          // since it is from the ICU library.
+          // That library has only seen the pattern with the P and V
+          // removed. The messages might show that pattern which would
+          // confuse the schema author since that's not the pattern
+          // they wrote.
+          //
+          // So we try to explain....
+          e.SDE(
+            """Invalid textNumberPattern.
+              | The errors are about the pattern with the P (decimal scaling position)
+              | and V (virtual decimal point) characters removed: %s""".stripMargin, ex)

Review Comment:
   Consider adding a test with a deliberately bad "P" pattern which ICU rejects to cover this if branch, although it's only a weak "consider" in order to turn the CI's coverage check green.



##########
daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/PrimitivesTextNumber.scala:
##########
@@ -46,8 +46,226 @@ case class ConvertTextCombinator(e: ElementBase, value: Gram, converter: Gram)
   override lazy val unparser = new ConvertTextCombinatorUnparser(e.termRuntimeData, value.unparser, converter.unparser)
 }
 
-case class ConvertTextNumberPrim(e: ElementBase)
-  extends Terminal(e, true) {
+// This is a separate object for unit testing purposes.
+private[primitives]
+object TextNumberPatternUtils {
+
+  /**
+   * A regex which matches textNumberPatterns that legally use the V
+   * (implied decimal point) character.
+   */
+   private[primitives] lazy val vregexStandard = {
+    //  DFDL v1.0 Spec says
+    //  It is a Schema Definition Error if any symbols other than "0", "1" through "9" or #
+    //  are used in the vpinteger region of the pattern.
+    //
+    // The prefix and suffix chars can surround the vpinteger region, and there can be
+    // a positive and a negative pattern.
+    //
+    // The prefix and suffix cannot be digits, # or P or V. No quoted chars in prefix either.
+    //
+    val prefix = """([^0-9#PV']?)"""
+    val suffix = prefix // same syntax as prefix
+    val vpinteger = """(#*[0-9]+)V([0-9]+)"""
+    val subpattern = s"""${prefix}${vpinteger}${suffix}"""
+     // don't forget the ^ and $ (start of data, end of data) because we want
+     // the match to consume all the characters, starting at the beginning.
+    val vPattern = s"""^${subpattern}(?:;${subpattern})?$$"""
+    val re = vPattern.r
+    re
+  }
+
+  def textDecimalVirtualPointForStandard(
+    e: ImplementsThrowsOrSavesSDE,
+    originalPattern: String,
+    patternStripped: String): Int = {
+    val r = TextNumberPatternUtils.vregexStandard
+    r.findFirstMatchIn(patternStripped) match {
+      case Some(r(_, _, afterV, _, _, _, _, _)) => afterV.length
+      case None =>
+        e.SDE(
+          s"""The dfdl:textNumberPattern '%s' contains 'V' (virtual decimal point).
+             | Other than the sign indicators, it can contain only
+             | '#', then digits 0-9 then 'V' then digits 0-9.
+             | The positive part of the dfdl:textNumberPattern is mandatory.""".stripMargin('|'),

Review Comment:
   We use only one of vregexStandard's capturing groups here, `afterV.length`.  The only place we use the rest of the capturing groups is in TestPrimitive test cases to verify that each group captures text as expected.  Do we really need the rest of the capturing groups, that is, how high is the likelihood that Daffodil's package "primitives" will ever use them and need to know what they contain?
   
   I'm not suggesting that you name the other groups here since IDEs would complain about unused names.  Rather, name all the capturing groups at lines 67-73 as val prefix = """(group1)"", val beforeV = """(group2)""", all the way to val negSuffix, and finally interpolate all the capturing groups into the entire pattern as val vPattern = s"""^$prefix${beforeV}V$afterV$suffix(?:;$negPrefix${negBeforeV}V$negAfterV$negSuffix)?$$""".  Then it will be clear what the names of each capturing group should be in any calling code's Some(r(prefix, beforeV, afterV, suffix, negPrefix, negBeforeV, negAfterV, negSuffix)) pattern patch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org