You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by sl...@apache.org on 2024/01/09 14:43:46 UTC

(daffodil) branch main updated: Improve performance of padding removal when parsing

This is an automated email from the ASF dual-hosted git repository.

slawrence pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/daffodil.git


The following commit(s) were added to refs/heads/main by this push:
     new f4996f0fe Improve performance of padding removal when parsing
f4996f0fe is described below

commit f4996f0fe86643e9282d37a06263c3cc5a82de81
Author: Steve Lawrence <sl...@apache.org>
AuthorDate: Tue Jan 9 07:39:43 2024 -0500

    Improve performance of padding removal when parsing
    
    The current algorithm to remove right padding of left justified strings
    first reverses the String, removes leading pad characters using
    dropWhile, and then reverses the result. The two reverses are linear in
    the length of the String, and requires allocating multiple String
    instances and copying characters from one to the other. And this is done
    regardless of how many, if any, pad chars exist in the String. This
    logic is very clear, but is fairly inefficient, enough to show up while
    profiling.
    
    To improve performance, this rewrites the algorithm to scan through the
    String in reverse to find the index of the last pad character and then
    uses the substring() function to create a new String with those pad
    characters removed. This is now linear in the number of pad characters
    in a String instead of the full length of the string. Additionally, the
    use of substring() avoids character copies, since it just allocates a
    new String using the same underlying String value but with different
    indices.
    
    I have not looked into detail how scala implements dropWhile() for
    Strings (skimming the code, it looks like it will allocate a new String
    and copy characters), but for consistency and maximum performance, this
    also updates the algorithm that removes left padding of right justified
    strings to use similar logic as the new right padding algorithm. By
    using substring() we should avoid possible copies.
    
    In one test with lots of left justified strings, many of which are
    padded, this saw about a 15% improvement in parse times (excluding
    infoset creating using the null infoset outputter), and padding removal
    no longer shows up while profiling.
    
    DAFFODIL-2868
---
 .../processors/parsers/PaddingRuntimeMixin.scala   | 28 ++++++++++++++++++----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PaddingRuntimeMixin.scala b/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PaddingRuntimeMixin.scala
index 8486c79c7..93092c5a6 100644
--- a/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PaddingRuntimeMixin.scala
+++ b/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/parsers/PaddingRuntimeMixin.scala
@@ -17,6 +17,7 @@
 
 package org.apache.daffodil.runtime1.processors.parsers
 
+import org.apache.daffodil.lib.exceptions.Assert
 import org.apache.daffodil.lib.util.MaybeChar
 import org.apache.daffodil.runtime1.processors.TextJustificationType
 
@@ -24,11 +25,28 @@ trait PaddingRuntimeMixin {
   protected def justificationTrim: TextJustificationType.Type
   protected def parsingPadChar: MaybeChar
 
-  private def removeRightPadding(str: String): String =
-    if (parsingPadChar.isEmpty) str
-    else str.reverse.dropWhile(c => c == parsingPadChar.get).reverse
-  private def removeLeftPadding(str: String): String =
-    if (parsingPadChar.isEmpty) str else str.dropWhile(c => c == parsingPadChar.get)
+  private def removeRightPadding(str: String): String = {
+    // must have a pad character if we are removing padding
+    Assert.invariant(parsingPadChar.isDefined)
+    val padChar = parsingPadChar.get
+    var index = str.length - 1
+    while (index >= 0 && str(index) == padChar) {
+      index -= 1
+    }
+    str.substring(0, index + 1)
+  }
+
+  private def removeLeftPadding(str: String): String = {
+    // must have a pad character if we are removing padding
+    Assert.invariant(parsingPadChar.isDefined)
+    val padChar = parsingPadChar.get
+    var index = 0
+    while (index < str.length && str(index) == padChar) {
+      index += 1
+    }
+    str.substring(index)
+  }
+
   private def removePadding(str: String): String = removeRightPadding(removeLeftPadding(str))
 
   def trimByJustification(str: String): String = {