Posted to reviews@spark.apache.org by "MaxGekk (via GitHub)" <gi...@apache.org> on 2023/12/25 10:19:41 UTC

Re: [PR] [SPARK-46488][SQL] Skipping trimAll call during timestamp parsing [spark]

MaxGekk commented on code in PR #44463:
URL: https://github.com/apache/spark/pull/44463#discussion_r1436048800


##########
sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkDateTimeUtils.scala:
##########
@@ -619,6 +616,29 @@ trait SparkDateTimeUtils {
       case NonFatal(_) => None
     }
   }
+
+  /**
+   * Returns the start and end indices of a byte array after trimming
+   * any whitespace or ISO control characters from both ends.
+   * This avoids allocating a new string via the trimAll method;
+   * instead, callers can operate directly between the trimmed indices.
+   *
+   * @param bytes The byte array to be trimmed.
+   * @return A tuple of two integers: the trimmed start index (inclusive)
+   *         and the trimmed end index (exclusive).
+   */
+  private def getTrimmedStartEnd(bytes: Array[Byte]): (Int, Int) = {
+    var (start, end) = (0, bytes.length - 1)
+
+    while (start < bytes.length && UTF8String.isWhitespaceOrISOControl(bytes(start))) {
+      start += 1
+    }
+
+    while (end > start && UTF8String.isWhitespaceOrISOControl(bytes(end))) {
+      end -= 1
+    }
+
+    (start, end + 1)

Review Comment:
   Don't you create a `Tuple` instance here? Is it possible to avoid this? For example, define two separate `inline` functions (see the sketch below the list):
   - `getTrimmedStart(bytes: Array[Byte]): Int`
   - `getTrimmedEnd(bytes: Array[Byte], start: Int): Int`
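
   A minimal sketch of that split, assuming Scala 2's `@inline` hint and reusing the loops from the diff above (the signatures come from the suggestion; the bodies are illustrative, not the PR's actual code):

   ```scala
   @inline private def getTrimmedStart(bytes: Array[Byte]): Int = {
     // Advance past leading whitespace/ISO control characters.
     var start = 0
     while (start < bytes.length && UTF8String.isWhitespaceOrISOControl(bytes(start))) {
       start += 1
     }
     start
   }

   @inline private def getTrimmedEnd(bytes: Array[Byte], start: Int): Int = {
     // Walk back over trailing whitespace/ISO control characters.
     var end = bytes.length - 1
     while (end > start && UTF8String.isWhitespaceOrISOControl(bytes(end))) {
       end -= 1
     }
     end + 1 // exclusive end index, matching the tuple-returning version
   }

   // Callers then avoid the Tuple2 allocation:
   //   val start = getTrimmedStart(bytes)
   //   val end   = getTrimmedEnd(bytes, start)
   ```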



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org