You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/26 18:45:34 UTC

[GitHub] [spark] srowen commented on a change in pull request #24707: [SPARK-27839][SQL] Improve UTF8String.replace() / StringReplace performance

srowen commented on a change in pull request #24707: [SPARK-27839][SQL] Improve UTF8String.replace() / StringReplace performance
URL: https://github.com/apache/spark/pull/24707#discussion_r287609735
 
 

 ##########
 File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
 ##########
 @@ -976,9 +977,28 @@ public UTF8String replace(UTF8String search, UTF8String replace) {
     if (EMPTY_UTF8.equals(search)) {
       return this;
     }
-    String replaced = toString().replace(
-      search.toString(), replace.toString());
-    return fromString(replaced);
+    return replace(search.toString(), replace.toString());
+  }
+
+  public UTF8String replace(String search, String replace) {
+    String before = toString();
+    String after;
+    if (search.length() == 1 && replace.length() == 1) {
+      // Use single-character-replacement fast path
+      after = before.replace(search.charAt(0), replace.charAt(0));
+    } else {
+      // In Java 8, String.replace() is implemented using a regex and is therefore
+      // somewhat inefficient (see https://bugs.openjdk.java.net/browse/JDK-8058779).
+      // This is fixed in Java 9, but in Java 8 we can use Commons StringUtils instead:
+      after = StringUtils.replace(before, search, replace);
+    }
+    // Use reference equality to cheaply detect whether the replacement had no effect,
+    // in which case we can simply return the original UTF8String and save some copying.
+    if (before == after) {
 
 Review comment:
   You need `.equals()` here, no? the objects almost certainly won't be the same String?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org