You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/26 19:03:13 UTC

[GitHub] [spark] JoshRosen commented on a change in pull request #24707: [SPARK-27839][SQL] Improve UTF8String.replace() / StringReplace performance

JoshRosen commented on a change in pull request #24707: [SPARK-27839][SQL] Improve UTF8String.replace() / StringReplace performance
URL: https://github.com/apache/spark/pull/24707#discussion_r287610284
 
 

 ##########
 File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
 ##########
 @@ -976,9 +977,28 @@ public UTF8String replace(UTF8String search, UTF8String replace) {
     if (EMPTY_UTF8.equals(search)) {
       return this;
     }
-    String replaced = toString().replace(
-      search.toString(), replace.toString());
-    return fromString(replaced);
+    return replace(search.toString(), replace.toString());
+  }
+
+  public UTF8String replace(String search, String replace) {
+    String before = toString();
+    String after;
+    if (search.length() == 1 && replace.length() == 1) {
+      // Use single-character-replacement fast path
+      after = before.replace(search.charAt(0), replace.charAt(0));
+    } else {
+      // In Java 8, String.replace() is implemented using a regex and is therefore
+      // somewhat inefficient (see https://bugs.openjdk.java.net/browse/JDK-8058779).
+      // This is fixed in Java 9, but in Java 8 we can use Commons StringUtils instead:
+      after = StringUtils.replace(before, search, replace);
+    }
+    // Use reference equality to cheaply detect whether the replacement had no effect,
+    // in which case we can simply return the original UTF8String and save some copying.
+    if (before == after) {
 
 Review comment:
   They generally will be the same if no characters were replaced. For example:
   
   ```scala
   // To avoid worrying about interning of strings, define some random ones at runtime:
   @ def randomString() = "ABCDEF" + scala.math.random()
   defined function randomString
   
   @ randomString
   res9: String = "ABCDEF0.24700533055374918"
   
   @ res9.replace('Z', 'z') eq res9
   res10: Boolean = true
   
   @ res9.replace("ZZZ", "z") eq res9
   res11: Boolean = true
   
   @ import $ivy.`org.apache.commons:commons-lang3:3.9`, org.apache.commons.lang3.StringUtils
   import $ivy.$                                     , org.apache.commons.lang3.StringUtils
   
   @ StringUtils.replace(res9, "ZZZ", "z") eq res9
   res14: Boolean = true
   
   // String is technically unchanged, but a different string object:
   @ StringUtils.replace(res9, "A", "A") eq res9
   res15: Boolean = false
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org