You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/30 06:28:21 UTC

[GitHub] [spark] mkaravel opened a new pull request #34154: [WIP] Add lpad and rpad functions for binary strings

mkaravel opened a new pull request #34154:
URL: https://github.com/apache/spark/pull/34154


   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946099332


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48850/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946143341


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48852/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946051661


   **[Test build #144378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144378/testReport)** for PR 34154 at commit [`7a52529`](https://github.com/apache/spark/commit/7a52529805a63a597b8ebc0117ca9f9761d6f6c6).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946145554


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144378/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946112268


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144376/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946112268


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144376/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946099332


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48850/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947897117


   **[Test build #144465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144465/testReport)** for PR 34154 at commit [`fdc9207`](https://github.com/apache/spark/commit/fdc9207e3f7fa27db5746a6ff804e2e324b99e36).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-930840410


   **[Test build #143756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143756/testReport)** for PR 34154 at commit [`36cf3c2`](https://github.com/apache/spark/commit/36cf3c2f5c0edad8663bf4261d2a117cf7aab9df).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731831413



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
##########
@@ -1328,14 +1328,31 @@ case class StringLocate(substr: Expression, str: Expression, start: Expression)
 
 }
 
+/**
+ * Helper class for implementing StringLPad and StringRPad.
+ * Returns the default expression to be used in StringLPad or StringRPad based on the type of
+ * the input expression.
+ * For character string expressions the default padding expression is the string literal ' '.
+ * For binary string expressions the default padding expression is the byte literal 0x00.
+ */
+object StringPadDefaultValue {
+  def get(str: Expression): Expression = {
+    str.dataType match {

Review comment:
       One problem is that `str` may be unresolved and we can't call `str.dataType`. We need some indirection here so that we can go to the binary lpad/rpad code path during analysis.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946949915


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48889/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r732586572



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input byte
+  // sequence if it is longer than `len` bytes, or a copy of the byte sequence, otherwise.
+  private static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting byte sequence with the pattern. The resulting byte sequence is
+  // passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  private static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+      final int jMax = Math.min(pad.length, beyondPos - pos);
+      for (int j = 0; j < jMax; ++j) {
+        result[pos + j] = (byte) pad[j];
+      }
+    }
+  }
+
+  // Left-pads the input byte sequence using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting byte sequence
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // byte sequence otherwise.
+  // For padding patterns with positive byte length, the resulting byte sequence's byte length is
+  // equal to `len`. If the input byte sequence is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {
+    if (bytes == null || pad == null) return null;
+    // If the input length is 0, return the empty byte sequence.
+    if (len == 0) return EMPTY_BYTE;
+    // The padding pattern is empty.
+    if (pad.length == 0) return padWithEmptyPattern(bytes, len);
+    // The general case.
+    // 1. Copy the first `len` bytes of the input byte sequence into the output if they exist.
+    final byte[] result = new byte[len];
+    final int minLen = Math.min(len, bytes.length);
+    Platform.copyMemory(
+            bytes, Platform.BYTE_ARRAY_OFFSET,
+            result, Platform.BYTE_ARRAY_OFFSET + len - minLen,
+            minLen);
+    // 2. If the input has less than `len` bytes, fill in the rest using the provided pattern.
+    if (bytes.length < len) {
+      fillWithPattern(result, 0, len - bytes.length, pad);
+    }
+    return result;
+  }
+
+  // Right-pads the input byte sequence using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting byte sequence
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // byte sequence otherwise.
+  // For padding patterns with positive byte length, the resulting byte sequence's byte length is
+  // equal to `len`. If the input byte sequence is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] rpad(byte[] bytes, int len, byte[] pad) {
+    if (bytes == null || pad == null) return null;
+    // If the input length is 0, return the empty byte sequence.
+    if (len == 0) return EMPTY_BYTE;
+    // The padding pattern is empty.
+    if (pad.length == 0) return padWithEmptyPattern(bytes, len);
+    // The general case.
+    // 1. Copy the first `len` bytes of the input sequence into the output if they exist.
+    final byte[] result = new byte[len];
+    Platform.copyMemory(
+            bytes, Platform.BYTE_ARRAY_OFFSET,
+            result, Platform.BYTE_ARRAY_OFFSET,
+            Math.min(len, bytes.length));

Review comment:
       ditto, 2 spaces indentation




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731822261



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {

Review comment:
       ```suggestion
     private static byte[] padWithEmptyPattern(byte[] bytes, int len) {
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947002754


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144415/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947801612


   **[Test build #144465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144465/testReport)** for PR 34154 at commit [`fdc9207`](https://github.com/apache/spark/commit/fdc9207e3f7fa27db5746a6ff804e2e324b99e36).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947898273


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144465/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946144464


   **[Test build #144378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144378/testReport)** for PR 34154 at commit [`7a52529`](https://github.com/apache/spark/commit/7a52529805a63a597b8ebc0117ca9f9761d6f6c6).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946488237


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48875/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947929721


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48940/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946935899


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48889/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-948614742


   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947893511


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48938/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731892447



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {

Review comment:
       I do not have any issue with changing the method scope from protected to private, but I would like to understand your reasoning please.

##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string
+  // is passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  protected static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {

Review comment:
       Same here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731894894



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string

Review comment:
       [Here](https://spark.apache.org/docs/latest/sql-ref-datatypes.html) we use the term *byte sequence*. Maybe use the same wording?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r732584160



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
##########
@@ -1328,14 +1328,31 @@ case class StringLocate(substr: Expression, str: Expression, start: Expression)
 
 }
 
+/**
+ * Helper class for implementing StringLPad and StringRPad.
+ * Returns the default expression to be used in StringLPad or StringRPad based on the type of
+ * the input expression.
+ * For character string expressions the default padding expression is the string literal ' '.
+ * For byte sequence expressions the default padding expression is the byte literal 0x00.
+ */
+object StringPadDefaultValue {
+  def get(str: Expression): Expression = {
+    str.dataType match {
+      case StringType => Literal(" ")
+      case BinaryType => Literal(Array[Byte](0x00))

Review comment:
       is `0x00` a byte literal? is it better than `0.toByte`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946529965


   **[Test build #144401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144401/testReport)** for PR 34154 at commit [`5a4e114`](https://github.com/apache/spark/commit/5a4e1145b700fd9edb660f4481779ff963458ed7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946985689


   **[Test build #144415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144415/testReport)** for PR 34154 at commit [`1f34b1c`](https://github.com/apache/spark/commit/1f34b1c02e9181190238eea2719da9793cc1bdd3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947893511


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48938/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r732915470



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input byte
+  // sequence if it is longer than `len` bytes, or a copy of the byte sequence, otherwise.
+  private static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting byte sequence with the pattern. The resulting byte sequence is
+  // passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  private static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+      final int jMax = Math.min(pad.length, beyondPos - pos);
+      for (int j = 0; j < jMax; ++j) {
+        result[pos + j] = (byte) pad[j];
+      }
+    }
+  }
+
+  // Left-pads the input byte sequence using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting byte sequence
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // byte sequence otherwise.
+  // For padding patterns with positive byte length, the resulting byte sequence's byte length is
+  // equal to `len`. If the input byte sequence is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {
+    if (bytes == null || pad == null) return null;
+    // If the input length is 0, return the empty byte sequence.
+    if (len == 0) return EMPTY_BYTE;
+    // The padding pattern is empty.
+    if (pad.length == 0) return padWithEmptyPattern(bytes, len);
+    // The general case.
+    // 1. Copy the first `len` bytes of the input byte sequence into the output if they exist.
+    final byte[] result = new byte[len];
+    final int minLen = Math.min(len, bytes.length);
+    Platform.copyMemory(
+            bytes, Platform.BYTE_ARRAY_OFFSET,
+            result, Platform.BYTE_ARRAY_OFFSET + len - minLen,
+            minLen);
+    // 2. If the input has less than `len` bytes, fill in the rest using the provided pattern.
+    if (bytes.length < len) {
+      fillWithPattern(result, 0, len - bytes.length, pad);
+    }
+    return result;
+  }
+
+  // Right-pads the input byte sequence using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting byte sequence
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // byte sequence otherwise.
+  // For padding patterns with positive byte length, the resulting byte sequence's byte length is
+  // equal to `len`. If the input byte sequence is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] rpad(byte[] bytes, int len, byte[] pad) {
+    if (bytes == null || pad == null) return null;
+    // If the input length is 0, return the empty byte sequence.
+    if (len == 0) return EMPTY_BYTE;
+    // The padding pattern is empty.
+    if (pad.length == 0) return padWithEmptyPattern(bytes, len);
+    // The general case.
+    // 1. Copy the first `len` bytes of the input sequence into the output if they exist.
+    final byte[] result = new byte[len];
+    Platform.copyMemory(
+            bytes, Platform.BYTE_ARRAY_OFFSET,
+            result, Platform.BYTE_ARRAY_OFFSET,
+            Math.min(len, bytes.length));

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947937821


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48940/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-931024754


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48267/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-931024883


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48267/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [WIP] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-930840410


   **[Test build #143756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143756/testReport)** for PR 34154 at commit [`36cf3c2`](https://github.com/apache/spark/commit/36cf3c2f5c0edad8663bf4261d2a117cf7aab9df).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-931120392


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143756/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731826054



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string
+  // is passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  protected static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+      final int jMax = Math.min(pad.length, beyondPos - pos);
+      for (int j = 0; j < jMax; ++j) {
+        result[pos + j] = (byte) pad[j];
+      }
+    }
+  }
+
+  // Left-pads the input binary string using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting binary string
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // binary stringkm otherwise.
+  // For padding patterns with positive byte length, the resulting binary string's byte length is
+  // equal to `len`. If the input binary string is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {

Review comment:
       what if `len % pad.length` != 0? will we append part of the "pad binary" to the result binary?

##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string
+  // is passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  protected static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+      final int jMax = Math.min(pad.length, beyondPos - pos);
+      for (int j = 0; j < jMax; ++j) {
+        result[pos + j] = (byte) pad[j];
+      }
+    }
+  }
+
+  // Left-pads the input binary string using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting binary string
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // binary stringkm otherwise.
+  // For padding patterns with positive byte length, the resulting binary string's byte length is
+  // equal to `len`. If the input binary string is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {

Review comment:
       what if `len % pad.length` != 0? will we append part of the "padding binary" to the result binary?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731891091



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string
+  // is passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  protected static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+      final int jMax = Math.min(pad.length, beyondPos - pos);
+      for (int j = 0; j < jMax; ++j) {
+        result[pos + j] = (byte) pad[j];
+      }
+    }
+  }
+
+  // Left-pads the input binary string using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting binary string
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // binary stringkm otherwise.
+  // For padding patterns with positive byte length, the resulting binary string's byte length is
+  // equal to `len`. If the input binary string is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {

Review comment:
       Yes. We adopt the same behavior as for character strings:
   ```
   scala> spark.sql("select lpad('abc', 10, '12')").show
   +-----------------+
   |lpad(abc, 10, 12)|
   +-----------------+
   |       1212121abc|
   +-----------------+
   
   
   scala> spark.sql("select rpad('abc', 10, '12')").show
   +-----------------+
   |rpad(abc, 10, 12)|
   +-----------------+
   |       abc1212121|
   +-----------------+
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947937821


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48940/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #34154:
URL: https://github.com/apache/spark/pull/34154


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r732582899



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input byte
+  // sequence if it is longer than `len` bytes, or a copy of the byte sequence, otherwise.
+  private static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting byte sequence with the pattern. The resulting byte sequence is
+  // passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  private static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+      final int jMax = Math.min(pad.length, beyondPos - pos);
+      for (int j = 0; j < jMax; ++j) {
+        result[pos + j] = (byte) pad[j];
+      }
+    }
+  }
+
+  // Left-pads the input byte sequence using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting byte sequence
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // byte sequence otherwise.
+  // For padding patterns with positive byte length, the resulting byte sequence's byte length is
+  // equal to `len`. If the input byte sequence is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {
+    if (bytes == null || pad == null) return null;
+    // If the input length is 0, return the empty byte sequence.
+    if (len == 0) return EMPTY_BYTE;
+    // The padding pattern is empty.
+    if (pad.length == 0) return padWithEmptyPattern(bytes, len);
+    // The general case.
+    // 1. Copy the first `len` bytes of the input byte sequence into the output if they exist.
+    final byte[] result = new byte[len];
+    final int minLen = Math.min(len, bytes.length);
+    Platform.copyMemory(
+            bytes, Platform.BYTE_ARRAY_OFFSET,
+            result, Platform.BYTE_ARRAY_OFFSET + len - minLen,
+            minLen);

Review comment:
       nit: 2 spaces indentation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947844884


   **[Test build #144467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144467/testReport)** for PR 34154 at commit [`9027989`](https://github.com/apache/spark/commit/9027989529ff3bbf5f79d2d2022981b0a36db97f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946110205


   **[Test build #144376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144376/testReport)** for PR 34154 at commit [`9324674`](https://github.com/apache/spark/commit/93246745cd3ce4fdebc89b3bef84654780e207aa).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946145554


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144378/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946536230


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144401/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731822971



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string
+  // is passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  protected static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {

Review comment:
       ```suggestion
     private static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946850720


   **[Test build #144415 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144415/testReport)** for PR 34154 at commit [`1f34b1c`](https://github.com/apache/spark/commit/1f34b1c02e9181190238eea2719da9793cc1bdd3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731897915



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {

Review comment:
       We should minimize the scope as possible as we can. If `private` is sufficient here, we should use `private`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946949915


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48889/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r732915231



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input byte
+  // sequence if it is longer than `len` bytes, or a copy of the byte sequence, otherwise.
+  private static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting byte sequence with the pattern. The resulting byte sequence is
+  // passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  private static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {
+    for (int pos = firstPos; pos < beyondPos; pos += pad.length) {
+      final int jMax = Math.min(pad.length, beyondPos - pos);
+      for (int j = 0; j < jMax; ++j) {
+        result[pos + j] = (byte) pad[j];
+      }
+    }
+  }
+
+  // Left-pads the input byte sequence using the provided padding pattern.
+  // In the special case that the padding pattern is empty, the resulting byte sequence
+  // contains the first `len` bytes of the input if they exist, or is a copy of the input
+  // byte sequence otherwise.
+  // For padding patterns with positive byte length, the resulting byte sequence's byte length is
+  // equal to `len`. If the input byte sequence is not less than `len` bytes, its first `len` bytes
+  // are returned. Otherwise, the remaining missing bytes are filled in with the provided pattern.
+  public static byte[] lpad(byte[] bytes, int len, byte[] pad) {
+    if (bytes == null || pad == null) return null;
+    // If the input length is 0, return the empty byte sequence.
+    if (len == 0) return EMPTY_BYTE;
+    // The padding pattern is empty.
+    if (pad.length == 0) return padWithEmptyPattern(bytes, len);
+    // The general case.
+    // 1. Copy the first `len` bytes of the input byte sequence into the output if they exist.
+    final byte[] result = new byte[len];
+    final int minLen = Math.min(len, bytes.length);
+    Platform.copyMemory(
+            bytes, Platform.BYTE_ARRAY_OFFSET,
+            result, Platform.BYTE_ARRAY_OFFSET + len - minLen,
+            minLen);

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947868029


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48938/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947980304


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144467/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946861549


   > What's the use case for this? given that we would have to justify a breaking change (of behavior that was probably not intended to work anyway though)
   
   From my point of view, the old behavior is wrong and would have to be fixed anyways. Besides this, the main scenario is one where a user has a BINARY column with different lengths and they want to "align" all values either to the left or to the right. These overloads allow the user to do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sigmod commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
sigmod commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946059641


   @MaxGekk @cloud-fan @wangyum @sigmod 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946536230


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144401/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946410811


   **[Test build #144401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144401/testReport)** for PR 34154 at commit [`5a4e114`](https://github.com/apache/spark/commit/5a4e1145b700fd9edb660f4481779ff963458ed7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-930921729


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48267/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946488201


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48875/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947898273


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144465/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731898465



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string

Review comment:
       "byte sequence" LGTM




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947844884


   **[Test build #144467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144467/testReport)** for PR 34154 at commit [`9027989`](https://github.com/apache/spark/commit/9027989529ff3bbf5f79d2d2022981b0a36db97f).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731831413



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
##########
@@ -1328,14 +1328,31 @@ case class StringLocate(substr: Expression, str: Expression, start: Expression)
 
 }
 
+/**
+ * Helper class for implementing StringLPad and StringRPad.
+ * Returns the default expression to be used in StringLPad or StringRPad based on the type of
+ * the input expression.
+ * For character string expressions the default padding expression is the string literal ' '.
+ * For binary string expressions the default padding expression is the byte literal 0x00.
+ */
+object StringPadDefaultValue {
+  def get(str: Expression): Expression = {
+    str.dataType match {

Review comment:
       One problem is that `str` may be unresolved and we can't call `str.dataType`. We need some indirection here so that we can go to the binary lpad/rpad code path during analysis.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946897851


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48889/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947835113


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48938/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731994988



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string
+  // is passed as the first argument and it is filled from position `firstPos` (inclusive)
+  // to position `beyondPos` (not inclusive).
+  protected static void fillWithPattern(byte[] result, int firstPos, int beyondPos, byte[] pad) {

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946013895


   **[Test build #144376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144376/testReport)** for PR 34154 at commit [`9324674`](https://github.com/apache/spark/commit/93246745cd3ce4fdebc89b3bef84654780e207aa).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946050677


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48850/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946443678


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48875/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731994772



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947821544


   > LGTM. Since it's a breaking change, let's add an item in `docs/sql-migration-guide.md`
   
   Done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947886105


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48940/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947801612


   **[Test build #144465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144465/testReport)** for PR 34154 at commit [`fdc9207`](https://github.com/apache/spark/commit/fdc9207e3f7fa27db5746a6ff804e2e324b99e36).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947980304


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144467/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-931106032


   **[Test build #143756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/143756/testReport)** for PR 34154 at commit [`36cf3c2`](https://github.com/apache/spark/commit/36cf3c2f5c0edad8663bf4261d2a117cf7aab9df).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946086000


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48850/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946098181


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48852/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946013895


   **[Test build #144376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144376/testReport)** for PR 34154 at commit [`9324674`](https://github.com/apache/spark/commit/93246745cd3ce4fdebc89b3bef84654780e207aa).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946410811


   **[Test build #144401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144401/testReport)** for PR 34154 at commit [`5a4e114`](https://github.com/apache/spark/commit/5a4e1145b700fd9edb660f4481779ff963458ed7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946488237


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48875/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-931024883


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48267/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946131135


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48852/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731995473



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string

Review comment:
       Using the term *byte sequence* consistently now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946850720


   **[Test build #144415 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144415/testReport)** for PR 34154 at commit [`1f34b1c`](https://github.com/apache/spark/commit/1f34b1c02e9181190238eea2719da9793cc1bdd3).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947002754


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144415/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946143341


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/48852/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731902238



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {

Review comment:
       This makes sense. Will make the changes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mkaravel commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
mkaravel commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r732916603



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
##########
@@ -1328,14 +1328,31 @@ case class StringLocate(substr: Expression, str: Expression, start: Expression)
 
 }
 
+/**
+ * Helper class for implementing StringLPad and StringRPad.
+ * Returns the default expression to be used in StringLPad or StringRPad based on the type of
+ * the input expression.
+ * For character string expressions the default padding expression is the string literal ' '.
+ * For byte sequence expressions the default padding expression is the byte literal 0x00.
+ */
+object StringPadDefaultValue {
+  def get(str: Expression): Expression = {
+    str.dataType match {
+      case StringType => Literal(" ")
+      case BinaryType => Literal(Array[Byte](0x00))

Review comment:
       Yes it is (it is the hex representation of one byte whose value is `0`).
   I think that both choices are equally good. I ended up opting for the even simpler: `Array[Byte](0)`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-947964922


   **[Test build #144467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144467/testReport)** for PR 34154 at commit [`9027989`](https://github.com/apache/spark/commit/9027989529ff3bbf5f79d2d2022981b0a36db97f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #34154:
URL: https://github.com/apache/spark/pull/34154#discussion_r731823513



##########
File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java
##########
@@ -101,4 +101,82 @@ public static long getPrefix(byte[] bytes) {
     }
     return result;
   }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // If the padding pattern's length is 0, return the first `len` bytes of the input
+  // binary string if it longer than `len` bytes, or a copy of the binary string, otherwise.
+  protected static byte[] padWithEmptyPattern(byte[] bytes, int len) {
+    len = Math.min(bytes.length, len);
+    final byte[] result = new byte[len];
+    Platform.copyMemory(bytes, Platform.BYTE_ARRAY_OFFSET, result, Platform.BYTE_ARRAY_OFFSET, len);
+    return result;
+  }
+
+  // Helper method for implementing `lpad` and `rpad`.
+  // Fills the resulting binary string with the pattern. The resulting binary string

Review comment:
       shall we use "binary data" instead of "binary string"? there is no "binary string" concept in Spark today.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34154: [SPARK-37047][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-946051661


   **[Test build #144378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144378/testReport)** for PR 34154 at commit [`7a52529`](https://github.com/apache/spark/commit/7a52529805a63a597b8ebc0117ca9f9761d6f6c6).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34154: [WIP][SQL] Add lpad and rpad functions for binary strings

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34154:
URL: https://github.com/apache/spark/pull/34154#issuecomment-931120392


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/143756/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org