Posted to commits@spark.apache.org by ma...@apache.org on 2021/04/12 11:32:54 UTC

[spark] branch master updated: [SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow

This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 1be1012  [SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow
1be1012 is described below

commit 1be10124977bc32b580e37cf533328195ab6378c
Author: ulysses-you <ul...@gmail.com>
AuthorDate: Mon Apr 12 14:32:15 2021 +0300

    [SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow
    
    ### What changes were proposed in this pull request?
    
    Add a check for whether the total byte length overflows `int`.
    
    ### Why are the changes needed?
    
    We hit an extreme case with the `concat_ws` expression, where the error message is:
    ```
    Caused by: java.lang.NegativeArraySizeException
    	at org.apache.spark.unsafe.types.UTF8String.concatWs
    ```
    `UTF8String.concat` already performs this length check since [#21064](https://github.com/apache/spark/pull/21064), so `concatWs` should do the same.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes
    
    ### How was this patch tested?
    
    No test was added; reproducing the overflow is too heavy for a test.
    
    Closes #32106 from ulysses-you/SPARK-35005.
    
    Authored-by: ulysses-you <ul...@gmail.com>
    Signed-off-by: Max Gekk <ma...@gmail.com>
---
 .../src/main/java/org/apache/spark/unsafe/types/UTF8String.java      | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index f8289c1..41c511a 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -993,7 +993,7 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable,
       return null;
     }
 
-    int numInputBytes = 0;  // total number of bytes from the inputs
+    long numInputBytes = 0L;  // total number of bytes from the inputs
     int numInputs = 0;      // number of non-null inputs
     for (int i = 0; i < inputs.length; i++) {
       if (inputs[i] != null) {
@@ -1009,7 +1009,8 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable,
 
     // Allocate a new byte array, and copy the inputs one by one into it.
     // The size of the new array is the size of all inputs, plus the separators.
-    final byte[] result = new byte[numInputBytes + (numInputs - 1) * separator.numBytes];
+    int resultSize = Math.toIntExact(numInputBytes + (numInputs - 1) * (long)separator.numBytes);
+    final byte[] result = new byte[resultSize];
     int offset = 0;
 
     for (int i = 0, j = 0; i < inputs.length; i++) {
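
The size computation changed in the diff above can be illustrated in isolation. The following is a minimal sketch, not code from Spark: the class `ConcatWsSizeSketch` and its two helper methods are hypothetical names, and the inputs are chosen only to trigger the overflow. It assumes the pre-patch failure came from 32-bit wraparound producing a negative array size, which matches the `NegativeArraySizeException` quoted in the commit message.

```java
public class ConcatWsSizeSketch {

    // Pre-patch style (hypothetical helper): all arithmetic is done in 32-bit int,
    // so a sum larger than Integer.MAX_VALUE silently wraps to a negative value.
    static int sizeWithIntArithmetic(int numInputBytes, int numInputs, int separatorNumBytes) {
        return numInputBytes + (numInputs - 1) * separatorNumBytes;
    }

    // Post-patch style (hypothetical helper): compute in long, then narrow with
    // Math.toIntExact, which throws ArithmeticException if the value does not fit in an int.
    static int sizeWithLongArithmetic(long numInputBytes, int numInputs, int separatorNumBytes) {
        return Math.toIntExact(numInputBytes + (numInputs - 1) * (long) separatorNumBytes);
    }

    public static void main(String[] args) {
        // Two inputs totalling Integer.MAX_VALUE bytes plus one 1-byte separator.
        int wrapped = sizeWithIntArithmetic(Integer.MAX_VALUE, 2, 1);
        System.out.println("int arithmetic gives: " + wrapped);

        try {
            byte[] unused = new byte[wrapped];  // negative size, so allocation fails
            System.out.println("allocated " + unused.length + " bytes");
        } catch (NegativeArraySizeException e) {
            System.out.println("allocation fails with: " + e);
        }

        try {
            sizeWithLongArithmetic(Integer.MAX_VALUE, 2, 1);
        } catch (ArithmeticException e) {
            System.out.println("checked narrowing fails with: " + e.getMessage());
        }
    }
}
```

With the checked narrowing, the failure surfaces at the size computation as an `ArithmeticException` instead of at the allocation as a `NegativeArraySizeException`, which is the error-message improvement the commit title refers to.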
