Posted to commits@spark.apache.org by ma...@apache.org on 2021/04/12 11:32:54 UTC
[spark] branch master updated: [SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow
This is an automated email from the ASF dual-hosted git repository.
maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 1be1012 [SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow
1be1012 is described below
commit 1be10124977bc32b580e37cf533328195ab6378c
Author: ulysses-you <ul...@gmail.com>
AuthorDate: Mon Apr 12 14:32:15 2021 +0300
[SPARK-35005][SQL] Improve error msg if UTF8String concatWs length overflow
### What changes were proposed in this pull request?
Add a check for whether the total byte length of the `concatWs` result overflows `int`.
### Why are the changes needed?
We encountered an extreme case with the `concat_ws` expression, and the error message was:
```
Caused by: java.lang.NegativeArraySizeException
at org.apache.spark.unsafe.types.UTF8String.concatWs
```
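To see how the exception above can arise, here is a minimal sketch that mimics the pre-patch size computation using only `int` arithmetic. The class and method names (`ConcatWsOverflowDemo`, `unsafeResultSize`) are hypothetical, input lengths stand in for the real `UTF8String` inputs (assumed non-null), and no large buffers are actually allocated.

```java
public class ConcatWsOverflowDemo {
    // Hypothetical helper mirroring the pre-patch logic: every term is an int,
    // so a large enough sum silently wraps around instead of failing.
    static int unsafeResultSize(int[] inputLengths, int separatorLength) {
        int numInputBytes = 0;
        for (int len : inputLengths) {
            numInputBytes += len; // may overflow and become negative
        }
        return numInputBytes + (inputLengths.length - 1) * separatorLength;
    }

    public static void main(String[] args) {
        // Two inputs of 2^30 bytes each: their sum (2^31) does not fit in an int.
        int size = unsafeResultSize(new int[] {1 << 30, 1 << 30}, 1);
        System.out.println(size); // -2147483647

        try {
            byte[] result = new byte[size];
            System.out.println("allocated " + result.length + " bytes");
        } catch (NegativeArraySizeException e) {
            // Same exception type as in the stack trace above.
            System.out.println("caught " + e);
        }
    }
}
```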
`UTF8String.concat` already performs this length check (added in [#21064](https://github.com/apache/spark/pull/21064)), so it is better to add the same check to `concatWs`.
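A minimal sketch of the same check in isolation, assuming the pattern used in the diff below: accumulate the input lengths in a `long`, widen the separator term, then narrow with `Math.toIntExact`. The names `ConcatWsSizeCheck` and `checkedResultSize` are hypothetical, and all inputs are assumed non-null, so the input count is simply the array length.

```java
public class ConcatWsSizeCheck {
    // Hypothetical helper: computes the concat_ws result size without silent wrap-around.
    static int checkedResultSize(int[] inputLengths, int separatorLength) {
        if (inputLengths.length == 0) {
            return 0; // no inputs, no separators
        }
        long numInputBytes = 0L;
        for (int len : inputLengths) {
            numInputBytes += len; // long accumulator cannot overflow for int-sized inputs
        }
        // Cast before multiplying so the separator term is also computed in long.
        long total = numInputBytes + (inputLengths.length - 1) * (long) separatorLength;
        // Throws ArithmeticException("integer overflow") if total exceeds the int range.
        return Math.toIntExact(total);
    }

    public static void main(String[] args) {
        System.out.println(checkedResultSize(new int[] {3, 4}, 2)); // 9
        try {
            checkedResultSize(new int[] {1 << 30, 1 << 30}, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

The `(long)` cast on the separator length matters: without it, `(numInputs - 1) * separatorLength` is evaluated in `int` and can wrap before it is added to the `long` total.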
### Does this PR introduce _any_ user-facing change?
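Yes. If the total byte length of the result exceeds the `int` range, `concatWs` now fails with an `ArithmeticException` ("integer overflow") from `Math.toIntExact`, rather than the `NegativeArraySizeException` shown above.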
### How was this patch tested?
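No test was added: triggering the overflow requires inputs totaling more than `Integer.MAX_VALUE` bytes (about 2 GiB), which is too heavy for a unit test.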
Closes #32106 from ulysses-you/SPARK-35005.
Authored-by: ulysses-you <ul...@gmail.com>
Signed-off-by: Max Gekk <ma...@gmail.com>
---
.../src/main/java/org/apache/spark/unsafe/types/UTF8String.java | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
index f8289c1..41c511a 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
@@ -993,7 +993,7 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable,
return null;
}
- int numInputBytes = 0; // total number of bytes from the inputs
+ long numInputBytes = 0L; // total number of bytes from the inputs
int numInputs = 0; // number of non-null inputs
for (int i = 0; i < inputs.length; i++) {
if (inputs[i] != null) {
@@ -1009,7 +1009,8 @@ public final class UTF8String implements Comparable<UTF8String>, Externalizable,
// Allocate a new byte array, and copy the inputs one by one into it.
// The size of the new array is the size of all inputs, plus the separators.
- final byte[] result = new byte[numInputBytes + (numInputs - 1) * separator.numBytes];
+ int resultSize = Math.toIntExact(numInputBytes + (numInputs - 1) * (long)separator.numBytes);
+ final byte[] result = new byte[resultSize];
int offset = 0;
for (int i = 0, j = 0; i < inputs.length; i++) {