You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "markj-db (via GitHub)" <gi...@apache.org> on 2023/11/18 00:37:35 UTC

[PR] [SPARK-44973][SQL] Fix ArrayIndexOutOfBoundsException in conv() [spark]

markj-db opened a new pull request, #43880:
URL: https://github.com/apache/spark/pull/43880

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   
   Increase the size of the buffer allocated for the result of base conversion in `NumberConverter` to prevent ArrayIndexOutOfBoundsException when evaluating `conv(s"${Long.MinValue}", 10, -2)`.
   
   ### Why are the changes needed?
   
   I don't think the ArrayIndexOutOfBoundsException is intended behaviour.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Users will no longer experience an ArrayIndexOutOfBoundsException for this specific set of arguments and will instead receive the expected base conversion.
   
   ### How was this patch tested?
   
   New unit test cases
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43880:
URL: https://github.com/apache/spark/pull/43880#issuecomment-1821568982

   I added you, @markj-db , to the Apache Spark contributor group. Thank you again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "markj-db (via GitHub)" <gi...@apache.org>.
markj-db commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1399540088


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -244,7 +244,7 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
-  test("SPARK-36229 inconsistently behaviour where returned value is above the 64 char threshold") {
+  test("SPARK-36229: inconsistent behaviour where returned value is above the 64 char threshold") {

Review Comment:
   Done



##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -227,15 +227,15 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     checkAnswer(df.selectExpr("""conv("-10", 16, -10)"""), Row("-16"))
   }
 
-  test("SPARK-33428 conv function should trim input string") {
+  test("SPARK-33428: conv function should trim input string") {

Review Comment:
   Done



##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -227,15 +227,15 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     checkAnswer(df.selectExpr("""conv("-10", 16, -10)"""), Row("-16"))
   }
 
-  test("SPARK-33428 conv function should trim input string") {
+  test("SPARK-33428: conv function should trim input string") {
     val df = Seq(("abc"), ("  abc"), ("abc  "), ("  abc  ")).toDF("num")
     checkAnswer(df.select(conv($"num", 16, 10)),
       Seq(Row("2748"), Row("2748"), Row("2748"), Row("2748")))
     checkAnswer(df.select(conv($"num", 16, -10)),
       Seq(Row("2748"), Row("2748"), Row("2748"), Row("2748")))
   }
 
-  test("SPARK-33428 conv function shouldn't raise error if input string is too big") {
+  test("SPARK-33428: conv function shouldn't raise error if input string is too big") {

Review Comment:
   Done



##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -253,7 +253,7 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
-  test("SPARK-36229 conv should return result equal to -1 in base of toBase") {
+  test("SPARK-36229: conv should return result equal to -1 in base of toBase") {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix ArrayIndexOutOfBoundsException in conv() [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398037979


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   Increase the capacity of buffer seems not treat the root cause.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398474335


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -253,7 +253,7 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
-  test("SPARK-36229 conv should return result equal to -1 in base of toBase") {
+  test("SPARK-36229: conv should return result equal to -1 in base of toBase") {

Review Comment:
   ditto.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "gerashegalov (via GitHub)" <gi...@apache.org>.
gerashegalov commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398269455


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   Since decoding is done via a Long interpreted as an unsigned long with an external `-` sign, 
   64 + 1 or `java.lang.Long.SIZE + 1`  is intuitive



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix ArrayIndexOutOfBoundsException in conv() [spark]

Posted by "gerashegalov (via GitHub)" <gi...@apache.org>.
gerashegalov commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398017568


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/NumberConverterSuite.scala:
##########
@@ -55,6 +55,12 @@ class NumberConverterSuite extends SparkFunSuite {
     checkConv("-10", 11, 7, "45012021522523134134555")
   }
 
+  test("SPARK-44973: conv must allocate enough space for all digits plus negative sign") {
+    checkConv(s"${Long.MinValue}", 10, -2, "-1" + "0" * 63)
+    checkConv("8" + "0" * 15, 16, -2, "-1" + "0" * 63)

Review Comment:
   String construction is a little convoluted, we could use `BigInt` to make it more obvious, e.g.:
   
   ```Scala
   BigInt(Int.MinValue).toString(2)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "markj-db (via GitHub)" <gi...@apache.org>.
markj-db commented on PR #43880:
URL: https://github.com/apache/spark/pull/43880#issuecomment-1820291990

   @dongjoon-hyun are there any further blockers to merging this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398275341


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   +1 for @gerashegalov 's suggestion `java.lang.Long.SIZE + 1`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398474288


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -227,15 +227,15 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     checkAnswer(df.selectExpr("""conv("-10", 16, -10)"""), Row("-16"))
   }
 
-  test("SPARK-33428 conv function should trim input string") {
+  test("SPARK-33428: conv function should trim input string") {

Review Comment:
   ditto



##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -227,15 +227,15 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     checkAnswer(df.selectExpr("""conv("-10", 16, -10)"""), Row("-16"))
   }
 
-  test("SPARK-33428 conv function should trim input string") {
+  test("SPARK-33428: conv function should trim input string") {
     val df = Seq(("abc"), ("  abc"), ("abc  "), ("  abc  ")).toDF("num")
     checkAnswer(df.select(conv($"num", 16, 10)),
       Seq(Row("2748"), Row("2748"), Row("2748"), Row("2748")))
     checkAnswer(df.select(conv($"num", 16, -10)),
       Seq(Row("2748"), Row("2748"), Row("2748"), Row("2748")))
   }
 
-  test("SPARK-33428 conv function shouldn't raise error if input string is too big") {
+  test("SPARK-33428: conv function shouldn't raise error if input string is too big") {

Review Comment:
   ditto.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix ArrayIndexOutOfBoundsException in conv() [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398100978


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,14 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973 conv must allocate enough space for all digits plus negative sign") {
+    withSQLConf(SQLConf.ANSI_ENABLED.key -> false.toString) {
+      val df = Seq(("8" + "0"*15), ("-8" + "0" * 15)).toDF("num")

Review Comment:
   It's interesting. I expect a linter error at `"0"*15` due to the lack of space, but it's passed.
   
   In anyway, shall we change like this?
   ```
   - "0"*15
   + "0" * 15
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix ArrayIndexOutOfBoundsException in conv() [spark]

Posted by "markj-db (via GitHub)" <gi...@apache.org>.
markj-db commented on PR #43880:
URL: https://github.com/apache/spark/pull/43880#issuecomment-1817294401

   @dchvn @cloud-fan could you please review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix ArrayIndexOutOfBoundsException in conv() [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398037979


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   Increase the capacity of buffer seems not treat the root cause.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "markj-db (via GitHub)" <gi...@apache.org>.
markj-db commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1399558967


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,17 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973: conv must allocate enough space for all digits plus negative sign") {
+    withSQLConf(SQLConf.ANSI_ENABLED.key -> false.toString) {
+      val df = Seq(
+        ((BigInt(Long.MaxValue) + 1).toString(16)),
+        (BigInt(Long.MinValue).toString(16))

Review Comment:
   Both `8000000000000000` and `-8000000000000000` trigger the `java.lang.ArrayIndexOutOfBoundsException` before the fix.  There's not really a big change in path coverage with the addition of the second test case, but there's also not really a big change in test runtime.  Since both cases were failing before, it seemed an appropriate tradeoff (to me) to test that the fix addressed both cases.
   
   I suppose I'm more concerned about future regressions due to unknown future refactoring of `NumberConverter` and the benefit of documenting this edge case (the documentation of `conv()` is sparse, in the sense that it has no discussion of edge cases or the effect of the internal `long` representation).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "markj-db (via GitHub)" <gi...@apache.org>.
markj-db commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398321440


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   Ah, thanks, I agree, I was searching for a suitable constant and failed to find it.



##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,14 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973 conv must allocate enough space for all digits plus negative sign") {
+    withSQLConf(SQLConf.ANSI_ENABLED.key -> false.toString) {
+      val df = Seq(("8" + "0"*15), ("-8" + "0" * 15)).toDF("num")

Review Comment:
   I went with the `BigInt` proposal



##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,14 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973 conv must allocate enough space for all digits plus negative sign") {

Review Comment:
   I was following the convention in the file 🤷 



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   I don't think the problem is the length of the intermediate representation exceeding 64, it's when it's exactly 64 and we need an additional byte for the '-' sign.  I've updated the comment; hopefully it's clearer?



##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,14 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973 conv must allocate enough space for all digits plus negative sign") {

Review Comment:
   I was following the convention in the file 🤷 



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/NumberConverterSuite.scala:
##########
@@ -55,6 +55,12 @@ class NumberConverterSuite extends SparkFunSuite {
     checkConv("-10", 11, 7, "45012021522523134134555")
   }
 
+  test("SPARK-44973: conv must allocate enough space for all digits plus negative sign") {
+    checkConv(s"${Long.MinValue}", 10, -2, "-1" + "0" * 63)
+    checkConv("8" + "0" * 15, 16, -2, "-1" + "0" * 63)

Review Comment:
   Done (although IMO this is less obvious in some sense)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398474977


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,17 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973: conv must allocate enough space for all digits plus negative sign") {
+    withSQLConf(SQLConf.ANSI_ENABLED.key -> false.toString) {
+      val df = Seq(
+        ((BigInt(Long.MaxValue) + 1).toString(16)),
+        (BigInt(Long.MinValue).toString(16))

Review Comment:
   May I ask what is the target test coverage for this? ` -8000000000000000` is meaningful?
   ```
   scala> BigInt(Long.MinValue)
   res0: scala.math.BigInt = -9223372036854775808
   
   scala> BigInt(Long.MinValue).toString(16)
   res1: String = -8000000000000000
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "gerashegalov (via GitHub)" <gi...@apache.org>.
gerashegalov commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1401056227


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,17 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973: conv must allocate enough space for all digits plus negative sign") {
+    withSQLConf(SQLConf.ANSI_ENABLED.key -> false.toString) {
+      val df = Seq(
+        ((BigInt(Long.MaxValue) + 1).toString(16)),
+        (BigInt(Long.MinValue).toString(16))

Review Comment:
   +1 for the lack of documentation on the Spark side, I think it is historically borrowed from MySQL via Hive.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix ArrayIndexOutOfBoundsException in conv() [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398100852


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,14 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973 conv must allocate enough space for all digits plus negative sign") {

Review Comment:
   nit.
   ```
   - SPARK-44973
   + SPARK-44973:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398717748


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   Yeah. It's exactly 64. I made a slip of the tongue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "gerashegalov (via GitHub)" <gi...@apache.org>.
gerashegalov commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1399564669


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/NumberConverterSuite.scala:
##########
@@ -55,6 +55,12 @@ class NumberConverterSuite extends SparkFunSuite {
     checkConv("-10", 11, 7, "45012021522523134134555")
   }
 
+  test("SPARK-44973: conv must allocate enough space for all digits plus negative sign") {
+    checkConv(s"${Long.MinValue}", 10, -2, "-1" + "0" * 63)
+    checkConv("8" + "0" * 15, 16, -2, "-1" + "0" * 63)

Review Comment:
   It was not meant as a blocking comment either way



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix ArrayIndexOutOfBoundsException in conv() [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398087442


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   I think use 65 is more suitable.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   Shall we add some comment like
   `Sometimes the length of intermediate representation exceeds 64, we obtain the first byte to store one char for the '-' sign.` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398474229


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -244,7 +244,7 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
-  test("SPARK-36229 inconsistently behaviour where returned value is above the 64 char threshold") {
+  test("SPARK-36229: inconsistent behaviour where returned value is above the 64 char threshold") {

Review Comment:
   Please revert irrelevant style changes from this PR. It makes backporting your PR difficult.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #43880: [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()`
URL: https://github.com/apache/spark/pull/43880


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "markj-db (via GitHub)" <gi...@apache.org>.
markj-db commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1399558967


##########
sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala:
##########
@@ -262,6 +262,17 @@ class MathFunctionsSuite extends QueryTest with SharedSparkSession {
     }
   }
 
+  test("SPARK-44973: conv must allocate enough space for all digits plus negative sign") {
+    withSQLConf(SQLConf.ANSI_ENABLED.key -> false.toString) {
+      val df = Seq(
+        ((BigInt(Long.MaxValue) + 1).toString(16)),
+        (BigInt(Long.MinValue).toString(16))

Review Comment:
   Both `8000000000000000` and `-8000000000000000` trigger the `java.lang.ArrayIndexOutOfBoundsException` before the fix.  There's not really a big change in path coverage with the addition of the second test case, but there's also not really a big change in test runtime.  Since both cases were failing before, it seemed an appropriate tradeoff (to me) to test that the fix addressed both cases.
   
   I suppose I'm more concerned about future regressions due to unknown future refactoring of `NumberConverter` and the benefit of documenting this edge case (the documentation of `conv()` is sparse, in the sense that it has no discussion of edge cases or the effect of the internal `long` representation).
   
   As an example, it's counterintuitive to me that `conv("8000000000000000",16,-2)==conv("-8000000000000000",16,-2)` but `conv("7fffffffffffffff",16,-2)!=conv("-7fffffffffffffff",16,-2)`.
   
   ```
   scala> spark.sql(s"""select col1, conv(col1,16,-2) from values ("${"8"+"0"*15}"), ("${"-8"+"0"*15}"), ("${"7"+"f"*15}"), ("${"-7"+"f"*15}"), ("${"8" + "0" * 14 + "1"}"), ("${"-8" + "0" * 14 + "1"}")""").show(truncate=false)
   +-----------------+-----------------------------------------------------------------+
   |col1             |conv(col1, 16, -2)                                               |
   +-----------------+-----------------------------------------------------------------+
   |8000000000000000 |-1000000000000000000000000000000000000000000000000000000000000000|
   |-8000000000000000|-1000000000000000000000000000000000000000000000000000000000000000|
   |7fffffffffffffff |111111111111111111111111111111111111111111111111111111111111111  |
   |-7fffffffffffffff|-111111111111111111111111111111111111111111111111111111111111111 |
   |8000000000000001 |-111111111111111111111111111111111111111111111111111111111111111 |
   |-8000000000000001|-111111111111111111111111111111111111111111111111111111111111111 |
   +-----------------+-----------------------------------------------------------------+
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44973][SQL] Fix `ArrayIndexOutOfBoundsException` in `conv()` [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #43880:
URL: https://github.com/apache/spark/pull/43880#discussion_r1398718509


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala:
##########
@@ -23,6 +23,13 @@ import org.apache.spark.unsafe.types.UTF8String
 
 object NumberConverter {
 
+  /**
+   * The output string has a max length of one char per bit in the intermediate representation plus
+   * one char for the '-' sign.  This corresponds to the representation of `Long.MinValue` with
+   * `toBase` equal to -2.
+   */
+  private final val MAX_OUTPUT_LENGTH = 64 + 1

Review Comment:
   Thank you for the updated comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org