You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "NarekDW (via GitHub)" <gi...@apache.org> on 2023/08/09 19:08:49 UTC

[GitHub] [spark] NarekDW opened a new pull request, #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

NarekDW opened a new pull request, #40040:
URL: https://github.com/apache/spark/pull/40040

   ### What changes were proposed in this pull request?
   This PR proposes to use `BigInt` instead of `Long` datatype for  conversion of numbers between different radixes, because of this we will get rid of overflows.
   This change allows to convert big numbers, like:
   ```
   spark-sql> SELECT CONV(SUBSTRING('0xffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff', 3), 16, 10);
   ```
   will give correct result:
   `115792089237316195423570985008687907853269984665640564039457584007913129639935`
   
   **P.S. According to the NumberConverterBenchmark it work faster with Java 8 and slower with Java 11 and 17.
   You can check benchmark results.**
   
   
   ### Why are the changes needed?
   Changes allow to convert big numbers among different radixes.
   
   ### Does this PR introduce _any_ user-facing change?
   Users could convert big values by using `conv` function, without an overflow.
   
   ### How was this patch tested?
   By existing tests + an additional test cases were added to:
   `sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala`
   `sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/NumberConverterSuite.scala`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] NarekDW commented on a diff in pull request #40040: [WIP][SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "NarekDW (via GitHub)" <gi...@apache.org>.
NarekDW commented on code in PR #40040:
URL: https://github.com/apache/spark/pull/40040#discussion_r1107365294


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:
##########
@@ -158,45 +158,42 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   }
 
   test("conv") {
-    Seq(true, false).foreach { ansiEnabled =>
-      checkEvaluation(Conv(Literal("3"), Literal(10), Literal(2), ansiEnabled), "11")
-      checkEvaluation(Conv(Literal("-15"), Literal(10), Literal(-16), ansiEnabled), "-F")
-      checkEvaluation(
-        Conv(Literal("-15"), Literal(10), Literal(16), ansiEnabled), "FFFFFFFFFFFFFFF1")
-      checkEvaluation(Conv(Literal("big"), Literal(36), Literal(16), ansiEnabled), "3A48")
-      checkEvaluation(Conv(Literal.create(null, StringType), Literal(36), Literal(16), ansiEnabled),
-        null)
-      checkEvaluation(
-        Conv(Literal("3"), Literal.create(null, IntegerType), Literal(16), ansiEnabled), null)
-      checkEvaluation(
-        Conv(Literal("3"), Literal(16), Literal.create(null, IntegerType), ansiEnabled), null)
-      checkEvaluation(
-        Conv(Literal("1234"), Literal(10), Literal(37), ansiEnabled), null)
-      checkEvaluation(
-        Conv(Literal(""), Literal(10), Literal(16), ansiEnabled), null)
-
-      // If there is an invalid digit in the number, the longest valid prefix should be converted.
-      checkEvaluation(
-        Conv(Literal("11abc"), Literal(10), Literal(16), ansiEnabled), "B")

Review Comment:
   This behaviour is strange a bit, if it is required I'll add this functionality as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] NarekDW commented on pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "NarekDW (via GitHub)" <gi...@apache.org>.
NarekDW commented on PR #40040:
URL: https://github.com/apache/spark/pull/40040#issuecomment-1674540086

   @holdenk sure, done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] NarekDW commented on a diff in pull request #40040: [WIP][SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "NarekDW (via GitHub)" <gi...@apache.org>.
NarekDW commented on code in PR #40040:
URL: https://github.com/apache/spark/pull/40040#discussion_r1107365294


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala:
##########
@@ -158,45 +158,42 @@ class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
   }
 
   test("conv") {
-    Seq(true, false).foreach { ansiEnabled =>
-      checkEvaluation(Conv(Literal("3"), Literal(10), Literal(2), ansiEnabled), "11")
-      checkEvaluation(Conv(Literal("-15"), Literal(10), Literal(-16), ansiEnabled), "-F")
-      checkEvaluation(
-        Conv(Literal("-15"), Literal(10), Literal(16), ansiEnabled), "FFFFFFFFFFFFFFF1")
-      checkEvaluation(Conv(Literal("big"), Literal(36), Literal(16), ansiEnabled), "3A48")
-      checkEvaluation(Conv(Literal.create(null, StringType), Literal(36), Literal(16), ansiEnabled),
-        null)
-      checkEvaluation(
-        Conv(Literal("3"), Literal.create(null, IntegerType), Literal(16), ansiEnabled), null)
-      checkEvaluation(
-        Conv(Literal("3"), Literal(16), Literal.create(null, IntegerType), ansiEnabled), null)
-      checkEvaluation(
-        Conv(Literal("1234"), Literal(10), Literal(37), ansiEnabled), null)
-      checkEvaluation(
-        Conv(Literal(""), Literal(10), Literal(16), ansiEnabled), null)
-
-      // If there is an invalid digit in the number, the longest valid prefix should be converted.
-      checkEvaluation(
-        Conv(Literal("11abc"), Literal(10), Literal(16), ansiEnabled), "B")

Review Comment:
   This behaviour is strange a bit, if it is required I'll add this functionality as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)
URL: https://github.com/apache/spark/pull/40040


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)
URL: https://github.com/apache/spark/pull/40040


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow) [spark]

Posted by "NarekDW (via GitHub)" <gi...@apache.org>.
NarekDW commented on PR #40040:
URL: https://github.com/apache/spark/pull/40040#issuecomment-1817419747

   @holdenk should I close this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40040:
URL: https://github.com/apache/spark/pull/40040#issuecomment-1579642044

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] holdenk commented on pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "holdenk (via GitHub)" <gi...@apache.org>.
holdenk commented on PR #40040:
URL: https://github.com/apache/spark/pull/40040#issuecomment-1671994014

   CC @NarekDW could you update the PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] NarekDW commented on pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "NarekDW (via GitHub)" <gi...@apache.org>.
NarekDW commented on PR #40040:
URL: https://github.com/apache/spark/pull/40040#issuecomment-1445410662

   @srielau could you take a look, pls?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] NarekDW commented on pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)

Posted by "NarekDW (via GitHub)" <gi...@apache.org>.
NarekDW commented on PR #40040:
URL: https://github.com/apache/spark/pull/40040#issuecomment-1445466115

   Also, I'd like to share some performance measurements from my local machine, using JMH:
   code example:
   ```java
   ...
       @Benchmark
       public void convert_branch_master(Blackhole bh) {
           // Convert to unsigned
           for (int i = -10_000; i < 10_000; i++) {
               UTF8String convert = NumberConverter
                       .convert(UTF8String.fromString(String.valueOf(i)).getBytes(), 10, 16);
               bh.consume(convert);
           }
   
           // Convert to signed
           for (int i = -10_000; i < 10_000; i++) {
               UTF8String convert = NumberConverter
                       .convert(UTF8String.fromString(String.valueOf(i)).getBytes(), 10, -16);
               bh.consume(convert);
           }
       }
   ...
   ```
   
   With Java 8, current PR even speeds up the performance:
   ```java
   # JMH version: 1.36
   # VM version: JDK 1.8.0_362, OpenJDK 64-Bit Server VM, 25.362-b00
   # VM invoker: /usr/local/Cellar/openjdk@8/1.8.0+362/libexec/openjdk.jdk/Contents/Home/jre/bin/java
   # Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
   # Warmup: 5 iterations, 10 s each
   # Measurement: 5 iterations, 10 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   
   
   
   Benchmark                                           Mode  Cnt   Score   Error  Units
   NumberConverterBenchmark.convert_branch_master      avgt   10  30.458 ± 1.638  ms/op
   NumberConverterBenchmark.convert_branch_spark42399  avgt   10  22.857 ± 0.421  ms/op
   
   ```
   
   With Java 11, implementation from master branch works more then 2 times faster than the same implementation with Java 8!
   But there is not a big gap in performance difference between master branch implementation and implementation from current branch. 
   ```java
   # JMH version: 1.36
   # VM version: JDK 11.0.16.1, OpenJDK 64-Bit Server VM, 11.0.16.1+0
   # VM invoker: /usr/local/Cellar/openjdk@11/11.0.16.1/libexec/openjdk.jdk/Contents/Home/bin/java
   # Blackhole mode: full + dont-inline hint (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
   # Warmup: 5 iterations, 10 s each
   # Measurement: 5 iterations, 10 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   
   Benchmark                                           Mode  Cnt   Score   Error  Units
   NumberConverterBenchmark.convert_branch_master      avgt   10  14.453 ± 1.082  ms/op
   NumberConverterBenchmark.convert_branch_spark42399  avgt   10  17.956 ± 0.489  ms/op
   
   ```
   
   With Java 17, implementation from master branch works more then 3 times faster than the same implementation with Java 8 and ~ 2 times faster then the same implementation with Java 11!
   And here is a significant difference between master branch implementation and current branch (master branch is ~2x times faster...). 
   ```java
   # JMH version: 1.36
   # VM version: JDK 17.0.4.1, OpenJDK 64-Bit Server VM, 17.0.4.1+1
   # VM invoker: /usr/local/Cellar/openjdk@17/17.0.4.1_1/libexec/openjdk.jdk/Contents/Home/bin/java
   # Blackhole mode: compiler (auto-detected, use -Djmh.blackhole.autoDetect=false to disable)
   # Warmup: 5 iterations, 10 s each
   # Measurement: 5 iterations, 10 s each
   # Timeout: 10 min per iteration
   # Threads: 1 thread, will synchronize iterations
   # Benchmark mode: Average time, time/op
   
   Benchmark                                           Mode  Cnt   Score   Error  Units
   NumberConverterBenchmark.convert_branch_master      avgt   10   8.410 ± 0.161  ms/op
   NumberConverterBenchmark.convert_branch_spark42399  avgt   10  18.162 ± 0.036  ms/op
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow) [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40040:
URL: https://github.com/apache/spark/pull/40040#issuecomment-1965565863

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow) [spark]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40040: [SPARK-42399] [SQL] Support big numbers for conv function (get rid of overflow)
URL: https://github.com/apache/spark/pull/40040


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org