You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "zhengruifeng (via GitHub)" <gi...@apache.org> on 2024/01/21 06:11:20 UTC

[PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

zhengruifeng opened a new pull request, #44821:
URL: https://github.com/apache/spark/pull/44821

   ### What changes were proposed in this pull request?
   `bloomFilter` function should throw `AnalysisException` for invalid input
   
   ### Why are the changes needed?
   
   1. `BloomFilterAggregate` itself validates the input, and throws meaningful errors. we should not handle those invalid input and throw `InvalidPlanInput` in Planner.
   2. to be consistent with vanilla Scala API and other functions
   
   ### Does this PR introduce _any_ user-facing change?
   yes, `InvalidPlanInput` -> `AnalysisException`
   
   
   ### How was this patch tested?
   updated CI
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1462982961


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1805,31 +1805,8 @@ class SparkConnectPlanner(
       case "bloom_filter_agg" if fun.getArgumentsCount == 3 =>
         // [col, expectedNumItems: Long, numBits: Long]
         val children = fun.getArgumentsList.asScala.map(transformExpression)
-
-        // Check expectedNumItems is LongType and value greater than 0L
-        val expectedNumItemsExpr = children(1)
-        val expectedNumItems = expectedNumItemsExpr match {
-          case Literal(l: Long, LongType) => l
-          case _ =>
-            throw InvalidPlanInput("Expected insertions must be long literal.")

Review Comment:
   got it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on PR #44821:
URL: https://github.com/apache/spark/pull/44821#issuecomment-1907199604

   @LuciferYang then I think we can try add following check back
   ```
       if (fpp <= 0D || fpp >= 1D) {
         throw new IllegalArgumentException(
           "False positive probability must be within range (0.0, 1.0)"
         );
       }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1464261693


##########
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -536,6 +536,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * @since 2.0.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, fpp: Double): BloomFilter = {
+    if (fpp <= 0D || fpp >= 1D) {

Review Comment:
   yeah, on second thought, it's worth a separate PR to unify the check of `fpp`.
   let me remove the fpp check for now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1463074536


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo
    * @since 3.5.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = {
-    buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(
-      col: Column,
-      expectedNumItems: Long,
-      numBits: Long,
-      fpp: Double): BloomFilter = {
-    def numBitsValue: Long = if (!fpp.isNaN) {
-      BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
-    } else {
-      numBits
-    }
-
-    if (fpp <= 0d || fpp >= 1d) {

Review Comment:
   After the refactoring work https://github.com/apache/spark/pull/43391 and https://github.com/apache/spark/pull/44821/files, if an invalid `fpp` value is input, like `df.stat.bloomFilter("id", 1000, -1.0)`
   
   I will see the following error message:
   
   ```
   [info] org.apache.spark.sql.AnalysisException: 
   [DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "bloom_filter_agg(id, 1000, 0)" due to data type mismatch: The numBits must be between [0, positive] (current value = 0L). SQLSTATE: 42K09
   ```
   
   Personally feel that the new error message is not very user-friendly. My input expression is not `bloom_filter_agg(id, 1000, 0)`, and I did not specify the `numBits` parameter. Why is this error reported? How should I fix it?
   
   Is there a way to improve this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on PR #44821:
URL: https://github.com/apache/spark/pull/44821#issuecomment-1909273074

   Merged into master for Spark 4.0. Thanks @zhengruifeng @beliefer and @MaxGekk ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1462912072


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo
    * @since 3.5.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = {
-    buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(
-      col: Column,
-      expectedNumItems: Long,
-      numBits: Long,
-      fpp: Double): BloomFilter = {
-    def numBitsValue: Long = if (!fpp.isNaN) {
-      BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
-    } else {
-      numBits
-    }
-
-    if (fpp <= 0d || fpp >= 1d) {

Review Comment:
   we don't have such a check in `DataFrameStatFunctions` of vanilla spark, `BloomFilterAggregate` will check the value range



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1463058097


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala:
##########
@@ -248,19 +248,19 @@ class ClientDataFrameStatSuite extends RemoteSparkSession {
 
   test("Bloom filter test invalid inputs") {
     val df = spark.range(1000).toDF("id")
-    val message1 = intercept[SparkException] {
+    val error1 = intercept[AnalysisException] {

Review Comment:
   ```
   [info] - Bloom filter test invalid inputs *** FAILED *** (13 milliseconds)
   [info]   "[DATATYPE_MISMATCH.]VALUE_OUT_OF_RANGE" did not equal "[]VALUE_OUT_OF_RANGE" (ClientDataFrameStatSuite.scala:254)
   [info]   Analysis:
   [info]   "[DATATYPE_MISMATCH.]VALUE_OUT_OF_RANGE" -> "[]VALUE_OUT_OF_RANGE"
   ```



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo
    * @since 3.5.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = {
-    buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(
-      col: Column,
-      expectedNumItems: Long,
-      numBits: Long,
-      fpp: Double): BloomFilter = {
-    def numBitsValue: Long = if (!fpp.isNaN) {
-      BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
-    } else {
-      numBits
-    }
-
-    if (fpp <= 0d || fpp >= 1d) {

Review Comment:
   For Vanilla Spark, before the refactoring work https://github.com/apache/spark/pull/43391, there was this check logic when constructing a `BloomFilter` using the following constructor:
   
   https://github.com/apache/spark/blob/bc889c8c2adb976bdf3e7f020e3c7c3de5339c54/common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java#L244-L251



##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo
    * @since 3.5.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = {
-    buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(
-      col: Column,
-      expectedNumItems: Long,
-      numBits: Long,
-      fpp: Double): BloomFilter = {
-    def numBitsValue: Long = if (!fpp.isNaN) {
-      BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
-    } else {
-      numBits
-    }
-
-    if (fpp <= 0d || fpp >= 1d) {

Review Comment:
   After the refactoring work https://github.com/apache/spark/pull/43391 and https://github.com/apache/spark/pull/44821/files, if an invalid `fpp` value is input, like `df.stat.bloomFilter("id", 1000, -1.0)`
   
   I will see the following error message:
   
   ```
   [info] org.apache.spark.sql.AnalysisException: 
   [DATATYPE_MISMATCH.VALUE_OUT_OF_RANGE] Cannot resolve "bloom_filter_agg(id, 1000, 0)" due to data type mismatch: The numBits must be between [0, positive] (current value = 0L). SQLSTATE: 42K09
   ```
   
   Personally feel that the new error message is not very user-friendly. My input expression is not `bloom_filter_agg(id, 1000, 0)`, and I did not specify the `numBits` parameter. Why is this error reported? How should I fix it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1462980844


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo
    * @since 3.5.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = {
-    buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(
-      col: Column,
-      expectedNumItems: Long,
-      numBits: Long,
-      fpp: Double): BloomFilter = {
-    def numBitsValue: Long = if (!fpp.isNaN) {
-      BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
-    } else {
-      numBits
-    }
-
-    if (fpp <= 0d || fpp >= 1d) {

Review Comment:
   I see. So this change is try to fix the bug.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1464238576


##########
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -536,6 +536,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * @since 2.0.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, fpp: Double): BloomFilter = {
+    if (fpp <= 0D || fpp >= 1D) {

Review Comment:
   Perhaps in a follow-up, we could try moving the fpp check to the `BloomFilter#optimalNumOfBits(long, double)` method?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1461328861


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo
    * @since 3.5.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = {
-    buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(
-      col: Column,
-      expectedNumItems: Long,
-      numBits: Long,
-      fpp: Double): BloomFilter = {
-    def numBitsValue: Long = if (!fpp.isNaN) {
-      BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
-    } else {
-      numBits
-    }
-
-    if (fpp <= 0d || fpp >= 1d) {

Review Comment:
   I don't know the reason that remove the check.
   It seems is a break change.



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1805,31 +1805,8 @@ class SparkConnectPlanner(
       case "bloom_filter_agg" if fun.getArgumentsCount == 3 =>
         // [col, expectedNumItems: Long, numBits: Long]
         val children = fun.getArgumentsList.asScala.map(transformExpression)
-
-        // Check expectedNumItems is LongType and value greater than 0L
-        val expectedNumItemsExpr = children(1)
-        val expectedNumItems = expectedNumItemsExpr match {
-          case Literal(l: Long, LongType) => l
-          case _ =>
-            throw InvalidPlanInput("Expected insertions must be long literal.")

Review Comment:
   Why remove these checks?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1464292127


##########
sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -536,6 +536,11 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
    * @since 2.0.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, fpp: Double): BloomFilter = {
+    if (fpp <= 0D || fpp >= 1D) {

Review Comment:
   OK



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang closed pull request #44821: [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input
URL: https://github.com/apache/spark/pull/44821


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1461044134


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala:
##########
@@ -248,19 +248,19 @@ class ClientDataFrameStatSuite extends RemoteSparkSession {
 
   test("Bloom filter test invalid inputs") {
     val df = spark.range(1000).toDF("id")
-    val message1 = intercept[SparkException] {
+    val message1 = intercept[AnalysisException] {
       df.stat.bloomFilter("id", -1000, 100)
     }.getMessage
-    assert(message1.contains("Expected insertions must be positive"))
+    assert(message1.contains("VALUE_OUT_OF_RANGE"))

Review Comment:
   Could you just invoke the `getErrorClass()` method, please.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1461292535


##########
connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientDataFrameStatSuite.scala:
##########
@@ -248,19 +248,19 @@ class ClientDataFrameStatSuite extends RemoteSparkSession {
 
   test("Bloom filter test invalid inputs") {
     val df = spark.range(1000).toDF("id")
-    val message1 = intercept[SparkException] {
+    val message1 = intercept[AnalysisException] {
       df.stat.bloomFilter("id", -1000, 100)
     }.getMessage
-    assert(message1.contains("Expected insertions must be positive"))
+    assert(message1.contains("VALUE_OUT_OF_RANGE"))

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1462908737


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -1805,31 +1805,8 @@ class SparkConnectPlanner(
       case "bloom_filter_agg" if fun.getArgumentsCount == 3 =>
         // [col, expectedNumItems: Long, numBits: Long]
         val children = fun.getArgumentsList.asScala.map(transformExpression)
-
-        // Check expectedNumItems is LongType and value greater than 0L
-        val expectedNumItemsExpr = children(1)
-        val expectedNumItems = expectedNumItemsExpr match {
-          case Literal(l: Long, LongType) => l
-          case _ =>
-            throw InvalidPlanInput("Expected insertions must be long literal.")

Review Comment:
   `InvalidPlanInput` is kind of internal exception, we should throw an analyzer exception for:
   1, to be consistent with vanilla spark;
   2, better error message



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46787][CONNECT] `bloomFilter` function should throw `AnalysisException` for invalid input [spark]

Posted by "zhengruifeng (via GitHub)" <gi...@apache.org>.
zhengruifeng commented on code in PR #44821:
URL: https://github.com/apache/spark/pull/44821#discussion_r1462912072


##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala:
##########
@@ -644,25 +644,7 @@ final class DataFrameStatFunctions private[sql] (sparkSession: SparkSession, roo
    * @since 3.5.0
    */
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): BloomFilter = {
-    buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(
-      col: Column,
-      expectedNumItems: Long,
-      numBits: Long,
-      fpp: Double): BloomFilter = {
-    def numBitsValue: Long = if (!fpp.isNaN) {
-      BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
-    } else {
-      numBits
-    }
-
-    if (fpp <= 0d || fpp >= 1d) {

Review Comment:
   we don't have such a check in vanilla spark, `BloomFilterAggregate` will check the value range



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org