Posted to commits@spark.apache.org by do...@apache.org on 2023/03/01 09:29:36 UTC

[spark] branch branch-3.4 updated: [SPARK-42628][SQL][DOCS] Add a migration note for bloom filter join

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 48874d9b398 [SPARK-42628][SQL][DOCS] Add a migration note for bloom filter join
48874d9b398 is described below

commit 48874d9b39806b9f746288b86d6bb770c12fc142
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Wed Mar 1 01:29:01 2023 -0800

    [SPARK-42628][SQL][DOCS] Add a migration note for bloom filter join
    
    ### What changes were proposed in this pull request?
    
    This PR aims to add a migration note for the bloom filter join feature.
    
    ### Why are the changes needed?
    
    SPARK-38841 enabled bloom filter joins by default in Apache Spark 3.4.0.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Manual review.
    
    Closes #40231 from dongjoon-hyun/SPARK-42628.
    
    Authored-by: Dongjoon Hyun <do...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
    (cherry picked from commit 69d15c3a0e0184d5d2b1a5587d7a030969509cb6)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 docs/sql-migration-guide.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index a92e2b27218..7eda9c8de92 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -37,6 +37,7 @@ license: |
   - Since Spark 3.4, Spark will do validation for partition spec in ALTER PARTITION to follow the behavior of `spark.sql.storeAssignmentPolicy` which may cause an exception if type conversion fails, e.g. `ALTER TABLE .. ADD PARTITION(p='a')` if column `p` is int type. To restore the legacy behavior, set `spark.sql.legacy.skipTypeValidationOnAlterPartition` to `true`.
   - Since Spark 3.4, vectorized readers are enabled by default for the nested data types (array, map and struct). To restore the legacy behavior, set `spark.sql.orc.enableNestedColumnVectorizedReader` and `spark.sql.parquet.enableNestedColumnVectorizedReader` to `false`.
   - Since Spark 3.4, `BinaryType` is not supported in CSV datasource. In Spark 3.3 or earlier, users can write binary columns in CSV datasource, but the output content in CSV files is `Object.toString()` which is meaningless; meanwhile, if users read CSV tables with binary columns, Spark will throw an `Unsupported type: binary` exception.
+  - Since Spark 3.4, bloom filter joins are enabled by default. To restore the legacy behavior, set `spark.sql.optimizer.runtime.bloomFilter.enabled` to `false`.
 
 ## Upgrading from Spark SQL 3.2 to 3.3
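
As a pointer for readers applying the note added in this commit, below is a minimal Scala sketch of disabling the feature. Only the config key `spark.sql.optimizer.runtime.bloomFilter.enabled` comes from the migration note above; the app name and local master are illustrative assumptions, not part of the commit.

import org.apache.spark.sql.SparkSession

// Minimal sketch: build a session with runtime bloom filter joins
// disabled, restoring the pre-3.4 behavior described in the note.
// The appName and local master are illustrative only.
val spark = SparkSession.builder()
  .appName("bloom-filter-join-disabled")
  .master("local[*]")  // drop when submitting to a real cluster
  .config("spark.sql.optimizer.runtime.bloomFilter.enabled", "false")
  .getOrCreate()

// The flag can also be toggled on an existing session at runtime:
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "false")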
 

