Posted to commits@spark.apache.org by do...@apache.org on 2023/03/01 09:29:36 UTC
[spark] branch branch-3.4 updated: [SPARK-42628][SQL][DOCS] Add a migration note for bloom filter join
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new 48874d9b398 [SPARK-42628][SQL][DOCS] Add a migration note for bloom filter join
48874d9b398 is described below
commit 48874d9b39806b9f746288b86d6bb770c12fc142
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Wed Mar 1 01:29:01 2023 -0800
[SPARK-42628][SQL][DOCS] Add a migration note for bloom filter join
### What changes were proposed in this pull request?
This PR aims to add a migration note for bloom filter join.
### Why are the changes needed?
SPARK-38841 enabled bloom filter joins by default in Apache Spark 3.4.0.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual review.
Closes #40231 from dongjoon-hyun/SPARK-42628.
Authored-by: Dongjoon Hyun <do...@apache.org>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
(cherry picked from commit 69d15c3a0e0184d5d2b1a5587d7a030969509cb6)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
docs/sql-migration-guide.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index a92e2b27218..7eda9c8de92 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -37,6 +37,7 @@ license: |
- Since Spark 3.4, Spark will do validation for partition spec in ALTER PARTITION to follow the behavior of `spark.sql.storeAssignmentPolicy` which may cause an exception if type conversion fails, e.g. `ALTER TABLE .. ADD PARTITION(p='a')` if column `p` is int type. To restore the legacy behavior, set `spark.sql.legacy.skipTypeValidationOnAlterPartition` to `true`.
- Since Spark 3.4, vectorized readers are enabled by default for the nested data types (array, map and struct). To restore the legacy behavior, set `spark.sql.orc.enableNestedColumnVectorizedReader` and `spark.sql.parquet.enableNestedColumnVectorizedReader` to `false`.
- Since Spark 3.4, `BinaryType` is not supported in CSV datasource. In Spark 3.3 or earlier, users can write binary columns in CSV datasource, but the output content in CSV files is `Object.toString()` which is meaningless; meanwhile, if users read CSV tables with binary columns, Spark will throw an `Unsupported type: binary` exception.
+ - Since Spark 3.4, bloom filter joins are enabled by default. To restore the legacy behavior, set `spark.sql.optimizer.runtime.bloomFilter.enabled` to `false`.
## Upgrading from Spark SQL 3.2 to 3.3
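For readers applying the migration note added above: the legacy (pre-3.4) behavior can be restored either at submit time or inside a running session. A minimal sketch, where `my_app.py` is a placeholder application and only the configuration key comes from the note itself:

```shell
# Restore the pre-3.4 behavior at submit time; the key and value are
# the ones documented in the migration note above.
spark-submit \
  --conf spark.sql.optimizer.runtime.bloomFilter.enabled=false \
  my_app.py

# Or, equivalently, from within a running Spark SQL session:
#   SET spark.sql.optimizer.runtime.bloomFilter.enabled=false;
```

The same key can also be placed in `conf/spark-defaults.conf` to apply it cluster-wide rather than per application.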
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org