Posted to commits@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/18 08:26:59 UTC
[GitHub] [spark-website] cloud-fan commented on a change in pull request #361: Add 3.2.0 release note and news and update links

cloud-fan commented on a change in pull request #361:
URL: https://github.com/apache/spark-website/pull/361#discussion_r730684820



##########
File path: releases/_posts/2021-10-13-spark-release-3-2-0.md
##########
@@ -0,0 +1,318 @@
+---
+layout: post
+title: Spark Release 3.2.0
+categories: []
+tags: []
+status: publish
+type: post
+published: true
+meta:
+_edit_last: '4'
+_wpas_done_all: '1'
+---
+
+Apache Spark 3.2.0 is the third release of the 3.x line. With tremendous contributions from the open-source community, this release resolved more than 1,700 Jira tickets.
+
+This release adds the Pandas API layer on Spark, so Pandas users can scale out their applications on Spark with a one-line code change. Other major updates include RocksDB StateStore support, session window support, push-based shuffle support, ANSI SQL INTERVAL types, Adaptive Query Execution (AQE) enabled by default, and ANSI SQL mode GA.
+
+To download Apache Spark 3.2.0, visit the [downloads](https://spark.apache.org/downloads.html) page. You can consult JIRA for the [detailed changes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12349407). We have curated a list of high-level changes here, grouped by major modules.
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+### Highlights
+
+* Support the Pandas API layer on PySpark ([SPARK-34849](https://issues.apache.org/jira/browse/SPARK-34849))
+* Support push-based shuffle to improve shuffle efficiency ([SPARK-30602](https://issues.apache.org/jira/browse/SPARK-30602))
+* Add RocksDB StateStore implementation ([SPARK-34198](https://issues.apache.org/jira/browse/SPARK-34198))
+* Event-time-based sessionization (session window) ([SPARK-10816](https://issues.apache.org/jira/browse/SPARK-10816))
+* ANSI SQL mode GA ([SPARK-35030](https://issues.apache.org/jira/browse/SPARK-35030))
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+* Query compilation latency reduction ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042), [SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103), [SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+* Support Scala 2.13 ([SPARK-34218](https://issues.apache.org/jira/browse/SPARK-34218))
+
+
+### Core and Spark SQL
+
+**ANSI SQL Compatibility Enhancements**
+
+* Support for ANSI SQL INTERVAL types ([SPARK-27790](https://issues.apache.org/jira/browse/SPARK-27790))
+* New type coercion syntax rules in ANSI mode ([SPARK-34246](https://issues.apache.org/jira/browse/SPARK-34246))
+* Support LATERAL subqueries ([SPARK-34382](https://issues.apache.org/jira/browse/SPARK-34382))
+* ANSI mode: IntegralDivide throws an exception on overflow ([SPARK-35152](https://issues.apache.org/jira/browse/SPARK-35152))
+* ANSI mode: Check for overflow in Average ([SPARK-35955](https://issues.apache.org/jira/browse/SPARK-35955))
+* Block count(table.*) to follow ANSI standard and other SQL engines ([SPARK-34199](https://issues.apache.org/jira/browse/SPARK-34199))
+
+**Performance**
+
+* Query compilation latency
+  * Support traversal pruning in transform/resolve functions and their call sites ([SPARK-35042](https://issues.apache.org/jira/browse/SPARK-35042))
+  * Improve the performance of mapChildren and withNewChildren methods ([SPARK-34989](https://issues.apache.org/jira/browse/SPARK-34989))
+  * Improve the performance of type coercion rules ([SPARK-35103](https://issues.apache.org/jira/browse/SPARK-35103))
+* Query optimization
+  * Remove redundant aggregates in the Optimizer ([SPARK-33122](https://issues.apache.org/jira/browse/SPARK-33122))
+  * Push down limit through Project with Join ([SPARK-34622](https://issues.apache.org/jira/browse/SPARK-34622))
+  * Push down limit for LEFT SEMI and LEFT ANTI join ([SPARK-36404](https://issues.apache.org/jira/browse/SPARK-36404), [SPARK-34514](https://issues.apache.org/jira/browse/SPARK-34514))
+  * Push down limit through WINDOW when partition spec is empty ([SPARK-34575](https://issues.apache.org/jira/browse/SPARK-34575))
+  * Use a relative cost comparison function in the CBO ([SPARK-34922](https://issues.apache.org/jira/browse/SPARK-34922))
+  * Cardinality estimation of union, sort, and range operators ([SPARK-33411](https://issues.apache.org/jira/browse/SPARK-33411))
+  * Only push down LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join ([SPARK-34081](https://issues.apache.org/jira/browse/SPARK-34081))
+  * UnwrapCastInBinaryComparison supports In/InSet predicates ([SPARK-35316](https://issues.apache.org/jira/browse/SPARK-35316))
+  * Subexpression elimination enhancements ([SPARK-35448](https://issues.apache.org/jira/browse/SPARK-35448))
+  * Keep necessary stats after partition pruning ([SPARK-34119](https://issues.apache.org/jira/browse/SPARK-34119))
+  * Decouple bucket filter pruning and bucket table scan ([SPARK-32985](https://issues.apache.org/jira/browse/SPARK-32985))
+* Query execution
+  * Adaptive query execution
+    * Enable adaptive query execution by default ([SPARK-33679](https://issues.apache.org/jira/browse/SPARK-33679))
+    * Support Dynamic Partition Pruning (DPP) in AQE when the join is broadcast hash join at the beginning or there is no reused broadcast exchange ([SPARK-34168](https://issues.apache.org/jira/browse/SPARK-34168), [SPARK-35710](https://issues.apache.org/jira/browse/SPARK-35710))
+    * Optimize skew join before coalescing shuffle partitions ([SPARK-35447](https://issues.apache.org/jira/browse/SPARK-35447))
+    * Support AQE side shuffled hash join formula using rule ([SPARK-35282](https://issues.apache.org/jira/browse/SPARK-35282))
+    * Support AQE side broadcast hash join threshold ([SPARK-35264](https://issues.apache.org/jira/browse/SPARK-35264))
+    * Allow custom plugin for AQE cost evaluator ([SPARK-35794](https://issues.apache.org/jira/browse/SPARK-35794))
+  * Enable Zstandard buffer pool by default ([SPARK-34340](https://issues.apache.org/jira/browse/SPARK-34340), [SPARK-34390](https://issues.apache.org/jira/browse/SPARK-34390))
+  * Add code-gen for all join types of sort-merge join ([SPARK-34705](https://issues.apache.org/jira/browse/SPARK-34705))
+  * Whole plan exchange and subquery reuse ([SPARK-29375](https://issues.apache.org/jira/browse/SPARK-29375))
+  * Broadcast nested loop join improvement ([SPARK-34706](https://issues.apache.org/jira/browse/SPARK-34706))
+  * Support two levels of hash maps for final hash aggregation ([SPARK-35141](https://issues.apache.org/jira/browse/SPARK-35141))

Review comment:
       Let's remove this one; we turned it off in a follow-up PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

