Posted to dev@spark.apache.org by Matei Zaharia <ma...@gmail.com> on 2022/08/10 18:16:06 UTC

ASF board report draft for August

It’s time to submit our quarterly report to the ASF board on Friday. Here is a draft, lmk if you have suggestions:

=======================

Description:

Apache Spark is a fast, general-purpose engine for large-scale data
processing. It offers high-level APIs in Java, Scala, Python, R, and SQL, as
well as a rich set of libraries for stream processing, machine learning,
and graph analytics.

Issues for the board:

- None

Project status:

- Apache Spark was honored to receive the SIGMOD Systems Award this year, which SIGMOD (the ACM's data management research organization) gives to impactful real-world and research systems.

- We recently released Apache Spark 3.3.0, a feature release that improves join query performance via Bloom filters; increases Pandas API coverage by supporting popular Pandas features such as datetime.timedelta and merge_asof; simplifies migration from traditional data warehouses by improving ANSI SQL compliance and adding dozens of new built-in functions; and boosts developer productivity with better error handling, autocompletion, performance, and profiling.

- We released Apache Spark 3.2.2, a bug fix release for the 3.2 line, on July 17th.

- A Spark Project Improvement Proposal (SPIP) for Spark Connect was voted on and accepted. Spark Connect introduces a lightweight client/server API for Spark (https://issues.apache.org/jira/browse/SPARK-39375) that will allow applications to submit work to a remote Spark cluster without running the heavyweight query planner in the client. It will also decouple the client version from the server version, making it possible to upgrade Spark without updating every application.

- We added three new PMC members, Huaxin Gao, Gengliang Wang, and Maxim Gekk, in June 2022.

- We added a new committer, Xinrong Meng, in July 2022.

Trademarks:

- No changes since the last report.

Latest releases:

- Spark 3.3.0 was released on June 16, 2022.
- Spark 3.2.2 was released on July 17, 2022.
- Spark 3.1.3 was released on February 18, 2022.

Committers and PMC:

- The latest committer was added on July 13th, 2022 (Xinrong Meng).
- The latest PMC member was added on June 28th, 2022 (Huaxin Gao).

=======================



Re: ASF board report draft for August

Posted by Matei Zaharia <ma...@gmail.com>.
Actually, I forgot to add one more item. I want to mention that the community has started a large effort to improve Structured Streaming performance, usability, APIs, and connectors (https://issues.apache.org/jira/browse/SPARK-40025), and we'd love to get feedback and contributions on that.
