Posted to dev@spark.apache.org by Dongjoon Hyun <do...@apache.org> on 2021/06/16 21:57:53 UTC

UPDATE: Apache Spark 3.2 Release

This is a continuation of the previous thread, `Apache Spark 3.2 Expectation`, in order to give you updates.

- https://lists.apache.org/thread.html/r61897da071729913bf586ddd769311ce8b5b068e7156c352b51f7a33%40%3Cdev.spark.apache.org%3E

First of all, the AS-IS schedule is here

- https://spark.apache.org/versioning-policy.html

  July 1st Code freeze. Release branch cut.
  Mid July QA period. Focus on bug fixes, tests, stability and docs. Generally, no new features merged.
  August   Release candidates (RC), voting, etc. until final release passes

Second, Gengliang Wang volunteered as the release manager and has already started working in that role. Thank you! He shared the ongoing issues, and I want to piggyback the following items onto his list.


# Languages

- Scala 2.13 Support: Although SPARK-25075 is almost done and we have a Scala 2.13 Jenkins job on the master branch, we do not support Scala 2.13.6. We should document this if Scala 2.13.7 does not arrive in time.
  Please see https://github.com/scala/scala/pull/9641 (Milestone Scala 2.13.7).
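
  For downstream projects, here is a rough sbt sketch of consuming the
  Scala 2.13 artifacts once they are published; the 3.2.0 version string is
  the planned release, and the exact 2.13.x patch level may differ:

    // build.sbt -- assumes the Scala 2.13 artifacts of Apache Spark 3.2.0 are published
    scalaVersion := "2.13.5"  // exact 2.13.x patch level may change before the release
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.0" % "provided"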

- SparkR CRAN publishing: Apache SparkR 3.1.2 is on CRAN as of today, but we are getting policy-violation warnings about the cache directory. The fix deadline is 2021-06-28. If the package gets removed again, we will need to resubmit via Apache Spark 3.2.0 after fixing the issue.
  https://cran.r-project.org/web/packages/SparkR/index.html


# Dependencies

- Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark 3.2 via SPARK-29250 today. We are observing big improvements in S3 use cases. Please try it and share your experience.
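
  If you want to try it out against S3, here is a rough sketch; the bucket
  name and credentials are placeholders, and it assumes the hadoop-aws /
  S3A connector jars are on the classpath:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("s3a-smoke-test")
      // placeholder credentials; prefer an instance profile or credential provider in practice
      .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY>")
      .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_KEY>")
      .getOrCreate()

    // read and count a Parquet dataset through the S3A connector
    spark.read.parquet("s3a://your-bucket/path/to/data").count()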

- Apache Hive 2.3.9 becomes the built-in Hive library, with more HMS compatibility fixes added recently. We need to re-evaluate the previous HMS incompatibility reports.
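
  One way to re-check HMS compatibility is to point a build at an external
  metastore; the metastore version and URI below are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hms-compat-check")
      .enableHiveSupport()
      // talk to an older external HMS instead of the built-in Hive 2.3.9 client
      .config("spark.sql.hive.metastore.version", "2.1.1")  // placeholder HMS version
      .config("spark.sql.hive.metastore.jars", "maven")     // download matching client jars
      .config("spark.hadoop.hive.metastore.uris", "thrift://<hms-host>:9083")  // placeholder URI
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()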

- K8s 1.21 was released on May 12th, and K8s Client 5.4.1 supports it in Apache Spark 3.2. In addition, public cloud vendors are starting to support K8s 1.20. Please note that the K8s Client upgrade from 4.x to 5.x is a breaking API change.
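
  For a quick end-to-end check, here is a minimal client-mode sketch
  against a K8s cluster; the API server URL, namespace, and container image
  are placeholders:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("k8s-client-mode-check")
      .master("k8s://https://<api-server-host>:6443")                    // placeholder API server
      .config("spark.kubernetes.namespace", "spark")                     // placeholder namespace
      .config("spark.kubernetes.container.image", "<repo>/spark:3.2.0")  // placeholder image
      .config("spark.executor.instances", "2")
      .getOrCreate()

    spark.range(1000000L).selectExpr("sum(id)").show()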

- SPARK-33913 upgraded the Apache Kafka Client dependency to 2.8.0, and the Kafka community is considering deprecating Scala 2.12 support in Apache Kafka 3.0.
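
  For anyone who wants to exercise the 2.8.0 client, here is a minimal
  Structured Streaming sketch; the broker address and topic are
  placeholders, and it assumes the spark-sql-kafka connector is on the
  classpath:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("kafka-client-smoke-test").getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder broker
      .option("subscribe", "test-topic")                  // placeholder topic
      .load()

    // echo keys/values to the console to verify the new client end-to-end
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()
      .awaitTermination()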

- SPARK-34542 upgraded the Apache Parquet dependency to 1.12.0. However, we need SPARK-34859 to fix the column index issue before the release. In addition, Apache Parquet encryption is added as a developer API; a custom KMS client must be implemented.
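
  For reference, here is a rough sketch of how column encryption could be
  exercised once a KMS client is plugged in; the KMS client class, key IDs,
  and column names below are illustrative assumptions, not a finalized API:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parquet-encryption-sketch").getOrCreate()
    import spark.implicits._

    // configuration keys follow the parquet-mr 1.12 key tools
    spark.sparkContext.hadoopConfiguration.set(
      "parquet.crypto.factory.class",
      "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
    spark.sparkContext.hadoopConfiguration.set(
      "parquet.encryption.kms.client.class",
      "com.example.MyKmsClient")  // hypothetical custom KMS client implementation

    val df = Seq(("123-45-6789", "a@example.com")).toDF("ssn", "email")
    df.write
      .option("parquet.encryption.footer.key", "footerKeyId")          // placeholder key ID
      .option("parquet.encryption.column.keys", "colKeyId:ssn,email")  // placeholder key/columns
      .parquet("/tmp/parquet_encrypted")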

- SPARK-35489 upgraded the Apache ORC dependency to 1.6.8. We still need ORC-804 for a better masking feature.
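
  For context, here is a rough sketch of the kind of masking/encryption
  options this work targets; the option names and values are assumptions
  based on the ORC 1.6 column encryption support, and a reachable KMS is
  assumed:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("orc-masking-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(("123-45-6789", "a@example.com")).toDF("ssn", "email")
    df.write
      .option("hadoop.security.key.provider.path", "kms://http@kms-host:9600/kms")  // placeholder KMS
      .option("orc.key.provider", "hadoop")
      .option("orc.encrypt", "pii:ssn,email")          // encrypt these columns with key `pii`
      .option("orc.mask", "nullify:ssn;sha256:email")  // what readers without the key will see
      .orc("/tmp/orc_encrypted")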

- SPARK-34651 improved ZStandard support with ZStandard 1.4.9, and we are currently evaluating the newly released ZStandard 1.5.0 as well. JDK11 performance is still under investigation. In addition, SPARK-35181 (Use zstd for spark.io.compression.codec by default) is progressing separately.
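
  If you want to try zstd for shuffle/IO compression ahead of SPARK-35181,
  here is a minimal sketch; the compression level is just an example value:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("zstd-io-codec-test")
      .config("spark.io.compression.codec", "zstd")    // not yet the default; see SPARK-35181
      .config("spark.io.compression.zstd.level", "3")  // example compression level
      .getOrCreate()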


# Newly arrived items

- SPARK-35779 Dynamic filtering for Data Source V2

- SPARK-35781 Support Spark on Apple Silicon on macOS natively

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: UPDATE: Apache Spark 3.2 Release

Posted by Dongjoon Hyun <do...@apache.org>.
Thank you for the correction, Yikun.
Yes, it's 3.3.1. :)

On 2021/06/17 09:03:55, Yikun Jiang <yi...@gmail.com> wrote: 
> - Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark
> 3.2 via SPARK-29250 today. We are observing big improvements in S3 use
> cases. Please try it and share your experience.
> 
> It should be Apache Hadoop 3.3.1 [1]. : )
> 
> Note that Apache Hadoop 3.3.0 was the first Hadoop release to support both
> x86 and aarch64, and 3.3.1 continues that. Very happy to see that 3.3.1
> can be the default dependency of Spark 3.2.0.
> 
> [1] https://hadoop.apache.org/release/3.3.1.html
> 
> Regards,
> Yikun


Re: UPDATE: Apache Spark 3.2 Release

Posted by Yikun Jiang <yi...@gmail.com>.
- Apache Hadoop 3.3.2 becomes the default Hadoop profile for Apache Spark
3.2 via SPARK-29250 today. We are observing big improvements in S3 use
cases. Please try it and share your experience.

It should be Apache Hadoop 3.3.1 [1]. : )

Note that Apache Hadoop 3.3.0 was the first Hadoop release to support both
x86 and aarch64, and 3.3.1 continues that. Very happy to see that 3.3.1 can
be the default dependency of Spark 3.2.0.

[1] https://hadoop.apache.org/release/3.3.1.html

Regards,
Yikun

