Posted to dev@spark.apache.org by Gengliang Wang <lt...@gmail.com> on 2021/06/15 07:17:15 UTC

Re: Apache Spark 3.2 Expectation

Hi,

As the expected release date is close, I would like to volunteer as the
release manager for Apache Spark 3.2.0.

Thanks,
Gengliang

On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan <cl...@gmail.com> wrote:

> An update: we made a mistake in picking the Spark 3.2 release date
> based on the scheduled release date of 3.1. However, 3.1 was delayed and
> released on March 2. In order to have a full 6 months of development for
> 3.2, the target release date for 3.2 should be September 2.
>
> I'm updating the release dates in
> https://github.com/apache/spark-website/pull/331
>
> Thanks,
> Wenchen
>
> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> Thank you, Xiao, Wenchen and Hyukjin.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon <gu...@gmail.com> wrote:
>>
>>> Just for an update, I will send a discussion email about my idea late
>>> this week or early next week.
>>>
>>> On Thu, Mar 11, 2021 at 7:00 PM Wenchen Fan <cl...@gmail.com> wrote:
>>>
>>>> There are many projects going on right now, such as new DS v2 APIs,
>>>> ANSI interval types, join improvement, disaggregated shuffle, etc. I don't
>>>> think it's realistic to do the branch cut in April.
>>>>
>>>> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut
>>>> the branch 3 months earlier. We should make the release process faster and
>>>> cut the branch around June probably.
>>>>
>>>>
>>>>
>>>> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li <ga...@gmail.com> wrote:
>>>>
>>>>> Below are some nice-to-have features we can work on in Spark 3.2: Lateral
>>>>> Join support <https://issues.apache.org/jira/browse/SPARK-28379>,
>>>>> interval data type, timestamp without time zone, un-nesting arbitrary
>>>>> queries, the returned metrics of DSV2, and error message standardization.
>>>>> Spark 3.2 will be another exciting release I believe!
>>>>>
>>>>> Go Spark!
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 10, 2021 at 12:25 PM Dongjoon Hyun <do...@gmail.com> wrote:
>>>>>
>>>>>> Hi, Xiao.
>>>>>>
>>>>>> This thread started 13 days ago. Since you asked the community about
>>>>>> major features or timelines at that time, could you share your roadmap or
>>>>>> expectations if you have something in mind?
>>>>>>
>>>>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep it
>>>>>> > open. It might take 1-2 weeks to collect from the community all the
>>>>>> > features we plan to build and ship in 3.2 since we just finished the 3.1
>>>>>> > voting.
>>>>>> > TBH, cutting the branch this April does not look good to me. That
>>>>>> > means we only have one month left for feature development of Spark 3.2. Do
>>>>>> > we have enough features in the current master branch? If not, are we able
>>>>>> > to finish major features we collected here? Do they have a timeline or
>>>>>> > project plan?
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 3, 2021 at 2:58 PM Dongjoon Hyun <do...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, John.
>>>>>>>
>>>>>>> This thread aims to share your expectations and goals (and maybe
>>>>>>> work progress) for Apache Spark 3.2 because we are making this together. :)
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 3, 2021 at 1:59 PM John Zhuge <jz...@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Dongjoon,
>>>>>>>>
>>>>>>>> Is it possible to get ViewCatalog in? The community already had
>>>>>>>> fairly detailed discussions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> John
>>>>>>>>
>>>>>>>> On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun <
>>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, All.
>>>>>>>>>
>>>>>>>>> Since we have been preparing Apache Spark 3.2.0 in the master branch
>>>>>>>>> since December 2020, March seems to be a good time to share our thoughts
>>>>>>>>> and aspirations on Apache Spark 3.2.
>>>>>>>>>
>>>>>>>>> According to the progress on Apache Spark 3.1 release, Apache
>>>>>>>>> Spark 3.2 seems to be the last minor release of this year. Given the
>>>>>>>>> timeframe, we might consider the following. (This is a small set. Please
>>>>>>>>> add your thoughts to this limited list.)
>>>>>>>>>
>>>>>>>>> # Languages
>>>>>>>>>
>>>>>>>>> - Scala 2.13 Support: This was expected on 3.1 via SPARK-25075 but
>>>>>>>>> slipped out. Currently, we are trying to use Scala 2.13.5 via SPARK-34505
>>>>>>>>> and investigating the publishing issue. Thank you for your contributions
>>>>>>>>> and feedback on this.
>>>>>>>>>
>>>>>>>>> - Java 17 LTS Support: Java 17 LTS will arrive in September 2021.
>>>>>>>>> Like Java 11, we need lots of support from our dependencies. Let's see.
>>>>>>>>>
>>>>>>>>> - Python 3.6 Deprecation(?): Python 3.6 community support ends on
>>>>>>>>> 2021-12-23. So, the deprecation is not required yet, but we had better
>>>>>>>>> prepare for it because we don't have an ETA for Apache Spark 3.3 in 2022.
>>>>>>>>>
>>>>>>>>> - SparkR CRAN publishing: As we know, it's discontinued so far.
>>>>>>>>> Resuming it depends on the success of Apache SparkR 3.1.1 CRAN publishing.
>>>>>>>>> If that succeeds in reviving it, we can keep publishing. Otherwise, I believe
>>>>>>>>> we had better drop it from the release work item list officially.
>>>>>>>>>
>>>>>>>>> # Dependencies
>>>>>>>>>
>>>>>>>>> - Apache Hadoop 3.3.2: Hadoop 3.2.0 becomes the default Hadoop
>>>>>>>>> profile in Apache Spark 3.1. Currently, the Spark master branch lives on Hadoop
>>>>>>>>> 3.2.2's shaded clients via SPARK-33212. So far, there is one ongoing
>>>>>>>>> report in the YARN environment. We hope it will be fixed soon in the Spark 3.2
>>>>>>>>> timeframe and we can move toward Hadoop 3.3.2.
>>>>>>>>>
>>>>>>>>> - Apache Hive 2.3.9: Spark 3.0 starts to use Hive 2.3.7 by default
>>>>>>>>> instead of old Hive 1.2 fork. Spark 3.1 removed hive-1.2 profile completely
>>>>>>>>> via SPARK-32981 and replaced the generated hive-service-rpc code with the
>>>>>>>>> official dependency via SPARK-32981. We are steadily improving this area
>>>>>>>>> and will consume Hive 2.3.9 if available.
>>>>>>>>>
>>>>>>>>> - K8s Client 4.13.2: During K8s GA activity, Spark 3.1 upgrades
>>>>>>>>> K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2 in order
>>>>>>>>> to support K8s model 1.19.
>>>>>>>>>
>>>>>>>>> - Kafka Client 2.8: To bring the client fixes, Spark 3.1 is using
>>>>>>>>> Kafka Client 2.6. For Spark 3.2, SPARK-33913 upgraded to Kafka 2.7 with
>>>>>>>>> Scala 2.12.13, but it was reverted later due to a Scala 2.12.13 issue. Since
>>>>>>>>> KAFKA-12357 fixed the Scala requirement two days ago, Spark 3.2 will go
>>>>>>>>> with Kafka Client 2.8 hopefully.
>>>>>>>>>
>>>>>>>>> # Some Features
>>>>>>>>>
>>>>>>>>> - Data Source v2: Spark 3.2 will deliver much richer DSv2 with
>>>>>>>>> Apache Iceberg integration. Especially, we hope the on-going function
>>>>>>>>> catalog SPIP and up-coming storage partitioned join SPIP can be delivered
>>>>>>>>> as a part of Spark 3.2 and become an additional foundation.
>>>>>>>>>
>>>>>>>>> - Columnar Encryption: As of today, Apache Spark master branch
>>>>>>>>> supports columnar encryption via Apache ORC 1.6 and it's documented via
>>>>>>>>> SPARK-34036. Also, upcoming Apache Parquet 1.12 has a similar capability.
>>>>>>>>> Hopefully, Apache Spark 3.2 is going to be the first release to have this
>>>>>>>>> feature officially. Any feedback is welcome.
>>>>>>>>>
>>>>>>>>> - Improved ZStandard Support: Spark 3.2 will bring more benefits
>>>>>>>>> for ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer pool
>>>>>>>>> support for all IO operations, 2) SPARK-33978 makes ORC datasource support
>>>>>>>>> ZSTD compression, 3) SPARK-34503 sets ZSTD as the default codec for event
>>>>>>>>> log compression, 4) SPARK-34479 aims to support ZSTD in the Avro data source.
>>>>>>>>> Also, the upcoming Parquet 1.12 supports ZSTD (and supports JNI buffer
>>>>>>>>> pool), too. I'm expecting more benefits.
>>>>>>>>>
>>>>>>>>> - Structured Streaming with RocksDB backend: According to the
>>>>>>>>> latest update, it looks active enough for merging to master branch in Spark
>>>>>>>>> 3.2.
>>>>>>>>>
>>>>>>>>> Please share your thoughts and let's build a better Apache Spark 3.2
>>>>>>>>> together.
>>>>>>>>>
>>>>>>>>> Bests,
>>>>>>>>> Dongjoon.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Zhuge
>>>>>>>>
>>>>>>>

Re: Apache Spark 3.2 Expectation

Posted by Gengliang Wang <lt...@gmail.com>.
Hi all,

I just cut branch-3.2 on GitHub and created version 3.3.0 on Jira.
When merging PRs on the master branch before the 3.2.0 RC, please help
cherry-pick bug fixes and the ongoing major features mentioned in this
thread to branch-3.2. Thanks!

On Fri, Jul 2, 2021 at 2:31 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you, Gengliang!
>
> On Wed, Jun 30, 2021 at 10:56 PM Gengliang Wang <lt...@gmail.com> wrote:
>
>> Hi all,
>>
>> Just as a gentle reminder, I will do the branch cut tomorrow. Please
>> focus on finalizing the work to land in Spark 3.2.0.
>> After the branch cut, we can still merge the ongoing major features
>> mentioned in this thread. There should be no other new features in
>> branch-3.2.
>> Thanks!
>>
>> On Thu, Jun 17, 2021 at 2:57 PM Hyukjin Kwon <gu...@gmail.com> wrote:
>>
>>> *GA -> QA
>>>
>>> On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon, <gu...@gmail.com> wrote:
>>>
>>>> I think we should make sure to treat the items in the list as
>>>> exceptions from the code freeze, while discouraging pushing new APIs and
>>>> features.
>>>>
>>>> GA period ideally we should focus on bug fixes and polishing.
>>>>
>>>> It would be great if we can speed up on these items in the list too.
>>>>
>>>>
>>>> On Thu, 17 Jun 2021, 15:08 Gengliang Wang, <lt...@gmail.com> wrote:
>>>>
>>>>> Thanks for the suggestions from Dongjoon, Liang-Chi, Min, and Xiao!
>>>>> Now it is clear that it's a soft cut and we can still merge
>>>>> important code changes to branch-3.2 before RC. Let's keep the branch cut
>>>>> date as July 1st.
>>>>>
>>>>> On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun <do...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> > First, I think you are saying "branch-3.2";
>>>>>>
>>>>>> To Xiao: yes, it was a typo for "branch-3.2".
>>>>>>
>>>>>> > We do strongly prefer to cut the release for Spark 3.2.0 including
>>>>>> > all the patches under SPARK-30602.
>>>>>> > This way, we can backport the other performance/operability
>>>>>> > enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>> > future Spark 3.2.x patch releases.
>>>>>>
>>>>>> To Min: after releasing 3.2.0, only bug fixes are allowed for 3.2.1+,
>>>>>> as Xiao wrote.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li <ga...@gmail.com> wrote:
>>>>>>
>>>>>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>>>>> > soft cut and the committers still are able to commit to `branch-3.3`
>>>>>>> > according to their decisions.
>>>>>>>
>>>>>>>
>>>>>>> First, I think you are saying "branch-3.2";
>>>>>>>
>>>>>>> Second, the "soft cut" means no "code freeze", although we cut the
>>>>>>> branch. To avoid releasing half-baked and unready features, the release
>>>>>>> manager needs to be very careful when cutting the RC. Based on what is
>>>>>>> proposed here, the RC date is the actual code freeze date.
>>>>>>>
>>>>>>> > This way, we can backport the other performance/operability
>>>>>>> > enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>>> > future Spark 3.2.x patch releases.
>>>>>>>
>>>>>>>
>>>>>>> This is not allowed based on the policy. Only bug fixes can be
>>>>>>> merged to the patch releases. Thus, if we know it will introduce a major
>>>>>>> performance regression, we have to turn the feature off by default.
>>>>>>>
>>>>>>> Xiao
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 16, 2021 at 3:22 PM Min Shen <vi...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Gengliang,
>>>>>>>>
>>>>>>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>>>>>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we
>>>>>>>> are close to having all the patches merged to master to enable push-based
>>>>>>>> shuffle.
>>>>>>>> Currently, there are 2 PRs under SPARK-30602 that are under active
>>>>>>>> review (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
>>>>>>>> We should be able to post the PRs for the other 2 remaining tickets
>>>>>>>> (SPARK-32923 and SPARK-35546) early next week.
>>>>>>>>
>>>>>>>> The tickets under SPARK-30602 are the minimum set of patches to
>>>>>>>> enable push-based shuffle.
>>>>>>>> We do have other performance/operability enhancements tickets under
>>>>>>>> SPARK-33235 that are needed to fully contribute what we have internally for
>>>>>>>> push-based shuffle.
>>>>>>>> However, these are optional for enabling push-based shuffle.
>>>>>>>> We do strongly prefer to cut the release for Spark 3.2.0 including
>>>>>>>> all the patches under SPARK-30602.
>>>>>>>> This way, we can backport the other performance/operability
>>>>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>>>> future Spark 3.2.x patch releases.
>>>>>>>> I understand the preference of not postponing the branch cut date.
>>>>>>>> We will check with Dongjoon regarding the soft cut date and the
>>>>>>>> flexibility for including the remaining tickets under SPARK-30602 into
>>>>>>>> branch-3.2.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Min
>>>>>>>>
>>>>>>>> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks Dongjoon. I've talked with Dongjoon offline to learn more
>>>>>>>>> about this.
>>>>>>>>> As it is a soft cut date, there is no reason to postpone it.
>>>>>>>>>
>>>>>>>>> It sounds good then to keep the original branch cut date.
>>>>>>>>>
>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dongjoon Hyun-2 wrote
>>>>>>>>> > Thank you for volunteering, Gengliang.
>>>>>>>>> >
>>>>>>>>> > Apache Spark 3.2.0 is the first version enabling AQE by default. I'm also
>>>>>>>>> > watching some on-going improvements on that.
>>>>>>>>> >
>>>>>>>>> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive Query
>>>>>>>>> >     Execution QA)
>>>>>>>>> >
>>>>>>>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
>>>>>>>>> > cut and the committers still are able to commit to `branch-3.3` according
>>>>>>>>> > to their decisions.
>>>>>>>>> >
>>>>>>>>> > Given that Apache Spark had 115 commits in a week in various areas
>>>>>>>>> > concurrently, we should start QA for Apache Spark 3.2 by creating
>>>>>>>>> > branch-3.3 and allowing only limited backporting.
>>>>>>>>> >
>>>>>>>>> >     https://github.com/apache/spark/graphs/commit-activity
>>>>>>>>> >
>>>>>>>>> > Bests,
>>>>>>>>> > Dongjoon.
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
>>>>>>>>> >
>>>>>>>>> >> First, thanks for volunteering as the release manager of Spark 3.2.0,
>>>>>>>>> >> Gengliang!
>>>>>>>>> >>
>>>>>>>>> >> And yes, for the two important Structured Streaming features, RocksDB
>>>>>>>>> >> StateStore and session window, we're working on them and expect to have
>>>>>>>>> >> them in the new release.
>>>>>>>>> >>
>>>>>>>>> >> So I propose to postpone the branch cut date.
>>>>>>>>> >>
>>>>>>>>> >> Thank you!
>>>>>>>>> >>
>>>>>>>>> >> Liang-Chi
>>>>>>>>> >>
>>>>>>>>> >>
>>>>>>>>> >> Gengliang Wang-2 wrote
>>>>>>>>> >> > Thanks, Hyukjin.
>>>>>>>>> >> >
>>>>>>>>> >> > The expected target branch cut date of Spark 3.2 is *July 1st* on
>>>>>>>>> >> > https://spark.apache.org/versioning-policy.html. However, I notice that
>>>>>>>>> >> > there are still multiple important projects in progress now:
>>>>>>>>> >> >
>>>>>>>>> >> > [Core]
>>>>>>>>> >> >
>>>>>>>>> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>>>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>>>>>>>>> >> >
>>>>>>>>> >> > [SQL]
>>>>>>>>> >> >
>>>>>>>>> >> >    - Support ANSI SQL INTERVAL types
>>>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>>>>>>>>> >> >    - Support Timestamp without time zone data type
>>>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>>>>>>>>> >> >    - Aggregate (Min/Max/Count) push down for Parquet
>>>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>>>>>>>>> >> >
>>>>>>>>> >> > [Streaming]
>>>>>>>>> >> >
>>>>>>>>> >> >    - EventTime based sessionization (session window)
>>>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>>>>>>>>> >> >    - Add RocksDB StateStore as external module
>>>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>>>>>>>>> >> >
>>>>>>>>> >> >
>>>>>>>>> >> > I wonder whether we should postpone the branch cut date.
>>>>>>>>> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>>>>>>>>> >> > Li, Liang-Chi Hsieh, who work on the projects above.
>>>>>>>>> >> >
>>>>>>>>> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>>>>>>>>> >> >
>>>>>>>>> >> >> +1, thanks.
>>>>>>>>> >> >>
>>>>>>>>> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >>> Hi,
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> As the expected release date is close, I would like to volunteer as the
>>>>>>>>> >> >>> release manager for Apache Spark 3.2.0.
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> Thanks,
>>>>>>>>> >> >>> Gengliang
>>>>>>>>> >> >>>

Re: Apache Spark 3.2 Expectation

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you, Gengliang!


Re: Apache Spark 3.2 Expectation

Posted by Gengliang Wang <lt...@gmail.com>.
Hi all,

Just as a gentle reminder, I will do the branch cut tomorrow. Please focus
on finalizing the work to land in Spark 3.2.0.
After the branch cut, we can still merge the ongoing major features
mentioned in this thread. There should be no other new features in
branch-3.2.
Thanks!

On Thu, Jun 17, 2021 at 2:57 PM Hyukjin Kwon <gu...@gmail.com> wrote:

> *GA -> QA
>
> On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon, <gu...@gmail.com> wrote:
>
>> I think we would make sure treating these items in the list as exceptions
>> from the code freeze, and discourage to push new APIs and features though.
>>
>> GA period ideally we should focus on bug fixes and polishing.
>>
>> It would be great if we can speed up on these items in the list too.
>>
>>
>> On Thu, 17 Jun 2021, 15:08 Gengliang Wang, <lt...@gmail.com> wrote:
>>
>>> Thanks for the suggestions from Dongjoon, Liangchi, Min, and Xiao!
>>> Now we make it clear that it's a soft cut and we can still merge
>>> important code changes to branch-3.2 before RC. Let's keep the branch cut
>>> date as July 1st.
>>>
>>> On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun <do...@gmail.com>
>>> wrote:
>>>
>>>> > First, I think you are saying "branch-3.2";
>>>>
>>>> To Xiao. Yes, it's was a typo of "branch-3.2".
>>>>
>>>> > We do strongly prefer to cut the release for Spark 3.2.0 including
>>>> all the patches under SPARK-30602.
>>>> > This way, we can backport the other performance/operability
>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>> future Spark 3.2.x patch releases.
>>>>
>>>> To Min, after releasing 3.2.0, only bug fixes are allowed for 3.2.1+ as
>>>> Xiao wrote.
>>>>
>>>>
>>>>
>>>> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li <ga...@gmail.com> wrote:
>>>>
>>>>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>>>> soft cut and the committers still are able to commit to `branch-3.3`
>>>>>> according to their decisions.
>>>>>
>>>>>
>>>>> First, I think you are saying "branch-3.2";
>>>>>
>>>>> Second, the "so cut" means no "code freeze", although we cut the
>>>>> branch. To avoid releasing half-baked and unready features, the release
>>>>> manager needs to be very careful when cutting the RC. Based on what is
>>>>> proposed here, the RC date is the actual code freeze date.
>>>>>
>>>>> This way, we can backport the other performance/operability
>>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>> future Spark 3.2.x patch releases.
>>>>>
>>>>>
>>>>> This is not allowed based on the policy. Only bug fixes can be merged
>>>>> to the patch releases. Thus, if we know it will introduce major performance
>>>>> regression, we have to turn the feature off by default.
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>>
>>>>> Min Shen <vi...@gmail.com> 于2021年6月16日周三 下午3:22写道:
>>>>>
>>>>>> Hi Gengliang,
>>>>>>
>>>>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>>>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we
>>>>>> are close to having all the patches merged to master to enable push-based
>>>>>> shuffle.
>>>>>> Currently, there are 2 PRs under SPARK-30602 that are under active
>>>>>> review (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
>>>>>> We should be able to post the PRs for the other 2 remaining tickets
>>>>>> (SPARK-32923 and SPARK-35546) early next week.
>>>>>>
>>>>>> The tickets under SPARK-30602 are the minimum set of patches to
>>>>>> enable push-based shuffle.
>>>>>> We do have other performance/operability enhancements tickets under
>>>>>> SPARK-33235 that are needed to fully contribute what we have internally for
>>>>>> push-based shuffle.
>>>>>> However, these are optional for enabling push-based shuffle.
>>>>>> We do strongly prefer to cut the release for Spark 3.2.0 including
>>>>>> all the patches under SPARK-30602.
>>>>>> This way, we can backport the other performance/operability
>>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>> future Spark 3.2.x patch releases.
>>>>>> I understand the preference of not postponing the branch cut date.
>>>>>> We will check with Dongjoon regarding the soft cut date and the
>>>>>> flexibility for including the remaining tickets under SPARK-30602 into
>>>>>> branch-3.2.
>>>>>>
>>>>>> Best,
>>>>>> Min
>>>>>>
>>>>>> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Thanks Dongjoon. I've talked with Dongjoon offline to learn more about this.
>>>>>>> As it is a soft cut date, there is no reason to postpone it.
>>>>>>>
>>>>>>> It sounds good then to keep the original branch cut date.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dongjoon Hyun-2 wrote
>>>>>>> > Thank you for volunteering, Gengliang.
>>>>>>> >
>>>>>>> > Apache Spark 3.2.0 is the first version enabling AQE by default.
>>>>>>> I'm also
>>>>>>> > watching some on-going improvements on that.
>>>>>>> >
>>>>>>> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL
>>>>>>> Adaptive Query
>>>>>>> > Execution QA)
>>>>>>> >
>>>>>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is
>>>>>>> a soft
>>>>>>> > cut and the committers still are able to commit to `branch-3.3`
>>>>>>> according
>>>>>>> > to their decisions.
>>>>>>> >
>>>>>>> > Given that Apache Spark had 115 commits in a week in various areas
>>>>>>> > concurrently, we should start QA for Apache Spark 3.2 by creating
>>>>>>> > branch-3.3 and allowing only limited backporting.
>>>>>>> >
>>>>>>> >     https://github.com/apache/spark/graphs/commit-activity
>>>>>>> >
>>>>>>> > Bests,
>>>>>>> > Dongjoon.
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
>>>>>>> >
>>>>>>> >> First, thanks for being volunteer as the release manager of Spark
>>>>>>> 3.2.0,
>>>>>>> >> Gengliang!
>>>>>>> >>
>>>>>>> >> And yes, for the two important Structured Streaming features,
>>>>>>> RocksDB
>>>>>>> >> StateStore and session window, we're working on them and expect
>>>>>>> to have
>>>>>>> >> them
>>>>>>> >> in the new release.
>>>>>>> >>
>>>>>>> >> So I propose to postpone the branch cut date.
>>>>>>> >>
>>>>>>> >> Thank you!
>>>>>>> >>
>>>>>>> >> Liang-Chi
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> Gengliang Wang-2 wrote
>>>>>>> >> > Thanks, Hyukjin.
>>>>>>> >> >
>>>>>>> >> > The expected target branch cut date of Spark 3.2 is *July 1st*
>>>>>>> on
>>>>>>> >> > https://spark.apache.org/versioning-policy.html. However, I
>>>>>>> notice that
>>>>>>> >> > there are still multiple important projects in progress now:
>>>>>>> >> >
>>>>>>> >> > [Core]
>>>>>>> >> >
>>>>>>> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>>>>>>> >> >
>>>>>>> >> > [SQL]
>>>>>>> >> >
>>>>>>> >> >    - Support ANSI SQL INTERVAL types
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>>>>>>> >> >    - Support Timestamp without time zone data type
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>>>>>>> >> >    - Aggregate (Min/Max/Count) push down for Parquet
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>>>>>>> >> >
>>>>>>> >> > [Streaming]
>>>>>>> >> >
>>>>>>> >> >    - EventTime based sessionization (session window)
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>>>>>>> >> >    - Add RocksDB StateStore as external module
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > I wonder whether we should postpone the branch cut date.
>>>>>>> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>>>>>>> >> > Li, Liang-Chi Hsieh, who work on the projects above.
>>>>>>> >> >
>>>>>>> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>>>>>>> >> >
>>>>>>> >> >> +1, thanks.
>>>>>>> >> >>
>>>>>>> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>>>>>>> >> >>
>>>>>>> >> >>> Hi,
>>>>>>> >> >>>
>>>>>>> >> >>> As the expected release date is close,  I would like to
>>>>>>> volunteer as
>>>>>>> >> the
>>>>>>> >> >>> release manager for Apache Spark 3.2.0.
>>>>>>> >> >>>
>>>>>>> >> >>> Thanks,
>>>>>>> >> >>> Gengliang
>>>>>>> >> >>>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Sent from:
>>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>>>> >>
>>>>>>> >>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> >> To unsubscribe e-mail:
>>>>>>>
>>>>>>> > dev-unsubscribe@.apache
>>>>>>>
>>>>>>> >>
>>>>>>> >>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sent from:
>>>>>>> http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>
>>>>>>>

Re: Apache Spark 3.2 Expectation

Posted by Hyukjin Kwon <gu...@gmail.com>.
*GA -> QA

On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon, <gu...@gmail.com> wrote:

> I think we should treat the items in the list as exceptions to the code
> freeze, while discouraging pushes of new APIs and features.
>
> During the GA period, ideally we should focus on bug fixes and polishing.
>
> It would be great if we can speed up on these items in the list too.
>
>
> On Thu, 17 Jun 2021, 15:08 Gengliang Wang, <lt...@gmail.com> wrote:
>
>> Thanks for the suggestions from Dongjoon, Liang-Chi, Min, and Xiao!
>> Now it is clear that it's a soft cut and we can still merge
>> important code changes to branch-3.2 before RC. Let's keep the branch cut
>> date as July 1st.
>>
>> On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>>
>>> > First, I think you are saying "branch-3.2";
>>>
>>> To Xiao. Yes, it was a typo of "branch-3.2".
>>>
>>> > We do strongly prefer to cut the release for Spark 3.2.0 including
>>> all the patches under SPARK-30602.
>>> > This way, we can backport the other performance/operability
>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>> future Spark 3.2.x patch releases.
>>>
>>> To Min, after releasing 3.2.0, only bug fixes are allowed for 3.2.1+ as
>>> Xiao wrote.
>>>
>>>
>>>
>>> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li <ga...@gmail.com> wrote:
>>>
>>>>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>>> soft cut and the committers still are able to commit to `branch-3.3`
>>>>> according to their decisions.
>>>>
>>>>
>>>> First, I think you are saying "branch-3.2";
>>>>
>>>> Second, the "soft cut" means no "code freeze", although we cut the
>>>> branch. To avoid releasing half-baked and unready features, the release
>>>> manager needs to be very careful when cutting the RC. Based on what is
>>>> proposed here, the RC date is the actual code freeze date.
>>>>
>>>>> This way, we can backport the other performance/operability
>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>> future Spark 3.2.x patch releases.
>>>>
>>>>
>>>> This is not allowed based on the policy. Only bug fixes can be merged
>>>> to the patch releases. Thus, if we know it will introduce major performance
>>>> regression, we have to turn the feature off by default.
>>>>
>>>> Xiao
>>>>
>>>>
>>>>
>>>> On Wed, Jun 16, 2021 at 3:22 PM Min Shen <vi...@gmail.com> wrote:
>>>>
>>>>> Hi Gengliang,
>>>>>
>>>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we
>>>>> are close to having all the patches merged to master to enable push-based
>>>>> shuffle.
>>>>> Currently, there are 2 PRs under SPARK-30602 that are under active
>>>>> review (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
>>>>> We should be able to post the PRs for the other 2 remaining tickets
>>>>> (SPARK-32923 and SPARK-35546) early next week.
>>>>>
>>>>> The tickets under SPARK-30602 are the minimum set of patches to enable
>>>>> push-based shuffle.
>>>>> We do have other performance/operability enhancements tickets under
>>>>> SPARK-33235 that are needed to fully contribute what we have internally for
>>>>> push-based shuffle.
>>>>> However, these are optional for enabling push-based shuffle.
>>>>> We do strongly prefer to cut the release for Spark 3.2.0 including all
>>>>> the patches under SPARK-30602.
>>>>> This way, we can backport the other performance/operability
>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>> future Spark 3.2.x patch releases.
>>>>> I understand the preference of not postponing the branch cut date.
>>>>> We will check with Dongjoon regarding the soft cut date and the
>>>>> flexibility for including the remaining tickets under SPARK-30602 into
>>>>> branch-3.2.
>>>>>
>>>>> Best,
>>>>> Min
>>>>>
>>>>> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Thanks Dongjoon. I've talked with Dongjoon offline to learn more about this.
>>>>>> As it is a soft cut date, there is no reason to postpone it.
>>>>>>
>>>>>> It sounds good then to keep the original branch cut date.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dongjoon Hyun-2 wrote
>>>>>> > Thank you for volunteering, Gengliang.
>>>>>> >
>>>>>> > Apache Spark 3.2.0 is the first version enabling AQE by default.
>>>>>> I'm also
>>>>>> > watching some on-going improvements on that.
>>>>>> >
>>>>>> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL
>>>>>> Adaptive Query
>>>>>> > Execution QA)
>>>>>> >
>>>>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is
>>>>>> a soft
>>>>>> > cut and the committers still are able to commit to `branch-3.3`
>>>>>> according
>>>>>> > to their decisions.
>>>>>> >
>>>>>> > Given that Apache Spark had 115 commits in a week in various areas
>>>>>> > concurrently, we should start QA for Apache Spark 3.2 by creating
>>>>>> > branch-3.3 and allowing only limited backporting.
>>>>>> >
>>>>>> >     https://github.com/apache/spark/graphs/commit-activity
>>>>>> >
>>>>>> > Bests,
>>>>>> > Dongjoon.
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
>>>>>> >
>>>>>> >> First, thanks for being volunteer as the release manager of Spark
>>>>>> 3.2.0,
>>>>>> >> Gengliang!
>>>>>> >>
>>>>>> >> And yes, for the two important Structured Streaming features,
>>>>>> RocksDB
>>>>>> >> StateStore and session window, we're working on them and expect to
>>>>>> have
>>>>>> >> them
>>>>>> >> in the new release.
>>>>>> >>
>>>>>> >> So I propose to postpone the branch cut date.
>>>>>> >>
>>>>>> >> Thank you!
>>>>>> >>
>>>>>> >> Liang-Chi
>>>>>> >>
>>>>>> >>
>>>>>> >> Gengliang Wang-2 wrote
>>>>>> >> > Thanks, Hyukjin.
>>>>>> >> >
>>>>>> >> > The expected target branch cut date of Spark 3.2 is *July 1st* on
>>>>>> >> > https://spark.apache.org/versioning-policy.html. However, I
>>>>>> notice that
>>>>>> >> > there are still multiple important projects in progress now:
>>>>>> >> >
>>>>>> >> > [Core]
>>>>>> >> >
>>>>>> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>>>>>> >> >
>>>>>> >> > [SQL]
>>>>>> >> >
>>>>>> >> >    - Support ANSI SQL INTERVAL types
>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>>>>>> >> >    - Support Timestamp without time zone data type
>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>>>>>> >> >    - Aggregate (Min/Max/Count) push down for Parquet
>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>>>>>> >> >
>>>>>> >> > [Streaming]
>>>>>> >> >
>>>>>> >> >    - EventTime based sessionization (session window)
>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>>>>>> >> >    - Add RocksDB StateStore as external module
>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > I wonder whether we should postpone the branch cut date.
>>>>>> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>>>>>> >> > Li, Liang-Chi Hsieh, who work on the projects above.
>>>>>> >> >
>>>>>> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>>>>>> >> >
>>>>>> >> >> +1, thanks.
>>>>>> >> >>
>>>>>> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>>>>>> >> >>
>>>>>> >> >>> Hi,
>>>>>> >> >>>
>>>>>> >> >>> As the expected release date is close,  I would like to
>>>>>> volunteer as
>>>>>> >> the
>>>>>> >> >>> release manager for Apache Spark 3.2.0.
>>>>>> >> >>>
>>>>>> >> >>> Thanks,
>>>>>> >> >>> Gengliang
>>>>>> >> >>>

Re: Apache Spark 3.2 Expectation

Posted by Hyukjin Kwon <gu...@gmail.com>.
I think we should treat the items in the list as exceptions to the code
freeze, while discouraging pushes of new APIs and features.

During the GA period, ideally we should focus on bug fixes and polishing.

It would be great if we can speed up on these items in the list too.


On Thu, 17 Jun 2021, 15:08 Gengliang Wang, <lt...@gmail.com> wrote:

> Thanks for the suggestions from Dongjoon, Liang-Chi, Min, and Xiao!
> Now it is clear that it's a soft cut and we can still merge important
> code changes to branch-3.2 before RC. Let's keep the branch cut date as
> July 1st.
>
> On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> > First, I think you are saying "branch-3.2";
>>
>> To Xiao. Yes, it was a typo of "branch-3.2".
>>
>> > We do strongly prefer to cut the release for Spark 3.2.0 including all
>> the patches under SPARK-30602.
>> > This way, we can backport the other performance/operability
>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>> future Spark 3.2.x patch releases.
>>
>> To Min, after releasing 3.2.0, only bug fixes are allowed for 3.2.1+ as
>> Xiao wrote.
>>
>>
>>
>> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li <ga...@gmail.com> wrote:
>>
>>>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>> soft cut and the committers still are able to commit to `branch-3.3`
>>>> according to their decisions.
>>>
>>>
>>> First, I think you are saying "branch-3.2";
>>>
>>> Second, the "soft cut" means no "code freeze", although we cut the branch.
>>> To avoid releasing half-baked and unready features, the release
>>> manager needs to be very careful when cutting the RC. Based on what is
>>> proposed here, the RC date is the actual code freeze date.
>>>
>>>> This way, we can backport the other performance/operability enhancements
>>>> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
>>>> 3.2.x patch releases.
>>>
>>>
>>> This is not allowed based on the policy. Only bug fixes can be merged to
>>> the patch releases. Thus, if we know it will introduce major performance
>>> regression, we have to turn the feature off by default.
>>>
>>> Xiao
>>>
>>>
>>>
>>> On Wed, Jun 16, 2021 at 3:22 PM Min Shen <vi...@gmail.com> wrote:
>>>
>>>> Hi Gengliang,
>>>>
>>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we are
>>>> close to having all the patches merged to master to enable push-based
>>>> shuffle.
>>>> Currently, there are 2 PRs under SPARK-30602 that are under active
>>>> review (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
>>>> We should be able to post the PRs for the other 2 remaining tickets
>>>> (SPARK-32923 and SPARK-35546) early next week.
>>>>
>>>> The tickets under SPARK-30602 are the minimum set of patches to enable
>>>> push-based shuffle.
>>>> We do have other performance/operability enhancements tickets under
>>>> SPARK-33235 that are needed to fully contribute what we have internally for
>>>> push-based shuffle.
>>>> However, these are optional for enabling push-based shuffle.
>>>> We do strongly prefer to cut the release for Spark 3.2.0 including all
>>>> the patches under SPARK-30602.
>>>> This way, we can backport the other performance/operability
>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>> future Spark 3.2.x patch releases.
>>>> I understand the preference of not postponing the branch cut date.
>>>> We will check with Dongjoon regarding the soft cut date and the
>>>> flexibility for including the remaining tickets under SPARK-30602 into
>>>> branch-3.2.
>>>>
>>>> Best,
>>>> Min
>>>>
>>>> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> Thanks Dongjoon. I've talked with Dongjoon offline to learn more about this.
>>>>> As it is a soft cut date, there is no reason to postpone it.
>>>>>
>>>>> It sounds good then to keep the original branch cut date.
>>>>>
>>>>> Thank you.
>>>>>
>>>>>
>>>>>
>>>>> Dongjoon Hyun-2 wrote
>>>>> > Thank you for volunteering, Gengliang.
>>>>> >
>>>>> > Apache Spark 3.2.0 is the first version enabling AQE by default. I'm
>>>>> also
>>>>> > watching some on-going improvements on that.
>>>>> >
>>>>> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive
>>>>> Query
>>>>> > Execution QA)
>>>>> >
>>>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>>> soft
>>>>> > cut and the committers still are able to commit to `branch-3.3`
>>>>> according
>>>>> > to their decisions.
>>>>> >
>>>>> > Given that Apache Spark had 115 commits in a week in various areas
>>>>> > concurrently, we should start QA for Apache Spark 3.2 by creating
>>>>> > branch-3.3 and allowing only limited backporting.
>>>>> >
>>>>> >     https://github.com/apache/spark/graphs/commit-activity
>>>>> >
>>>>> > Bests,
>>>>> > Dongjoon.
>>>>> >
>>>>> >
>>>>> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
>>>>> >
>>>>> >> First, thanks for being volunteer as the release manager of Spark
>>>>> 3.2.0,
>>>>> >> Gengliang!
>>>>> >>
>>>>> >> And yes, for the two important Structured Streaming features,
>>>>> RocksDB
>>>>> >> StateStore and session window, we're working on them and expect to
>>>>> have
>>>>> >> them
>>>>> >> in the new release.
>>>>> >>
>>>>> >> So I propose to postpone the branch cut date.
>>>>> >>
>>>>> >> Thank you!
>>>>> >>
>>>>> >> Liang-Chi
>>>>> >>
>>>>> >>
>>>>> >> Gengliang Wang-2 wrote
>>>>> >> > Thanks, Hyukjin.
>>>>> >> >
>>>>> >> > The expected target branch cut date of Spark 3.2 is *July 1st* on
>>>>> >> > https://spark.apache.org/versioning-policy.html. However, I
>>>>> notice that
>>>>> >> > there are still multiple important projects in progress now:
>>>>> >> >
>>>>> >> > [Core]
>>>>> >> >
>>>>> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>>>>> >> >
>>>>> >> > [SQL]
>>>>> >> >
>>>>> >> >    - Support ANSI SQL INTERVAL types
>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>>>>> >> >    - Support Timestamp without time zone data type
>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>>>>> >> >    - Aggregate (Min/Max/Count) push down for Parquet
>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>>>>> >> >
>>>>> >> > [Streaming]
>>>>> >> >
>>>>> >> >    - EventTime based sessionization (session window)
>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>>>>> >> >    - Add RocksDB StateStore as external module
>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>>>>> >> >
>>>>> >> >
>>>>> >> > I wonder whether we should postpone the branch cut date.
>>>>> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>>>>> >> > Li, Liang-Chi Hsieh, who work on the projects above.
>>>>> >> >
>>>>> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>>>>> >> >
>>>>> >> >> +1, thanks.
>>>>> >> >>
>>>>> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>>>>> >> >>
>>>>> >> >>> Hi,
>>>>> >> >>>
>>>>> >> >>> As the expected release date is close,  I would like to
>>>>> volunteer as
>>>>> >> the
>>>>> >> >>> release manager for Apache Spark 3.2.0.
>>>>> >> >>>
>>>>> >> >>> Thanks,
>>>>> >> >>> Gengliang
>>>>> >> >>>

Re: Apache Spark 3.2 Expectation

Posted by Gengliang Wang <lt...@gmail.com>.
Thanks for the suggestions from Dongjoon, Liang-Chi, Min, and Xiao!
Now it is clear that it's a soft cut and we can still merge important
code changes to branch-3.2 before RC. Let's keep the branch cut date as
July 1st.

On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> > First, I think you are saying "branch-3.2";
>
> To Xiao. Yes, it was a typo of "branch-3.2".
>
> > We do strongly prefer to cut the release for Spark 3.2.0 including all
> the patches under SPARK-30602.
> > This way, we can backport the other performance/operability
> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
> future Spark 3.2.x patch releases.
>
> To Min, after releasing 3.2.0, only bug fixes are allowed for 3.2.1+ as
> Xiao wrote.
>
>
>
> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li <ga...@gmail.com> wrote:
>
>>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
>>> cut and the committers still are able to commit to `branch-3.3` according
>>> to their decisions.
>>
>>
>> First, I think you are saying "branch-3.2";
>>
>> Second, the "soft cut" means no "code freeze", although we cut the branch.
>> To avoid releasing half-baked and unready features, the release
>> manager needs to be very careful when cutting the RC. Based on what is
>> proposed here, the RC date is the actual code freeze date.
>>
>>> This way, we can backport the other performance/operability enhancements
>>> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
>>> 3.2.x patch releases.
>>
>>
>> This is not allowed based on the policy. Only bug fixes can be merged to
>> the patch releases. Thus, if we know it will introduce major performance
>> regression, we have to turn the feature off by default.
>>
>> Xiao
>>
>>
>>
>> On Wed, Jun 16, 2021 at 3:22 PM Min Shen <vi...@gmail.com> wrote:
>>
>>> Hi Gengliang,
>>>
>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we are
>>> close to having all the patches merged to master to enable push-based
>>> shuffle.
>>> Currently, there are 2 PRs under SPARK-30602 that are under active
>>> review (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
>>> We should be able to post the PRs for the other 2 remaining tickets
>>> (SPARK-32923 and SPARK-35546) early next week.
>>>
>>> The tickets under SPARK-30602 are the minimum set of patches to enable
>>> push-based shuffle.
>>> We do have other performance/operability enhancements tickets under
>>> SPARK-33235 that are needed to fully contribute what we have internally for
>>> push-based shuffle.
>>> However, these are optional for enabling push-based shuffle.
>>> We do strongly prefer to cut the release for Spark 3.2.0 including all
>>> the patches under SPARK-30602.
>>> This way, we can backport the other performance/operability enhancements
>>> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
>>> 3.2.x patch releases.
>>> I understand the preference of not postponing the branch cut date.
>>> We will check with Dongjoon regarding the soft cut date and the
>>> flexibility for including the remaining tickets under SPARK-30602 into
>>> branch-3.2.
>>>
>>> Best,
>>> Min
>>>
>>> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> Thanks Dongjoon. I've talked with Dongjoon offline to learn more about this.
>>>> As it is a soft cut date, there is no reason to postpone it.
>>>>
>>>> It sounds good then to keep the original branch cut date.
>>>>
>>>> Thank you.
>>>>
>>>>
>>>>
>>>> Dongjoon Hyun-2 wrote
>>>> > Thank you for volunteering, Gengliang.
>>>> >
>>>> > Apache Spark 3.2.0 is the first version enabling AQE by default. I'm
>>>> also
>>>> > watching some on-going improvements on that.
>>>> >
>>>> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive
>>>> Query
>>>> > Execution QA)
>>>> >
>>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>> soft
>>>> > cut and the committers still are able to commit to `branch-3.3`
>>>> according
>>>> > to their decisions.
>>>> >
>>>> > Given that Apache Spark had 115 commits in a week in various areas
>>>> > concurrently, we should start QA for Apache Spark 3.2 by creating
>>>> > branch-3.3 and allowing only limited backporting.
>>>> >
>>>> >     https://github.com/apache/spark/graphs/commit-activity
>>>> >
>>>> > Bests,
>>>> > Dongjoon.
>>>> >
>>>> >
>>>> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
>>>> >
>>>> >> First, thanks for being volunteer as the release manager of Spark
>>>> 3.2.0,
>>>> >> Gengliang!
>>>> >>
>>>> >> And yes, for the two important Structured Streaming features, RocksDB
>>>> >> StateStore and session window, we're working on them and expect to
>>>> have
>>>> >> them
>>>> >> in the new release.
>>>> >>
>>>> >> So I propose to postpone the branch cut date.
>>>> >>
>>>> >> Thank you!
>>>> >>
>>>> >> Liang-Chi
>>>> >>
>>>> >>
>>>> >> Gengliang Wang-2 wrote
>>>> >> > Thanks, Hyukjin.
>>>> >> >
>>>> >> > The expected target branch cut date of Spark 3.2 is *July 1st* on
>>>> >> > https://spark.apache.org/versioning-policy.html. However, I
>>>> notice that
>>>> >> > there are still multiple important projects in progress now:
>>>> >> >
>>>> >> > [Core]
>>>> >> >
>>>> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>>>> >> >
>>>> >> > [SQL]
>>>> >> >
>>>> >> >    - Support ANSI SQL INTERVAL types
>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>>>> >> >    - Support Timestamp without time zone data type
>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>>>> >> >    - Aggregate (Min/Max/Count) push down for Parquet
>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>>>> >> >
>>>> >> > [Streaming]
>>>> >> >
>>>> >> >    - EventTime based sessionization (session window)
>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>>>> >> >    - Add RocksDB StateStore as external module
>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>>>> >> >
>>>> >> >
>>>> >> > I wonder whether we should postpone the branch cut date.
>>>> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>>>> >> > Li, Liang-Chi Hsieh, who work on the projects above.
>>>> >> >
>>>> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>>>> >> >
>>>> >> >> +1, thanks.
>>>> >> >>
>>>> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>>>> >> >>
>>>> >> >>> Hi,
>>>> >> >>>
>>>> >> >>> As the expected release date is close,  I would like to
>>>> volunteer as
>>>> >> the
>>>> >> >>> release manager for Apache Spark 3.2.0.
>>>> >> >>>
>>>> >> >>> Thanks,
>>>> >> >>> Gengliang
>>>> >> >>>

Re: Apache Spark 3.2 Expectation

Posted by Dongjoon Hyun <do...@gmail.com>.
> First, I think you are saying "branch-3.2";

To Xiao. Yes, it was a typo of "branch-3.2".

> We do strongly prefer to cut the release for Spark 3.2.0 including all
the patches under SPARK-30602.
> This way, we can backport the other performance/operability enhancements
tickets under SPARK-33235 into branch-3.2 to be released in future Spark
3.2.x patch releases.

To Min, after releasing 3.2.0, only bug fixes are allowed for 3.2.1+ as
Xiao wrote.



On Wed, Jun 16, 2021 at 9:42 PM Xiao Li <ga...@gmail.com> wrote:

>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
>> cut and the committers still are able to commit to `branch-3.3` according
>> to their decisions.
>
>
> First, I think you are saying "branch-3.2";
>
> Second, the "soft cut" means no "code freeze", although we cut the branch.
> To avoid releasing half-baked and unready features, the release
> manager needs to be very careful when cutting the RC. Based on what is
> proposed here, the RC date is the actual code freeze date.
>
>> This way, we can backport the other performance/operability enhancements
>> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
>> 3.2.x patch releases.
>
>
> This is not allowed based on the policy. Only bug fixes can be merged to
> the patch releases. Thus, if we know it will introduce major performance
> regression, we have to turn the feature off by default.
>
> Xiao
>
>
>
> On Wed, Jun 16, 2021 at 3:22 PM Min Shen <vi...@gmail.com> wrote:
>
>> Hi Gengliang,
>>
>> Thanks for volunteering as the release manager for Spark 3.2.0.
>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we are
>> close to having all the patches merged to master to enable push-based
>> shuffle.
>> Currently, there are 2 PRs under SPARK-30602 that are under active review
>> (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
>> We should be able to post the PRs for the other 2 remaining tickets
>> (SPARK-32923 and SPARK-35546) early next week.
>>
>> The tickets under SPARK-30602 are the minimum set of patches to enable
>> push-based shuffle.
>> We do have other performance/operability enhancements tickets under
>> SPARK-33235 that are needed to fully contribute what we have internally for
>> push-based shuffle.
>> However, these are optional for enabling push-based shuffle.
>> We do strongly prefer to cut the release for Spark 3.2.0 including all
>> the patches under SPARK-30602.
>> This way, we can backport the other performance/operability enhancements
>> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
>> 3.2.x patch releases.
>> I understand the preference of not postponing the branch cut date.
>> We will check with Dongjoon regarding the soft cut date and the
>> flexibility for including the remaining tickets under SPARK-30602 into
>> branch-3.2.
>>
>> Best,
>> Min
>>
>> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com> wrote:
>>
>>>
>>> Thanks, Dongjoon. I've talked with Dongjoon offline to learn more about this.
>>> As it is a soft cut date, there is no reason to postpone it.
>>>
>>> It sounds good then to keep the original branch cut date.
>>>
>>> Thank you.
>>>
>>>
>>>
>>> Dongjoon Hyun-2 wrote
>>> > Thank you for volunteering, Gengliang.
>>> >
>>> > Apache Spark 3.2.0 is the first version enabling AQE by default. I'm
>>> also
>>> > watching some on-going improvements on that.
>>> >
>>> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive
>>> Query
>>> > Execution QA)
>>> >
>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>> soft
>>> > cut and the committers still are able to commit to `branch-3.3`
>>> according
>>> > to their decisions.
>>> >
>>> > Given that Apache Spark had 115 commits in a week in various areas
>>> > concurrently, we should start QA for Apache Spark 3.2 by creating
>>> > branch-3.3 and allowing only limited backporting.
>>> >
>>> >     https://github.com/apache/spark/graphs/commit-activity
>>> >
>>> > Bests,
>>> > Dongjoon.
>>> >
>>> >
>>> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
>>> >
>>> >> First, thanks for volunteering as the release manager of Spark
>>> 3.2.0,
>>> >> Gengliang!
>>> >>
>>> >> And yes, for the two important Structured Streaming features, RocksDB
>>> >> StateStore and session window, we're working on them and expect to
>>> have
>>> >> them
>>> >> in the new release.
>>> >>
>>> >> So I propose to postpone the branch cut date.
>>> >>
>>> >> Thank you!
>>> >>
>>> >> Liang-Chi
>>> >>
>>> >>
>>> >> Gengliang Wang-2 wrote
>>> >> > Thanks, Hyukjin.
>>> >> >
>>> >> > The expected target branch cut date of Spark 3.2 is *July 1st* on
>>> >> > https://spark.apache.org/versioning-policy.html. However, I notice
>>> that
>>> >> > there are still multiple important projects in progress now:
>>> >> >
>>> >> > [Core]
>>> >> >
>>> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>>> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>>> >> >
>>> >> > [SQL]
>>> >> >
>>> >> >    - Support ANSI SQL INTERVAL types
>>> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>>> >> >    - Support Timestamp without time zone data type
>>> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>>> >> >    - Aggregate (Min/Max/Count) push down for Parquet
>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>>> >> >
>>> >> > [Streaming]
>>> >> >
>>> >> >    - EventTime based sessionization (session window)
>>> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>>> >> >    - Add RocksDB StateStore as external module
>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>>> >> >
>>> >> >
>>> >> > I wonder whether we should postpone the branch cut date.
>>> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>>> >> > Li, Liang-Chi Hsieh, who work on the projects above.
>>> >> >
>>> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>>> >> >
>>> >> >> +1, thanks.
>>> >> >>
>>> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>>> >> >>
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> As the expected release date is close,  I would like to volunteer
>>> as
>>> >> the
>>> >> >>> release manager for Apache Spark 3.2.0.
>>> >> >>>
>>> >> >>> Thanks,
>>> >> >>> Gengliang
>>> >> >>>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe e-mail: dev-unsubscribe@.apache

Re: Apache Spark 3.2 Expectation

Posted by Xiao Li <ga...@gmail.com>.
>
> To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
> cut and the committers still are able to commit to `branch-3.3` according
> to their decisions.


First, I think you are saying "branch-3.2";

Second, the "soft cut" means no "code freeze", although we cut the branch. To
avoid releasing half-baked and unready features, the release
manager needs to be very careful when cutting the RC. Based on what is
proposed here, the RC date is the actual code freeze date.

> This way, we can backport the other performance/operability enhancements
> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
> 3.2.x patch releases.


This is not allowed based on the policy. Only bug fixes can be merged to
the patch releases. Thus, if we know it will introduce major performance
regression, we have to turn the feature off by default.

Xiao



On Wed, Jun 16, 2021 at 3:22 PM Min Shen <vi...@gmail.com> wrote:

> Hi Gengliang,
>
> Thanks for volunteering as the release manager for Spark 3.2.0.
> Regarding the ongoing work of push-based shuffle in SPARK-30602, we are
> close to having all the patches merged to master to enable push-based
> shuffle.
> Currently, there are 2 PRs under SPARK-30602 that are under active review
> (SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
> We should be able to post the PRs for the other 2 remaining tickets
> (SPARK-32923 and SPARK-35546) early next week.
>
> The tickets under SPARK-30602 are the minimum set of patches to enable
> push-based shuffle.
> We do have other performance/operability enhancements tickets under
> SPARK-33235 that are needed to fully contribute what we have internally for
> push-based shuffle.
> However, these are optional for enabling push-based shuffle.
> We do strongly prefer to cut the release for Spark 3.2.0 including all the
> patches under SPARK-30602.
> This way, we can backport the other performance/operability enhancements
> tickets under SPARK-33235 into branch-3.2 to be released in future Spark
> 3.2.x patch releases.
> I understand the preference of not postponing the branch cut date.
> We will check with Dongjoon regarding the soft cut date and the
> flexibility for including the remaining tickets under SPARK-30602 into
> branch-3.2.
>
> Best,
> Min
>
> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com> wrote:
>
>>
>> Thanks, Dongjoon. I've talked with Dongjoon offline to learn more about this.
>> As it is a soft cut date, there is no reason to postpone it.
>>
>> It sounds good then to keep the original branch cut date.
>>
>> Thank you.
>>
>>
>>
>> Dongjoon Hyun-2 wrote
>> > Thank you for volunteering, Gengliang.
>> >
>> > Apache Spark 3.2.0 is the first version enabling AQE by default. I'm
>> also
>> > watching some on-going improvements on that.
>> >
>> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive
>> Query
>> > Execution QA)
>> >
>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>> soft
>> > cut and the committers still are able to commit to `branch-3.3`
>> according
>> > to their decisions.
>> >
>> > Given that Apache Spark had 115 commits in a week in various areas
>> > concurrently, we should start QA for Apache Spark 3.2 by creating
>> > branch-3.3 and allowing only limited backporting.
>> >
>> >     https://github.com/apache/spark/graphs/commit-activity
>> >
>> > Bests,
>> > Dongjoon.
>> >
>> >
>> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
>> >
>> >> First, thanks for volunteering as the release manager of Spark
>> 3.2.0,
>> >> Gengliang!
>> >>
>> >> And yes, for the two important Structured Streaming features, RocksDB
>> >> StateStore and session window, we're working on them and expect to have
>> >> them
>> >> in the new release.
>> >>
>> >> So I propose to postpone the branch cut date.
>> >>
>> >> Thank you!
>> >>
>> >> Liang-Chi
>> >>
>> >>
>> >> Gengliang Wang-2 wrote
>> >> > Thanks, Hyukjin.
>> >> >
>> >> > The expected target branch cut date of Spark 3.2 is *July 1st* on
>> >> > https://spark.apache.org/versioning-policy.html. However, I notice
>> that
>> >> > there are still multiple important projects in progress now:
>> >> >
>> >> > [Core]
>> >> >
>> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>> >> >
>> >> > [SQL]
>> >> >
>> >> >    - Support ANSI SQL INTERVAL types
>> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>> >> >    - Support Timestamp without time zone data type
>> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>> >> >    - Aggregate (Min/Max/Count) push down for Parquet
>> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>> >> >
>> >> > [Streaming]
>> >> >
>> >> >    - EventTime based sessionization (session window)
>> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>> >> >    - Add RocksDB StateStore as external module
>> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>> >> >
>> >> >
>> >> > I wonder whether we should postpone the branch cut date.
>> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>> >> > Li, Liang-Chi Hsieh, who work on the projects above.
>> >> >
>> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>> >> >
>> >> >> +1, thanks.
>> >> >>
>> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> As the expected release date is close,  I would like to volunteer
>> as
>> >> the
>> >> >>> release manager for Apache Spark 3.2.0.
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Gengliang
>> >> >>>

Re: Apache Spark 3.2 Expectation

Posted by Min Shen <vi...@gmail.com>.
Hi Gengliang,

Thanks for volunteering as the release manager for Spark 3.2.0.
Regarding the ongoing work of push-based shuffle in SPARK-30602, we are
close to having all the patches merged to master to enable push-based
shuffle.
Currently, there are 2 PRs under SPARK-30602 that are under active review
(SPARK-32922 and SPARK-35671), and hopefully can be merged soon.
We should be able to post the PRs for the other 2 remaining tickets
(SPARK-32923 and SPARK-35546) early next week.

The tickets under SPARK-30602 are the minimum set of patches to enable
push-based shuffle.
We do have other performance/operability enhancements tickets under
SPARK-33235 that are needed to fully contribute what we have internally for
push-based shuffle.
However, these are optional for enabling push-based shuffle.
We do strongly prefer to cut the release for Spark 3.2.0 including all the
patches under SPARK-30602.
This way, we can backport the other performance/operability enhancements
tickets under SPARK-33235 into branch-3.2 to be released in future Spark
3.2.x patch releases.
I understand the preference of not postponing the branch cut date.
We will check with Dongjoon regarding the soft cut date and the flexibility
for including the remaining tickets under SPARK-30602 into branch-3.2.

Best,
Min

On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com> wrote:

>
> Thanks, Dongjoon. I've talked with Dongjoon offline to learn more about this.
> As it is a soft cut date, there is no reason to postpone it.
>
> It sounds good then to keep the original branch cut date.
>
> Thank you.
>
>
>
> Dongjoon Hyun-2 wrote
> > Thank you for volunteering, Gengliang.
> >
> > Apache Spark 3.2.0 is the first version enabling AQE by default. I'm also
> > watching some on-going improvements on that.
> >
> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive
> Query
> > Execution QA)
> >
> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
> > cut and the committers still are able to commit to `branch-3.3` according
> > to their decisions.
> >
> > Given that Apache Spark had 115 commits in a week in various areas
> > concurrently, we should start QA for Apache Spark 3.2 by creating
> > branch-3.3 and allowing only limited backporting.
> >
> >     https://github.com/apache/spark/graphs/commit-activity
> >
> > Bests,
> > Dongjoon.
> >
> >
> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
> >
> >> First, thanks for volunteering as the release manager of Spark 3.2.0,
> >> Gengliang!
> >>
> >> And yes, for the two important Structured Streaming features, RocksDB
> >> StateStore and session window, we're working on them and expect to have
> >> them
> >> in the new release.
> >>
> >> So I propose to postpone the branch cut date.
> >>
> >> Thank you!
> >>
> >> Liang-Chi
> >>
> >>
> >> Gengliang Wang-2 wrote
> >> > Thanks, Hyukjin.
> >> >
> >> > The expected target branch cut date of Spark 3.2 is *July 1st* on
> >> > https://spark.apache.org/versioning-policy.html. However, I notice
> that
> >> > there are still multiple important projects in progress now:
> >> >
> >> > [Core]
> >> >
> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
> >> >
> >> > [SQL]
> >> >
> >> >    - Support ANSI SQL INTERVAL types
> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
> >> >    - Support Timestamp without time zone data type
> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
> >> >    - Aggregate (Min/Max/Count) push down for Parquet
> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
> >> >
> >> > [Streaming]
> >> >
> >> >    - EventTime based sessionization (session window)
> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
> >> >    - Add RocksDB StateStore as external module
> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
> >> >
> >> >
> >> > I wonder whether we should postpone the branch cut date.
> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
> >> > Li, Liang-Chi Hsieh, who work on the projects above.
> >> >
> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
> >> >
> >> >> +1, thanks.
> >> >>
> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> As the expected release date is close,  I would like to volunteer as
> >> the
> >> >>> release manager for Apache Spark 3.2.0.
> >> >>>
> >> >>> Thanks,
> >> >>> Gengliang
> >> >>>

Re: Apache Spark 3.2 Expectation

Posted by Liang-Chi Hsieh <vi...@gmail.com>.
Thanks, Dongjoon. I've talked with Dongjoon offline to learn more about this.
As it is a soft cut date, there is no reason to postpone it.

It sounds good then to keep the original branch cut date.

Thank you.



Dongjoon Hyun-2 wrote
> Thank you for volunteering, Gengliang.
> 
> Apache Spark 3.2.0 is the first version enabling AQE by default. I'm also
> watching some on-going improvements on that.
> 
>     https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive Query
> Execution QA)
> 
> To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
> cut and the committers still are able to commit to `branch-3.3` according
> to their decisions.
> 
> Given that Apache Spark had 115 commits in a week in various areas
> concurrently, we should start QA for Apache Spark 3.2 by creating
> branch-3.3 and allowing only limited backporting.
> 
>     https://github.com/apache/spark/graphs/commit-activity
> 
> Bests,
> Dongjoon.
> 
> 
> On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
> 
>> First, thanks for volunteering as the release manager of Spark 3.2.0,
>> Gengliang!
>>
>> And yes, for the two important Structured Streaming features, RocksDB
>> StateStore and session window, we're working on them and expect to have
>> them
>> in the new release.
>>
>> So I propose to postpone the branch cut date.
>>
>> Thank you!
>>
>> Liang-Chi
>>
>>
>> Gengliang Wang-2 wrote
>> > Thanks, Hyukjin.
>> >
>> > The expected target branch cut date of Spark 3.2 is *July 1st* on
>> > https://spark.apache.org/versioning-policy.html. However, I notice that
>> > there are still multiple important projects in progress now:
>> >
>> > [Core]
>> >
>> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>> >
>> > [SQL]
>> >
>> >    - Support ANSI SQL INTERVAL types
>> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>> >    - Support Timestamp without time zone data type
>> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>> >    - Aggregate (Min/Max/Count) push down for Parquet
>> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>> >
>> > [Streaming]
>> >
>> >    - EventTime based sessionization (session window)
>> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>> >    - Add RocksDB StateStore as external module
>> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>> >
>> >
>> > I wonder whether we should postpone the branch cut date.
>> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>> > Li, Liang-Chi Hsieh, who work on the projects above.
>> >
>> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>> >
>> >> +1, thanks.
>> >>
>> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> As the expected release date is close,  I would like to volunteer as
>> the
>> >>> release manager for Apache Spark 3.2.0.
>> >>>
>> >>> Thanks,
>> >>> Gengliang
>> >>>

Re: Apache Spark 3.2 Expectation

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you for volunteering, Gengliang.

Apache Spark 3.2.0 is the first version enabling AQE by default. I'm also
watching some on-going improvements on that.

    https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive Query
Execution QA)

To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
cut and the committers still are able to commit to `branch-3.3` according
to their decisions.

Given that Apache Spark had 115 commits in a week in various areas
concurrently, we should start QA for Apache Spark 3.2 by creating
branch-3.3 and allowing only limited backporting.

    https://github.com/apache/spark/graphs/commit-activity

Bests,
Dongjoon.


On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <vi...@gmail.com> wrote:

> First, thanks for volunteering as the release manager of Spark 3.2.0,
> Gengliang!
>
> And yes, for the two important Structured Streaming features, RocksDB
> StateStore and session window, we're working on them and expect to have
> them
> in the new release.
>
> So I propose to postpone the branch cut date.
>
> Thank you!
>
> Liang-Chi
>
>
> Gengliang Wang-2 wrote
> > Thanks, Hyukjin.
> >
> > The expected target branch cut date of Spark 3.2 is *July 1st* on
> > https://spark.apache.org/versioning-policy.html. However, I notice that
> > there are still multiple important projects in progress now:
> >
> > [Core]
> >
> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
> >    <https://issues.apache.org/jira/browse/SPARK-30602>
> >
> > [SQL]
> >
> >    - Support ANSI SQL INTERVAL types
> >    <https://issues.apache.org/jira/browse/SPARK-27790>
> >    - Support Timestamp without time zone data type
> >    <https://issues.apache.org/jira/browse/SPARK-35662>
> >    - Aggregate (Min/Max/Count) push down for Parquet
> >    <https://issues.apache.org/jira/browse/SPARK-34952>
> >
> > [Streaming]
> >
> >    - EventTime based sessionization (session window)
> >    <https://issues.apache.org/jira/browse/SPARK-10816>
> >    - Add RocksDB StateStore as external module
> >    <https://issues.apache.org/jira/browse/SPARK-34198>
> >
> >
> > I wonder whether we should postpone the branch cut date.
> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
> > Li, Liang-Chi Hsieh, who work on the projects above.
> >
> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
> >
> >> +1, thanks.
> >>
> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
> >>
> >>> Hi,
> >>>
> >>> As the expected release date is close,  I would like to volunteer as
> the
> >>> release manager for Apache Spark 3.2.0.
> >>>
> >>> Thanks,
> >>> Gengliang
> >>>

Re: Apache Spark 3.2 Expectation

Posted by Liang-Chi Hsieh <vi...@gmail.com>.
First, thanks for volunteering as the release manager of Spark 3.2.0,
Gengliang!

And yes, for the two important Structured Streaming features, RocksDB
StateStore and session window, we're working on them and expect to have them
in the new release.

So I propose to postpone the branch cut date.

Thank you!

Liang-Chi


Gengliang Wang-2 wrote
> Thanks, Hyukjin.
> 
> The expected target branch cut date of Spark 3.2 is *July 1st* on
> https://spark.apache.org/versioning-policy.html. However, I notice that
> there are still multiple important projects in progress now:
> 
> [Core]
> 
>    - SPIP: Support push-based shuffle to improve shuffle efficiency
>    <https://issues.apache.org/jira/browse/SPARK-30602>
>
> [SQL]
>
>    - Support ANSI SQL INTERVAL types
>    <https://issues.apache.org/jira/browse/SPARK-27790>
>    - Support Timestamp without time zone data type
>    <https://issues.apache.org/jira/browse/SPARK-35662>
>    - Aggregate (Min/Max/Count) push down for Parquet
>    <https://issues.apache.org/jira/browse/SPARK-34952>
>
> [Streaming]
>
>    - EventTime based sessionization (session window)
>    <https://issues.apache.org/jira/browse/SPARK-10816>
>    - Add RocksDB StateStore as external module
>    <https://issues.apache.org/jira/browse/SPARK-34198>
> 
> 
> I wonder whether we should postpone the branch cut date.
> cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
> Li, Liang-Chi Hsieh, who work on the projects above.
> 
> On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>
>> +1, thanks.
>>
>> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>>
>>> Hi,
>>>
>>> As the expected release date is close,  I would like to volunteer as the
>>> release manager for Apache Spark 3.2.0.
>>>
>>> Thanks,
>>> Gengliang
>>>






Re: Apache Spark 3.2 Expectation

Posted by Gengliang Wang <lt...@gmail.com>.
Thanks, Hyukjin.

The expected target branch cut date of Spark 3.2 is *July 1st* on
https://spark.apache.org/versioning-policy.html. However, I notice that
there are still multiple important projects in progress now:

[Core]

   - SPIP: Support push-based shuffle to improve shuffle efficiency
   <https://issues.apache.org/jira/browse/SPARK-30602>

[SQL]

   - Support ANSI SQL INTERVAL types
   <https://issues.apache.org/jira/browse/SPARK-27790>
   - Support Timestamp without time zone data type
   <https://issues.apache.org/jira/browse/SPARK-35662>
   - Aggregate (Min/Max/Count) push down for Parquet
   <https://issues.apache.org/jira/browse/SPARK-34952>

[Streaming]

   - EventTime based sessionization (session window)
   <https://issues.apache.org/jira/browse/SPARK-10816>
   - Add RocksDB StateStore as external module
   <https://issues.apache.org/jira/browse/SPARK-34198>


I wonder whether we should postpone the branch cut date.
cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
Li, Liang-Chi Hsieh, who work on the projects above.

On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gu...@gmail.com> wrote:

> +1, thanks.
>
> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <lt...@gmail.com> wrote:
>
>> Hi,
>>
>> As the expected release date is close,  I would like to volunteer as the
>> release manager for Apache Spark 3.2.0.
>>
>> Thanks,
>> Gengliang
>>
>> On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan <cl...@gmail.com> wrote:
>>
>>> An update: we found a mistake that we picked the Spark 3.2 release date
>>> based on the scheduled release date of 3.1. However, 3.1 was delayed and
>>> released on March 2. In order to have a full 6 months development for 3.2,
>>> the target release date for 3.2 should be September 2.
>>>
>>> I'm updating the release dates in
>>> https://github.com/apache/spark-website/pull/331
>>>
>>> Thanks,
>>> Wenchen
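
The date arithmetic in the update above (3.1 shipped on March 2, plus a full six-month development window) can be checked with a small Python sketch. The helper name `add_months` is hypothetical and is not part of any Spark release tooling; it only illustrates the calendar math, under the assumption that "six months" means six calendar months.

```python
from datetime import date
import calendar


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`.

    If the target month is shorter than `d.day`, the day is clamped
    to the last day of that month.
    """
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


# Actual Spark 3.1 release date, as stated in the message above.
spark_3_1_release = date(2021, 3, 2)

# Six calendar months of development for 3.2.
spark_3_2_target = add_months(spark_3_1_release, 6)
print(spark_3_2_target)  # 2021-09-02
```

This agrees with the September 2 target mentioned in the update.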
>>>
>>> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun <do...@gmail.com>
>>> wrote:
>>>
>>>> Thank you, Xiao, Wenchen and Hyukjin.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon <gu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Just for an update, I will send a discussion email about my idea late
>>>>> this week or early next week.
>>>>>
>>>>> On Thu, Mar 11, 2021 at 7:00 PM, Wenchen Fan <cl...@gmail.com> wrote:
>>>>>
>>>>>> There are many projects going on right now, such as new DS v2 APIs,
>>>>>> ANSI interval types, join improvement, disaggregated shuffle, etc. I don't
>>>>>> think it's realistic to do the branch cut in April.
>>>>>>
>>>>>> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut
>>>>>> the branch 3 months earlier. We should make the release process faster and
>>>>>> cut the branch around June probably.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li <ga...@gmail.com> wrote:
>>>>>>
>>>>>>> Below are some nice-to-have features we can work on in Spark 3.2: Lateral
>>>>>>> Join support <https://issues.apache.org/jira/browse/SPARK-28379>,
>>>>>>> interval data type, timestamp without time zone, un-nesting arbitrary
>>>>>>> queries, the returned metrics of DSV2, and error message standardization.
>>>>>>> Spark 3.2 will be another exciting release I believe!
>>>>>>>
>>>>>>> Go Spark!
>>>>>>>
>>>>>>> Xiao
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 10, 2021 at 12:25 PM, Dongjoon Hyun <do...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, Xiao.
>>>>>>>>
>>>>>>>> This thread started 13 days ago. Since you asked the community
>>>>>>>> about major features or timelines at that time, could you share your
>>>>>>>> roadmap or expectations if you have something in your mind?
>>>>>>>>
>>>>>>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep
>>>>>>>> it open. It might take 1-2 weeks to collect from the community all the
>>>>>>>> features we plan to build and ship in 3.2 since we just finished the 3.1
>>>>>>>> voting.
>>>>>>>> > TBH, cutting the branch this April does not look good to me. That
>>>>>>>> means, we only have one month left for feature development of Spark 3.2. Do
>>>>>>>> we have enough features in the current master branch? If not, are we able
>>>>>>>> to finish major features we collected here? Do they have a timeline or
>>>>>>>> project plan?
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 3, 2021 at 2:58 PM Dongjoon Hyun <
>>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, John.
>>>>>>>>>
>>>>>>>>> This thread aims to share your expectations and goals (and maybe
>>>>>>>>> work progress) to Apache Spark 3.2 because we are making this together. :)
>>>>>>>>>
>>>>>>>>> Bests,
>>>>>>>>> Dongjoon.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Mar 3, 2021 at 1:59 PM John Zhuge <jz...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Dongjoon,
>>>>>>>>>>
>>>>>>>>>> Is it possible to get ViewCatalog in? The community already had
>>>>>>>>>> fairly detailed discussions.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun <
>>>>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, All.
>>>>>>>>>>>
>>>>>>>>>>> Since we have been preparing Apache Spark 3.2.0 in master branch
>>>>>>>>>>> since December 2020, March seems to be a good time to share our thoughts
>>>>>>>>>>> and aspirations on Apache Spark 3.2.
>>>>>>>>>>>
>>>>>>>>>>> According to the progress on Apache Spark 3.1 release, Apache
>>>>>>>>>>> Spark 3.2 seems to be the last minor release of this year. Given the
>>>>>>>>>>> timeframe, we might consider the following. (This is a small set. Please
>>>>>>>>>>> add your thoughts to this limited list.)
>>>>>>>>>>>
>>>>>>>>>>> # Languages
>>>>>>>>>>>
>>>>>>>>>>> - Scala 2.13 Support: This was expected on 3.1 via SPARK-25075
>>>>>>>>>>> but slipped out. Currently, we are trying to use Scala 2.13.5 via
>>>>>>>>>>> SPARK-34505 and investigating the publishing issue. Thank you for your
>>>>>>>>>>> contributions and feedback on this.
>>>>>>>>>>>
>>>>>>>>>>> - Java 17 LTS Support: Java 17 LTS will arrive in September
>>>>>>>>>>> 2021. Like Java 11, we need lots of support from our dependencies. Let's
>>>>>>>>>>> see.
>>>>>>>>>>>
>>>>>>>>>>> - Python 3.6 Deprecation(?): Python 3.6 community support ends
>>>>>>>>>>> at 2021-12-23. So, the deprecation is not required yet, but we had better
>>>>>>>>>>> prepare it because we don't have an ETA of Apache Spark 3.3 in 2022.
>>>>>>>>>>>
>>>>>>>>>>> - SparkR CRAN publishing: As we know, it has been discontinued so far.
>>>>>>>>>>> Resuming it depends on the success of Apache SparkR 3.1.1 CRAN publishing.
>>>>>>>>>>> If that succeeds, we can keep publishing. Otherwise, I believe
>>>>>>>>>>> we had better officially drop it from the release work item list.
>>>>>>>>>>>
>>>>>>>>>>> # Dependencies
>>>>>>>>>>>
>>>>>>>>>>> - Apache Hadoop 3.3.2: Hadoop 3.2.0 became the default Hadoop
>>>>>>>>>>> profile in Apache Spark 3.1. Currently, the Spark master branch uses Hadoop
>>>>>>>>>>> 3.2.2's shaded clients via SPARK-33212. So far, there is one ongoing
>>>>>>>>>>> issue report in the YARN environment. We hope it will be fixed soon, within
>>>>>>>>>>> the Spark 3.2 timeframe, so that we can move toward Hadoop 3.3.2.
>>>>>>>>>>>
>>>>>>>>>>> - Apache Hive 2.3.9: Spark 3.0 starts to use Hive 2.3.7 by
>>>>>>>>>>> default instead of old Hive 1.2 fork. Spark 3.1 removed hive-1.2 profile
>>>>>>>>>>> completely via SPARK-32981 and replaced the generated hive-service-rpc code
>>>>>>>>>>> with the official dependency via SPARK-32981. We are steadily improving
>>>>>>>>>>> this area and will consume Hive 2.3.9 if available.
>>>>>>>>>>>
>>>>>>>>>>> - K8s Client 4.13.2: During K8s GA activity, Spark 3.1 upgrades
>>>>>>>>>>> K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2 in order
>>>>>>>>>>> to support K8s model 1.19.
>>>>>>>>>>>
>>>>>>>>>>> - Kafka Client 2.8: To bring the client fixes, Spark 3.1 is
>>>>>>>>>>> using Kafka Client 2.6. For Spark 3.2, SPARK-33913 upgraded to Kafka 2.7
>>>>>>>>>>> with Scala 2.12.13, but it was reverted later due to Scala 2.12.13 issue.
>>>>>>>>>>> Since KAFKA-12357 fixed the Scala requirement two days ago, Spark 3.2 will
>>>>>>>>>>> go with Kafka Client 2.8 hopefully.
>>>>>>>>>>>
>>>>>>>>>>> # Some Features
>>>>>>>>>>>
>>>>>>>>>>> - Data Source v2: Spark 3.2 will deliver much richer DSv2 with
>>>>>>>>>>> Apache Iceberg integration. Especially, we hope the on-going function
>>>>>>>>>>> catalog SPIP and up-coming storage partitioned join SPIP can be delivered
>>>>>>>>>>> as a part of Spark 3.2 and become an additional foundation.
>>>>>>>>>>>
>>>>>>>>>>> - Columnar Encryption: As of today, Apache Spark master branch
>>>>>>>>>>> supports columnar encryption via Apache ORC 1.6 and it's documented via
>>>>>>>>>>> SPARK-34036. Also, upcoming Apache Parquet 1.12 has a similar capability.
>>>>>>>>>>> Hopefully, Apache Spark 3.2 is going to be the first release to have this
>>>>>>>>>>> feature officially. Any feedback is welcome.
>>>>>>>>>>>
>>>>>>>>>>> - Improved ZStandard Support: Spark 3.2 will bring more benefits
>>>>>>>>>>> for ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer pool
>>>>>>>>>>> support for all IO operations, 2) SPARK-33978 makes ORC datasource support
>>>>>>>>>>> ZSTD compression, 3) SPARK-34503 sets ZSTD as the default codec for event
>>>>>>>>>>> log compression, 4) SPARK-34479 aims to support ZSTD at Avro data source.
>>>>>>>>>>> Also, the upcoming Parquet 1.12 supports ZSTD (and supports JNI buffer
>>>>>>>>>>> pool), too. I'm expecting more benefits.
>>>>>>>>>>>
>>>>>>>>>>> - Structured Streaming with RocksDB backend: According to the
>>>>>>>>>>> latest update, it looks active enough for merging to master branch in Spark
>>>>>>>>>>> 3.2.
>>>>>>>>>>>
>>>>>>>>>>> Please share your thoughts and let's build better Apache Spark
>>>>>>>>>>> 3.2 together.
>>>>>>>>>>>
>>>>>>>>>>> Bests,
>>>>>>>>>>> Dongjoon.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> John Zhuge
>>>>>>>>>>
>>>>>>>>>

Re: Apache Spark 3.2 Expectation

Posted by Hyukjin Kwon <gu...@gmail.com>.
+1, thanks.

On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <lt...@gmail.com> wrote:

> Hi,
>
> As the expected release date is close, I would like to volunteer as the
> release manager for Apache Spark 3.2.0.
>
> Thanks,
> Gengliang
>
> On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan <cl...@gmail.com> wrote:
>
>> An update: we found a mistake that we picked the Spark 3.2 release date
>> based on the scheduled release date of 3.1. However, 3.1 was delayed and
>> released on March 2. In order to have a full 6 months development for 3.2,
>> the target release date for 3.2 should be September 2.
>>
>> I'm updating the release dates in
>> https://github.com/apache/spark-website/pull/331
>>
>> Thanks,
>> Wenchen
>>
>> On Thu, Mar 11, 2021 at 11:17 PM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>>
>>> Thank you, Xiao, Wenchen and Hyukjin.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon <gu...@gmail.com>
>>> wrote:
>>>
>>>> Just for an update, I will send a discussion email about my idea late
>>>> this week or early next week.
>>>>
>>>> On Thu, Mar 11, 2021 at 7:00 PM Wenchen Fan <cl...@gmail.com> wrote:
>>>>
>>>>> There are many projects going on right now, such as new DS v2 APIs,
>>>>> ANSI interval types, join improvement, disaggregated shuffle, etc. I don't
>>>>> think it's realistic to do the branch cut in April.
>>>>>
>>>>> I'm +1 to release 3.2 around July, but it doesn't mean we have to cut
>>>>> the branch 3 months earlier. We should make the release process faster and
>>>>> cut the branch around June probably.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Mar 11, 2021 at 4:41 AM Xiao Li <ga...@gmail.com> wrote:
>>>>>
>>>>>> Below are some nice-to-have features we can work on in Spark 3.2: Lateral
>>>>>> Join support <https://issues.apache.org/jira/browse/SPARK-28379>,
>>>>>> interval data type, timestamp without time zone, un-nesting arbitrary
>>>>>> queries, the returned metrics of DSV2, and error message standardization.
>>>>>> Spark 3.2 will be another exciting release I believe!
>>>>>>
>>>>>> Go Spark!
>>>>>>
>>>>>> Xiao
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 10, 2021 at 12:25 PM Dongjoon Hyun <do...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, Xiao.
>>>>>>>
>>>>>>> This thread started 13 days ago. Since you asked the community about
>>>>>>> major features or timelines at that time, could you share your roadmap or
>>>>>>> expectations if you have something in your mind?
>>>>>>>
>>>>>>> > Thank you, Dongjoon, for initiating this discussion. Let us keep
>>>>>>> > it open. It might take 1-2 weeks to collect from the community all the
>>>>>>> > features we plan to build and ship in 3.2 since we just finished the 3.1
>>>>>>> > voting.
>>>>>>> > TBH, cutting the branch this April does not look good to me. That
>>>>>>> > means, we only have one month left for feature development of Spark 3.2. Do
>>>>>>> > we have enough features in the current master branch? If not, are we able
>>>>>>> > to finish major features we collected here? Do they have a timeline or
>>>>>>> > project plan?
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 3, 2021 at 2:58 PM Dongjoon Hyun <
>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, John.
>>>>>>>>
>>>>>>>> This thread aims to share your expectations and goals (and maybe
>>>>>>>> work progress) to Apache Spark 3.2 because we are making this together. :)
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 3, 2021 at 1:59 PM John Zhuge <jz...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Dongjoon,
>>>>>>>>>
>>>>>>>>> Is it possible to get ViewCatalog in? The community already had
>>>>>>>>> fairly detailed discussions.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>> On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun <
>>>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, All.
>>>>>>>>>>
>>>>>>>>>> Since we have been preparing Apache Spark 3.2.0 in master branch
>>>>>>>>>> since December 2020, March seems to be a good time to share our thoughts
>>>>>>>>>> and aspirations on Apache Spark 3.2.
>>>>>>>>>>
>>>>>>>>>> According to the progress on Apache Spark 3.1 release, Apache
>>>>>>>>>> Spark 3.2 seems to be the last minor release of this year. Given the
>>>>>>>>>> timeframe, we might consider the following. (This is a small set. Please
>>>>>>>>>> add your thoughts to this limited list.)
>>>>>>>>>>
>>>>>>>>>> # Languages
>>>>>>>>>>
>>>>>>>>>> - Scala 2.13 Support: This was expected on 3.1 via SPARK-25075
>>>>>>>>>> but slipped out. Currently, we are trying to use Scala 2.13.5 via
>>>>>>>>>> SPARK-34505 and investigating the publishing issue. Thank you for your
>>>>>>>>>> contributions and feedback on this.
>>>>>>>>>>
>>>>>>>>>> - Java 17 LTS Support: Java 17 LTS will arrive in September 2021.
>>>>>>>>>> Like Java 11, we need lots of support from our dependencies. Let's see.
>>>>>>>>>>
>>>>>>>>>> - Python 3.6 Deprecation(?): Python 3.6 community support ends on
>>>>>>>>>> 2021-12-23. So, the deprecation is not required yet, but we had better
>>>>>>>>>> prepare for it because we don't have an ETA for Apache Spark 3.3 in 2022.
>>>>>>>>>>
>>>>>>>>>> - SparkR CRAN publishing: As we know, it has been discontinued so far.
>>>>>>>>>> Resuming it depends on the success of Apache SparkR 3.1.1 CRAN publishing.
>>>>>>>>>> If that succeeds, we can keep publishing. Otherwise, I believe
>>>>>>>>>> we had better officially drop it from the release work item list.
>>>>>>>>>>
>>>>>>>>>> # Dependencies
>>>>>>>>>>
>>>>>>>>>> - Apache Hadoop 3.3.2: Hadoop 3.2.0 became the default Hadoop
>>>>>>>>>> profile in Apache Spark 3.1. Currently, the Spark master branch uses Hadoop
>>>>>>>>>> 3.2.2's shaded clients via SPARK-33212. So far, there is one ongoing
>>>>>>>>>> issue report in the YARN environment. We hope it will be fixed soon, within
>>>>>>>>>> the Spark 3.2 timeframe, so that we can move toward Hadoop 3.3.2.
>>>>>>>>>>
>>>>>>>>>> - Apache Hive 2.3.9: Spark 3.0 starts to use Hive 2.3.7 by
>>>>>>>>>> default instead of old Hive 1.2 fork. Spark 3.1 removed hive-1.2 profile
>>>>>>>>>> completely via SPARK-32981 and replaced the generated hive-service-rpc code
>>>>>>>>>> with the official dependency via SPARK-32981. We are steadily improving
>>>>>>>>>> this area and will consume Hive 2.3.9 if available.
>>>>>>>>>>
>>>>>>>>>> - K8s Client 4.13.2: During K8s GA activity, Spark 3.1 upgrades
>>>>>>>>>> K8s client dependency to 4.12.0. Spark 3.2 upgrades it to 4.13.2 in order
>>>>>>>>>> to support K8s model 1.19.
>>>>>>>>>>
>>>>>>>>>> - Kafka Client 2.8: To bring the client fixes, Spark 3.1 is using
>>>>>>>>>> Kafka Client 2.6. For Spark 3.2, SPARK-33913 upgraded to Kafka 2.7 with
>>>>>>>>>> Scala 2.12.13, but it was reverted later due to Scala 2.12.13 issue. Since
>>>>>>>>>> KAFKA-12357 fixed the Scala requirement two days ago, Spark 3.2 will go
>>>>>>>>>> with Kafka Client 2.8 hopefully.
>>>>>>>>>>
>>>>>>>>>> # Some Features
>>>>>>>>>>
>>>>>>>>>> - Data Source v2: Spark 3.2 will deliver much richer DSv2 with
>>>>>>>>>> Apache Iceberg integration. Especially, we hope the on-going function
>>>>>>>>>> catalog SPIP and up-coming storage partitioned join SPIP can be delivered
>>>>>>>>>> as a part of Spark 3.2 and become an additional foundation.
>>>>>>>>>>
>>>>>>>>>> - Columnar Encryption: As of today, Apache Spark master branch
>>>>>>>>>> supports columnar encryption via Apache ORC 1.6 and it's documented via
>>>>>>>>>> SPARK-34036. Also, upcoming Apache Parquet 1.12 has a similar capability.
>>>>>>>>>> Hopefully, Apache Spark 3.2 is going to be the first release to have this
>>>>>>>>>> feature officially. Any feedback is welcome.
>>>>>>>>>>
>>>>>>>>>> - Improved ZStandard Support: Spark 3.2 will bring more benefits
>>>>>>>>>> for ZStandard users: 1) SPARK-34340 added native ZSTD JNI buffer pool
>>>>>>>>>> support for all IO operations, 2) SPARK-33978 makes ORC datasource support
>>>>>>>>>> ZSTD compression, 3) SPARK-34503 sets ZSTD as the default codec for event
>>>>>>>>>> log compression, 4) SPARK-34479 aims to support ZSTD at Avro data source.
>>>>>>>>>> Also, the upcoming Parquet 1.12 supports ZSTD (and supports JNI buffer
>>>>>>>>>> pool), too. I'm expecting more benefits.
>>>>>>>>>>
>>>>>>>>>> - Structured Streaming with RocksDB backend: According to the
>>>>>>>>>> latest update, it looks active enough for merging to master branch in Spark
>>>>>>>>>> 3.2.
>>>>>>>>>>
>>>>>>>>>> Please share your thoughts and let's build better Apache Spark
>>>>>>>>>> 3.2 together.
>>>>>>>>>>
>>>>>>>>>> Bests,
>>>>>>>>>> Dongjoon.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> John Zhuge
>>>>>>>>>
>>>>>>>>