Posted to dev@spark.apache.org by Gengliang Wang <lt...@gmail.com> on 2021/07/01 05:55:57 UTC

Re: Apache Spark 3.2 Expectation

Hi all,

Just as a gentle reminder, I will do the branch cut tomorrow. Please focus
on finalizing the work to land in Spark 3.2.0.
After the branch cut, we can still merge the ongoing major features
mentioned in this thread. There should be no other new features in
branch-3.2.
Thanks!

On Thu, Jun 17, 2021 at 2:57 PM Hyukjin Kwon <gu...@gmail.com> wrote:

> *GA -> QA
>
> On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon, <gu...@gmail.com> wrote:
>
>> I think we should make sure to treat the items in this list as exceptions
>> to the code freeze, but discourage pushing other new APIs and features.
>>
>> During the GA period, we should ideally focus on bug fixes and polishing.
>>
>> It would also be great if we could speed up work on the items in this list.
>>
>>
>> On Thu, 17 Jun 2021, 15:08 Gengliang Wang, <lt...@gmail.com> wrote:
>>
>>> Thanks for the suggestions from Dongjoon, Liangchi, Min, and Xiao!
>>> We have now made it clear that this is a soft cut and that we can still
>>> merge important code changes to branch-3.2 before the RC. Let's keep the
>>> branch cut date as July 1st.
>>>
>>> On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun <do...@gmail.com>
>>> wrote:
>>>
>>>> > First, I think you are saying "branch-3.2";
>>>>
>>>> To Xiao: yes, it was a typo for "branch-3.2".
>>>>
>>>> > We do strongly prefer to cut the release for Spark 3.2.0 including
>>>> > all the patches under SPARK-30602.
>>>> > This way, we can backport the other performance/operability
>>>> > enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>> > future Spark 3.2.x patch releases.
>>>>
>>>> To Min: after 3.2.0 is released, only bug fixes are allowed in 3.2.1+,
>>>> as Xiao wrote.
>>>>
>>>>
>>>>
>>>> On Wed, Jun 16, 2021 at 9:42 PM Xiao Li <ga...@gmail.com> wrote:
>>>>
>>>>>> To Liang-Chi, I'm -1 for postponing the branch cut because this is a
>>>>>> soft cut and the committers still are able to commit to `branch-3.3`
>>>>>> according to their decisions.
>>>>>
>>>>>
>>>>> First, I think you are saying "branch-3.2";
>>>>>
>>>>> Second, the "soft cut" means there is no "code freeze", even though we
>>>>> cut the branch. To avoid releasing half-baked and unready features, the
>>>>> release manager needs to be very careful when cutting the RC. Based on
>>>>> what is proposed here, the RC date is the actual code freeze date.
>>>>>
>>>>>> This way, we can backport the other performance/operability
>>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>> future Spark 3.2.x patch releases.
>>>>>
>>>>>
>>>>> This is not allowed under the policy. Only bug fixes can be merged
>>>>> into patch releases. Thus, if we know a feature will introduce a major
>>>>> performance regression, we have to turn it off by default.
>>>>>
>>>>> Xiao
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jun 16, 2021 at 3:22 PM Min Shen <vi...@gmail.com> wrote:
>>>>>
>>>>>> Hi Gengliang,
>>>>>>
>>>>>> Thanks for volunteering as the release manager for Spark 3.2.0.
>>>>>> Regarding the ongoing work of push-based shuffle in SPARK-30602, we
>>>>>> are close to having all the patches merged to master to enable push-based
>>>>>> shuffle.
>>>>>> Currently, there are 2 PRs under SPARK-30602 that are under active
>>>>>> review (SPARK-32922 and SPARK-35671), which can hopefully be merged soon.
>>>>>> We should be able to post the PRs for the other 2 remaining tickets
>>>>>> (SPARK-32923 and SPARK-35546) early next week.
>>>>>>
>>>>>> The tickets under SPARK-30602 are the minimum set of patches to
>>>>>> enable push-based shuffle.
>>>>>> We do have other performance/operability enhancement tickets under
>>>>>> SPARK-33235 that are needed to fully contribute what we have internally for
>>>>>> push-based shuffle.
>>>>>> However, these are optional for enabling push-based shuffle.
>>>>>> We do strongly prefer to cut the release for Spark 3.2.0 including
>>>>>> all the patches under SPARK-30602.
>>>>>> This way, we can backport the other performance/operability
>>>>>> enhancements tickets under SPARK-33235 into branch-3.2 to be released in
>>>>>> future Spark 3.2.x patch releases.
>>>>>> I understand the preference for not postponing the branch cut date.
>>>>>> We will check with Dongjoon regarding the soft cut date and the
>>>>>> flexibility to include the remaining tickets under SPARK-30602 in
>>>>>> branch-3.2.
>>>>>>
>>>>>> Best,
>>>>>> Min
>>>>>>
>>>>>> On Wed, Jun 16, 2021 at 1:20 PM Liang-Chi Hsieh <vi...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Thanks Dongjoon. I've talked with Dongjoon offline to learn more about this.
>>>>>>> As it is a soft cut date, there is no reason to postpone it.
>>>>>>>
>>>>>>> It sounds good, then, to keep the original branch cut date.
>>>>>>>
>>>>>>> Thank you.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dongjoon Hyun wrote:
>>>>>>> > Thank you for volunteering, Gengliang.
>>>>>>> >
>>>>>>> > Apache Spark 3.2.0 is the first version enabling AQE by default. I'm also
>>>>>>> > watching some ongoing improvements on that.
>>>>>>> >
>>>>>>> >     https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive Query
>>>>>>> > Execution QA)
>>>>>>> >
>>>>>>> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft
>>>>>>> > cut and the committers still are able to commit to `branch-3.3` according
>>>>>>> > to their decisions.
>>>>>>> >
>>>>>>> > Given that Apache Spark had 115 commits in a week in various areas
>>>>>>> > concurrently, we should start QA for Apache Spark 3.2 by creating
>>>>>>> > branch-3.3 and allowing only limited backporting.
>>>>>>> >
>>>>>>> >     https://github.com/apache/spark/graphs/commit-activity
>>>>>>> >
>>>>>>> > Bests,
>>>>>>> > Dongjoon.
>>>>>>> >
>>>>>>> >
>>>>>>> > On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh <viirya@> wrote:
>>>>>>> >
>>>>>>> >> First, thanks for volunteering as the release manager of Spark 3.2.0,
>>>>>>> >> Gengliang!
>>>>>>> >>
>>>>>>> >> And yes, for the two important Structured Streaming features, RocksDB
>>>>>>> >> StateStore and session window, we're working on them and expect to have
>>>>>>> >> them in the new release.
>>>>>>> >>
>>>>>>> >> So I propose to postpone the branch cut date.
>>>>>>> >>
>>>>>>> >> Thank you!
>>>>>>> >>
>>>>>>> >> Liang-Chi
>>>>>>> >>
>>>>>>> >>
>>>>>>> >> Gengliang Wang wrote:
>>>>>>> >> > Thanks, Hyukjin.
>>>>>>> >> >
>>>>>>> >> > The expected target branch cut date of Spark 3.2 is *July 1st* on
>>>>>>> >> > https://spark.apache.org/versioning-policy.html. However, I notice that
>>>>>>> >> > there are still multiple important projects in progress now:
>>>>>>> >> >
>>>>>>> >> > [Core]
>>>>>>> >> >
>>>>>>> >> >    - SPIP: Support push-based shuffle to improve shuffle efficiency
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-30602>
>>>>>>> >> >
>>>>>>> >> > [SQL]
>>>>>>> >> >
>>>>>>> >> >    - Support ANSI SQL INTERVAL types
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-27790>
>>>>>>> >> >    - Support Timestamp without time zone data type
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-35662>
>>>>>>> >> >    - Aggregate (Min/Max/Count) push down for Parquet
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34952>
>>>>>>> >> >
>>>>>>> >> > [Streaming]
>>>>>>> >> >
>>>>>>> >> >    - EventTime based sessionization (session window)
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-10816>
>>>>>>> >> >    - Add RocksDB StateStore as external module
>>>>>>> >> >    <https://issues.apache.org/jira/browse/SPARK-34198>
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > I wonder whether we should postpone the branch cut date.
>>>>>>> >> > cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuanjian
>>>>>>> >> > Li, Liang-Chi Hsieh, who work on the projects above.
>>>>>>> >> >
>>>>>>> >> > On Tue, Jun 15, 2021 at 4:34 PM Hyukjin Kwon <gurwls223@> wrote:
>>>>>>> >> >
>>>>>>> >> >> +1, thanks.
>>>>>>> >> >>
>>>>>>> >> >> On Tue, 15 Jun 2021, 16:17 Gengliang Wang, <ltnwgl@> wrote:
>>>>>>> >> >>
>>>>>>> >> >>> Hi,
>>>>>>> >> >>>
>>>>>>> >> >>> As the expected release date is close, I would like to volunteer as the
>>>>>>> >> >>> release manager for Apache Spark 3.2.0.
>>>>>>> >> >>>
>>>>>>> >> >>> Thanks,
>>>>>>> >> >>> Gengliang
>>>>>>> >> >>>
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>>>>>>> >>
>>>>>>> >> ---------------------------------------------------------------------
>>>>>>> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

Re: Apache Spark 3.2 Expectation

Posted by Gengliang Wang <lt...@gmail.com>.
Hi all,

I just cut branch-3.2 on GitHub and created version 3.3.0 on Jira.
When merging PRs into the master branch before the 3.2.0 RC, please help
cherry-pick bug fixes and the ongoing major features mentioned in this
thread to branch-3.2. Thanks!

On Fri, Jul 2, 2021 at 2:31 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you, Gengliang!

Re: Apache Spark 3.2 Expectation

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you, Gengliang!
