Posted to dev@spark.apache.org by Maxim Gekk <ma...@databricks.com.INVALID> on 2022/04/04 18:27:08 UTC

Re: Apache Spark 3.3 Release

Hello All,

Below is the current status of features from the allow list:

IN PROGRESS:

   1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   3. SPARK-37093: Inline type hints python/pyspark/streaming
   4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated
   usage of Distribution
   5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   7. SPARK-28516: Data Type Formatting Functions: `to_char`
   8. SPARK-36664: Log time spent waiting for cluster resources
   9. SPARK-34659: Web UI does not correctly get appId
   10. SPARK-37650: Tell spark-env.sh the python interpreter
   11. SPARK-38589: New SQL function: try_avg
   12. SPARK-38590: New SQL function: try_to_binary
   13. SPARK-34079: Improvement CTE table scan
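Among the in-progress items, percentile_disc (SPARK-37691) follows the ANSI discrete-percentile definition: the smallest input value whose cumulative distribution is at least p. A toy Python mimic of that definition (a conceptual sketch only, not Spark's implementation) looks like:

```python
import math

def percentile_disc(values, p):
    # Toy mimic of ANSI percentile_disc (SPARK-37691): return the smallest
    # value in the sorted input whose cumulative distribution is >= p.
    # Conceptual sketch only, not Spark's implementation.
    xs = sorted(values)
    idx = math.ceil(p * len(xs)) - 1
    return xs[max(idx, 0)]

print(percentile_disc([10, 20, 30, 40], 0.5))  # 20
```

With p = 0.5 over four values, 20 is the first value whose cumulative distribution (2/4) reaches 0.5, which is what distinguishes percentile_disc from the interpolating percentile_cont.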

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

We need to decide whether we are going to wait a little bit more or close
the doors.
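Of the resolved items, split_part (SPARK-38063) has simple semantics that can be sketched in plain Python. This is a hedged mimic under assumed semantics (1-based part numbers, negative numbers counting from the end, empty string when out of range), not Spark's implementation:

```python
def split_part(s, delimiter, part_num):
    # Toy mimic of Spark SQL's split_part (SPARK-38063). Assumed semantics,
    # not verified against Spark's source: 1-based indexing, negative
    # part_num counts from the end, out-of-range returns "".
    if part_num == 0:
        raise ValueError("part_num must not be 0")
    parts = s.split(delimiter)
    idx = part_num - 1 if part_num > 0 else len(parts) + part_num
    return parts[idx] if 0 <= idx < len(parts) else ""

print(split_part("11.12.13", ".", 3))   # 13
print(split_part("11.12.13", ".", -1))  # 13
print(split_part("11.12.13", ".", 5))   # (empty string)
```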

Maxim Gekk

Software Engineer

Databricks, Inc.


On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk <ma...@databricks.com>
wrote:

> Hi All,
>
> Here is the allow list which I built based on your requests in this thread:
>
>    1. SPARK-37396: Inline type hint files for files in
>    python/pyspark/mllib
>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>    deprecated usage of Distribution
>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>    sources
>    6. SPARK-32268: Bloom Filter Join
>    7. SPARK-38548: New SQL function: try_sum
>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>    9. SPARK-38063: Support SQL split_part function
>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>    filter by self way
>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>    readers
>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>    shuffle service
>    15. SPARK-37831: Add task partition id in metrics
>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>    17. SPARK-36664: Log time spent waiting for cluster resources
>    18. SPARK-34659: Web UI does not correctly get appId
>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>    20. SPARK-38589: New SQL function: try_avg
>    21. SPARK-38590: New SQL function: try_to_binary
>    22. SPARK-34079: Improvement CTE table scan
>
> Best regards,
> Max Gekk
>
>
> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>
>> Is the feature freeze target date March 22nd then? I saw a few dates
>> thrown around and want to confirm what we landed on.
>>
>> I am trying to get the following improvements through review and merged;
>> if there are concerns with either, let me know:
>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>> <https://github.com/apache/spark/pull/32298#>
>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>> for released executors <https://github.com/apache/spark/pull/35085#>
>>
>> Tom
>>
>>
>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>> ltnwgl@gmail.com> wrote:
>>
>>
>> I'd like to add the following new SQL functions to the 3.3 release. These
>> functions are useful when overflow or encoding errors occur:
>>
>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>    <https://github.com/apache/spark/pull/35848>
>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>    <https://github.com/apache/spark/pull/35896>
>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>    <https://github.com/apache/spark/pull/35897>
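The shared contract of these try_* functions is to return NULL instead of raising when the underlying expression fails (e.g., overflow under ANSI mode, or malformed input). A toy Python sketch of that contract under assumed semantics, not Spark code:

```python
import base64
import binascii

def try_to_binary(s, fmt="base64"):
    # Sketch of try_to_binary (SPARK-38590): decode, or return None
    # (i.e., SQL NULL) on malformed input instead of raising.
    try:
        if fmt == "base64":
            return base64.b64decode(s, validate=True)
        if fmt == "utf-8":
            return s.encode("utf-8")
        return None
    except binascii.Error:
        return None

def try_sum_long(values):
    # Sketch of try_sum (SPARK-38548) over 64-bit longs: simulate BIGINT
    # overflow and return None instead of an ANSI overflow error.
    total = 0
    for v in values:
        total += v
        if not (-2**63 <= total < 2**63):
            return None
    return total

print(try_to_binary("YWJj"))        # b'abc'
print(try_to_binary("not base64!")) # None
print(try_sum_long([2**62, 2**62])) # None (overflow)
```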
>>
>> Gengliang
>>
>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>> wrote:
>>
>> Hello,
>>
>> I've been trying for a bit to get the following two PRs merged and
>> into a release, and I'm having some difficulty moving them forward:
>>
>> https://github.com/apache/spark/pull/34903 - This passes the current
>> python interpreter to spark-env.sh to allow some currently-unavailable
>> customization to happen
>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>> SparkUI reverse proxy-handling code where it does a greedy match for
>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>> wrong place.
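The bug class described here can be reproduced outside Spark with a hypothetical path (illustrative only, not the actual SparkUI code): matching the last occurrence of "proxy" rewrites the wrong segment whenever the app ID itself contains the word, while anchoring the match to the path structure does not:

```python
import re

# Hypothetical reverse-proxy path: "proxy" appears both as a path segment
# and inside the application ID. Not the actual SparkUI code.
path = "/proxy/app-proxy-1234/jobs/"

# Buggy: cut at the LAST occurrence of "proxy" -- clobbers the app ID.
buggy = path[: path.rfind("proxy") + len("proxy")]
print(buggy)  # /proxy/app-proxy  (app ID truncated)

# Safer: anchor the match to the /proxy/<app-id>/ prefix structure.
m = re.match(r"^/proxy/([^/]+)", path)
print(m.group(1))  # app-proxy-1234
```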
>>
>> I'm not exactly sure how to draw attention to PRs that have been
>> sitting around for a while, but these are really important to our
>> use-cases, and it would be nice to have them merged in.
>>
>> Cheers
>> Andrew
>>
>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>> wrote:
>> >
>> > I'd like to add/backport the logging in
>> https://github.com/apache/spark/pull/35881 PR so that when users submit
>> issues with dynamic allocation we can better debug what's going on.
>> >
>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>> >>
>> >> There is one item on our side that we want to backport to 3.3:
>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>> >>
>> >> It's already reviewed and approved.
>> >>
>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>> <tg...@yahoo.com.invalid> wrote:
>> >> >
>> >> > It looks like the version hasn't been updated on master and still
>> shows 3.3.0-SNAPSHOT; can you please update that?
>> >> >
>> >> > Tom
>> >> >
>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>> maxim.gekk@databricks.com.invalid> wrote:
>> >> >
>> >> >
>> >> > Hi All,
>> >> >
>> >> > I have created the branch for Spark 3.3:
>> >> > https://github.com/apache/spark/commits/branch-3.3
>> >> >
>> >> > Please backport important fixes to it, and if you have any doubts,
>> ping me in the PR. Regarding new features, we are still building the allow
>> list for branch-3.3.
>> >> >
>> >> > Best regards,
>> >> > Max Gekk
>> >> >
>> >> >
>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>> dongjoon.hyun@gmail.com> wrote:
>> >> >
>> >> > Yes, I agree with your whitelist approach for backporting. :)
>> >> > Thank you for summarizing.
>> >> >
>> >> > Thanks,
>> >> > Dongjoon.
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > I think I finally got your point. What you want to keep unchanged is
>> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
>> deal.
>> >> >
>> >> > My major concern is whether we should keep merging the feature work
>> or the dependency upgrade after the branch cut. To make our release time
>> more predictable, I am suggesting we should finalize the exception PR list
>> first, instead of merging them in an ad hoc way. In the past, we spent a
>> lot of time on the revert of the PRs that were merged after the branch cut.
>> I hope we can minimize unnecessary arguments in this release. Do you agree,
>> Dongjoon?
>> >> >
>> >> >
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 15:55:
>> >> >
>> >> > That is not totally fine, Xiao. It sounds like you are asking for a
>> change of plan without a proper reason.
>> >> >
>> >> > Although we cut the branch today according to our plan, you still can
>> collect the list and make a list of exceptions. I'm not blocking what you
>> want to do.
>> >> >
>> >> > Please let the community start to ramp down as we agreed before.
>> >> >
>> >> > Dongjoon
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > Please do not get me wrong. If we don't cut a branch, we are
>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>> we cut the branch, we should avoid merging the feature work. In the next
>> three days, let us collect the actively developed PRs that we want to make
>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>> make sense?
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 14:54:
>> >> >
>> >> > Xiao. You are working against what you are saying.
>> >> > If you don't cut a branch, it means you are allowing all patches to
>> land Apache Spark 3.3. No?
>> >> >
>> >> > > we need to avoid backporting the feature work that has not been
>> well discussed.
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > Cutting the branch is simple, but we need to avoid backporting the
>> feature work that has not been well discussed. Not all the members are
>> actively following the dev list. I think we should wait 3 more days for
>> collecting the PR list before cutting the branch.
>> >> >
>> >> > BTW, there is very little 3.4-only feature work that will be affected.
>> >> >
>> >> > Xiao
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 11:49:
>> >> >
>> >> > Hi, Max, Chao, Xiao, Holden and all.
>> >> >
>> >> > I have a different idea.
>> >> >
>> >> > Given the situation and small patch list, I don't think we need to
>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>> and allow backporting.
>> >> >
>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>> the branch. This situation will only get worse because
>> there is no way to block the other patches from landing unintentionally if
>> we don't cut a branch.
>> >> >
>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>> values
>> >> >
>> >> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>> >> >
>> >> > Best,
>> >> > Dongjoon.
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>> wrote:
>> >> >
>> >> > Cool, thanks for clarifying!
>> >> >
>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> > >>
>> >> > >> For the following list:
>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>> >> > >
>> >> > >
>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>> >> > >
>> >> > >
>> >> > >
>> >> > > Chao Sun <su...@apache.org> wrote on Tue, Mar 15, 2022 at 10:06:
>> >> > >>
>> >> > >> Hi Xiao,
>> >> > >>
>> >> > >> For the following list:
>> >> > >>
>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> > >>
>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>> >> > >>
>> >> > >> Thanks,
>> >> > >> Chao
>> >> > >>
>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>> dongjoon.hyun@gmail.com> wrote:
>> >> > >> >
>> >> > >> > The following was tested and merged a few minutes ago. So, we
>> can remove it from the list.
>> >> > >> >
>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> > >> >
>> >> > >> > Thanks,
>> >> > >> > Dongjoon.
>> >> > >> >
>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> > >> >>
>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>> days to collect the list of actively developed PRs that we want to merge to
>> 3.3 after the branch cut?
>> >> > >> >>
>> >> > >> >> Please do not rush to merge the PRs that are not fully
>> reviewed. We can cut the branch this Friday and continue merging the PRs
>> that have been discussed in this thread. Does that make sense?
>> >> > >> >>
>> >> > >> >> Xiao
>> >> > >> >>
>> >> > >> >>
>> >> > >> >>
>> >> > >> >> Holden Karau <ho...@pigscanfly.ca> wrote on Tue, Mar 15, 2022 at 09:10:
>> >> > >> >>>
>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>> everyone a bit of breathing space? Rushed software development more often
>> results in bugs.
>> >> > >> >>>
>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>> yikunkero@gmail.com> wrote:
>> >> > >> >>>>
>> >> > >> >>>> > To make our release time more predictable, let us collect
>> the PRs and wait three more days before the branch cut?
>> >> > >> >>>>
>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> > >> >>>>
>> >> > >> >>>> Three more days are OK for this from my view.
>> >> > >> >>>>
>> >> > >> >>>> Regards,
>> >> > >> >>>> Yikun
>> >> > >> >>>
>> >> > >> >>> --
>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> >> > >> >>> YouTube Live Streams:
>> https://www.youtube.com/user/holdenkarau
>> >
>> >
>> >
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>

Re: Apache Spark 3.3 Release

Posted by Maciej <ms...@gmail.com>.
Thanks for the update, Max!

Just a small clarification ‒ the following should be moved to RESOLVED:

1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
3. SPARK-37093: Inline type hints python/pyspark/streaming
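For context, "inline type hints" means moving the annotations from separate .pyi stub files into the .py sources themselves, which also makes them visible at runtime. An illustrative toy function (not PySpark's actual signatures):

```python
# Before (SPARK-37396 etc.), annotations lived in separate .pyi stub files
# and the .py source was unannotated; after, hints are written inline.
# Illustrative toy function only, not PySpark code.
from typing import List

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

# Unlike stub-file hints, inline annotations are introspectable at runtime:
print(mean.__annotations__["return"])  # <class 'float'>
```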

On 4/28/22 14:42, Maxim Gekk wrote:
> Hello All,
> 
> I am going to create the first release candidate of Spark 3.3 at the
> beginning of the next week if there are no objections. Below is the allow
> list of features and their current status. At the moment, only one
> feature is still in progress, but it can be postponed to the next
> release, I guess:
> 
> IN PROGRESS:
> 
>  1. SPARK-28516: Data Type Formatting Functions: `to_char`
> 
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
> 
>  1. SPARK-37650: Tell spark-env.sh the python interpreter
>  2. SPARK-36664: Log time spent waiting for cluster resources
>  3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
>  4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>  5. SPARK-37093: Inline type hints python/pyspark/streaming
> 
> RESOLVED:
> 
>  1. SPARK-32268: Bloom Filter Join
>  2. SPARK-38548: New SQL function: try_sum
>  3. SPARK-38063: Support SQL split_part function
>  4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>     filter by self way
>  5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
>  6. SPARK-38194: Make Yarn memory overhead factor configurable
>  7. SPARK-37618: Support cleaning up shuffle blocks from external
>     shuffle service
>  8. SPARK-37831: Add task partition id in metrics
>  9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>     DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 10. SPARK-38590: New SQL function: try_to_binary
> 11. SPARK-37377: Refactor V2 Partitioning interface and remove
>     deprecated usage of Distribution
> 12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>     sources
> 13. SPARK-34659: Web UI does not correctly get appId
> 14. SPARK-38589: New SQL function: try_avg
> 15. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
> 16. SPARK-34079: Improvement CTE table scan
> 
> 
> Max Gekk
> 
> Software Engineer
> 
> Databricks, Inc.
> 
> 
> 
> On Fri, Apr 15, 2022 at 4:28 PM Maxim Gekk <maxim.gekk@databricks.com
> <ma...@databricks.com>> wrote:
> 
>     Hello All,
> 
>     Current status of features from the allow list for branch-3.3 is:
> 
>     IN PROGRESS:
> 
>      1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>      2. SPARK-28516: Data Type Formatting Functions: `to_char`
>      3. SPARK-34079: Improvement CTE table scan
> 
>     IN PROGRESS but won't/couldn't be merged to branch-3.3:
> 
>      1. SPARK-37650: Tell spark-env.sh the python interpreter
>      2. SPARK-36664: Log time spent waiting for cluster resources
>      3. SPARK-37396: Inline type hint files for files in
>         python/pyspark/mllib
>      4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>      5. SPARK-37093: Inline type hints python/pyspark/streaming
> 
>     RESOLVED:
> 
>      1. SPARK-32268: Bloom Filter Join
>      2. SPARK-38548: New SQL function: try_sum
>      3. SPARK-38063: Support SQL split_part function
>      4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>         filter by self way
>      5. SPARK-34863: Support nested column in Spark Parquet vectorized
>         readers
>      6. SPARK-38194: Make Yarn memory overhead factor configurable
>      7. SPARK-37618: Support cleaning up shuffle blocks from external
>         shuffle service
>      8. SPARK-37831: Add task partition id in metrics
>      9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>         DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>     10. SPARK-38590: New SQL function: try_to_binary
>     11. SPARK-37377: Refactor V2 Partitioning interface and remove
>         deprecated usage of Distribution
>     12. SPARK-38085: DataSource V2: Handle DELETE commands for
>         group-based sources
>     13. SPARK-34659: Web UI does not correctly get appId
>     14. SPARK-38589: New SQL function: try_avg
> 
> 
>     Max Gekk
> 
>     Software Engineer
> 
>     Databricks, Inc.
> 
> 
> 
>     On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk <maxim.gekk@databricks.com
>     <ma...@databricks.com>> wrote:
> 
>         Hello All,
> 
>         Below is current status of features from the allow list:
> 
>         IN PROGRESS:
> 
>          1. SPARK-37396: Inline type hint files for files in
>             python/pyspark/mllib
>          2. SPARK-37395: Inline type hint files for files in
>             python/pyspark/ml
>          3. SPARK-37093: Inline type hints python/pyspark/streaming
>          4. SPARK-37377: Refactor V2 Partitioning interface and remove
>             deprecated usage of Distribution
>          5. SPARK-38085: DataSource V2: Handle DELETE commands for
>             group-based sources
>          6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>          7. SPARK-28516: Data Type Formatting Functions: `to_char`
>          8. SPARK-36664: Log time spent waiting for cluster resources
>          9. SPARK-34659: Web UI does not correctly get appId
>         10. SPARK-37650: Tell spark-env.sh the python interpreter
>         11. SPARK-38589: New SQL function: try_avg
>         12. SPARK-38590: New SQL function: try_to_binary
>         13. SPARK-34079: Improvement CTE table scan
> 
>         RESOLVED:
> 
>          1. SPARK-32268: Bloom Filter Join
>          2. SPARK-38548: New SQL function: try_sum
>          3. SPARK-38063: Support SQL split_part function
>          4. SPARK-38432: Refactor framework so as JDBC dialect could
>             compile filter by self way
>          5. SPARK-34863: Support nested column in Spark Parquet
>             vectorized readers
>          6. SPARK-38194: Make Yarn memory overhead factor configurable
>          7. SPARK-37618: Support cleaning up shuffle blocks from
>             external shuffle service
>          8. SPARK-37831: Add task partition id in metrics
>          9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>             DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 
>         We need to decide whether we are going to wait a little bit more
>         or close the doors.
> 
>         Maxim Gekk
> 
>         Software Engineer
> 
>         Databricks, Inc.
> 
> 
> 
>         On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk
>         <maxim.gekk@databricks.com <ma...@databricks.com>>
>         wrote:
> 
>             Hi All,
> 
>             Here is the allow list which I built based on your requests
>             in this thread:
> 
>              1. SPARK-37396: Inline type hint files for files in
>                 python/pyspark/mllib
>              2. SPARK-37395: Inline type hint files for files in
>                 python/pyspark/ml
>              3. SPARK-37093: Inline type hints python/pyspark/streaming
>              4. SPARK-37377: Refactor V2 Partitioning interface and
>                 remove deprecated usage of Distribution
>              5. SPARK-38085: DataSource V2: Handle DELETE commands for
>                 group-based sources
>              6. SPARK-32268: Bloom Filter Join
>              7. SPARK-38548: New SQL function: try_sum
>              8. SPARK-37691: Support ANSI Aggregation Function:
>                 percentile_disc
>              9. SPARK-38063: Support SQL split_part function
>             10. SPARK-28516: Data Type Formatting Functions: `to_char`
>             11. SPARK-38432: Refactor framework so as JDBC dialect could
>                 compile filter by self way
>             12. SPARK-34863: Support nested column in Spark Parquet
>                 vectorized readers
>             13. SPARK-38194: Make Yarn memory overhead factor configurable
>             14. SPARK-37618: Support cleaning up shuffle blocks from
>                 external shuffle service
>             15. SPARK-37831: Add task partition id in metrics
>             16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>                 DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>             17. SPARK-36664: Log time spent waiting for cluster resources
>             18. SPARK-34659: Web UI does not correctly get appId
>             19. SPARK-37650: Tell spark-env.sh the python interpreter
>             20. SPARK-38589: New SQL function: try_avg
>             21. SPARK-38590: New SQL function: try_to_binary
>             22. SPARK-34079: Improvement CTE table scan
> 
>             Best regards,
>             Max Gekk
> 
> 
>             On Thu, Mar 17, 2022 at 4:59 PM Tom Graves
>             <tgraves_cs@yahoo.com <ma...@yahoo.com>> wrote:
> 
>                 Is the feature freeze target date March 22nd then?  I
>                 saw a few dates thrown around want to confirm what we
>                 landed on 
> 
>                 I am trying to get the following improvements finished
>                 review and in, if concerns with either, let me know:
>                 - [SPARK-34079][SQL] Merge non-correlated scalar
>                 subqueries <https://github.com/apache/spark/pull/32298#>
>                 - [SPARK-37618][CORE] Remove shuffle blocks using the
>                 shuffle service for released executors
>                 <https://github.com/apache/spark/pull/35085#>
> 
>                 Tom
> 
> 
>                 On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang
>                 Wang <ltnwgl@gmail.com <ma...@gmail.com>> wrote:
> 
> 
>                 I'd like to add the following new SQL functions in the
>                 3.3 release. These functions are useful when overflow or
>                 encoding errors occur:
> 
>                   * [SPARK-38548][SQL] New SQL function: try_sum
>                     <https://github.com/apache/spark/pull/35848> 
>                   * [SPARK-38589][SQL] New SQL function: try_avg
>                     <https://github.com/apache/spark/pull/35896>
>                   * [SPARK-38590][SQL] New SQL function: try_to_binary
>                     <https://github.com/apache/spark/pull/35897> 
> 
>                 Gengliang
> 
>                 On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo
>                 <andrew.melo@gmail.com <ma...@gmail.com>>
>                 wrote:
> 
>                     Hello,
> 
>                     I've been trying for a bit to get the following two
>                     PRs merged and
>                     into a release, and I'm having some difficulty
>                     moving them forward:
> 
>                     https://github.com/apache/spark/pull/34903
>                     <https://github.com/apache/spark/pull/34903> - This
>                     passes the current
>                     python interpreter to spark-env.sh to allow some
>                     currently-unavailable
>                     customization to happen
>                     https://github.com/apache/spark/pull/31774
>                     <https://github.com/apache/spark/pull/31774> - This
>                     fixes a bug in the
>                     SparkUI reverse proxy-handling code where it does a
>                     greedy match for
>                     "proxy" in the URL, and will mistakenly replace the
>                     App-ID in the
>                     wrong place.
> 
>                     I'm not exactly sure of how to get attention of PRs
>                     that have been
>                     sitting around for a while, but these are really
>                     important to our
>                     use-cases, and it would be nice to have them merged in.
> 
>                     Cheers
>                     Andrew
> 
>                     On Wed, Mar 16, 2022 at 6:21 PM Holden Karau
>                     <holden@pigscanfly.ca <ma...@pigscanfly.ca>>
>                     wrote:
>                     >
>                     > I'd like to add/backport the logging in
>                     https://github.com/apache/spark/pull/35881
>                     <https://github.com/apache/spark/pull/35881> PR so
>                     that when users submit issues with dynamic
>                     allocation we can better debug what's going on.
>                     >
>                     > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun
>                     <sunchao@apache.org <ma...@apache.org>> wrote:
>                     >>
>                     >> There is one item on our side that we want to
>                     backport to 3.3:
>                     >> - vectorized
>                     DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>                     >> Parquet V2 support
>                     (https://github.com/apache/spark/pull/35262
>                     <https://github.com/apache/spark/pull/35262>)
>                     >>
>                     >> It's already reviewed and approved.
>                     >>
>                     >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>                     <tg...@yahoo.com.invalid> wrote:
>                     >> >
>                     >> > It looks like the version hasn't been updated
>                     on master and still shows 3.3.0-SNAPSHOT, can you
>                     please update that.
>                     >> >
>                     >> > Tom
>                     >> >
>                     >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT,
>                     Maxim Gekk <maxim.gekk@databricks.com
>                     <ma...@databricks.com>.invalid> wrote:
>                     >> >
>                     >> >
>                     >> > Hi All,
>                     >> >
>                     >> > I have created the branch for Spark 3.3:
>                     >> >
>                     https://github.com/apache/spark/commits/branch-3.3
>                     <https://github.com/apache/spark/commits/branch-3.3>
>                     >> >
>                     >> > Please, backport important fixes to it, and if
>                     you have some doubts, ping me in the PR. Regarding
>                     new features, we are still building the allow list
>                     for branch-3.3.
>                     >> >
>                     >> > Best regards,
>                     >> > Max Gekk
>                     >> >
>                     >> >
>                     >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun
>                     <dongjoon.hyun@gmail.com
>                     <ma...@gmail.com>> wrote:
>                     >> >
>                     >> > Yes, I agree with you for your whitelist
>                     approach for backporting. :)
>                     >> > Thank you for summarizing.
>                     >> >
>                     >> > Thanks,
>                     >> > Dongjoon.
>                     >> >
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> >
>                     >> > I think I finally got your point. What you want
>                     to keep unchanged is the branch cut date of Spark
>                     3.3. Today? or this Friday? This is not a big deal.
>                     >> >
>                     >> > My major concern is whether we should keep
>                     merging the feature work or the dependency upgrade
>                     after the branch cut. To make our release time more
>                     predictable, I am suggesting we should finalize the
>                     exception PR list first, instead of merging them in
>                     an ad hoc way. In the past, we spent a lot of time
>                     on the revert of the PRs that were merged after the
>                     branch cut. I hope we can minimize unnecessary
>                     arguments in this release. Do you agree, Dongjoon?
>                     >> >
>                     >> >
>                     >> >
>                     >> > Dongjoon Hyun <dongjoon.hyun@gmail.com
>                     <ma...@gmail.com>> 于2022年3月15日周
>                     二 15:55写道:
>                     >> >
>                     >> > That is not totally fine, Xiao. It sounds like
>                     you are asking a change of plan without a proper reason.
>                     >> >
>                     >> > Although we cut the branch Today according our
>                     plan, you still can collect the list and make a list
>                     of exceptions. I'm not blocking what you want to do.
>                     >> >
>                     >> > Please let the community start to ramp down as
>                     we agreed before.
>                     >> >
>                     >> > Dongjoon
>                     >> >
>                     >> >
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> >
>                     >> > Please do not get me wrong. If we don't cut a
>                     branch, we are allowing all patches to land Apache
>                     Spark 3.3. That is totally fine. After we cut the
>                     branch, we should avoid merging the feature work. In
>                     the next three days, let us collect the actively
>                     developed PRs that we want to make an exception
>                     (i.e., merged to 3.3 after the upcoming branch cut).
>                     Does that make sense?
>                     >> >
>                     >> > Dongjoon Hyun <dongjoon.hyun@gmail.com
>                     <ma...@gmail.com>> wrote on Tue, Mar 15,
>                     2022 at 14:54:
>                     >> >
>                     >> > Xiao. You are working against what you are saying.
>                     >> > If you don't cut a branch, it means you are
>                     allowing all patches to land Apache Spark 3.3. No?
>                     >> >
>                     >> > > we need to avoid backporting the feature work
>                     that is not being well discussed.
>                     >> >
>                     >> >
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> >
>                     >> > Cutting the branch is simple, but we need to
>                     avoid backporting the feature work that is not
>                     being well discussed. Not all the members are
>                     actively following the dev list. I think we should
>                     wait 3 more days to collect the PR list before
>                     cutting the branch.
>                     >> >
>                     >> > BTW, there is very little 3.4-only feature work
>                     that will be affected.
>                     >> >
>                     >> > Xiao
>                     >> >
>                     >> > Dongjoon Hyun <dongjoon.hyun@gmail.com
>                     <ma...@gmail.com>> wrote on Tue, Mar 15,
>                     2022 at 11:49:
>                     >> >
>                     >> > Hi, Max, Chao, Xiao, Holden and all.
>                     >> >
>                     >> > I have a different idea.
>                     >> >
>                     >> > Given the situation and small patch list, I
>                     don't think we need to postpone the branch cut for
>                     those patches. It's easier to cut a branch-3.3 and
>                     allow backporting.
>                     >> >
>                     >> > As of today, we already have an obvious Apache
>                     Spark 3.4 patch in the branch. This situation will
>                     only get worse because there is no way to block the
>                     other patches from landing unintentionally if we
>                     don't cut a branch.
>                     >> >
>                     >> >     [SPARK-38335][SQL] Implement parser support
>                     for DEFAULT column values
>                     >> >
>                     >> > Let's cut `branch-3.3` today for Apache Spark
>                     3.3.0 preparation.
>                     >> >
>                     >> > Best,
>                     >> > Dongjoon.
>                     >> >
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun
>                     <sunchao@apache.org <ma...@apache.org>> wrote:
>                     >> >
>                     >> > Cool, thanks for clarifying!
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> > >>
>                     >> > >> For the following list:
>                     >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime
>                     Filtering
>                     >> > >> #34659 [SPARK-34863][SQL] Support complex
>                     types for Parquet vectorized reader
>                     >> > >> #35848 [SPARK-38548][SQL] New SQL function:
>                     try_sum
>                     >> > >> Do you mean we should include them, or
>                     exclude them from 3.3?
>                     >> > >
>                     >> > >
>                     >> > > If possible, I hope these features can be
>                     shipped with Spark 3.3.
>                     >> > >
>                     >> > >
>                     >> > >
>                     >> > > Chao Sun <sunchao@apache.org
>                     <ma...@apache.org>> wrote on Tue, Mar 15,
>                     2022 at 10:06:
>                     >> > >>
>                     >> > >> Hi Xiao,
>                     >> > >>
>                     >> > >> For the following list:
>                     >> > >>
>                     >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime
>                     Filtering
>                     >> > >> #34659 [SPARK-34863][SQL] Support complex
>                     types for Parquet vectorized reader
>                     >> > >> #35848 [SPARK-38548][SQL] New SQL function:
>                     try_sum
>                     >> > >>
>                     >> > >> Do you mean we should include them, or
>                     exclude them from 3.3?
>                     >> > >>
>                     >> > >> Thanks,
>                     >> > >> Chao
>                     >> > >>
>                     >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon
>                     Hyun <dongjoon.hyun@gmail.com
>                     <ma...@gmail.com>> wrote:
>                     >> > >> >
>                     >> > >> > The following was tested and merged a few
>                     minutes ago. So, we can remove it from the list.
>                     >> > >> >
>                     >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S]
>                     Bump Volcano to v1.5.1
>                     >> > >> >
>                     >> > >> > Thanks,
>                     >> > >> > Dongjoon.
>                     >> > >> >
>                     >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> > >> >>
>                     >> > >> >> Let me clarify my above suggestion. Maybe
>                     we can wait 3 more days to collect the list of
>                     actively developed PRs that we want to merge to 3.3
>                     after the branch cut?
>                     >> > >> >>
>                     >> > >> >> Please do not rush to merge the PRs that
>                     are not fully reviewed. We can cut the branch this
>                     Friday and continue merging the PRs that have been
>                     discussed in this thread. Does that make sense?
>                     >> > >> >>
>                     >> > >> >> Xiao
>                     >> > >> >>
>                     >> > >> >>
>                     >> > >> >>
>                     >> > >> >> Holden Karau <holden@pigscanfly.ca
>                     <ma...@pigscanfly.ca>> wrote on Tue, Mar 15,
>                     2022 at 09:10:
>                     >> > >> >>>
>                     >> > >> >>> May I suggest we push out one week
>                     (22nd) just to give everyone a bit of breathing
>                     space? Rushed software development more often
>                     results in bugs.
>                     >> > >> >>>
>                     >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun
>                     Jiang <yikunkero@gmail.com
>                     <ma...@gmail.com>> wrote:
>                     >> > >> >>>>
>                     >> > >> >>>> > To make our release time more
>                     predictable, let us collect the PRs and wait three
>                     more days before the branch cut?
>                     >> > >> >>>>
>                     >> > >> >>>> For SPIP: Support Customized Kubernetes
>                     Schedulers:
>                     >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S]
>                     Bump Volcano to v1.5.1
>                     >> > >> >>>>
>                     >> > >> >>>> Three more days are OK for this from my
>                     view.
>                     >> > >> >>>>
>                     >> > >> >>>> Regards,
>                     >> > >> >>>> Yikun
>                     >> > >> >>>
>                     >> > >> >>> --
>                     >> > >> >>> Twitter: https://twitter.com/holdenkarau
>                     <https://twitter.com/holdenkarau>
>                     >> > >> >>> Books (Learning Spark, High Performance
>                     Spark, etc.): https://amzn.to/2MaRAG9
>                     <https://amzn.to/2MaRAG9>
>                     >> > >> >>> YouTube Live Streams:
>                     https://www.youtube.com/user/holdenkarau
>                     <https://www.youtube.com/user/holdenkarau>
>                     >
>                     >
>                     >
>                     > --
>                     > Twitter: https://twitter.com/holdenkarau
>                     <https://twitter.com/holdenkarau>
>                     > Books (Learning Spark, High Performance Spark,
>                     etc.): https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>                     > YouTube Live Streams:
>                     https://www.youtube.com/user/holdenkarau
>                     <https://www.youtube.com/user/holdenkarau>
> 
>                     ---------------------------------------------------------------------
>                     To unsubscribe e-mail:
>                     dev-unsubscribe@spark.apache.org
>                     <ma...@spark.apache.org>
> 


-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

Re: Apache Spark 3.3 Release

Posted by Maxim Gekk <ma...@databricks.com.INVALID>.
Hello All,

I am going to create the first release candidate of Spark 3.3 at the
beginning of next week if there are no objections. Below is the list of
features from the allow list and their current status. At the moment, only
one feature is still in progress, and I guess it can be postponed to the
next release:

IN PROGRESS:

   1. SPARK-28516: Data Type Formatting Functions: `to_char`

IN PROGRESS but won't/couldn't be merged to branch-3.3:

   1. SPARK-37650: Tell spark-env.sh the python interpreter
   2. SPARK-36664: Log time spent waiting for cluster resources
   3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   5. SPARK-37093: Inline type hints python/pyspark/streaming

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   10. SPARK-38590: New SQL function: try_to_binary
   11. SPARK-37377: Refactor V2 Partitioning interface and remove
   deprecated usage of Distribution
   12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   13. SPARK-34659: Web UI does not correctly get appId
   14. SPARK-38589: New SQL function: try_avg
   15. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   16. SPARK-34079: Improvement CTE table scan


Max Gekk

Software Engineer

Databricks, Inc.


On Fri, Apr 15, 2022 at 4:28 PM Maxim Gekk <ma...@databricks.com>
wrote:

> Hello All,
>
> Current status of features from the allow list for branch-3.3 is:
>
> IN PROGRESS:
>
>    1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>    2. SPARK-28516: Data Type Formatting Functions: `to_char`
>    3. SPARK-34079: Improvement CTE table scan
>
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
>
>    1. SPARK-37650: Tell spark-env.sh the python interpreter
>    2. SPARK-36664: Log time spent waiting for cluster resources
>    3. SPARK-37396: Inline type hint files for files in
>    python/pyspark/mllib
>    4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>    5. SPARK-37093: Inline type hints python/pyspark/streaming
>
> RESOLVED:
>
>    1. SPARK-32268: Bloom Filter Join
>    2. SPARK-38548: New SQL function: try_sum
>    3. SPARK-38063: Support SQL split_part function
>    4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>    filter by self way
>    5. SPARK-34863: Support nested column in Spark Parquet vectorized
>    readers
>    6. SPARK-38194: Make Yarn memory overhead factor configurable
>    7. SPARK-37618: Support cleaning up shuffle blocks from external
>    shuffle service
>    8. SPARK-37831: Add task partition id in metrics
>    9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>    10. SPARK-38590: New SQL function: try_to_binary
>    11. SPARK-37377: Refactor V2 Partitioning interface and remove
>    deprecated usage of Distribution
>    12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>    sources
>    13. SPARK-34659: Web UI does not correctly get appId
>    14. SPARK-38589: New SQL function: try_avg
>
>
> Max Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk <ma...@databricks.com>
> wrote:
>
>> Hello All,
>>
>> Below is current status of features from the allow list:
>>
>> IN PROGRESS:
>>
>>    1. SPARK-37396: Inline type hint files for files in
>>    python/pyspark/mllib
>>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>    deprecated usage of Distribution
>>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>>    sources
>>    6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>    7. SPARK-28516: Data Type Formatting Functions: `to_char`
>>    8. SPARK-36664: Log time spent waiting for cluster resources
>>    9. SPARK-34659: Web UI does not correctly get appId
>>    10. SPARK-37650: Tell spark-env.sh the python interpreter
>>    11. SPARK-38589: New SQL function: try_avg
>>    12. SPARK-38590: New SQL function: try_to_binary
>>    13. SPARK-34079: Improvement CTE table scan
>>
>> RESOLVED:
>>
>>    1. SPARK-32268: Bloom Filter Join
>>    2. SPARK-38548: New SQL function: try_sum
>>    3. SPARK-38063: Support SQL split_part function
>>    4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>    filter by self way
>>    5. SPARK-34863: Support nested column in Spark Parquet vectorized
>>    readers
>>    6. SPARK-38194: Make Yarn memory overhead factor configurable
>>    7. SPARK-37618: Support cleaning up shuffle blocks from external
>>    shuffle service
>>    8. SPARK-37831: Add task partition id in metrics
>>    9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>
>> We need to decide whether we are going to wait a little bit more or close
>> the doors.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk <ma...@databricks.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Here is the allow list which I built based on your requests in this
>>> thread:
>>>
>>>    1. SPARK-37396: Inline type hint files for files in
>>>    python/pyspark/mllib
>>>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>>>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>>    deprecated usage of Distribution
>>>    5. SPARK-38085: DataSource V2: Handle DELETE commands for
>>>    group-based sources
>>>    6. SPARK-32268: Bloom Filter Join
>>>    7. SPARK-38548: New SQL function: try_sum
>>>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>>    9. SPARK-38063: Support SQL split_part function
>>>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>>    filter by self way
>>>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>>    readers
>>>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>>>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>>>    shuffle service
>>>    15. SPARK-37831: Add task partition id in metrics
>>>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>>    17. SPARK-36664: Log time spent waiting for cluster resources
>>>    18. SPARK-34659: Web UI does not correctly get appId
>>>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>>>    20. SPARK-38589: New SQL function: try_avg
>>>    21. SPARK-38590: New SQL function: try_to_binary
>>>    22. SPARK-34079: Improvement CTE table scan
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>>>
>>>> Is the feature freeze target date March 22nd then? I saw a few dates
>>>> thrown around and want to confirm what we landed on.
>>>>
>>>> I am trying to get the following improvements through review and merged;
>>>> if there are concerns with either, let me know:
>>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>>>> <https://github.com/apache/spark/pull/32298#>
>>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>>>> for released executors <https://github.com/apache/spark/pull/35085#>
>>>>
>>>> Tom
>>>>
>>>>
>>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>>>> ltnwgl@gmail.com> wrote:
>>>>
>>>>
>>>> I'd like to add the following new SQL functions in the 3.3 release.
>>>> These functions are useful when overflow or encoding errors occur:
>>>>
>>>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>>>    <https://github.com/apache/spark/pull/35848>
>>>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>>>    <https://github.com/apache/spark/pull/35896>
>>>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>>>    <https://github.com/apache/spark/pull/35897>
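[Editor's illustration, not Spark's implementation: the try_* family's documented behavior is to return NULL instead of raising the error its non-try counterpart would raise under ANSI mode. A minimal plain-Python sketch of the try_sum semantics for 64-bit overflow, with hypothetical helper names:]

```python
# Sketch of try_sum-style semantics (NOT Spark code): a checked sum that
# raises on 64-bit overflow, and a try_ wrapper that returns None (NULL)
# instead of failing. Names are illustrative only.

LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1

def checked_add(a, b):
    """ANSI-style addition: raise if the result leaves the 64-bit range."""
    s = a + b
    if not (LONG_MIN <= s <= LONG_MAX):
        raise OverflowError("long overflow")
    return s

def try_sum(values):
    """Return the sum, or None (SQL NULL) if any step overflows."""
    total = 0
    try:
        for v in values:
            total = checked_add(total, v)
    except OverflowError:
        return None
    return total

print(try_sum([1, 2, 3]))      # 6
print(try_sum([LONG_MAX, 1]))  # None (plain SUM would error under ANSI mode)
```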
>>>>
>>>> Gengliang
>>>>
>>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been trying for a bit to get the following two PRs merged and
>>>> into a release, and I'm having some difficulty moving them forward:
>>>>
>>>> https://github.com/apache/spark/pull/34903 - This passes the current
>>>> python interpreter to spark-env.sh to allow some currently-unavailable
>>>> customization to happen
>>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>>>> SparkUI reverse proxy-handling code where it does a greedy match for
>>>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>>>> wrong place.
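[Editor's illustration: the actual fix is in the linked PR; this is only a generic sketch of the bug class with a hypothetical URL, showing how a greedy match latches onto the last occurrence of "proxy" in a path rather than the first, so a rewrite based on that match would clobber the segment in between:]

```python
import re

# Hypothetical SparkUI-style path where "proxy" appears twice.
url = "/base/proxy/app-1/proxy/metrics"

# Greedy ".*" consumes up to the LAST "proxy/"; a rewrite anchored on this
# match would replace text in the wrong place, eating the app-id segment.
greedy = re.match(r".*proxy/", url).group()   # "/base/proxy/app-1/proxy/"

# Lazy ".*?" stops at the FIRST "proxy/", leaving the app id intact.
lazy = re.match(r".*?proxy/", url).group()    # "/base/proxy/"

print(greedy)
print(lazy)
```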
>>>>
>>>> I'm not exactly sure how to draw attention to PRs that have been
>>>> sitting around for a while, but these are really important to our
>>>> use-cases, and it would be nice to have them merged in.
>>>>
>>>> Cheers
>>>> Andrew
>>>>
>>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>>>> wrote:
>>>> >
>>>> > I'd like to add/backport the logging in
>>>> https://github.com/apache/spark/pull/35881 PR so that when users
>>>> submit issues with dynamic allocation we can better debug what's going on.
>>>> >
>>>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>>>> >>
>>>> >> There is one item on our side that we want to backport to 3.3:
>>>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>>>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>>> >>
>>>> >> It's already reviewed and approved.
>>>> >>
>>>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>>>> <tg...@yahoo.com.invalid> wrote:
>>>> >> >
>>>> >> > It looks like the version hasn't been updated on master and still
>>>> shows 3.3.0-SNAPSHOT; can you please update that?
>>>> >> >
>>>> >> > Tom
>>>> >> >
>>>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>>>> maxim.gekk@databricks.com.invalid> wrote:
>>>> >> >
>>>> >> >
>>>> >> > Hi All,
>>>> >> >
>>>> >> > I have created the branch for Spark 3.3:
>>>> >> > https://github.com/apache/spark/commits/branch-3.3
>>>> >> >
>>>> >> > Please backport important fixes to it, and if you have any
>>>> doubts, ping me in the PR. Regarding new features, we are still building
>>>> the allow list for branch-3.3.
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Max Gekk
>>>> >> >
>>>> >> >
>>>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>>>> dongjoon.hyun@gmail.com> wrote:
>>>> >> >
>>>> >> > Yes, I agree with your whitelist approach to backporting. :)
>>>> >> > Thank you for summarizing.
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > I think I finally got your point. What you want to keep unchanged
>>>> is the branch cut date of Spark 3.3. Today? or this Friday? This is not a
>>>> big deal.
>>>> >> >
>>>> >> > My major concern is whether we should keep merging the feature
>>>> work or the dependency upgrade after the branch cut. To make our release
>>>> time more predictable, I am suggesting we should finalize the exception PR
>>>> list first, instead of merging them in an ad hoc way. In the past, we spent
>>>> a lot of time reverting PRs that were merged after the branch cut. I hope
>>>> we can minimize unnecessary arguments in this release. Do you
>>>> agree, Dongjoon?
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 15:55:
>>>> >> >
>>>> >> > That is not totally fine, Xiao. It sounds like you are asking for a
>>>> change of plan without a proper reason.
>>>> >> >
>>>> >> > Although we cut the branch today according to our plan, you can still
>>>> collect the list and make a list of exceptions. I'm not blocking what you
>>>> want to do.
>>>> >> >
>>>> >> > Please let the community start to ramp down as we agreed before.
>>>> >> >
>>>> >> > Dongjoon
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > Please do not get me wrong. If we don't cut a branch, we are
>>>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>>>> we cut the branch, we should avoid merging the feature work. In the next
>>>> three days, let us collect the actively developed PRs that we want to make
>>>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>>>> make sense?
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 14:54:
>>>> >> >
>>>> >> > Xiao. You are working against what you are saying.
>>>> >> > If you don't cut a branch, it means you are allowing all patches
>>>> to land Apache Spark 3.3. No?
>>>> >> >
>>>> >> > > we need to avoid backporting the feature work that is not being
>>>> well discussed.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > Cutting the branch is simple, but we need to avoid backporting the
>>>> feature work that is not being well discussed. Not all the members are
>>>> actively following the dev list. I think we should wait 3 more days to
>>>> collect the PR list before cutting the branch.
>>>> >> >
>>>> >> > BTW, there is very little 3.4-only feature work that will be
>>>> affected.
>>>> >> >
>>>> >> > Xiao
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 11:49:
>>>> >> >
>>>> >> > Hi, Max, Chao, Xiao, Holden and all.
>>>> >> >
>>>> >> > I have a different idea.
>>>> >> >
>>>> >> > Given the situation and small patch list, I don't think we need to
>>>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>>>> and allow backporting.
>>>> >> >
>>>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>>>> the branch. This situation will only get worse because there is no way to
>>>> block the other patches from landing unintentionally if we don't cut a
>>>> branch.
>>>> >> >
>>>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>>>> values
>>>> >> >
>>>> >> > Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
>>>> >> >
>>>> >> > Best,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>>>> wrote:
>>>> >> >
>>>> >> > Cool, thanks for clarifying!
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> > >>
>>>> >> > >> For the following list:
>>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>> vectorized reader
>>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>>> >> > >
>>>> >> > >
>>>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> > > Chao Sun <su...@apache.org> wrote on Tue, Mar 15, 2022 at 10:06:
>>>> >> > >>
>>>> >> > >> Hi Xiao,
>>>> >> > >>
>>>> >> > >> For the following list:
>>>> >> > >>
>>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>> vectorized reader
>>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> > >>
>>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>>> >> > >>
>>>> >> > >> Thanks,
>>>> >> > >> Chao
>>>> >> > >>
>>>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>>> dongjoon.hyun@gmail.com> wrote:
>>>> >> > >> >
>>>> >> > >> > The following was tested and merged a few minutes ago. So, we
>>>> can remove it from the list.
>>>> >> > >> >
>>>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>> >> > >> >
>>>> >> > >> > Thanks,
>>>> >> > >> > Dongjoon.
>>>> >> > >> >
>>>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> > >> >>
>>>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>>> days to collect the list of actively developed PRs that we want to merge to
>>>> 3.3 after the branch cut?
>>>> >> > >> >>
>>>> >> > >> >> Please do not rush to merge the PRs that are not fully
>>>> reviewed. We can cut the branch this Friday and continue merging the PRs
>>>> that have been discussed in this thread. Does that make sense?
>>>> >> > >> >>
>>>> >> > >> >> Xiao
>>>> >> > >> >>
>>>> >> > >> >>
>>>> >> > >> >>
>>>> >> > >> >> Holden Karau <ho...@pigscanfly.ca> wrote on Tue, Mar 15, 2022 at 09:10:
>>>> >> > >> >>>
>>>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>>>> everyone a bit of breathing space? Rushed software development more often
>>>> results in bugs.
>>>> >> > >> >>>
>>>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>>>> yikunkero@gmail.com> wrote:
>>>> >> > >> >>>>
>>>> >> > >> >>>> > To make our release time more predictable, let us
>>>> collect the PRs and wait three more days before the branch cut?
>>>> >> > >> >>>>
>>>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>>> v1.5.1
>>>> >> > >> >>>>
>>>> >> > >> >>>> Three more days are OK for this from my view.
>>>> >> > >> >>>>
>>>> >> > >> >>>> Regards,
>>>> >> > >> >>>> Yikun
>>>> >> > >> >>>
>>>> >> > >> >>> --
>>>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>>>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> >> > >> >>> YouTube Live Streams:
>>>> https://www.youtube.com/user/holdenkarau
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Twitter: https://twitter.com/holdenkarau
>>>> > Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>>
>>>>

Re: Apache Spark 3.3 Release

Posted by Maxim Gekk <ma...@databricks.com.INVALID>.
Hello All,

Current status of features from the allow list for branch-3.3 is:

IN PROGRESS:

   1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   2. SPARK-28516: Data Type Formatting Functions: `to_char`
   3. SPARK-34079: Improvement CTE table scan

IN PROGRESS but won't/couldn't be merged to branch-3.3:

   1. SPARK-37650: Tell spark-env.sh the python interpreter
   2. SPARK-36664: Log time spent waiting for cluster resources
   3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   5. SPARK-37093: Inline type hints python/pyspark/streaming

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   10. SPARK-38590: New SQL function: try_to_binary
   11. SPARK-37377: Refactor V2 Partitioning interface and remove
   deprecated usage of Distribution
   12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   13. SPARK-34659: Web UI does not correctly get appId
   14. SPARK-38589: New SQL function: try_avg


Max Gekk

Software Engineer

Databricks, Inc.


On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk <ma...@databricks.com> wrote:

> Hello All,
>
> Below is current status of features from the allow list:
>
> IN PROGRESS:
>
>    1. SPARK-37396: Inline type hint files for files in
>    python/pyspark/mllib
>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>    deprecated usage of Distribution
>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>    sources
>    6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>    7. SPARK-28516: Data Type Formatting Functions: `to_char`
>    8. SPARK-36664: Log time spent waiting for cluster resources
>    9. SPARK-34659: Web UI does not correctly get appId
>    10. SPARK-37650: Tell spark-env.sh the python interpreter
>    11. SPARK-38589: New SQL function: try_avg
>    12. SPARK-38590: New SQL function: try_to_binary
>    13. SPARK-34079: Improvement CTE table scan
>
> RESOLVED:
>
>    1. SPARK-32268: Bloom Filter Join
>    2. SPARK-38548: New SQL function: try_sum
>    3. SPARK-38063: Support SQL split_part function
>    4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>    filter by self way
>    5. SPARK-34863: Support nested column in Spark Parquet vectorized
>    readers
>    6. SPARK-38194: Make Yarn memory overhead factor configurable
>    7. SPARK-37618: Support cleaning up shuffle blocks from external
>    shuffle service
>    8. SPARK-37831: Add task partition id in metrics
>    9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>
> We need to decide whether we are going to wait a little bit more or close
> the doors.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk <ma...@databricks.com>
> wrote:
>
>> Hi All,
>>
>> Here is the allow list which I built based on your requests in this
>> thread:
>>
>>    1. SPARK-37396: Inline type hint files for files in
>>    python/pyspark/mllib
>>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>    deprecated usage of Distribution
>>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>>    sources
>>    6. SPARK-32268: Bloom Filter Join
>>    7. SPARK-38548: New SQL function: try_sum
>>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>    9. SPARK-38063: Support SQL split_part function
>>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>    filter by self way
>>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>    readers
>>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>>    shuffle service
>>    15. SPARK-37831: Add task partition id in metrics
>>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>    17. SPARK-36664: Log time spent waiting for cluster resources
>>    18. SPARK-34659: Web UI does not correctly get appId
>>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>>    20. SPARK-38589: New SQL function: try_avg
>>    21. SPARK-38590: New SQL function: try_to_binary
>>    22. SPARK-34079: Improvement CTE table scan
>>
>> Best regards,
>> Max Gekk
>>
>>
>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>>
>>> Is the feature freeze target date March 22nd then? I saw a few dates
>>> thrown around and want to confirm what we landed on.
>>>
>>> I am trying to get the following improvements through review and merged;
>>> if there are concerns with either, let me know:
>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>>> <https://github.com/apache/spark/pull/32298#>
>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>>> for released executors <https://github.com/apache/spark/pull/35085#>
>>>
>>> Tom
>>>
>>>
>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>>> ltnwgl@gmail.com> wrote:
>>>
>>>
>>> I'd like to add the following new SQL functions in the 3.3 release.
>>> These functions are useful when overflow or encoding errors occur:
>>>
>>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>>    <https://github.com/apache/spark/pull/35848>
>>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>>    <https://github.com/apache/spark/pull/35896>
>>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>>    <https://github.com/apache/spark/pull/35897>
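As a rough sketch of the intended try_* semantics (return NULL instead of raising on overflow or malformed input), the behavior can be modeled in plain Python. The helpers below are illustrative assumptions based on the JIRA titles, not Spark's actual implementation:

```python
import base64

def try_to_binary(s, fmt="base64"):
    """Sketch of try_to_binary: decode, returning None (NULL) on bad input."""
    if s is None:
        return None
    try:
        if fmt == "base64":
            return base64.b64decode(s, validate=True)
        if fmt == "utf-8":
            return s.encode("utf-8")
        return None
    except (ValueError, UnicodeEncodeError):
        return None

def try_avg(values):
    """Sketch of try_avg: mean of non-NULL values, None on empty/overflow."""
    vals = [v for v in values if v is not None]
    if not vals:
        return None
    try:
        return sum(vals) / len(vals)
    except (OverflowError, ArithmeticError):
        return None

print(try_to_binary("U3Bhcms="))     # b'Spark'
print(try_to_binary("not base64!"))  # None instead of an error
print(try_avg([1, 2, None, 3]))      # 2.0
```

The key point is the same across all three functions: errors become NULL rather than failing the query.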
>>>
>>> Gengliang
>>>
>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I've been trying for a bit to get the following two PRs merged and
>>> into a release, and I'm having some difficulty moving them forward:
>>>
>>> https://github.com/apache/spark/pull/34903 - This passes the current
>>> Python interpreter to spark-env.sh to allow some currently unavailable
>>> customization to happen
>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>>> SparkUI reverse proxy-handling code where it does a greedy match for
>>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>>> wrong place.
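The failure mode described above can be sketched in a few lines of Python; the path shape and regexes below are hypothetical illustrations of the first-match problem, not the actual SparkUI code:

```python
import re

# Hypothetical proxied UI path: the word "proxy" also appears in an
# earlier path segment, which is what trips up a first-match rewrite.
path = "/my-proxy/proxy/app-123/jobs/"

# Buggy extraction: take whatever follows the FIRST "proxy/" substring.
# The first "proxy/" here is inside "/my-proxy/", so the "app ID" found
# is actually the literal next segment, "proxy".
buggy = re.search(r"proxy/([^/]+)", path).group(1)

# Safer extraction: require "proxy" to be a full path segment.
fixed = re.search(r"(?:^|/)proxy/([^/]+)", path).group(1)

print(buggy)  # proxy
print(fixed)  # app-123
```

Any rewrite or capture keyed on the bare substring "proxy" picks up the wrong segment as soon as that word appears elsewhere in the URL.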
>>>
>>> I'm not exactly sure of how to get attention of PRs that have been
>>> sitting around for a while, but these are really important to our
>>> use-cases, and it would be nice to have them merged in.
>>>
>>> Cheers
>>> Andrew
>>>
>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>>> wrote:
>>> >
>>> > I'd like to add/backport the logging in
>>> https://github.com/apache/spark/pull/35881 PR so that when users submit
>>> issues with dynamic allocation we can better debug what's going on.
>>> >
>>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>>> >>
>>> >> There is one item on our side that we want to backport to 3.3:
>>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>> >>
>>> >> It's already reviewed and approved.
>>> >>
>>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>>> <tg...@yahoo.com.invalid> wrote:
>>> >> >
>>> >> > It looks like the version hasn't been updated on master and still
>>> shows 3.3.0-SNAPSHOT; can you please update that?
>>> >> >
>>> >> > Tom
>>> >> >
>>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>>> maxim.gekk@databricks.com.invalid> wrote:
>>> >> >
>>> >> >
>>> >> > Hi All,
>>> >> >
>>> >> > I have created the branch for Spark 3.3:
>>> >> > https://github.com/apache/spark/commits/branch-3.3
>>> >> >
>>> >> > Please backport important fixes to it, and if you have any
>>> doubts, ping me in the PR. Regarding new features, we are still building
>>> the allow list for branch-3.3.
>>> >> >
>>> >> > Best regards,
>>> >> > Max Gekk
>>> >> >
>>> >> >
>>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>>> dongjoon.hyun@gmail.com> wrote:
>>> >> >
>>> >> > Yes, I agree with you for your whitelist approach for backporting.
>>> :)
>>> >> > Thank you for summarizing.
>>> >> >
>>> >> > Thanks,
>>> >> > Dongjoon.
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > I think I finally got your point. What you want to keep unchanged
>>> is the branch cut date of Spark 3.3. Today, or this Friday? This is not a
>>> big deal.
>>> >> >
>>> >> > My major concern is whether we should keep merging feature work
>>> or dependency upgrades after the branch cut. To make our release time
>>> more predictable, I suggest we finalize the exception PR list first,
>>> instead of merging them in an ad hoc way. In the past, we spent a lot of
>>> time on reverts of PRs that were merged after the branch cut. I hope we
>>> can minimize unnecessary arguments in this release. Do you agree,
>>> Dongjoon?
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 3:55 PM, Dongjoon Hyun <do...@gmail.com> wrote:
>>> >> >
>>> >> > That is not totally fine, Xiao. It sounds like you are asking for
>>> a change of plan without a proper reason.
>>> >> >
>>> >> > Although we cut the branch today according to our plan, you can
>>> still collect the list and make a list of exceptions. I'm not blocking
>>> what you want to do.
>>> >> >
>>> >> > Please let the community start to ramp down as we agreed before.
>>> >> >
>>> >> > Dongjoon
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > Please do not get me wrong. If we don't cut a branch, we are
>>> allowing all patches to land in Apache Spark 3.3. That is totally fine.
>>> After we cut the branch, we should avoid merging feature work. In the
>>> next three days, let us collect the actively developed PRs that we want
>>> to make an exception for (i.e., merge to 3.3 after the upcoming branch
>>> cut). Does that make sense?
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 2:54 PM, Dongjoon Hyun <do...@gmail.com> wrote:
>>> >> >
>>> >> > Xiao, you are working against what you are saying.
>>> >> > If you don't cut a branch, it means you are allowing all patches to
>>> land in Apache Spark 3.3. No?
>>> >> >
>>> >> > > we need to avoid backporting the feature work that is not being
>>> well discussed.
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > Cutting the branch is simple, but we need to avoid backporting
>>> feature work that is not being well discussed. Not all the members are
>>> actively following the dev list. I think we should wait 3 more days to
>>> collect the PR list before cutting the branch.
>>> >> >
>>> >> > BTW, there is very little 3.4-only feature work that will be affected.
>>> >> >
>>> >> > Xiao
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 11:49 AM, Dongjoon Hyun <do...@gmail.com> wrote:
>>> >> >
>>> >> > Hi, Max, Chao, Xiao, Holden and all.
>>> >> >
>>> >> > I have a different idea.
>>> >> >
>>> >> > Given the situation and the small patch list, I don't think we need
>>> to postpone the branch cut for those patches. It's easier to cut
>>> branch-3.3 and allow backporting.
>>> >> >
>>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>>> the branch. This situation will only get worse because there is no way to
>>> block other patches from landing unintentionally if we don't cut a
>>> branch.
>>> >> >
>>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>>> values
>>> >> >
>>> >> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>> >> >
>>> >> > Best,
>>> >> > Dongjoon.
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>>> wrote:
>>> >> >
>>> >> > Cool, thanks for clarifying!
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> > >>
>>> >> > >> For the following list:
>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>> >> > >
>>> >> > >
>>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > On Tue, Mar 15, 2022 at 10:06 AM, Chao Sun <su...@apache.org> wrote:
>>> >> > >>
>>> >> > >> Hi Xiao,
>>> >> > >>
>>> >> > >> For the following list:
>>> >> > >>
>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> > >>
>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>> >> > >>
>>> >> > >> Thanks,
>>> >> > >> Chao
>>> >> > >>
>>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>> dongjoon.hyun@gmail.com> wrote:
>>> >> > >> >
>>> >> > >> > The following was tested and merged a few minutes ago. So, we
>>> can remove it from the list.
>>> >> > >> >
>>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> > >> >
>>> >> > >> > Thanks,
>>> >> > >> > Dongjoon.
>>> >> > >> >
>>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> > >> >>
>>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>> days to collect the list of actively developed PRs that we want to merge to
>>> 3.3 after the branch cut?
>>> >> > >> >>
>>> >> > >> >> Please do not rush to merge the PRs that are not fully
>>> reviewed. We can cut the branch this Friday and continue merging the PRs
>>> that have been discussed in this thread. Does that make sense?
>>> >> > >> >>
>>> >> > >> >> Xiao
>>> >> > >> >>
>>> >> > >> >>
>>> >> > >> >>
>>> >> > >> >> On Tue, Mar 15, 2022 at 9:10 AM, Holden Karau <ho...@pigscanfly.ca> wrote:
>>> >> > >> >>>
>>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>>> everyone a bit of breathing space? Rushed software development more often
>>> results in bugs.
>>> >> > >> >>>
>>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>>> yikunkero@gmail.com> wrote:
>>> >> > >> >>>>
>>> >> > >> >>>> > To make our release time more predictable, let us collect
>>> the PRs and wait three more days before the branch cut?
>>> >> > >> >>>>
>>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>> v1.5.1
>>> >> > >> >>>>
>>> >> > >> >>>> Three more days are OK for this from my view.
>>> >> > >> >>>>
>>> >> > >> >>>> Regards,
>>> >> > >> >>>> Yikun
>>> >> > >> >>>
>>> >> > >> >>> --
>>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> > >> >>> YouTube Live Streams:
>>> https://www.youtube.com/user/holdenkarau
>>> >
>>> >
>>> >
>>> > --
>>> > Twitter: https://twitter.com/holdenkarau
>>> > Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>
>>>