Posted to dev@spark.apache.org by Maxim Gekk <ma...@databricks.com.INVALID> on 2022/03/03 18:44:37 UTC

Apache Spark 3.3 Release

Hello All,

I would like to raise the topic of the new Spark release 3.3. According
to the public schedule at
https://spark.apache.org/versioning-policy.html, we planned to start the
code freeze and release branch cut on March 15th, 2022. Since this date is
coming soon, I would like to draw your attention to the topic and gather
any objections that you might have.

Below is the list of ongoing and active SPIPs:

Spark SQL:
- [SPARK-31357] DataSourceV2: Catalog API for view metadata
- [SPARK-35801] Row-level operations in Data Source V2
- [SPARK-37166] Storage Partitioned Join

Spark Core:
- [SPARK-20624] Add better handling for node shutdown
- [SPARK-25299] Use remote storage for persisting shuffle data

PySpark:
- [SPARK-26413] RDD Arrow Support in Spark Core and PySpark

Kubernetes:
- [SPARK-36057] Support Customized Kubernetes Schedulers

We should probably finish any remaining work for Spark 3.3, switch to QA
mode, cut a branch, and keep everything on track. I would like to
volunteer to help drive this process.

Best regards,
Max Gekk

Re: Apache Spark 3.3 Release

Posted by Yikun Jiang <yi...@gmail.com>.
@Maxim Thanks for driving the release!

> Not sure about SPARK-36057, given its current state.

@Igor Costa Thanks for your attention. As Dongjoon said, the basic
framework capabilities for custom schedulers are already supported, and we
are also planning to mark this feature as beta in 3.3.0. Of course, we
will do more testing to make sure it is stable, and we welcome more input
to keep improving it.
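As a concrete sketch of what enabling a custom Kubernetes scheduler could look like, the snippet below assembles a spark-submit command line. The conf key `spark.kubernetes.scheduler.name` and the "volcano" scheduler name are assumptions based on the SPARK-36057 work, so check them against your Spark build before relying on them:

```python
def k8s_submit_args(app, scheduler="volcano", image="spark:3.3.0"):
    """Build a spark-submit argv that asks Kubernetes to use a custom
    pod scheduler. The conf key `spark.kubernetes.scheduler.name` is
    assumed from SPARK-36057 and may differ in your Spark version."""
    conf = {
        "spark.kubernetes.container.image": image,
        # Schedule driver/executor pods with the named custom scheduler.
        "spark.kubernetes.scheduler.name": scheduler,
    }
    args = [
        "spark-submit",
        "--master", "k8s://https://example.invalid:6443",  # placeholder API server
        "--deploy-mode", "cluster",
    ]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args + [app]

print(k8s_submit_args("local:///opt/spark/examples/src/main/python/pi.py"))
```

The same conf can of course be set in spark-defaults.conf or on a SparkSession builder; the argv form is just the easiest to show in a mail.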

> I don't think that could be a blocker for Apache Spark 3.2.0.

Yep, and v3.3.0, : )

Regards,
Yikun

Re: Apache Spark 3.3 Release

Posted by Dongjoon Hyun <do...@gmail.com>.
I've reviewed most of the actual code in that area.

That's pretty much an experimental feature still.

I don't think that could be a blocker for Apache Spark 3.2.0.

Dongjoon.




Re: Apache Spark 3.3 Release

Posted by Igor Costa <ig...@gmail.com>.
Thanks Maxim,

The code freeze by end of this month would be fine. Not sure about
SPARK-36057, given its current state.



Thanks


Re: Apache Spark 3.3 Release

Posted by Jungtaek Lim <ka...@gmail.com>.
Thanks Maxim for volunteering to drive the release! I support the plan
(March 15th) to perform a release branch cut.

Btw, would we be open for modification of critical/blocker issues after the
release branch cut? I have a blocker JIRA ticket and the PR is open for
reviewing, but need some time to gain traction as well as going through
actual reviews. My guess is yes but to confirm again.


Re: Apache Spark 3.3 Release

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you, Max, for volunteering for Apache Spark 3.3 release manager.

Ya, I'm also +1 for the original plan.

Dongjoon


Re: Apache Spark 3.3 Release

Posted by Mridul Muralidharan <mr...@gmail.com>.
Agree with Sean, code freeze by mid March sounds good.

Regards,
Mridul


Re: Apache Spark 3.3 Release

Posted by Sean Owen <sr...@gmail.com>.
I think it's fine to pursue the existing plan - code freeze in two weeks
and try to close off key remaining issues. Final release pending on how
those go, and testing, but fine to get the ball rolling.


Re: Apache Spark 3.3 Release

Posted by Maciej <ms...@gmail.com>.
Thanks for the update, Max!

Just a small clarification ‒ the following should be moved to RESOLVED:

1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
3. SPARK-37093: Inline type hints python/pyspark/streaming

On 4/28/22 14:42, Maxim Gekk wrote:
> Hello All,
> 
> I am going to create the first release candidate of Spark 3.3 at the
> beginning of the next week if there are no objections. Below is the list
> of allow features, and their current status. At the moment, only one
> feature is still in progress, but it can be postponed to the next
> release, I guess:
> 
> IN PROGRESS:
> 
>  1. SPARK-28516: Data Type Formatting Functions: `to_char`
> 
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
> 
>  1. SPARK-37650: Tell spark-env.sh the python interpreter
>  2. SPARK-36664: Log time spent waiting for cluster resources
>  3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
>  4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>  5. SPARK-37093: Inline type hints python/pyspark/streaming
> 
> RESOLVED:
> 
>  1. SPARK-32268: Bloom Filter Join
>  2. SPARK-38548: New SQL function: try_sum
>  3. SPARK-38063: Support SQL split_part function
>  4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>     filter by self way
>  5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
>  6. SPARK-38194: Make Yarn memory overhead factor configurable
>  7. SPARK-37618: Support cleaning up shuffle blocks from external
>     shuffle service
>  8. SPARK-37831: Add task partition id in metrics
>  9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>     DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 10. SPARK-38590: New SQL function: try_to_binary
> 11. SPARK-37377: Refactor V2 Partitioning interface and remove
>     deprecated usage of Distribution
> 12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>     sources
> 13. SPARK-34659: Web UI does not correctly get appId
> 14. SPARK-38589: New SQL function: try_avg
> 15. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
> 16. SPARK-34079: Improvement CTE table scan
> 
> 
> Max Gekk
> 
> Software Engineer
> 
> Databricks, Inc.
> 
> 
> 
> On Fri, Apr 15, 2022 at 4:28 PM Maxim Gekk <maxim.gekk@databricks.com> wrote:
> 
>     Hello All,
> 
>     Current status of features from the allow list for branch-3.3 is:
> 
>     IN PROGRESS:
> 
>      1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>      2. SPARK-28516: Data Type Formatting Functions: `to_char`
>      3. SPARK-34079: Improvement CTE table scan
> 
>     IN PROGRESS but won't/couldn't be merged to branch-3.3:
> 
>      1. SPARK-37650: Tell spark-env.sh the python interpreter
>      2. SPARK-36664: Log time spent waiting for cluster resources
>      3. SPARK-37396: Inline type hint files for files in
>         python/pyspark/mllib
>      4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>      5. SPARK-37093: Inline type hints python/pyspark/streaming
> 
>     RESOLVED:
> 
>      1. SPARK-32268: Bloom Filter Join
>      2. SPARK-38548: New SQL function: try_sum
>      3. SPARK-38063: Support SQL split_part function
>      4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>         filter by self way
>      5. SPARK-34863: Support nested column in Spark Parquet vectorized
>         readers
>      6. SPARK-38194: Make Yarn memory overhead factor configurable
>      7. SPARK-37618: Support cleaning up shuffle blocks from external
>         shuffle service
>      8. SPARK-37831: Add task partition id in metrics
>      9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>         DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>     10. SPARK-38590: New SQL function: try_to_binary
>     11. SPARK-37377: Refactor V2 Partitioning interface and remove
>         deprecated usage of Distribution
>     12. SPARK-38085: DataSource V2: Handle DELETE commands for
>         group-based sources
>     13. SPARK-34659: Web UI does not correctly get appId
>     14. SPARK-38589: New SQL function: try_avg
> 
> 
>     Max Gekk
> 
>     Software Engineer
> 
>     Databricks, Inc.
> 
> 
> 
>     On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk <maxim.gekk@databricks.com> wrote:
> 
>         Hello All,
> 
>         Below is current status of features from the allow list:
> 
>         IN PROGRESS:
> 
>          1. SPARK-37396: Inline type hint files for files in
>             python/pyspark/mllib
>          2. SPARK-37395: Inline type hint files for files in
>             python/pyspark/ml
>          3. SPARK-37093: Inline type hints python/pyspark/streaming
>          4. SPARK-37377: Refactor V2 Partitioning interface and remove
>             deprecated usage of Distribution
>          5. SPARK-38085: DataSource V2: Handle DELETE commands for
>             group-based sources
>          6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>          7. SPARK-28516: Data Type Formatting Functions: `to_char`
>          8. SPARK-36664: Log time spent waiting for cluster resources
>          9. SPARK-34659: Web UI does not correctly get appId
>         10. SPARK-37650: Tell spark-env.sh the python interpreter
>         11. SPARK-38589: New SQL function: try_avg
>         12. SPARK-38590: New SQL function: try_to_binary
>         13. SPARK-34079: Improvement CTE table scan
> 
>         RESOLVED:
> 
>          1. SPARK-32268: Bloom Filter Join
>          2. SPARK-38548: New SQL function: try_sum
>          3. SPARK-38063: Support SQL split_part function
>          4. SPARK-38432: Refactor framework so as JDBC dialect could
>             compile filter by self way
>          5. SPARK-34863: Support nested column in Spark Parquet
>             vectorized readers
>          6. SPARK-38194: Make Yarn memory overhead factor configurable
>          7. SPARK-37618: Support cleaning up shuffle blocks from
>             external shuffle service
>          8. SPARK-37831: Add task partition id in metrics
>          9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>             DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
> 
>         We need to decide whether we are going to wait a little bit more
>         or close the doors.
> 
>         Maxim Gekk
> 
>         Software Engineer
> 
>         Databricks, Inc.
> 
> 
> 
>         On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk
>         <maxim.gekk@databricks.com> wrote:
> 
>             Hi All,
> 
>             Here is the allow list which I built based on your requests
>             in this thread:
> 
>              1. SPARK-37396: Inline type hint files for files in
>                 python/pyspark/mllib
>              2. SPARK-37395: Inline type hint files for files in
>                 python/pyspark/ml
>              3. SPARK-37093: Inline type hints python/pyspark/streaming
>              4. SPARK-37377: Refactor V2 Partitioning interface and
>                 remove deprecated usage of Distribution
>              5. SPARK-38085: DataSource V2: Handle DELETE commands for
>                 group-based sources
>              6. SPARK-32268: Bloom Filter Join
>              7. SPARK-38548: New SQL function: try_sum
>              8. SPARK-37691: Support ANSI Aggregation Function:
>                 percentile_disc
>              9. SPARK-38063: Support SQL split_part function
>             10. SPARK-28516: Data Type Formatting Functions: `to_char`
>             11. SPARK-38432: Refactor framework so as JDBC dialect could
>                 compile filter by self way
>             12. SPARK-34863: Support nested column in Spark Parquet
>                 vectorized readers
>             13. SPARK-38194: Make Yarn memory overhead factor configurable
>             14. SPARK-37618: Support cleaning up shuffle blocks from
>                 external shuffle service
>             15. SPARK-37831: Add task partition id in metrics
>             16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>                 DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>             17. SPARK-36664: Log time spent waiting for cluster resources
>             18. SPARK-34659: Web UI does not correctly get appId
>             19. SPARK-37650: Tell spark-env.sh the python interpreter
>             20. SPARK-38589: New SQL function: try_avg
>             21. SPARK-38590: New SQL function: try_to_binary
>             22. SPARK-34079: Improvement CTE table scan
> 
>             Best regards,
>             Max Gekk
> 
> 
>             On Thu, Mar 17, 2022 at 4:59 PM Tom Graves
>             <tgraves_cs@yahoo.com> wrote:
> 
>                 Is the feature freeze target date March 22nd then?  I
>                 saw a few dates thrown around and want to confirm what we
>                 landed on.
> 
>                 I am trying to get the following improvements through
>                 review and merged; if there are concerns with either, let me know:
>                 - [SPARK-34079][SQL] Merge non-correlated scalar
>                 subqueries <https://github.com/apache/spark/pull/32298#>
>                 - [SPARK-37618][CORE] Remove shuffle blocks using the
>                 shuffle service for released executors
>                 <https://github.com/apache/spark/pull/35085#>
> 
>                 Tom
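For readers unfamiliar with SPARK-34079, the idea is that several non-correlated scalar subqueries over the same relation, e.g. `(SELECT MAX(x) FROM t)` and `(SELECT MIN(x) FROM t)`, can be merged so the table is scanned once instead of once per subquery. A minimal plain-Python analogue of the effect (not Spark code):

```python
def two_scans(rows):
    """Naive plan: each scalar subquery triggers its own pass over the data."""
    return max(rows), min(rows)  # two scans of `rows`

def merged_scan(rows):
    """Merged plan: both aggregates are computed in a single pass."""
    hi = lo = None
    for x in rows:  # one scan of `rows`
        hi = x if hi is None or x > hi else hi
        lo = x if lo is None or x < lo else lo
    return hi, lo

# Both plans return the same answer; the merged one reads the data once.
assert two_scans([3, 1, 4, 1, 5]) == merged_scan([3, 1, 4, 1, 5]) == (5, 1)
```

In Spark the merge happens in the optimizer over query plans rather than Python loops, but the saved work is the same: duplicate scans of the underlying relation.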
> 
> 
>                 On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang
>                 Wang <ltnwgl@gmail.com> wrote:
> 
> 
>                 I'd like to add the following new SQL functions in the
>                 3.3 release. These functions are useful when overflow or
>                 encoding errors occur:
> 
>                   * [SPARK-38548][SQL] New SQL function: try_sum
>                     <https://github.com/apache/spark/pull/35848> 
>                   * [SPARK-38589][SQL] New SQL function: try_avg
>                     <https://github.com/apache/spark/pull/35896>
>                   * [SPARK-38590][SQL] New SQL function: try_to_binary
>                     <https://github.com/apache/spark/pull/35897> 
> 
>                 Gengliang
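To illustrate the semantics Gengliang describes, here is a rough Python analogue (not Spark code; treating the sum as a signed 64-bit long is an assumption for illustration): under ANSI mode a plain SUM raises on overflow, while try_sum returns NULL (modelled as None here) instead:

```python
I64_MAX = 2**63 - 1
I64_MIN = -(2**63)

def ansi_sum(xs):
    """Analogue of SUM under ANSI mode: error out on 64-bit overflow."""
    total = 0
    for x in xs:
        total += x
        if not (I64_MIN <= total <= I64_MAX):
            raise OverflowError("long overflow")
    return total

def try_sum(xs):
    """Analogue of try_sum: same result, but NULL (None) on overflow."""
    try:
        return ansi_sum(xs)
    except OverflowError:
        return None

assert try_sum([1, 2, 3]) == 6
assert try_sum([I64_MAX, 1]) is None  # overflow -> NULL instead of an error
```

try_avg and try_to_binary follow the same pattern: the error case (overflow, or an unparseable input for try_to_binary) is turned into NULL rather than a failed query.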
> 
>                 On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo
>                 <andrew.melo@gmail.com> wrote:
> 
>                     Hello,
> 
>                     I've been trying for a bit to get the following two
>                     PRs merged and
>                     into a release, and I'm having some difficulty
>                     moving them forward:
> 
>                     https://github.com/apache/spark/pull/34903 - This
>                     passes the current
>                     python interpreter to spark-env.sh to allow some
>                     currently-unavailable
>                     customization to happen
>                     https://github.com/apache/spark/pull/31774 - This
>                     fixes a bug in the
>                     SparkUI reverse proxy-handling code where it does a
>                     greedy match for
>                     "proxy" in the URL, and will mistakenly replace the
>                     App-ID in the
>                     wrong place.
> 
>                     I'm not exactly sure of how to get attention of PRs
>                     that have been
>                     sitting around for a while, but these are really
>                     important to our
>                     use-cases, and it would be nice to have them merged in.
> 
>                     Cheers
>                     Andrew
> 
>                     On Wed, Mar 16, 2022 at 6:21 PM Holden Karau
>                     <holden@pigscanfly.ca> wrote:
>                     >
>                     > I'd like to add/backport the logging in
>                     https://github.com/apache/spark/pull/35881 PR so
>                     that when users submit issues with dynamic
>                     allocation we can better debug what's going on.
>                     >
>                     > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun
>                     <sunchao@apache.org> wrote:
>                     >>
>                     >> There is one item on our side that we want to
>                     backport to 3.3:
>                     >> - vectorized
>                     DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>                     >> Parquet V2 support
>                     (https://github.com/apache/spark/pull/35262)
>                     >>
>                     >> It's already reviewed and approved.
>                     >>
>                     >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>                     <tg...@yahoo.com.invalid> wrote:
>                     >> >
>                     >> > It looks like the version hasn't been updated
>                     on master and still shows 3.3.0-SNAPSHOT, can you
>                     please update that.
>                     >> >
>                     >> > Tom
>                     >> >
>                     >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT,
>                     Maxim Gekk <maxim.gekk@databricks.com.invalid> wrote:
>                     >> >
>                     >> >
>                     >> > Hi All,
>                     >> >
>                     >> > I have created the branch for Spark 3.3:
>                     >> >
>                     https://github.com/apache/spark/commits/branch-3.3
>                     >> >
>                     >> > Please, backport important fixes to it, and if
>                     you have some doubts, ping me in the PR. Regarding
>                     new features, we are still building the allow list
>                     for branch-3.3.
>                     >> >
>                     >> > Best regards,
>                     >> > Max Gekk
>                     >> >
>                     >> >
>                     >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun
>                     <dongjoon.hyun@gmail.com> wrote:
>                     >> >
>                     >> > Yes, I agree with you for your whitelist
>                     approach for backporting. :)
>                     >> > Thank you for summarizing.
>                     >> >
>                     >> > Thanks,
>                     >> > Dongjoon.
>                     >> >
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li
>                     <gatorsmile@gmail.com> wrote:
>                     >> >
>                     >> > I think I finally got your point. What you want
>                     to keep unchanged is the branch cut date of Spark
>                     3.3. Today? or this Friday? This is not a big deal.
>                     >> >
>                     >> > My major concern is whether we should keep
>                     merging the feature work or the dependency upgrade
>                     after the branch cut. To make our release time more
>                     predictable, I am suggesting we should finalize the
>                     exception PR list first, instead of merging them in
>                     an ad hoc way. In the past, we spent a lot of time
>                     on the revert of the PRs that were merged after the
>                     branch cut. I hope we can minimize unnecessary
>                     arguments in this release. Do you agree, Dongjoon?
>                     >> >
>                     >> >
>                     >> >
>                     >> > Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote on Tue, Mar 15, 2022 at 15:55:
>                     >> >
>                     >> > That is not totally fine, Xiao. It sounds like
>                     you are asking for a change of plan without a proper reason.
>                     >> >
>                     >> > Although we cut the branch Today according our
>                     plan, you still can collect the list and make a list
>                     of exceptions. I'm not blocking what you want to do.
>                     >> >
>                     >> > Please let the community start to ramp down as
>                     we agreed before.
>                     >> >
>                     >> > Dongjoon
>                     >> >
>                     >> >
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> >
>                     >> > Please do not get me wrong. If we don't cut a
>                     branch, we are allowing all patches to land Apache
>                     Spark 3.3. That is totally fine. After we cut the
>                     branch, we should avoid merging the feature work. In
>                     the next three days, let us collect the actively
>                     developed PRs that we want to make an exception
>                     (i.e., merged to 3.3 after the upcoming branch cut).
>                     Does that make sense?
>                     >> >
>                     >> > Dongjoon Hyun <dongjoon.hyun@gmail.com
>                     <ma...@gmail.com>> 于2022年3月15日周
>                     二 14:54写道:
>                     >> >
>                     >> > Xiao. You are working against what you are saying.
>                     >> > If you don't cut a branch, it means you are
>                     allowing all patches to land Apache Spark 3.3. No?
>                     >> >
>                     >> > > we need to avoid backporting the feature work
>                     that are not being well discussed.
>                     >> >
>                     >> >
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> >
>                     >> > Cutting the branch is simple, but we need to
>                     avoid backporting the feature work that are not
>                     being well discussed. Not all the members are
>                     actively following the dev list. I think we should
>                     wait 3 more days for collecting the PR list before
>                     cutting the branch.
>                     >> >
>                     >> > BTW, there are very few 3.4-only feature work
>                     that will be affected.
>                     >> >
>                     >> > Xiao
>                     >> >
>                     >> > Dongjoon Hyun <dongjoon.hyun@gmail.com
>                     <ma...@gmail.com>> 于2022年3月15日周
>                     二 11:49写道:
>                     >> >
>                     >> > Hi, Max, Chao, Xiao, Holden and all.
>                     >> >
>                     >> > I have a different idea.
>                     >> >
>                     >> > Given the situation and small patch list, I
>                     don't think we need to postpone the branch cut for
>                     those patches. It's easier to cut a branch-3.3 and
>                     allow backporting.
>                     >> >
>                     >> > As of today, we already have an obvious Apache
>                     Spark 3.4 patch in the branch together. This
>                     situation only becomes worse and worse because there
>                     is no way to block the other patches from landing
>                     unintentionally if we don't cut a branch.
>                     >> >
>                     >> >     [SPARK-38335][SQL] Implement parser support
>                     for DEFAULT column values
>                     >> >
>                     >> > Let's cut `branch-3.3` Today for Apache Spark
>                     3.3.0 preparation.
>                     >> >
>                     >> > Best,
>                     >> > Dongjoon.
>                     >> >
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun
>                     <sunchao@apache.org <ma...@apache.org>> wrote:
>                     >> >
>                     >> > Cool, thanks for clarifying!
>                     >> >
>                     >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> > >>
>                     >> > >> For the following list:
>                     >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime
>                     Filtering
>                     >> > >> #34659 [SPARK-34863][SQL] Support complex
>                     types for Parquet vectorized reader
>                     >> > >> #35848 [SPARK-38548][SQL] New SQL function:
>                     try_sum
>                     >> > >> Do you mean we should include them, or
>                     exclude them from 3.3?
>                     >> > >
>                     >> > >
>                     >> > > If possible, I hope these features can be
>                     shipped with Spark 3.3.
>                     >> > >
>                     >> > >
>                     >> > >
>                     >> > > Chao Sun <sunchao@apache.org
>                     <ma...@apache.org>> 于2022年3月15日周二
>                     10:06写道:
>                     >> > >>
>                     >> > >> Hi Xiao,
>                     >> > >>
>                     >> > >> For the following list:
>                     >> > >>
>                     >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime
>                     Filtering
>                     >> > >> #34659 [SPARK-34863][SQL] Support complex
>                     types for Parquet vectorized reader
>                     >> > >> #35848 [SPARK-38548][SQL] New SQL function:
>                     try_sum
>                     >> > >>
>                     >> > >> Do you mean we should include them, or
>                     exclude them from 3.3?
>                     >> > >>
>                     >> > >> Thanks,
>                     >> > >> Chao
>                     >> > >>
>                     >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon
>                     Hyun <dongjoon.hyun@gmail.com
>                     <ma...@gmail.com>> wrote:
>                     >> > >> >
>                     >> > >> > The following was tested and merged a few
>                     minutes ago. So, we can remove it from the list.
>                     >> > >> >
>                     >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S]
>                     Bump Volcano to v1.5.1
>                     >> > >> >
>                     >> > >> > Thanks,
>                     >> > >> > Dongjoon.
>                     >> > >> >
>                     >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li
>                     <gatorsmile@gmail.com <ma...@gmail.com>>
>                     wrote:
>                     >> > >> >>
>                     >> > >> >> Let me clarify my above suggestion. Maybe
>                     we can wait 3 more days to collect the list of
>                     actively developed PRs that we want to merge to 3.3
>                     after the branch cut?
>                     >> > >> >>
>                     >> > >> >> Please do not rush to merge the PRs that
>                     are not fully reviewed. We can cut the branch this
>                     Friday and continue merging the PRs that have been
>                     discussed in this thread. Does that make sense?
>                     >> > >> >>
>                     >> > >> >> Xiao
>                     >> > >> >>
>                     >> > >> >>
>                     >> > >> >>
>                     >> > >> >> Holden Karau <holden@pigscanfly.ca
>                     <ma...@pigscanfly.ca>> 于2022年3月15日周二
>                     09:10写道:
>                     >> > >> >>>
>                     >> > >> >>> May I suggest we push out one week
>                     (22nd) just to give everyone a bit of breathing
>                     space? Rushed software development more often
>                     results in bugs.
>                     >> > >> >>>
>                     >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun
>                     Jiang <yikunkero@gmail.com
>                     <ma...@gmail.com>> wrote:
>                     >> > >> >>>>
>                     >> > >> >>>> > To make our release time more
>                     predictable, let us collect the PRs and wait three
>                     more days before the branch cut?
>                     >> > >> >>>>
>                     >> > >> >>>> For SPIP: Support Customized Kubernetes
>                     Schedulers:
>                     >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S]
>                     Bump Volcano to v1.5.1
>                     >> > >> >>>>
>                     >> > >> >>>> Three more days are OK for this from my
>                     view.
>                     >> > >> >>>>
>                     >> > >> >>>> Regards,
>                     >> > >> >>>> Yikun
>                     >> > >> >>>
>                     >> > >> >>> --
>                     >> > >> >>> Twitter: https://twitter.com/holdenkarau
>                     <https://twitter.com/holdenkarau>
>                     >> > >> >>> Books (Learning Spark, High Performance
>                     Spark, etc.): https://amzn.to/2MaRAG9
>                     <https://amzn.to/2MaRAG9>
>                     >> > >> >>> YouTube Live Streams:
>                     https://www.youtube.com/user/holdenkarau
>                     <https://www.youtube.com/user/holdenkarau>
>                     >
>                     >
>                     >
>                     > --
>                     > Twitter: https://twitter.com/holdenkarau
>                     <https://twitter.com/holdenkarau>
>                     > Books (Learning Spark, High Performance Spark,
>                     etc.): https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>                     > YouTube Live Streams:
>                     https://www.youtube.com/user/holdenkarau
>                     <https://www.youtube.com/user/holdenkarau>
> 
>                     ---------------------------------------------------------------------
>                     To unsubscribe e-mail:
>                     dev-unsubscribe@spark.apache.org
>                     <ma...@spark.apache.org>
> 


-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

Re: Apache Spark 3.3 Release

Posted by Jacky Lee <qc...@gmail.com>.
I also have a PR that has been ready to merge for a while. Can we merge it
into 3.3.0?
[SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics
https://github.com/apache/spark/pull/35185

Adam Binford <ad...@gmail.com> wrote on Wed, Mar 16, 2022 at 21:16:

> Also throwing my hat in for two of my PRs that should be ready just need
> final reviews/approval:
> Removing shuffles from deallocated executors using the shuffle service:
> https://github.com/apache/spark/pull/35085. This has been asked for for
> several years across many issues.
> Configurable memory overhead factor:
> https://github.com/apache/spark/pull/35504
>
> Adam
>
> On Wed, Mar 16, 2022 at 8:53 AM Wenchen Fan <cl...@gmail.com> wrote:
>
>> +1 to define an allowlist of features that we want to backport to branch
>> 3.3. I also have a few in my mind
>> complex type support in vectorized parquet reader:
>> https://github.com/apache/spark/pull/34659
>> refine the DS v2 filter API for JDBC v2:
>> https://github.com/apache/spark/pull/35768
>> a few new SQL functions that have been in development for a while:
>> to_char, split_part, percentile_disc, try_sum, etc.
>>
>> On Wed, Mar 16, 2022 at 2:41 PM Maxim Gekk
>> <ma...@databricks.com.invalid> wrote:
>>
>>> Hi All,
>>>
>>> I have created the branch for Spark 3.3:
>>> https://github.com/apache/spark/commits/branch-3.3
>>>
>>> Please, backport important fixes to it, and if you have some doubts,
>>> ping me in the PR. Regarding new features, we are still building the allow
>>> list for branch-3.3.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <do...@gmail.com>
>>> wrote:
>>>
>>>> Yes, I agree with you for your whitelist approach for backporting. :)
>>>> Thank you for summarizing.
>>>>
>>>> Thanks,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com> wrote:
>>>>
>>>>> I think I finally got your point. What you want to keep unchanged is
>>>>> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
>>>>> deal.
>>>>>
>>>>> My major concern is whether we should keep merging the feature work or
>>>>> the dependency upgrade after the branch cut. To make our release time more
>>>>> predictable, I am suggesting we should finalize the exception PR list
>>>>> first, instead of merging them in an ad hoc way. In the past, we spent a
>>>>> lot of time on the revert of the PRs that were merged after the branch cut.
>>>>> I hope we can minimize unnecessary arguments in this release. Do you agree,
>>>>> Dongjoon?
>>>>>
>>>>>
>>>>>
>>>>> Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 15:55:
>>>>>
>>>>>> That is not totally fine, Xiao. It sounds like you are asking a
>>>>>> change of plan without a proper reason.
>>>>>>
>>>>>> Although we cut the branch today according to our plan, you still can
>>>>>> collect the list and make a list of exceptions. I'm not blocking what you
>>>>>> want to do.
>>>>>>
>>>>>> Please let the community start to ramp down as we agreed before.
>>>>>>
>>>>>> Dongjoon
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com> wrote:
>>>>>>
>>>>>>> Please do not get me wrong. If we don't cut a branch, we are
>>>>>>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>>>>>>> we cut the branch, we should avoid merging the feature work. In the next
>>>>>>> three days, let us collect the actively developed PRs that we want to make
>>>>>>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>>>>>>> make sense?
>>>>>>>
>>>>>>> Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 14:54:
>>>>>>>
>>>>>>>> Xiao. You are working against what you are saying.
>>>>>>>> If you don't cut a branch, it means you are allowing all patches to
>>>>>>>> land Apache Spark 3.3. No?
>>>>>>>>
>>>>>>>> > we need to avoid backporting the feature work that are not being
>>>>>>>> well discussed.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Cutting the branch is simple, but we need to avoid backporting the
>>>>>>>>> feature work that are not being well discussed. Not all the members are
>>>>>>>>> actively following the dev list. I think we should wait 3 more days for
>>>>>>>>> collecting the PR list before cutting the branch.
>>>>>>>>>
>>>>>>>>> BTW, there are very few 3.4-only feature work that will be
>>>>>>>>> affected.
>>>>>>>>>
>>>>>>>>> Xiao
>>>>>>>>>
>>>>>>>>> Dongjoon Hyun <do...@gmail.com> wrote on Tue, Mar 15, 2022 at 11:49:
>>>>>>>>>
>>>>>>>>>> Hi, Max, Chao, Xiao, Holden and all.
>>>>>>>>>>
>>>>>>>>>> I have a different idea.
>>>>>>>>>>
>>>>>>>>>> Given the situation and small patch list, I don't think we need
>>>>>>>>>> to postpone the branch cut for those patches. It's easier to cut a
>>>>>>>>>> branch-3.3 and allow backporting.
>>>>>>>>>>
>>>>>>>>>> As of today, we already have an obvious Apache Spark 3.4 patch in
>>>>>>>>>> the branch together. This situation only becomes worse and worse because
>>>>>>>>>> there is no way to block the other patches from landing unintentionally if
>>>>>>>>>> we don't cut a branch.
>>>>>>>>>>
>>>>>>>>>>     [SPARK-38335][SQL] Implement parser support for DEFAULT
>>>>>>>>>> column values
>>>>>>>>>>
>>>>>>>>>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Dongjoon.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Cool, thanks for clarifying!
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>> >>
>>>>>>>>>>> >> For the following list:
>>>>>>>>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>>>>>>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>>>>>>>>> vectorized reader
>>>>>>>>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>>>>>>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > If possible, I hope these features can be shipped with Spark
>>>>>>>>>>> 3.3.
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Chao Sun <su...@apache.org> wrote on Tue, Mar 15, 2022 at 10:06:
>>>>>>>>>>> >>
>>>>>>>>>>> >> Hi Xiao,
>>>>>>>>>>> >>
>>>>>>>>>>> >> For the following list:
>>>>>>>>>>> >>
>>>>>>>>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>>>>>>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>>>>>>>>> vectorized reader
>>>>>>>>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>>>>>>>>> >>
>>>>>>>>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>>>>>>>>> >>
>>>>>>>>>>> >> Thanks,
>>>>>>>>>>> >> Chao
>>>>>>>>>>> >>
>>>>>>>>>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>>>>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > The following was tested and merged a few minutes ago. So,
>>>>>>>>>>> we can remove it from the list.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>>>>>>>>>> v1.5.1
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Thanks,
>>>>>>>>>>> >> > Dongjoon.
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <
>>>>>>>>>>> gatorsmile@gmail.com> wrote:
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3
>>>>>>>>>>> more days to collect the list of actively developed PRs that we want to
>>>>>>>>>>> merge to 3.3 after the branch cut?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Please do not rush to merge the PRs that are not fully
>>>>>>>>>>> reviewed. We can cut the branch this Friday and continue merging the PRs
>>>>>>>>>>> that have been discussed in this thread. Does that make sense?
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Xiao
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Holden Karau <ho...@pigscanfly.ca> wrote on Tue, Mar 15, 2022 at 09:10:
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> May I suggest we push out one week (22nd) just to give
>>>>>>>>>>> everyone a bit of breathing space? Rushed software development more often
>>>>>>>>>>> results in bugs.
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>>>>>>>>>>> yikunkero@gmail.com> wrote:
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> > To make our release time more predictable, let us
>>>>>>>>>>> collect the PRs and wait three more days before the branch cut?
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>>>>>>>>> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>>>>>>>>>> v1.5.1
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> Three more days are OK for this from my view.
>>>>>>>>>>> >> >>>>
>>>>>>>>>>> >> >>>> Regards,
>>>>>>>>>>> >> >>>> Yikun
>>>>>>>>>>> >> >>>
>>>>>>>>>>> >> >>> --
>>>>>>>>>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>>>>> >> >>> YouTube Live Streams:
>>>>>>>>>>> https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>
>>>>>>>>>>
>
> --
> Adam Binford
>

Re: Apache Spark 3.3 Release

Posted by Adam Binford <ad...@gmail.com>.
Also throwing my hat in for two of my PRs that should be ready and just
need final reviews/approval:
Removing shuffles from deallocated executors using the shuffle service:
https://github.com/apache/spark/pull/35085. This has been requested for
several years across many issues.
Configurable memory overhead factor:
https://github.com/apache/spark/pull/35504
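As context for the memory overhead factor above: Spark sizes executor memory
overhead as a fraction of executor memory with a fixed floor (the documented
defaults are a 0.10 factor and a 384 MiB minimum), and the linked PR proposes
making that factor configurable. A minimal sketch of the arithmetic in plain
Python — an illustration only, not Spark's actual code, and the function name
is hypothetical:

```python
# Sketch of executor memory overhead sizing: a fraction of executor
# memory, clamped to a fixed floor. Making `overhead_factor` a parameter
# mirrors the idea of turning the hard-coded 0.10 into a setting.
MIN_OVERHEAD_MIB = 384  # documented Spark floor for the overhead

def memory_overhead_mib(executor_memory_mib, overhead_factor=0.10):
    return max(int(executor_memory_mib * overhead_factor), MIN_OVERHEAD_MIB)

print(memory_overhead_mib(4096))        # 409  (10% of 4 GiB)
print(memory_overhead_mib(4096, 0.25))  # 1024 (larger factor for memory-hungry jobs)
print(memory_overhead_mib(1024))        # 384  (floor applies)
```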

Adam


-- 
Adam Binford

Re: Apache Spark 3.3 Release

Posted by Wenchen Fan <cl...@gmail.com>.
+1 to define an allowlist of features that we want to backport to branch
3.3. I also have a few in mind:
complex type support in vectorized parquet reader:
https://github.com/apache/spark/pull/34659
refine the DS v2 filter API for JDBC v2:
https://github.com/apache/spark/pull/35768
a few new SQL functions that have been in development for a while: to_char,
split_part, percentile_disc, try_sum, etc.
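As context for the `try_*` functions listed above: under ANSI mode they return
NULL instead of raising an error on failures such as numeric overflow. A
minimal sketch of try_sum-style aggregate semantics in plain Python — an
illustration of the intended behavior only, not Spark's implementation; the
signed 64-bit accumulator bound is an assumption:

```python
# Conceptual sketch of try_sum semantics: sum the values, but return
# None (SQL NULL) instead of raising if the running total overflows a
# signed 64-bit accumulator, mirroring the ANSI-mode "try" functions.
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1

def try_sum(values):
    total = 0
    for v in values:
        if v is None:          # SQL aggregates skip NULL inputs
            continue
        total += v
        if not (LONG_MIN <= total <= LONG_MAX):
            return None        # overflow -> NULL rather than an error
    return total

print(try_sum([1, 2, 3]))      # 6
print(try_sum([LONG_MAX, 1]))  # None (overflow)
```

In Spark SQL the call would look roughly like `SELECT try_sum(col) FROM t`,
with a NULL result signaling overflow rather than a runtime failure.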

>>>>>>>>> >>
>>>>>>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>>>>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>>>>>>> vectorized reader
>>>>>>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>>>>>>> >>
>>>>>>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>>>>>>> >>
>>>>>>>>> >> Thanks,
>>>>>>>>> >> Chao
>>>>>>>>> >>
>>>>>>>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>>> >> >
>>>>>>>>> >> > The following was tested and merged a few minutes ago. So, we
>>>>>>>>> can remove it from the list.
>>>>>>>>> >> >
>>>>>>>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>>>>>> >> >
>>>>>>>>> >> > Thanks,
>>>>>>>>> >> > Dongjoon.
>>>>>>>>> >> >
>>>>>>>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> >> >>
>>>>>>>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>>>>>>>> days to collect the list of actively developed PRs that we want to merge to
>>>>>>>>> 3.3 after the branch cut?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Please do not rush to merge the PRs that are not fully
>>>>>>>>> reviewed. We can cut the branch this Friday and continue merging the PRs
>>>>>>>>> that have been discussed in this thread. Does that make sense?
>>>>>>>>> >> >>
>>>>>>>>> >> >> Xiao
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >>
>>>>>>>>> >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> May I suggest we push out one week (22nd) just to give
>>>>>>>>> everyone a bit of breathing space? Rushed software development more often
>>>>>>>>> results in bugs.
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>>>>>>>>> yikunkero@gmail.com> wrote:
>>>>>>>>> >> >>>>
>>>>>>>>> >> >>>> > To make our release time more predictable, let us
>>>>>>>>> collect the PRs and wait three more days before the branch cut?
>>>>>>>>> >> >>>>
>>>>>>>>> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>>>>>>> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>>>>>>>> v1.5.1
>>>>>>>>> >> >>>>
>>>>>>>>> >> >>>> Three more days are OK for this from my view.
>>>>>>>>> >> >>>>
>>>>>>>>> >> >>>> Regards,
>>>>>>>>> >> >>>> Yikun
>>>>>>>>> >> >>>
>>>>>>>>> >> >>> --
>>>>>>>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>>> >> >>> YouTube Live Streams:
>>>>>>>>> https://www.youtube.com/user/holdenkarau
>>>>>>>>>
>>>>>>>>

Re: Apache Spark 3.3 Release

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
 Maybe I'm misunderstanding what you are saying, but according to those dates, code freeze, by which point the majority of features should be merged, is March 15th. So if this list is all features that are not merged at this point, we should probably discuss whether we want them to go in or whether we need to change the dates. Major features going in during the QA period can destabilize things.
Tom
    On Monday, March 21, 2022, 01:53:24 AM CDT, Wenchen Fan <cl...@gmail.com> wrote:  
 
 Just checked the release calendar, the planned RC cut date is April.
Let's revisit after 2 weeks then?
On Mon, Mar 21, 2022 at 2:47 PM Wenchen Fan <cl...@gmail.com> wrote:

Shall we revisit this list after a week? Ideally, they should be either merged or rejected for 3.3, so that we can cut rc1. We can still discuss them case by case at that time if there are exceptions.
On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun <do...@gmail.com> wrote:

Thank you for your summarization.

I believe we need to have a discussion in order to evaluate each PR's readiness.

BTW, `branch-3.3` is still open for bug fixes including minor dependency changes like the following.

(Backported)
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5

(Upcoming)
[SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
[SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0
Dongjoon.


On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk <ma...@databricks.com> wrote:

Hi All,
Here is the allow list which I built based on your requests in this thread:
   - SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   - SPARK-37395: Inline type hint files for files in python/pyspark/ml
   - SPARK-37093: Inline type hints python/pyspark/streaming
   - SPARK-37377: Refactor V2 Partitioning interface and remove deprecated usage of Distribution
   - SPARK-38085: DataSource V2: Handle DELETE commands for group-based sources
   - SPARK-32268: Bloom Filter Join
   - SPARK-38548: New SQL function: try_sum
   - SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   - SPARK-38063: Support SQL split_part function
   - SPARK-28516: Data Type Formatting Functions: `to_char`
   - SPARK-38432: Refactor framework so as JDBC dialect could compile filter by self way
   - SPARK-34863: Support nested column in Spark Parquet vectorized readers
   - SPARK-38194: Make Yarn memory overhead factor configurable
   - SPARK-37618: Support cleaning up shuffle blocks from external shuffle service
   - SPARK-37831: Add task partition id in metrics
   - SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   - SPARK-36664: Log time spent waiting for cluster resources
   - SPARK-34659: Web UI does not correctly get appId
   - SPARK-37650: Tell spark-env.sh the python interpreter
   - SPARK-38589: New SQL function: try_avg
   - SPARK-38590: New SQL function: try_to_binary
   - SPARK-34079: Improvement CTE table scan

Best regards,
Max Gekk

On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:

 Is the feature freeze target date March 22nd then? I saw a few dates thrown around and want to confirm what we landed on.

I am trying to get the following improvements through review and in; if there are concerns with either, let me know:
- [SPARK-34079][SQL] Merge non-correlated scalar subqueries
- [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for released executors

Tom

    On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <lt...@gmail.com> wrote:  
 
 I'd like to add the following new SQL functions in the 3.3 release. These functions are useful when overflow or encoding errors occur:
   - [SPARK-38548][SQL] New SQL function: try_sum
   - [SPARK-38589][SQL] New SQL function: try_avg
   - [SPARK-38590][SQL] New SQL function: try_to_binary

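As a rough illustration of the try_* semantics, here is a plain-Python sketch (not Spark code, and not the actual implementation): instead of raising an error on overflow, the try_* functions return NULL, which the sketch below mimics with None over 64-bit signed integers.

```python
# Illustrative sketch of try_sum-style semantics: return None (SQL NULL)
# on overflow instead of raising, mirroring the described behavior.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def try_sum(values):
    """Sum values, skipping None inputs (as SQL aggregates skip NULL),
    returning None if the running total leaves the signed 64-bit range."""
    total = 0
    for v in values:
        if v is None:
            continue
        total += v
        if not (INT64_MIN <= total <= INT64_MAX):
            return None  # overflow -> NULL rather than an error
    return total

print(try_sum([1, 2, None, 3]))   # 6
print(try_sum([INT64_MAX, 1]))    # None (overflow)
```

The same pattern applies to try_avg and try_to_binary: the happy path matches the non-try variant, and failure cases map to NULL.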
Gengliang
On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com> wrote:

Hello,

I've been trying for a bit to get the following two PRs merged and
into a release, and I'm having some difficulty moving them forward:

https://github.com/apache/spark/pull/34903 - This passes the current
python interpreter to spark-env.sh to allow some currently-unavailable
customization to happen
https://github.com/apache/spark/pull/31774 - This fixes a bug in the
SparkUI reverse proxy-handling code where it does a greedy match for
"proxy" in the URL, and will mistakenly replace the App-ID in the
wrong place.
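For context, a hypothetical sketch of the greedy-match pitfall being described (illustrative only, not the actual SparkUI handler; the URL and app ID are made up): a pattern that matches "proxy" anywhere in the URL, rather than anchoring to the proxy path prefix, can rewrite part of an app ID that happens to contain the same substring.

```python
import re

# Hypothetical illustration of the bug class described above.
url = "/proxy/app-proxy-001/jobs"

# Greedy ".*proxy" consumes "/proxy/app-proxy", i.e. it matches up to the
# LAST occurrence of "proxy", eating into the app ID:
mangled = re.sub(r".*proxy", "/proxy", url)
print(mangled)  # "/proxy-001/jobs" -- the app ID is corrupted

# Anchoring the pattern to the path prefix rewrites only the intended segment:
fixed = re.sub(r"^/proxy/[^/]+", "/proxy/new-app-id", url)
print(fixed)    # "/proxy/new-app-id/jobs"
```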

I'm not exactly sure of how to get attention of PRs that have been
sitting around for a while, but these are really important to our
use-cases, and it would be nice to have them merged in.

Cheers
Andrew

On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca> wrote:
>
> I'd like to add/backport the logging in https://github.com/apache/spark/pull/35881 PR so that when users submit issues with dynamic allocation we can better debug what's going on.
>
> On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>>
>> There is one item on our side that we want to backport to 3.3:
>> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>
>> It's already reviewed and approved.
>>
>> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves <tg...@yahoo.com.invalid> wrote:
>> >
>> > It looks like the version hasn't been updated on master and still shows 3.3.0-SNAPSHOT, can you please update that.
>> >
>> > Tom
>> >
>> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <ma...@databricks.com.invalid> wrote:
>> >
>> >
>> > Hi All,
>> >
>> > I have created the branch for Spark 3.3:
>> > https://github.com/apache/spark/commits/branch-3.3
>> >
>> > Please, backport important fixes to it, and if you have some doubts, ping me in the PR. Regarding new features, we are still building the allow list for branch-3.3.
>> >
>> > Best regards,
>> > Max Gekk
>> >
>> >
>> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <do...@gmail.com> wrote:
>> >
>> > Yes, I agree with you for your whitelist approach for backporting. :)
>> > Thank you for summarizing.
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com> wrote:
>> >
>> > I think I finally got your point. What you want to keep unchanged is the branch cut date of Spark 3.3. Today? or this Friday? This is not a big deal.
>> >
>> > My major concern is whether we should keep merging the feature work or the dependency upgrade after the branch cut. To make our release time more predictable, I am suggesting we should finalize the exception PR list first, instead of merging them in an ad hoc way. In the past, we spent a lot of time on the revert of the PRs that were merged after the branch cut. I hope we can minimize unnecessary arguments in this release. Do you agree, Dongjoon?
>> >
>> >
>> >
>> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>> >
>> > That is not totally fine, Xiao. It sounds like you are asking a change of plan without a proper reason.
>> >
>> > Although we cut the branch today according to our plan, you still can collect the list and make a list of exceptions. I'm not blocking what you want to do.
>> >
>> > Please let the community start to ramp down as we agreed before.
>> >
>> > Dongjoon
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com> wrote:
>> >
>> > Please do not get me wrong. If we don't cut a branch, we are allowing all patches to land Apache Spark 3.3. That is totally fine. After we cut the branch, we should avoid merging the feature work. In the next three days, let us collect the actively developed PRs that we want to make an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
>> >
>> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>> >
>> > Xiao. You are working against what you are saying.
>> > If you don't cut a branch, it means you are allowing all patches to land Apache Spark 3.3. No?
>> >
>> > > we need to avoid backporting the feature work that are not being well discussed.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com> wrote:
>> >
>> > Cutting the branch is simple, but we need to avoid backporting the feature work that are not being well discussed. Not all the members are actively following the dev list. I think we should wait 3 more days for collecting the PR list before cutting the branch.
>> >
>> > BTW, there are very few 3.4-only feature work that will be affected.
>> >
>> > Xiao
>> >
>> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>> >
>> > Hi, Max, Chao, Xiao, Holden and all.
>> >
>> > I have a different idea.
>> >
>> > Given the situation and small patch list, I don't think we need to postpone the branch cut for those patches. It's easier to cut a branch-3.3 and allow backporting.
>> >
>> > As of today, we already have an obvious Apache Spark 3.4 patch in the branch together. This situation only becomes worse and worse because there is no way to block the other patches from landing unintentionally if we don't cut a branch.
>> >
>> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>> >
>> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>> >
>> > Best,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:
>> >
>> > Cool, thanks for clarifying!
>> >
>> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com> wrote:
>> > >>
>> > >> For the following list:
>> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> > >> Do you mean we should include them, or exclude them from 3.3?
>> > >
>> > >
>> > > If possible, I hope these features can be shipped with Spark 3.3.
>> > >
>> > >
>> > >
>> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>> > >>
>> > >> Hi Xiao,
>> > >>
>> > >> For the following list:
>> > >>
>> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> > >>
>> > >> Do you mean we should include them, or exclude them from 3.3?
>> > >>
>> > >> Thanks,
>> > >> Chao
>> > >>
>> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <do...@gmail.com> wrote:
>> > >> >
>> > >> > The following was tested and merged a few minutes ago. So, we can remove it from the list.
>> > >> >
>> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> > >> >
>> > >> > Thanks,
>> > >> > Dongjoon.
>> > >> >
>> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com> wrote:
>> > >> >>
>> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut?
>> > >> >>
>> > >> >> Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that have been discussed in this thread. Does that make sense?
>> > >> >>
>> > >> >> Xiao
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>> > >> >>>
>> > >> >>> May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs.
>> > >> >>>
>> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com> wrote:
>> > >> >>>>
>> > >> >>>> > To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
>> > >> >>>>
>> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> > >> >>>>
>> > >> >>>> Three more days are OK for this from my view.
>> > >> >>>>
>> > >> >>>> Regards,
>> > >> >>>> Yikun
>> > >> >>>
>> > >> >>> --
>> > >> >>> Twitter: https://twitter.com/holdenkarau
>> > >> >>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> > >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


  



  

Re: Apache Spark 3.3 Release

Posted by Wenchen Fan <cl...@gmail.com>.
Just checked the release calendar, the planned RC cut date is April:
[image: image.png]
Let's revisit after 2 weeks then?

On Mon, Mar 21, 2022 at 2:47 PM Wenchen Fan <cl...@gmail.com> wrote:

> Shall we revisit this list after a week? Ideally, they should be either
> merged or rejected for 3.3, so that we can cut rc1. We can still discuss
> them case by case at that time if there are exceptions.
>
> On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> Thank you for your summarization.
>>
>> I believe we need to have a discussion in order to evaluate each PR's
>> readiness.
>>
>> BTW, `branch-3.3` is still open for bug fixes including minor dependency
>> changes like the following.
>>
>> (Backported)
>> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
>> Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
>> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
>>
>> (Upcoming)
>> [SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
>> [SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0
>>
>> Dongjoon.
>>
>>
>>
>> On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk <ma...@databricks.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Here is the allow list which I built based on your requests in this
>>> thread:
>>>
>>>    1. SPARK-37396: Inline type hint files for files in
>>>    python/pyspark/mllib
>>>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>>>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>>    deprecated usage of Distribution
>>>    5. SPARK-38085: DataSource V2: Handle DELETE commands for
>>>    group-based sources
>>>    6. SPARK-32268: Bloom Filter Join
>>>    7. SPARK-38548: New SQL function: try_sum
>>>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>>    9. SPARK-38063: Support SQL split_part function
>>>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>>    filter by self way
>>>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>>    readers
>>>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>>>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>>>    shuffle service
>>>    15. SPARK-37831: Add task partition id in metrics
>>>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>>    17. SPARK-36664: Log time spent waiting for cluster resources
>>>    18. SPARK-34659: Web UI does not correctly get appId
>>>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>>>    20. SPARK-38589: New SQL function: try_avg
>>>    21. SPARK-38590: New SQL function: try_to_binary
>>>    22. SPARK-34079: Improvement CTE table scan
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>>>
>>>> Is the feature freeze target date March 22nd then?  I saw a few dates
>>>> thrown around want to confirm what we landed on
>>>>
>>>> I am trying to get the following improvements finished review and in,
>>>> if concerns with either, let me know:
>>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>>>> <https://github.com/apache/spark/pull/32298#>
>>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>>>> for released executors <https://github.com/apache/spark/pull/35085#>
>>>>
>>>> Tom
>>>>
>>>>
>>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>>>> ltnwgl@gmail.com> wrote:
>>>>
>>>>
>>>> I'd like to add the following new SQL functions in the 3.3 release.
>>>> These functions are useful when overflow or encoding errors occur:
>>>>
>>>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>>>    <https://github.com/apache/spark/pull/35848>
>>>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>>>    <https://github.com/apache/spark/pull/35896>
>>>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>>>    <https://github.com/apache/spark/pull/35897>
>>>>
>>>> Gengliang
>>>>
>>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been trying for a bit to get the following two PRs merged and
>>>> into a release, and I'm having some difficulty moving them forward:
>>>>
>>>> https://github.com/apache/spark/pull/34903 - This passes the current
>>>> python interpreter to spark-env.sh to allow some currently-unavailable
>>>> customization to happen
>>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>>>> SparkUI reverse proxy-handling code where it does a greedy match for
>>>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>>>> wrong place.
>>>>
>>>> I'm not exactly sure of how to get attention of PRs that have been
>>>> sitting around for a while, but these are really important to our
>>>> use-cases, and it would be nice to have them merged in.
>>>>
>>>> Cheers
>>>> Andrew
>>>>
>>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>>>> wrote:
>>>> >
>>>> > I'd like to add/backport the logging in
>>>> https://github.com/apache/spark/pull/35881 PR so that when users
>>>> submit issues with dynamic allocation we can better debug what's going on.
>>>> >
>>>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>>>> >>
>>>> >> There is one item on our side that we want to backport to 3.3:
>>>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>>>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>>> >>
>>>> >> It's already reviewed and approved.
>>>> >>
>>>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>>>> <tg...@yahoo.com.invalid> wrote:
>>>> >> >
>>>> >> > It looks like the version hasn't been updated on master and still
>>>> shows 3.3.0-SNAPSHOT, can you please update that.
>>>> >> >
>>>> >> > Tom
>>>> >> >
>>>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>>>> maxim.gekk@databricks.com.invalid> wrote:
>>>> >> >
>>>> >> >
>>>> >> > Hi All,
>>>> >> >
>>>> >> > I have created the branch for Spark 3.3:
>>>> >> > https://github.com/apache/spark/commits/branch-3.3
>>>> >> >
>>>> >> > Please, backport important fixes to it, and if you have some
>>>> doubts, ping me in the PR. Regarding new features, we are still building
>>>> the allow list for branch-3.3.
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Max Gekk
>>>> >> >
>>>> >> >
>>>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>>>> dongjoon.hyun@gmail.com> wrote:
>>>> >> >
>>>> >> > Yes, I agree with you for your whitelist approach for backporting.
>>>> :)
>>>> >> > Thank you for summarizing.
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > I think I finally got your point. What you want to keep unchanged
>>>> is the branch cut date of Spark 3.3. Today? or this Friday? This is not a
>>>> big deal.
>>>> >> >
>>>> >> > My major concern is whether we should keep merging the feature
>>>> work or the dependency upgrade after the branch cut. To make our release
>>>> time more predictable, I am suggesting we should finalize the exception PR
>>>> list first, instead of merging them in an ad hoc way. In the past, we spent
>>>> a lot of time on the revert of the PRs that were merged after the branch
>>>> cut. I hope we can minimize unnecessary arguments in this release. Do you
>>>> agree, Dongjoon?
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>>>> >> >
>>>> >> > That is not totally fine, Xiao. It sounds like you are asking a
>>>> change of plan without a proper reason.
>>>> >> >
>>>> >> > Although we cut the branch today according to our plan, you still can
>>>> collect the list and make a list of exceptions. I'm not blocking what you
>>>> want to do.
>>>> >> >
>>>> >> > Please let the community start to ramp down as we agreed before.
>>>> >> >
>>>> >> > Dongjoon
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > Please do not get me wrong. If we don't cut a branch, we are
>>>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>>>> we cut the branch, we should avoid merging the feature work. In the next
>>>> three days, let us collect the actively developed PRs that we want to make
>>>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>>>> make sense?
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>>>> >> >
>>>> >> > Xiao. You are working against what you are saying.
>>>> >> > If you don't cut a branch, it means you are allowing all patches
>>>> to land Apache Spark 3.3. No?
>>>> >> >
>>>> >> > > we need to avoid backporting the feature work that are not being
>>>> well discussed.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > Cutting the branch is simple, but we need to avoid backporting the
>>>> feature work that are not being well discussed. Not all the members are
>>>> actively following the dev list. I think we should wait 3 more days for
>>>> collecting the PR list before cutting the branch.
>>>> >> >
>>>> >> > BTW, there are very few 3.4-only feature work that will be
>>>> affected.
>>>> >> >
>>>> >> > Xiao
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>>>> >> >
>>>> >> > Hi, Max, Chao, Xiao, Holden and all.
>>>> >> >
>>>> >> > I have a different idea.
>>>> >> >
>>>> >> > Given the situation and small patch list, I don't think we need to
>>>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>>>> and allow backporting.
>>>> >> >
>>>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>>>> the branch together. This situation only becomes worse and worse because
>>>> there is no way to block the other patches from landing unintentionally if
>>>> we don't cut a branch.
>>>> >> >
>>>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>>>> values
>>>> >> >
>>>> >> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>>> >> >
>>>> >> > Best,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>>>> wrote:
>>>> >> >
>>>> >> > Cool, thanks for clarifying!
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> > >>
>>>> >> > >> For the following list:
>>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>> vectorized reader
>>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>>> >> > >
>>>> >> > >
>>>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>>>> >> > >>
>>>> >> > >> Hi Xiao,
>>>> >> > >>
>>>> >> > >> For the following list:
>>>> >> > >>
>>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>> vectorized reader
>>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> > >>
>>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>>> >> > >>
>>>> >> > >> Thanks,
>>>> >> > >> Chao
>>>> >> > >>
>>>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>>> dongjoon.hyun@gmail.com> wrote:
>>>> >> > >> >
>>>> >> > >> > The following was tested and merged a few minutes ago. So, we
>>>> can remove it from the list.
>>>> >> > >> >
>>>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>> >> > >> >
>>>> >> > >> > Thanks,
>>>> >> > >> > Dongjoon.
>>>> >> > >> >
>>>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> > >> >>
>>>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>>> days to collect the list of actively developed PRs that we want to merge to
>>>> 3.3 after the branch cut?
>>>> >> > >> >>
>>>> >> > >> >> Please do not rush to merge the PRs that are not fully
>>>> reviewed. We can cut the branch this Friday and continue merging the PRs
>>>> that have been discussed in this thread. Does that make sense?
>>>> >> > >> >>
>>>> >> > >> >> Xiao
>>>> >> > >> >>
>>>> >> > >> >>
>>>> >> > >> >>
>>>> >> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>>>> >> > >> >>>
>>>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>>>> everyone a bit of breathing space? Rushed software development more often
>>>> results in bugs.
>>>> >> > >> >>>
>>>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>>>> yikunkero@gmail.com> wrote:
>>>> >> > >> >>>>
>>>> >> > >> >>>> > To make our release time more predictable, let us
>>>> collect the PRs and wait three more days before the branch cut?
>>>> >> > >> >>>>
>>>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>>> v1.5.1
>>>> >> > >> >>>>
>>>> >> > >> >>>> Three more days are OK for this from my view.
>>>> >> > >> >>>>
>>>> >> > >> >>>> Regards,
>>>> >> > >> >>>> Yikun
>>>> >> > >> >>>
>>>> >> > >> >>> --
>>>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>>>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> >> > >> >>> YouTube Live Streams:
>>>> https://www.youtube.com/user/holdenkarau
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Twitter: https://twitter.com/holdenkarau
>>>> > Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>
>>>>

Re: Apache Spark 3.3 Release

Posted by Wenchen Fan <cl...@gmail.com>.
Shall we revisit this list after a week? Ideally, they should be either
merged or rejected for 3.3, so that we can cut rc1. We can still discuss
them case by case at that time if there are exceptions.

On Sat, Mar 19, 2022 at 5:27 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you for your summarization.
>
> I believe we need to have a discussion in order to evaluate each PR's
> readiness.
>
> BTW, `branch-3.3` is still open for bug fixes including minor dependency
> changes like the following.
>
> (Backported)
> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
> Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
> [SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5
>
> (Upcoming)
> [SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
> [SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0
>
> Dongjoon.
>
>
>
> On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk <ma...@databricks.com>
> wrote:
>
>> Hi All,
>>
>> Here is the allow list which I built based on your requests in this
>> thread:
>>
>>    1. SPARK-37396: Inline type hint files for files in
>>    python/pyspark/mllib
>>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>    deprecated usage of Distribution
>>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>>    sources
>>    6. SPARK-32268: Bloom Filter Join
>>    7. SPARK-38548: New SQL function: try_sum
>>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>    9. SPARK-38063: Support SQL split_part function
>>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>    filter by self way
>>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>    readers
>>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>>    shuffle service
>>    15. SPARK-37831: Add task partition id in metrics
>>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>    17. SPARK-36664: Log time spent waiting for cluster resources
>>    18. SPARK-34659: Web UI does not correctly get appId
>>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>>    20. SPARK-38589: New SQL function: try_avg
>>    21. SPARK-38590: New SQL function: try_to_binary
>>    22. SPARK-34079: Improvement CTE table scan
>>
>> Best regards,
>> Max Gekk
>>
>>
>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>>
>>> Is the feature freeze target date March 22nd then? I saw a few dates
>>> thrown around and want to confirm what we landed on.
>>>
>>> I am trying to get the following improvements through review and merged;
>>> if there are concerns with either, let me know:
>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>>> <https://github.com/apache/spark/pull/32298#>
>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>>> for released executors <https://github.com/apache/spark/pull/35085#>
>>>
>>> Tom
>>>
>>>
>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>>> ltnwgl@gmail.com> wrote:
>>>
>>>
>>> I'd like to add the following new SQL functions in the 3.3 release.
>>> These functions are useful when overflow or encoding errors occur:
>>>
>>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>>    <https://github.com/apache/spark/pull/35848>
>>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>>    <https://github.com/apache/spark/pull/35896>
>>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>>    <https://github.com/apache/spark/pull/35897>
>>>
>>> Gengliang
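
(A minimal pure-Python sketch of the try_sum semantics described above — illustrative only, not Spark's implementation; the 64-bit bounds mirror Spark's LongType:)

```python
# Illustrative sketch: where ANSI-mode sum() fails the query on long
# overflow, try_sum returns NULL (None here) instead.
LONG_MIN, LONG_MAX = -(2 ** 63), 2 ** 63 - 1

def try_sum(values):
    """Sum of non-None values; None on empty/all-None input or 64-bit overflow."""
    total, seen = 0, False
    for v in values:
        if v is None:
            continue  # SQL aggregates skip NULLs
        seen = True
        total += v
        if not (LONG_MIN <= total <= LONG_MAX):
            return None  # overflow: try_sum yields NULL rather than an error
    return total if seen else None
```

In SQL terms, SELECT try_sum(col) returns NULL for a group where SELECT sum(col) under ANSI mode would raise an overflow error.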
>>>
>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I've been trying for a bit to get the following two PRs merged and
>>> into a release, and I'm having some difficulty moving them forward:
>>>
>>> https://github.com/apache/spark/pull/34903 - This passes the current
>>> python interpreter to spark-env.sh to allow some currently-unavailable
>>> customization to happen
>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>>> SparkUI reverse proxy-handling code where it does a greedy match for
>>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>>> wrong place.
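
(To make the failure mode concrete: a hedged sketch of how a greedy match on "proxy" can pick the wrong path segment as the App-ID. The URL and patterns below are hypothetical, not the actual SparkUI code:)

```python
import re

# Hypothetical request path in which "proxy" appears twice; only the first
# occurrence is the reverse-proxy prefix holding the real App-ID.
url = "/proxy/app-20220315-0001/history/proxy/page"

# Greedy ".*" backtracks to the LAST "proxy/", extracting the wrong segment.
greedy_id = re.match(r".*proxy/([^/]+)", url).group(1)   # -> "page"

# Non-greedy ".*?" stops at the FIRST "proxy/", recovering the App-ID.
lazy_id = re.match(r".*?proxy/([^/]+)", url).group(1)    # -> "app-20220315-0001"
```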
>>>
>>> I'm not exactly sure of how to get attention of PRs that have been
>>> sitting around for a while, but these are really important to our
>>> use-cases, and it would be nice to have them merged in.
>>>
>>> Cheers
>>> Andrew
>>>
>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>>> wrote:
>>> >
>>> > I'd like to add/backport the logging in
>>> https://github.com/apache/spark/pull/35881 PR so that when users submit
>>> issues with dynamic allocation we can better debug what's going on.
>>> >
>>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>>> >>
>>> >> There is one item on our side that we want to backport to 3.3:
>>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>> >>
>>> >> It's already reviewed and approved.
>>> >>
>>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>>> <tg...@yahoo.com.invalid> wrote:
>>> >> >
>>> >> > It looks like the version hasn't been updated on master and still
>>> shows 3.3.0-SNAPSHOT; can you please update that?
>>> >> >
>>> >> > Tom
>>> >> >
>>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>>> maxim.gekk@databricks.com.invalid> wrote:
>>> >> >
>>> >> >
>>> >> > Hi All,
>>> >> >
>>> >> > I have created the branch for Spark 3.3:
>>> >> > https://github.com/apache/spark/commits/branch-3.3
>>> >> >
>>> >> > Please backport important fixes to it, and if you have any
>>> doubts, ping me in the PR. Regarding new features, we are still building
>>> the allow list for branch-3.3.
>>> >> >
>>> >> > Best regards,
>>> >> > Max Gekk
>>> >> >
>>> >> >
>>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>>> dongjoon.hyun@gmail.com> wrote:
>>> >> >
>>> >> > Yes, I agree with your whitelist approach for backporting.
>>> :)
>>> >> > Thank you for summarizing.
>>> >> >
>>> >> > Thanks,
>>> >> > Dongjoon.
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > I think I finally got your point. What you want to keep unchanged
>>> is the branch cut date of Spark 3.3. Today? or this Friday? This is not a
>>> big deal.
>>> >> >
>>> >> > My major concern is whether we should keep merging the feature work
>>> or the dependency upgrade after the branch cut. To make our release time
>>> more predictable, I am suggesting we should finalize the exception PR list
>>> first, instead of merging them in an ad hoc way. In the past, we spent a
>>> lot of time on the revert of the PRs that were merged after the branch cut.
>>> I hope we can minimize unnecessary arguments in this release. Do you agree,
>>> Dongjoon?
>>> >> >
>>> >> >
>>> >> >
>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>>> >> >
>>> >> > That is not totally fine, Xiao. It sounds like you are asking for a
>>> change of plan without a proper reason.
>>> >> >
>>> >> > Although we cut the branch today according to our plan, you still can
>>> collect the list and make a list of exceptions. I'm not blocking what you
>>> want to do.
>>> >> >
>>> >> > Please let the community start to ramp down as we agreed before.
>>> >> >
>>> >> > Dongjoon
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > Please do not get me wrong. If we don't cut a branch, we are
>>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>>> we cut the branch, we should avoid merging the feature work. In the next
>>> three days, let us collect the actively developed PRs that we want to make
>>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>>> make sense?
>>> >> >
>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>>> >> >
>>> >> > Xiao. You are working against what you are saying.
>>> >> > If you don't cut a branch, it means you are allowing all patches to
>>> land Apache Spark 3.3. No?
>>> >> >
>>> >> > > we need to avoid backporting feature work that has not been
>>> well discussed.
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > Cutting the branch is simple, but we need to avoid backporting
>>> feature work that has not been well discussed. Not all the members are
>>> actively following the dev list. I think we should wait 3 more days for
>>> collecting the PR list before cutting the branch.
>>> >> >
>>> >> > BTW, there are very few 3.4-only feature work that will be affected.
>>> >> >
>>> >> > Xiao
>>> >> >
>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>>> >> >
>>> >> > Hi, Max, Chao, Xiao, Holden and all.
>>> >> >
>>> >> > I have a different idea.
>>> >> >
>>> >> > Given the situation and small patch list, I don't think we need to
>>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>>> and allow backporting.
>>> >> >
>>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>>> the branch. This situation will only get worse because
>>> there is no way to block the other patches from landing unintentionally if
>>> we don't cut a branch.
>>> >> >
>>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>>> values
>>> >> >
>>> >> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>> >> >
>>> >> > Best,
>>> >> > Dongjoon.
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>>> wrote:
>>> >> >
>>> >> > Cool, thanks for clarifying!
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> > >>
>>> >> > >> For the following list:
>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>> >> > >
>>> >> > >
>>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>>> >> > >>
>>> >> > >> Hi Xiao,
>>> >> > >>
>>> >> > >> For the following list:
>>> >> > >>
>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> > >>
>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>> >> > >>
>>> >> > >> Thanks,
>>> >> > >> Chao
>>> >> > >>
>>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>> dongjoon.hyun@gmail.com> wrote:
>>> >> > >> >
>>> >> > >> > The following was tested and merged a few minutes ago. So, we
>>> can remove it from the list.
>>> >> > >> >
>>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> > >> >
>>> >> > >> > Thanks,
>>> >> > >> > Dongjoon.
>>> >> > >> >
>>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> > >> >>
>>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>> days to collect the list of actively developed PRs that we want to merge to
>>> 3.3 after the branch cut?
>>> >> > >> >>
>>> >> > >> >> Please do not rush to merge the PRs that are not fully
>>> reviewed. We can cut the branch this Friday and continue merging the PRs
>>> that have been discussed in this thread. Does that make sense?
>>> >> > >> >>
>>> >> > >> >> Xiao
>>> >> > >> >>
>>> >> > >> >>
>>> >> > >> >>
>>> >> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>>> >> > >> >>>
>>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>>> everyone a bit of breathing space? Rushed software development more often
>>> results in bugs.
>>> >> > >> >>>
>>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>>> yikunkero@gmail.com> wrote:
>>> >> > >> >>>>
>>> >> > >> >>>> > To make our release time more predictable, let us collect
>>> the PRs and wait three more days before the branch cut?
>>> >> > >> >>>>
>>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>> v1.5.1
>>> >> > >> >>>>
>>> >> > >> >>>> Three more days are OK for this from my view.
>>> >> > >> >>>>
>>> >> > >> >>>> Regards,
>>> >> > >> >>>> Yikun
>>> >> > >> >>>
>>> >> > >> >>> --
>>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> > >> >>> YouTube Live Streams:
>>> https://www.youtube.com/user/holdenkarau
>>> >
>>> >
>>> >
>>> > --
>>> > Twitter: https://twitter.com/holdenkarau
>>> > Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>>

Re: Apache Spark 3.3 Release

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you for your summarization.

I believe we need to have a discussion in order to evaluate each PR's
readiness.

BTW, `branch-3.3` is still open for bug fixes including minor dependency
changes like the following.

(Backported)
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4
Revert "[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4"
[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.5

(Upcoming)
[SPARK-38544][BUILD] Upgrade log4j2 to 2.17.2 from 2.17.1
[SPARK-38602][BUILD] Upgrade Kafka to 3.1.1 from 3.1.0

Dongjoon.



On Thu, Mar 17, 2022 at 11:22 PM Maxim Gekk <ma...@databricks.com>
wrote:

> Hi All,
>
> Here is the allow list which I built based on your requests in this thread:
>
>    1. SPARK-37396: Inline type hint files for files in
>    python/pyspark/mllib
>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>    deprecated usage of Distribution
>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>    sources
>    6. SPARK-32268: Bloom Filter Join
>    7. SPARK-38548: New SQL function: try_sum
>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>    9. SPARK-38063: Support SQL split_part function
>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>    filter by self way
>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>    readers
>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>    shuffle service
>    15. SPARK-37831: Add task partition id in metrics
>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>    17. SPARK-36664: Log time spent waiting for cluster resources
>    18. SPARK-34659: Web UI does not correctly get appId
>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>    20. SPARK-38589: New SQL function: try_avg
>    21. SPARK-38590: New SQL function: try_to_binary
>    22. SPARK-34079: Improvement CTE table scan
>
> Best regards,
> Max Gekk
>
>
> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>
>> Is the feature freeze target date March 22nd then? I saw a few dates
>> thrown around and want to confirm what we landed on.
>>
>> I am trying to get the following improvements through review and merged;
>> if there are concerns with either, let me know:
>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>> <https://github.com/apache/spark/pull/32298#>
>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>> for released executors <https://github.com/apache/spark/pull/35085#>
>>
>> Tom
>>
>>
>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>> ltnwgl@gmail.com> wrote:
>>
>>
>> I'd like to add the following new SQL functions in the 3.3 release. These
>> functions are useful when overflow or encoding errors occur:
>>
>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>    <https://github.com/apache/spark/pull/35848>
>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>    <https://github.com/apache/spark/pull/35896>
>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>    <https://github.com/apache/spark/pull/35897>
>>
>> Gengliang
>>
>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>> wrote:
>>
>> Hello,
>>
>> I've been trying for a bit to get the following two PRs merged and
>> into a release, and I'm having some difficulty moving them forward:
>>
>> https://github.com/apache/spark/pull/34903 - This passes the current
>> python interpreter to spark-env.sh to allow some currently-unavailable
>> customization to happen
>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>> SparkUI reverse proxy-handling code where it does a greedy match for
>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>> wrong place.
>>
>> I'm not exactly sure of how to get attention of PRs that have been
>> sitting around for a while, but these are really important to our
>> use-cases, and it would be nice to have them merged in.
>>
>> Cheers
>> Andrew
>>
>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>> wrote:
>> >
>> > I'd like to add/backport the logging in
>> https://github.com/apache/spark/pull/35881 PR so that when users submit
>> issues with dynamic allocation we can better debug what's going on.
>> >
>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>> >>
>> >> There is one item on our side that we want to backport to 3.3:
>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>> >>
>> >> It's already reviewed and approved.
>> >>
>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>> <tg...@yahoo.com.invalid> wrote:
>> >> >
>> >> > It looks like the version hasn't been updated on master and still
>> shows 3.3.0-SNAPSHOT; can you please update that?
>> >> >
>> >> > Tom
>> >> >
>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>> maxim.gekk@databricks.com.invalid> wrote:
>> >> >
>> >> >
>> >> > Hi All,
>> >> >
>> >> > I have created the branch for Spark 3.3:
>> >> > https://github.com/apache/spark/commits/branch-3.3
>> >> >
>> >> > Please backport important fixes to it, and if you have any doubts,
>> ping me in the PR. Regarding new features, we are still building the allow
>> list for branch-3.3.
>> >> >
>> >> > Best regards,
>> >> > Max Gekk
>> >> >
>> >> >
>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>> dongjoon.hyun@gmail.com> wrote:
>> >> >
>> >> > Yes, I agree with your whitelist approach for backporting. :)
>> >> > Thank you for summarizing.
>> >> >
>> >> > Thanks,
>> >> > Dongjoon.
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > I think I finally got your point. What you want to keep unchanged is
>> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
>> deal.
>> >> >
>> >> > My major concern is whether we should keep merging the feature work
>> or the dependency upgrade after the branch cut. To make our release time
>> more predictable, I am suggesting we should finalize the exception PR list
>> first, instead of merging them in an ad hoc way. In the past, we spent a
>> lot of time on the revert of the PRs that were merged after the branch cut.
>> I hope we can minimize unnecessary arguments in this release. Do you agree,
>> Dongjoon?
>> >> >
>> >> >
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>> >> >
>> >> > That is not totally fine, Xiao. It sounds like you are asking for a
>> change of plan without a proper reason.
>> >> >
>> >> > Although we cut the branch today according to our plan, you still can
>> collect the list and make a list of exceptions. I'm not blocking what you
>> want to do.
>> >> >
>> >> > Please let the community start to ramp down as we agreed before.
>> >> >
>> >> > Dongjoon
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > Please do not get me wrong. If we don't cut a branch, we are
>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>> we cut the branch, we should avoid merging the feature work. In the next
>> three days, let us collect the actively developed PRs that we want to make
>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>> make sense?
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>> >> >
>> >> > Xiao. You are working against what you are saying.
>> >> > If you don't cut a branch, it means you are allowing all patches to
>> land Apache Spark 3.3. No?
>> >> >
>> >> > > we need to avoid backporting feature work that has not been
>> well discussed.
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > Cutting the branch is simple, but we need to avoid backporting
>> feature work that has not been well discussed. Not all the members are
>> actively following the dev list. I think we should wait 3 more days for
>> collecting the PR list before cutting the branch.
>> >> >
>> >> > BTW, there are very few 3.4-only feature work that will be affected.
>> >> >
>> >> > Xiao
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>> >> >
>> >> > Hi, Max, Chao, Xiao, Holden and all.
>> >> >
>> >> > I have a different idea.
>> >> >
>> >> > Given the situation and small patch list, I don't think we need to
>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>> and allow backporting.
>> >> >
>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>> the branch. This situation will only get worse because
>> there is no way to block the other patches from landing unintentionally if
>> we don't cut a branch.
>> >> >
>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>> values
>> >> >
>> >> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>> >> >
>> >> > Best,
>> >> > Dongjoon.
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>> wrote:
>> >> >
>> >> > Cool, thanks for clarifying!
>> >> >
>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> > >>
>> >> > >> For the following list:
>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>> >> > >
>> >> > >
>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>> >> > >
>> >> > >
>> >> > >
>> >> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>> >> > >>
>> >> > >> Hi Xiao,
>> >> > >>
>> >> > >> For the following list:
>> >> > >>
>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> > >>
>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>> >> > >>
>> >> > >> Thanks,
>> >> > >> Chao
>> >> > >>
>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>> dongjoon.hyun@gmail.com> wrote:
>> >> > >> >
>> >> > >> > The following was tested and merged a few minutes ago. So, we
>> can remove it from the list.
>> >> > >> >
>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> > >> >
>> >> > >> > Thanks,
>> >> > >> > Dongjoon.
>> >> > >> >
>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> > >> >>
>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>> days to collect the list of actively developed PRs that we want to merge to
>> 3.3 after the branch cut?
>> >> > >> >>
>> >> > >> >> Please do not rush to merge the PRs that are not fully
>> reviewed. We can cut the branch this Friday and continue merging the PRs
>> that have been discussed in this thread. Does that make sense?
>> >> > >> >>
>> >> > >> >> Xiao
>> >> > >> >>
>> >> > >> >>
>> >> > >> >>
>> >> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>> >> > >> >>>
>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>> everyone a bit of breathing space? Rushed software development more often
>> results in bugs.
>> >> > >> >>>
>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>> yikunkero@gmail.com> wrote:
>> >> > >> >>>>
>> >> > >> >>>> > To make our release time more predictable, let us collect
>> the PRs and wait three more days before the branch cut?
>> >> > >> >>>>
>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> > >> >>>>
>> >> > >> >>>> Three more days are OK for this from my view.
>> >> > >> >>>>
>> >> > >> >>>> Regards,
>> >> > >> >>>> Yikun
>> >> > >> >>>
>> >> > >> >>> --
>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> >> > >> >>> YouTube Live Streams:
>> https://www.youtube.com/user/holdenkarau
>> >
>> >
>> >
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>>

Re: Apache Spark 3.3 Release

Posted by Maxim Gekk <ma...@databricks.com.INVALID>.
Hello All,

I am going to create the first release candidate of Spark 3.3 at the
beginning of next week if there are no objections. Below is the list of
features from the allow list and their current status. At the moment, only one
feature is still in progress, and it can be postponed to the next release if
needed:

IN PROGRESS:

   1. SPARK-28516: Data Type Formatting Functions: `to_char`

IN PROGRESS but won't/couldn't be merged to branch-3.3:

   1. SPARK-37650: Tell spark-env.sh the python interpreter
   2. SPARK-36664: Log time spent waiting for cluster resources
   3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   5. SPARK-37093: Inline type hints python/pyspark/streaming

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   10. SPARK-38590: New SQL function: try_to_binary
   11. SPARK-37377: Refactor V2 Partitioning interface and remove
   deprecated usage of Distribution
   12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   13. SPARK-34659: Web UI does not correctly get appId
   14. SPARK-38589: New SQL function: try_avg
   15. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   16. SPARK-34079: Improvement CTE table scan
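
As a usage note on SPARK-38063 (split_part, item 3 above): the function is 1-based, negative indexes count from the end, and out-of-range indexes yield an empty string. A minimal pure-Python model of that documented behavior (not Spark's implementation):

```python
def split_part(s: str, delim: str, part: int) -> str:
    """Return the part-th field of s split by delim (1-based).

    Negative part counts from the end; out-of-range parts yield "".
    """
    if part == 0:
        raise ValueError("part must not be 0")  # Spark raises an error here too
    fields = s.split(delim)
    idx = part - 1 if part > 0 else len(fields) + part
    return fields[idx] if 0 <= idx < len(fields) else ""
```

For example, split_part('11.12.13', '.', 3) evaluates to '13'.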


Max Gekk

Software Engineer

Databricks, Inc.


On Fri, Apr 15, 2022 at 4:28 PM Maxim Gekk <ma...@databricks.com>
wrote:

> Hello All,
>
> Current status of features from the allow list for branch-3.3 is:
>
> IN PROGRESS:
>
>    1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>    2. SPARK-28516: Data Type Formatting Functions: `to_char`
>    3. SPARK-34079: Improvement CTE table scan
>
> IN PROGRESS but won't/couldn't be merged to branch-3.3:
>
>    1. SPARK-37650: Tell spark-env.sh the python interpreter
>    2. SPARK-36664: Log time spent waiting for cluster resources
>    3. SPARK-37396: Inline type hint files for files in
>    python/pyspark/mllib
>    4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>    5. SPARK-37093: Inline type hints python/pyspark/streaming
>
> RESOLVED:
>
>    1. SPARK-32268: Bloom Filter Join
>    2. SPARK-38548: New SQL function: try_sum
>    3. SPARK-38063: Support SQL split_part function
>    4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>    filter by self way
>    5. SPARK-34863: Support nested column in Spark Parquet vectorized
>    readers
>    6. SPARK-38194: Make Yarn memory overhead factor configurable
>    7. SPARK-37618: Support cleaning up shuffle blocks from external
>    shuffle service
>    8. SPARK-37831: Add task partition id in metrics
>    9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>    10. SPARK-38590: New SQL function: try_to_binary
>    11. SPARK-37377: Refactor V2 Partitioning interface and remove
>    deprecated usage of Distribution
>    12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>    sources
>    13. SPARK-34659: Web UI does not correctly get appId
>    14. SPARK-38589: New SQL function: try_avg
>
>
> Max Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk <ma...@databricks.com>
> wrote:
>
>> Hello All,
>>
>> Below is current status of features from the allow list:
>>
>> IN PROGRESS:
>>
>>    1. SPARK-37396: Inline type hint files for files in
>>    python/pyspark/mllib
>>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>    deprecated usage of Distribution
>>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>>    sources
>>    6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>    7. SPARK-28516: Data Type Formatting Functions: `to_char`
>>    8. SPARK-36664: Log time spent waiting for cluster resources
>>    9. SPARK-34659: Web UI does not correctly get appId
>>    10. SPARK-37650: Tell spark-env.sh the python interpreter
>>    11. SPARK-38589: New SQL function: try_avg
>>    12. SPARK-38590: New SQL function: try_to_binary
>>    13. SPARK-34079: Improvement CTE table scan
>>
>> RESOLVED:
>>
>>    1. SPARK-32268: Bloom Filter Join
>>    2. SPARK-38548: New SQL function: try_sum
>>    3. SPARK-38063: Support SQL split_part function
>>    4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>    filter by self way
>>    5. SPARK-34863: Support nested column in Spark Parquet vectorized
>>    readers
>>    6. SPARK-38194: Make Yarn memory overhead factor configurable
>>    7. SPARK-37618: Support cleaning up shuffle blocks from external
>>    shuffle service
>>    8. SPARK-37831: Add task partition id in metrics
>>    9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>
>> We need to decide whether we are going to wait a little bit more or close
>> the doors.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk <ma...@databricks.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Here is the allow list which I built based on your requests in this
>>> thread:
>>>
>>>    1. SPARK-37396: Inline type hint files for files in
>>>    python/pyspark/mllib
>>>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>>>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>>    deprecated usage of Distribution
>>>    5. SPARK-38085: DataSource V2: Handle DELETE commands for
>>>    group-based sources
>>>    6. SPARK-32268: Bloom Filter Join
>>>    7. SPARK-38548: New SQL function: try_sum
>>>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>>    9. SPARK-38063: Support SQL split_part function
>>>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>>    filter by self way
>>>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>>    readers
>>>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>>>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>>>    shuffle service
>>>    15. SPARK-37831: Add task partition id in metrics
>>>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>>    17. SPARK-36664: Log time spent waiting for cluster resources
>>>    18. SPARK-34659: Web UI does not correctly get appId
>>>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>>>    20. SPARK-38589: New SQL function: try_avg
>>>    21. SPARK-38590: New SQL function: try_to_binary
>>>    22. SPARK-34079: Improvement CTE table scan
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>>
>>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>>>
>>>> Is the feature freeze target date March 22nd then? I saw a few dates
>>>> thrown around and want to confirm what we landed on.
>>>>
>>>> I am trying to get the following improvements through review and merged;
>>>> if there are concerns with either, let me know:
>>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>>>> <https://github.com/apache/spark/pull/32298#>
>>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>>>> for released executors <https://github.com/apache/spark/pull/35085#>
>>>>
>>>> Tom
>>>>
>>>>
>>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>>>> ltnwgl@gmail.com> wrote:
>>>>
>>>>
>>>> I'd like to add the following new SQL functions in the 3.3 release.
>>>> These functions are useful when overflow or encoding errors occur:
>>>>
>>>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>>>    <https://github.com/apache/spark/pull/35848>
>>>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>>>    <https://github.com/apache/spark/pull/35896>
>>>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>>>    <https://github.com/apache/spark/pull/35897>
>>>>
>>>> Gengliang
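
To make the motivation concrete, here is a minimal pure-Python sketch of the
`try_` semantics described above (an illustration of the intended behavior,
not the actual Spark implementation): where ANSI-mode `SUM` raises on 64-bit
overflow, the `try_` variant returns NULL instead.

```python
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1

def try_sum(values):
    """Sketch of try_sum semantics: return None (SQL NULL) on signed 64-bit
    overflow, where a plain ANSI-mode SUM would raise an error.
    Hypothetical helper for illustration, not Spark's implementation."""
    total = 0
    for v in values:
        total += v
        if total < LONG_MIN or total > LONG_MAX:
            return None  # overflow -> NULL instead of an error
    return total

print(try_sum([1, 2, 3]))       # 6
print(try_sum([LONG_MAX, 1]))   # None (overflow)
```

In Spark SQL itself the equivalent call would be along the lines of
`SELECT try_sum(col) FROM t`, yielding NULL when the aggregate overflows.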
>>>>
>>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been trying for a bit to get the following two PRs merged and
>>>> into a release, and I'm having some difficulty moving them forward:
>>>>
>>>> https://github.com/apache/spark/pull/34903 - This passes the current
>>>> python interpreter to spark-env.sh to allow some currently-unavailable
>>>> customization to happen
>>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>>>> SparkUI reverse proxy-handling code where it does a greedy match for
>>>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>>>> wrong place.
>>>>
>>>> I'm not exactly sure how to get attention for PRs that have been
>>>> sitting around for a while, but these are really important to our
>>>> use-cases, and it would be nice to have them merged in.
>>>>
>>>> Cheers
>>>> Andrew
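
For the second PR, the bug class Andrew describes (a greedy, unanchored match
on "proxy" rewriting the wrong part of the URL) can be sketched with a small
regex example. The patterns and URLs below are hypothetical illustrations of
greedy matching, not the actual SparkUI code:

```python
import re

# A URL whose app ID happens to contain the substring "proxy".
url = "/proxy/app-proxy-123/jobs"

# Greedy, unanchored rewrite: ".*proxy" backtracks to the LAST "proxy",
# so the app ID is mangled along with the leading path segment.
greedy = re.sub(r"^.*proxy", "/gateway", url)

# Anchored rewrite: only the leading "/proxy" segment is replaced,
# leaving the app ID intact.
anchored = re.sub(r"^/proxy", "/gateway", url)

print(greedy)    # /gateway-123/jobs  -- app ID destroyed
print(anchored)  # /gateway/app-proxy-123/jobs
```

The anchored pattern only touches the leading path segment, which is the kind
of fix the PR aims at: an app ID containing "proxy" survives the rewrite.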
>>>>
>>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>>>> wrote:
>>>> >
>>>> > I'd like to add/backport the logging in
>>>> https://github.com/apache/spark/pull/35881 PR so that when users
>>>> submit issues with dynamic allocation we can better debug what's going on.
>>>> >
>>>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>>>> >>
>>>> >> There is one item on our side that we want to backport to 3.3:
>>>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>>>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>>> >>
>>>> >> It's already reviewed and approved.
>>>> >>
>>>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>>>> <tg...@yahoo.com.invalid> wrote:
>>>> >> >
>>>> >> > It looks like the version hasn't been updated on master and still
>>>> shows 3.3.0-SNAPSHOT; can you please update that?
>>>> >> >
>>>> >> > Tom
>>>> >> >
>>>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>>>> maxim.gekk@databricks.com.invalid> wrote:
>>>> >> >
>>>> >> >
>>>> >> > Hi All,
>>>> >> >
>>>> >> > I have created the branch for Spark 3.3:
>>>> >> > https://github.com/apache/spark/commits/branch-3.3
>>>> >> >
>>>> >> > Please backport important fixes to it, and if you have some
>>>> doubts, ping me in the PR. Regarding new features, we are still building
>>>> the allow list for branch-3.3.
>>>> >> >
>>>> >> > Best regards,
>>>> >> > Max Gekk
>>>> >> >
>>>> >> >
>>>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>>>> dongjoon.hyun@gmail.com> wrote:
>>>> >> >
>>>> >> > Yes, I agree with your whitelist approach for backporting. :)
>>>> >> > Thank you for summarizing.
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > I think I finally got your point. What you want to keep unchanged
>>>> is the branch cut date of Spark 3.3. Today? or this Friday? This is not a
>>>> big deal.
>>>> >> >
>>>> >> > My major concern is whether we should keep merging the feature
>>>> work or the dependency upgrade after the branch cut. To make our release
>>>> time more predictable, I am suggesting we should finalize the exception PR
>>>> list first, instead of merging them in an ad hoc way. In the past, we spent
>>>> a lot of time on the revert of the PRs that were merged after the branch
>>>> cut. I hope we can minimize unnecessary arguments in this release. Do you
>>>> agree, Dongjoon?
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>>>> >> >
>>>> >> > That is not totally fine, Xiao. It sounds like you are asking a
>>>> change of plan without a proper reason.
>>>> >> >
>>>> >> > Although we cut the branch today according to our plan, you still can
>>>> collect the list and make a list of exceptions. I'm not blocking what you
>>>> want to do.
>>>> >> >
>>>> >> > Please let the community start to ramp down as we agreed before.
>>>> >> >
>>>> >> > Dongjoon
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > Please do not get me wrong. If we don't cut a branch, we are
>>>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>>>> we cut the branch, we should avoid merging the feature work. In the next
>>>> three days, let us collect the actively developed PRs that we want to make
>>>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>>>> make sense?
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>>>> >> >
>>>> >> > Xiao. You are working against what you are saying.
>>>> >> > If you don't cut a branch, it means you are allowing all patches
>>>> to land Apache Spark 3.3. No?
>>>> >> >
>>>> >> > > we need to avoid backporting the feature work that is not being
>>>> well discussed.
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >
>>>> >> > Cutting the branch is simple, but we need to avoid backporting the
>>>> feature work that is not being well discussed. Not all the members are
>>>> actively following the dev list. I think we should wait 3 more days for
>>>> collecting the PR list before cutting the branch.
>>>> >> >
>>>> >> > BTW, there are very few 3.4-only feature work that will be
>>>> affected.
>>>> >> >
>>>> >> > Xiao
>>>> >> >
>>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>>>> >> >
>>>> >> > Hi, Max, Chao, Xiao, Holden and all.
>>>> >> >
>>>> >> > I have a different idea.
>>>> >> >
>>>> >> > Given the situation and small patch list, I don't think we need to
>>>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>>>> and allow backporting.
>>>> >> >
>>>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>>>> the branch. This situation only becomes worse because
>>>> there is no way to block the other patches from landing unintentionally if
>>>> we don't cut a branch.
>>>> >> >
>>>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>>>> values
>>>> >> >
>>>> >> > Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
>>>> >> >
>>>> >> > Best,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>>>> wrote:
>>>> >> >
>>>> >> > Cool, thanks for clarifying!
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> > >>
>>>> >> > >> For the following list:
>>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>> vectorized reader
>>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>>> >> > >
>>>> >> > >
>>>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>>>> >> > >
>>>> >> > >
>>>> >> > >
>>>> >> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>>>> >> > >>
>>>> >> > >> Hi Xiao,
>>>> >> > >>
>>>> >> > >> For the following list:
>>>> >> > >>
>>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>> vectorized reader
>>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> > >>
>>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>>> >> > >>
>>>> >> > >> Thanks,
>>>> >> > >> Chao
>>>> >> > >>
>>>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>>> dongjoon.hyun@gmail.com> wrote:
>>>> >> > >> >
>>>> >> > >> > The following was tested and merged a few minutes ago. So, we
>>>> can remove it from the list.
>>>> >> > >> >
>>>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>> >> > >> >
>>>> >> > >> > Thanks,
>>>> >> > >> > Dongjoon.
>>>> >> > >> >
>>>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> > >> >>
>>>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>>> days to collect the list of actively developed PRs that we want to merge to
>>>> 3.3 after the branch cut?
>>>> >> > >> >>
>>>> >> > >> >> Please do not rush to merge the PRs that are not fully
>>>> reviewed. We can cut the branch this Friday and continue merging the PRs
>>>> that have been discussed in this thread. Does that make sense?
>>>> >> > >> >>
>>>> >> > >> >> Xiao
>>>> >> > >> >>
>>>> >> > >> >>
>>>> >> > >> >>
>>>> >> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>>>> >> > >> >>>
>>>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>>>> everyone a bit of breathing space? Rushed software development more often
>>>> results in bugs.
>>>> >> > >> >>>
>>>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>>>> yikunkero@gmail.com> wrote:
>>>> >> > >> >>>>
>>>> >> > >> >>>> > To make our release time more predictable, let us
>>>> collect the PRs and wait three more days before the branch cut?
>>>> >> > >> >>>>
>>>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>>> v1.5.1
>>>> >> > >> >>>>
>>>> >> > >> >>>> Three more days are OK for this from my view.
>>>> >> > >> >>>>
>>>> >> > >> >>>> Regards,
>>>> >> > >> >>>> Yikun
>>>> >> > >> >>>
>>>> >> > >> >>> --
>>>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>>>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> >> > >> >>> YouTube Live Streams:
>>>> https://www.youtube.com/user/holdenkarau
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Twitter: https://twitter.com/holdenkarau
>>>> > Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>
>>>>

Re: Apache Spark 3.3 Release

Posted by Maxim Gekk <ma...@databricks.com.INVALID>.
Hello All,

Here is the current status of features from the allow list for branch-3.3:

IN PROGRESS:

   1. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   2. SPARK-28516: Data Type Formatting Functions: `to_char`
   3. SPARK-34079: Improvement CTE table scan

IN PROGRESS but won't/couldn't be merged to branch-3.3:

   1. SPARK-37650: Tell spark-env.sh the python interpreter
   2. SPARK-36664: Log time spent waiting for cluster resources
   3. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   4. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   5. SPARK-37093: Inline type hints python/pyspark/streaming

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   10. SPARK-38590: New SQL function: try_to_binary
   11. SPARK-37377: Refactor V2 Partitioning interface and remove
   deprecated usage of Distribution
   12. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   13. SPARK-34659: Web UI does not correctly get appId
   14. SPARK-38589: New SQL function: try_avg


Max Gekk

Software Engineer

Databricks, Inc.


On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk <ma...@databricks.com> wrote:

> Hello All,
>
> Below is the current status of features from the allow list:
>
> IN PROGRESS:
>
>    1. SPARK-37396: Inline type hint files for files in
>    python/pyspark/mllib
>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>    deprecated usage of Distribution
>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>    sources
>    6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>    7. SPARK-28516: Data Type Formatting Functions: `to_char`
>    8. SPARK-36664: Log time spent waiting for cluster resources
>    9. SPARK-34659: Web UI does not correctly get appId
>    10. SPARK-37650: Tell spark-env.sh the python interpreter
>    11. SPARK-38589: New SQL function: try_avg
>    12. SPARK-38590: New SQL function: try_to_binary
>    13. SPARK-34079: Improvement CTE table scan
>
> RESOLVED:
>
>    1. SPARK-32268: Bloom Filter Join
>    2. SPARK-38548: New SQL function: try_sum
>    3. SPARK-38063: Support SQL split_part function
>    4. SPARK-38432: Refactor framework so as JDBC dialect could compile
>    filter by self way
>    5. SPARK-34863: Support nested column in Spark Parquet vectorized
>    readers
>    6. SPARK-38194: Make Yarn memory overhead factor configurable
>    7. SPARK-37618: Support cleaning up shuffle blocks from external
>    shuffle service
>    8. SPARK-37831: Add task partition id in metrics
>    9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>
> We need to decide whether we are going to wait a little bit more or close
> the doors.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
>
> On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk <ma...@databricks.com>
> wrote:
>
>> Hi All,
>>
>> Here is the allow list which I built based on your requests in this
>> thread:
>>
>>    1. SPARK-37396: Inline type hint files for files in
>>    python/pyspark/mllib
>>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>>    deprecated usage of Distribution
>>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>>    sources
>>    6. SPARK-32268: Bloom Filter Join
>>    7. SPARK-38548: New SQL function: try_sum
>>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>>    9. SPARK-38063: Support SQL split_part function
>>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>>    filter by self way
>>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>>    readers
>>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>>    shuffle service
>>    15. SPARK-37831: Add task partition id in metrics
>>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>>    17. SPARK-36664: Log time spent waiting for cluster resources
>>    18. SPARK-34659: Web UI does not correctly get appId
>>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>>    20. SPARK-38589: New SQL function: try_avg
>>    21. SPARK-38590: New SQL function: try_to_binary
>>    22. SPARK-34079: Improvement CTE table scan
>>
>> Best regards,
>> Max Gekk
>>
>>
>> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>>
>>> Is the feature freeze target date March 22nd then? I saw a few dates
>>> thrown around and want to confirm what we landed on.
>>>
>>> I am trying to get the following improvements through review and merged;
>>> if there are concerns with either, let me know:
>>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>>> <https://github.com/apache/spark/pull/32298#>
>>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>>> for released executors <https://github.com/apache/spark/pull/35085#>
>>>
>>> Tom
>>>
>>>
>>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>>> ltnwgl@gmail.com> wrote:
>>>
>>>
>>> I'd like to add the following new SQL functions in the 3.3 release.
>>> These functions are useful when overflow or encoding errors occur:
>>>
>>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>>    <https://github.com/apache/spark/pull/35848>
>>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>>    <https://github.com/apache/spark/pull/35896>
>>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>>    <https://github.com/apache/spark/pull/35897>
>>>
>>> Gengliang
>>>
>>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>>> wrote:
>>>
>>> Hello,
>>>
>>> I've been trying for a bit to get the following two PRs merged and
>>> into a release, and I'm having some difficulty moving them forward:
>>>
>>> https://github.com/apache/spark/pull/34903 - This passes the current
>>> python interpreter to spark-env.sh to allow some currently-unavailable
>>> customization to happen
>>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>>> SparkUI reverse proxy-handling code where it does a greedy match for
>>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>>> wrong place.
>>>
>>> I'm not exactly sure how to get attention for PRs that have been
>>> sitting around for a while, but these are really important to our
>>> use-cases, and it would be nice to have them merged in.
>>>
>>> Cheers
>>> Andrew
>>>
>>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>>> wrote:
>>> >
>>> > I'd like to add/backport the logging in
>>> https://github.com/apache/spark/pull/35881 PR so that when users submit
>>> issues with dynamic allocation we can better debug what's going on.
>>> >
>>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>>> >>
>>> >> There is one item on our side that we want to backport to 3.3:
>>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>> >>
>>> >> It's already reviewed and approved.
>>> >>
>>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>>> <tg...@yahoo.com.invalid> wrote:
>>> >> >
>>> >> > It looks like the version hasn't been updated on master and still
>>> shows 3.3.0-SNAPSHOT; can you please update that?
>>> >> >
>>> >> > Tom
>>> >> >
>>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>>> maxim.gekk@databricks.com.invalid> wrote:
>>> >> >
>>> >> >
>>> >> > Hi All,
>>> >> >
>>> >> > I have created the branch for Spark 3.3:
>>> >> > https://github.com/apache/spark/commits/branch-3.3
>>> >> >
>>> >> > Please backport important fixes to it, and if you have some
>>> doubts, ping me in the PR. Regarding new features, we are still building
>>> the allow list for branch-3.3.
>>> >> >
>>> >> > Best regards,
>>> >> > Max Gekk
>>> >> >
>>> >> >
>>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>>> dongjoon.hyun@gmail.com> wrote:
>>> >> >
>>> >> > Yes, I agree with your whitelist approach for backporting. :)
>>> >> > Thank you for summarizing.
>>> >> >
>>> >> > Thanks,
>>> >> > Dongjoon.
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > I think I finally got your point. What you want to keep unchanged
>>> is the branch cut date of Spark 3.3. Today? or this Friday? This is not a
>>> big deal.
>>> >> >
>>> >> > My major concern is whether we should keep merging the feature work
>>> or the dependency upgrade after the branch cut. To make our release time
>>> more predictable, I am suggesting we should finalize the exception PR list
>>> first, instead of merging them in an ad hoc way. In the past, we spent a
>>> lot of time on the revert of the PRs that were merged after the branch cut.
>>> I hope we can minimize unnecessary arguments in this release. Do you agree,
>>> Dongjoon?
>>> >> >
>>> >> >
>>> >> >
>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>>> >> >
>>> >> > That is not totally fine, Xiao. It sounds like you are asking a
>>> change of plan without a proper reason.
>>> >> >
>>> >> > Although we cut the branch today according to our plan, you still can
>>> collect the list and make a list of exceptions. I'm not blocking what you
>>> want to do.
>>> >> >
>>> >> > Please let the community start to ramp down as we agreed before.
>>> >> >
>>> >> > Dongjoon
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > Please do not get me wrong. If we don't cut a branch, we are
>>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>>> we cut the branch, we should avoid merging the feature work. In the next
>>> three days, let us collect the actively developed PRs that we want to make
>>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>>> make sense?
>>> >> >
>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>>> >> >
>>> >> > Xiao. You are working against what you are saying.
>>> >> > If you don't cut a branch, it means you are allowing all patches to
>>> land Apache Spark 3.3. No?
>>> >> >
>>> >> > > we need to avoid backporting the feature work that is not being
>>> well discussed.
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > Cutting the branch is simple, but we need to avoid backporting the
>>> feature work that is not being well discussed. Not all the members are
>>> actively following the dev list. I think we should wait 3 more days for
>>> collecting the PR list before cutting the branch.
>>> >> >
>>> >> > BTW, there are very few 3.4-only feature work that will be affected.
>>> >> >
>>> >> > Xiao
>>> >> >
>>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>>> >> >
>>> >> > Hi, Max, Chao, Xiao, Holden and all.
>>> >> >
>>> >> > I have a different idea.
>>> >> >
>>> >> > Given the situation and small patch list, I don't think we need to
>>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>>> and allow backporting.
>>> >> >
>>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>>> the branch. This situation only becomes worse because
>>> there is no way to block the other patches from landing unintentionally if
>>> we don't cut a branch.
>>> >> >
>>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>>> values
>>> >> >
>>> >> > Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.
>>> >> >
>>> >> > Best,
>>> >> > Dongjoon.
>>> >> >
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>>> wrote:
>>> >> >
>>> >> > Cool, thanks for clarifying!
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> > >>
>>> >> > >> For the following list:
>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>> >> > >
>>> >> > >
>>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>>> >> > >>
>>> >> > >> Hi Xiao,
>>> >> > >>
>>> >> > >> For the following list:
>>> >> > >>
>>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> > >>
>>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>>> >> > >>
>>> >> > >> Thanks,
>>> >> > >> Chao
>>> >> > >>
>>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>> dongjoon.hyun@gmail.com> wrote:
>>> >> > >> >
>>> >> > >> > The following was tested and merged a few minutes ago. So, we
>>> can remove it from the list.
>>> >> > >> >
>>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> > >> >
>>> >> > >> > Thanks,
>>> >> > >> > Dongjoon.
>>> >> > >> >
>>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> > >> >>
>>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>>> days to collect the list of actively developed PRs that we want to merge to
>>> 3.3 after the branch cut?
>>> >> > >> >>
>>> >> > >> >> Please do not rush to merge the PRs that are not fully
>>> reviewed. We can cut the branch this Friday and continue merging the PRs
>>> that have been discussed in this thread. Does that make sense?
>>> >> > >> >>
>>> >> > >> >> Xiao
>>> >> > >> >>
>>> >> > >> >>
>>> >> > >> >>
>>> >> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>>> >> > >> >>>
>>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>>> everyone a bit of breathing space? Rushed software development more often
>>> results in bugs.
>>> >> > >> >>>
>>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>>> yikunkero@gmail.com> wrote:
>>> >> > >> >>>>
>>> >> > >> >>>> > To make our release time more predictable, let us collect
>>> the PRs and wait three more days before the branch cut?
>>> >> > >> >>>>
>>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to
>>> v1.5.1
>>> >> > >> >>>>
>>> >> > >> >>>> Three more days are OK for this from my view.
>>> >> > >> >>>>
>>> >> > >> >>>> Regards,
>>> >> > >> >>>> Yikun
>>> >> > >> >>>
>>> >> > >> >>> --
>>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> > >> >>> YouTube Live Streams:
>>> https://www.youtube.com/user/holdenkarau
>>> >
>>> >
>>> >
>>> > --
>>> > Twitter: https://twitter.com/holdenkarau
>>> > Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>>

Re: Apache Spark 3.3 Release

Posted by Maxim Gekk <ma...@databricks.com.INVALID>.
Hello All,

Below is the current status of features from the allow list:

IN PROGRESS:

   1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   3. SPARK-37093: Inline type hints python/pyspark/streaming
   4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated
   usage of Distribution
   5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   6. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   7. SPARK-28516: Data Type Formatting Functions: `to_char`
   8. SPARK-36664: Log time spent waiting for cluster resources
   9. SPARK-34659: Web UI does not correctly get appId
   10. SPARK-37650: Tell spark-env.sh the python interpreter
   11. SPARK-38589: New SQL function: try_avg
   12. SPARK-38590: New SQL function: try_to_binary
   13. SPARK-34079: Improvement CTE table scan

RESOLVED:

   1. SPARK-32268: Bloom Filter Join
   2. SPARK-38548: New SQL function: try_sum
   3. SPARK-38063: Support SQL split_part function
   4. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   5. SPARK-34863: Support nested column in Spark Parquet vectorized readers
   6. SPARK-38194: Make Yarn memory overhead factor configurable
   7. SPARK-37618: Support cleaning up shuffle blocks from external shuffle
   service
   8. SPARK-37831: Add task partition id in metrics
   9. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support

We need to decide whether we are going to wait a little bit more or close
the doors.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk <ma...@databricks.com>
wrote:

> Hi All,
>
> Here is the allow list which I built based on your requests in this thread:
>
>    1. SPARK-37396: Inline type hint files for files in
>    python/pyspark/mllib
>    2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
>    3. SPARK-37093: Inline type hints python/pyspark/streaming
>    4. SPARK-37377: Refactor V2 Partitioning interface and remove
>    deprecated usage of Distribution
>    5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
>    sources
>    6. SPARK-32268: Bloom Filter Join
>    7. SPARK-38548: New SQL function: try_sum
>    8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
>    9. SPARK-38063: Support SQL split_part function
>    10. SPARK-28516: Data Type Formatting Functions: `to_char`
>    11. SPARK-38432: Refactor framework so as JDBC dialect could compile
>    filter by self way
>    12. SPARK-34863: Support nested column in Spark Parquet vectorized
>    readers
>    13. SPARK-38194: Make Yarn memory overhead factor configurable
>    14. SPARK-37618: Support cleaning up shuffle blocks from external
>    shuffle service
>    15. SPARK-37831: Add task partition id in metrics
>    16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
>    DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
>    17. SPARK-36664: Log time spent waiting for cluster resources
>    18. SPARK-34659: Web UI does not correctly get appId
>    19. SPARK-37650: Tell spark-env.sh the python interpreter
>    20. SPARK-38589: New SQL function: try_avg
>    21. SPARK-38590: New SQL function: try_to_binary
>    22. SPARK-34079: Improvement CTE table scan
>
> Best regards,
> Max Gekk
>
>
> On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:
>
>> Is the feature freeze target date March 22nd then?  I saw a few dates
>> thrown around want to confirm what we landed on
>>
>> I am trying to get the following improvements finished review and in, if
>> concerns with either, let me know:
>> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
>> <https://github.com/apache/spark/pull/32298#>
>> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service
>> for released executors <https://github.com/apache/spark/pull/35085#>
>>
>> Tom
>>
>>
>> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
>> ltnwgl@gmail.com> wrote:
>>
>>
>> I'd like to add the following new SQL functions in the 3.3 release. These
>> functions are useful when overflow or encoding errors occur:
>>
>>    - [SPARK-38548][SQL] New SQL function: try_sum
>>    <https://github.com/apache/spark/pull/35848>
>>    - [SPARK-38589][SQL] New SQL function: try_avg
>>    <https://github.com/apache/spark/pull/35896>
>>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>>    <https://github.com/apache/spark/pull/35897>
>>
>> Gengliang
>>
>> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com>
>> wrote:
>>
>> Hello,
>>
>> I've been trying for a bit to get the following two PRs merged and
>> into a release, and I'm having some difficulty moving them forward:
>>
>> https://github.com/apache/spark/pull/34903 - This passes the current
>> python interpreter to spark-env.sh to allow some currently-unavailable
>> customization to happen
>> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
>> SparkUI reverse proxy-handling code where it does a greedy match for
>> "proxy" in the URL, and will mistakenly replace the App-ID in the
>> wrong place.
>>
>> I'm not exactly sure of how to get attention of PRs that have been
>> sitting around for a while, but these are really important to our
>> use-cases, and it would be nice to have them merged in.
>>
>> Cheers
>> Andrew
>>
>> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca>
>> wrote:
>> >
>> > I'd like to add/backport the logging in
>> https://github.com/apache/spark/pull/35881 PR so that when users submit
>> issues with dynamic allocation we can better debug what's going on.
>> >
>> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>> >>
>> >> There is one item on our side that we want to backport to 3.3:
>> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>> >>
>> >> It's already reviewed and approved.
>> >>
>> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves
>> <tg...@yahoo.com.invalid> wrote:
>> >> >
>> >> > It looks like the version hasn't been updated on master and still
>> shows 3.3.0-SNAPSHOT, can you please update that.
>> >> >
>> >> > Tom
>> >> >
>> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
>> maxim.gekk@databricks.com.invalid> wrote:
>> >> >
>> >> >
>> >> > Hi All,
>> >> >
>> >> > I have created the branch for Spark 3.3:
>> >> > https://github.com/apache/spark/commits/branch-3.3
>> >> >
>> >> > Please, backport important fixes to it, and if you have some doubts,
>> ping me in the PR. Regarding new features, we are still building the allow
>> list for branch-3.3.
>> >> >
>> >> > Best regards,
>> >> > Max Gekk
>> >> >
>> >> >
>> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
>> dongjoon.hyun@gmail.com> wrote:
>> >> >
>> >> > Yes, I agree with you for your whitelist approach for backporting. :)
>> >> > Thank you for summarizing.
>> >> >
>> >> > Thanks,
>> >> > Dongjoon.
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > I think I finally got your point. What you want to keep unchanged is
>> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
>> deal.
>> >> >
>> >> > My major concern is whether we should keep merging the feature work
>> or the dependency upgrade after the branch cut. To make our release time
>> more predictable, I am suggesting we should finalize the exception PR list
>> first, instead of merging them in an ad hoc way. In the past, we spent a
>> lot of time on the revert of the PRs that were merged after the branch cut.
>> I hope we can minimize unnecessary arguments in this release. Do you agree,
>> Dongjoon?
>> >> >
>> >> >
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>> >> >
>> >> > That is not totally fine, Xiao. It sounds like you are asking a
>> change of plan without a proper reason.
>> >> >
>> >> > Although we cut the branch Today according our plan, you still can
>> collect the list and make a list of exceptions. I'm not blocking what you
>> want to do.
>> >> >
>> >> > Please let the community start to ramp down as we agreed before.
>> >> >
>> >> > Dongjoon
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > Please do not get me wrong. If we don't cut a branch, we are
>> allowing all patches to land Apache Spark 3.3. That is totally fine. After
>> we cut the branch, we should avoid merging the feature work. In the next
>> three days, let us collect the actively developed PRs that we want to make
>> an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
>> make sense?
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>> >> >
>> >> > Xiao. You are working against what you are saying.
>> >> > If you don't cut a branch, it means you are allowing all patches to
>> land Apache Spark 3.3. No?
>> >> >
>> >> > > we need to avoid backporting the feature work that are not being
>> well discussed.
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >
>> >> > Cutting the branch is simple, but we need to avoid backporting the
>> feature work that are not being well discussed. Not all the members are
>> actively following the dev list. I think we should wait 3 more days for
>> collecting the PR list before cutting the branch.
>> >> >
>> >> > BTW, there are very few 3.4-only feature work that will be affected.
>> >> >
>> >> > Xiao
>> >> >
>> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>> >> >
>> >> > Hi, Max, Chao, Xiao, Holden and all.
>> >> >
>> >> > I have a different idea.
>> >> >
>> >> > Given the situation and small patch list, I don't think we need to
>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>> and allow backporting.
>> >> >
>> >> > As of today, we already have an obvious Apache Spark 3.4 patch in
>> the branch together. This situation only becomes worse and worse because
>> there is no way to block the other patches from landing unintentionally if
>> we don't cut a branch.
>> >> >
>> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
>> values
>> >> >
>> >> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>> >> >
>> >> > Best,
>> >> > Dongjoon.
>> >> >
>> >> >
>> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org>
>> wrote:
>> >> >
>> >> > Cool, thanks for clarifying!
>> >> >
>> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> > >>
>> >> > >> For the following list:
>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>> >> > >
>> >> > >
>> >> > > If possible, I hope these features can be shipped with Spark 3.3.
>> >> > >
>> >> > >
>> >> > >
>> >> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>> >> > >>
>> >> > >> Hi Xiao,
>> >> > >>
>> >> > >> For the following list:
>> >> > >>
>> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>> vectorized reader
>> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> > >>
>> >> > >> Do you mean we should include them, or exclude them from 3.3?
>> >> > >>
>> >> > >> Thanks,
>> >> > >> Chao
>> >> > >>
>> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>> dongjoon.hyun@gmail.com> wrote:
>> >> > >> >
>> >> > >> > The following was tested and merged a few minutes ago. So, we
>> can remove it from the list.
>> >> > >> >
>> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> > >> >
>> >> > >> > Thanks,
>> >> > >> > Dongjoon.
>> >> > >> >
>> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> > >> >>
>> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
>> days to collect the list of actively developed PRs that we want to merge to
>> 3.3 after the branch cut?
>> >> > >> >>
>> >> > >> >> Please do not rush to merge the PRs that are not fully
>> reviewed. We can cut the branch this Friday and continue merging the PRs
>> that have been discussed in this thread. Does that make sense?
>> >> > >> >>
>> >> > >> >> Xiao
>> >> > >> >>
>> >> > >> >>
>> >> > >> >>
>> >> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>> >> > >> >>>
>> >> > >> >>> May I suggest we push out one week (22nd) just to give
>> everyone a bit of breathing space? Rushed software development more often
>> results in bugs.
>> >> > >> >>>
>> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
>> yikunkero@gmail.com> wrote:
>> >> > >> >>>>
>> >> > >> >>>> > To make our release time more predictable, let us collect
>> the PRs and wait three more days before the branch cut?
>> >> > >> >>>>
>> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> > >> >>>>
>> >> > >> >>>> Three more days are OK for this from my view.
>> >> > >> >>>>
>> >> > >> >>>> Regards,
>> >> > >> >>>> Yikun
>> >> > >> >>>
>> >> > >> >>> --
>> >> > >> >>> Twitter: https://twitter.com/holdenkarau
>> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> >> > >> >>> YouTube Live Streams:
>> https://www.youtube.com/user/holdenkarau
>> >
>> >
>> >
>> > --
>> > Twitter: https://twitter.com/holdenkarau
>> > Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>

Re: Apache Spark 3.3 Release

Posted by Maxim Gekk <ma...@databricks.com.INVALID>.
Hi All,

Here is the allow list which I built based on your requests in this thread:

   1. SPARK-37396: Inline type hint files for files in python/pyspark/mllib
   2. SPARK-37395: Inline type hint files for files in python/pyspark/ml
   3. SPARK-37093: Inline type hints python/pyspark/streaming
   4. SPARK-37377: Refactor V2 Partitioning interface and remove deprecated
   usage of Distribution
   5. SPARK-38085: DataSource V2: Handle DELETE commands for group-based
   sources
   6. SPARK-32268: Bloom Filter Join
   7. SPARK-38548: New SQL function: try_sum
   8. SPARK-37691: Support ANSI Aggregation Function: percentile_disc
   9. SPARK-38063: Support SQL split_part function
   10. SPARK-28516: Data Type Formatting Functions: `to_char`
   11. SPARK-38432: Refactor framework so as JDBC dialect could compile
   filter by self way
   12. SPARK-34863: Support nested column in Spark Parquet vectorized
   readers
   13. SPARK-38194: Make Yarn memory overhead factor configurable
   14. SPARK-37618: Support cleaning up shuffle blocks from external
   shuffle service
   15. SPARK-37831: Add task partition id in metrics
   16. SPARK-37974: Implement vectorized DELTA_BYTE_ARRAY and
   DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support
   17. SPARK-36664: Log time spent waiting for cluster resources
   18. SPARK-34659: Web UI does not correctly get appId
   19. SPARK-37650: Tell spark-env.sh the python interpreter
   20. SPARK-38589: New SQL function: try_avg
   21. SPARK-38590: New SQL function: try_to_binary
   22. SPARK-34079: Improvement CTE table scan
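For item 9 above (SPARK-38063, the `split_part` SQL function), the intended semantics can be sketched in plain Python: split a string on a delimiter and return the 1-based N-th field, counting from the end when N is negative. This is an illustrative sketch based on the common SQL convention for `split_part`; Spark's exact edge-case behavior (empty delimiters, error handling for `part = 0`) may differ.

```python
def split_part(string, delimiter, part):
    """Return the 1-based `part`-th field of `string` split on `delimiter`.

    Negative `part` counts from the end; out-of-range parts return ""
    (an assumption following the usual SQL convention, not Spark source).
    """
    if part == 0:
        raise ValueError("part must not be 0")
    fields = string.split(delimiter)
    index = part - 1 if part > 0 else len(fields) + part
    if 0 <= index < len(fields):
        return fields[index]
    return ""
```

For example, `split_part("11.12.13", ".", 3)` and `split_part("11.12.13", ".", -1)` both pick out the last field.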

Best regards,
Max Gekk


On Thu, Mar 17, 2022 at 4:59 PM Tom Graves <tg...@yahoo.com> wrote:

> Is the feature freeze target date March 22nd then?  I saw a few dates
> thrown around want to confirm what we landed on
>
> I am trying to get the following improvements finished review and in, if
> concerns with either, let me know:
> - [SPARK-34079][SQL] Merge non-correlated scalar subqueries
> <https://github.com/apache/spark/pull/32298#>
> - [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for
> released executors <https://github.com/apache/spark/pull/35085#>
>
> Tom
>
>
> On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <
> ltnwgl@gmail.com> wrote:
>
>
> I'd like to add the following new SQL functions in the 3.3 release. These
> functions are useful when overflow or encoding errors occur:
>
>    - [SPARK-38548][SQL] New SQL function: try_sum
>    <https://github.com/apache/spark/pull/35848>
>    - [SPARK-38589][SQL] New SQL function: try_avg
>    <https://github.com/apache/spark/pull/35896>
>    - [SPARK-38590][SQL] New SQL function: try_to_binary
>    <https://github.com/apache/spark/pull/35897>
>
> Gengliang
>
> On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com> wrote:
>
> Hello,
>
> I've been trying for a bit to get the following two PRs merged and
> into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current
> python interpreter to spark-env.sh to allow some currently-unavailable
> customization to happen
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
> SparkUI reverse proxy-handling code where it does a greedy match for
> "proxy" in the URL, and will mistakenly replace the App-ID in the
> wrong place.
>
> I'm not exactly sure of how to get attention of PRs that have been
> sitting around for a while, but these are really important to our
> use-cases, and it would be nice to have them merged in.
>
> Cheers
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca> wrote:
> >
> > I'd like to add/backport the logging in
> https://github.com/apache/spark/pull/35881 PR so that when users submit
> issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves <tg...@yahoo.com.invalid>
> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still
> shows 3.3.0-SNAPSHOT, can you please update that.
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.gekk@databricks.com.invalid> wrote:
> >> >
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
> dongjoon.hyun@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you for your whitelist approach for backporting. :)
> >> > Thank you for summarizing.
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com> wrote:
> >> >
> >> > I think I finally got your point. What you want to keep unchanged is
> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
> >> >
> >> > My major concern is whether we should keep merging the feature work
> or the dependency upgrade after the branch cut. To make our release time
> more predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
> >> >
> >> >
> >> >
> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
> >> >
> >> > That is not totally fine, Xiao. It sounds like you are asking a
> change of plan without a proper reason.
> >> >
> >> > Although we cut the branch Today according our plan, you still can
> collect the list and make a list of exceptions. I'm not blocking what you
> want to do.
> >> >
> >> > Please let the community start to ramp down as we agreed before.
> >> >
> >> > Dongjoon
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com> wrote:
> >> >
> >> > Please do not get me wrong. If we don't cut a branch, we are allowing
> all patches to land Apache Spark 3.3. That is totally fine. After we cut
> the branch, we should avoid merging the feature work. In the next three
> days, let us collect the actively developed PRs that we want to make an
> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
> make sense?
> >> >
> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
> >> >
> >> > Xiao. You are working against what you are saying.
> >> > If you don't cut a branch, it means you are allowing all patches to
> land Apache Spark 3.3. No?
> >> >
> >> > > we need to avoid backporting the feature work that are not being
> well discussed.
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
> wrote:
> >> >
> >> > Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members are
> actively following the dev list. I think we should wait 3 more days for
> collecting the PR list before cutting the branch.
> >> >
> >> > BTW, there are very few 3.4-only feature work that will be affected.
> >> >
> >> > Xiao
> >> >
> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
> >> >
> >> > Hi, Max, Chao, Xiao, Holden and all.
> >> >
> >> > I have a different idea.
> >> >
> >> > Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a branch-3.3
> and allow backporting.
> >> >
> >> > As of today, we already have an obvious Apache Spark 3.4 patch in the
> branch together. This situation only becomes worse and worse because there
> is no way to block the other patches from landing unintentionally if we
> don't cut a branch.
> >> >
> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
> values
> >> >
> >> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
> >> >
> >> > Best,
> >> > Dongjoon.
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:
> >> >
> >> > Cool, thanks for clarifying!
> >> >
> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
> wrote:
> >> > >>
> >> > >> For the following list:
> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> > >> Do you mean we should include them, or exclude them from 3.3?
> >> > >
> >> > >
> >> > > If possible, I hope these features can be shipped with Spark 3.3.
> >> > >
> >> > >
> >> > >
> >> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
> >> > >>
> >> > >> Hi Xiao,
> >> > >>
> >> > >> For the following list:
> >> > >>
> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> > >>
> >> > >> Do you mean we should include them, or exclude them from 3.3?
> >> > >>
> >> > >> Thanks,
> >> > >> Chao
> >> > >>
> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
> dongjoon.hyun@gmail.com> wrote:
> >> > >> >
> >> > >> > The following was tested and merged a few minutes ago. So, we
> can remove it from the list.
> >> > >> >
> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Dongjoon.
> >> > >> >
> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
> wrote:
> >> > >> >>
> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
> days to collect the list of actively developed PRs that we want to merge to
> 3.3 after the branch cut?
> >> > >> >>
> >> > >> >> Please do not rush to merge the PRs that are not fully
> reviewed. We can cut the branch this Friday and continue merging the PRs
> that have been discussed in this thread. Does that make sense?
> >> > >> >>
> >> > >> >> Xiao
> >> > >> >>
> >> > >> >>
> >> > >> >>
> >> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
> >> > >> >>>
> >> > >> >>> May I suggest we push out one week (22nd) just to give
> everyone a bit of breathing space? Rushed software development more often
> results in bugs.
> >> > >> >>>
> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
> yikunkero@gmail.com> wrote:
> >> > >> >>>>
> >> > >> >>>> > To make our release time more predictable, let us collect
> the PRs and wait three more days before the branch cut?
> >> > >> >>>>
> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> > >> >>>>
> >> > >> >>>> Three more days are OK for this from my view.
> >> > >> >>>>
> >> > >> >>>> Regards,
> >> > >> >>>> Yikun
> >> > >> >>>
> >> > >> >>> --
> >> > >> >>> Twitter: https://twitter.com/holdenkarau
> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> > >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
> >
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: Apache Spark 3.3 Release

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
Is the feature freeze target date March 22nd then? I saw a few dates thrown around and want to confirm what we landed on.

I am trying to get the following improvements finished reviewing and merged; if there are concerns with either, let me know:
- [SPARK-34079][SQL] Merge non-correlated scalar subqueries
- [SPARK-37618][CORE] Remove shuffle blocks using the shuffle service for released executors
Tom

    On Thursday, March 17, 2022, 07:24:41 AM CDT, Gengliang Wang <lt...@gmail.com> wrote:

I'd like to add the following new SQL functions in the 3.3 release. These functions are useful when overflow or encoding errors occur:

   - [SPARK-38548][SQL] New SQL function: try_sum
   - [SPARK-38589][SQL] New SQL function: try_avg
   - [SPARK-38590][SQL] New SQL function: try_to_binary

Gengliang
On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com> wrote:

Hello,

I've been trying for a bit to get the following two PRs merged and
into a release, and I'm having some difficulty moving them forward:

https://github.com/apache/spark/pull/34903 - This passes the current
python interpreter to spark-env.sh to allow some currently-unavailable
customization to happen
https://github.com/apache/spark/pull/31774 - This fixes a bug in the
SparkUI reverse proxy-handling code where it does a greedy match for
"proxy" in the URL, and will mistakenly replace the App-ID in the
wrong place.

I'm not exactly sure of how to get attention of PRs that have been
sitting around for a while, but these are really important to our
use-cases, and it would be nice to have them merged in.

Cheers
Andrew
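The class of bug described in the second PR above (a greedy match for "proxy" in the URL replacing the App-ID in the wrong place) is easy to reproduce with regular expressions: a greedy `.*` consumes up to the *last* occurrence of a token, so a later path segment can be captured where an earlier one was intended. The URL and patterns below are hypothetical illustrations of the bug class, not the actual code from PR #31774.

```python
import re

# Hypothetical UI path where "proxy/" appears twice.
url = "/gateway/proxy/app-0001/proxy/history"

# Greedy ".*" runs to the LAST "proxy/", so the wrong segment is
# captured as the "app id".
greedy = re.match(r".*proxy/([^/]+)", url)

# Non-greedy ".*?" stops at the FIRST "proxy/", capturing the
# segment that was actually intended.
lazy = re.match(r".*?proxy/([^/]+)", url)

print(greedy.group(1))  # "history" -- the bug
print(lazy.group(1))    # "app-0001" -- the intended capture
```

Anchoring the pattern to the known path prefix (rather than scanning with `.*`) is another way to avoid this class of mismatch.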

On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca> wrote:
>
> I'd like to add/backport the logging in https://github.com/apache/spark/pull/35881 PR so that when users submit issues with dynamic allocation we can better debug what's going on.
>
> On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
>>
>> There is one item on our side that we want to backport to 3.3:
>> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
>> Parquet V2 support (https://github.com/apache/spark/pull/35262)
>>
>> It's already reviewed and approved.
>>
>> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves <tg...@yahoo.com.invalid> wrote:
>> >
>> > It looks like the version hasn't been updated on master and still shows 3.3.0-SNAPSHOT, can you please update that.
>> >
>> > Tom
>> >
>> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <ma...@databricks.com.invalid> wrote:
>> >
>> >
>> > Hi All,
>> >
>> > I have created the branch for Spark 3.3:
>> > https://github.com/apache/spark/commits/branch-3.3
>> >
>> > Please, backport important fixes to it, and if you have some doubts, ping me in the PR. Regarding new features, we are still building the allow list for branch-3.3.
>> >
>> > Best regards,
>> > Max Gekk
>> >
>> >
>> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <do...@gmail.com> wrote:
>> >
>> > Yes, I agree with you for your whitelist approach for backporting. :)
>> > Thank you for summarizing.
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com> wrote:
>> >
>> > I think I finally got your point. What you want to keep unchanged is the branch cut date of Spark 3.3. Today? or this Friday? This is not a big deal.
>> >
>> > My major concern is whether we should keep merging the feature work or the dependency upgrade after the branch cut. To make our release time more predictable, I am suggesting we should finalize the exception PR list first, instead of merging them in an ad hoc way. In the past, we spent a lot of time on the revert of the PRs that were merged after the branch cut. I hope we can minimize unnecessary arguments in this release. Do you agree, Dongjoon?
>> >
>> >
>> >
>> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>> >
>> > That is not totally fine, Xiao. It sounds like you are asking a change of plan without a proper reason.
>> >
>> > Although we cut the branch Today according our plan, you still can collect the list and make a list of exceptions. I'm not blocking what you want to do.
>> >
>> > Please let the community start to ramp down as we agreed before.
>> >
>> > Dongjoon
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com> wrote:
>> >
>> > Please do not get me wrong. If we don't cut a branch, we are allowing all patches to land Apache Spark 3.3. That is totally fine. After we cut the branch, we should avoid merging the feature work. In the next three days, let us collect the actively developed PRs that we want to make an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
>> >
>> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>> >
>> > Xiao. You are working against what you are saying.
>> > If you don't cut a branch, it means you are allowing all patches to land Apache Spark 3.3. No?
>> >
>> > > we need to avoid backporting the feature work that are not being well discussed.
>> >
>> >
>> >
>> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com> wrote:
>> >
>> > Cutting the branch is simple, but we need to avoid backporting the feature work that are not being well discussed. Not all the members are actively following the dev list. I think we should wait 3 more days for collecting the PR list before cutting the branch.
>> >
>> > BTW, there are very few 3.4-only feature work that will be affected.
>> >
>> > Xiao
>> >
>> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>> >
>> > Hi, Max, Chao, Xiao, Holden and all.
>> >
>> > I have a different idea.
>> >
>> > Given the situation and small patch list, I don't think we need to postpone the branch cut for those patches. It's easier to cut a branch-3.3 and allow backporting.
>> >
>> > As of today, we already have an obvious Apache Spark 3.4 patch in the branch together. This situation only becomes worse and worse because there is no way to block the other patches from landing unintentionally if we don't cut a branch.
>> >
>> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>> >
>> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>> >
>> > Best,
>> > Dongjoon.
>> >
>> >
>> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:
>> >
>> > Cool, thanks for clarifying!
>> >
>> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com> wrote:
>> > >>
>> > >> For the following list:
>> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> > >> Do you mean we should include them, or exclude them from 3.3?
>> > >
>> > >
>> > > If possible, I hope these features can be shipped with Spark 3.3.
>> > >
>> > >
>> > >
>> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>> > >>
>> > >> Hi Xiao,
>> > >>
>> > >> For the following list:
>> > >>
>> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> > >>
>> > >> Do you mean we should include them, or exclude them from 3.3?
>> > >>
>> > >> Thanks,
>> > >> Chao
>> > >>
>> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <do...@gmail.com> wrote:
>> > >> >
>> > >> > The following was tested and merged a few minutes ago. So, we can remove it from the list.
>> > >> >
>> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> > >> >
>> > >> > Thanks,
>> > >> > Dongjoon.
>> > >> >
>> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com> wrote:
>> > >> >>
>> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut?
>> > >> >>
>> > >> >> Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that have been discussed in this thread. Does that make sense?
>> > >> >>
>> > >> >> Xiao
>> > >> >>
>> > >> >>
>> > >> >>
>> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>> > >> >>>
>> > >> >>> May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs.
>> > >> >>>
>> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com> wrote:
>> > >> >>>>
>> > >> >>>> > To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
>> > >> >>>>
>> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> > >> >>>>
>> > >> >>>> Three more days are OK for this from my view.
>> > >> >>>>
>> > >> >>>> Regards,
>> > >> >>>> Yikun
>> > >> >>>
>> > >> >>> --
>> > >> >>> Twitter: https://twitter.com/holdenkarau
>> > >> >>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> > >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


  

Re: Apache Spark 3.3 Release

Posted by Gengliang Wang <lt...@gmail.com>.
I'd like to add the following new SQL functions in the 3.3 release. These
functions are useful when overflow or encoding errors occur:

   - [SPARK-38548][SQL] New SQL function: try_sum
   <https://github.com/apache/spark/pull/35848>
   - [SPARK-38589][SQL] New SQL function: try_avg
   <https://github.com/apache/spark/pull/35896>
   - [SPARK-38590][SQL] New SQL function: try_to_binary
   <https://github.com/apache/spark/pull/35897>

Gengliang
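
For readers unfamiliar with the try_* family: under ANSI mode the plain
functions raise an error on overflow, while the try_* variants return NULL
instead. A rough pure-Python sketch of that semantics for try_sum
(illustrative only; the real implementation is a Catalyst expression
operating on columns, not Python lists):

```python
def try_sum(values, bits=64):
    """Sum values, returning None instead of raising on signed overflow.

    Pure-Python sketch of the try_* null-on-overflow semantics; the
    real Spark implementation works on whole columns.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    total = 0
    for v in values:
        if v is None:
            continue  # SQL aggregates skip NULLs
        total += v
        if total < lo or total > hi:
            return None  # overflow -> NULL rather than an error
    return total
```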

On Thu, Mar 17, 2022 at 7:59 AM Andrew Melo <an...@gmail.com> wrote:

> Hello,
>
> I've been trying for a bit to get the following two PRs merged and
> into a release, and I'm having some difficulty moving them forward:
>
> https://github.com/apache/spark/pull/34903 - This passes the current
> python interpreter to spark-env.sh to allow some currently-unavailable
> customization to happen
> https://github.com/apache/spark/pull/31774 - This fixes a bug in the
> SparkUI reverse proxy-handling code where it does a greedy match for
> "proxy" in the URL, and will mistakenly replace the App-ID in the
> wrong place.
>
> I'm not exactly sure of how to get attention of PRs that have been
> sitting around for a while, but these are really important to our
> use-cases, and it would be nice to have them merged in.
>
> Cheers
> Andrew
>
> On Wed, Mar 16, 2022 at 6:21 PM Holden Karau <ho...@pigscanfly.ca> wrote:
> >
> > I'd like to add/backport the logging in
> https://github.com/apache/spark/pull/35881 PR so that when users submit
> issues with dynamic allocation we can better debug what's going on.
> >
> > On Wed, Mar 16, 2022 at 3:45 PM Chao Sun <su...@apache.org> wrote:
> >>
> >> There is one item on our side that we want to backport to 3.3:
> >> - vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
> >> Parquet V2 support (https://github.com/apache/spark/pull/35262)
> >>
> >> It's already reviewed and approved.
> >>
> >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves <tg...@yahoo.com.invalid>
> wrote:
> >> >
> >> > It looks like the version hasn't been updated on master and still
> shows 3.3.0-SNAPSHOT, can you please update that.
> >> >
> >> > Tom
> >> >
> >> > On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <
> maxim.gekk@databricks.com.invalid> wrote:
> >> >
> >> >
> >> > Hi All,
> >> >
> >> > I have created the branch for Spark 3.3:
> >> > https://github.com/apache/spark/commits/branch-3.3
> >> >
> >> > Please, backport important fixes to it, and if you have some doubts,
> ping me in the PR. Regarding new features, we are still building the allow
> list for branch-3.3.
> >> >
> >> > Best regards,
> >> > Max Gekk
> >> >
> >> >
> >> > On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <
> dongjoon.hyun@gmail.com> wrote:
> >> >
> >> > Yes, I agree with you for your whitelist approach for backporting. :)
> >> > Thank you for summarizing.
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com> wrote:
> >> >
> >> > I think I finally got your point. What you want to keep unchanged is
> the branch cut date of Spark 3.3. Today? or this Friday? This is not a big
> deal.
> >> >
> >> > My major concern is whether we should keep merging the feature work
> or the dependency upgrade after the branch cut. To make our release time
> more predictable, I am suggesting we should finalize the exception PR list
> first, instead of merging them in an ad hoc way. In the past, we spent a
> lot of time on the revert of the PRs that were merged after the branch cut.
> I hope we can minimize unnecessary arguments in this release. Do you agree,
> Dongjoon?
> >> >
> >> >
> >> >
> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
> >> >
> >> > That is not totally fine, Xiao. It sounds like you are asking a
> change of plan without a proper reason.
> >> >
> >> > Although we cut the branch Today according our plan, you still can
> collect the list and make a list of exceptions. I'm not blocking what you
> want to do.
> >> >
> >> > Please let the community start to ramp down as we agreed before.
> >> >
> >> > Dongjoon
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com> wrote:
> >> >
> >> > Please do not get me wrong. If we don't cut a branch, we are allowing
> all patches to land Apache Spark 3.3. That is totally fine. After we cut
> the branch, we should avoid merging the feature work. In the next three
> days, let us collect the actively developed PRs that we want to make an
> exception (i.e., merged to 3.3 after the upcoming branch cut). Does that
> make sense?
> >> >
> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
> >> >
> >> > Xiao. You are working against what you are saying.
> >> > If you don't cut a branch, it means you are allowing all patches to
> land Apache Spark 3.3. No?
> >> >
> >> > > we need to avoid backporting the feature work that are not being
> well discussed.
> >> >
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com>
> wrote:
> >> >
> >> > Cutting the branch is simple, but we need to avoid backporting the
> feature work that are not being well discussed. Not all the members are
> actively following the dev list. I think we should wait 3 more days for
> collecting the PR list before cutting the branch.
> >> >
> >> > BTW, there are very few 3.4-only feature work that will be affected.
> >> >
> >> > Xiao
> >> >
> >> > Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
> >> >
> >> > Hi, Max, Chao, Xiao, Holden and all.
> >> >
> >> > I have a different idea.
> >> >
> >> > Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a branch-3.3
> and allow backporting.
> >> >
> >> > As of today, we already have an obvious Apache Spark 3.4 patch in the
> branch together. This situation only becomes worse and worse because there
> is no way to block the other patches from landing unintentionally if we
> don't cut a branch.
> >> >
> >> >     [SPARK-38335][SQL] Implement parser support for DEFAULT column
> values
> >> >
> >> > Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
> >> >
> >> > Best,
> >> > Dongjoon.
> >> >
> >> >
> >> > On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:
> >> >
> >> > Cool, thanks for clarifying!
> >> >
> >> > On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com>
> wrote:
> >> > >>
> >> > >> For the following list:
> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> > >> Do you mean we should include them, or exclude them from 3.3?
> >> > >
> >> > >
> >> > > If possible, I hope these features can be shipped with Spark 3.3.
> >> > >
> >> > >
> >> > >
> >> > > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
> >> > >>
> >> > >> Hi Xiao,
> >> > >>
> >> > >> For the following list:
> >> > >>
> >> > >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> > >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
> vectorized reader
> >> > >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> > >>
> >> > >> Do you mean we should include them, or exclude them from 3.3?
> >> > >>
> >> > >> Thanks,
> >> > >> Chao
> >> > >>
> >> > >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
> dongjoon.hyun@gmail.com> wrote:
> >> > >> >
> >> > >> > The following was tested and merged a few minutes ago. So, we
> can remove it from the list.
> >> > >> >
> >> > >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Dongjoon.
> >> > >> >
> >> > >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
> wrote:
> >> > >> >>
> >> > >> >> Let me clarify my above suggestion. Maybe we can wait 3 more
> days to collect the list of actively developed PRs that we want to merge to
> 3.3 after the branch cut?
> >> > >> >>
> >> > >> >> Please do not rush to merge the PRs that are not fully
> reviewed. We can cut the branch this Friday and continue merging the PRs
> that have been discussed in this thread. Does that make sense?
> >> > >> >>
> >> > >> >> Xiao
> >> > >> >>
> >> > >> >>
> >> > >> >>
> >> > >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
> >> > >> >>>
> >> > >> >>> May I suggest we push out one week (22nd) just to give
> everyone a bit of breathing space? Rushed software development more often
> results in bugs.
> >> > >> >>>
> >> > >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <
> yikunkero@gmail.com> wrote:
> >> > >> >>>>
> >> > >> >>>> > To make our release time more predictable, let us collect
> the PRs and wait three more days before the branch cut?
> >> > >> >>>>
> >> > >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
> >> > >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> > >> >>>>
> >> > >> >>>> Three more days are OK for this from my view.
> >> > >> >>>>
> >> > >> >>>> Regards,
> >> > >> >>>> Yikun
> >> > >> >>>
> >> > >> >>> --
> >> > >> >>> Twitter: https://twitter.com/holdenkarau
> >> > >> >>> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> > >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >
> >
> >
> > --
> > Twitter: https://twitter.com/holdenkarau
> > Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: Apache Spark 3.3 Release

Posted by Andrew Melo <an...@gmail.com>.
Hello,

I've been trying for a while to get the following two PRs merged and
included in a release, and I'm having some difficulty moving them forward:

https://github.com/apache/spark/pull/34903 - This passes the current
Python interpreter to spark-env.sh, enabling customization that is
currently unavailable.
https://github.com/apache/spark/pull/31774 - This fixes a bug in the
SparkUI reverse-proxy handling code, which does a greedy match for
"proxy" in the URL and can mistakenly replace the App ID in the
wrong place.

I'm not exactly sure how to draw attention to PRs that have been
sitting around for a while, but these are really important to our
use cases, and it would be nice to have them merged in.

Cheers
Andrew
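
To illustrate the class of bug described in the second PR (a hypothetical
sketch with made-up URLs, not the actual SparkUI code): an unanchored
substring search for "proxy" can land inside an app ID that happens to
contain the word, whereas anchoring the match to the path prefix avoids it.

```python
import re

# Hypothetical URL whose app ID itself contains the word "proxy".
url = "/proxy/app-proxy/jobs/"

# Unanchored substring search: rfind locates "proxy/" inside the
# app ID, so stripping the "prefix" swallows the app ID too.
naive = url[url.rfind("proxy/") + len("proxy/"):]   # -> "jobs/"

# Anchored match: only a leading "/proxy/" is treated as the prefix.
anchored = re.sub(r"^/proxy/", "/", url)            # -> "/app-proxy/jobs/"
```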



Re: Apache Spark 3.3 Release

Posted by Holden Karau <ho...@pigscanfly.ca>.
I'd like to add/backport the logging in
https://github.com/apache/spark/pull/35881 so that when users submit
issues with dynamic allocation, we can better debug what's going on.



-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: Apache Spark 3.3 Release

Posted by Chao Sun <su...@apache.org>.
There is one item on our side that we want to backport to 3.3:
- vectorized DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for
Parquet V2 support (https://github.com/apache/spark/pull/35262)

It's already reviewed and approved.
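
For background on the encoding being vectorized: DELTA_LENGTH_BYTE_ARRAY
stores all value lengths up front (delta-encoded in the real format),
followed by the concatenated value bytes, which is what makes a batched
decode attractive. A much-simplified sketch, assuming the lengths have
already been unpacked:

```python
def decode_delta_length_byte_array(lengths, data):
    """Slice one concatenated byte buffer back into values.

    Simplified sketch: real Parquet delta-encodes the lengths in
    bit-packed miniblocks; here they are assumed already decoded.
    """
    out, offset = [], 0
    for n in lengths:
        out.append(data[offset:offset + n])
        offset += n
    return out
```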

On Wed, Mar 16, 2022 at 9:13 AM Tom Graves <tg...@yahoo.com.invalid> wrote:
>
> It looks like the version hasn't been updated on master and still shows 3.3.0-SNAPSHOT, can you please update that.
>
> Tom
>
> On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <ma...@databricks.com.invalid> wrote:
>
>
> Hi All,
>
> I have created the branch for Spark 3.3:
> https://github.com/apache/spark/commits/branch-3.3
>
> Please, backport important fixes to it, and if you have some doubts, ping me in the PR. Regarding new features, we are still building the allow list for branch-3.3.
>
> Best regards,
> Max Gekk
>
>
> On Wed, Mar 16, 2022 at 5:51 AM Dongjoon Hyun <do...@gmail.com> wrote:
>
> Yes, I agree with you for your whitelist approach for backporting. :)
> Thank you for summarizing.
>
> Thanks,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 4:20 PM Xiao Li <ga...@gmail.com> wrote:
>
> I think I finally got your point. What you want to keep unchanged is the branch cut date of Spark 3.3. Today? or this Friday? This is not a big deal.
>
> My major concern is whether we should keep merging the feature work or the dependency upgrade after the branch cut. To make our release time more predictable, I am suggesting we should finalize the exception PR list first, instead of merging them in an ad hoc way. In the past, we spent a lot of time on the revert of the PRs that were merged after the branch cut. I hope we can minimize unnecessary arguments in this release. Do you agree, Dongjoon?
>
>
>
> Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 15:55写道:
>
> That is not totally fine, Xiao. It sounds like you are asking a change of plan without a proper reason.
>
> Although we cut the branch Today according our plan, you still can collect the list and make a list of exceptions. I'm not blocking what you want to do.
>
> Please let the community start to ramp down as we agreed before.
>
> Dongjoon
>
>
>
> On Tue, Mar 15, 2022 at 3:07 PM Xiao Li <ga...@gmail.com> wrote:
>
> Please do not get me wrong. If we don't cut a branch, we are allowing all patches to land Apache Spark 3.3. That is totally fine. After we cut the branch, we should avoid merging the feature work. In the next three days, let us collect the actively developed PRs that we want to make an exception (i.e., merged to 3.3 after the upcoming branch cut). Does that make sense?
>
> Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:
>
> Xiao. You are working against what you are saying.
> If you don't cut a branch, it means you are allowing all patches to land Apache Spark 3.3. No?
>
> > we need to avoid backporting the feature work that are not being well discussed.
>
>
>
> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com> wrote:
>
> Cutting the branch is simple, but we need to avoid backporting the feature work that are not being well discussed. Not all the members are actively following the dev list. I think we should wait 3 more days for collecting the PR list before cutting the branch.
>
> BTW, there are very few 3.4-only feature work that will be affected.
>
> Xiao
>
> Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>
> Hi, Max, Chao, Xiao, Holden and all.
>
> I have a different idea.
>
> Given the situation and small patch list, I don't think we need to postpone the branch cut for those patches. It's easier to cut a branch-3.3 and allow backporting.
>
> As of today, we already have an obvious Apache Spark 3.4 patch in the branch together. This situation only becomes worse and worse because there is no way to block the other patches from landing unintentionally if we don't cut a branch.
>
>     [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>
> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>
> Best,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:
>
> Cool, thanks for clarifying!
>
> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com> wrote:
> >>
> >> For the following list:
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> Do you mean we should include them, or exclude them from 3.3?
> >
> >
> > If possible, I hope these features can be shipped with Spark 3.3.
> >
> >
> >
> > On Tue, Mar 15, 2022 at 10:06, Chao Sun <su...@apache.org> wrote:
> >>
> >> Hi Xiao,
> >>
> >> For the following list:
> >>
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >>
> >> Do you mean we should include them, or exclude them from 3.3?
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <do...@gmail.com> wrote:
> >> >
> >> > The following was tested and merged a few minutes ago. So, we can remove it from the list.
> >> >
> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com> wrote:
> >> >>
> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut?
> >> >>
> >> >> Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that have been discussed in this thread. Does that make sense?
> >> >>
> >> >> Xiao
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Mar 15, 2022 at 09:10, Holden Karau <ho...@pigscanfly.ca> wrote:
> >> >>>
> >> >>> May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs.
> >> >>>
> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com> wrote:
> >> >>>>
> >> >>>> > To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
> >> >>>>
> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >>>>
> >> >>>> Three more days are OK for this from my view.
> >> >>>>
> >> >>>> Regards,
> >> >>>> Yikun
> >> >>>
> >> >>> --
> >> >>> Twitter: https://twitter.com/holdenkarau
> >> >>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Apache Spark 3.3 Release

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
It looks like the version hasn't been updated on master and still shows 3.3.0-SNAPSHOT; can you please update that?
Tom
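After a release branch is cut, the version declared on master normally moves to the next development version. A minimal sketch of that edit, assuming a Maven layout with the version in pom.xml (this is illustrative, not Spark's official release tooling; a multi-module build would typically use something like `build/mvn versions:set -DnewVersion=3.4.0-SNAPSHOT` instead of sed):

```shell
# Bump the development version on a sample pom.xml from 3.3.0-SNAPSHOT
# to 3.4.0-SNAPSHOT. The scratch pom below stands in for the real build.
set -e
tmp=$(mktemp -d)
cat > "$tmp/pom.xml" <<'EOF'
<project>
  <version>3.3.0-SNAPSHOT</version>
</project>
EOF
# Rewrite the version in place (-i.bak keeps a backup, works on GNU and BSD sed).
sed -i.bak 's|3\.3\.0-SNAPSHOT|3.4.0-SNAPSHOT|' "$tmp/pom.xml"
grep '<version>' "$tmp/pom.xml"
```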
On Wednesday, March 16, 2022, 01:41:00 AM CDT, Maxim Gekk <ma...@databricks.com.invalid> wrote:

Hi All,

I have created the branch for Spark 3.3:
https://github.com/apache/spark/commits/branch-3.3

Please, backport important fixes to it, and if you have some doubts, ping me in the PR. Regarding new features, we are still building the allow list for branch-3.3.

Best regards,
Max Gekk


Re: Apache Spark 3.3 Release

Posted by Maxim Gekk <ma...@databricks.com.INVALID>.
Hi All,

I have created the branch for Spark 3.3:
https://github.com/apache/spark/commits/branch-3.3

Please, backport important fixes to it, and if you have some doubts, ping
me in the PR. Regarding new features, we are still building the allow list
for branch-3.3.
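A fix that has already landed on master is usually brought to the release branch with `git cherry-pick -x` (in practice via a PR against branch-3.3). A minimal sketch in a scratch repository; SPARK-XXXXX is a placeholder ticket id, not a real JIRA:

```shell
# Demonstrate backporting a master commit onto a release branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
echo base > file.txt && git add file.txt && git commit -qm 'base'
git branch branch-3.3                    # the release branch cut point
echo fix >> file.txt && git add file.txt && git commit -qm '[SPARK-XXXXX] fix'
sha=$(git rev-parse HEAD)                # the fix commit on master
git checkout -q branch-3.3
git cherry-pick -x "$sha"                # -x records the original commit id
git log -1 --oneline
```

The `-x` flag appends "(cherry picked from commit ...)" to the backported commit message, which keeps the branch history traceable back to master.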

Best regards,
Max Gekk



Re: Apache Spark 3.3 Release

Posted by Dongjoon Hyun <do...@gmail.com>.
Yes, I agree with your whitelist approach for backporting. :)
Thank you for summarizing.

Thanks,
Dongjoon.



Re: Apache Spark 3.3 Release

Posted by Xiao Li <ga...@gmail.com>.
I think I finally got your point. What you want to keep unchanged is the
branch cut date of Spark 3.3: today, or this Friday? This is not a big
deal.

My major concern is whether we should keep merging feature work or
dependency upgrades after the branch cut. To make our release time more
predictable, I am suggesting we finalize the exception PR list first,
instead of merging them in an ad hoc way. In the past, we spent a lot of
time reverting PRs that were merged after the branch cut. I hope we can
minimize unnecessary arguments in this release. Do you agree, Dongjoon?



>>>>>

Re: Apache Spark 3.3 Release

Posted by Dongjoon Hyun <do...@gmail.com>.
That is not totally fine, Xiao. It sounds like you are asking for a change
of plan without a proper reason.

Although we cut the branch today according to our plan, you can still
collect the list and make a list of exceptions. I'm not blocking what you
want to do.

Please let the community start to ramp down as we agreed before.

Dongjoon



>>>>> the PRs and wait three more days before the branch cut?
>>>>> >> >>>>
>>>>> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>>> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>> >> >>>>
>>>>> >> >>>> Three more days are OK for this from my view.
>>>>> >> >>>>
>>>>> >> >>>> Regards,
>>>>> >> >>>> Yikun
>>>>> >> >>>
>>>>> >> >>> --
>>>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>>>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9
>>>>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>>

Re: Apache Spark 3.3 Release

Posted by Xiao Li <ga...@gmail.com>.
Please do not get me wrong. If we don't cut a branch, we are allowing all
patches to land in Apache Spark 3.3. That is totally fine. After we cut the
branch, we should avoid merging feature work. In the next three days, let us
collect the actively developed PRs that we want to make an exception for
(i.e., merge to 3.3 after the upcoming branch cut). Does that make sense?

Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 14:54写道:

> Xiao. You are working against what you are saying.
> If you don't cut a branch, it means you are allowing all patches to land
> Apache Spark 3.3. No?
>
> > we need to avoid backporting the feature work that are not being well
> discussed.
>
>
>
> On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com> wrote:
>
>> Cutting the branch is simple, but we need to avoid backporting the
>> feature work that are not being well discussed. Not all the members are
>> actively following the dev list. I think we should wait 3 more days for
>> collecting the PR list before cutting the branch.
>>
>> BTW, there are very few 3.4-only feature work that will be affected.
>>
>> Xiao
>>
>> Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>>
>>> Hi, Max, Chao, Xiao, Holden and all.
>>>
>>> I have a different idea.
>>>
>>> Given the situation and small patch list, I don't think we need to
>>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>>> and allow backporting.
>>>
>>> As of today, we already have an obvious Apache Spark 3.4 patch in the
>>> branch together. This situation only becomes worse and worse because there
>>> is no way to block the other patches from landing unintentionally if we
>>> don't cut a branch.
>>>
>>>     [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>>>
>>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>>
>>> Best,
>>> Dongjoon.
>>>
>>>
>>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:
>>>
>>>> Cool, thanks for clarifying!
>>>>
>>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com> wrote:
>>>> >>
>>>> >> For the following list:
>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>> vectorized reader
>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>> >
>>>> >
>>>> > If possible, I hope these features can be shipped with Spark 3.3.
>>>> >
>>>> >
>>>> >
>>>> > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>>>> >>
>>>> >> Hi Xiao,
>>>> >>
>>>> >> For the following list:
>>>> >>
>>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>>> vectorized reader
>>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>>> >>
>>>> >> Do you mean we should include them, or exclude them from 3.3?
>>>> >>
>>>> >> Thanks,
>>>> >> Chao
>>>> >>
>>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>>> dongjoon.hyun@gmail.com> wrote:
>>>> >> >
>>>> >> > The following was tested and merged a few minutes ago. So, we can
>>>> remove it from the list.
>>>> >> >
>>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Dongjoon.
>>>> >> >
>>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>>> wrote:
>>>> >> >>
>>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
>>>> to collect the list of actively developed PRs that we want to merge to 3.3
>>>> after the branch cut?
>>>> >> >>
>>>> >> >> Please do not rush to merge the PRs that are not fully reviewed.
>>>> We can cut the branch this Friday and continue merging the PRs that have
>>>> been discussed in this thread. Does that make sense?
>>>> >> >>
>>>> >> >> Xiao
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>>>> >> >>>
>>>> >> >>> May I suggest we push out one week (22nd) just to give everyone
>>>> a bit of breathing space? Rushed software development more often results in
>>>> bugs.
>>>> >> >>>
>>>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com>
>>>> wrote:
>>>> >> >>>>
>>>> >> >>>> > To make our release time more predictable, let us collect the
>>>> PRs and wait three more days before the branch cut?
>>>> >> >>>>
>>>> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>> >> >>>>
>>>> >> >>>> Three more days are OK for this from my view.
>>>> >> >>>>
>>>> >> >>>> Regards,
>>>> >> >>>> Yikun
>>>> >> >>>
>>>> >> >>> --
>>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9
>>>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>

Re: Apache Spark 3.3 Release

Posted by Dongjoon Hyun <do...@gmail.com>.
Xiao, you are working against what you are saying.
If you don't cut a branch, it means you are allowing all patches to land in
Apache Spark 3.3. No?

> we need to avoid backporting the feature work that are not being well
discussed.



On Tue, Mar 15, 2022 at 12:12 PM Xiao Li <ga...@gmail.com> wrote:

> Cutting the branch is simple, but we need to avoid backporting the feature
> work that are not being well discussed. Not all the members are actively
> following the dev list. I think we should wait 3 more days for collecting
> the PR list before cutting the branch.
>
> BTW, there are very few 3.4-only feature work that will be affected.
>
> Xiao
>
> Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:
>
>> Hi, Max, Chao, Xiao, Holden and all.
>>
>> I have a different idea.
>>
>> Given the situation and small patch list, I don't think we need to
>> postpone the branch cut for those patches. It's easier to cut a branch-3.3
>> and allow backporting.
>>
>> As of today, we already have an obvious Apache Spark 3.4 patch in the
>> branch together. This situation only becomes worse and worse because there
>> is no way to block the other patches from landing unintentionally if we
>> don't cut a branch.
>>
>>     [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>>
>> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>>
>> Best,
>> Dongjoon.
>>
>>
>> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:
>>
>>> Cool, thanks for clarifying!
>>>
>>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com> wrote:
>>> >>
>>> >> For the following list:
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >
>>> >
>>> > If possible, I hope these features can be shipped with Spark 3.3.
>>> >
>>> >
>>> >
>>> > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>>> >>
>>> >> Hi Xiao,
>>> >>
>>> >> For the following list:
>>> >>
>>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet
>>> vectorized reader
>>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>> >>
>>> >> Do you mean we should include them, or exclude them from 3.3?
>>> >>
>>> >> Thanks,
>>> >> Chao
>>> >>
>>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <
>>> dongjoon.hyun@gmail.com> wrote:
>>> >> >
>>> >> > The following was tested and merged a few minutes ago. So, we can
>>> remove it from the list.
>>> >> >
>>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> >
>>> >> > Thanks,
>>> >> > Dongjoon.
>>> >> >
>>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>>> wrote:
>>> >> >>
>>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
>>> to collect the list of actively developed PRs that we want to merge to 3.3
>>> after the branch cut?
>>> >> >>
>>> >> >> Please do not rush to merge the PRs that are not fully reviewed.
>>> We can cut the branch this Friday and continue merging the PRs that have
>>> been discussed in this thread. Does that make sense?
>>> >> >>
>>> >> >> Xiao
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>>> >> >>>
>>> >> >>> May I suggest we push out one week (22nd) just to give everyone a
>>> bit of breathing space? Rushed software development more often results in
>>> bugs.
>>> >> >>>
>>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com>
>>> wrote:
>>> >> >>>>
>>> >> >>>> > To make our release time more predictable, let us collect the
>>> PRs and wait three more days before the branch cut?
>>> >> >>>>
>>> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>>> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> >> >>>>
>>> >> >>>> Three more days are OK for this from my view.
>>> >> >>>>
>>> >> >>>> Regards,
>>> >> >>>> Yikun
>>> >> >>>
>>> >> >>> --
>>> >> >>> Twitter: https://twitter.com/holdenkarau
>>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>

Re: Apache Spark 3.3 Release

Posted by Xiao Li <ga...@gmail.com>.
Cutting the branch is simple, but we need to avoid backporting feature
work that has not been well discussed. Not all the members are actively
following the dev list. I think we should wait 3 more days to collect
the PR list before cutting the branch.

BTW, there is very little 3.4-only feature work that will be affected.

Xiao

Dongjoon Hyun <do...@gmail.com> 于2022年3月15日周二 11:49写道:

> Hi, Max, Chao, Xiao, Holden and all.
>
> I have a different idea.
>
> Given the situation and small patch list, I don't think we need to
> postpone the branch cut for those patches. It's easier to cut a branch-3.3
> and allow backporting.
>
> As of today, we already have an obvious Apache Spark 3.4 patch in the
> branch together. This situation only becomes worse and worse because there
> is no way to block the other patches from landing unintentionally if we
> don't cut a branch.
>
>     [SPARK-38335][SQL] Implement parser support for DEFAULT column values
>
> Let's cut `branch-3.3` Today for Apache Spark 3.3.0 preparation.
>
> Best,
> Dongjoon.
>
>
> On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:
>
>> Cool, thanks for clarifying!
>>
>> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com> wrote:
>> >>
>> >> For the following list:
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
>> reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >
>> >
>> > If possible, I hope these features can be shipped with Spark 3.3.
>> >
>> >
>> >
>> > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>> >>
>> >> Hi Xiao,
>> >>
>> >> For the following list:
>> >>
>> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
>> reader
>> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> >>
>> >> Do you mean we should include them, or exclude them from 3.3?
>> >>
>> >> Thanks,
>> >> Chao
>> >>
>> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>> >> >
>> >> > The following was tested and merged a few minutes ago. So, we can
>> remove it from the list.
>> >> >
>> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> >
>> >> > Thanks,
>> >> > Dongjoon.
>> >> >
>> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com>
>> wrote:
>> >> >>
>> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days
>> to collect the list of actively developed PRs that we want to merge to 3.3
>> after the branch cut?
>> >> >>
>> >> >> Please do not rush to merge the PRs that are not fully reviewed. We
>> can cut the branch this Friday and continue merging the PRs that have been
>> discussed in this thread. Does that make sense?
>> >> >>
>> >> >> Xiao
>> >> >>
>> >> >>
>> >> >>
>> >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>> >> >>>
>> >> >>> May I suggest we push out one week (22nd) just to give everyone a
>> bit of breathing space? Rushed software development more often results in
>> bugs.
>> >> >>>
>> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com>
>> wrote:
>> >> >>>>
>> >> >>>> > To make our release time more predictable, let us collect the
>> PRs and wait three more days before the branch cut?
>> >> >>>>
>> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >> >>>>
>> >> >>>> Three more days are OK for this from my view.
>> >> >>>>
>> >> >>>> Regards,
>> >> >>>> Yikun
>> >> >>>
>> >> >>> --
>> >> >>> Twitter: https://twitter.com/holdenkarau
>> >> >>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>

Re: Apache Spark 3.3 Release

Posted by Dongjoon Hyun <do...@gmail.com>.
Hi, Max, Chao, Xiao, Holden and all.

I have a different idea.

Given the situation and the small patch list, I don't think we need to
postpone the branch cut for those patches. It's easier to cut branch-3.3 and
allow backporting.

As of today, we already have an obvious Apache Spark 3.4 patch in the
branch. This situation will only get worse because there is no way to block
other patches from landing unintentionally if we don't cut a branch.

    [SPARK-38335][SQL] Implement parser support for DEFAULT column values

Let's cut `branch-3.3` today for Apache Spark 3.3.0 preparation.

Best,
Dongjoon.


On Tue, Mar 15, 2022 at 10:17 AM Chao Sun <su...@apache.org> wrote:

> Cool, thanks for clarifying!
>
> On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com> wrote:
> >>
> >> For the following list:
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >> Do you mean we should include them, or exclude them from 3.3?
> >
> >
> > If possible, I hope these features can be shipped with Spark 3.3.
> >
> >
> >
> > Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
> >>
> >> Hi Xiao,
> >>
> >> For the following list:
> >>
> >> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> >> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> >> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> >>
> >> Do you mean we should include them, or exclude them from 3.3?
> >>
> >> Thanks,
> >> Chao
> >>
> >> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <do...@gmail.com>
> wrote:
> >> >
> >> > The following was tested and merged a few minutes ago. So, we can
> remove it from the list.
> >> >
> >> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >
> >> > Thanks,
> >> > Dongjoon.
> >> >
> >> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com> wrote:
> >> >>
> >> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> collect the list of actively developed PRs that we want to merge to 3.3
> after the branch cut?
> >> >>
> >> >> Please do not rush to merge the PRs that are not fully reviewed. We
> can cut the branch this Friday and continue merging the PRs that have been
> discussed in this thread. Does that make sense?
> >> >>
> >> >> Xiao
> >> >>
> >> >>
> >> >>
> >> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
> >> >>>
> >> >>> May I suggest we push out one week (22nd) just to give everyone a
> bit of breathing space? Rushed software development more often results in
> bugs.
> >> >>>
> >> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com>
> wrote:
> >> >>>>
> >> >>>> > To make our release time more predictable, let us collect the
> PRs and wait three more days before the branch cut?
> >> >>>>
> >> >>>> For SPIP: Support Customized Kubernetes Schedulers:
> >> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >> >>>>
> >> >>>> Three more days are OK for this from my view.
> >> >>>>
> >> >>>> Regards,
> >> >>>> Yikun
> >> >>>
> >> >>> --
> >> >>> Twitter: https://twitter.com/holdenkarau
> >> >>> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>

Re: Apache Spark 3.3 Release

Posted by Chao Sun <su...@apache.org>.
Cool, thanks for clarifying!

On Tue, Mar 15, 2022 at 10:11 AM Xiao Li <ga...@gmail.com> wrote:
>>
>> For the following list:
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>> Do you mean we should include them, or exclude them from 3.3?
>
>
> If possible, I hope these features can be shipped with Spark 3.3.
>
>
>
> Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:
>>
>> Hi Xiao,
>>
>> For the following list:
>>
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>>
>> Do you mean we should include them, or exclude them from 3.3?
>>
>> Thanks,
>> Chao
>>
>> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <do...@gmail.com> wrote:
>> >
>> > The following was tested and merged a few minutes ago. So, we can remove it from the list.
>> >
>> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com> wrote:
>> >>
>> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut?
>> >>
>> >> Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that have been discussed in this thread. Does that make sense?
>> >>
>> >> Xiao
>> >>
>> >>
>> >>
>> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>> >>>
>> >>> May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs.
>> >>>
>> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com> wrote:
>> >>>>
>> >>>> > To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
>> >>>>
>> >>>> For SPIP: Support Customized Kubernetes Schedulers:
>> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> >>>>
>> >>>> Three more days are OK for this from my view.
>> >>>>
>> >>>> Regards,
>> >>>> Yikun
>> >>>
>> >>> --
>> >>> Twitter: https://twitter.com/holdenkarau
>> >>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Apache Spark 3.3 Release

Posted by Xiao Li <ga...@gmail.com>.
>
> For the following list:
> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> Do you mean we should include them, or exclude them from 3.3?


If possible, I hope these features can be shipped with Spark 3.3.



Chao Sun <su...@apache.org> 于2022年3月15日周二 10:06写道:

> Hi Xiao,
>
> For the following list:
>
> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
> reader
> #35848 [SPARK-38548][SQL] New SQL function: try_sum
>
> Do you mean we should include them, or exclude them from 3.3?
>
> Thanks,
> Chao
>
> On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <do...@gmail.com>
> wrote:
> >
> > The following was tested and merged a few minutes ago. So, we can remove
> it from the list.
> >
> > #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >
> > Thanks,
> > Dongjoon.
> >
> > On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com> wrote:
> >>
> >> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> collect the list of actively developed PRs that we want to merge to 3.3
> after the branch cut?
> >>
> >> Please do not rush to merge the PRs that are not fully reviewed. We can
> cut the branch this Friday and continue merging the PRs that have been
> discussed in this thread. Does that make sense?
> >>
> >> Xiao
> >>
> >>
> >>
> >> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
> >>>
> >>> May I suggest we push out one week (22nd) just to give everyone a bit
> of breathing space? Rushed software development more often results in bugs.
> >>>
> >>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com>
> wrote:
> >>>>
> >>>> > To make our release time more predictable, let us collect the PRs
> and wait three more days before the branch cut?
> >>>>
> >>>> For SPIP: Support Customized Kubernetes Schedulers:
> >>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> >>>>
> >>>> Three more days are OK for this from my view.
> >>>>
> >>>> Regards,
> >>>> Yikun
> >>>
> >>> --
> >>> Twitter: https://twitter.com/holdenkarau
> >>> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>

Re: Apache Spark 3.3 Release

Posted by Chao Sun <su...@apache.org>.
Hi Xiao,

For the following list:

#35789 [SPARK-32268][SQL] Row-level Runtime Filtering
#34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
#35848 [SPARK-38548][SQL] New SQL function: try_sum

Do you mean we should include them, or exclude them from 3.3?

Thanks,
Chao
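
As background on the first item in that list: row-level runtime filtering
builds a filter (typically a Bloom filter) over the join keys of the small
side of a join and uses it to discard non-matching rows on the large side
before the join executes. The sketch below is only an illustration of that
idea in plain Python; it is not Spark's implementation, and the filter size
and hashing scheme are arbitrary assumptions.

```python
# Illustration of the row-level runtime filtering idea (SPARK-32268):
# build a Bloom filter over the build-side join keys, then use it to
# prune probe-side rows that cannot possibly match.
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bitset stored as one big integer

    def _positions(self, key):
        # Derive k independent bit positions from the key.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # May return false positives, but never false negatives.
        return all(self.bits >> p & 1 for p in self._positions(key))


# Build side: the small dimension table's join keys.
dim_keys = [2, 3, 5]
bf = BloomFilter()
for k in dim_keys:
    bf.add(k)

# Probe side: the large fact table, pre-filtered before the join.
fact_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
filtered = [row for row in fact_rows if bf.might_contain(row[0])]
# Every genuinely matching row is guaranteed to survive the pre-filter.
```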

On Tue, Mar 15, 2022 at 9:56 AM Dongjoon Hyun <do...@gmail.com> wrote:
>
> The following was tested and merged a few minutes ago. So, we can remove it from the list.
>
> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>
> Thanks,
> Dongjoon.
>
> On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com> wrote:
>>
>> Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut?
>>
>> Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that have been discussed in this thread. Does that make sense?
>>
>> Xiao
>>
>>
>>
>> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>>>
>>> May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs.
>>>
>>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com> wrote:
>>>>
>>>> > To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
>>>>
>>>> For SPIP: Support Customized Kubernetes Schedulers:
>>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>>>
>>>> Three more days are OK for this from my view.
>>>>
>>>> Regards,
>>>> Yikun
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Apache Spark 3.3 Release

Posted by Dongjoon Hyun <do...@gmail.com>.
The following was tested and merged a few minutes ago. So, we can remove it
from the list.

#35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
<https://github.com/apache/spark/pull/35819>

Thanks,
Dongjoon.

On Tue, Mar 15, 2022 at 9:48 AM Xiao Li <ga...@gmail.com> wrote:

> Let me clarify my above suggestion. Maybe we can wait 3 more days to
> collect the list of actively developed PRs that we want to merge to 3.3
> after the branch cut?
>
> Please do not rush to merge the PRs that are not fully reviewed. We can
> cut the branch this Friday and continue merging the PRs that have been
> discussed in this thread. Does that make sense?
>
> Xiao
>
>
>
>
> Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:
>
>> May I suggest we push out one week (22nd) just to give everyone a bit of
>> breathing space? Rushed software development more often results in bugs.
>>
>> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com> wrote:
>>
>>> > To make our release time more predictable, let us collect the PRs and
>>> wait three more days before the branch cut?
>>>
>>> For SPIP: Support Customized Kubernetes Schedulers:
>>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>>> <https://github.com/apache/spark/pull/35819>
>>>
>>> Three more days are OK for this from my view.
>>>
>>> Regards,
>>> Yikun
>>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>

Re: Apache Spark 3.3 Release

Posted by Xiao Li <ga...@gmail.com>.
Let me clarify my above suggestion. Maybe we can wait 3 more days to
collect the list of actively developed PRs that we want to merge to 3.3
after the branch cut?

Please do not rush to merge the PRs that are not fully reviewed. We can cut
the branch this Friday and continue merging the PRs that have been
discussed in this thread. Does that make sense?

Xiao




Holden Karau <ho...@pigscanfly.ca> 于2022年3月15日周二 09:10写道:

> May I suggest we push out one week (22nd) just to give everyone a bit of
> breathing space? Rushed software development more often results in bugs.
>
> On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com> wrote:
>
>> > To make our release time more predictable, let us collect the PRs and
>> wait three more days before the branch cut?
>>
>> For SPIP: Support Customized Kubernetes Schedulers:
>> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
>> <https://github.com/apache/spark/pull/35819>
>>
>> Three more days are OK for this from my view.
>>
>> Regards,
>> Yikun
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>

Re: Apache Spark 3.3 Release

Posted by Holden Karau <ho...@pigscanfly.ca>.
May I suggest we push out one week (22nd) just to give everyone a bit of
breathing space? Rushed software development more often results in bugs.

On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang <yi...@gmail.com> wrote:

> > To make our release time more predictable, let us collect the PRs and
> wait three more days before the branch cut?
>
> For SPIP: Support Customized Kubernetes Schedulers:
> #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
> <https://github.com/apache/spark/pull/35819>
>
> Three more days are OK for this from my view.
>
> Regards,
> Yikun
>
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: Apache Spark 3.3 Release

Posted by Yikun Jiang <yi...@gmail.com>.
> To make our release time more predictable, let us collect the PRs and
wait three more days before the branch cut?

For SPIP: Support Customized Kubernetes Schedulers:
#35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
<https://github.com/apache/spark/pull/35819>

Three more days are OK for this from my view.

Regards,
Yikun

Re: Apache Spark 3.3 Release

Posted by Xiao Li <ga...@gmail.com>.
To make our release time more predictable, let us collect the PRs and wait
three more days before the branch cut?

Please list all the actively developed feature work we plan to release with
Spark 3.3. We should avoid merging any new feature work that has not been
discussed in this email thread. Below is my list:

   - #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
   <https://github.com/apache/spark/pull/35789>
   - #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized
   reader <https://github.com/apache/spark/pull/34659>
   - #35848 [SPARK-38548][SQL] New SQL function: try_sum
   <https://github.com/apache/spark/pull/35848>
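
As background on the last item: try_sum is proposed to behave like SUM but
return NULL instead of raising an error when the result overflows, in line
with the existing try_* family. The sketch below illustrates that intended
semantics in plain Python; it is not Spark code, and the exact behavior
(e.g. overflow bounds for a long accumulator) is an assumption.

```python
# Plain-Python sketch of the intended try_sum semantics (SPARK-38548):
# behave like SQL SUM, but return NULL (None) on long overflow instead
# of raising an error. NULL inputs are skipped, as in SQL aggregates.

LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def try_sum(values):
    total = None  # SUM over no non-null values is NULL
    for v in values:
        if v is None:
            continue  # SQL aggregates ignore NULLs
        total = v if total is None else total + v
        if not (LONG_MIN <= total <= LONG_MAX):
            return None  # overflow -> NULL instead of an error
    return total


print(try_sum([1, 2, 3]))      # 6
print(try_sum([LONG_MAX, 1]))  # None (overflow)
```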





Re: Apache Spark 3.3 Release

Posted by Chao Sun <su...@apache.org>.
I mainly mean:

  - [SPARK-35801] Row-level operations in Data Source V2
  - [SPARK-37166] Storage Partitioned Join

For which the PR:

- https://github.com/apache/spark/pull/35395
- https://github.com/apache/spark/pull/35657

are actively being reviewed. It seems there are ongoing PRs for other
SPIPs as well but I'm not involved in those so not quite sure whether
they are intended for 3.3 release.

Chao

On Mon, Mar 14, 2022 at 8:53 PM Xiao Li <ga...@gmail.com> wrote:
>
> Could you please list which features we want to finish before the branch cut? How long will they take?
>
> Xiao

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Apache Spark 3.3 Release

Posted by Holden Karau <ho...@pigscanfly.ca>.
On Mon, Mar 14, 2022 at 11:53 PM Xiao Li <ga...@gmail.com> wrote:

> Could you please list which features we want to finish before the branch
> cut? How long will they take?
>
> Xiao
>
> Chao Sun <su...@apache.org> wrote on Mon, Mar 14, 2022 at 13:30:
>
>> Hi Max,
>>
>> As there is still some ongoing work for the SPIPs listed above, can we
>> still merge the related PRs after the branch cut?
>>
In the past we’ve allowed merges for actively developed PRs post branch
cut, but it is easier when they don’t need to be cherry-picked (e.g., pre cut).

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: Apache Spark 3.3 Release

Posted by Xiao Li <ga...@gmail.com>.
Could you please list which features we want to finish before the branch
cut? How long will they take?

Xiao

Chao Sun <su...@apache.org> wrote on Mon, Mar 14, 2022 at 13:30:

> Hi Max,
>
> As there is still some ongoing work for the SPIPs listed above, can we
> still merge the related PRs after the branch cut?
>
> Thanks,
> Chao

Re: Apache Spark 3.3 Release

Posted by Chao Sun <su...@apache.org>.
Hi Max,

As there is still some ongoing work for the SPIPs listed above, can we
still merge the related PRs after the branch cut?

Thanks,
Chao


Re: Apache Spark 3.3 Release

Posted by Maxim Gekk <ma...@databricks.com.INVALID>.
Hi All,

Since there are no actual blockers for Spark 3.3.0 and no significant
objections, I am going to cut branch-3.3 after March 15th at 00:00 PST.
Please let us know if you have any concerns about that.

Best regards,
Max Gekk


On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk <ma...@databricks.com> wrote:

> Hello All,
>
> I would like to bring to the table the topic of the new Spark 3.3
> release. According to the public schedule at
> https://spark.apache.org/versioning-policy.html, we planned to start the
> code freeze and release branch cut on March 15th, 2022. Since this date
> is coming soon, I would like to draw your attention to the topic and
> gather any objections that you might have.
>
> Below is the list of ongoing and active SPIPs:
>
> Spark SQL:
> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
> - [SPARK-35801] Row-level operations in Data Source V2
> - [SPARK-37166] Storage Partitioned Join
>
> Spark Core:
> - [SPARK-20624] Add better handling for node shutdown
> - [SPARK-25299] Use remote storage for persisting shuffle data
>
> PySpark:
> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>
> Kubernetes:
> - [SPARK-36057] Support Customized Kubernetes Schedulers
>
> We should probably finish any remaining work for Spark 3.3, switch to QA
> mode, cut a branch, and keep everything on track. I would like to
> volunteer to help drive this process.
>
> Best regards,
> Max Gekk
>

Re: Apache Spark 3.3 Release

Posted by Maciej <ms...@gmail.com>.
Ideally, we should complete these:

- [SPARK-37093] Inline type hints python/pyspark/streaming
- [SPARK-37395] Inline type hint files for files in python/pyspark/ml
- [SPARK-37396] Inline type hint files for files in python/pyspark/mllib

All tasks have either a PR in progress or someone working on one, so the
limiting factor is our ability to review them.



-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

Re: Apache Spark 3.3 Release

Posted by Jacky Lee <qc...@gmail.com>.
I also have a PR that has been ready to merge for a while; can we merge it
into 3.3.0?
[SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics
https://github.com/apache/spark/pull/35185

beliefer <be...@163.com> wrote on Wed, Mar 16, 2022 at 21:33:

> +1 Glad to see we will release 3.3.0.

Re:Apache Spark 3.3 Release

Posted by beliefer <be...@163.com>.
+1 Glad to see we will release 3.3.0.



