Posted to dev@spark.apache.org by huaxin gao <hu...@gmail.com> on 2022/01/21 03:59:23 UTC

[VOTE] Release Spark 3.2.1 (RC2)

Please vote on releasing the following candidate as Apache Spark version
3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if
a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.2.1-rc2 (commit
4f25b3f71238a00508a356591553f2dfa89f8290):
https://github.com/apache/spark/tree/v3.2.1-rc2
The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1398/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
The list of bug fixes going into 3.2.1 can be found at the following URL:
https://s.apache.org/yu0cy
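
[Editor's note: checking the published digests amounts to recomputing
SHA-512 over a downloaded artifact and comparing it against the matching
.sha512 file in the -bin/ directory (signatures additionally need gpg with
the KEYS file imported). A minimal sketch in Python; the artifact filenames
in the comments are hypothetical, and the layout of .sha512 files may vary:]

```python
import hashlib

def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage sketch (hypothetical filenames): the .sha512 file published next to
# each artifact carries the expected digest; here we assume the hex digest
# is the last whitespace-separated token.
# expected = open("spark-3.2.1-bin-hadoop3.2.tgz.sha512").read().split()[-1]
# assert sha512_of("spark-3.2.1-bin-hadoop3.2.tgz") == expected.lower()
```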

This release uses the release script of the tag v3.2.1-rc2.

FAQ

=========================
How can I help test this release?
=========================
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running it on this release candidate, then
reporting any regressions. If you're working in PySpark, you can set up a
virtual env, install the current RC, and see if anything important breaks.
In Java/Scala, you can add the staging repository to your project's
resolvers and test with the RC (make sure to clean up the artifact cache
before/after so you don't end up building with an out-of-date RC going
forward).

===========================================
What should happen to JIRA tickets still targeting 3.2.1?
===========================================
The current list of open tickets targeted at 3.2.1 can be found at
https://issues.apache.org/jira/projects/SPARK by searching for "Target
Version/s" = 3.2.1. Committers should look at those and triage. Extremely
important bug fixes, documentation, and API tweaks that impact
compatibility should be worked on immediately. Everything else should be
retargeted to an appropriate release.

==================
But my bug isn't fixed?
==================
In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from the previous release. That
being said, if there is a regression that has not been correctly targeted,
please ping me or a committer to help target the issue.
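
[Editor's note: the Java/Scala testing advice above boils down to adding
the staging repository as an extra resolver. A minimal sbt fragment as a
sketch; the module (spark-sql) and version string ("3.2.1") are assumptions
to verify against the staging repository's index:]

```scala
// build.sbt fragment (sketch): point an existing project at the RC staging repo.
resolvers += "Apache Spark 3.2.1 RC2 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1398/"

// Depend on the staged artifacts; the exact module and version are assumptions.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"
```

Afterwards, rebuilding and running the project's tests pulls the RC
artifacts; clearing the local artifact cache (e.g. the org.apache.spark
entries under ~/.ivy2/cache) before and after avoids building against a
stale RC later.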

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Yuming Wang <wg...@gmail.com>.
+1 (non-binding)

On Tue, Jan 25, 2022 at 12:44 PM Wenchen Fan <cl...@gmail.com> wrote:

> +1
>
> On Tue, Jan 25, 2022 at 10:13 AM Ruifeng Zheng <ru...@foxmail.com>
> wrote:
>
>> +1 (non-binding)
>>
>>
>> ------------------ Original Message ------------------
>> *From:* "Kent Yao" <ya...@apache.org>;
>> *Sent:* Tuesday, January 25, 2022, 10:09 AM
>> *To:* "John Zhuge" <jz...@apache.org>;
>> *Cc:* "dev" <de...@spark.apache.org>;
>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC2)
>>
>> +1, non-binding
>>
>> John Zhuge <jz...@apache.org> wrote on Tue, Jan 25, 2022 at 06:56:
>>
>>> +1 (non-binding)
>>>
>>> On Mon, Jan 24, 2022 at 2:28 PM Cheng Su <ch...@fb.com.invalid> wrote:
>>>
>>>> +1 (non-binding)
>>>>
>>>>
>>>>
>>>> Cheng Su
>>>>
>>>>
>>>>
>>>> *From: *Chao Sun <su...@apache.org>
>>>> *Date: *Monday, January 24, 2022 at 2:10 PM
>>>> *To: *Michael Heuer <he...@gmail.com>
>>>> *Cc: *dev <de...@spark.apache.org>
>>>> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC2)
>>>>
>>>> +1 (non-binding)
>>>>
>>>>
>>>>
>>>> On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer <he...@gmail.com>
>>>> wrote:
>>>>
>>>> +1 (non-binding)
>>>>
>>>>
>>>>
>>>>    michael
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 24, 2022, at 7:30 AM, Gengliang Wang <lt...@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> +1 (non-binding)
>>>>
>>>>
>>>>
>>>> On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun <do...@gmail.com>
>>>> wrote:
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> Dongjoon.
>>>>
>>>>
>>>>
>>>> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan <mr...@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> Signatures, digests, etc check out fine.
>>>>
>>>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Mridul
>>>>
>>>>
>>>>
>>>> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen <sr...@apache.org> wrote:
>>>>
>>>> +1 with same result as last time.
>>>>
>>>>
>>>>
>>>> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao <hu...@gmail.com>
>>>> wrote:
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 3.2.1. The vote is open until 8:00pm Pacific time January 25 and
>>>> passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [
>>>> ] +1 Release this package as Apache Spark 3.2.1 [ ] -1 Do not release
>>>> this package because ... To learn more about Apache Spark, please see
>>>> http://spark.apache.org/ The tag to be voted on is v3.2.1-rc2 (commit
>>>> 4f25b3f71238a00508a356591553f2dfa89f8290):
>>>> https://github.com/apache/spark/tree/v3.2.1-rc2  The release files,
>>>> including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/  Signatures
>>>> used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS The staging
>>>> repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>>>   The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/  The
>>>> list of bug fixes going into 3.2.1 can be found at the following URL:
>>>> https://s.apache.org/yu0cy   This release is using the release script
>>>> of the tag v3.2.1-rc2. FAQ ========================= How can I help
>>>> test this release? ========================= If you are a Spark user, you
>>>> can help us test this release by taking an existing Spark workload and
>>>> running on this release candidate, then reporting any regressions. If
>>>> you're working in PySpark you can set up a virtual env and install the
>>>> current RC and see if anything important breaks, in the Java/Scala you can
>>>> add the staging repository to your projects resolvers and test with the RC
>>>> (make sure to clean up the artifact cache before/after so you don't end up
>>>> building with a out of date RC going forward).
>>>> =========================================== What should happen to JIRA
>>>> tickets still targeting 3.2.1? ===========================================
>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>> Version/s" = 3.2.1 Committers should look at those and triage. Extremely
>>>> important bug fixes, documentation, and API tweaks that impact
>>>> compatibility should be worked on immediately. Everything else please
>>>> retarget to an appropriate release. ================== But my bug isn't
>>>> fixed? ================== In order to make timely releases, we will
>>>> typically not hold the release unless the bug in question is a regression
>>>> from the previous release. That being said, if there is something which is
>>>> a regression that has not been correctly targeted please ping me or a
>>>> committer to help target the issue.
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> John Zhuge
>>>
>>

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Wenchen Fan <cl...@gmail.com>.
+1

On Tue, Jan 25, 2022 at 10:13 AM Ruifeng Zheng <ru...@foxmail.com> wrote:

> +1 (non-binding)
>
>
> ------------------ Original Message ------------------
> *From:* "Kent Yao" <ya...@apache.org>;
> *Sent:* Tuesday, January 25, 2022, 10:09 AM
> *To:* "John Zhuge" <jz...@apache.org>;
> *Cc:* "dev" <de...@spark.apache.org>;
> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC2)
>
> +1, non-binding
>
> John Zhuge <jz...@apache.org> wrote on Tue, Jan 25, 2022 at 06:56:
>
>> +1 (non-binding)
>>
>> On Mon, Jan 24, 2022 at 2:28 PM Cheng Su <ch...@fb.com.invalid> wrote:
>>
>>> +1 (non-binding)
>>>
>>>
>>>
>>> Cheng Su
>>>
>>>
>>>
>>> *From: *Chao Sun <su...@apache.org>
>>> *Date: *Monday, January 24, 2022 at 2:10 PM
>>> *To: *Michael Heuer <he...@gmail.com>
>>> *Cc: *dev <de...@spark.apache.org>
>>> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC2)
>>>
>>> +1 (non-binding)
>>>
>>>
>>>
>>> On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer <he...@gmail.com> wrote:
>>>
>>> +1 (non-binding)
>>>
>>>
>>>
>>>    michael
>>>
>>>
>>>
>>>
>>>
>>> On Jan 24, 2022, at 7:30 AM, Gengliang Wang <lt...@gmail.com> wrote:
>>>
>>>
>>>
>>> +1 (non-binding)
>>>
>>>
>>>
>>> On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun <do...@gmail.com>
>>> wrote:
>>>
>>> +1
>>>
>>>
>>>
>>> Dongjoon.
>>>
>>>
>>>
>>> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan <mr...@gmail.com>
>>> wrote:
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> Signatures, digests, etc check out fine.
>>>
>>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>>>
>>>
>>>
>>> Regards,
>>>
>>> Mridul
>>>
>>>
>>>
>>> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen <sr...@apache.org> wrote:
>>>
>>> +1 with same result as last time.
>>>
>>>
>>>
>>> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao <hu...@gmail.com>
>>> wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if
>>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1
>>> Release this package as Apache Spark 3.2.1 [ ] -1 Do not release this
>>> package because ... To learn more about Apache Spark, please see
>>> http://spark.apache.org/ The tag to be voted on is v3.2.1-rc2 (commit
>>> 4f25b3f71238a00508a356591553f2dfa89f8290):
>>> https://github.com/apache/spark/tree/v3.2.1-rc2  The release files,
>>> including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/  Signatures
>>> used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS The staging
>>> repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>>   The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/  The
>>> list of bug fixes going into 3.2.1 can be found at the following URL:
>>> https://s.apache.org/yu0cy   This release is using the release script
>>> of the tag v3.2.1-rc2. FAQ ========================= How can I help
>>> test this release? ========================= If you are a Spark user, you
>>> can help us test this release by taking an existing Spark workload and
>>> running on this release candidate, then reporting any regressions. If
>>> you're working in PySpark you can set up a virtual env and install the
>>> current RC and see if anything important breaks, in the Java/Scala you can
>>> add the staging repository to your projects resolvers and test with the RC
>>> (make sure to clean up the artifact cache before/after so you don't end up
>>> building with a out of date RC going forward).
>>> =========================================== What should happen to JIRA
>>> tickets still targeting 3.2.1? ===========================================
>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 3.2.1 Committers should look at those and triage. Extremely
>>> important bug fixes, documentation, and API tweaks that impact
>>> compatibility should be worked on immediately. Everything else please
>>> retarget to an appropriate release. ================== But my bug isn't
>>> fixed? ================== In order to make timely releases, we will
>>> typically not hold the release unless the bug in question is a regression
>>> from the previous release. That being said, if there is something which is
>>> a regression that has not been correctly targeted please ping me or a
>>> committer to help target the issue.
>>>
>>>
>>>
>>>
>>
>> --
>> John Zhuge
>>
>

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Ruifeng Zheng <ru...@foxmail.com>.
+1 (non-binding)


------------------ Original Message ------------------
*From:* "Kent Yao" <yao@apache.org>;
*Sent:* Tuesday, January 25, 2022, 10:09 AM
*To:* "John Zhuge" <jzhuge@apache.org>;
*Cc:* "dev" <dev@spark.apache.org>;
*Subject:* Re: [VOTE] Release Spark 3.2.1 (RC2)

+1, non-binding

John Zhuge <jzhuge@apache.org> wrote on Tue, Jan 25, 2022 at 06:56:

> +1 (non-binding)
>
> On Mon, Jan 24, 2022 at 2:28 PM Cheng Su <chengsu@fb.com.invalid> wrote:
>
>> +1 (non-binding)
>>
>> Cheng Su
>>
>> From: Chao Sun <sunchao@apache.org>
>> Date: Monday, January 24, 2022 at 2:10 PM
>> To: Michael Heuer <heuermh@gmail.com>
>> Cc: dev <dev@spark.apache.org>
>> Subject: Re: [VOTE] Release Spark 3.2.1 (RC2)
>>
>> +1 (non-binding)
>>
>> On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer <heuermh@gmail.com> wrote:
>>
>> +1 (non-binding)
>>
>>    michael
>>
>> On Jan 24, 2022, at 7:30 AM, Gengliang Wang <ltnwgl@gmail.com> wrote:
>>
>> +1 (non-binding)
>>
>> On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun <dongjoon.hyun@gmail.com> wrote:
>>
>> +1
>>
>> Dongjoon.
>>
>> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan <mridul@gmail.com> wrote:
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>>
>> Regards,
>> Mridul
>>
>> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen <srowen@apache.org> wrote:
>>
>> +1 with same result as last time.
>>
>> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao <huaxin.gao11@gmail.com> wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if
>> a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>> [ ] +1 Release this package as Apache Spark 3.2.1
>> [ ] -1 Do not release this package because ...
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.2.1-rc2 (commit
>> 4f25b3f71238a00508a356591553f2dfa89f8290):
>> https://github.com/apache/spark/tree/v3.2.1-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
>>
>> The list of bug fixes going into 3.2.1 can be found at the following URL:
>> https://s.apache.org/yu0cy
>>
>> This release uses the release script of the tag v3.2.1-rc2.
>>
>> FAQ
>>
>> =========================
>> How can I help test this release?
>> =========================
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running it on this release candidate, then
>> reporting any regressions. If you're working in PySpark, you can set up a
>> virtual env, install the current RC, and see if anything important breaks.
>> In Java/Scala, you can add the staging repository to your project's
>> resolvers and test with the RC (make sure to clean up the artifact cache
>> before/after so you don't end up building with an out-of-date RC going
>> forward).
>>
>> ===========================================
>> What should happen to JIRA tickets still targeting 3.2.1?
>> ===========================================
>> The current list of open tickets targeted at 3.2.1 can be found at
>> https://issues.apache.org/jira/projects/SPARK by searching for "Target
>> Version/s" = 3.2.1. Committers should look at those and triage. Extremely
>> important bug fixes, documentation, and API tweaks that impact
>> compatibility should be worked on immediately. Everything else should be
>> retargeted to an appropriate release.
>>
>> ==================
>> But my bug isn't fixed?
>> ==================
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from the previous release. That
>> being said, if there is a regression that has not been correctly targeted,
>> please ping me or a committer to help target the issue.
>
> --
> John Zhuge


Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Kent Yao <ya...@apache.org>.
+1, non-binding

John Zhuge <jz...@apache.org> wrote on Tue, Jan 25, 2022 at 06:56:

> +1 (non-binding)
>
> On Mon, Jan 24, 2022 at 2:28 PM Cheng Su <ch...@fb.com.invalid> wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> Cheng Su
>>
>>
>>
>> *From: *Chao Sun <su...@apache.org>
>> *Date: *Monday, January 24, 2022 at 2:10 PM
>> *To: *Michael Heuer <he...@gmail.com>
>> *Cc: *dev <de...@spark.apache.org>
>> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC2)
>>
>> +1 (non-binding)
>>
>>
>>
>> On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer <he...@gmail.com> wrote:
>>
>> +1 (non-binding)
>>
>>
>>
>>    michael
>>
>>
>>
>>
>>
>> On Jan 24, 2022, at 7:30 AM, Gengliang Wang <lt...@gmail.com> wrote:
>>
>>
>>
>> +1 (non-binding)
>>
>>
>>
>> On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>>
>> +1
>>
>>
>>
>> Dongjoon.
>>
>>
>>
>> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan <mr...@gmail.com>
>> wrote:
>>
>>
>>
>> +1
>>
>>
>>
>> Signatures, digests, etc check out fine.
>>
>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>>
>>
>>
>> Regards,
>>
>> Mridul
>>
>>
>>
>> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen <sr...@apache.org> wrote:
>>
>> +1 with same result as last time.
>>
>>
>>
>> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao <hu...@gmail.com>
>> wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if
>> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1
>> Release this package as Apache Spark 3.2.1 [ ] -1 Do not release this
>> package because ... To learn more about Apache Spark, please see
>> http://spark.apache.org/ The tag to be voted on is v3.2.1-rc2 (commit
>> 4f25b3f71238a00508a356591553f2dfa89f8290):
>> https://github.com/apache/spark/tree/v3.2.1-rc2  The release files,
>> including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/  Signatures
>> used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS The staging repository
>> for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1398/   The
>> documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/  The
>> list of bug fixes going into 3.2.1 can be found at the following URL:
>> https://s.apache.org/yu0cy   This release is using the release script of
>> the tag v3.2.1-rc2. FAQ ========================= How can I help test
>> this release? ========================= If you are a Spark user, you can
>> help us test this release by taking an existing Spark workload and running
>> on this release candidate, then reporting any regressions. If you're
>> working in PySpark you can set up a virtual env and install the current RC
>> and see if anything important breaks, in the Java/Scala you can add the
>> staging repository to your projects resolvers and test with the RC (make
>> sure to clean up the artifact cache before/after so you don't end up
>> building with a out of date RC going forward).
>> =========================================== What should happen to JIRA
>> tickets still targeting 3.2.1? ===========================================
>> The current list of open tickets targeted at 3.2.1 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.1 Committers should look at those and triage. Extremely
>> important bug fixes, documentation, and API tweaks that impact
>> compatibility should be worked on immediately. Everything else please
>> retarget to an appropriate release. ================== But my bug isn't
>> fixed? ================== In order to make timely releases, we will
>> typically not hold the release unless the bug in question is a regression
>> from the previous release. That being said, if there is something which is
>> a regression that has not been correctly targeted please ping me or a
>> committer to help target the issue.
>>
>>
>>
>>
>
> --
> John Zhuge
>

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by John Zhuge <jz...@apache.org>.
+1 (non-binding)

On Mon, Jan 24, 2022 at 2:28 PM Cheng Su <ch...@fb.com.invalid> wrote:

> +1 (non-binding)
>
>
>
> Cheng Su
>
>
>
> *From: *Chao Sun <su...@apache.org>
> *Date: *Monday, January 24, 2022 at 2:10 PM
> *To: *Michael Heuer <he...@gmail.com>
> *Cc: *dev <de...@spark.apache.org>
> *Subject: *Re: [VOTE] Release Spark 3.2.1 (RC2)
>
> +1 (non-binding)
>
>
>
> On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer <he...@gmail.com> wrote:
>
> +1 (non-binding)
>
>
>
>    michael
>
>
>
>
>
> On Jan 24, 2022, at 7:30 AM, Gengliang Wang <lt...@gmail.com> wrote:
>
>
>
> +1 (non-binding)
>
>
>
> On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
> +1
>
>
>
> Dongjoon.
>
>
>
> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan <mr...@gmail.com>
> wrote:
>
>
>
> +1
>
>
>
> Signatures, digests, etc check out fine.
>
> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>
>
>
> Regards,
>
> Mridul
>
>
>
> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen <sr...@apache.org> wrote:
>
> +1 with same result as last time.
>
>
>
> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao <hu...@gmail.com> wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1
> Release this package as Apache Spark 3.2.1 [ ] -1 Do not release this
> package because ... To learn more about Apache Spark, please see
> http://spark.apache.org/ The tag to be voted on is v3.2.1-rc2 (commit
> 4f25b3f71238a00508a356591553f2dfa89f8290):
> https://github.com/apache/spark/tree/v3.2.1-rc2  The release files,
> including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/  Signatures
> used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS The staging repository
> for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1398/   The
> documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/  The
> list of bug fixes going into 3.2.1 can be found at the following URL:
> https://s.apache.org/yu0cy   This release is using the release script of
> the tag v3.2.1-rc2. FAQ ========================= How can I help test
> this release? ========================= If you are a Spark user, you can
> help us test this release by taking an existing Spark workload and running
> on this release candidate, then reporting any regressions. If you're
> working in PySpark you can set up a virtual env and install the current RC
> and see if anything important breaks, in the Java/Scala you can add the
> staging repository to your projects resolvers and test with the RC (make
> sure to clean up the artifact cache before/after so you don't end up
> building with a out of date RC going forward).
> =========================================== What should happen to JIRA
> tickets still targeting 3.2.1? ===========================================
> The current list of open tickets targeted at 3.2.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.1 Committers should look at those and triage. Extremely
> important bug fixes, documentation, and API tweaks that impact
> compatibility should be worked on immediately. Everything else please
> retarget to an appropriate release. ================== But my bug isn't
> fixed? ================== In order to make timely releases, we will
> typically not hold the release unless the bug in question is a regression
> from the previous release. That being said, if there is something which is
> a regression that has not been correctly targeted please ping me or a
> committer to help target the issue.
>
>
>
>

-- 
John Zhuge

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Cheng Su <ch...@fb.com.INVALID>.
+1 (non-binding)

Cheng Su

From: Chao Sun <su...@apache.org>
Date: Monday, January 24, 2022 at 2:10 PM
To: Michael Heuer <he...@gmail.com>
Cc: dev <de...@spark.apache.org>
Subject: Re: [VOTE] Release Spark 3.2.1 (RC2)
+1 (non-binding)

On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer <he...@gmail.com> wrote:
+1 (non-binding)

   michael


On Jan 24, 2022, at 7:30 AM, Gengliang Wang <lt...@gmail.com> wrote:

+1 (non-binding)

On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun <do...@gmail.com> wrote:
+1

Dongjoon.

On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan <mr...@gmail.com> wrote:

+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul

On Fri, Jan 21, 2022 at 9:01 PM Sean Owen <sr...@apache.org> wrote:
+1 with same result as last time.

On Thu, Jan 20, 2022 at 9:59 PM huaxin gao <hu...@gmail.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.2.1
[ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ The tag to be voted on is v3.2.1-rc2 (commit
4f25b3f71238a00508a356591553f2dfa89f8290):
https://github.com/apache/spark/tree/v3.2.1-rc2
The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
Signatures used for Spark RCs can be found in this file: https://dist.apache.org/repos/dist/dev/spark/KEYS The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1398/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
The list of bug fixes going into 3.2.1 can be found at the following URL:
https://s.apache.org/yu0cy

This release is using the release script of the tag v3.2.1-rc2. FAQ ========================= How can I help test this release? ========================= If you are a Spark user, you can help us test this release by taking an existing Spark workload and running on this release candidate, then reporting any regressions. If you're working in PySpark you can set up a virtual env and install the current RC and see if anything important breaks, in the Java/Scala you can add the staging repository to your projects resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with a out of date RC going forward). =========================================== What should happen to JIRA tickets still targeting 3.2.1? =========================================== The current list of open tickets targeted at 3.2.1 can be found at: https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 3.2.1 Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else please retarget to an appropriate release. ================== But my bug isn't fixed? ================== In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue.


Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Chao Sun <su...@apache.org>.
+1 (non-binding)

On Mon, Jan 24, 2022 at 6:32 AM Michael Heuer <he...@gmail.com> wrote:

> +1 (non-binding)
>
>    michael
>
>
> On Jan 24, 2022, at 7:30 AM, Gengliang Wang <lt...@gmail.com> wrote:
>
> +1 (non-binding)
>
> On Mon, Jan 24, 2022 at 6:26 PM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> +1
>>
>> Dongjoon.
>>
>> On Sat, Jan 22, 2022 at 7:19 AM Mridul Muralidharan <mr...@gmail.com>
>> wrote:
>>
>>>
>>> +1
>>>
>>> Signatures, digests, etc check out fine.
>>> Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes
>>>
>>> Regards,
>>> Mridul
>>>
>>> On Fri, Jan 21, 2022 at 9:01 PM Sean Owen <sr...@apache.org> wrote:
>>>
>>>> +1 with same result as last time.
>>>>
>>>> On Thu, Jan 20, 2022 at 9:59 PM huaxin gao <hu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 3.2.1. The vote is open until 8:00pm Pacific time January 25 and
>>>>> passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [
>>>>> ] +1 Release this package as Apache Spark 3.2.1 [ ] -1 Do not release
>>>>> this package because ... To learn more about Apache Spark, please see
>>>>> http://spark.apache.org/ The tag to be voted on is v3.2.1-rc2 (commit
>>>>> 4f25b3f71238a00508a356591553f2dfa89f8290):
>>>>> https://github.com/apache/spark/tree/v3.2.1-rc2
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
>>>>> Signatures used for Spark RCs can be found in this file:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS The staging
>>>>> repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
>>>>> The list of bug fixes going into 3.2.1 can be found at the following
>>>>> URL: https://s.apache.org/yu0cy
>>>>>
>>>>> This release is using the release script of the tag v3.2.1-rc2. FAQ
>>>>> ========================= How can I help test this release?
>>>>> ========================= If you are a Spark user, you can help us test
>>>>> this release by taking an existing Spark workload and running on this
>>>>> release candidate, then reporting any regressions. If you're working in
>>>>> PySpark you can set up a virtual env and install the current RC and see if
>>>>> anything important breaks, in the Java/Scala you can add the staging
>>>>> repository to your projects resolvers and test with the RC (make sure to
>>>>> clean up the artifact cache before/after so you don't end up building with
>>>>> a out of date RC going forward).
>>>>> =========================================== What should happen to JIRA
>>>>> tickets still targeting 3.2.1? ===========================================
>>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>> Version/s" = 3.2.1 Committers should look at those and triage. Extremely
>>>>> important bug fixes, documentation, and API tweaks that impact
>>>>> compatibility should be worked on immediately. Everything else please
>>>>> retarget to an appropriate release. ================== But my bug isn't
>>>>> fixed? ================== In order to make timely releases, we will
>>>>> typically not hold the release unless the bug in question is a regression
>>>>> from the previous release. That being said, if there is something which is
>>>>> a regression that has not been correctly targeted please ping me or a
>>>>> committer to help target the issue.
>>>>>
>>>>
>

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Michael Heuer <he...@gmail.com>.
+1 (non-binding)

   michael




Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Gengliang Wang <lt...@gmail.com>.
+1 (non-binding)


Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Dongjoon Hyun <do...@gmail.com>.
+1

Dongjoon.


Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Mridul Muralidharan <mr...@gmail.com>.
+1

Signatures, digests, etc check out fine.
Checked out tag and build/tested with -Pyarn -Pmesos -Pkubernetes

Regards,
Mridul
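For anyone repeating this check, here is a hedged sketch of the digest half of RC verification. The file names below are stand-ins, not the actual artifact names (the real files live under https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/), and Apache-published .sha512 files sometimes need whitespace reformatting before `sha512sum -c` accepts them.

```shell
# Create a stand-in artifact and digest file, then verify it with the
# same `sha512sum -c` pattern used for real release candidate tarballs.
set -e
printf 'stand-in release artifact' > spark-3.2.1-bin-standin.tgz
sha512sum spark-3.2.1-bin-standin.tgz > spark-3.2.1-bin-standin.tgz.sha512
sha512sum -c spark-3.2.1-bin-standin.tgz.sha512
# Signatures are checked separately against the project KEYS file, e.g.:
#   gpg --import KEYS && gpg --verify <artifact>.asc <artifact>
```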


Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Sean Owen <sr...@apache.org>.
+1 with same result as last time.


Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Maciej <ms...@gmail.com>.
I closed the ticket as a duplicate of SPARK-29444

This behavior is neither a bug nor a regression, and there is already a
documented writer (or global) option that can be used to modify it.
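For context, the option being referred to appears to be the JSON writer option `ignoreNullFields` (global form: `spark.sql.jsonGenerator.ignoreNullFields`), which defaults to true, so null-valued fields are omitted from generated JSON and an all-null column cannot be re-inferred on read. A minimal plain-Python sketch of that round-trip effect (the helpers below are illustrative, not Spark APIs):

```python
import json

# Records where one column is null in every row.
records = [{"id": 1, "always_null": None}, {"id": 2, "always_null": None}]

def to_json_line(rec, ignore_null_fields=True):
    # Mimics Spark's JSON writer: with ignoreNullFields=true (the default),
    # null-valued fields are omitted from the generated JSON objects.
    if ignore_null_fields:
        rec = {k: v for k, v in rec.items() if v is not None}
    return json.dumps(rec)

dropped = [to_json_line(r) for r in records]                        # default
kept = [to_json_line(r, ignore_null_fields=False) for r in records]

# Schema inference over the written lines: the all-null column survives
# only when null fields were kept in the output.
def infer(lines):
    return sorted({k for ln in lines for k in json.loads(ln)})

print(infer(dropped))  # ['id']
print(infer(kept))     # ['always_null', 'id']
```

With `df.write.option("ignoreNullFields", "false").json(path)` the null fields are written out explicitly, and the all-null columns survive the write/read round trip.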

On 1/22/22 00:47, Sean Owen wrote:
> Continue on the ticket - I am not sure this is established. We would
> block a release for critical problems that are not regressions. This is
> not a data loss / 'deleting data' issue even if valid.
> You're welcome to provide feedback but votes are for the PMC.
> 
> On Fri, Jan 21, 2022 at 5:24 PM Bjørn Jørgensen <bjornjorgensen@gmail.com> wrote:
> 
>     Ok, but deleting users' data without them knowing it is never a good
>     idea. That's why I give this RC -1.
> 
>     On Sat, Jan 22, 2022 at 12:16 AM Sean Owen <srowen@gmail.com> wrote:
> 
>         (Bjorn - unless this is a regression, it would not block a
>         release, even if it's a bug)
> 
>         On Fri, Jan 21, 2022 at 5:09 PM Bjørn Jørgensen <bjornjorgensen@gmail.com> wrote:
> 
> 
>                     [x] -1 Do not release this package because, deletes
>                     all my columns with only Null in it.  
> 
> 
>             I have opened https://issues.apache.org/jira/browse/SPARK-37981 for this bug.
> 
> 
> 
> 
>             On Fri, Jan 21, 2022 at 9:45 PM Sean Owen <srowen@gmail.com> wrote:
> 
>                 (Are you suggesting this is a regression, or is it a
>                 general question? here we're trying to figure out
>                 whether there are critical bugs introduced in 3.2.1 vs
>                 3.2.0)
> 
>                 On Fri, Jan 21, 2022 at 1:58 PM Bjørn Jørgensen <bjornjorgensen@gmail.com> wrote:
> 
>                     Hi, I am wondering if it's a bug or not.
> 
>                     I do have a lot of json files, where they have some
>                     columns that are all "null" on. 
> 
>                     I start spark with
> 
>                     from pyspark import pandas as ps
>                     import re
>                     import numpy as np
>                     import os
>                     import pandas as pd
> 
>                     from pyspark import SparkContext, SparkConf
>                     from pyspark.sql import SparkSession
>                     from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
>                     from pyspark.sql.types import StructType, StructField, StringType, IntegerType
> 
>                     os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
> 
>                     def get_spark_session(app_name: str, conf: SparkConf):
>                         conf.setMaster('local[*]')
>                         conf \
>                           .set('spark.driver.memory', '64g') \
>                           .set("fs.s3a.access.key", "minio") \
>                           .set("fs.s3a.secret.key", "") \
>                           .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>                           .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>                           .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>                           .set("spark.sql.repl.eagerEval.enabled", "True") \
>                           .set("spark.sql.adaptive.enabled", "True") \
>                           .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>                           .set("spark.sql.repl.eagerEval.maxNumRows", "10000") \
>                           .set("sc.setLogLevel", "error")
> 
>                         return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
> 
>                     spark = get_spark_session("Falk", SparkConf())
> 
>                     d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
> 
>                     import pyspark
>                     def sparkShape(dataFrame):
>                         return (dataFrame.count(), len(dataFrame.columns))
>                     pyspark.sql.dataframe.DataFrame.shape = sparkShape
>                     print(d3.shape())
> 
>                     (653610, 267)
> 
>                     d3.write.json("d3.json")
> 
>                     d3 = spark.read.json("d3.json/*.json")
> 
>                     import pyspark
>                     def sparkShape(dataFrame):
>                         return (dataFrame.count(), len(dataFrame.columns))
>                     pyspark.sql.dataframe.DataFrame.shape = sparkShape
>                     print(d3.shape())
> 
>                     (653610, 186)
> 
>                     So spark is deleting 81 columns. I think that all of these 81 deleted columns have only Null in them.
> 
>                     Is this a bug or has this been made on purpose?
> 
> 
> 
> 
> 
>                     -- 
>                     Bjørn Jørgensen
>                     Vestre Aspehaug 4, 6010 Ålesund
>                     Norge
> 
>                     +47 480 94 297
> 


-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
PGP: A30CEF0C31A501EC

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Holden Karau <ho...@pigscanfly.ca>.
On Fri, Jan 21, 2022 at 6:48 PM Sean Owen <sr...@gmail.com> wrote:

> Continue on the ticket - I am not sure this is established. We would block
> a release for critical problems that are not regressions. This is not a
> data loss / 'deleting data' issue even if valid.
> You're welcome to provide feedback but votes are for the PMC.
>
To be clear, users and developers are more than welcome to vote, but only
PMC votes are binding.

>
> On Fri, Jan 21, 2022 at 5:24 PM Bjørn Jørgensen <bj...@gmail.com>
> wrote:
>
>> Ok, but deleting users' data without them knowing it is never a good
>> idea. That's why I give this RC -1.
>>
>> On Sat, Jan 22, 2022 at 12:16 AM Sean Owen <sr...@gmail.com> wrote:
>>
>>> (Bjorn - unless this is a regression, it would not block a release, even
>>> if it's a bug)
>>>
>>> On Fri, Jan 21, 2022 at 5:09 PM Bjørn Jørgensen <
>>> bjornjorgensen@gmail.com> wrote:
>>>
>>>> [x] -1 Do not release this package because it deletes all my columns
>>>> with only Null in it.
>>>>
>>>> I have opened https://issues.apache.org/jira/browse/SPARK-37981 for
>>>> this bug.
>>>>
>>>>
>>>>
>>>>
>>>>> On Fri, Jan 21, 2022 at 9:45 PM Sean Owen <sr...@gmail.com> wrote:
>>>>
>>>>> (Are you suggesting this is a regression, or is it a general question?
>>>>> here we're trying to figure out whether there are critical bugs introduced
>>>>> in 3.2.1 vs 3.2.0)
>>>>>
>>>>> On Fri, Jan 21, 2022 at 1:58 PM Bjørn Jørgensen <
>>>>> bjornjorgensen@gmail.com> wrote:
>>>>>
>>>>>> Hi, I am wondering if it's a bug or not.
>>>>>>
>>>>>> I do have a lot of json files, where they have some columns that are
>>>>>> all "null" on.
>>>>>>
>>>>>> I start spark with
>>>>>>
>>>>>> from pyspark import pandas as ps
>>>>>> import re
>>>>>> import numpy as np
>>>>>> import os
>>>>>> import pandas as pd
>>>>>>
>>>>>> from pyspark import SparkContext, SparkConf
>>>>>> from pyspark.sql import SparkSession
>>>>>> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
>>>>>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>>>>>>
>>>>>> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>>>>>>
>>>>>> def get_spark_session(app_name: str, conf: SparkConf):
>>>>>>     conf.setMaster('local[*]')
>>>>>>     conf \
>>>>>>       .set('spark.driver.memory', '64g') \
>>>>>>       .set("fs.s3a.access.key", "minio") \
>>>>>>       .set("fs.s3a.secret.key", "") \
>>>>>>       .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>>>>>>       .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>>>>>>       .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>>>>>>       .set("spark.sql.repl.eagerEval.enabled", "True") \
>>>>>>       .set("spark.sql.adaptive.enabled", "True") \
>>>>>>       .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>>>>>>       .set("spark.sql.repl.eagerEval.maxNumRows", "10000") \
>>>>>>       .set("sc.setLogLevel", "error")
>>>>>>
>>>>>>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>>>>>>
>>>>>> spark = get_spark_session("Falk", SparkConf())
>>>>>>
>>>>>> d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")
>>>>>>
>>>>>> import pyspark
>>>>>> def sparkShape(dataFrame):
>>>>>>     return (dataFrame.count(), len(dataFrame.columns))
>>>>>> pyspark.sql.dataframe.DataFrame.shape = sparkShape
>>>>>> print(d3.shape())
>>>>>>
>>>>>>
>>>>>> (653610, 267)
>>>>>>
>>>>>>
>>>>>> d3.write.json("d3.json")
>>>>>>
>>>>>>
>>>>>> d3 = spark.read.json("d3.json/*.json")
>>>>>>
>>>>>> import pyspark
>>>>>> def sparkShape(dataFrame):
>>>>>>     return (dataFrame.count(), len(dataFrame.columns))
>>>>>> pyspark.sql.dataframe.DataFrame.shape = sparkShape
>>>>>> print(d3.shape())
>>>>>>
>>>>>> (653610, 186)
>>>>>>
>>>>>>
>>>>>> So Spark drops 81 columns. I think all 81 dropped columns contain
>>>>>> only null values.
>>>>>>
>>>>>> Is this a bug, or is this behavior intentional?
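The round trip above loses the columns because Spark's JSON writer omits null fields by default, so an all-null column leaves no trace on disk, and schema inference on re-read only sees keys that actually occur in the data. A minimal pure-Python sketch of the same effect (plain `json` with made-up data, no Spark required):

```python
import json

# Two records; column "c" is null in every record.
rows = [{"a": 1, "b": "x", "c": None}, {"a": 2, "b": "y", "c": None}]

def dump_skipping_nulls(row):
    # Mimic a writer that omits null fields, which is what Spark's
    # JSON datasource does by default.
    return json.dumps({k: v for k, v in row.items() if v is not None})

lines = [dump_skipping_nulls(r) for r in rows]

def infer_columns(lines):
    # Mimic schema inference: the only columns that exist are the
    # keys actually observed in the written data.
    cols = set()
    for line in lines:
        cols.update(json.loads(line))
    return sorted(cols)

print(infer_columns(lines))  # → ['a', 'b'] — "c" is gone after the round trip
```

Since "c" never appears in any written line, no reader could recover it; the loss happens at write time, not at read time.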
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 21, 2022 at 04:59, huaxin gao <huaxin.gao11@gmail.com> wrote:
>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>> version 3.2.1. The vote is open until 8:00pm Pacific time January 25 and
>>>>>>> passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [
>>>>>>> ] +1 Release this package as Apache Spark 3.2.1[ ] -1 Do not
>>>>>>> release this package because ... To learn more about Apache Spark, please
>>>>>>> see http://spark.apache.org/ The tag to be voted on is v3.2.1-rc2
>>>>>>> (commit 4f25b3f71238a00508a356591553f2dfa89f8290):
>>>>>>> https://github.com/apache/spark/tree/v3.2.1-rc2
>>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>>> at:https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS The staging
>>>>>>> repository for this release can be found at:
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1398/
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/
>>>>>>>
>>>>>>> The list of bug fixes going into 3.2.1 can be found at the following
>>>>>>> URL:https://s.apache.org/yu0cy
>>>>>>>
>>>>>>> This release is using the release script of the tag v3.2.1-rc2. FAQ
>>>>>>> ========================= How can I help test this release?
>>>>>>> ========================= If you are a Spark user, you can help us test
>>>>>>> this release by taking an existing Spark workload and running on this
>>>>>>> release candidate, then reporting any regressions. If you're working in
>>>>>>> PySpark you can set up a virtual env and install the current RC and see if
>>>>>>> anything important breaks; in Java/Scala, you can add the staging
>>>>>>> repository to your project's resolvers and test with the RC (make sure to
>>>>>>> clean up the artifact cache before/after so you don't end up building with
>>>>>>> an out-of-date RC going forward).
>>>>>>> =========================================== What should happen to JIRA
>>>>>>> tickets still targeting 3.2.1? ===========================================
>>>>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>> "Target Version/s" = 3.2.1 Committers should look at those and triage.
>>>>>>> Extremely important bug fixes, documentation, and API tweaks that impact
>>>>>>> compatibility should be worked on immediately. Everything else please
>>>>>>> retarget to an appropriate release. ================== But my bug isn't
>>>>>>> fixed? ================== In order to make timely releases, we will
>>>>>>> typically not hold the release unless the bug in question is a regression
>>>>>>> from the previous release. That being said, if there is something which is
>>>>>>> a regression that has not been correctly targeted please ping me or a
>>>>>>> committer to help target the issue.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Bjørn Jørgensen
>>>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>>>> Norge
>>>>>>
>>>>>> +47 480 94 297
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Bjørn Jørgensen
>>>> Vestre Aspehaug 4, 6010 Ålesund
>>>> Norge
>>>>
>>>> +47 480 94 297
>>>>
>>>
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
>>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Sean Owen <sr...@gmail.com>.
Continue on the ticket - I am not sure this is established. We would block
a release for critical problems that are not regressions. This is not a
data loss / 'deleting data' issue even if valid.
You're welcome to provide feedback but votes are for the PMC.

On Fri, Jan 21, 2022 at 5:24 PM Bjørn Jørgensen <bj...@gmail.com>
wrote:

> Ok, but deleting users' data without them knowing it is never a good idea.
> That's why I give this RC -1.

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Bjørn Jørgensen <bj...@gmail.com>.
Ok, but deleting users' data without them knowing it is never a good idea.
That's why I give this RC -1.

On Sat, Jan 22, 2022 at 00:16, Sean Owen <sr...@gmail.com> wrote:

> (Bjorn - unless this is a regression, it would not block a release, even
> if it's a bug)

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Sean Owen <sr...@gmail.com>.
(Bjorn - unless this is a regression, it would not block a release, even if
it's a bug)

On Fri, Jan 21, 2022 at 5:09 PM Bjørn Jørgensen <bj...@gmail.com>
wrote:

> [x] -1 Do not release this package because it deletes all my columns
> that contain only null values.
>
> I have opened https://issues.apache.org/jira/browse/SPARK-37981 for this
> bug.

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Bjørn Jørgensen <bj...@gmail.com>.
[x] -1 Do not release this package because it deletes all my columns that
contain only null values.

I have opened https://issues.apache.org/jira/browse/SPARK-37981 for this
bug.
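If the writer omitting null fields is indeed the cause, a possible workaround (rather than a release blocker) is the JSON datasource's `ignoreNullFields` write option, available since Spark 3.0: something like `d3.write.option("ignoreNullFields", "false").json("d3.json")` should keep the null fields on disk so the columns survive re-reading. The effect, sketched in plain Python with hypothetical data (not run against this RC):

```python
import json

# Column "c" is null in every record.
rows = [{"a": 1, "c": None}, {"a": 2, "c": None}]

# A writer that keeps null fields, as ignoreNullFields=false would.
lines = [json.dumps(r) for r in rows]

# Schema inference over the written lines still observes "c",
# so the all-null column survives the round trip.
cols = set()
for line in lines:
    cols.update(json.loads(line))
print(sorted(cols))  # → ['a', 'c']
```

The trade-off is larger output files, since every null is now written explicitly.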




On Fri, Jan 21, 2022 at 21:45, Sean Owen <sr...@gmail.com> wrote:

> (Are you suggesting this is a regression, or is it a general question?
> here we're trying to figure out whether there are critical bugs introduced
> in 3.2.1 vs 3.2.0)

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Sean Owen <sr...@gmail.com>.
(Are you suggesting this is a regression, or is it a general question? Here
we're trying to figure out whether there are critical bugs introduced in
3.2.1 vs 3.2.0.)

On Fri, Jan 21, 2022 at 1:58 PM Bjørn Jørgensen <bj...@gmail.com>
wrote:

> [Bjørn's message snipped; quoted in full below]
>

Re: [VOTE] Release Spark 3.2.1 (RC2)

Posted by Bjørn Jørgensen <bj...@gmail.com>.
Hi, I am wondering whether this is a bug or not.

I have a lot of JSON files in which some columns are null in every row.

I start Spark with:

from pyspark import pandas as ps
import re
import numpy as np
import os
import pandas as pd

from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

os.environ["PYARROW_IGNORE_TIMEZONE"]="1"

def get_spark_session(app_name: str, conf: SparkConf):
    conf.setMaster('local[*]')
    conf \
      .set('spark.driver.memory', '64g')\
      .set("fs.s3a.access.key", "minio") \
      .set("fs.s3a.secret.key", "") \
      .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
      .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
      .set("spark.hadoop.fs.s3a.path.style.access", "true") \
      .set("spark.sql.repl.eagerEval.enabled", "True") \
      .set("spark.sql.adaptive.enabled", "True") \
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
      .set("spark.sql.repl.eagerEval.maxNumRows", "10000") \
      .set("sc.setLogLevel", "error")

    return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()

spark = get_spark_session("Falk", SparkConf())

d3 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/3/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(d3.shape())


(653610, 267)


d3.write.json("d3.json")


d3 = spark.read.json("d3.json/*.json")

import pyspark
def sparkShape(dataFrame):
    return (dataFrame.count(), len(dataFrame.columns))
pyspark.sql.dataframe.DataFrame.shape = sparkShape
print(d3.shape())

(653610, 186)


So Spark is dropping 81 columns. I believe all 81 of the dropped columns
contain only null values.

Is this a bug, or is it intentional?
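[Editor's note: a plausible explanation, not a 3.2.1-specific regression. Spark's JSON writer omits null fields by default (controlled, if memory serves, by the write option `ignoreNullFields` / `spark.sql.jsonGenerator.ignoreNullFields`), and `spark.read.json` infers the schema from the keys actually present in the files, so a column that is null in every row leaves no trace and disappears on the round trip. The mechanism can be sketched with the standard-library `json` module alone, no Spark required; the record contents here are made up for illustration:]

```python
import json

# Two records in which the "phone" column is null in every row.
rows = [
    {"id": 1, "name": "a", "phone": None},
    {"id": 2, "name": "b", "phone": None},
]

# JSON-lines writers commonly drop null fields, so the serialized
# records carry no trace of the "phone" key at all.
lines = [
    json.dumps({k: v for k, v in row.items() if v is not None})
    for row in rows
]

# Schema inference on read is essentially the union of the keys that
# actually occur, so the all-null column cannot be recovered.
inferred = set()
for line in lines:
    inferred.update(json.loads(line).keys())

print(sorted(inferred))  # ['id', 'name'] -- "phone" is gone
```

[If that is what is happening, one workaround is to pass the original schema back explicitly on read, e.g. `spark.read.schema(d3.schema).json("d3.json")`, so the all-null columns survive the round trip.]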


On Fri, Jan 21, 2022 at 4:59 AM huaxin gao <hu...@gmail.com> wrote:

> [vote announcement snipped; see the original post at the top of the thread]


-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297