Posted to dev@spark.apache.org by Gengliang Wang <lt...@gmail.com> on 2021/09/01 04:15:18 UTC

Re: [VOTE] Release Spark 3.2.0 (RC1)

Hi Chao & DB,

Actually, I cut RC2 yesterday, before you posted the Parquet issue:
https://github.com/apache/spark/tree/v3.2.0-rc2
It has been 11 days since RC1. I think we can have RC2 today so that the
community can test and find potential issues earlier.
As for the Parquet issue, we can treat it as a known blocker. If it takes
more than one week (which is not likely to happen), we will have to consider
reverting Parquet 1.12 and related features from branch-3.2.

Gengliang

On Wed, Sep 1, 2021 at 5:40 AM DB Tsai <db...@dbtsai.com.invalid> wrote:

> Hello Xiao, there are multiple patches in Spark 3.2 that depend on Parquet
> 1.12, so it might be easier to wait for the fix in the Parquet community
> instead of reverting all the related changes. The fix in the Parquet
> community is very trivial, and we hope that it will not take too long. Thanks.
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
>
> On Tue, Aug 31, 2021 at 1:09 PM Chao Sun <su...@apache.org> wrote:
>
>> Hi Xiao, I'm still checking with the Parquet community on this. Since the
>> fix is already +1'd, I'm hoping this won't take long. The delta in the
>> parquet-1.12.x branch is also small, with just 2 commits so far.
>>
>> Chao
>>
>> On Tue, Aug 31, 2021 at 12:03 PM Xiao Li <li...@databricks.com> wrote:
>>
>>> Hi, Chao,
>>>
>>> How long will it take? Normally, in the RC stage, we always revert the
>>> upgrade made in the current release. We have reverted the Parquet upgrade
>>> multiple times in previous releases to avoid a major delay in the Spark
>>> release.
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>>
>>> On Tue, Aug 31, 2021 at 11:03 AM Chao Sun <su...@apache.org> wrote:
>>>
>>>> The Apache Parquet community found an issue [1] in 1.12.0 which could
>>>> cause an incorrect file offset to be written, causing subsequent reads of
>>>> the same file to fail. A fix has been proposed in the same JIRA, and we may
>>>> have to wait until a new release is available so that we can upgrade Spark
>>>> with the hotfix.
>>>>
>>>> [1]: https://issues.apache.org/jira/browse/PARQUET-2078
>>>>
>>>> On Fri, Aug 27, 2021 at 7:06 AM Sean Owen <sr...@gmail.com> wrote:
>>>>
>>>>> Maybe, I'm just confused why it's needed at all. Other profiles that
>>>>> add a dependency seem OK, but something's different here.
>>>>>
>>>>> One thing we can/should change is to simply remove the
>>>>> <dependencyManagement> block in the profile. It should always be a direct
>>>>> dep in Scala 2.13 (which lets us take out the profiles in submodules,
>>>>> which just repeat that).
>>>>> We can also update the version, by the by.
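>>>>>
>>>>> A minimal sketch of that change (dependency coordinates are taken from
>>>>> the snippets quoted below; the version shown is the one suggested later
>>>>> in this thread, not verified against the actual Spark POM):

```xml
<!-- Sketch only: scala-2.13 profile declaring the dependency directly,
     with no <dependencyManagement> wrapper. Version is an assumption. -->
<profile>
  <id>scala-2.13</id>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang.modules</groupId>
      <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
      <version>1.0.3</version>
    </dependency>
  </dependencies>
</profile>
```

>>>>> With a direct (unmanaged) dependency here, submodules would inherit it
>>>>> whenever the profile is active and would no longer need their own copies
>>>>> of the profile.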
>>>>>
>>>>> I tried this and the resulting POM still doesn't look like what I
>>>>> expect though.
>>>>>
>>>>> (The binary release is OK, FWIW - it gets pulled in as a JAR as
>>>>> expected)
>>>>>
>>>>> On Thu, Aug 26, 2021 at 11:34 PM Stephen Coy <sc...@infomedia.com.au>
>>>>> wrote:
>>>>>
>>>>>> Hi Sean,
>>>>>>
>>>>>> I think that maybe the flatten-maven-plugin
>>>>>> (https://www.mojohaus.org/flatten-maven-plugin/) will help you out here.
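>>>>>>
>>>>>> As a rough sketch of what that might look like in the parent POM (the
>>>>>> plugin coordinates are real; the version, and whether the default
>>>>>> flatten mode inlines profile-activated dependencies, are assumptions to
>>>>>> verify against the plugin docs):

```xml
<!-- Sketch: flatten-maven-plugin installs/deploys a "flattened" POM resolved
     at build time, which is how profile-contributed dependencies could end up
     as plain dependencies in the published POM. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>flatten-maven-plugin</artifactId>
  <version>1.2.7</version>
  <executions>
    <execution>
      <id>flatten</id>
      <phase>process-resources</phase>
      <goals>
        <goal>flatten</goal>
      </goals>
    </execution>
    <execution>
      <id>flatten.clean</id>
      <phase>clean</phase>
      <goals>
        <goal>clean</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```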
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Steve C
>>>>>>
>>>>>> On 27 Aug 2021, at 12:29 pm, Sean Owen <sr...@gmail.com> wrote:
>>>>>>
>>>>>> OK right, you would have seen a different error otherwise.
>>>>>>
>>>>>> Yes profiles are only a compile-time thing, but they should affect
>>>>>> the effective POM for the artifact. mvn -Pscala-2.13 help:effective-pom
>>>>>> shows scala-parallel-collections as a dependency in the POM as expected
>>>>>> (not in a profile). However I see what you see in the .pom in the release
>>>>>> repo, and in my local repo after building - it's just sitting there as a
>>>>>> profile as if it weren't activated or something.
>>>>>>
>>>>>> I'm confused then, that shouldn't be what happens. I'd say maybe
>>>>>> there is a problem with the release script, but seems to affect a simple
>>>>>> local build. Anyone else more expert in this see the problem, while I try
>>>>>> to debug more?
>>>>>> The binary distro may actually be fine, I'll check; it may not even
>>>>>> matter much for users who generally just treat Spark as a
>>>>>> compile-time-only dependency. But I can see it would break exactly your
>>>>>> case, something like a self-contained test job.
>>>>>>
>>>>>> On Thu, Aug 26, 2021 at 8:41 PM Stephen Coy <sc...@infomedia.com.au>
>>>>>> wrote:
>>>>>>
>>>>>>> I did indeed.
>>>>>>>
>>>>>>> The generated spark-core_2.13-3.2.0.pom that is created alongside
>>>>>>> the jar file in the local repo contains:
>>>>>>>
>>>>>>> <profile>
>>>>>>>   <id>scala-2.13</id>
>>>>>>>   <dependencies>
>>>>>>>     <dependency>
>>>>>>>       <groupId>org.scala-lang.modules</groupId>
>>>>>>>       <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>>>>>     </dependency>
>>>>>>>   </dependencies>
>>>>>>> </profile>
>>>>>>>
>>>>>>> which means this dependency will be missing for unit tests that
>>>>>>> create SparkSessions from library code only, a technique inspired by
>>>>>>> Spark’s own unit tests.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Steve C
>>>>>>>
>>>>>>> On 27 Aug 2021, at 11:33 am, Sean Owen <sr...@gmail.com> wrote:
>>>>>>>
>>>>>>> Did you run ./dev/change-scala-version.sh 2.13 ? that's required
>>>>>>> first to update POMs. It works fine for me.
>>>>>>>
>>>>>>> On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy <
>>>>>>> scoy@infomedia.com.au.invalid> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Being adventurous I have built the RC1 code with:
>>>>>>>>
>>>>>>>> -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver
>>>>>>>> -Phive-2.3 -Pscala-2.13 -Dhadoop.version=3.2.2
>>>>>>>>
>>>>>>>>
>>>>>>>> And then attempted to build my Java based spark application.
>>>>>>>>
>>>>>>>> However, I found a number of our unit tests were failing with:
>>>>>>>>
>>>>>>>> java.lang.NoClassDefFoundError:
>>>>>>>> scala/collection/parallel/TaskSupport
>>>>>>>>
>>>>>>>> at
>>>>>>>> org.apache.spark.SparkContext.$anonfun$union$1(SparkContext.scala:1412)
>>>>>>>> at
>>>>>>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>>>>>>> at
>>>>>>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>>>>>>>> at org.apache.spark.SparkContext.withScope(SparkContext.scala:789)
>>>>>>>> at org.apache.spark.SparkContext.union(SparkContext.scala:1406)
>>>>>>>> at
>>>>>>>> org.apache.spark.sql.execution.UnionExec.doExecute(basicPhysicalOperators.scala:698)
>>>>>>>> at
>>>>>>>> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:184)
>>>>>>>>         …
>>>>>>>>
>>>>>>>>
>>>>>>>> I tracked this down to a missing dependency:
>>>>>>>>
>>>>>>>> <dependency>
>>>>>>>>   <groupId>org.scala-lang.modules</groupId>
>>>>>>>>   <artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
>>>>>>>> </dependency>
>>>>>>>>
>>>>>>>>
>>>>>>>> which unfortunately appears only in a profile in the pom files
>>>>>>>> associated with the various spark dependencies.
>>>>>>>>
>>>>>>>> As far as I know, it is not possible to activate profiles in
>>>>>>>> dependencies in Maven builds.
>>>>>>>>
>>>>>>>> Therefore I suspect that right now a Scala 2.13 migration is not
>>>>>>>> quite as seamless as we would like.
>>>>>>>>
>>>>>>>> I stress that this is only an issue for developers that write unit
>>>>>>>> tests for their applications, as the Spark runtime environment will always
>>>>>>>> have the necessary dependencies available to it.
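>>>>>>>>
>>>>>>>> Until the published POMs are fixed, a consuming project could work
>>>>>>>> around this with something like the following sketch (the version is
>>>>>>>> an assumption; whether test scope suffices depends on how the project
>>>>>>>> uses Spark):

```xml
<!-- Sketch of a consumer-side workaround: declare the profile-hidden
     dependency explicitly in your own pom.xml. Version per the suggestion
     in this thread; test scope assumed for unit-test-only use. -->
<dependency>
  <groupId>org.scala-lang.modules</groupId>
  <artifactId>scala-parallel-collections_2.13</artifactId>
  <version>1.0.3</version>
  <scope>test</scope>
</dependency>
```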
>>>>>>>>
>>>>>>>> (You might consider upgrading the
>>>>>>>> org.scala-lang.modules:scala-parallel-collections_2.13 version from 0.2 to
>>>>>>>> 1.0.3 though!)
>>>>>>>>
>>>>>>>> Cheers and thanks for the great work!
>>>>>>>>
>>>>>>>> Steve Coy
>>>>>>>>
>>>>>>>>
>>>>>>>> On 21 Aug 2021, at 3:05 am, Gengliang Wang <lt...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>  version 3.2.0.
>>>>>>>>
>>>>>>>> The vote is open until 11:59pm Pacific time Aug 25 and passes if a
>>>>>>>> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>>
>>>>>>>> [ ] +1 Release this package as Apache Spark 3.2.0
>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>
>>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>>
>>>>>>>> The tag to be voted on is v3.2.0-rc1 (commit
>>>>>>>> 6bb3523d8e838bd2082fb90d7f3741339245c044):
>>>>>>>> https://github.com/apache/spark/tree/v3.2.0-rc1
>>>>>>>>
>>>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>>>> at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-bin/
>>>>>>>>
>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>
>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>
>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1388
>>>>>>>>
>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc1-docs/
>>>>>>>>
>>>>>>>> The list of bug fixes going into 3.2.0 can be found at the
>>>>>>>> following URL:
>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12349407
>>>>>>>>
>>>>>>>>
>>>>>>>> This release is using the release script of the tag v3.2.0-rc1.
>>>>>>>>
>>>>>>>>
>>>>>>>> FAQ
>>>>>>>>
>>>>>>>> =========================
>>>>>>>> How can I help test this release?
>>>>>>>> =========================
>>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>>>> then reporting any regressions.
>>>>>>>>
>>>>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>>>>> the current RC and see if anything important breaks; in Java/Scala,
>>>>>>>> you can add the staging repository to your project's resolvers and
>>>>>>>> test with the RC (make sure to clean up the artifact cache
>>>>>>>> before/after so you don't end up building with an out-of-date RC
>>>>>>>> going forward).
>>>>>>>>
>>>>>>>> ===========================================
>>>>>>>> What should happen to JIRA tickets still targeting 3.2.0?
>>>>>>>> ===========================================
>>>>>>>> The current list of open tickets targeted at 3.2.0 can be found at:
>>>>>>>> https://issues.apache.org/jira/projects/SPARK
>>>>>>>> and search for "Target Version/s" = 3.2.0
>>>>>>>>
>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>> fixes, documentation, and API tweaks that impact compatibility
>>>>>>>> should
>>>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>>>> appropriate release.
>>>>>>>>
>>>>>>>> ==================
>>>>>>>> But my bug isn't fixed?
>>>>>>>> ==================
>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>>> release. That being said, if there is something which is a regression
>>>>>>>> that has not been correctly targeted, please ping me or a committer to
>>>>>>>> help target the issue.
>>>>>>>>
>>>>>>>>
>>>>>>>>