Posted to dev@spark.apache.org by Wenchen Fan <cl...@gmail.com> on 2018/10/22 17:42:06 UTC

[VOTE] SPARK 2.4.0 (RC4)

Please vote on releasing the following candidate as Apache Spark version
2.4.0.

The vote is open until October 26 PST and passes if a majority of +1 PMC
votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.4.0-rc4 (commit
e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
https://github.com/apache/spark/tree/v2.4.0-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1290

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/

The list of bug fixes going into 2.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342385

FAQ

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the
current RC, and see if anything important breaks. In Java/Scala, you can
add the staging repository to your project's resolvers and test with the
RC (make sure to clean up the artifact cache before/after so you don't
end up building with an out-of-date RC going forward).
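
For example, here is a rough sketch of both flows; the exact PySpark artifact
name under the RC binary directory and the sbt snippet are assumptions, so
adjust them to what is actually published and to your build tool:

# Minimal sketch: install the RC's PySpark into a fresh virtual env
# (assumes the tarball is published as pyspark-2.4.0.tar.gz; check the listing).
python3 -m venv rc4-env && source rc4-env/bin/activate
pip install "https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/pyspark-2.4.0.tar.gz"
python -c "import pyspark; print(pyspark.__version__)"

# Minimal sketch for a Java/Scala project built with sbt (hypothetical build.sbt
# lines; Maven users would add the same repository URL to <repositories> instead).
cat >> build.sbt <<'EOF'
resolvers += "Spark 2.4.0 RC4 staging" at "https://repository.apache.org/content/repositories/orgapachespark-1290/"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
EOF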

===========================================
What should happen to JIRA tickets still targeting 2.4.0?
===========================================

The current list of open tickets targeted at 2.4.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 2.4.0

Committers should look at those and triage them. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else should be retargeted to an
appropriate release.

==================
But my bug isn't fixed?
==================

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
+1 (non-binding)

The Iceberg implementation of DataSourceV2 is passing all tests after
updating to the 2.4 API, although I've had to disable ORC support because
BufferHolder is no longer public.

One oddity is that the DSv2 API for batch sources now includes an epoch ID,
which I think will be removed in the refactor before 2.5 or 3.0 and wasn't
part of the 2.3 release. That's strange, but it's minor.

rb

On Tue, Oct 23, 2018 at 5:10 PM Sean Owen <sr...@gmail.com> wrote:

> Hm, so you're trying to build a source release from a binary release?
> I don't think that needs to work nor do I expect it to for reasons
> like this. They just have fairly different things.
>
> On Tue, Oct 23, 2018 at 7:04 PM Dongjoon Hyun <do...@gmail.com>
> wrote:
> >
> > Ur, Wenchen.
> >
> > Source distribution seems to fail by default.
> >
> >
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/spark-2.4.0.tgz
> >
> > $ dev/make-distribution.sh -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
> -Phive-thriftserver
> > ...
> > + cp /spark-2.4.0/LICENSE-binary /spark-2.4.0/dist/LICENSE
> > cp: /spark-2.4.0/LICENSE-binary: No such file or directory
> >
> >
> > The root cause seems to be the following fix.
> >
> >
> https://github.com/apache/spark/pull/22436/files#diff-01ca42240614718522afde4d4885b40dR175
> >
> > Although Apache Spark provides the binary distributions, it would be
> great if this succeeds out of the box.
> >
> > Bests,
> > Dongjoon.
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Sean Owen <sr...@gmail.com>.
Hm, so you're trying to build a source release from a binary release?
I don't think that needs to work, nor do I expect it to, for reasons
like this. They simply contain fairly different things.

On Tue, Oct 23, 2018 at 7:04 PM Dongjoon Hyun <do...@gmail.com> wrote:
>
> Ur, Wenchen.
>
> Source distribution seems to fail by default.
>
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/spark-2.4.0.tgz
>
> $ dev/make-distribution.sh -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver
> ...
> + cp /spark-2.4.0/LICENSE-binary /spark-2.4.0/dist/LICENSE
> cp: /spark-2.4.0/LICENSE-binary: No such file or directory
>
>
> The root cause seems to be the following fix.
>
> https://github.com/apache/spark/pull/22436/files#diff-01ca42240614718522afde4d4885b40dR175
>
> Although Apache Spark provides the binary distributions, it would be great if this succeeds out of the box.
>
> Bests,
> Dongjoon.
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Dongjoon Hyun <do...@gmail.com>.
Ur, Wenchen.

The source distribution seems to fail to build by default.

https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/spark-2.4.0.tgz

$ dev/make-distribution.sh -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive
-Phive-thriftserver
...
+ cp /spark-2.4.0/LICENSE-binary /spark-2.4.0/dist/LICENSE
cp: /spark-2.4.0/LICENSE-binary: No such file or directory


The root cause seems to be the following fix.

https://github.com/apache/spark/pull/22436/files#diff-01ca42240614718522afde4d4885b40dR175

Although Apache Spark provides binary distributions, it would be great
if this succeeded out of the box.

Bests,
Dongjoon.
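
A workaround sketch for anyone hitting the same error: build from the
v2.4.0-rc4 git tag instead of the extracted source tarball, on the assumption
that the git tree still carries LICENSE-binary while the tarball does not:

# Workaround sketch (assumes LICENSE-binary is present in the git tree, which
# the source tarball appears to lack).
git clone https://github.com/apache/spark.git && cd spark
git checkout v2.4.0-rc4
dev/make-distribution.sh -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver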


On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com> wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.4.0.
>
> The vote is open until October 26 PST and passes if a majority +1 PMC
> votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.0-rc4 (commit
> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
> https://github.com/apache/spark/tree/v2.4.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1290
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>
> The list of bug fixes going into 2.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 2.4.0?
> ===========================================
>
> The current list of open tickets targeted at 2.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Wenchen Fan <cl...@gmail.com>.
Since GitHub and Jenkins are in a chaotic state, I didn't wait for a green
Jenkins QA job for the RC4 commit. We should fail this RC if Jenkins turns
out to be broken (very unlikely).

I'm adding my own +1; all known blockers are resolved.

On Tue, Oct 23, 2018 at 1:42 AM Wenchen Fan <cl...@gmail.com> wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.4.0.
>
> The vote is open until October 26 PST and passes if a majority +1 PMC
> votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.0-rc4 (commit
> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
> https://github.com/apache/spark/tree/v2.4.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1290
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>
> The list of bug fixes going into 2.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 2.4.0?
> ===========================================
>
> The current list of open tickets targeted at 2.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Hyukjin Kwon <gu...@gmail.com>.
I am searching for and checking PRs and JIRAs that report regressions. Let
me leave a link - it might be good to double-check
https://github.com/apache/spark/pull/22514 as well.

On Tue, Oct 23, 2018 at 11:58 PM Stavros Kontopoulos <
stavros.kontopoulos@lightbend.com> wrote:

> Sean,
>
> I will try it against 2.12 shortly.
>
> You're saying someone would have to first build a k8s distro from source
>> too?
>
>
> Ok I missed the error one line above, before the distro error there is
> another one:
>
> fatal: not a git repository (or any of the parent directories): .git
>
>
> So that seems to come from here
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh#L19>.
> It seems that the test root is not set up correctly. It should be the top
> git dir from which you built Spark.
>
> Now regarding the distro thing. dev-run-integration-tests.sh should run
> from within the cloned project after the distro is built. The distro is
> required
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh#L61>
> , it should fail otherwise.
>
> Integration tests run the setup-integration-test-env.sh script. dev-run-integration-tests.sh
> calls mvn
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106> which
> in turn executes that setup script
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/pom.xml#L80>
> .
>
> How do you run the tests?
>
> Stavros
>
> On Tue, Oct 23, 2018 at 3:01 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> No, because the docs are built into the release too and released to
>> the site too from the released artifact.
>> As a practical matter, I think these docs are not critical for
>> release, and can follow in a maintenance release. I'd retarget to
>> 2.4.1 or untarget.
>> I do know at times a release's docs have been edited after the fact,
>> but that's bad form. We'd not go change a class in the release after
>> it was released and call it the same release.
>>
>> I'd still like some confirmation that someone can build and pass tests
>> with -Pkubernetes, maybe? It actually all passed with the 2.11 build.
>> I don't think it's a 2.12 incompatibility, but rather than the K8S
>> tests maybe don't quite work with the 2.12 build artifact naming. Or
>> else something to do with my env.
>>
>> On Mon, Oct 22, 2018 at 9:08 PM Wenchen Fan <cl...@gmail.com> wrote:
>> >
>> > Regarding the doc tickets, I vaguely remember that we can merge doc PRs
>> after release and publish doc to spark website later. Can anyone confirm?
>> >
>> > On Tue, Oct 23, 2018 at 8:30 AM Sean Owen <sr...@gmail.com> wrote:
>> >>
>> >> This is what I got from a straightforward build of the source distro
>> >> here ... really, ideally, it builds as-is from source. You're saying
>> >> someone would have to first build a k8s distro from source too?
>> >> It's not a 'must' that this be automatic but nothing else fails out of
>> the box.
>> >> I feel like I might be misunderstanding the setup here.
>> >> On Mon, Oct 22, 2018 at 7:25 PM Stavros Kontopoulos
>> >> <st...@lightbend.com> wrote:
>>
>
>
>
> --
> Stavros Kontopoulos
>
> *Senior Software Engineer*
> *Lightbend, Inc.*
>
> *p:  +30 6977967274*
> *e: stavros.kontopoulos@lightbend.com*
>
>
>
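
The flow described in the quoted message above (build a distribution from the
checked-out tree first, then run dev-run-integration-tests.sh from inside that
same tree) looks roughly like the sketch below; the --tgz and --spark-tgz flags
are assumptions, so check the scripts' usage output if they differ:

# Rough sketch, starting from a git checkout of the v2.4.0-rc4 tag
# (e.g. git checkout v2.4.0-rc4 in a clone of apache/spark).
dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh \
  --spark-tgz "$PWD"/spark-2.4.0-bin-*.tgz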

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Stavros Kontopoulos <st...@lightbend.com>.
Sean,

Yes, I updated the PR and re-run it.

On Fri, Oct 26, 2018 at 2:54 AM, Sean Owen <sr...@gmail.com> wrote:

> Yep, we're going to merge a change to separate the k8s tests into a
> separate profile, and fix up the Scala 2.12 thing. While non-critical those
> are pretty nice to have for 2.4. I think that's doable within the next 12
> hours even.
>
> @skonto I think there's one last minor thing needed on this PR?
> https://github.com/apache/spark/pull/22838/files#r228363727
>
> On Thu, Oct 25, 2018 at 6:42 PM Wenchen Fan <cl...@gmail.com> wrote:
>
>> Any updates on this topic? https://github.com/apache/spark/pull/22827 is
>> merged and 2.4 is unblocked.
>>
>> I'll cut RC5 shortly after the weekend, and it will be great to include
>> the change proposed here.
>>
>> Thanks,
>> Wenchen
>>
>> On Fri, Oct 26, 2018 at 12:55 AM Stavros Kontopoulos <
>> stavros.kontopoulos@lightbend.com> wrote:
>>
>>> I think it's worth getting in a change to just not enable this module,
>>>> which ought to be entirely safe, and avoid two of the issues we
>>>> identified.
>>>>
>>>
>>> Besides disabling it, when someone wants to run the tests with 2.12 he
>>> should be able to do so. So propagating the Scala profile still makes sense
>>> but it is not related to the release other than making sure things work
>>> fine.
>>>
>>> On Thu, Oct 25, 2018 at 7:02 PM, Sean Owen <sr...@gmail.com> wrote:
>>>
>>>> I think it's worth getting in a change to just not enable this module,
>>>> which ought to be entirely safe, and avoid two of the issues we
>>>> identified.
>>>> that said it didn't block RC4 so need not block RC5.
>>>> But should happen today if we're doing it.
>>>> On Thu, Oct 25, 2018 at 10:47 AM Xiao Li <ga...@gmail.com> wrote:
>>>> >
>>>> > Hopefully, this will not delay RC5. Since this is not a blocker
>>>> ticket, RC5 will start if all the blocker tickets are resolved.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Xiao
>>>> >
>>>> > On Thu, Oct 25, 2018 at 8:44 AM Sean Owen <sr...@gmail.com> wrote:
>>>> >>
>>>> >> Yes, I agree, and perhaps you are best placed to do that for 2.4.0
>>>> RC5 :)
>>>> >>
>>>> >> On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos
>>>> >> <st...@lightbend.com> wrote:
>>>> >> >
>>>> >> > I agree these tests should be manual for now but should be run
>>>> somehow before a release to make sure things are working right?
>>>> >> >
>>>> >> > For the other issue: https://issues.apache.org/
>>>> jira/browse/SPARK-25835 .
>>>> >> >
>>>> >> >
>>>> >> > On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <
>>>> stavros.kontopoulos@lightbend.com> wrote:
>>>> >> >>
>>>> >> >> I will open a jira for the profile propagation issue and have a
>>>> look to fix it.
>>>> >> >>
>>>> >> >> Stavros
>>>> >> >>
>>>> >> >> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <
>>>> eerlands@redhat.com> wrote:
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> I would be comfortable making the integration testing manual for
>>>> now.  A JIRA for ironing out how to make it reliable for automatic as a
>>>> goal for 3.0 seems like a good idea.
>>>> >> >>>
>>>> >> >>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com>
>>>> wrote:
>>>> >> >>>>
>>>> >> >>>> Forking this thread.
>>>> >> >>>>
>>>> >> >>>> Because we'll have another RC, we could possibly address these
>>>> two
>>>> >> >>>> issues. Only if we have a reliable change of course.
>>>> >> >>>>
>>>> >> >>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't
>>>> hurt.
>>>> >> >>>>
>>>> >> >>>> And is it reasonable to essentially 'disable'
>>>> >> >>>> kubernetes/integration-tests by removing it from the kubernetes
>>>> >> >>>> profile? it doesn't mean it goes away, just means it's run
>>>> manually,
>>>> >> >>>> not automatically. Is that actually how it's meant to be used
>>>> anyway?
>>>> >> >>>> in the short term? given the discussion around its requirements
>>>> and
>>>> >> >>>> minikube and all that?
>>>> >> >>>>
>>>> >> >>>> (Actually, this would also 'solve' the Scala 2.12 build problem
>>>> too)
>>>> >> >>>>
>>>> >> >>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com>
>>>> wrote:
>>>> >> >>>> >
>>>> >> >>>> > To be clear I'm currently +1 on this release, with much
>>>> commentary.
>>>> >> >>>> >
>>>> >> >>>> > OK, the explanation for kubernetes tests makes sense. Yes I
>>>> think we need to propagate the scala-2.12 build profile to make it work. Go
>>>> for it, if you have a lead on what the change is.
>>>> >> >>>> > This doesn't block the release as it's an issue for tests,
>>>> and only affects 2.12. However if we had a clean fix for this and there
>>>> were another RC, I'd include it.
>>>> >> >>>> >
>>>> >> >>>> > Dongjoon has a good point about the
>>>> spark-kubernetes-integration-tests artifact. That doesn't sound like
>>>> it should be published in this way, though, of course, we publish the test
>>>> artifacts from every module already. This is only a bit odd in being a
>>>> non-test artifact meant for testing. But it's special testing! So I also
>>>> don't think that needs to block a release.
>>>> >> >>>> >
>>>> >> >>>> > This happens because the integration tests module is enabled
>>>> with the 'kubernetes' profile too, and also this output is copied into the
>>>> release tarball at kubernetes/integration-tests/tests. Do we need that
>>>> in a binary release?
>>>> >> >>>> >
>>>> >> >>>> > If these integration tests are meant to be run ad hoc,
>>>> manually, not part of a normal test cycle, then I think we can just not
>>>> enable it with -Pkubernetes. If it is meant to run every time, then it
>>>> sounds like we need a little extra work shown in recent PRs to make that
>>>> easier, but then, this test code should just be the 'test' artifact parts
>>>> of the kubernetes module, no?
>>>> >> >>>>
>>>> >> >>>> ------------------------------------------------------------
>>>> ---------
>>>> >> >>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>> >> >>>>
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >>
>>>> >> ------------------------------------------------------------
>>>> ---------
>>>> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>> >>
>>>>
>>>
>>>
>>>
>>>

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Reynold Xin <rx...@databricks.com>.
I also think we should get this in:
https://github.com/apache/spark/pull/22841

It's to deprecate a confusing & broken window function API, so we can
remove it in 3.0 and redesign a better one. See
https://issues.apache.org/jira/browse/SPARK-25841 for more information.


On Thu, Oct 25, 2018 at 4:55 PM Sean Owen <sr...@gmail.com> wrote:

> Yep, we're going to merge a change to separate the k8s tests into a
> separate profile, and fix up the Scala 2.12 thing. While non-critical those
> are pretty nice to have for 2.4. I think that's doable within the next 12
> hours even.
>
> @skonto I think there's one last minor thing needed on this PR?
> https://github.com/apache/spark/pull/22838/files#r228363727
>
> On Thu, Oct 25, 2018 at 6:42 PM Wenchen Fan <cl...@gmail.com> wrote:
>
>> Any updates on this topic? https://github.com/apache/spark/pull/22827 is
>> merged and 2.4 is unblocked.
>>
>> I'll cut RC5 shortly after the weekend, and it will be great to include
>> the change proposed here.
>>
>> Thanks,
>> Wenchen
>>
>> On Fri, Oct 26, 2018 at 12:55 AM Stavros Kontopoulos <
>> stavros.kontopoulos@lightbend.com> wrote:
>>
>>> I think it's worth getting in a change to just not enable this module,
>>>> which ought to be entirely safe, and avoid two of the issues we
>>>> identified.
>>>>
>>>
>>> Besides disabling it, when someone wants to run the tests with 2.12 he
>>> should be able to do so. So propagating the Scala profile still makes sense
>>> but it is not related to the release other than making sure things work
>>> fine.
>>>
>>> On Thu, Oct 25, 2018 at 7:02 PM, Sean Owen <sr...@gmail.com> wrote:
>>>
>>>> I think it's worth getting in a change to just not enable this module,
>>>> which ought to be entirely safe, and avoid two of the issues we
>>>> identified.
>>>> that said it didn't block RC4 so need not block RC5.
>>>> But should happen today if we're doing it.
>>>> On Thu, Oct 25, 2018 at 10:47 AM Xiao Li <ga...@gmail.com> wrote:
>>>> >
>>>> > Hopefully, this will not delay RC5. Since this is not a blocker
>>>> ticket, RC5 will start if all the blocker tickets are resolved.
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Xiao
>>>> >
>>>> > On Thu, Oct 25, 2018 at 8:44 AM Sean Owen <sr...@gmail.com> wrote:
>>>> >>
>>>> >> Yes, I agree, and perhaps you are best placed to do that for 2.4.0
>>>> RC5 :)
>>>> >>
>>>> >> On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos
>>>> >> <st...@lightbend.com> wrote:
>>>> >> >
>>>> >> > I agree these tests should be manual for now but should be run
>>>> somehow before a release to make sure things are working right?
>>>> >> >
>>>> >> > For the other issue:
>>>> https://issues.apache.org/jira/browse/SPARK-25835 .
>>>> >> >
>>>> >> >
>>>> >> > On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <
>>>> stavros.kontopoulos@lightbend.com> wrote:
>>>> >> >>
>>>> >> >> I will open a jira for the profile propagation issue and have a
>>>> look to fix it.
>>>> >> >>
>>>> >> >> Stavros
>>>> >> >>
>>>> >> >> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <
>>>> eerlands@redhat.com> wrote:
>>>> >> >>>
>>>> >> >>>
>>>> >> >>> I would be comfortable making the integration testing manual for
>>>> now.  A JIRA for ironing out how to make it reliable for automatic as a
>>>> goal for 3.0 seems like a good idea.
>>>> >> >>>
>>>> >> >>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com>
>>>> wrote:
>>>> >> >>>>
>>>> >> >>>> Forking this thread.
>>>> >> >>>>
>>>> >> >>>> Because we'll have another RC, we could possibly address these
>>>> two
>>>> >> >>>> issues. Only if we have a reliable change of course.
>>>> >> >>>>
>>>> >> >>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't
>>>> hurt.
>>>> >> >>>>
>>>> >> >>>> And is it reasonable to essentially 'disable'
>>>> >> >>>> kubernetes/integration-tests by removing it from the kubernetes
>>>> >> >>>> profile? it doesn't mean it goes away, just means it's run
>>>> manually,
>>>> >> >>>> not automatically. Is that actually how it's meant to be used
>>>> anyway?
>>>> >> >>>> in the short term? given the discussion around its requirements
>>>> and
>>>> >> >>>> minikube and all that?
>>>> >> >>>>
>>>> >> >>>> (Actually, this would also 'solve' the Scala 2.12 build problem
>>>> too)
>>>> >> >>>>
>>>> >> >>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com>
>>>> wrote:
>>>> >> >>>> >
>>>> >> >>>> > To be clear I'm currently +1 on this release, with much
>>>> commentary.
>>>> >> >>>> >
>>>> >> >>>> > OK, the explanation for kubernetes tests makes sense. Yes I
>>>> think we need to propagate the scala-2.12 build profile to make it work. Go
>>>> for it, if you have a lead on what the change is.
>>>> >> >>>> > This doesn't block the release as it's an issue for tests,
>>>> and only affects 2.12. However if we had a clean fix for this and there
>>>> were another RC, I'd include it.
>>>> >> >>>> >
>>>> >> >>>> > Dongjoon has a good point about the
>>>> spark-kubernetes-integration-tests artifact. That doesn't sound like it
>>>> should be published in this way, though, of course, we publish the test
>>>> artifacts from every module already. This is only a bit odd in being a
>>>> non-test artifact meant for testing. But it's special testing! So I also
>>>> don't think that needs to block a release.
>>>> >> >>>> >
>>>> >> >>>> > This happens because the integration tests module is enabled
>>>> with the 'kubernetes' profile too, and also this output is copied into the
>>>> release tarball at kubernetes/integration-tests/tests. Do we need that in a
>>>> binary release?
>>>> >> >>>> >
>>>> >> >>>> > If these integration tests are meant to be run ad hoc,
>>>> manually, not part of a normal test cycle, then I think we can just not
>>>> enable it with -Pkubernetes. If it is meant to run every time, then it
>>>> sounds like we need a little extra work shown in recent PRs to make that
>>>> easier, but then, this test code should just be the 'test' artifact parts
>>>> of the kubernetes module, no?
>>>> >> >>>>
>>>> >> >>>>
>>>> ---------------------------------------------------------------------
>>>> >> >>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>> >> >>>>
>>>> >> >>
>>>> >> >>
>>>> >> >
>>>> >>
>>>> >> ---------------------------------------------------------------------
>>>> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>> >>
>>>>
>>>
>>>
>>>
>>> --
>>> Stavros Kontopoulos
>>>
>>> *Senior Software Engineer*
>>> *Lightbend, Inc.*
>>>
>>> *p:  +30 6977967274*
>>> *e: stavros.kontopoulos@lightbend.com*
>>>
>>>
>>>

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Sean Owen <sr...@gmail.com>.
This is all merged to master/2.4. AFAIK there aren't any items I'm
monitoring that are needed for 2.4.

On Thu, Oct 25, 2018 at 6:54 PM Sean Owen <sr...@gmail.com> wrote:

> Yep, we're going to merge a change to separate the k8s tests into a
> separate profile, and fix up the Scala 2.12 thing. While non-critical those
> are pretty nice to have for 2.4. I think that's doable within the next 12
> hours even.
>
> @skonto I think there's one last minor thing needed on this PR?
> https://github.com/apache/spark/pull/22838/files#r228363727
>
> On Thu, Oct 25, 2018 at 6:42 PM Wenchen Fan <cl...@gmail.com> wrote:
>
>> Any updates on this topic? https://github.com/apache/spark/pull/22827 is
>> merged and 2.4 is unblocked.
>>
>> I'll cut RC5 shortly after the weekend, and it will be great to include
>> the change proposed here.
>>
>> Thanks,
>> Wenchen
>>
>

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Sean Owen <sr...@gmail.com>.
Yep, we're going to merge a change to separate the k8s tests into a
separate profile, and fix up the Scala 2.12 thing. While non-critical, those
are pretty nice to have for 2.4. I think that's doable within the next 12
hours even.

@skonto I think there's one last minor thing needed on this PR?
https://github.com/apache/spark/pull/22838/files#r228363727

On Thu, Oct 25, 2018 at 6:42 PM Wenchen Fan <cl...@gmail.com> wrote:

> Any updates on this topic? https://github.com/apache/spark/pull/22827 is
> merged and 2.4 is unblocked.
>
> I'll cut RC5 shortly after the weekend, and it will be great to include
> the change proposed here.
>
> Thanks,
> Wenchen
>
> On Fri, Oct 26, 2018 at 12:55 AM Stavros Kontopoulos <
> stavros.kontopoulos@lightbend.com> wrote:
>
>> I think it's worth getting in a change to just not enable this module,
>>> which ought to be entirely safe, and avoid two of the issues we
>>> identified.
>>>
>>
>> Besides disabling it, when someone wants to run the tests with 2.12 he
>> should be able to do so. So propagating the Scala profile still makes sense
>> but it is not related to the release other than making sure things work
>> fine.
>>
>> On Thu, Oct 25, 2018 at 7:02 PM, Sean Owen <sr...@gmail.com> wrote:
>>
>>> I think it's worth getting in a change to just not enable this module,
>>> which ought to be entirely safe, and avoid two of the issues we
>>> identified.
>>> that said it didn't block RC4 so need not block RC5.
>>> But should happen today if we're doing it.
>>> On Thu, Oct 25, 2018 at 10:47 AM Xiao Li <ga...@gmail.com> wrote:
>>> >
>>> > Hopefully, this will not delay RC5. Since this is not a blocker
>>> ticket, RC5 will start if all the blocker tickets are resolved.
>>> >
>>> > Thanks,
>>> >
>>> > Xiao
>>> >
>>> > On Thu, Oct 25, 2018 at 8:44 AM Sean Owen <sr...@gmail.com> wrote:
>>> >>
>>> >> Yes, I agree, and perhaps you are best placed to do that for 2.4.0
>>> RC5 :)
>>> >>
>>> >> On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos
>>> >> <st...@lightbend.com> wrote:
>>> >> >
>>> >> > I agree these tests should be manual for now but should be run
>>> somehow before a release to make sure things are working right?
>>> >> >
>>> >> > For the other issue:
>>> https://issues.apache.org/jira/browse/SPARK-25835 .
>>> >> >
>>> >> >
>>> >> > On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <
>>> stavros.kontopoulos@lightbend.com> wrote:
>>> >> >>
>>> >> >> I will open a jira for the profile propagation issue and have a
>>> look to fix it.
>>> >> >>
>>> >> >> Stavros
>>> >> >>
>>> >> >> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <
>>> eerlands@redhat.com> wrote:
>>> >> >>>
>>> >> >>>
>>> >> >>> I would be comfortable making the integration testing manual for
>>> now.  A JIRA for ironing out how to make it reliable for automatic as a
>>> goal for 3.0 seems like a good idea.
>>> >> >>>
>>> >> >>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com>
>>> wrote:
>>> >> >>>>
>>> >> >>>> Forking this thread.
>>> >> >>>>
>>> >> >>>> Because we'll have another RC, we could possibly address these
>>> two
>>> >> >>>> issues. Only if we have a reliable change of course.
>>> >> >>>>
>>> >> >>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't
>>> hurt.
>>> >> >>>>
>>> >> >>>> And is it reasonable to essentially 'disable'
>>> >> >>>> kubernetes/integration-tests by removing it from the kubernetes
>>> >> >>>> profile? it doesn't mean it goes away, just means it's run
>>> manually,
>>> >> >>>> not automatically. Is that actually how it's meant to be used
>>> anyway?
>>> >> >>>> in the short term? given the discussion around its requirements
>>> and
>>> >> >>>> minikube and all that?
>>> >> >>>>
>>> >> >>>> (Actually, this would also 'solve' the Scala 2.12 build problem
>>> too)
>>> >> >>>>
>>> >> >>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com>
>>> wrote:
>>> >> >>>> >
>>> >> >>>> > To be clear I'm currently +1 on this release, with much
>>> commentary.
>>> >> >>>> >
>>> >> >>>> > OK, the explanation for kubernetes tests makes sense. Yes I
>>> think we need to propagate the scala-2.12 build profile to make it work. Go
>>> for it, if you have a lead on what the change is.
>>> >> >>>> > This doesn't block the release as it's an issue for tests, and
>>> only affects 2.12. However if we had a clean fix for this and there were
>>> another RC, I'd include it.
>>> >> >>>> >
>>> >> >>>> > Dongjoon has a good point about the
>>> spark-kubernetes-integration-tests artifact. That doesn't sound like it
>>> should be published in this way, though, of course, we publish the test
>>> artifacts from every module already. This is only a bit odd in being a
>>> non-test artifact meant for testing. But it's special testing! So I also
>>> don't think that needs to block a release.
>>> >> >>>> >
>>> >> >>>> > This happens because the integration tests module is enabled
>>> with the 'kubernetes' profile too, and also this output is copied into the
>>> release tarball at kubernetes/integration-tests/tests. Do we need that in a
>>> binary release?
>>> >> >>>> >
>>> >> >>>> > If these integration tests are meant to be run ad hoc,
>>> manually, not part of a normal test cycle, then I think we can just not
>>> enable it with -Pkubernetes. If it is meant to run every time, then it
>>> sounds like we need a little extra work shown in recent PRs to make that
>>> easier, but then, this test code should just be the 'test' artifact parts
>>> of the kubernetes module, no?
>>> >> >>>>
>>> >> >>>>
>>> ---------------------------------------------------------------------
>>> >> >>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >> >>>>
>>> >> >>
>>> >> >>
>>> >> >
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >>
>>>
>>
>>
>>
>> --
>> Stavros Kontopoulos
>>
>> *Senior Software Engineer*
>> *Lightbend, Inc.*
>>
>> *p:  +30 6977967274*
>> *e: stavros.kontopoulos@lightbend.com*
>>
>>
>>

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Wenchen Fan <cl...@gmail.com>.
Any updates on this topic? https://github.com/apache/spark/pull/22827 is
merged and 2.4 is unblocked.

I'll cut RC5 shortly after the weekend, and it will be great to include the
change proposed here.

Thanks,
Wenchen

On Fri, Oct 26, 2018 at 12:55 AM Stavros Kontopoulos <
stavros.kontopoulos@lightbend.com> wrote:

> I think it's worth getting in a change to just not enable this module,
>> which ought to be entirely safe, and avoid two of the issues we
>> identified.
>>
>
> Besides disabling it, when someone wants to run the tests with 2.12 he
> should be able to do so. So propagating the Scala profile still makes sense
> but it is not related to the release other than making sure things work
> fine.
>
> On Thu, Oct 25, 2018 at 7:02 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> I think it's worth getting in a change to just not enable this module,
>> which ought to be entirely safe, and avoid two of the issues we
>> identified.
>> that said it didn't block RC4 so need not block RC5.
>> But should happen today if we're doing it.
>> On Thu, Oct 25, 2018 at 10:47 AM Xiao Li <ga...@gmail.com> wrote:
>> >
>> > Hopefully, this will not delay RC5. Since this is not a blocker ticket,
>> RC5 will start if all the blocker tickets are resolved.
>> >
>> > Thanks,
>> >
>> > Xiao
>> >
>> > On Thu, Oct 25, 2018 at 8:44 AM Sean Owen <sr...@gmail.com> wrote:
>> >>
>> >> Yes, I agree, and perhaps you are best placed to do that for 2.4.0 RC5
>> :)
>> >>
>> >> On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos
>> >> <st...@lightbend.com> wrote:
>> >> >
>> >> > I agree these tests should be manual for now but should be run
>> somehow before a release to make sure things are working right?
>> >> >
>> >> > For the other issue:
>> https://issues.apache.org/jira/browse/SPARK-25835 .
>> >> >
>> >> >
>> >> > On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <
>> stavros.kontopoulos@lightbend.com> wrote:
>> >> >>
>> >> >> I will open a jira for the profile propagation issue and have a
>> look to fix it.
>> >> >>
>> >> >> Stavros
>> >> >>
>> >> >> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <
>> eerlands@redhat.com> wrote:
>> >> >>>
>> >> >>>
>> >> >>> I would be comfortable making the integration testing manual for
>> now.  A JIRA for ironing out how to make it reliable for automatic as a
>> goal for 3.0 seems like a good idea.
>> >> >>>
>> >> >>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com>
>> wrote:
>> >> >>>>
>> >> >>>> Forking this thread.
>> >> >>>>
>> >> >>>> Because we'll have another RC, we could possibly address these two
>> >> >>>> issues. Only if we have a reliable change of course.
>> >> >>>>
>> >> >>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't
>> hurt.
>> >> >>>>
>> >> >>>> And is it reasonable to essentially 'disable'
>> >> >>>> kubernetes/integration-tests by removing it from the kubernetes
>> >> >>>> profile? it doesn't mean it goes away, just means it's run
>> manually,
>> >> >>>> not automatically. Is that actually how it's meant to be used
>> anyway?
>> >> >>>> in the short term? given the discussion around its requirements
>> and
>> >> >>>> minikube and all that?
>> >> >>>>
>> >> >>>> (Actually, this would also 'solve' the Scala 2.12 build problem
>> too)
>> >> >>>>
>> >> >>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com>
>> wrote:
>> >> >>>> >
>> >> >>>> > To be clear I'm currently +1 on this release, with much
>> commentary.
>> >> >>>> >
>> >> >>>> > OK, the explanation for kubernetes tests makes sense. Yes I
>> think we need to propagate the scala-2.12 build profile to make it work. Go
>> for it, if you have a lead on what the change is.
>> >> >>>> > This doesn't block the release as it's an issue for tests, and
>> only affects 2.12. However if we had a clean fix for this and there were
>> another RC, I'd include it.
>> >> >>>> >
>> >> >>>> > Dongjoon has a good point about the
>> spark-kubernetes-integration-tests artifact. That doesn't sound like it
>> should be published in this way, though, of course, we publish the test
>> artifacts from every module already. This is only a bit odd in being a
>> non-test artifact meant for testing. But it's special testing! So I also
>> don't think that needs to block a release.
>> >> >>>> >
>> >> >>>> > This happens because the integration tests module is enabled
>> with the 'kubernetes' profile too, and also this output is copied into the
>> release tarball at kubernetes/integration-tests/tests. Do we need that in a
>> binary release?
>> >> >>>> >
>> >> >>>> > If these integration tests are meant to be run ad hoc,
>> manually, not part of a normal test cycle, then I think we can just not
>> enable it with -Pkubernetes. If it is meant to run every time, then it
>> sounds like we need a little extra work shown in recent PRs to make that
>> easier, but then, this test code should just be the 'test' artifact parts
>> of the kubernetes module, no?
>> >> >>>>
>> >> >>>>
>> ---------------------------------------------------------------------
>> >> >>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >> >>>>
>> >> >>
>> >> >>
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >>
>>
>
>
>
> --
> Stavros Kontopoulos
>
> *Senior Software Engineer*
> *Lightbend, Inc.*
>
> *p:  +30 6977967274*
> *e: stavros.kontopoulos@lightbend.com*
>
>
>

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Stavros Kontopoulos <st...@lightbend.com>.
>
> I think it's worth getting in a change to just not enable this module,
> which ought to be entirely safe, and avoid two of the issues we
> identified.
>

Besides disabling it, anyone who wants to run the tests with 2.12 should
still be able to do so. So propagating the Scala profile still makes sense,
but it is not related to the release beyond making sure things work fine.
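
For reference, a minimal sketch of the Scala 2.12 build those tests would need
to line up with (dev/change-scala-version.sh and build/mvn come from the Spark
repo layout; verify against your checkout):

# Minimal sketch: switch the build to Scala 2.12, then build with the
# Kubernetes modules enabled.
./dev/change-scala-version.sh 2.12
./build/mvn -DskipTests -Pscala-2.12 -Pkubernetes clean package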

On Thu, Oct 25, 2018 at 7:02 PM, Sean Owen <sr...@gmail.com> wrote:

> I think it's worth getting in a change to just not enable this module,
> which ought to be entirely safe, and avoid two of the issues we
> identified.
> that said it didn't block RC4 so need not block RC5.
> But should happen today if we're doing it.
> On Thu, Oct 25, 2018 at 10:47 AM Xiao Li <ga...@gmail.com> wrote:
> >
> > Hopefully, this will not delay RC5. Since this is not a blocker ticket,
> RC5 will start if all the blocker tickets are resolved.
> >
> > Thanks,
> >
> > Xiao
> >
> > On Thu, Oct 25, 2018 at 8:44 AM Sean Owen <sr...@gmail.com> wrote:
> >>
> >> Yes, I agree, and perhaps you are best placed to do that for 2.4.0 RC5
> :)
> >>
> >> On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos
> >> <st...@lightbend.com> wrote:
> >> >
> >> > I agree these tests should be manual for now but should be run
> somehow before a release to make sure things are working right?
> >> >
> >> > For the other issue: https://issues.apache.org/
> jira/browse/SPARK-25835 .
> >> >
> >> >
> >> > On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <
> stavros.kontopoulos@lightbend.com> wrote:
> >> >>
> >> >> I will open a jira for the profile propagation issue and have a look
> to fix it.
> >> >>
> >> >> Stavros
> >> >>
> >> >> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <ee...@redhat.com>
> wrote:
> >> >>>
> >> >>>
> >> >>> I would be comfortable making the integration testing manual for
> now.  A JIRA for ironing out how to make it reliable for automatic as a
> goal for 3.0 seems like a good idea.
> >> >>>
> >> >>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com> wrote:
> >> >>>>
> >> >>>> Forking this thread.
> >> >>>>
> >> >>>> Because we'll have another RC, we could possibly address these two
> >> >>>> issues. Only if we have a reliable change of course.
> >> >>>>
> >> >>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't
> hurt.
> >> >>>>
> >> >>>> And is it reasonable to essentially 'disable'
> >> >>>> kubernetes/integration-tests by removing it from the kubernetes
> >> >>>> profile? it doesn't mean it goes away, just means it's run
> manually,
> >> >>>> not automatically. Is that actually how it's meant to be used
> anyway?
> >> >>>> in the short term? given the discussion around its requirements and
> >> >>>> minikube and all that?
> >> >>>>
> >> >>>> (Actually, this would also 'solve' the Scala 2.12 build problem
> too)
> >> >>>>
> >> >>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com>
> wrote:
> >> >>>> >
> >> >>>> > To be clear I'm currently +1 on this release, with much
> commentary.
> >> >>>> >
> >> >>>> > OK, the explanation for kubernetes tests makes sense. Yes I
> think we need to propagate the scala-2.12 build profile to make it work. Go
> for it, if you have a lead on what the change is.
> >> >>>> > This doesn't block the release as it's an issue for tests, and
> only affects 2.12. However if we had a clean fix for this and there were
> another RC, I'd include it.
> >> >>>> >
> >> >>>> > Dongjoon has a good point about the spark-kubernetes-integration-tests
> artifact. That doesn't sound like it should be published in this way,
> though, of course, we publish the test artifacts from every module already.
> This is only a bit odd in being a non-test artifact meant for testing. But
> it's special testing! So I also don't think that needs to block a release.
> >> >>>> >
> >> >>>> > This happens because the integration tests module is enabled
> with the 'kubernetes' profile too, and also this output is copied into the
> release tarball at kubernetes/integration-tests/tests. Do we need that in
> a binary release?
> >> >>>> >
> >> >>>> > If these integration tests are meant to be run ad hoc, manually,
> not part of a normal test cycle, then I think we can just not enable it
> with -Pkubernetes. If it is meant to run every time, then it sounds like we
> need a little extra work shown in recent PRs to make that easier, but then,
> this test code should just be the 'test' artifact parts of the kubernetes
> module, no?
> >> >>>>
> >> >>>> ------------------------------------------------------------
> ---------
> >> >>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >> >>>>
> >> >>
> >> >>
> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>
>



-- 
Stavros Kontopoulos

*Senior Software Engineer*
*Lightbend, Inc.*

*p:  +30 6977967274*
*e: stavros.kontopoulos@lightbend.com*

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Sean Owen <sr...@gmail.com>.
I think it's worth getting in a change to just not enable this module,
which ought to be entirely safe and would avoid two of the issues we
identified.
That said, it didn't block RC4, so it need not block RC5.
But it should happen today if we're doing it.
On Thu, Oct 25, 2018 at 10:47 AM Xiao Li <ga...@gmail.com> wrote:
>
> Hopefully, this will not delay RC5. Since this is not a blocker ticket, RC5 will start if all the blocker tickets are resolved.
>
> Thanks,
>
> Xiao
>
> On Thu, Oct 25, 2018 at 8:44 AM Sean Owen <sr...@gmail.com> wrote:
>>
>> Yes, I agree, and perhaps you are best placed to do that for 2.4.0 RC5 :)
>>
>> On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos
>> <st...@lightbend.com> wrote:
>> >
>> > I agree these tests should be manual for now but should be run somehow before a release to make sure things are working right?
>> >
>> > For the other issue: https://issues.apache.org/jira/browse/SPARK-25835 .
>> >
>> >
>> > On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <st...@lightbend.com> wrote:
>> >>
>> >> I will open a jira for the profile propagation issue and have a look to fix it.
>> >>
>> >> Stavros
>> >>
>> >> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <ee...@redhat.com> wrote:
>> >>>
>> >>>
>> >>> I would be comfortable making the integration testing manual for now.  A JIRA for ironing out how to make it reliable for automatic as a goal for 3.0 seems like a good idea.
>> >>>
>> >>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com> wrote:
>> >>>>
>> >>>> Forking this thread.
>> >>>>
>> >>>> Because we'll have another RC, we could possibly address these two
>> >>>> issues. Only if we have a reliable change of course.
>> >>>>
>> >>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't hurt.
>> >>>>
>> >>>> And is it reasonable to essentially 'disable'
>> >>>> kubernetes/integration-tests by removing it from the kubernetes
>> >>>> profile? it doesn't mean it goes away, just means it's run manually,
>> >>>> not automatically. Is that actually how it's meant to be used anyway?
>> >>>> in the short term? given the discussion around its requirements and
>> >>>> minikube and all that?
>> >>>>
>> >>>> (Actually, this would also 'solve' the Scala 2.12 build problem too)
>> >>>>
>> >>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com> wrote:
>> >>>> >
>> >>>> > To be clear I'm currently +1 on this release, with much commentary.
>> >>>> >
>> >>>> > OK, the explanation for kubernetes tests makes sense. Yes I think we need to propagate the scala-2.12 build profile to make it work. Go for it, if you have a lead on what the change is.
>> >>>> > This doesn't block the release as it's an issue for tests, and only affects 2.12. However if we had a clean fix for this and there were another RC, I'd include it.
>> >>>> >
>> >>>> > Dongjoon has a good point about the spark-kubernetes-integration-tests artifact. That doesn't sound like it should be published in this way, though, of course, we publish the test artifacts from every module already. This is only a bit odd in being a non-test artifact meant for testing. But it's special testing! So I also don't think that needs to block a release.
>> >>>> >
>> >>>> > This happens because the integration tests module is enabled with the 'kubernetes' profile too, and also this output is copied into the release tarball at kubernetes/integration-tests/tests. Do we need that in a binary release?
>> >>>> >
>> >>>> > If these integration tests are meant to be run ad hoc, manually, not part of a normal test cycle, then I think we can just not enable it with -Pkubernetes. If it is meant to run every time, then it sounds like we need a little extra work shown in recent PRs to make that easier, but then, this test code should just be the 'test' artifact parts of the kubernetes module, no?
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >>>>
>> >>
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Xiao Li <ga...@gmail.com>.
Hopefully, this will not delay RC5. Since this is not a blocker ticket, RC5
will start if all the blocker tickets are resolved.

Thanks,

Xiao

On Thu, Oct 25, 2018 at 8:44 AM Sean Owen <sr...@gmail.com> wrote:

> Yes, I agree, and perhaps you are best placed to do that for 2.4.0 RC5 :)
>
> On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos
> <st...@lightbend.com> wrote:
> >
> > I agree these tests should be manual for now but should be run somehow
> before a release to make sure things are working right?
> >
> > For the other issue: https://issues.apache.org/jira/browse/SPARK-25835 .
> >
> >
> > On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <
> stavros.kontopoulos@lightbend.com> wrote:
> >>
> >> I will open a jira for the profile propagation issue and have a look to
> fix it.
> >>
> >> Stavros
> >>
> >> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <ee...@redhat.com>
> wrote:
> >>>
> >>>
> >>> I would be comfortable making the integration testing manual for now.
> A JIRA for ironing out how to make it reliable for automatic as a goal for
> 3.0 seems like a good idea.
> >>>
> >>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com> wrote:
> >>>>
> >>>> Forking this thread.
> >>>>
> >>>> Because we'll have another RC, we could possibly address these two
> >>>> issues. Only if we have a reliable change of course.
> >>>>
> >>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't hurt.
> >>>>
> >>>> And is it reasonable to essentially 'disable'
> >>>> kubernetes/integration-tests by removing it from the kubernetes
> >>>> profile? it doesn't mean it goes away, just means it's run manually,
> >>>> not automatically. Is that actually how it's meant to be used anyway?
> >>>> in the short term? given the discussion around its requirements and
> >>>> minikube and all that?
> >>>>
> >>>> (Actually, this would also 'solve' the Scala 2.12 build problem too)
> >>>>
> >>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com> wrote:
> >>>> >
> >>>> > To be clear I'm currently +1 on this release, with much commentary.
> >>>> >
> >>>> > OK, the explanation for kubernetes tests makes sense. Yes I think
> we need to propagate the scala-2.12 build profile to make it work. Go for
> it, if you have a lead on what the change is.
> >>>> > This doesn't block the release as it's an issue for tests, and only
> affects 2.12. However if we had a clean fix for this and there were another
> RC, I'd include it.
> >>>> >
> >>>> > Dongjoon has a good point about the
> spark-kubernetes-integration-tests artifact. That doesn't sound like it
> should be published in this way, though, of course, we publish the test
> artifacts from every module already. This is only a bit odd in being a
> non-test artifact meant for testing. But it's special testing! So I also
> don't think that needs to block a release.
> >>>> >
> >>>> > This happens because the integration tests module is enabled with
> the 'kubernetes' profile too, and also this output is copied into the
> release tarball at kubernetes/integration-tests/tests. Do we need that in a
> binary release?
> >>>> >
> >>>> > If these integration tests are meant to be run ad hoc, manually,
> not part of a normal test cycle, then I think we can just not enable it
> with -Pkubernetes. If it is meant to run every time, then it sounds like we
> need a little extra work shown in recent PRs to make that easier, but then,
> this test code should just be the 'test' artifact parts of the kubernetes
> module, no?
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>>>
> >>
> >>
> >
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Sean Owen <sr...@gmail.com>.
Yes, I agree, and perhaps you are best placed to do that for 2.4.0 RC5 :)

On Thu, Oct 25, 2018 at 10:41 AM Stavros Kontopoulos
<st...@lightbend.com> wrote:
>
> I agree these tests should be manual for now but should be run somehow before a release to make sure things are working right?
>
> For the other issue: https://issues.apache.org/jira/browse/SPARK-25835 .
>
>
> On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <st...@lightbend.com> wrote:
>>
>> I will open a jira for the profile propagation issue and have a look to fix it.
>>
>> Stavros
>>
>> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <ee...@redhat.com> wrote:
>>>
>>>
>>> I would be comfortable making the integration testing manual for now.  A JIRA for ironing out how to make it reliable for automatic as a goal for 3.0 seems like a good idea.
>>>
>>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com> wrote:
>>>>
>>>> Forking this thread.
>>>>
>>>> Because we'll have another RC, we could possibly address these two
>>>> issues. Only if we have a reliable change of course.
>>>>
>>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't hurt.
>>>>
>>>> And is it reasonable to essentially 'disable'
>>>> kubernetes/integration-tests by removing it from the kubernetes
>>>> profile? it doesn't mean it goes away, just means it's run manually,
>>>> not automatically. Is that actually how it's meant to be used anyway?
>>>> in the short term? given the discussion around its requirements and
>>>> minikube and all that?
>>>>
>>>> (Actually, this would also 'solve' the Scala 2.12 build problem too)
>>>>
>>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com> wrote:
>>>> >
>>>> > To be clear I'm currently +1 on this release, with much commentary.
>>>> >
>>>> > OK, the explanation for kubernetes tests makes sense. Yes I think we need to propagate the scala-2.12 build profile to make it work. Go for it, if you have a lead on what the change is.
>>>> > This doesn't block the release as it's an issue for tests, and only affects 2.12. However if we had a clean fix for this and there were another RC, I'd include it.
>>>> >
>>>> > Dongjoon has a good point about the spark-kubernetes-integration-tests artifact. That doesn't sound like it should be published in this way, though, of course, we publish the test artifacts from every module already. This is only a bit odd in being a non-test artifact meant for testing. But it's special testing! So I also don't think that needs to block a release.
>>>> >
>>>> > This happens because the integration tests module is enabled with the 'kubernetes' profile too, and also this output is copied into the release tarball at kubernetes/integration-tests/tests. Do we need that in a binary release?
>>>> >
>>>> > If these integration tests are meant to be run ad hoc, manually, not part of a normal test cycle, then I think we can just not enable it with -Pkubernetes. If it is meant to run every time, then it sounds like we need a little extra work shown in recent PRs to make that easier, but then, this test code should just be the 'test' artifact parts of the kubernetes module, no?
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>
>>
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Stavros Kontopoulos <st...@lightbend.com>.
I agree these tests should be manual for now, but shouldn't they still be
run somehow before a release to make sure things are working right?

For the other issue: https://issues.apache.org/jira/browse/SPARK-25835 .


On Thu, Oct 25, 2018 at 6:29 PM, Stavros Kontopoulos <
stavros.kontopoulos@lightbend.com> wrote:

> I will open a jira for the profile propagation issue and have a look to
> fix it.
>
> Stavros
>
> On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <ee...@redhat.com>
> wrote:
>
>>
>> I would be comfortable making the integration testing manual for now.  A
>> JIRA for ironing out how to make it reliable for automatic as a goal for
>> 3.0 seems like a good idea.
>>
>> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com> wrote:
>>
>>> Forking this thread.
>>>
>>> Because we'll have another RC, we could possibly address these two
>>> issues. Only if we have a reliable change of course.
>>>
>>> Is it easy enough to propagate the -Pscala-2.12 profile? can't hurt.
>>>
>>> And is it reasonable to essentially 'disable'
>>> kubernetes/integration-tests by removing it from the kubernetes
>>> profile? it doesn't mean it goes away, just means it's run manually,
>>> not automatically. Is that actually how it's meant to be used anyway?
>>> in the short term? given the discussion around its requirements and
>>> minikube and all that?
>>>
>>> (Actually, this would also 'solve' the Scala 2.12 build problem too)
>>>
>>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com> wrote:
>>> >
>>> > To be clear I'm currently +1 on this release, with much commentary.
>>> >
>>> > OK, the explanation for kubernetes tests makes sense. Yes I think we
>>> need to propagate the scala-2.12 build profile to make it work. Go for it,
>>> if you have a lead on what the change is.
>>> > This doesn't block the release as it's an issue for tests, and only
>>> affects 2.12. However if we had a clean fix for this and there were another
>>> RC, I'd include it.
>>> >
>>> > Dongjoon has a good point about the spark-kubernetes-integration-tests
>>> artifact. That doesn't sound like it should be published in this way,
>>> though, of course, we publish the test artifacts from every module already.
>>> This is only a bit odd in being a non-test artifact meant for testing. But
>>> it's special testing! So I also don't think that needs to block a release.
>>> >
>>> > This happens because the integration tests module is enabled with the
>>> 'kubernetes' profile too, and also this output is copied into the release
>>> tarball at kubernetes/integration-tests/tests. Do we need that in a
>>> binary release?
>>> >
>>> > If these integration tests are meant to be run ad hoc, manually, not
>>> part of a normal test cycle, then I think we can just not enable it with
>>> -Pkubernetes. If it is meant to run every time, then it sounds like we need
>>> a little extra work shown in recent PRs to make that easier, but then, this
>>> test code should just be the 'test' artifact parts of the kubernetes
>>> module, no?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>
>>>
>
>

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Stavros Kontopoulos <st...@lightbend.com>.
I will open a jira for the profile propagation issue and have a look to fix
it.

Stavros

On Thu, Oct 25, 2018 at 6:16 PM, Erik Erlandson <ee...@redhat.com> wrote:

>
> I would be comfortable making the integration testing manual for now.  A
> JIRA for ironing out how to make it reliable for automatic as a goal for
> 3.0 seems like a good idea.
>
> On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com> wrote:
>
>> Forking this thread.
>>
>> Because we'll have another RC, we could possibly address these two
>> issues. Only if we have a reliable change of course.
>>
>> Is it easy enough to propagate the -Pscala-2.12 profile? can't hurt.
>>
>> And is it reasonable to essentially 'disable'
>> kubernetes/integration-tests by removing it from the kubernetes
>> profile? it doesn't mean it goes away, just means it's run manually,
>> not automatically. Is that actually how it's meant to be used anyway?
>> in the short term? given the discussion around its requirements and
>> minikube and all that?
>>
>> (Actually, this would also 'solve' the Scala 2.12 build problem too)
>>
>> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com> wrote:
>> >
>> > To be clear I'm currently +1 on this release, with much commentary.
>> >
>> > OK, the explanation for kubernetes tests makes sense. Yes I think we
>> need to propagate the scala-2.12 build profile to make it work. Go for it,
>> if you have a lead on what the change is.
>> > This doesn't block the release as it's an issue for tests, and only
>> affects 2.12. However if we had a clean fix for this and there were another
>> RC, I'd include it.
>> >
>> > Dongjoon has a good point about the spark-kubernetes-integration-tests
>> artifact. That doesn't sound like it should be published in this way,
>> though, of course, we publish the test artifacts from every module already.
>> This is only a bit odd in being a non-test artifact meant for testing. But
>> it's special testing! So I also don't think that needs to block a release.
>> >
>> > This happens because the integration tests module is enabled with the
>> 'kubernetes' profile too, and also this output is copied into the release
>> tarball at kubernetes/integration-tests/tests. Do we need that in a
>> binary release?
>> >
>> > If these integration tests are meant to be run ad hoc, manually, not
>> part of a normal test cycle, then I think we can just not enable it with
>> -Pkubernetes. If it is meant to run every time, then it sounds like we need
>> a little extra work shown in recent PRs to make that easier, but then, this
>> test code should just be the 'test' artifact parts of the kubernetes
>> module, no?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>

Re: What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Erik Erlandson <ee...@redhat.com>.
I would be comfortable making the integration testing manual for now. A
JIRA for ironing out how to make it reliable enough to run automatically,
as a goal for 3.0, seems like a good idea.

On Thu, Oct 25, 2018 at 8:11 AM Sean Owen <sr...@gmail.com> wrote:

> Forking this thread.
>
> Because we'll have another RC, we could possibly address these two
> issues. Only if we have a reliable change of course.
>
> Is it easy enough to propagate the -Pscala-2.12 profile? can't hurt.
>
> And is it reasonable to essentially 'disable'
> kubernetes/integration-tests by removing it from the kubernetes
> profile? it doesn't mean it goes away, just means it's run manually,
> not automatically. Is that actually how it's meant to be used anyway?
> in the short term? given the discussion around its requirements and
> minikube and all that?
>
> (Actually, this would also 'solve' the Scala 2.12 build problem too)
>
> On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com> wrote:
> >
> > To be clear I'm currently +1 on this release, with much commentary.
> >
> > OK, the explanation for kubernetes tests makes sense. Yes I think we
> need to propagate the scala-2.12 build profile to make it work. Go for it,
> if you have a lead on what the change is.
> > This doesn't block the release as it's an issue for tests, and only
> affects 2.12. However if we had a clean fix for this and there were another
> RC, I'd include it.
> >
> > Dongjoon has a good point about the spark-kubernetes-integration-tests
> artifact. That doesn't sound like it should be published in this way,
> though, of course, we publish the test artifacts from every module already.
> This is only a bit odd in being a non-test artifact meant for testing. But
> it's special testing! So I also don't think that needs to block a release.
> >
> > This happens because the integration tests module is enabled with the
> 'kubernetes' profile too, and also this output is copied into the release
> tarball at kubernetes/integration-tests/tests. Do we need that in a binary
> release?
> >
> > If these integration tests are meant to be run ad hoc, manually, not
> part of a normal test cycle, then I think we can just not enable it with
> -Pkubernetes. If it is meant to run every time, then it sounds like we need
> a little extra work shown in recent PRs to make that easier, but then, this
> test code should just be the 'test' artifact parts of the kubernetes
> module, no?
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

What if anything to fix about k8s for the 2.4.0 RC5?

Posted by Sean Owen <sr...@gmail.com>.
Forking this thread.

Because we'll have another RC, we could possibly address these two
issues. Only if we have a reliable change of course.

Is it easy enough to propagate the -Pscala-2.12 profile? Can't hurt.

And is it reasonable to essentially 'disable'
kubernetes/integration-tests by removing it from the kubernetes
profile? It doesn't mean it goes away, just that it's run manually,
not automatically. Is that actually how it's meant to be used anyway,
at least in the short term, given the discussion around its requirements
and minikube and all that?

(Actually, this would also 'solve' the Scala 2.12 build problem too)
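
For reference, "run manually" would look roughly like the flow already used
elsewhere in this thread; a sketch, with the tarball path, namespace and
service account purely illustrative:

# build a k8s-enabled distribution, then invoke the integration tests by hand
./dev/make-distribution.sh --name test --tgz -Phadoop-2.7 -Pkubernetes -Phive
cd resource-managers/kubernetes/integration-tests
./dev/dev-run-integration-tests.sh \
  --spark-tgz /path/to/spark-2.4.0-bin-test.tgz \
  --service-account default \
  --namespace default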

On Tue, Oct 23, 2018 at 2:45 PM Sean Owen <sr...@gmail.com> wrote:
>
> To be clear I'm currently +1 on this release, with much commentary.
>
> OK, the explanation for kubernetes tests makes sense. Yes I think we need to propagate the scala-2.12 build profile to make it work. Go for it, if you have a lead on what the change is.
> This doesn't block the release as it's an issue for tests, and only affects 2.12. However if we had a clean fix for this and there were another RC, I'd include it.
>
> Dongjoon has a good point about the spark-kubernetes-integration-tests artifact. That doesn't sound like it should be published in this way, though, of course, we publish the test artifacts from every module already. This is only a bit odd in being a non-test artifact meant for testing. But it's special testing! So I also don't think that needs to block a release.
>
> This happens because the integration tests module is enabled with the 'kubernetes' profile too, and also this output is copied into the release tarball at kubernetes/integration-tests/tests. Do we need that in a binary release?
>
> If these integration tests are meant to be run ad hoc, manually, not part of a normal test cycle, then I think we can just not enable it with -Pkubernetes. If it is meant to run every time, then it sounds like we need a little extra work shown in recent PRs to make that easier, but then, this test code should just be the 'test' artifact parts of the kubernetes module, no?

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Sean Owen <sr...@gmail.com>.
To be clear I'm currently +1 on this release, with much commentary.

OK, the explanation for kubernetes tests makes sense. Yes I think we need
to propagate the scala-2.12 build profile to make it work. Go for it, if
you have a lead on what the change is.
This doesn't block the release as it's an issue for tests, and only affects
2.12. However if we had a clean fix for this and there were another RC, I'd
include it.

Dongjoon has a good point about the spark-kubernetes-integration-tests
artifact. That doesn't sound like it should be published in this way,
though, of course, we publish the test artifacts from every module already.
This is only a bit odd in being a non-test artifact meant for testing. But
it's special testing! So I also don't think that needs to block a release.

This happens because the integration tests module is enabled with the
'kubernetes' profile too, and also this output is copied into the release
tarball at kubernetes/integration-tests/tests. Do we need that in a binary
release?

If these integration tests are meant to be run ad hoc, manually, not part
of a normal test cycle, then I think we can just not enable it with
-Pkubernetes. If it is meant to run every time, then it sounds like we need
a little extra work shown in recent PRs to make that easier, but then, this
test code should just be the 'test' artifact parts of the kubernetes
module, no?
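
For reference, a quick way to check which modules (and therefore which
artifacts) the 'kubernetes' profile pulls into the build; this is a sketch
relying only on stock Maven reactor output, nothing Spark-specific assumed
beyond build/mvn:

# list the reactor modules activated by the profile
./build/mvn -Pkubernetes validate | grep -A 30 'Reactor Build Order'

If spark-kubernetes-integration-tests shows up there, it will also end up
among the artifacts that a deploy of that reactor publishes.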


On Tue, Oct 23, 2018 at 1:46 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> BTW, for that integration suite, I saw the related artifacts in the RC4
> staging directory.
>
> Does Spark 2.4.0 need to start to release these `spark-kubernetes-integration-tests`
> artifacts?
>
>    -
>    https://repository.apache.org/content/repositories/orgapachespark-1290/org/apache/spark/spark-kubernetes-integration-tests_2.11/
>    -
>    https://repository.apache.org/content/repositories/orgapachespark-1290/org/apache/spark/spark-kubernetes-integration-tests_2.12/
>
> Historically, Spark released `spark-docker-integration-tests` at Spark
> 1.6.x era and stopped since Spark 2.0.0.
>
>    -
>    http://central.maven.org/maven2/org/apache/spark/spark-docker-integration-tests_2.10/
>    -
>    http://central.maven.org/maven2/org/apache/spark/spark-docker-integration-tests_2.11/
>
>
> Bests,
> Dongjoon.
>
> On Tue, Oct 23, 2018 at 11:43 AM Stavros Kontopoulos <
> stavros.kontopoulos@lightbend.com> wrote:
>
>> Sean,
>>
>> Ok makes sense, im using a cloned repo. I built with Scala 2.12 profile
>> using the related tag v2.4.0-rc4:
>>
>> ./dev/change-scala-version.sh 2.12
>> ./dev/make-distribution.sh  --name test --r --tgz -Pscala-2.12 -Psparkr
>> -Phadoop-2.7 -Pkubernetes -Phive
>> Pushed images to dockerhub (previous email) since I didnt use the
>> minikube daemon (default behavior).
>>
>> Then run tests successfully against minikube:
>>
>> TGZ_PATH=$(pwd)/spark-2.4.0-bin-test.gz
>> cd resource-managers/kubernetes/integration-tests
>>
>> ./dev/dev-run-integration-tests.sh --spark-tgz $TGZ_PATH
>> --service-account default --namespace default
>> --image-tag k8s-scala-12 --image-repo skonto
>>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Stavros Kontopoulos <st...@lightbend.com>.
+1 (non-binding). Ran the k8s tests with Scala 2.12. Also included the
RTestsSuite (mentioned by Ilan), although it is not part of the 2.4 RC tag:

[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
spark-kubernetes-integration-tests_2.12 ---
Discovery starting.
Discovery completed in 239 milliseconds.
Run starting. Expected test count is: 15
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Run SparkR on simple dataframe.R example
Run completed in 6 minutes, 32 seconds.
Total number of tests run: 15
Suites: completed 2, aborted 0
Tests: succeeded 15, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO]
------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.0 ..................... SUCCESS [
4.480 s]
[INFO] Spark Project Tags ................................. SUCCESS [
3.898 s]
[INFO] Spark Project Local DB ............................. SUCCESS [
2.773 s]
[INFO] Spark Project Networking ........................... SUCCESS [
5.063 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
2.651 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [
2.662 s]
[INFO] Spark Project Launcher ............................. SUCCESS [
5.103 s]
[INFO] Spark Project Core ................................. SUCCESS [
25.703 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.0 ... SUCCESS [06:51
min]
[INFO]
------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 07:44 min
[INFO] Finished at: 2018-10-23T19:09:41Z
[INFO]
------------------------------------------------------------------------

Stavros

On Tue, Oct 23, 2018 at 9:46 PM, Dongjoon Hyun <do...@gmail.com>
wrote:

> BTW, for that integration suite, I saw the related artifacts in the RC4
> staging directory.
>
> Does Spark 2.4.0 need to start to release these `spark-kubernetes
> -integration-tests` artifacts?
>
>    - https://repository.apache.org/content/repositories/
>    orgapachespark-1290/org/apache/spark/spark-kubernetes-
>    integration-tests_2.11/
>    <https://repository.apache.org/content/repositories/orgapachespark-1290/org/apache/spark/spark-kubernetes-integration-tests_2.11/>
>    - https://repository.apache.org/content/repositories/
>    orgapachespark-1290/org/apache/spark/spark-kubernetes-
>    integration-tests_2.12/
>    <https://repository.apache.org/content/repositories/orgapachespark-1290/org/apache/spark/spark-kubernetes-integration-tests_2.12/>
>
> Historically, Spark released `spark-docker-integration-tests` at Spark
> 1.6.x era and stopped since Spark 2.0.0.
>
>    - http://central.maven.org/maven2/org/apache/spark/spark-
>    docker-integration-tests_2.10/
>    - http://central.maven.org/maven2/org/apache/spark/spark-
>    docker-integration-tests_2.11/
>
>
> Bests,
> Dongjoon.
>
> On Tue, Oct 23, 2018 at 11:43 AM Stavros Kontopoulos <stavros.kontopoulos@
> lightbend.com> wrote:
>
>> Sean,
>>
>> Ok makes sense, im using a cloned repo. I built with Scala 2.12 profile
>> using the related tag v2.4.0-rc4:
>>
>> ./dev/change-scala-version.sh 2.12
>> ./dev/make-distribution.sh  --name test --r --tgz -Pscala-2.12 -Psparkr
>> -Phadoop-2.7 -Pkubernetes -Phive
>> Pushed images to dockerhub (previous email) since I didnt use the
>> minikube daemon (default behavior).
>>
>> Then run tests successfully against minikube:
>>
>> TGZ_PATH=$(pwd)/spark-2.4.0-bin-test.gz
>> cd resource-managers/kubernetes/integration-tests
>>
>> ./dev/dev-run-integration-tests.sh --spark-tgz $TGZ_PATH
>> --service-account default --namespace default --image-tag k8s-scala-12 --image-repo
>> skonto
>>
>>
>> [INFO]
>> [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
>> spark-kubernetes-integration-tests_2.12 ---
>> Discovery starting.
>> Discovery completed in 229 milliseconds.
>> Run starting. Expected test count is: 14
>> KubernetesSuite:
>> - Run SparkPi with no resources
>> - Run SparkPi with a very long application name.
>> - Use SparkLauncher.NO_RESOURCE
>> - Run SparkPi with a master URL without a scheme.
>> - Run SparkPi with an argument.
>> - Run SparkPi with custom labels, annotations, and environment variables.
>> - Run extraJVMOptions check on driver
>> - Run SparkRemoteFileTest using a remote data file
>> - Run SparkPi with env and mount secrets.
>> - Run PySpark on simple pi.py example
>> - Run PySpark with Python2 to test a pyfiles example
>> - Run PySpark with Python3 to test a pyfiles example
>> - Run PySpark with memory customization
>> - Run in client mode.
>> Run completed in 5 minutes, 24 seconds.
>> Total number of tests run: 14
>> Suites: completed 2, aborted 0
>> Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
>> All tests passed.
>> [INFO] ------------------------------------------------------------
>> ------------
>> [INFO] Reactor Summary:
>> [INFO]
>> [INFO] Spark Project Parent POM 2.4.0 ..................... SUCCESS [
>> 4.491 s]
>> [INFO] Spark Project Tags ................................. SUCCESS [
>> 3.833 s]
>> [INFO] Spark Project Local DB ............................. SUCCESS [
>> 2.680 s]
>> [INFO] Spark Project Networking ........................... SUCCESS [
>> 4.817 s]
>> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
>> 2.541 s]
>> [INFO] Spark Project Unsafe ............................... SUCCESS [
>> 2.795 s]
>> [INFO] Spark Project Launcher ............................. SUCCESS [
>> 5.593 s]
>> [INFO] Spark Project Core ................................. SUCCESS [
>> 25.160 s]
>> [INFO] Spark Project Kubernetes Integration Tests 2.4.0 ... SUCCESS
>> [05:30 min]
>> [INFO] ------------------------------------------------------------
>> ------------
>> [INFO] BUILD SUCCESS
>> [INFO] ------------------------------------------------------------
>> ------------
>> [INFO] Total time: 06:23 min
>> [INFO] Finished at: 2018-10-23T18:39:11Z
>> [INFO] ------------------------------------------------------------
>> ------------
>>
>>
>> but had to modify this line
>> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106> and
>> added -Pscala-2.12 , otherwise it fails (these tests inherit from the
>> parent pom but the profile is not propagated to the mvn command that
>> launches the tests, I can create a PR to fix that).
>>
>>
>> On Tue, Oct 23, 2018 at 7:44 PM, Hyukjin Kwon <gu...@gmail.com>
>> wrote:
>>
>>> https://github.com/apache/spark/pull/22514 sounds like a regression
>>> that affects Hive CTAS in write path (by not replacing them into Spark
>>> internal datasources; therefore performance regression).
>>> but yea I suspect if we should block the release by this.
>>>
>>> https://github.com/apache/spark/pull/22144 is just being discussed if I
>>> am not mistaken.
>>>
>>> Thanks.
>>>
>>> 2018년 10월 24일 (수) 오전 12:27, Xiao Li <ga...@gmail.com>님이 작성:
>>>
>>>> https://github.com/apache/spark/pull/22144 is also not a blocker of
>>>> Spark 2.4 release, as discussed in the PR.
>>>>
>>>> Thanks,
>>>>
>>>> Xiao
>>>>
>>>> Xiao Li <ga...@gmail.com> 于2018年10月23日周二 上午9:20写道:
>>>>
>>>>> Thanks for reporting this. https://github.com/apache/spark/pull/22514
>>>>> is not a blocker. We can fix it in the next minor release, if we are unable
>>>>> to make it in this release.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Xiao
>>>>>
>>>>> Sean Owen <sr...@gmail.com> 于2018年10月23日周二 上午9:14写道:
>>>>>
>>>>>> (I should add, I only observed this with the Scala 2.12 build. It all
>>>>>> seemed to work with 2.11. Therefore I'm not too worried about it. I
>>>>>> don't think it's a Scala version issue, but perhaps something looking
>>>>>> for a spark 2.11 tarball and not finding it. See
>>>>>> https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
>>>>>> a change that might address this kind of thing.)
>>>>>>
>>>>>> On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <sr...@gmail.com> wrote:
>>>>>> >
>>>>>> > Yeah, that's maybe the issue here. This is a source release, not a
>>>>>> git checkout, and it still needs to work in this context.
>>>>>> >
>>>>>> > I just added -Pkubernetes to my build and didn't do anything else.
>>>>>> I think the ideal is that a "mvn -P... -P... install" to work from a source
>>>>>> release; that's a good expectation and consistent with docs.
>>>>>> >
>>>>>> > Maybe these tests simply don't need to run with the normal suite of
>>>>>> tests, and can be considered tests run manually by developers running these
>>>>>> scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>>>>>> >
>>>>>> > I don't think this has to block the release even if so, just trying
>>>>>> to get to the bottom of it.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>
>>>>>>
>>
>>
>>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Dongjoon Hyun <do...@gmail.com>.
BTW, for that integration suite, I saw the related artifacts in the RC4
staging directory.

Does Spark 2.4.0 need to start releasing these
`spark-kubernetes-integration-tests` artifacts?

   -
   https://repository.apache.org/content/repositories/orgapachespark-1290/org/apache/spark/spark-kubernetes-integration-tests_2.11/
   -
   https://repository.apache.org/content/repositories/orgapachespark-1290/org/apache/spark/spark-kubernetes-integration-tests_2.12/
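
For a quick look at what is actually staged there, the repository directory
listing can be fetched directly; a rough sketch (the grep just trims the
HTML index):

curl -s https://repository.apache.org/content/repositories/orgapachespark-1290/org/apache/spark/spark-kubernetes-integration-tests_2.11/ | grep href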

Historically, Spark released `spark-docker-integration-tests` in the Spark
1.6.x era and stopped as of Spark 2.0.0.

   -
   http://central.maven.org/maven2/org/apache/spark/spark-docker-integration-tests_2.10/
   -
   http://central.maven.org/maven2/org/apache/spark/spark-docker-integration-tests_2.11/


Bests,
Dongjoon.

On Tue, Oct 23, 2018 at 11:43 AM Stavros Kontopoulos <
stavros.kontopoulos@lightbend.com> wrote:

> Sean,
>
> Ok makes sense, im using a cloned repo. I built with Scala 2.12 profile
> using the related tag v2.4.0-rc4:
>
> ./dev/change-scala-version.sh 2.12
> ./dev/make-distribution.sh  --name test --r --tgz -Pscala-2.12 -Psparkr
> -Phadoop-2.7 -Pkubernetes -Phive
> Pushed images to dockerhub (previous email) since I didnt use the minikube
> daemon (default behavior).
>
> Then run tests successfully against minikube:
>
> TGZ_PATH=$(pwd)/spark-2.4.0-bin-test.gz
> cd resource-managers/kubernetes/integration-tests
>
> ./dev/dev-run-integration-tests.sh --spark-tgz $TGZ_PATH --service-account
> default --namespace default --image-tag k8s-scala-12 --image-repo skonto
>
>
> [INFO]
> [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
> spark-kubernetes-integration-tests_2.12 ---
> Discovery starting.
> Discovery completed in 229 milliseconds.
> Run starting. Expected test count is: 14
> KubernetesSuite:
> - Run SparkPi with no resources
> - Run SparkPi with a very long application name.
> - Use SparkLauncher.NO_RESOURCE
> - Run SparkPi with a master URL without a scheme.
> - Run SparkPi with an argument.
> - Run SparkPi with custom labels, annotations, and environment variables.
> - Run extraJVMOptions check on driver
> - Run SparkRemoteFileTest using a remote data file
> - Run SparkPi with env and mount secrets.
> - Run PySpark on simple pi.py example
> - Run PySpark with Python2 to test a pyfiles example
> - Run PySpark with Python3 to test a pyfiles example
> - Run PySpark with memory customization
> - Run in client mode.
> Run completed in 5 minutes, 24 seconds.
> Total number of tests run: 14
> Suites: completed 2, aborted 0
> Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
> All tests passed.
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Spark Project Parent POM 2.4.0 ..................... SUCCESS [
> 4.491 s]
> [INFO] Spark Project Tags ................................. SUCCESS [
> 3.833 s]
> [INFO] Spark Project Local DB ............................. SUCCESS [
> 2.680 s]
> [INFO] Spark Project Networking ........................... SUCCESS [
> 4.817 s]
> [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
> 2.541 s]
> [INFO] Spark Project Unsafe ............................... SUCCESS [
> 2.795 s]
> [INFO] Spark Project Launcher ............................. SUCCESS [
> 5.593 s]
> [INFO] Spark Project Core ................................. SUCCESS [
> 25.160 s]
> [INFO] Spark Project Kubernetes Integration Tests 2.4.0 ... SUCCESS [05:30
> min]
> [INFO]
> ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO]
> ------------------------------------------------------------------------
> [INFO] Total time: 06:23 min
> [INFO] Finished at: 2018-10-23T18:39:11Z
> [INFO]
> ------------------------------------------------------------------------
>
>
> but had to modify this line
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106> and
> added -Pscala-2.12 , otherwise it fails (these tests inherit from the
> parent pom but the profile is not propagated to the mvn command that
> launches the tests, I can create a PR to fix that).
>
>
> On Tue, Oct 23, 2018 at 7:44 PM, Hyukjin Kwon <gu...@gmail.com> wrote:
>
>> https://github.com/apache/spark/pull/22514 sounds like a regression that
>> affects Hive CTAS in write path (by not replacing them into Spark internal
>> datasources; therefore performance regression).
>> but yea I suspect if we should block the release by this.
>>
>> https://github.com/apache/spark/pull/22144 is just being discussed if I
>> am not mistaken.
>>
>> Thanks.
>>
>> 2018년 10월 24일 (수) 오전 12:27, Xiao Li <ga...@gmail.com>님이 작성:
>>
>>> https://github.com/apache/spark/pull/22144 is also not a blocker of
>>> Spark 2.4 release, as discussed in the PR.
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>> Xiao Li <ga...@gmail.com> 于2018年10月23日周二 上午9:20写道:
>>>
>>>> Thanks for reporting this. https://github.com/apache/spark/pull/22514
>>>> is not a blocker. We can fix it in the next minor release, if we are unable
>>>> to make it in this release.
>>>>
>>>> Thanks,
>>>>
>>>> Xiao
>>>>
>>>> Sean Owen <sr...@gmail.com> 于2018年10月23日周二 上午9:14写道:
>>>>
>>>>> (I should add, I only observed this with the Scala 2.12 build. It all
>>>>> seemed to work with 2.11. Therefore I'm not too worried about it. I
>>>>> don't think it's a Scala version issue, but perhaps something looking
>>>>> for a spark 2.11 tarball and not finding it. See
>>>>> https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
>>>>> a change that might address this kind of thing.)
>>>>>
>>>>> On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <sr...@gmail.com> wrote:
>>>>> >
>>>>> > Yeah, that's maybe the issue here. This is a source release, not a
>>>>> git checkout, and it still needs to work in this context.
>>>>> >
>>>>> > I just added -Pkubernetes to my build and didn't do anything else. I
>>>>> think the ideal is that a "mvn -P... -P... install" to work from a source
>>>>> release; that's a good expectation and consistent with docs.
>>>>> >
>>>>> > Maybe these tests simply don't need to run with the normal suite of
>>>>> tests, and can be considered tests run manually by developers running these
>>>>> scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>>>>> >
>>>>> > I don't think this has to block the release even if so, just trying
>>>>> to get to the bottom of it.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>
>>>>>
>
>
> --
> Stavros Kontopoulos
>
> *Senior Software Engineer*
> *Lightbend, Inc.*
>
> *p:  +30 6977967274 <%2B1%20650%20678%200020>*
> *e: stavros.kontopoulos@lightbend.com* <da...@lightbend.com>
>
>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Stavros Kontopoulos <st...@lightbend.com>.
Sean,

Ok, makes sense; I'm using a cloned repo. I built with the Scala 2.12 profile
using the related tag v2.4.0-rc4:

./dev/change-scala-version.sh 2.12
./dev/make-distribution.sh  --name test --r --tgz -Pscala-2.12 -Psparkr
-Phadoop-2.7 -Pkubernetes -Phive
Pushed the images to Docker Hub (see previous email) since I didn't use the
minikube daemon (the default behavior).

Then I ran the tests successfully against minikube:

TGZ_PATH=$(pwd)/spark-2.4.0-bin-test.gz
cd resource-managers/kubernetes/integration-tests

./dev/dev-run-integration-tests.sh --spark-tgz $TGZ_PATH --service-account
default --namespace default --image-tag k8s-scala-12 --image-repo skonto


[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
spark-kubernetes-integration-tests_2.12 ---
Discovery starting.
Discovery completed in 229 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 5 minutes, 24 seconds.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO]
------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.0 ..................... SUCCESS [
4.491 s]
[INFO] Spark Project Tags ................................. SUCCESS [
3.833 s]
[INFO] Spark Project Local DB ............................. SUCCESS [
2.680 s]
[INFO] Spark Project Networking ........................... SUCCESS [
4.817 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [
2.541 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [
2.795 s]
[INFO] Spark Project Launcher ............................. SUCCESS [
5.593 s]
[INFO] Spark Project Core ................................. SUCCESS [
25.160 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.0 ... SUCCESS [05:30
min]
[INFO]
------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 06:23 min
[INFO] Finished at: 2018-10-23T18:39:11Z
[INFO]
------------------------------------------------------------------------


but I had to modify this line
<https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106>
and add -Pscala-2.12, otherwise it fails (these tests inherit from the
parent pom, but the profile is not propagated to the mvn command that
launches the tests; I can create a PR to fix that).
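
For illustration, the fix amounts to something like the following inside
dev-run-integration-tests.sh; this is only a sketch, and the variable names
below are illustrative rather than the script's actual ones:

# forward the Scala profile to the mvn command that launches the tests
SCALA_PROFILE="-Pscala-2.12"
# "properties" stands in for whatever argument list the script already builds
build/mvn integration-test $SCALA_PROFILE "${properties[@]}"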


On Tue, Oct 23, 2018 at 7:44 PM, Hyukjin Kwon <gu...@gmail.com> wrote:

> https://github.com/apache/spark/pull/22514 sounds like a regression that
> affects Hive CTAS in write path (by not replacing them into Spark internal
> datasources; therefore performance regression).
> but yea I suspect if we should block the release by this.
>
> https://github.com/apache/spark/pull/22144 is just being discussed if I
> am not mistaken.
>
> Thanks.
>
> 2018년 10월 24일 (수) 오전 12:27, Xiao Li <ga...@gmail.com>님이 작성:
>
>> https://github.com/apache/spark/pull/22144 is also not a blocker of
>> Spark 2.4 release, as discussed in the PR.
>>
>> Thanks,
>>
>> Xiao
>>
>> Xiao Li <ga...@gmail.com> 于2018年10月23日周二 上午9:20写道:
>>
>>> Thanks for reporting this. https://github.com/apache/spark/pull/22514
>>> is not a blocker. We can fix it in the next minor release, if we are unable
>>> to make it in this release.
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>> Sean Owen <sr...@gmail.com> 于2018年10月23日周二 上午9:14写道:
>>>
>>>> (I should add, I only observed this with the Scala 2.12 build. It all
>>>> seemed to work with 2.11. Therefore I'm not too worried about it. I
>>>> don't think it's a Scala version issue, but perhaps something looking
>>>> for a spark 2.11 tarball and not finding it. See
>>>> https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
>>>> a change that might address this kind of thing.)
>>>>
>>>> On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <sr...@gmail.com> wrote:
>>>> >
>>>> > Yeah, that's maybe the issue here. This is a source release, not a
>>>> git checkout, and it still needs to work in this context.
>>>> >
>>>> > I just added -Pkubernetes to my build and didn't do anything else. I
>>>> think the ideal is that a "mvn -P... -P... install" to work from a source
>>>> release; that's a good expectation and consistent with docs.
>>>> >
>>>> > Maybe these tests simply don't need to run with the normal suite of
>>>> tests, and can be considered tests run manually by developers running these
>>>> scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>>>> >
>>>> > I don't think this has to block the release even if so, just trying
>>>> to get to the bottom of it.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>
>>>>


-- 
Stavros Kontopoulos

*Senior Software Engineer*
*Lightbend, Inc.*

*p:  +30 6977967274*
*e: stavros.kontopoulos@lightbend.com*

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Hyukjin Kwon <gu...@gmail.com>.
https://github.com/apache/spark/pull/22514 sounds like a regression that
affects Hive CTAS in the write path (Hive CTAS is not replaced with Spark's
internal data sources, hence a performance regression). But I'm not sure we
should block the release on this.

https://github.com/apache/spark/pull/22144 is just being discussed if I am
not mistaken.

Thanks.

On Wed, Oct 24, 2018 at 12:27 AM, Xiao Li <ga...@gmail.com> wrote:

> https://github.com/apache/spark/pull/22144 is also not a blocker of Spark
> 2.4 release, as discussed in the PR.
>
> Thanks,
>
> Xiao
>
> Xiao Li <ga...@gmail.com> 于2018年10月23日周二 上午9:20写道:
>
>> Thanks for reporting this. https://github.com/apache/spark/pull/22514 is
>> not a blocker. We can fix it in the next minor release, if we are unable to
>> make it in this release.
>>
>> Thanks,
>>
>> Xiao
>>
>> Sean Owen <sr...@gmail.com> 于2018年10月23日周二 上午9:14写道:
>>
>>> (I should add, I only observed this with the Scala 2.12 build. It all
>>> seemed to work with 2.11. Therefore I'm not too worried about it. I
>>> don't think it's a Scala version issue, but perhaps something looking
>>> for a spark 2.11 tarball and not finding it. See
>>> https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
>>> a change that might address this kind of thing.)
>>>
>>> On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <sr...@gmail.com> wrote:
>>> >
>>> > Yeah, that's maybe the issue here. This is a source release, not a git
>>> checkout, and it still needs to work in this context.
>>> >
>>> > I just added -Pkubernetes to my build and didn't do anything else. I
>>> think the ideal is that a "mvn -P... -P... install" to work from a source
>>> release; that's a good expectation and consistent with docs.
>>> >
>>> > Maybe these tests simply don't need to run with the normal suite of
>>> tests, and can be considered tests run manually by developers running these
>>> scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>>> >
>>> > I don't think this has to block the release even if so, just trying to
>>> get to the bottom of it.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>
>>>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Xiao Li <ga...@gmail.com>.
https://github.com/apache/spark/pull/22144 is also not a blocker of Spark
2.4 release, as discussed in the PR.

Thanks,

Xiao

On Tue, Oct 23, 2018 at 9:20 AM, Xiao Li <ga...@gmail.com> wrote:

> Thanks for reporting this. https://github.com/apache/spark/pull/22514 is
> not a blocker. We can fix it in the next minor release, if we are unable to
> make it in this release.
>
> Thanks,
>
> Xiao
>
> Sean Owen <sr...@gmail.com> 于2018年10月23日周二 上午9:14写道:
>
>> (I should add, I only observed this with the Scala 2.12 build. It all
>> seemed to work with 2.11. Therefore I'm not too worried about it. I
>> don't think it's a Scala version issue, but perhaps something looking
>> for a spark 2.11 tarball and not finding it. See
>> https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
>> a change that might address this kind of thing.)
>>
>> On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <sr...@gmail.com> wrote:
>> >
>> > Yeah, that's maybe the issue here. This is a source release, not a git
>> checkout, and it still needs to work in this context.
>> >
>> > I just added -Pkubernetes to my build and didn't do anything else. I
>> think the ideal is that a "mvn -P... -P... install" to work from a source
>> release; that's a good expectation and consistent with docs.
>> >
>> > Maybe these tests simply don't need to run with the normal suite of
>> tests, and can be considered tests run manually by developers running these
>> scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>> >
>> > I don't think this has to block the release even if so, just trying to
>> get to the bottom of it.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Xiao Li <ga...@gmail.com>.
Thanks for reporting this. https://github.com/apache/spark/pull/22514 is
not a blocker. We can fix it in the next minor release, if we are unable to
make it in this release.

Thanks,

Xiao

On Tue, Oct 23, 2018 at 9:14 AM, Sean Owen <sr...@gmail.com> wrote:

> (I should add, I only observed this with the Scala 2.12 build. It all
> seemed to work with 2.11. Therefore I'm not too worried about it. I
> don't think it's a Scala version issue, but perhaps something looking
> for a spark 2.11 tarball and not finding it. See
> https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
> a change that might address this kind of thing.)
>
> On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <sr...@gmail.com> wrote:
> >
> > Yeah, that's maybe the issue here. This is a source release, not a git
> checkout, and it still needs to work in this context.
> >
> > I just added -Pkubernetes to my build and didn't do anything else. I
> think the ideal is that a "mvn -P... -P... install" to work from a source
> release; that's a good expectation and consistent with docs.
> >
> > Maybe these tests simply don't need to run with the normal suite of
> tests, and can be considered tests run manually by developers running these
> scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
> >
> > I don't think this has to block the release even if so, just trying to
> get to the bottom of it.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Ilan Filonenko <if...@cornell.edu>.
+1 (non-binding) in reference to all k8s tests for 2.11 (including the SparkR
tests, with R version 3.4.1)

[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 202 milliseconds.
Run starting. Expected test count is: 15
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run SparkR on simple dataframe.R example
- Run in client mode.
Run completed in 6 minutes, 47 seconds.
Total number of tests run: 15
Suites: completed 2, aborted 0
Tests: succeeded 15, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

Sean, in reference to your issues, the comment you linked is correct in
that you would need to build a Kubernetes distribution, i.e.

dev/make-distribution.sh --pip --r --tgz -Psparkr -Phadoop-2.7 -Pkubernetes

set up minikube, i.e.

minikube start --insecure-registry=localhost:5000 --cpus 6 --memory 6000

and then run the appropriate tests, i.e.

dev/dev-run-integration-tests.sh --spark-tgz .../spark-2.4.0-bin-2.7.3.tgz

The newest PR that you linked allows us to point to the local Kubernetes
cluster deployed via docker-for-mac as opposed to minikube which gives us
another way to test, but does not change the workflow of testing AFAICT.

On Tue, Oct 23, 2018 at 9:14 AM Sean Owen <sr...@gmail.com> wrote:

> (I should add, I only observed this with the Scala 2.12 build. It all
> seemed to work with 2.11. Therefore I'm not too worried about it. I
> don't think it's a Scala version issue, but perhaps something looking
> for a spark 2.11 tarball and not finding it. See
> https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
> a change that might address this kind of thing.)
>
> On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <sr...@gmail.com> wrote:
> >
> > Yeah, that's maybe the issue here. This is a source release, not a git
> checkout, and it still needs to work in this context.
> >
> > I just added -Pkubernetes to my build and didn't do anything else. I
> think the ideal is that a "mvn -P... -P... install" to work from a source
> release; that's a good expectation and consistent with docs.
> >
> > Maybe these tests simply don't need to run with the normal suite of
> tests, and can be considered tests run manually by developers running these
> scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
> >
> > I don't think this has to block the release even if so, just trying to
> get to the bottom of it.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Sean Owen <sr...@gmail.com>.
(I should add, I only observed this with the Scala 2.12 build. It all
seemed to work with 2.11. Therefore I'm not too worried about it. I
don't think it's a Scala version issue, but perhaps something looking
for a spark 2.11 tarball and not finding it. See
https://github.com/apache/spark/pull/22805#issuecomment-432304622 for
a change that might address this kind of thing.)

On Tue, Oct 23, 2018 at 11:05 AM Sean Owen <sr...@gmail.com> wrote:
>
> Yeah, that's maybe the issue here. This is a source release, not a git checkout, and it still needs to work in this context.
>
> I just added -Pkubernetes to my build and didn't do anything else. I think the ideal is that a "mvn -P... -P... install" to work from a source release; that's a good expectation and consistent with docs.
>
> Maybe these tests simply don't need to run with the normal suite of tests, and can be considered tests run manually by developers running these scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?
>
> I don't think this has to block the release even if so, just trying to get to the bottom of it.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Sean Owen <sr...@gmail.com>.
Yeah, that's maybe the issue here. This is a source release, not a git
checkout, and it still needs to work in this context.

I just added -Pkubernetes to my build and didn't do anything else. I think
the ideal is that a "mvn -P... -P... install" works from a source
release; that's a good expectation and consistent with the docs.

Maybe these tests simply don't need to run with the normal suite of tests,
and can be considered tests run manually by developers running these
scripts? Basically, KubernetesSuite shouldn't run in a normal mvn install?

I don't think this has to block the release even if so, just trying to get
to the bottom of it.
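
Concretely, the expectation is roughly the following, starting from the
unpacked source release rather than a git checkout (the tarball name and
profiles are just the ones used in this thread):

tar -xzf spark-2.4.0.tgz && cd spark-2.4.0
# build only: fine
./build/mvn -Phadoop-2.7 -Phive -Pkubernetes -DskipTests clean install
# also runs tests; this is where the KubernetesSuite setup currently falls over
./build/mvn -Phadoop-2.7 -Phive -Pkubernetes install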


On Tue, Oct 23, 2018 at 10:58 AM Stavros Kontopoulos <
stavros.kontopoulos@lightbend.com> wrote:

> Ok I missed the error one line above, before the distro error there is
> another one:
>
> fatal: not a git repository (or any of the parent directories): .git
>
>
> So that seems to come from here
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh#L19>.
> It seems that the test root is not set up correctly. It should be the top
> git dir from which you built Spark.
>
> Now regarding the distro thing. dev-run-integration-tests.sh should run
> from within the cloned project after the distro is built. The distro is
> required
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh#L61>
> , it should fail otherwise.
>
> Integration tests run the setup-integration-test-env.sh script. dev-run-integration-tests.sh
> calls mvn
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106> which
> in turn executes that setup script
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/pom.xml#L80>
> .
>
> How do you run the tests?
>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Wenchen Fan <cl...@gmail.com>.
I read through the contributing guide
<https://spark.apache.org/contributing.html>; it only mentions that data
correctness and data loss issues should be marked as blockers. AFAIK we
also mark regressions in the current release as blockers, but not
regressions from previous releases.

SPARK-24935 is indeed a bug, and a regression from Spark 2.2.0. We should
definitely fix it, but it doesn't seem like a blocker. BTW the root cause
of SPARK-24935 is unknown (at least I can't tell from the PR), so fixing
it might take a while.

On Tue, Oct 23, 2018 at 11:58 PM Stavros Kontopoulos <
stavros.kontopoulos@lightbend.com> wrote:

> Sean,
>
> I will try it against 2.12 shortly.
>
> You're saying someone would have to first build a k8s distro from source
>> too?
>
>
> Ok I missed the error one line above, before the distro error there is
> another one:
>
> fatal: not a git repository (or any of the parent directories): .git
>
>
> So that seems to come from here
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh#L19>.
> It seems that the test root is not set up correctly. It should be the top
> git dir from which you built Spark.
>
> Now regarding the distro thing. dev-run-integration-tests.sh should run
> from within the cloned project after the distro is built. The distro is
> required
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh#L61>
> , it should fail otherwise.
>
> Integration tests run the setup-integration-test-env.sh script. dev-run-integration-tests.sh
> calls mvn
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106> which
> in turn executes that setup script
> <https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/pom.xml#L80>
> .
>
> How do you run the tests?
>
> Stavros
>
> On Tue, Oct 23, 2018 at 3:01 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> No, because the docs are built into the release too and released to
>> the site too from the released artifact.
>> As a practical matter, I think these docs are not critical for
>> release, and can follow in a maintenance release. I'd retarget to
>> 2.4.1 or untarget.
>> I do know at times a release's docs have been edited after the fact,
>> but that's bad form. We'd not go change a class in the release after
>> it was released and call it the same release.
>>
>> I'd still like some confirmation that someone can build and pass tests
>> with -Pkubernetes, maybe? It actually all passed with the 2.11 build.
>> I don't think it's a 2.12 incompatibility, but rather than the K8S
>> tests maybe don't quite work with the 2.12 build artifact naming. Or
>> else something to do with my env.
>>
>> On Mon, Oct 22, 2018 at 9:08 PM Wenchen Fan <cl...@gmail.com> wrote:
>> >
>> > Regarding the doc tickets, I vaguely remember that we can merge doc PRs
>> after release and publish doc to spark website later. Can anyone confirm?
>> >
>> > On Tue, Oct 23, 2018 at 8:30 AM Sean Owen <sr...@gmail.com> wrote:
>> >>
>> >> This is what I got from a straightforward build of the source distro
>> >> here ... really, ideally, it builds as-is from source. You're saying
>> >> someone would have to first build a k8s distro from source too?
>> >> It's not a 'must' that this be automatic but nothing else fails out of
>> the box.
>> >> I feel like I might be misunderstanding the setup here.
>> >> On Mon, Oct 22, 2018 at 7:25 PM Stavros Kontopoulos
>> >> <st...@lightbend.com> wrote:
>>
>
>
>
> --
> Stavros Kontopoulos
>
> *Senior Software Engineer*
> *Lightbend, Inc.*
>
> *p:  +30 6977967274 <%2B1%20650%20678%200020>*
> *e: stavros.kontopoulos@lightbend.com* <da...@lightbend.com>
>
>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Stavros Kontopoulos <st...@lightbend.com>.
Sean,

I will try it against 2.12 shortly.

> You're saying someone would have to first build a k8s distro from source
> too?


Ok, I missed the error one line above; before the distro error there is
another one:

fatal: not a git repository (or any of the parent directories): .git


So that seems to come from here
<https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh#L19>.
It seems that the test root is not set up correctly. It should be the top
git dir from which you built Spark.
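
For context, that error is what git itself prints when asked for a
repository root from outside a checkout; the script's actual command may
differ, but the failing check is of this kind:

# run from an unpacked source release (no .git directory anywhere above)
git rev-parse --show-toplevel
# fatal: not a git repository (or any of the parent directories): .git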

Now regarding the distro thing. dev-run-integration-tests.sh should run
from within the cloned project after the distro is built. The distro is
required
<https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh#L61>
, it should fail otherwise.

Integration tests run the setup-integration-test-env.sh script.
dev-run-integration-tests.sh
calls mvn
<https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh#L106>
which
in turn executes that setup script
<https://github.com/apache/spark/blob/master/resource-managers/kubernetes/integration-tests/pom.xml#L80>
.

How do you run the tests?

Stavros

On Tue, Oct 23, 2018 at 3:01 PM, Sean Owen <sr...@gmail.com> wrote:

> No, because the docs are built into the release too and released to
> the site too from the released artifact.
> As a practical matter, I think these docs are not critical for
> release, and can follow in a maintenance release. I'd retarget to
> 2.4.1 or untarget.
> I do know at times a release's docs have been edited after the fact,
> but that's bad form. We'd not go change a class in the release after
> it was released and call it the same release.
>
> I'd still like some confirmation that someone can build and pass tests
> with -Pkubernetes, maybe? It actually all passed with the 2.11 build.
> I don't think it's a 2.12 incompatibility, but rather than the K8S
> tests maybe don't quite work with the 2.12 build artifact naming. Or
> else something to do with my env.
>
> On Mon, Oct 22, 2018 at 9:08 PM Wenchen Fan <cl...@gmail.com> wrote:
> >
> > Regarding the doc tickets, I vaguely remember that we can merge doc PRs
> after release and publish doc to spark website later. Can anyone confirm?
> >
> > On Tue, Oct 23, 2018 at 8:30 AM Sean Owen <sr...@gmail.com> wrote:
> >>
> >> This is what I got from a straightforward build of the source distro
> >> here ... really, ideally, it builds as-is from source. You're saying
> >> someone would have to first build a k8s distro from source too?
> >> It's not a 'must' that this be automatic but nothing else fails out of
> the box.
> >> I feel like I might be misunderstanding the setup here.
> >> On Mon, Oct 22, 2018 at 7:25 PM Stavros Kontopoulos
> >> <st...@lightbend.com> wrote:
>



-- 
Stavros Kontopoulos

*Senior Software Engineer*
*Lightbend, Inc.*

*p:  +30 6977967274*
*e: stavros.kontopoulos@lightbend.com*

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Sean Owen <sr...@gmail.com>.
No, because the docs are built into the release and published to the site
from the released artifact as well.
As a practical matter, I think these docs are not critical for
release, and can follow in a maintenance release. I'd retarget to
2.4.1 or untarget.
I do know at times a release's docs have been edited after the fact,
but that's bad form. We'd not go change a class in the release after
it was released and call it the same release.

I'd still like some confirmation that someone can build and pass tests
with -Pkubernetes, maybe? It actually all passed with the 2.11 build.
I don't think it's a 2.12 incompatibility, but rather that the K8S
tests maybe don't quite work with the 2.12 build artifact naming. Or
else something to do with my env.

On Mon, Oct 22, 2018 at 9:08 PM Wenchen Fan <cl...@gmail.com> wrote:
>
> Regarding the doc tickets, I vaguely remember that we can merge doc PRs after release and publish doc to spark website later. Can anyone confirm?
>
> On Tue, Oct 23, 2018 at 8:30 AM Sean Owen <sr...@gmail.com> wrote:
>>
>> This is what I got from a straightforward build of the source distro
>> here ... really, ideally, it builds as-is from source. You're saying
>> someone would have to first build a k8s distro from source too?
>> It's not a 'must' that this be automatic but nothing else fails out of the box.
>> I feel like I might be misunderstanding the setup here.
>> On Mon, Oct 22, 2018 at 7:25 PM Stavros Kontopoulos
>> <st...@lightbend.com> wrote:

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Wenchen Fan <cl...@gmail.com>.
Regarding the doc tickets, I vaguely remember that we can merge doc PRs
after the release and publish the docs to the Spark website later. Can anyone confirm?

On Tue, Oct 23, 2018 at 8:30 AM Sean Owen <sr...@gmail.com> wrote:

> This is what I got from a straightforward build of the source distro
> here ... really, ideally, it builds as-is from source. You're saying
> someone would have to first build a k8s distro from source too?
> It's not a 'must' that this be automatic but nothing else fails out of the
> box.
> I feel like I might be misunderstanding the setup here.
> On Mon, Oct 22, 2018 at 7:25 PM Stavros Kontopoulos
> <st...@lightbend.com> wrote:
> >
> >
> >>
> >> tar (child): Error is not recoverable: exiting now
> >> tar: Child returned status 2
> >> tar: Error is not recoverable: exiting now
> >> scripts/setup-integration-test-env.sh: line 85:
> >>
> /home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:
> >
> >
> > It seems you are missing the distro file... here is how I run it locally:
> >
> > DOCKER_USERNAME=...
> > SPARK_K8S_IMAGE_TAG=...
> >
> > ./dev/make-distribution.sh --name test --tgz -Phadoop-2.7 -Pkubernetes
> -Phive
> > tar -zxvf spark-2.4.0-SNAPSHOT-bin-test.tgz
> > cd spark-2.4.0-SNAPSHOT-bin-test
> > ./bin/docker-image-tool.sh -r $DOCKER_USERNAME -t $SPARK_K8S_IMAGE_TAG
> build
> > cd ..
> > TGZ_PATH=$(pwd)/spark-2.4.0-SNAPSHOT-bin-test.tgz
> > cd resource-managers/kubernetes/integration-tests
> > ./dev/dev-run-integration-tests.sh --image-tag $SPARK_K8S_IMAGE_TAG
> --spark-tgz $TGZ_PATH --image-repo $DOCKER_USERNAME
> >
> > Stavros
> >
> > On Tue, Oct 23, 2018 at 1:54 AM, Sean Owen <sr...@gmail.com> wrote:
> >>
> >> Provisionally looking good to me, but I had a few questions.
> >>
> >> We have these open for 2.4, but I presume they aren't actually going
> >> to be in 2.4 and should be untargeted:
> >>
> >> SPARK-25507 Update documents for the new features in 2.4 release
> >> SPARK-25179 Document the features that require Pyarrow 0.10
> >> SPARK-25783 Spark shell fails because of jline incompatibility
> >> SPARK-25347 Document image data source in doc site
> >> SPARK-25584 Document libsvm data source in doc site
> >> SPARK-25346 Document Spark builtin data sources
> >> SPARK-24464 Unit tests for MLlib's Instrumentation
> >> SPARK-23197 Flaky test:
> spark.streaming.ReceiverSuite."receiver_life_cycle"
> >> SPARK-22809 pyspark is sensitive to imports with dots
> >> SPARK-21030 extend hint syntax to support any expression for Python and
> R
> >>
> >> Comments in several of the doc issues suggest they are needed for 2.4
> >> though. How essential?
> >>
> >> (Brief digression: SPARK-21030 is an example of a pattern I see
> >> sometimes. Parent Epic A is targeted for version X. Children B and C
> >> are not. Epic A's description is basically "do X and Y". Is the parent
> >> helping? And now that Y is done, is there a point in tracking X with
> >> two JIRAs? can I just close the Epic?)
> >>
> >> I am not sure I've tried running K8S in my test runs before, but I get
> >> this on my Linux machine:
> >>
> >> [INFO] --- exec-maven-plugin:1.4.0:exec (setup-integration-test-env) @
> >> spark-kubernetes-integration-tests_2.12 ---
> >> fatal: not a git repository (or any of the parent directories): .git
> >> tar (child): --strip-components=1: Cannot open: No such file or
> directory
> >> tar (child): Error is not recoverable: exiting now
> >> tar: Child returned status 2
> >> tar: Error is not recoverable: exiting now
> >> scripts/setup-integration-test-env.sh: line 85:
> >>
> /home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:
> >> No such file or directory
> >> /home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests
> >> [INFO]
> >> [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
> >> spark-kubernetes-integration-tests_2.12 ---
> >> Discovery starting.
> >> Discovery completed in 289 milliseconds.
> >> Run starting. Expected test count is: 14
> >> KubernetesSuite:
> >> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED
> ***
> >>   java.lang.NullPointerException:
> >>   at
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:92)
> >>   at
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
> >>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
> >>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
> >>   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.org
> $scalatest$BeforeAndAfter$$super$run(KubernetesSuite.scala:39)
> >>   at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
> >>   at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
> >>   at
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.run(KubernetesSuite.scala:39)
> >>   at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1210)
> >>   at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1257)
> >>   ...
> >>
> >> Clearly it's expecting something about the env that isn't true, but I
> >> don't know if it's a problem with those expectations versus what is in
> >> the source release, or, just something to do with my env. This is with
> >> Scala 2.12.
> >>
> >>
> >>
> >> On Mon, Oct 22, 2018 at 12:42 PM Wenchen Fan <cl...@gmail.com>
> wrote:
> >> >
> >> > Please vote on releasing the following candidate as Apache Spark
> version 2.4.0.
> >> >
> >> > The vote is open until October 26 PST and passes if a majority +1 PMC
> votes are cast, with
> >> > a minimum of 3 +1 votes.
> >> >
> >> > [ ] +1 Release this package as Apache Spark 2.4.0
> >> > [ ] -1 Do not release this package because ...
> >> >
> >> > To learn more about Apache Spark, please see http://spark.apache.org/
> >> >
> >> > The tag to be voted on is v2.4.0-rc4 (commit
> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
> >> > https://github.com/apache/spark/tree/v2.4.0-rc4
> >> >
> >> > The release files, including signatures, digests, etc. can be found
> at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
> >> >
> >> > Signatures used for Spark RCs can be found in this file:
> >> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >> >
> >> > The staging repository for this release can be found at:
> >> >
> https://repository.apache.org/content/repositories/orgapachespark-1290
> >> >
> >> > The documentation corresponding to this release can be found at:
> >> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
> >> >
> >> > The list of bug fixes going into 2.4.0 can be found at the following
> URL:
> >> > https://issues.apache.org/jira/projects/SPARK/versions/12342385
> >> >
> >> > FAQ
> >> >
> >> > =========================
> >> > How can I help test this release?
> >> > =========================
> >> >
> >> > If you are a Spark user, you can help us test this release by taking
> >> > an existing Spark workload and running on this release candidate, then
> >> > reporting any regressions.
> >> >
> >> > If you're working in PySpark you can set up a virtual env and install
> >> > the current RC and see if anything important breaks, in the Java/Scala
> >> > you can add the staging repository to your projects resolvers and test
> >> > with the RC (make sure to clean up the artifact cache before/after so
> >> > you don't end up building with a out of date RC going forward).
> >> >
> >> > ===========================================
> >> > What should happen to JIRA tickets still targeting 2.4.0?
> >> > ===========================================
> >> >
> >> > The current list of open tickets targeted at 2.4.0 can be found at:
> >> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.0
> >> >
> >> > Committers should look at those and triage. Extremely important bug
> >> > fixes, documentation, and API tweaks that impact compatibility should
> >> > be worked on immediately. Everything else please retarget to an
> >> > appropriate release.
> >> >
> >> > ==================
> >> > But my bug isn't fixed?
> >> > ==================
> >> >
> >> > In order to make timely releases, we will typically not hold the
> >> > release unless the bug in question is a regression from the previous
> >> > release. That being said, if there is something which is a regression
> >> > that has not been correctly targeted please ping me or a committer to
> >> > help target the issue.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>
> >
> >
> >
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Sean Owen <sr...@gmail.com>.
This is what I got from a straightforward build of the source distro
here ... really, ideally, it builds as-is from source. You're saying
someone would have to first build a k8s distro from source too?
It's not a 'must' that this be automatic but nothing else fails out of the box.
I feel like I might be misunderstanding the setup here.
On Mon, Oct 22, 2018 at 7:25 PM Stavros Kontopoulos
<st...@lightbend.com> wrote:
>
>
>>
>> tar (child): Error is not recoverable: exiting now
>> tar: Child returned status 2
>> tar: Error is not recoverable: exiting now
>> scripts/setup-integration-test-env.sh: line 85:
>> /home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:
>
>
> It seems you are missing the distro file... here is how I run it locally:
>
> DOCKER_USERNAME=...
> SPARK_K8S_IMAGE_TAG=...
>
> ./dev/make-distribution.sh --name test --tgz -Phadoop-2.7 -Pkubernetes -Phive
> tar -zxvf spark-2.4.0-SNAPSHOT-bin-test.tgz
> cd spark-2.4.0-SNAPSHOT-bin-test
> ./bin/docker-image-tool.sh -r $DOCKER_USERNAME -t $SPARK_K8S_IMAGE_TAG build
> cd ..
> TGZ_PATH=$(pwd)/spark-2.4.0-SNAPSHOT-bin-test.tgz
> cd resource-managers/kubernetes/integration-tests
> ./dev/dev-run-integration-tests.sh --image-tag $SPARK_K8S_IMAGE_TAG --spark-tgz $TGZ_PATH --image-repo $DOCKER_USERNAME
>
> Stavros
>
> On Tue, Oct 23, 2018 at 1:54 AM, Sean Owen <sr...@gmail.com> wrote:
>>
>> Provisionally looking good to me, but I had a few questions.
>>
>> We have these open for 2.4, but I presume they aren't actually going
>> to be in 2.4 and should be untargeted:
>>
>> SPARK-25507 Update documents for the new features in 2.4 release
>> SPARK-25179 Document the features that require Pyarrow 0.10
>> SPARK-25783 Spark shell fails because of jline incompatibility
>> SPARK-25347 Document image data source in doc site
>> SPARK-25584 Document libsvm data source in doc site
>> SPARK-25346 Document Spark builtin data sources
>> SPARK-24464 Unit tests for MLlib's Instrumentation
>> SPARK-23197 Flaky test: spark.streaming.ReceiverSuite."receiver_life_cycle"
>> SPARK-22809 pyspark is sensitive to imports with dots
>> SPARK-21030 extend hint syntax to support any expression for Python and R
>>
>> Comments in several of the doc issues suggest they are needed for 2.4
>> though. How essential?
>>
>> (Brief digression: SPARK-21030 is an example of a pattern I see
>> sometimes. Parent Epic A is targeted for version X. Children B and C
>> are not. Epic A's description is basically "do X and Y". Is the parent
>> helping? And now that Y is done, is there a point in tracking X with
>> two JIRAs? can I just close the Epic?)
>>
>> I am not sure I've tried running K8S in my test runs before, but I get
>> this on my Linux machine:
>>
>> [INFO] --- exec-maven-plugin:1.4.0:exec (setup-integration-test-env) @
>> spark-kubernetes-integration-tests_2.12 ---
>> fatal: not a git repository (or any of the parent directories): .git
>> tar (child): --strip-components=1: Cannot open: No such file or directory
>> tar (child): Error is not recoverable: exiting now
>> tar: Child returned status 2
>> tar: Error is not recoverable: exiting now
>> scripts/setup-integration-test-env.sh: line 85:
>> /home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:
>> No such file or directory
>> /home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests
>> [INFO]
>> [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
>> spark-kubernetes-integration-tests_2.12 ---
>> Discovery starting.
>> Discovery completed in 289 milliseconds.
>> Run starting. Expected test count is: 14
>> KubernetesSuite:
>> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
>>   java.lang.NullPointerException:
>>   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:92)
>>   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>>   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.org$scalatest$BeforeAndAfter$$super$run(KubernetesSuite.scala:39)
>>   at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
>>   at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
>>   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.run(KubernetesSuite.scala:39)
>>   at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1210)
>>   at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1257)
>>   ...
>>
>> Clearly it's expecting something about the env that isn't true, but I
>> don't know if it's a problem with those expectations versus what is in
>> the source release, or, just something to do with my env. This is with
>> Scala 2.12.
>>
>>
>>
>> On Mon, Oct 22, 2018 at 12:42 PM Wenchen Fan <cl...@gmail.com> wrote:
>> >
>> > Please vote on releasing the following candidate as Apache Spark version 2.4.0.
>> >
>> > The vote is open until October 26 PST and passes if a majority +1 PMC votes are cast, with
>> > a minimum of 3 +1 votes.
>> >
>> > [ ] +1 Release this package as Apache Spark 2.4.0
>> > [ ] -1 Do not release this package because ...
>> >
>> > To learn more about Apache Spark, please see http://spark.apache.org/
>> >
>> > The tag to be voted on is v2.4.0-rc4 (commit e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>> > https://github.com/apache/spark/tree/v2.4.0-rc4
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>> >
>> > Signatures used for Spark RCs can be found in this file:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1290
>> >
>> > The documentation corresponding to this release can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>> >
>> > The list of bug fixes going into 2.4.0 can be found at the following URL:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12342385
>> >
>> > FAQ
>> >
>> > =========================
>> > How can I help test this release?
>> > =========================
>> >
>> > If you are a Spark user, you can help us test this release by taking
>> > an existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> > If you're working in PySpark you can set up a virtual env and install
>> > the current RC and see if anything important breaks, in the Java/Scala
>> > you can add the staging repository to your projects resolvers and test
>> > with the RC (make sure to clean up the artifact cache before/after so
>> > you don't end up building with a out of date RC going forward).
>> >
>> > ===========================================
>> > What should happen to JIRA tickets still targeting 2.4.0?
>> > ===========================================
>> >
>> > The current list of open tickets targeted at 2.4.0 can be found at:
>> > https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 2.4.0
>> >
>> > Committers should look at those and triage. Extremely important bug
>> > fixes, documentation, and API tweaks that impact compatibility should
>> > be worked on immediately. Everything else please retarget to an
>> > appropriate release.
>> >
>> > ==================
>> > But my bug isn't fixed?
>> > ==================
>> >
>> > In order to make timely releases, we will typically not hold the
>> > release unless the bug in question is a regression from the previous
>> > release. That being said, if there is something which is a regression
>> > that has not been correctly targeted please ping me or a committer to
>> > help target the issue.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Stavros Kontopoulos <st...@lightbend.com>.
> tar (child): Error is not recoverable: exiting now
> tar: Child returned status 2
> tar: Error is not recoverable: exiting now
> scripts/setup-integration-test-env.sh: line 85:
> /home/srowen/spark-2.4.0/resource-managers/kubernetes/
> integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:


It seems you are missing the distro file... here is how I run it locally:

DOCKER_USERNAME=...
SPARK_K8S_IMAGE_TAG=...

./dev/make-distribution.sh --name test --tgz -Phadoop-2.7 -Pkubernetes -Phive
tar -zxvf spark-2.4.0-SNAPSHOT-bin-test.tgz
cd spark-2.4.0-SNAPSHOT-bin-test
./bin/docker-image-tool.sh -r $DOCKER_USERNAME -t $SPARK_K8S_IMAGE_TAG build
cd ..
TGZ_PATH=$(pwd)/spark-2.4.0-SNAPSHOT-bin-test.tgz
cd resource-managers/kubernetes/integration-tests
./dev/dev-run-integration-tests.sh --image-tag $SPARK_K8S_IMAGE_TAG --spark-tgz $TGZ_PATH --image-repo $DOCKER_USERNAME

Stavros

On Tue, Oct 23, 2018 at 1:54 AM, Sean Owen <sr...@gmail.com> wrote:

> Provisionally looking good to me, but I had a few questions.
>
> We have these open for 2.4, but I presume they aren't actually going
> to be in 2.4 and should be untargeted:
>
> SPARK-25507 Update documents for the new features in 2.4 release
> SPARK-25179 Document the features that require Pyarrow 0.10
> SPARK-25783 Spark shell fails because of jline incompatibility
> SPARK-25347 Document image data source in doc site
> SPARK-25584 Document libsvm data source in doc site
> SPARK-25346 Document Spark builtin data sources
> SPARK-24464 Unit tests for MLlib's Instrumentation
> SPARK-23197 Flaky test: spark.streaming.ReceiverSuite.
> "receiver_life_cycle"
> SPARK-22809 pyspark is sensitive to imports with dots
> SPARK-21030 extend hint syntax to support any expression for Python and R
>
> Comments in several of the doc issues suggest they are needed for 2.4
> though. How essential?
>
> (Brief digression: SPARK-21030 is an example of a pattern I see
> sometimes. Parent Epic A is targeted for version X. Children B and C
> are not. Epic A's description is basically "do X and Y". Is the parent
> helping? And now that Y is done, is there a point in tracking X with
> two JIRAs? can I just close the Epic?)
>
> I am not sure I've tried running K8S in my test runs before, but I get
> this on my Linux machine:
>
> [INFO] --- exec-maven-plugin:1.4.0:exec (setup-integration-test-env) @
> spark-kubernetes-integration-tests_2.12 ---
> fatal: not a git repository (or any of the parent directories): .git
> tar (child): --strip-components=1: Cannot open: No such file or directory
> tar (child): Error is not recoverable: exiting now
> tar: Child returned status 2
> tar: Error is not recoverable: exiting now
> scripts/setup-integration-test-env.sh: line 85:
> /home/srowen/spark-2.4.0/resource-managers/kubernetes/
> integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:
> No such file or directory
> /home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests
> [INFO]
> [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
> spark-kubernetes-integration-tests_2.12 ---
> Discovery starting.
> Discovery completed in 289 milliseconds.
> Run starting. Expected test count is: 14
> KubernetesSuite:
> org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED
> ***
>   java.lang.NullPointerException:
>   at org.apache.spark.deploy.k8s.integrationtest.
> KubernetesSuite.beforeAll(KubernetesSuite.scala:92)
>   at org.scalatest.BeforeAndAfterAll.liftedTree1$
> 1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.org
> $scalatest$BeforeAndAfter$$super$run(KubernetesSuite.scala:39)
>   at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
>   at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
>   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.run(
> KubernetesSuite.scala:39)
>   at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1210)
>   at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1257)
>   ...
>
> Clearly it's expecting something about the env that isn't true, but I
> don't know if it's a problem with those expectations versus what is in
> the source release, or, just something to do with my env. This is with
> Scala 2.12.
>
>
>
> On Mon, Oct 22, 2018 at 12:42 PM Wenchen Fan <cl...@gmail.com> wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 2.4.0.
> >
> > The vote is open until October 26 PST and passes if a majority +1 PMC
> votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.4.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v2.4.0-rc4 (commit
> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
> > https://github.com/apache/spark/tree/v2.4.0-rc4
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1290
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
> >
> > The list of bug fixes going into 2.4.0 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12342385
> >
> > FAQ
> >
> > =========================
> > How can I help test this release?
> > =========================
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks, in the Java/Scala
> > you can add the staging repository to your projects resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with a out of date RC going forward).
> >
> > ===========================================
> > What should happen to JIRA tickets still targeting 2.4.0?
> > ===========================================
> >
> > The current list of open tickets targeted at 2.4.0 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.0
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==================
> > But my bug isn't fixed?
> > ==================
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Sean Owen <sr...@gmail.com>.
Provisionally looking good to me, but I had a few questions.

We have these open for 2.4, but I presume they aren't actually going
to be in 2.4 and should be untargeted:

SPARK-25507 Update documents for the new features in 2.4 release
SPARK-25179 Document the features that require Pyarrow 0.10
SPARK-25783 Spark shell fails because of jline incompatibility
SPARK-25347 Document image data source in doc site
SPARK-25584 Document libsvm data source in doc site
SPARK-25346 Document Spark builtin data sources
SPARK-24464 Unit tests for MLlib's Instrumentation
SPARK-23197 Flaky test: spark.streaming.ReceiverSuite."receiver_life_cycle"
SPARK-22809 pyspark is sensitive to imports with dots
SPARK-21030 extend hint syntax to support any expression for Python and R

Comments in several of the doc issues suggest they are needed for 2.4
though. How essential?

(Brief digression: SPARK-21030 is an example of a pattern I see
sometimes. Parent Epic A is targeted for version X. Children B and C
are not. Epic A's description is basically "do X and Y". Is the parent
helping? And now that Y is done, is there a point in tracking X with
two JIRAs? Can I just close the Epic?)

I am not sure I've tried running K8S in my test runs before, but I get
this on my Linux machine:

[INFO] --- exec-maven-plugin:1.4.0:exec (setup-integration-test-env) @
spark-kubernetes-integration-tests_2.12 ---
fatal: not a git repository (or any of the parent directories): .git
tar (child): --strip-components=1: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
scripts/setup-integration-test-env.sh: line 85:
/home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/bin/docker-image-tool.sh:
No such file or directory
/home/srowen/spark-2.4.0/resource-managers/kubernetes/integration-tests
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @
spark-kubernetes-integration-tests_2.12 ---
Discovery starting.
Discovery completed in 289 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
  java.lang.NullPointerException:
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:92)
  at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
  at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
  at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.org$scalatest$BeforeAndAfter$$super$run(KubernetesSuite.scala:39)
  at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:258)
  at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:256)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.run(KubernetesSuite.scala:39)
  at org.scalatest.Suite.callExecuteOnSuite$1(Suite.scala:1210)
  at org.scalatest.Suite.$anonfun$runNestedSuites$1(Suite.scala:1257)
  ...

Clearly it's expecting something about the env that isn't true, but I
don't know if it's a problem with those expectations versus what is in
the source release, or just something to do with my env. This is with
Scala 2.12.



On Mon, Oct 22, 2018 at 12:42 PM Wenchen Fan <cl...@gmail.com> wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 2.4.0.
>
> The vote is open until October 26 PST and passes if a majority +1 PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.0-rc4 (commit e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
> https://github.com/apache/spark/tree/v2.4.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1290
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>
> The list of bug fixes going into 2.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 2.4.0?
> ===========================================
>
> The current list of open tickets targeted at 2.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 2.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Hyukjin Kwon <gu...@gmail.com>.
I am sorry for raising this late. Out of curiosity, does anyone know why we
don't treat SPARK-24935 (https://github.com/apache/spark/pull/22144) as a
blocker?

It looks like it broke API compatibility, and an actual use case of an external
library (https://github.com/DataSketches/sketches-hive).
Also, it looks like sufficient discussion was made about its diagnosis (
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/sketches-user/GmH4-OlHP9g/MW-J7Hg4BwAJ
).


On Tue, Oct 23, 2018 at 12:03 PM, Darcy Shen <sa...@zoho.com.invalid> wrote:

>
>
> +1
>
>
> ---- On Tue, 23 Oct 2018 01:42:06 +0800 Wenchen Fan<cl...@gmail.com>
> wrote ----
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.4.0.
>
> The vote is open until October 26 PST and passes if a majority +1 PMC
> votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.0-rc4 (commit
> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
> https://github.com/apache/spark/tree/v2.4.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1290
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>
> The list of bug fixes going into 2.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 2.4.0?
> ===========================================
>
> The current list of open tickets targeted at 2.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Darcy Shen <sa...@zoho.com.INVALID>.
        

        
+1

---- On Tue, 23 Oct 2018 01:42:06 +0800 Wenchen Fan <cl...@gmail.com> wrote ----

[quoted [VOTE] email trimmed; identical to the copies quoted in full above]


Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Imran Rashid <ir...@cloudera.com.INVALID>.
+1
No blockers and our internal tests are all passing.

(I did file https://issues.apache.org/jira/browse/SPARK-25805, but this is
just a minor issue with a flaky test)

On Mon, Oct 22, 2018 at 12:42 PM Wenchen Fan <cl...@gmail.com> wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.4.0.
>
> The vote is open until October 26 PST and passes if a majority +1 PMC
> votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.0-rc4 (commit
> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
> https://github.com/apache/spark/tree/v2.4.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1290
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>
> The list of bug fixes going into 2.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 2.4.0?
> ===========================================
>
> The current list of open tickets targeted at 2.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Wenchen Fan <cl...@gmail.com>.
Personally, I don't think it matters. Users can build arbitrary
expressions/plans themselves with the internal API, and we never guarantee
the result.

Removing these functions from the function registry is a small patch and
easy to review, and to me it's better than a 1000+ LOC patch that removes
the whole thing.

Again, I don't have a strong opinion here. I'm OK with removing the entire
thing if a PR is ready and well reviewed.

On Thu, Oct 25, 2018 at 11:00 PM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you for the decision, All.
>
> As of now, to unblock this, it seems that we are trying to remove them
> from the function registry.
>
> https://github.com/apache/spark/pull/22821
>
> One problem here is that users can recover those functions like this
> simply.
>
> scala> spark.sessionState.functionRegistry.createOrReplaceTempFunction("map_filter", x => org.apache.spark.sql.catalyst.expressions.MapFilter(x(0),x(1)))
>
>
> Technically, the PR looks like a compromised way to unblock the release
> and to allow some users that feature completely.
>
> At first glance, I thought this is a workaround to ignore the discussion
> context. But, that sounds like one of the practical ways for Apache Spark.
> (We had Spark 2.0 Tech. Preview before.)
>
> I want to finalize the decision on `map_filter` (and related three
> functions) issue. Are we good to go with
> https://github.com/apache/spark/pull/22821?
>
> Bests,
> Dongjoon.
>
> PS. Also, there is a PR to completely remove them, too.
>        https://github.com/cloud-fan/spark/pull/11
>
>
> On Wed, Oct 24, 2018 at 10:14 PM Xiao Li <li...@databricks.com> wrote:
>
>> @Dongjoon Hyun <do...@gmail.com>  Thanks! This is a blocking
>> ticket. It returns a wrong result due to our undefined behavior. I agree we
>> should revert the newly added map-oriented functions. In 3.0 release, we
>> need to define the behavior of duplicate keys in the data type MAP and fix
>> all the related issues that are confusing to our end users.
>>
>> Thanks,
>>
>> Xiao
>>
>> On Wed, Oct 24, 2018 at 9:54 PM Wenchen Fan <cl...@gmail.com> wrote:
>>
>>> Ah now I see the problem. `map_filter` has a very weird semantic that is
>>> neither "earlier entry wins" or "latter entry wins".
>>>
>>> I've opened https://github.com/apache/spark/pull/22821 , to remove
>>> these newly added map-related functions from FunctionRegistry(for 2.4.0),
>>> so that they are invisible to end-users, and the weird behavior of Spark
>>> map type with duplicated keys are not escalated. We should fix it ASAP in
>>> the master branch.
>>>
>>> If others are OK with it, I'll start a new RC after that PR is merged.
>>>
>>> Thanks,
>>> Wenchen
>>>
>>> On Thu, Oct 25, 2018 at 10:32 AM Dongjoon Hyun <do...@gmail.com>
>>> wrote:
>>>
>>>> For the first question, it's `bin/spark-sql` result. I didn't check
>>>> STS, but it will return the same with `bin/spark-sql`.
>>>>
>>>> > I think map_filter is implemented correctly. map(1,2,1,3) is
>>>> actually map(1,2) according to the "earlier entry wins" semantic. I
>>>> don't think this will change in 2.4.1.
>>>>
>>>> For the second one, `map_filter` issue is not about `earlier entry
>>>> wins` stuff. Please see the following example.
>>>>
>>>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT
>>>> map_concat(map(1,2), map(1,3)) m);
>>>> {1:3} {1:2}
>>>>
>>>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=3) c FROM (SELECT
>>>> map_concat(map(1,2), map(1,3)) m);
>>>> {1:3} {1:3}
>>>>
>>>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=4) c FROM (SELECT
>>>> map_concat(map(1,2), map(1,3)) m);
>>>> {1:3} {}
>>>>
>>>> In other words, `map_filter` works like `push-downed filter` to the map
>>>> in terms of the output result
>>>> while users assumed that `map_filter` works on top of the result of
>>>> `m`.
>>>>
>>>> This is a function semantic issue.
>>>>
>>>>
>>>> On Wed, Oct 24, 2018 at 6:06 PM Wenchen Fan <cl...@gmail.com>
>>>> wrote:
>>>>
>>>>> > spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>>>>> > {1:3}
>>>>>
>>>>> Are you running in the thrift-server? Then maybe this is caused by the
>>>>> bug in `Dateset.collect` as I mentioned above.
>>>>>
>>>>> I think map_filter is implemented correctly. map(1,2,1,3) is actually
>>>>> map(1,2) according to the "earlier entry wins" semantic. I don't
>>>>> think this will change in 2.4.1.
>>>>>
>>>>> On Thu, Oct 25, 2018 at 8:56 AM Dongjoon Hyun <do...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you for the follow-ups.
>>>>>>
>>>>>> Then, Spark 2.4.1 will return `{1:2}` differently from the followings
>>>>>> (including Spark/Scala) in the end?
>>>>>>
>>>>>> I hoped to fix the `map_filter`, but now Spark looks inconsistent in
>>>>>> many ways.
>>>>>>
>>>>>> scala> sql("select map(1,2,1,3)").show // Spark 2.2.2
>>>>>> +---------------+
>>>>>> |map(1, 2, 1, 3)|
>>>>>> +---------------+
>>>>>> |    Map(1 -> 3)|
>>>>>> +---------------+
>>>>>>
>>>>>>
>>>>>> spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>>>>>> {1:3}
>>>>>>
>>>>>>
>>>>>> hive> select map(1,2,1,3);  // Hive 1.2.2
>>>>>> OK
>>>>>> {1:3}
>>>>>>
>>>>>>
>>>>>> presto> SELECT map_concat(map(array[1],array[2]),
>>>>>> map(array[1],array[3])); // Presto 0.212
>>>>>>  _col0
>>>>>> -------
>>>>>>  {1=3}
>>>>>>
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 24, 2018 at 5:17 PM Wenchen Fan <cl...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Dongjoon,
>>>>>>>
>>>>>>> Thanks for reporting it! This is indeed a bug that needs to be fixed.
>>>>>>>
>>>>>>> The problem is not about the function `map_filter`, but about how
>>>>>>> the map type values are created in Spark, when there are duplicated keys.
>>>>>>>
>>>>>>> In programming languages like Java/Scala, when creating map, the
>>>>>>> later entry wins. e.g. in scala
>>>>>>> scala> Map(1 -> 2, 1 -> 3)
>>>>>>> res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
>>>>>>>
>>>>>>> scala> Map(1 -> 2, 1 -> 3).get(1)
>>>>>>> res1: Option[Int] = Some(3)
>>>>>>>
>>>>>>> However, in Spark, the earlier entry wins
>>>>>>> scala> sql("SELECT map(1,2,1,3)[1]").show
>>>>>>> +------------------+
>>>>>>> |map(1, 2, 1, 3)[1]|
>>>>>>> +------------------+
>>>>>>> |                 2|
>>>>>>> +------------------+
>>>>>>>
>>>>>>> So for Spark users, Map(1 -> 2, 1 -> 3) should be equal to Map(1 ->
>>>>>>> 2).
>>>>>>>
>>>>>>> But there are several bugs in Spark
>>>>>>>
>>>>>>> scala> sql("SELECT map(1,2,1,3)").show
>>>>>>> +----------------+
>>>>>>> | map(1, 2, 1, 3)|
>>>>>>> +----------------+
>>>>>>> |[1 -> 2, 1 -> 3]|
>>>>>>> +----------------+
>>>>>>> The displayed string of map values has a bug and we should
>>>>>>> deduplicate the entries, This is tracked by SPARK-25824.
>>>>>>>
>>>>>>>
>>>>>>> scala> sql("CREATE TABLE t AS SELECT map(1,2,1,3) as map")
>>>>>>> res11: org.apache.spark.sql.DataFrame = []
>>>>>>>
>>>>>>> scala> sql("select * from t").show
>>>>>>> +--------+
>>>>>>> |     map|
>>>>>>> +--------+
>>>>>>> |[1 -> 3]|
>>>>>>> +--------+
>>>>>>> The Hive map value convert has a bug, we should respect the "earlier
>>>>>>> entry wins" semantic. No ticket yet.
>>>>>>>
>>>>>>>
>>>>>>> scala> sql("select map(1,2,1,3)").collect
>>>>>>> res14: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
>>>>>>> Same bug happens at `collect`. No ticket yet.
>>>>>>>
>>>>>>> I'll create tickets and list all of them as known issues in 2.4.0.
>>>>>>>
>>>>>>> It's arguable if the "earlier entry wins" semantic is reasonable.
>>>>>>> Fixing it is a behavior change and we can only apply it to master branch.
>>>>>>>
>>>>>>> Going back to https://issues.apache.org/jira/browse/SPARK-25823,
>>>>>>> it's just a symptom of the hive map value converter bug. I think it's a
>>>>>>> non-blocker.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Wenchen
>>>>>>>
>>>>>>> On Thu, Oct 25, 2018 at 5:31 AM Dongjoon Hyun <
>>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi, All.
>>>>>>>>
>>>>>>>> -0 due to the following issue. From Spark 2.4.0, users may get an
>>>>>>>> incorrect result when they use new `map_fitler` with `map_concat` functions.
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/SPARK-25823
>>>>>>>>
>>>>>>>> SPARK-25823 is only aiming to fix the data correctness issue from
>>>>>>>> `map_filter`.
>>>>>>>>
>>>>>>>> PMC members are able to lower the priority. Always, I respect PMC's
>>>>>>>> decision.
>>>>>>>>
>>>>>>>> I'm sending this email to draw more attention to this bug and to
>>>>>>>> give some warning on the new feature's limitation to the community.
>>>>>>>>
>>>>>>>> Bests,
>>>>>>>> Dongjoon.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>> version 2.4.0.
>>>>>>>>>
>>>>>>>>> The vote is open until October 26 PST and passes if a majority +1
>>>>>>>>> PMC votes are cast, with
>>>>>>>>> a minimum of 3 +1 votes.
>>>>>>>>>
>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>
>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>> http://spark.apache.org/
>>>>>>>>>
>>>>>>>>> The tag to be voted on is v2.4.0-rc4 (commit
>>>>>>>>> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>>>>>>>>> https://github.com/apache/spark/tree/v2.4.0-rc4
>>>>>>>>>
>>>>>>>>> The release files, including signatures, digests, etc. can be
>>>>>>>>> found at:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>>>>>>>>>
>>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>>
>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>
>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1290
>>>>>>>>>
>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>>>>>>>>>
>>>>>>>>> The list of bug fixes going into 2.4.0 can be found at the
>>>>>>>>> following URL:
>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>>>>>>>
>>>>>>>>> FAQ
>>>>>>>>>
>>>>>>>>> =========================
>>>>>>>>> How can I help test this release?
>>>>>>>>> =========================
>>>>>>>>>
>>>>>>>>> If you are a Spark user, you can help us test this release by
>>>>>>>>> taking
>>>>>>>>> an existing Spark workload and running on this release candidate,
>>>>>>>>> then
>>>>>>>>> reporting any regressions.
>>>>>>>>>
>>>>>>>>> If you're working in PySpark you can set up a virtual env and
>>>>>>>>> install
>>>>>>>>> the current RC and see if anything important breaks, in the
>>>>>>>>> Java/Scala
>>>>>>>>> you can add the staging repository to your projects resolvers and
>>>>>>>>> test
>>>>>>>>> with the RC (make sure to clean up the artifact cache before/after
>>>>>>>>> so
>>>>>>>>> you don't end up building with a out of date RC going forward).
>>>>>>>>>
>>>>>>>>> ===========================================
>>>>>>>>> What should happen to JIRA tickets still targeting 2.4.0?
>>>>>>>>> ===========================================
>>>>>>>>>
>>>>>>>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>>>> "Target Version/s" = 2.4.0
>>>>>>>>>
>>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>>> fixes, documentation, and API tweaks that impact compatibility
>>>>>>>>> should
>>>>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>>>>> appropriate release.
>>>>>>>>>
>>>>>>>>> ==================
>>>>>>>>> But my bug isn't fixed?
>>>>>>>>> ==================
>>>>>>>>>
>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>>> release unless the bug in question is a regression from the
>>>>>>>>> previous
>>>>>>>>> release. That being said, if there is something which is a
>>>>>>>>> regression
>>>>>>>>> that has not been correctly targeted please ping me or a committer
>>>>>>>>> to
>>>>>>>>> help target the issue.
>>>>>>>>>
>>>>>>>>
>>
>> --
>> [image: Spark+AI Summit North America 2019]
>> <http://t.sidekickopen24.com/s1t/c/5/f18dQhb0S7lM8dDMPbW2n0x6l2B9nMJN7t5X-FfhMynN2z8MDjQsyTKW56dzQQ1-_gV6102?t=https%3A%2F%2Fdatabricks.com%2Fsparkaisummit%2Fnorth-america&si=undefined&pi=406b8c9a-b648-4923-9ed1-9a51ffe213fa>
>>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you for the decision, All.

As of now, to unblock this, it seems that we are trying to remove them from
the function registry.

https://github.com/apache/spark/pull/22821

One problem here is that users can simply recover those functions, like this:

scala> spark.sessionState.functionRegistry.createOrReplaceTempFunction("map_filter", x => org.apache.spark.sql.catalyst.expressions.MapFilter(x(0),x(1)))


Technically, the PR looks like a compromise: it unblocks the release while
still allowing some users to use that feature completely.

At first glance, I thought this was a workaround that ignores the discussion
context. But that sounds like one of the practical ways forward for Apache Spark.
(We had the Spark 2.0 Tech. Preview before.)

I want to finalize the decision on the `map_filter` (and the three related
functions) issue. Are we good to go with
https://github.com/apache/spark/pull/22821?

Bests,
Dongjoon.

PS. Also, there is a PR to completely remove them, too.
       https://github.com/cloud-fan/spark/pull/11


On Wed, Oct 24, 2018 at 10:14 PM Xiao Li <li...@databricks.com> wrote:

> @Dongjoon Hyun <do...@gmail.com>  Thanks! This is a blocking
> ticket. It returns a wrong result due to our undefined behavior. I agree we
> should revert the newly added map-oriented functions. In 3.0 release, we
> need to define the behavior of duplicate keys in the data type MAP and fix
> all the related issues that are confusing to our end users.
>
> Thanks,
>
> Xiao
>
> On Wed, Oct 24, 2018 at 9:54 PM Wenchen Fan <cl...@gmail.com> wrote:
>
>> Ah now I see the problem. `map_filter` has a very weird semantic that is
>> neither "earlier entry wins" or "latter entry wins".
>>
>> I've opened https://github.com/apache/spark/pull/22821 , to remove these
>> newly added map-related functions from FunctionRegistry (for 2.4.0), so that
>> they are invisible to end-users and the weird behavior of the Spark map type
>> with duplicated keys is not escalated. We should fix it ASAP in the master
>> branch.
>>
>> If others are OK with it, I'll start a new RC after that PR is merged.
>>
>> Thanks,
>> Wenchen
>>
>> On Thu, Oct 25, 2018 at 10:32 AM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>>
>>> For the first question, it's `bin/spark-sql` result. I didn't check STS,
>>> but it will return the same with `bin/spark-sql`.
>>>
>>> > I think map_filter is implemented correctly. map(1,2,1,3) is actually
>>> map(1,2) according to the "earlier entry wins" semantic. I don't think
>>> this will change in 2.4.1.
>>>
>>> For the second one, `map_filter` issue is not about `earlier entry wins`
>>> stuff. Please see the following example.
>>>
>>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT
>>> map_concat(map(1,2), map(1,3)) m);
>>> {1:3} {1:2}
>>>
>>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=3) c FROM (SELECT
>>> map_concat(map(1,2), map(1,3)) m);
>>> {1:3} {1:3}
>>>
>>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=4) c FROM (SELECT
>>> map_concat(map(1,2), map(1,3)) m);
>>> {1:3} {}
>>>
>>> In other words, `map_filter` works like a `pushed-down filter` on the map
>>> in terms of the output result
>>> while users assumed that `map_filter` works on top of the result of `m`.
>>>
>>> This is a function semantic issue.
>>>
>>>
>>> On Wed, Oct 24, 2018 at 6:06 PM Wenchen Fan <cl...@gmail.com> wrote:
>>>
>>>> > spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>>>> > {1:3}
>>>>
>>>> Are you running in the thrift-server? Then maybe this is caused by the
>>>> bug in `Dataset.collect` as I mentioned above.
>>>>
>>>> I think map_filter is implemented correctly. map(1,2,1,3) is actually
>>>> map(1,2) according to the "earlier entry wins" semantic. I don't think
>>>> this will change in 2.4.1.
>>>>
>>>> On Thu, Oct 25, 2018 at 8:56 AM Dongjoon Hyun <do...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you for the follow-ups.
>>>>>
>>>>> Then, Spark 2.4.1 will return `{1:2}` differently from the following
>>>>> (including Spark/Scala) in the end?
>>>>>
>>>>> I hoped to fix the `map_filter`, but now Spark looks inconsistent in
>>>>> many ways.
>>>>>
>>>>> scala> sql("select map(1,2,1,3)").show // Spark 2.2.2
>>>>> +---------------+
>>>>> |map(1, 2, 1, 3)|
>>>>> +---------------+
>>>>> |    Map(1 -> 3)|
>>>>> +---------------+
>>>>>
>>>>>
>>>>> spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>>>>> {1:3}
>>>>>
>>>>>
>>>>> hive> select map(1,2,1,3);  // Hive 1.2.2
>>>>> OK
>>>>> {1:3}
>>>>>
>>>>>
>>>>> presto> SELECT map_concat(map(array[1],array[2]),
>>>>> map(array[1],array[3])); // Presto 0.212
>>>>>  _col0
>>>>> -------
>>>>>  {1=3}
>>>>>
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>> On Wed, Oct 24, 2018 at 5:17 PM Wenchen Fan <cl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Dongjoon,
>>>>>>
>>>>>> Thanks for reporting it! This is indeed a bug that needs to be fixed.
>>>>>>
>>>>>> The problem is not about the function `map_filter`, but about how the
>>>>>> map type values are created in Spark, when there are duplicated keys.
>>>>>>
>>>>>> In programming languages like Java/Scala, when creating a map, the
>>>>>> later entry wins, e.g. in Scala:
>>>>>> scala> Map(1 -> 2, 1 -> 3)
>>>>>> res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
>>>>>>
>>>>>> scala> Map(1 -> 2, 1 -> 3).get(1)
>>>>>> res1: Option[Int] = Some(3)
>>>>>>
>>>>>> However, in Spark, the earlier entry wins
>>>>>> scala> sql("SELECT map(1,2,1,3)[1]").show
>>>>>> +------------------+
>>>>>> |map(1, 2, 1, 3)[1]|
>>>>>> +------------------+
>>>>>> |                 2|
>>>>>> +------------------+
>>>>>>
>>>>>> So for Spark users, Map(1 -> 2, 1 -> 3) should be equal to Map(1 ->
>>>>>> 2).
>>>>>>
>>>>>> But there are several bugs in Spark
>>>>>>
>>>>>> scala> sql("SELECT map(1,2,1,3)").show
>>>>>> +----------------+
>>>>>> | map(1, 2, 1, 3)|
>>>>>> +----------------+
>>>>>> |[1 -> 2, 1 -> 3]|
>>>>>> +----------------+
>>>>>> The displayed string of map values has a bug and we should
>>>>>> deduplicate the entries. This is tracked by SPARK-25824.
>>>>>>
>>>>>>
>>>>>> scala> sql("CREATE TABLE t AS SELECT map(1,2,1,3) as map")
>>>>>> res11: org.apache.spark.sql.DataFrame = []
>>>>>>
>>>>>> scala> sql("select * from t").show
>>>>>> +--------+
>>>>>> |     map|
>>>>>> +--------+
>>>>>> |[1 -> 3]|
>>>>>> +--------+
>>>>>> The Hive map value converter has a bug; we should respect the "earlier
>>>>>> entry wins" semantic. No ticket yet.
>>>>>>
>>>>>>
>>>>>> scala> sql("select map(1,2,1,3)").collect
>>>>>> res14: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
>>>>>> Same bug happens at `collect`. No ticket yet.
>>>>>>
>>>>>> I'll create tickets and list all of them as known issues in 2.4.0.
>>>>>>
>>>>>> It's arguable if the "earlier entry wins" semantic is reasonable.
>>>>>> Fixing it is a behavior change and we can only apply it to master branch.
>>>>>>
>>>>>> Going back to https://issues.apache.org/jira/browse/SPARK-25823,
>>>>>> it's just a symptom of the hive map value converter bug. I think it's a
>>>>>> non-blocker.
>>>>>>
>>>>>> Thanks,
>>>>>> Wenchen
>>>>>>
>>>>>> On Thu, Oct 25, 2018 at 5:31 AM Dongjoon Hyun <
>>>>>> dongjoon.hyun@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, All.
>>>>>>>
>>>>>>> -0 due to the following issue. From Spark 2.4.0, users may get an
>>>>>>> incorrect result when they use the new `map_filter` with `map_concat` functions.
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/SPARK-25823
>>>>>>>
>>>>>>> SPARK-25823 is only aiming to fix the data correctness issue from
>>>>>>> `map_filter`.
>>>>>>>
>>>>>>> PMC members are able to lower the priority. Always, I respect PMC's
>>>>>>> decision.
>>>>>>>
>>>>>>> I'm sending this email to draw more attention to this bug and to
>>>>>>> give some warning on the new feature's limitation to the community.
>>>>>>>
>>>>>>> Bests,
>>>>>>> Dongjoon.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>> version 2.4.0.
>>>>>>>>
>>>>>>>> The vote is open until October 26 PST and passes if a majority +1
>>>>>>>> PMC votes are cast, with
>>>>>>>> a minimum of 3 +1 votes.
>>>>>>>>
>>>>>>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>
>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>> http://spark.apache.org/
>>>>>>>>
>>>>>>>> The tag to be voted on is v2.4.0-rc4 (commit
>>>>>>>> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>>>>>>>> https://github.com/apache/spark/tree/v2.4.0-rc4
>>>>>>>>
>>>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>>>> at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>>>>>>>>
>>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>>
>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>
>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1290
>>>>>>>>
>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>>>>>>>>
>>>>>>>> The list of bug fixes going into 2.4.0 can be found at the
>>>>>>>> following URL:
>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>>>>>>
>>>>>>>> FAQ
>>>>>>>>
>>>>>>>> =========================
>>>>>>>> How can I help test this release?
>>>>>>>> =========================
>>>>>>>>
>>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>>> an existing Spark workload and running on this release candidate,
>>>>>>>> then
>>>>>>>> reporting any regressions.
>>>>>>>>
>>>>>>>> If you're working in PySpark you can set up a virtual env and
>>>>>>>> install
>>>>>>>> the current RC and see if anything important breaks, in the
>>>>>>>> Java/Scala
>>>>>>>> you can add the staging repository to your projects resolvers and
>>>>>>>> test
>>>>>>>> with the RC (make sure to clean up the artifact cache before/after
>>>>>>>> so
>>>>>>>> you don't end up building with a out of date RC going forward).
>>>>>>>>
>>>>>>>> ===========================================
>>>>>>>> What should happen to JIRA tickets still targeting 2.4.0?
>>>>>>>> ===========================================
>>>>>>>>
>>>>>>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>>> "Target Version/s" = 2.4.0
>>>>>>>>
>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>> fixes, documentation, and API tweaks that impact compatibility
>>>>>>>> should
>>>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>>>> appropriate release.
>>>>>>>>
>>>>>>>> ==================
>>>>>>>> But my bug isn't fixed?
>>>>>>>> ==================
>>>>>>>>
>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>>> release. That being said, if there is something which is a
>>>>>>>> regression
>>>>>>>> that has not been correctly targeted please ping me or a committer
>>>>>>>> to
>>>>>>>> help target the issue.
>>>>>>>>
>>>>>>>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Xiao Li <li...@databricks.com>.
@Dongjoon Hyun <do...@gmail.com>  Thanks! This is a blocking
ticket. It returns a wrong result due to our undefined behavior. I agree we
should revert the newly added map-oriented functions. In the 3.0 release, we
need to define the behavior of duplicate keys in the data type MAP and fix
all the related issues that are confusing to our end users.
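
(As an aside, a minimal Scala sketch of the candidate behaviors being weighed
for duplicate map keys; the policy names and the buildMap helper are
hypothetical illustrations, not a Spark API.)

def buildMap[K, V](entries: Seq[(K, V)], policy: String): Map[K, V] = policy match {
  case "EARLIER_WINS" => entries.reverse.toMap  // keep the first occurrence of each key
  case "LATER_WINS"   => entries.toMap          // keep the last occurrence, as Scala/Java maps do
  case "FAIL_ON_DUPLICATE" =>
    require(entries.map(_._1).distinct.size == entries.size, "duplicate map key")
    entries.toMap
}

buildMap(Seq(1 -> 2, 1 -> 3), "EARLIER_WINS")  // Map(1 -> 2), today's Spark lookup behavior
buildMap(Seq(1 -> 2, 1 -> 3), "LATER_WINS")    // Map(1 -> 3), what plain Scala does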

Thanks,

Xiao

On Wed, Oct 24, 2018 at 9:54 PM Wenchen Fan <cl...@gmail.com> wrote:

> Ah now I see the problem. `map_filter` has a very weird semantic that is
> neither "earlier entry wins" or "latter entry wins".
>
> I've opened https://github.com/apache/spark/pull/22821 , to remove these
> newly added map-related functions from FunctionRegistry (for 2.4.0), so that
> they are invisible to end-users and the weird behavior of the Spark map type
> with duplicated keys is not escalated. We should fix it ASAP in the master
> branch.
>
> If others are OK with it, I'll start a new RC after that PR is merged.
>
> Thanks,
> Wenchen
>
> On Thu, Oct 25, 2018 at 10:32 AM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> For the first question, it's `bin/spark-sql` result. I didn't check STS,
>> but it will return the same with `bin/spark-sql`.
>>
>> > I think map_filter is implemented correctly. map(1,2,1,3) is actually
>> map(1,2) according to the "earlier entry wins" semantic. I don't think
>> this will change in 2.4.1.
>>
>> For the second one, `map_filter` issue is not about `earlier entry wins`
>> stuff. Please see the following example.
>>
>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT
>> map_concat(map(1,2), map(1,3)) m);
>> {1:3} {1:2}
>>
>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=3) c FROM (SELECT
>> map_concat(map(1,2), map(1,3)) m);
>> {1:3} {1:3}
>>
>> spark-sql> SELECT m, map_filter(m, (k,v) -> v=4) c FROM (SELECT
>> map_concat(map(1,2), map(1,3)) m);
>> {1:3} {}
>>
>> In other words, `map_filter` works like a `pushed-down filter` on the map
>> in terms of the output result
>> while users assumed that `map_filter` works on top of the result of `m`.
>>
>> This is a function semantic issue.
>>
>>
>> On Wed, Oct 24, 2018 at 6:06 PM Wenchen Fan <cl...@gmail.com> wrote:
>>
>>> > spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>>> > {1:3}
>>>
>>> Are you running in the thrift-server? Then maybe this is caused by the
>>> bug in `Dataset.collect` as I mentioned above.
>>>
>>> I think map_filter is implemented correctly. map(1,2,1,3) is actually
>>> map(1,2) according to the "earlier entry wins" semantic. I don't think
>>> this will change in 2.4.1.
>>>
>>> On Thu, Oct 25, 2018 at 8:56 AM Dongjoon Hyun <do...@gmail.com>
>>> wrote:
>>>
>>>> Thank you for the follow-ups.
>>>>
>>>> Then, Spark 2.4.1 will return `{1:2}` differently from the following
>>>> (including Spark/Scala) in the end?
>>>>
>>>> I hoped to fix the `map_filter`, but now Spark looks inconsistent in
>>>> many ways.
>>>>
>>>> scala> sql("select map(1,2,1,3)").show // Spark 2.2.2
>>>> +---------------+
>>>> |map(1, 2, 1, 3)|
>>>> +---------------+
>>>> |    Map(1 -> 3)|
>>>> +---------------+
>>>>
>>>>
>>>> spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>>>> {1:3}
>>>>
>>>>
>>>> hive> select map(1,2,1,3);  // Hive 1.2.2
>>>> OK
>>>> {1:3}
>>>>
>>>>
>>>> presto> SELECT map_concat(map(array[1],array[2]),
>>>> map(array[1],array[3])); // Presto 0.212
>>>>  _col0
>>>> -------
>>>>  {1=3}
>>>>
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Wed, Oct 24, 2018 at 5:17 PM Wenchen Fan <cl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Dongjoon,
>>>>>
>>>>> Thanks for reporting it! This is indeed a bug that needs to be fixed.
>>>>>
>>>>> The problem is not about the function `map_filter`, but about how the
>>>>> map type values are created in Spark, when there are duplicated keys.
>>>>>
>>>>> In programming languages like Java/Scala, when creating a map, the later
>>>>> entry wins, e.g. in Scala:
>>>>> scala> Map(1 -> 2, 1 -> 3)
>>>>> res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
>>>>>
>>>>> scala> Map(1 -> 2, 1 -> 3).get(1)
>>>>> res1: Option[Int] = Some(3)
>>>>>
>>>>> However, in Spark, the earlier entry wins
>>>>> scala> sql("SELECT map(1,2,1,3)[1]").show
>>>>> +------------------+
>>>>> |map(1, 2, 1, 3)[1]|
>>>>> +------------------+
>>>>> |                 2|
>>>>> +------------------+
>>>>>
>>>>> So for Spark users, Map(1 -> 2, 1 -> 3) should be equal to Map(1 -> 2)
>>>>> .
>>>>>
>>>>> But there are several bugs in Spark
>>>>>
>>>>> scala> sql("SELECT map(1,2,1,3)").show
>>>>> +----------------+
>>>>> | map(1, 2, 1, 3)|
>>>>> +----------------+
>>>>> |[1 -> 2, 1 -> 3]|
>>>>> +----------------+
>>>>> The displayed string of map values has a bug and we should deduplicate
>>>>> the entries. This is tracked by SPARK-25824.
>>>>>
>>>>>
>>>>> scala> sql("CREATE TABLE t AS SELECT map(1,2,1,3) as map")
>>>>> res11: org.apache.spark.sql.DataFrame = []
>>>>>
>>>>> scala> sql("select * from t").show
>>>>> +--------+
>>>>> |     map|
>>>>> +--------+
>>>>> |[1 -> 3]|
>>>>> +--------+
>>>>> The Hive map value converter has a bug; we should respect the "earlier
>>>>> entry wins" semantic. No ticket yet.
>>>>>
>>>>>
>>>>> scala> sql("select map(1,2,1,3)").collect
>>>>> res14: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
>>>>> Same bug happens at `collect`. No ticket yet.
>>>>>
>>>>> I'll create tickets and list all of them as known issues in 2.4.0.
>>>>>
>>>>> It's arguable if the "earlier entry wins" semantic is reasonable.
>>>>> Fixing it is a behavior change and we can only apply it to master branch.
>>>>>
>>>>> Going back to https://issues.apache.org/jira/browse/SPARK-25823, it's
>>>>> just a symptom of the hive map value converter bug. I think it's a
>>>>> non-blocker.
>>>>>
>>>>> Thanks,
>>>>> Wenchen
>>>>>
>>>>> On Thu, Oct 25, 2018 at 5:31 AM Dongjoon Hyun <do...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, All.
>>>>>>
>>>>>> -0 due to the following issue. From Spark 2.4.0, users may get an
>>>>>> incorrect result when they use the new `map_filter` with `map_concat` functions.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/SPARK-25823
>>>>>>
>>>>>> SPARK-25823 is only aiming to fix the data correctness issue from
>>>>>> `map_filter`.
>>>>>>
>>>>>> PMC members are able to lower the priority. Always, I respect PMC's
>>>>>> decision.
>>>>>>
>>>>>> I'm sending this email to draw more attention to this bug and to give
>>>>>> some warning on the new feature's limitation to the community.
>>>>>>
>>>>>> Bests,
>>>>>> Dongjoon.
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>> version 2.4.0.
>>>>>>>
>>>>>>> The vote is open until October 26 PST and passes if a majority +1
>>>>>>> PMC votes are cast, with
>>>>>>> a minimum of 3 +1 votes.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see
>>>>>>> http://spark.apache.org/
>>>>>>>
>>>>>>> The tag to be voted on is v2.4.0-rc4 (commit
>>>>>>> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>>>>>>> https://github.com/apache/spark/tree/v2.4.0-rc4
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>>> at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>>>>>>>
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>
>>>>>>> The staging repository for this release can be found at:
>>>>>>>
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1290
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>>>>>>>
>>>>>>> The list of bug fixes going into 2.4.0 can be found at the following
>>>>>>> URL:
>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>>>>>
>>>>>>> FAQ
>>>>>>>
>>>>>>> =========================
>>>>>>> How can I help test this release?
>>>>>>> =========================
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>> an existing Spark workload and running on this release candidate,
>>>>>>> then
>>>>>>> reporting any regressions.
>>>>>>>
>>>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>>>> the current RC and see if anything important breaks, in the
>>>>>>> Java/Scala
>>>>>>> you can add the staging repository to your projects resolvers and
>>>>>>> test
>>>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>>>> you don't end up building with a out of date RC going forward).
>>>>>>>
>>>>>>> ===========================================
>>>>>>> What should happen to JIRA tickets still targeting 2.4.0?
>>>>>>> ===========================================
>>>>>>>
>>>>>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for
>>>>>>> "Target Version/s" = 2.4.0
>>>>>>>
>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>>> appropriate release.
>>>>>>>
>>>>>>> ==================
>>>>>>> But my bug isn't fixed?
>>>>>>> ==================
>>>>>>>
>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>> release. That being said, if there is something which is a regression
>>>>>>> that has not been correctly targeted please ping me or a committer to
>>>>>>> help target the issue.
>>>>>>>
>>>>>>


Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Wenchen Fan <cl...@gmail.com>.
Ah now I see the problem. `map_filter` has a very weird semantic that is
neither "earlier entry wins" or "latter entry wins".

I've opened https://github.com/apache/spark/pull/22821 , to remove these
newly added map-related functions from FunctionRegistry (for 2.4.0), so that
they are invisible to end-users and the weird behavior of the Spark map type
with duplicated keys is not escalated. We should fix it ASAP in the master
branch.

If others are OK with it, I'll start a new RC after that PR is merged.

Thanks,
Wenchen

On Thu, Oct 25, 2018 at 10:32 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> For the first question, it's `bin/spark-sql` result. I didn't check STS,
> but it will return the same with `bin/spark-sql`.
>
> > I think map_filter is implemented correctly. map(1,2,1,3) is actually
> map(1,2) according to the "earlier entry wins" semantic. I don't think
> this will change in 2.4.1.
>
> For the second one, `map_filter` issue is not about `earlier entry wins`
> stuff. Please see the following example.
>
> spark-sql> SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT
> map_concat(map(1,2), map(1,3)) m);
> {1:3} {1:2}
>
> spark-sql> SELECT m, map_filter(m, (k,v) -> v=3) c FROM (SELECT
> map_concat(map(1,2), map(1,3)) m);
> {1:3} {1:3}
>
> spark-sql> SELECT m, map_filter(m, (k,v) -> v=4) c FROM (SELECT
> map_concat(map(1,2), map(1,3)) m);
> {1:3} {}
>
> In other words, `map_filter` works like a `pushed-down filter` on the map in
> terms of the output result
> while users assumed that `map_filter` works on top of the result of `m`.
>
> This is a function semantic issue.
>
>
> On Wed, Oct 24, 2018 at 6:06 PM Wenchen Fan <cl...@gmail.com> wrote:
>
>> > spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>> > {1:3}
>>
>> Are you running in the thrift-server? Then maybe this is caused by the
>> bug in `Dataset.collect` as I mentioned above.
>>
>> I think map_filter is implemented correctly. map(1,2,1,3) is actually
>> map(1,2) according to the "earlier entry wins" semantic. I don't think
>> this will change in 2.4.1.
>>
>> On Thu, Oct 25, 2018 at 8:56 AM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>>
>>> Thank you for the follow-ups.
>>>
>>> Then, Spark 2.4.1 will return `{1:2}` differently from the following
>>> (including Spark/Scala) in the end?
>>>
>>> I hoped to fix the `map_filter`, but now Spark looks inconsistent in
>>> many ways.
>>>
>>> scala> sql("select map(1,2,1,3)").show // Spark 2.2.2
>>> +---------------+
>>> |map(1, 2, 1, 3)|
>>> +---------------+
>>> |    Map(1 -> 3)|
>>> +---------------+
>>>
>>>
>>> spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>>> {1:3}
>>>
>>>
>>> hive> select map(1,2,1,3);  // Hive 1.2.2
>>> OK
>>> {1:3}
>>>
>>>
>>> presto> SELECT map_concat(map(array[1],array[2]),
>>> map(array[1],array[3])); // Presto 0.212
>>>  _col0
>>> -------
>>>  {1=3}
>>>
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Wed, Oct 24, 2018 at 5:17 PM Wenchen Fan <cl...@gmail.com> wrote:
>>>
>>>> Hi Dongjoon,
>>>>
>>>> Thanks for reporting it! This is indeed a bug that needs to be fixed.
>>>>
>>>> The problem is not about the function `map_filter`, but about how the
>>>> map type values are created in Spark, when there are duplicated keys.
>>>>
>>>> In programming languages like Java/Scala, when creating a map, the later
>>>> entry wins, e.g. in Scala:
>>>> scala> Map(1 -> 2, 1 -> 3)
>>>> res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
>>>>
>>>> scala> Map(1 -> 2, 1 -> 3).get(1)
>>>> res1: Option[Int] = Some(3)
>>>>
>>>> However, in Spark, the earlier entry wins
>>>> scala> sql("SELECT map(1,2,1,3)[1]").show
>>>> +------------------+
>>>> |map(1, 2, 1, 3)[1]|
>>>> +------------------+
>>>> |                 2|
>>>> +------------------+
>>>>
>>>> So for Spark users, Map(1 -> 2, 1 -> 3) should be equal to Map(1 -> 2).
>>>>
>>>> But there are several bugs in Spark
>>>>
>>>> scala> sql("SELECT map(1,2,1,3)").show
>>>> +----------------+
>>>> | map(1, 2, 1, 3)|
>>>> +----------------+
>>>> |[1 -> 2, 1 -> 3]|
>>>> +----------------+
>>>> The displayed string of map values has a bug and we should deduplicate
>>>> the entries. This is tracked by SPARK-25824.
>>>>
>>>>
>>>> scala> sql("CREATE TABLE t AS SELECT map(1,2,1,3) as map")
>>>> res11: org.apache.spark.sql.DataFrame = []
>>>>
>>>> scala> sql("select * from t").show
>>>> +--------+
>>>> |     map|
>>>> +--------+
>>>> |[1 -> 3]|
>>>> +--------+
>>>> The Hive map value converter has a bug; we should respect the "earlier
>>>> entry wins" semantic. No ticket yet.
>>>>
>>>>
>>>> scala> sql("select map(1,2,1,3)").collect
>>>> res14: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
>>>> Same bug happens at `collect`. No ticket yet.
>>>>
>>>> I'll create tickets and list all of them as known issues in 2.4.0.
>>>>
>>>> It's arguable if the "earlier entry wins" semantic is reasonable.
>>>> Fixing it is a behavior change and we can only apply it to master branch.
>>>>
>>>> Going back to https://issues.apache.org/jira/browse/SPARK-25823, it's
>>>> just a symptom of the hive map value converter bug. I think it's a
>>>> non-blocker.
>>>>
>>>> Thanks,
>>>> Wenchen
>>>>
>>>> On Thu, Oct 25, 2018 at 5:31 AM Dongjoon Hyun <do...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, All.
>>>>>
>>>>> -0 due to the following issue. From Spark 2.4.0, users may get an
>>>>> incorrect result when they use the new `map_filter` with `map_concat` functions.
>>>>>
>>>>> https://issues.apache.org/jira/browse/SPARK-25823
>>>>>
>>>>> SPARK-25823 is only aiming to fix the data correctness issue from
>>>>> `map_filter`.
>>>>>
>>>>> PMC members are able to lower the priority. Always, I respect PMC's
>>>>> decision.
>>>>>
>>>>> I'm sending this email to draw more attention to this bug and to give
>>>>> some warning on the new feature's limitation to the community.
>>>>>
>>>>> Bests,
>>>>> Dongjoon.
>>>>>
>>>>>
>>>>> On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 2.4.0.
>>>>>>
>>>>>> The vote is open until October 26 PST and passes if a majority +1 PMC
>>>>>> votes are cast, with
>>>>>> a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v2.4.0-rc4 (commit
>>>>>> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>>>>>> https://github.com/apache/spark/tree/v2.4.0-rc4
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be found
>>>>>> at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1290
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>>>>>>
>>>>>> The list of bug fixes going into 2.4.0 can be found at the following
>>>>>> URL:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> =========================
>>>>>> How can I help test this release?
>>>>>> =========================
>>>>>>
>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>> an existing Spark workload and running on this release candidate, then
>>>>>> reporting any regressions.
>>>>>>
>>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>>> the current RC and see if anything important breaks, in the Java/Scala
>>>>>> you can add the staging repository to your projects resolvers and test
>>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>>> you don't end up building with a out of date RC going forward).
>>>>>>
>>>>>> ===========================================
>>>>>> What should happen to JIRA tickets still targeting 2.4.0?
>>>>>> ===========================================
>>>>>>
>>>>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>>> Version/s" = 2.4.0
>>>>>>
>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>> be worked on immediately. Everything else please retarget to an
>>>>>> appropriate release.
>>>>>>
>>>>>> ==================
>>>>>> But my bug isn't fixed?
>>>>>> ==================
>>>>>>
>>>>>> In order to make timely releases, we will typically not hold the
>>>>>> release unless the bug in question is a regression from the previous
>>>>>> release. That being said, if there is something which is a regression
>>>>>> that has not been correctly targeted please ping me or a committer to
>>>>>> help target the issue.
>>>>>>
>>>>>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Dongjoon Hyun <do...@gmail.com>.
For the first question, that is the `bin/spark-sql` result. I didn't check STS,
but it should return the same as `bin/spark-sql`.

> I think map_filter is implemented correctly. map(1,2,1,3) is actually
map(1,2) according to the "earlier entry wins" semantic. I don't think this
will change in 2.4.1.

For the second one, `map_filter` issue is not about `earlier entry wins`
stuff. Please see the following example.

spark-sql> SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT
map_concat(map(1,2), map(1,3)) m);
{1:3} {1:2}

spark-sql> SELECT m, map_filter(m, (k,v) -> v=3) c FROM (SELECT
map_concat(map(1,2), map(1,3)) m);
{1:3} {1:3}

spark-sql> SELECT m, map_filter(m, (k,v) -> v=4) c FROM (SELECT
map_concat(map(1,2), map(1,3)) m);
{1:3} {}

In other words, `map_filter` works like a `pushed-down filter` on the map in
terms of the output result
while users assumed that `map_filter` works on top of the result of `m`.

This is a function semantic issue.
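
(To make the expectation concrete, here is a rough Scala model of the two
semantics; it only mirrors the reported outputs above and is not how Catalyst
actually implements these functions.)

// Underlying entries of map_concat(map(1,2), map(1,3)) before any deduplication:
val entries = Seq(1 -> 2, 1 -> 3)

// What users expect: map_filter applied to the value of `m` as it reads back, {1:3}
val displayed = Map(1 -> 3)
displayed.filter { case (_, v) => v == 2 }  // Map()       -> expected {}
displayed.filter { case (_, v) => v == 3 }  // Map(1 -> 3) -> expected {1:3}

// What RC4 effectively does: filter the raw entries, so the shadowed duplicate
// can resurface, which is why the v=2 query above returns {1:2}
entries.filter { case (_, v) => v == 2 }    // Seq((1, 2)) -> shown as {1:2}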


On Wed, Oct 24, 2018 at 6:06 PM Wenchen Fan <cl...@gmail.com> wrote:

> > spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
> > {1:3}
>
> Are you running in the thrift-server? Then maybe this is caused by the bug
> in `Dataset.collect` as I mentioned above.
>
> I think map_filter is implemented correctly. map(1,2,1,3) is actually
> map(1,2) according to the "earlier entry wins" semantic. I don't think
> this will change in 2.4.1.
>
> On Thu, Oct 25, 2018 at 8:56 AM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> Thank you for the follow-ups.
>>
>> Then, Spark 2.4.1 will return `{1:2}` differently from the following
>> (including Spark/Scala) in the end?
>>
>> I hoped to fix the `map_filter`, but now Spark looks inconsistent in many
>> ways.
>>
>> scala> sql("select map(1,2,1,3)").show // Spark 2.2.2
>> +---------------+
>> |map(1, 2, 1, 3)|
>> +---------------+
>> |    Map(1 -> 3)|
>> +---------------+
>>
>>
>> spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
>> {1:3}
>>
>>
>> hive> select map(1,2,1,3);  // Hive 1.2.2
>> OK
>> {1:3}
>>
>>
>> presto> SELECT map_concat(map(array[1],array[2]),
>> map(array[1],array[3])); // Presto 0.212
>>  _col0
>> -------
>>  {1=3}
>>
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Wed, Oct 24, 2018 at 5:17 PM Wenchen Fan <cl...@gmail.com> wrote:
>>
>>> Hi Dongjoon,
>>>
>>> Thanks for reporting it! This is indeed a bug that needs to be fixed.
>>>
>>> The problem is not about the function `map_filter`, but about how the
>>> map type values are created in Spark, when there are duplicated keys.
>>>
>>> In programming languages like Java/Scala, when creating a map, the later
>>> entry wins, e.g. in Scala:
>>> scala> Map(1 -> 2, 1 -> 3)
>>> res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
>>>
>>> scala> Map(1 -> 2, 1 -> 3).get(1)
>>> res1: Option[Int] = Some(3)
>>>
>>> However, in Spark, the earlier entry wins
>>> scala> sql("SELECT map(1,2,1,3)[1]").show
>>> +------------------+
>>> |map(1, 2, 1, 3)[1]|
>>> +------------------+
>>> |                 2|
>>> +------------------+
>>>
>>> So for Spark users, Map(1 -> 2, 1 -> 3) should be equal to Map(1 -> 2).
>>>
>>> But there are several bugs in Spark
>>>
>>> scala> sql("SELECT map(1,2,1,3)").show
>>> +----------------+
>>> | map(1, 2, 1, 3)|
>>> +----------------+
>>> |[1 -> 2, 1 -> 3]|
>>> +----------------+
>>> The displayed string of map values has a bug and we should deduplicate
>>> the entries. This is tracked by SPARK-25824.
>>>
>>>
>>> scala> sql("CREATE TABLE t AS SELECT map(1,2,1,3) as map")
>>> res11: org.apache.spark.sql.DataFrame = []
>>>
>>> scala> sql("select * from t").show
>>> +--------+
>>> |     map|
>>> +--------+
>>> |[1 -> 3]|
>>> +--------+
>>> The Hive map value converter has a bug; we should respect the "earlier
>>> entry wins" semantic. No ticket yet.
>>>
>>>
>>> scala> sql("select map(1,2,1,3)").collect
>>> res14: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
>>> Same bug happens at `collect`. No ticket yet.
>>>
>>> I'll create tickets and list all of them as known issues in 2.4.0.
>>>
>>> It's arguable if the "earlier entry wins" semantic is reasonable. Fixing
>>> it is a behavior change and we can only apply it to master branch.
>>>
>>> Going back to https://issues.apache.org/jira/browse/SPARK-25823, it's
>>> just a symptom of the hive map value converter bug. I think it's a
>>> non-blocker.
>>>
>>> Thanks,
>>> Wenchen
>>>
>>> On Thu, Oct 25, 2018 at 5:31 AM Dongjoon Hyun <do...@gmail.com>
>>> wrote:
>>>
>>>> Hi, All.
>>>>
>>>> -0 due to the following issue. From Spark 2.4.0, users may get an
>>>> incorrect result when they use the new `map_filter` with `map_concat` functions.
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-25823
>>>>
>>>> SPARK-25823 is only aiming to fix the data correctness issue from
>>>> `map_filter`.
>>>>
>>>> PMC members are able to lower the priority. Always, I respect PMC's
>>>> decision.
>>>>
>>>> I'm sending this email to draw more attention to this bug and to give
>>>> some warning on the new feature's limitation to the community.
>>>>
>>>> Bests,
>>>> Dongjoon.
>>>>
>>>>
>>>> On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com>
>>>> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 2.4.0.
>>>>>
>>>>> The vote is open until October 26 PST and passes if a majority +1 PMC
>>>>> votes are cast, with
>>>>> a minimum of 3 +1 votes.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is v2.4.0-rc4 (commit
>>>>> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>>>>> https://github.com/apache/spark/tree/v2.4.0-rc4
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>>>>>
>>>>> Signatures used for Spark RCs can be found in this file:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1290
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>>>>>
>>>>> The list of bug fixes going into 2.4.0 can be found at the following
>>>>> URL:
>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>>>
>>>>> FAQ
>>>>>
>>>>> =========================
>>>>> How can I help test this release?
>>>>> =========================
>>>>>
>>>>> If you are a Spark user, you can help us test this release by taking
>>>>> an existing Spark workload and running on this release candidate, then
>>>>> reporting any regressions.
>>>>>
>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>> the current RC and see if anything important breaks, in the Java/Scala
>>>>> you can add the staging repository to your projects resolvers and test
>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>> you don't end up building with a out of date RC going forward).
>>>>>
>>>>> ===========================================
>>>>> What should happen to JIRA tickets still targeting 2.4.0?
>>>>> ===========================================
>>>>>
>>>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>> Version/s" = 2.4.0
>>>>>
>>>>> Committers should look at those and triage. Extremely important bug
>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>> be worked on immediately. Everything else please retarget to an
>>>>> appropriate release.
>>>>>
>>>>> ==================
>>>>> But my bug isn't fixed?
>>>>> ==================
>>>>>
>>>>> In order to make timely releases, we will typically not hold the
>>>>> release unless the bug in question is a regression from the previous
>>>>> release. That being said, if there is something which is a regression
>>>>> that has not been correctly targeted please ping me or a committer to
>>>>> help target the issue.
>>>>>
>>>>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Wenchen Fan <cl...@gmail.com>.
> spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
> {1:3}

Are you running in the thrift-server? Then maybe this is caused by the bug
in `Dataset.collect` as I mentioned above.

I think map_filter is implemented correctly. map(1,2,1,3) is actually
map(1,2) according to the "earlier entry wins" semantic. I don't think this
will change in 2.4.1.

On Thu, Oct 25, 2018 at 8:56 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> Thank you for the follow-ups.
>
> Then, Spark 2.4.1 will return `{1:2}` differently from the following
> (including Spark/Scala) in the end?
>
> I hoped to fix the `map_filter`, but now Spark looks inconsistent in many
> ways.
>
> scala> sql("select map(1,2,1,3)").show // Spark 2.2.2
> +---------------+
> |map(1, 2, 1, 3)|
> +---------------+
> |    Map(1 -> 3)|
> +---------------+
>
>
> spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
> {1:3}
>
>
> hive> select map(1,2,1,3);  // Hive 1.2.2
> OK
> {1:3}
>
>
> presto> SELECT map_concat(map(array[1],array[2]), map(array[1],array[3]));
> // Presto 0.212
>  _col0
> -------
>  {1=3}
>
>
> Bests,
> Dongjoon.
>
>
> On Wed, Oct 24, 2018 at 5:17 PM Wenchen Fan <cl...@gmail.com> wrote:
>
>> Hi Dongjoon,
>>
>> Thanks for reporting it! This is indeed a bug that needs to be fixed.
>>
>> The problem is not about the function `map_filter`, but about how the map
>> type values are created in Spark, when there are duplicated keys.
>>
>> In programming languages like Java/Scala, when creating a map, the later
>> entry wins, e.g. in Scala:
>> scala> Map(1 -> 2, 1 -> 3)
>> res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
>>
>> scala> Map(1 -> 2, 1 -> 3).get(1)
>> res1: Option[Int] = Some(3)
>>
>> However, in Spark, the earlier entry wins
>> scala> sql("SELECT map(1,2,1,3)[1]").show
>> +------------------+
>> |map(1, 2, 1, 3)[1]|
>> +------------------+
>> |                 2|
>> +------------------+
>>
>> So for Spark users, Map(1 -> 2, 1 -> 3) should be equal to Map(1 -> 2).
>>
>> But there are several bugs in Spark
>>
>> scala> sql("SELECT map(1,2,1,3)").show
>> +----------------+
>> | map(1, 2, 1, 3)|
>> +----------------+
>> |[1 -> 2, 1 -> 3]|
>> +----------------+
>> The displayed string of map values has a bug and we should deduplicate
>> the entries. This is tracked by SPARK-25824.
>>
>>
>> scala> sql("CREATE TABLE t AS SELECT map(1,2,1,3) as map")
>> res11: org.apache.spark.sql.DataFrame = []
>>
>> scala> sql("select * from t").show
>> +--------+
>> |     map|
>> +--------+
>> |[1 -> 3]|
>> +--------+
>> The Hive map value converter has a bug; we should respect the "earlier
>> entry wins" semantic. No ticket yet.
>>
>>
>> scala> sql("select map(1,2,1,3)").collect
>> res14: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
>> Same bug happens at `collect`. No ticket yet.
>>
>> I'll create tickets and list all of them as known issues in 2.4.0.
>>
>> It's arguable if the "earlier entry wins" semantic is reasonable. Fixing
>> it is a behavior change and we can only apply it to master branch.
>>
>> Going back to https://issues.apache.org/jira/browse/SPARK-25823, it's
>> just a symptom of the hive map value converter bug. I think it's a
>> non-blocker.
>>
>> Thanks,
>> Wenchen
>>
>> On Thu, Oct 25, 2018 at 5:31 AM Dongjoon Hyun <do...@gmail.com>
>> wrote:
>>
>>> Hi, All.
>>>
>>> -0 due to the following issue. From Spark 2.4.0, users may get an
>>> incorrect result when they use the new `map_filter` with `map_concat` functions.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-25823
>>>
>>> SPARK-25823 is only aiming to fix the data correctness issue from
>>> `map_filter`.
>>>
>>> PMC members are able to lower the priority. Always, I respect PMC's
>>> decision.
>>>
>>> I'm sending this email to draw more attention to this bug and to give
>>> some warning on the new feature's limitation to the community.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com>
>>> wrote:
>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 2.4.0.
>>>>
>>>> The vote is open until October 26 PST and passes if a majority +1 PMC
>>>> votes are cast, with
>>>> a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v2.4.0-rc4 (commit
>>>> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>>>> https://github.com/apache/spark/tree/v2.4.0-rc4
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1290
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>>>>
>>>> The list of bug fixes going into 2.4.0 can be found at the following
>>>> URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>>
>>>> FAQ
>>>>
>>>> =========================
>>>> How can I help test this release?
>>>> =========================
>>>>
>>>> If you are a Spark user, you can help us test this release by taking
>>>> an existing Spark workload and running on this release candidate, then
>>>> reporting any regressions.
>>>>
>>>> If you're working in PySpark you can set up a virtual env and install
>>>> the current RC and see if anything important breaks, in the Java/Scala
>>>> you can add the staging repository to your projects resolvers and test
>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>> you don't end up building with a out of date RC going forward).
>>>>
>>>> ===========================================
>>>> What should happen to JIRA tickets still targeting 2.4.0?
>>>> ===========================================
>>>>
>>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>> Version/s" = 2.4.0
>>>>
>>>> Committers should look at those and triage. Extremely important bug
>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>> be worked on immediately. Everything else please retarget to an
>>>> appropriate release.
>>>>
>>>> ==================
>>>> But my bug isn't fixed?
>>>> ==================
>>>>
>>>> In order to make timely releases, we will typically not hold the
>>>> release unless the bug in question is a regression from the previous
>>>> release. That being said, if there is something which is a regression
>>>> that has not been correctly targeted please ping me or a committer to
>>>> help target the issue.
>>>>
>>>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you for the follow-ups.

Then, Spark 2.4.1 will return `{1:2}` differently from the following
(including Spark/Scala) in the end?

I hoped to fix the `map_filter`, but now Spark looks inconsistent in many
ways.

scala> sql("select map(1,2,1,3)").show // Spark 2.2.2
+---------------+
|map(1, 2, 1, 3)|
+---------------+
|    Map(1 -> 3)|
+---------------+


spark-sql> select map(1,2,1,3); // Spark 2.4.0 RC4
{1:3}


hive> select map(1,2,1,3);  // Hive 1.2.2
OK
{1:3}


presto> SELECT map_concat(map(array[1],array[2]), map(array[1],array[3]));
// Presto 0.212
 _col0
-------
 {1=3}


Bests,
Dongjoon.


On Wed, Oct 24, 2018 at 5:17 PM Wenchen Fan <cl...@gmail.com> wrote:

> Hi Dongjoon,
>
> Thanks for reporting it! This is indeed a bug that needs to be fixed.
>
> The problem is not about the function `map_filter`, but about how the map
> type values are created in Spark, when there are duplicated keys.
>
> In programming languages like Java/Scala, when creating a map, the later
> entry wins, e.g. in Scala:
> scala> Map(1 -> 2, 1 -> 3)
> res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)
>
> scala> Map(1 -> 2, 1 -> 3).get(1)
> res1: Option[Int] = Some(3)
>
> However, in Spark, the earlier entry wins
> scala> sql("SELECT map(1,2,1,3)[1]").show
> +------------------+
> |map(1, 2, 1, 3)[1]|
> +------------------+
> |                 2|
> +------------------+
>
> So for Spark users, Map(1 -> 2, 1 -> 3) should be equal to Map(1 -> 2).
>
> But there are several bugs in Spark
>
> scala> sql("SELECT map(1,2,1,3)").show
> +----------------+
> | map(1, 2, 1, 3)|
> +----------------+
> |[1 -> 2, 1 -> 3]|
> +----------------+
> The displayed string of map values has a bug and we should deduplicate the
> entries. This is tracked by SPARK-25824.
>
>
> scala> sql("CREATE TABLE t AS SELECT map(1,2,1,3) as map")
> res11: org.apache.spark.sql.DataFrame = []
>
> scala> sql("select * from t").show
> +--------+
> |     map|
> +--------+
> |[1 -> 3]|
> +--------+
> The Hive map value converter has a bug; we should respect the "earlier entry
> wins" semantic. No ticket yet.
>
>
> scala> sql("select map(1,2,1,3)").collect
> res14: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
> Same bug happens at `collect`. No ticket yet.
>
> I'll create tickets and list all of them as known issues in 2.4.0.
>
> It's arguable if the "earlier entry wins" semantic is reasonable. Fixing
> it is a behavior change and we can only apply it to master branch.
>
> Going back to https://issues.apache.org/jira/browse/SPARK-25823, it's
> just a symptom of the hive map value converter bug. I think it's a
> non-blocker.
>
> Thanks,
> Wenchen
>
> On Thu, Oct 25, 2018 at 5:31 AM Dongjoon Hyun <do...@gmail.com>
> wrote:
>
>> Hi, All.
>>
>> -0 due to the following issue. From Spark 2.4.0, users may get an
>> incorrect result when they use the new `map_filter` with `map_concat` functions.
>>
>> https://issues.apache.org/jira/browse/SPARK-25823
>>
>> SPARK-25823 is only aiming to fix the data correctness issue from
>> `map_filter`.
>>
>> PMC members are able to lower the priority. Always, I respect PMC's
>> decision.
>>
>> I'm sending this email to draw more attention to this bug and to give
>> some warning on the new feature's limitation to the community.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.4.0.
>>>
>>> The vote is open until October 26 PST and passes if a majority +1 PMC
>>> votes are cast, with
>>> a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.4.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.4.0-rc4 (commit
>>> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>>> https://github.com/apache/spark/tree/v2.4.0-rc4
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1290
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>>>
>>> The list of bug fixes going into 2.4.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>>
>>> FAQ
>>>
>>> =========================
>>> How can I help test this release?
>>> =========================
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with a out of date RC going forward).
>>>
>>> ===========================================
>>> What should happen to JIRA tickets still targeting 2.4.0?
>>> ===========================================
>>>
>>> The current list of open tickets targeted at 2.4.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.4.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==================
>>> But my bug isn't fixed?
>>> ==================
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Wenchen Fan <cl...@gmail.com>.
Hi Dongjoon,

Thanks for reporting it! This is indeed a bug that needs to be fixed.

The problem is not about the function `map_filter`, but about how the map
type values are created in Spark, when there are duplicated keys.

In programming languages like Java/Scala, when creating a map, the later
entry wins, e.g. in Scala:
scala> Map(1 -> 2, 1 -> 3)
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 3)

scala> Map(1 -> 2, 1 -> 3).get(1)
res1: Option[Int] = Some(3)

However, in Spark, the earlier entry wins
scala> sql("SELECT map(1,2,1,3)[1]").show
+------------------+
|map(1, 2, 1, 3)[1]|
+------------------+
|                 2|
+------------------+

So for Spark users, Map(1 -> 2, 1 -> 3) should be equal to Map(1 -> 2).

But there are several bugs in Spark

scala> sql("SELECT map(1,2,1,3)").show
+----------------+
| map(1, 2, 1, 3)|
+----------------+
|[1 -> 2, 1 -> 3]|
+----------------+
The displayed string of map values has a bug and we should deduplicate the
entries. This is tracked by SPARK-25824.
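
(For concreteness, a minimal sketch, not the actual fix, of deduplicating the
raw key/value entries with the "earlier entry wins" semantic that the lookup
path already uses; roughly what the display and collect paths would need to do.)

def dedupEarlierWins[K, V](keys: Seq[K], values: Seq[V]): Seq[(K, V)] = {
  val seen = scala.collection.mutable.LinkedHashMap.empty[K, V]
  keys.zip(values).foreach { case (k, v) =>
    if (!seen.contains(k)) seen(k) = v  // keep only the first (earlier) entry per key
  }
  seen.toSeq
}

dedupEarlierWins(Seq(1, 1), Seq(2, 3))  // Seq((1, 2)), i.e. the map would render as {1:2}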


scala> sql("CREATE TABLE t AS SELECT map(1,2,1,3) as map")
res11: org.apache.spark.sql.DataFrame = []

scala> sql("select * from t").show
+--------+
|     map|
+--------+
|[1 -> 3]|
+--------+
The Hive map value converter has a bug; we should respect the "earlier entry
wins" semantic. No ticket yet.


scala> sql("select map(1,2,1,3)").collect
res14: Array[org.apache.spark.sql.Row] = Array([Map(1 -> 3)])
Same bug happens at `collect`. No ticket yet.

I'll create tickets and list all of them as known issues in 2.4.0.

It's arguable if the "earlier entry wins" semantic is reasonable. Fixing it
is a behavior change and we can only apply it to master branch.

Going back to https://issues.apache.org/jira/browse/SPARK-25823, it's just
a symptom of the hive map value converter bug. I think it's a non-blocker.

Thanks,
Wenchen

On Thu, Oct 25, 2018 at 5:31 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> Hi, All.
>
> -0 due to the following issue. From Spark 2.4.0, users may get an
> incorrect result when they use the new `map_filter` with `map_concat` functions.
>
> https://issues.apache.org/jira/browse/SPARK-25823
>
> SPARK-25823 is only aiming to fix the data correctness issue from
> `map_filter`.
>
> PMC members are able to lower the priority. Always, I respect PMC's
> decision.
>
> I'm sending this email to draw more attention to this bug and to give some
> warning on the new feature's limitation to the community.
>
> Bests,
> Dongjoon.
>
>
> On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.4.0.
>>
>> The vote is open until October 26 PST and passes if a majority +1 PMC
>> votes are cast, with
>> a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.4.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.4.0-rc4 (commit
>> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
>> https://github.com/apache/spark/tree/v2.4.0-rc4
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1290
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>>
>> The list of bug fixes going into 2.4.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>>
>> FAQ
>>
>> =========================
>> How can I help test this release?
>> =========================
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===========================================
>> What should happen to JIRA tickets still targeting 2.4.0?
>> ===========================================
>>
>> The current list of open tickets targeted at 2.4.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 2.4.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==================
>> But my bug isn't fixed?
>> ==================
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by Dongjoon Hyun <do...@gmail.com>.
Hi, All.

-0 due to the following issue. From Spark 2.4.0, users may get an incorrect
result when they use the new `map_filter` function together with `map_concat`.

https://issues.apache.org/jira/browse/SPARK-25823

SPARK-25823 is only aiming to fix the data correctness issue from
`map_filter`.
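
(For reference, the shape of the reproducer, adapted from the spark-sql examples
quoted elsewhere in this thread; as reported there, the filtered column can
resurface the shadowed duplicate, e.g. returning {1:2} even though `m` itself
reads back as {1:3}. The exact rendering depends on the display path.)

scala> sql("SELECT m, map_filter(m, (k, v) -> v = 2) c " +
     |     "FROM (SELECT map_concat(map(1, 2), map(1, 3)) m)").show()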

PMC members are able to lower the priority. As always, I respect the PMC's
decision.

I'm sending this email to draw more attention to this bug and to warn the
community about the new feature's limitation.

Bests,
Dongjoon.


On Mon, Oct 22, 2018 at 10:42 AM Wenchen Fan <cl...@gmail.com> wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.4.0.
>
> The vote is open until October 26 PST and passes if a majority +1 PMC
> votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.0-rc4 (commit
> e69e2bfa486d8d3b9d203b96ca9c0f37c2b6cabe):
> https://github.com/apache/spark/tree/v2.4.0-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1290
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc4-docs/
>
> The list of bug fixes going into 2.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12342385
>
> FAQ
>
> =========================
> How can I help test this release?
> =========================
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===========================================
> What should happen to JIRA tickets still targeting 2.4.0?
> ===========================================
>
> The current list of open tickets targeted at 2.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==================
> But my bug isn't fixed?
> ==================
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>

Re: [VOTE] SPARK 2.4.0 (RC4)

Posted by "Aron.tao" <ta...@gmail.com>.
+1



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org