Posted to dev@spark.apache.org by Patrick Wendell <pw...@gmail.com> on 2015/02/18 09:12:08 UTC

[VOTE] Release Apache Spark 1.3.0 (RC1)

Please vote on releasing the following candidate as Apache Spark version 1.3.0!

The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.3.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1069/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.3.0!

The vote is open until Saturday, February 21, at 08:03 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.3.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.2 workload and running it on this release candidate,
then reporting any regressions.
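
For example (an illustrative smoke test, not an official script), even a
small word count run in bin/spark-shell exercises the core path:

    // Count words in the bundled README and spot-check the output.
    val counts = sc.textFile("README.md")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.take(5).foreach(println)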

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.3 QA period,
so -1 votes should only occur for significant regressions from 1.2.1.
Bugs already present in 1.2.X, minor regressions, or bugs related
to new features will not block this release.

- Patrick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Sandor Van Wassenhove <sa...@palantir.com>.
FWIW, I tested the first RC and saw no regressions. I ran our benchmarks
built against Spark 1.3 and saw results consistent with Spark 1.2/1.2.1.

On 2/25/15, 5:51 PM, "Patrick Wendell" <pw...@gmail.com> wrote:

>Hey All,
>
>Just a quick update on this thread. Issues have continued to trickle
>in. Not all of them are blocker level but enough to warrant another
>RC:
>
>I've been keeping the JIRA dashboard up and running with the latest
>status (sorry, long link):
>https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20%22Target%20Version%2Fs%22%20%3D%201.3.0%20AND%20(fixVersion%20IS%20EMPTY%20OR%20fixVersion%20!%3D%201.3.0)%20AND%20(Resolution%20IS%20EMPTY%20OR%20Resolution%20IN%20(Done%2C%20Fixed%2C%20Implemented))%20ORDER%20BY%20priority%2C%20component
>
>Once these are in I will cut another RC. Thanks everyone for the
>continued voting!
>
>- Patrick
>
>On Mon, Feb 23, 2015 at 10:52 PM, Tathagata Das
><ta...@gmail.com> wrote:
>> Hey all,
>>
>> I found a major issue where JobProgressListener (a listener used to keep
>> track of jobs for the web UI) never forgets stages in one of its data
>> structures. This is a blocker for long running applications.
>> 
>> https://issues.apache.org/jira/browse/SPARK-5967
>>
>> I am testing a fix for this right now.
>>
>> TD
>>
>> On Mon, Feb 23, 2015 at 7:23 PM, Soumitra Kumar
>><ku...@gmail.com>
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> For: https://issues.apache.org/jira/browse/SPARK-3660
>>>
>>> . Docs OK
>>> . Example code is good
>>>
>>> -Soumitra.
>>>
>>>
>>> On Mon, Feb 23, 2015 at 10:33 AM, Marcelo Vanzin <va...@cloudera.com>
>>> wrote:
>>>
>>> > Hi Tom, are you using an sbt-built assembly by any chance? If so, take
>>> > a look at SPARK-5808.
>>> >
>>> > I haven't had any problems with the maven-built assembly. Setting
>>> > SPARK_HOME on the executors is a workaround if you want to use the sbt
>>> > assembly.
>>> >
>>> > On Fri, Feb 20, 2015 at 2:56 PM, Tom Graves
>>> > <tg...@yahoo.com.invalid> wrote:
>>> > > Trying to run pyspark on yarn in client mode with basic wordcount
>>> > example I see the following error when doing the collect:
>>> > > Error from python worker:  /usr/bin/python: No module named sql
>>> > > PYTHONPATH was: /grid/3/tmp/yarn-local/usercache/tgraves/filecache/20/spark-assembly-1.3.0-hadoop2.6.0.1.1411101121.jar
>>> > java.io.EOFException
>>> >       at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>> >       at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
>>> >       at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
>>> >       at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
>>> >       at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
>>> >       at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
>>> >       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>> >       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>> >       at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:308)
>>> >       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>>> >       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>>> >       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>> >       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>> >       at org.apache.spark.scheduler.Task.run(Task.scala:64)
>>> >       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
>>> >       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >       at java.lang.Thread.run(Thread.java:722)
>>> > > any ideas on this?
>>> > > Tom
>>> > >
>>> > >      On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell <
>>> > pwendell@gmail.com> wrote:
>>> > >
>>> > >
>>> > >  Please vote on releasing the following candidate as Apache Spark
>>> > version 1.3.0!
>>> > >
>>> > > The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
>>> > > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2
>>> > >
>>> > > The release files, including signatures, digests, etc. can be found at:
>>> > > http://people.apache.org/~pwendell/spark-1.3.0-rc1/
>>> > >
>>> > > Release artifacts are signed with the following key:
>>> > > https://people.apache.org/keys/committer/pwendell.asc
>>> > >
>>> > > The staging repository for this release can be found at:
>>> > > https://repository.apache.org/content/repositories/orgapachespark-1069/
>>> > >
>>> > > The documentation corresponding to this release can be found at:
>>> > > http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/
>>> > >
>>> > > Please vote on releasing this package as Apache Spark 1.3.0!
>>> > >
>>> > > The vote is open until Saturday, February 21, at 08:03 UTC and passes
>>> > > if a majority of at least 3 +1 PMC votes are cast.
>>> > >
>>> > > [ ] +1 Release this package as Apache Spark 1.3.0
>>> > > [ ] -1 Do not release this package because ...
>>> > >
>>> > > To learn more about Apache Spark, please see
>>> > > http://spark.apache.org/
>>> > >
>>> > > == How can I help test this release? ==
>>> > > If you are a Spark user, you can help us test this release by
>>> > > taking a Spark 1.2 workload and running it on this release candidate,
>>> > > then reporting any regressions.
>>> > >
>>> > > == What justifies a -1 vote for this release? ==
>>> > > This vote is happening towards the end of the 1.3 QA period,
>>> > > so -1 votes should only occur for significant regressions from 1.2.1.
>>> > > Bugs already present in 1.2.X, minor regressions, or bugs related
>>> > > to new features will not block this release.
>>> > >
>>> > > - Patrick
>>> > >
>>> > > ---------------------------------------------------------------------
>>> > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> > > For additional commands, e-mail: dev-help@spark.apache.org
>>> > >
>>> > >
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>> > --
>>> > Marcelo
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> > For additional commands, e-mail: dev-help@spark.apache.org
>>> >
>>> >
>>>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>For additional commands, e-mail: dev-help@spark.apache.org
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Patrick Wendell <pw...@gmail.com>.
Hey All,

Just a quick update on this thread. Issues have continued to trickle
in. Not all of them are blocker level but enough to warrant another
RC:

I've been keeping the JIRA dashboard up and running with the latest
status (sorry, long link):
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20%22Target%20Version%2Fs%22%20%3D%201.3.0%20AND%20(fixVersion%20IS%20EMPTY%20OR%20fixVersion%20!%3D%201.3.0)%20AND%20(Resolution%20IS%20EMPTY%20OR%20Resolution%20IN%20(Done%2C%20Fixed%2C%20Implemented))%20ORDER%20BY%20priority%2C%20component

Once these are in I will cut another RC. Thanks everyone for the
continued voting!

- Patrick

On Mon, Feb 23, 2015 at 10:52 PM, Tathagata Das
<ta...@gmail.com> wrote:
> Hey all,
>
> I found a major issue where JobProgressListener (a listener used to keep
> track of jobs for the web UI) never forgets stages in one of its data
> structures. This is a blocker for long running applications.
> https://issues.apache.org/jira/browse/SPARK-5967
>
> I am testing a fix for this right now.
>
> TD
>
> On Mon, Feb 23, 2015 at 7:23 PM, Soumitra Kumar <ku...@gmail.com>
> wrote:
>
>> +1 (non-binding)
>>
>> For: https://issues.apache.org/jira/browse/SPARK-3660
>>
>> . Docs OK
>> . Example code is good
>>
>> -Soumitra.
>>
>>
>> On Mon, Feb 23, 2015 at 10:33 AM, Marcelo Vanzin <va...@cloudera.com>
>> wrote:
>>
>> > Hi Tom, are you using an sbt-built assembly by any chance? If so, take
>> > a look at SPARK-5808.
>> >
>> > I haven't had any problems with the maven-built assembly. Setting
>> > SPARK_HOME on the executors is a workaround if you want to use the sbt
>> > assembly.
>> >
>> > On Fri, Feb 20, 2015 at 2:56 PM, Tom Graves
>> > <tg...@yahoo.com.invalid> wrote:
>> > > Trying to run pyspark on yarn in client mode with basic wordcount
>> > example I see the following error when doing the collect:
>> > > Error from python worker:  /usr/bin/python: No module named sql
>> > > PYTHONPATH was: /grid/3/tmp/yarn-local/usercache/tgraves/filecache/20/spark-assembly-1.3.0-hadoop2.6.0.1.1411101121.jar
>> > java.io.EOFException
>> >       at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> >       at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
>> >       at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
>> >       at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
>> >       at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
>> >       at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
>> >       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>> >       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>> >       at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:308)
>> >       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>> >       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>> >       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>> >       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> >       at org.apache.spark.scheduler.Task.run(Task.scala:64)
>> >       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
>> >       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >       at java.lang.Thread.run(Thread.java:722)
>> > > any ideas on this?
>> > > Tom
>> > >
>> > >      On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell <
>> > pwendell@gmail.com> wrote:
>> > >
>> > >
>> > >  Please vote on releasing the following candidate as Apache Spark
>> > version 1.3.0!
>> > >
>> > > The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
>> > >
>> >
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2
>> > >
>> > > The release files, including signatures, digests, etc. can be found at:
>> > > http://people.apache.org/~pwendell/spark-1.3.0-rc1/
>> > >
>> > > Release artifacts are signed with the following key:
>> > > https://people.apache.org/keys/committer/pwendell.asc
>> > >
>> > > The staging repository for this release can be found at:
>> > >
>> https://repository.apache.org/content/repositories/orgapachespark-1069/
>> > >
>> > > The documentation corresponding to this release can be found at:
>> > > http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/
>> > >
>> > > Please vote on releasing this package as Apache Spark 1.3.0!
>> > >
>> > > The vote is open until Saturday, February 21, at 08:03 UTC and passes
>> > > if a majority of at least 3 +1 PMC votes are cast.
>> > >
>> > > [ ] +1 Release this package as Apache Spark 1.3.0
>> > > [ ] -1 Do not release this package because ...
>> > >
>> > > To learn more about Apache Spark, please see
>> > > http://spark.apache.org/
>> > >
>> > > == How can I help test this release? ==
>> > > If you are a Spark user, you can help us test this release by
>> > > taking a Spark 1.2 workload and running it on this release candidate,
>> > > then reporting any regressions.
>> > >
>> > > == What justifies a -1 vote for this release? ==
>> > > This vote is happening towards the end of the 1.3 QA period,
>> > > so -1 votes should only occur for significant regressions from 1.2.1.
>> > > Bugs already present in 1.2.X, minor regressions, or bugs related
>> > > to new features will not block this release.
>> > >
>> > > - Patrick
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> > > For additional commands, e-mail: dev-help@spark.apache.org
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> > --
>> > Marcelo
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> > For additional commands, e-mail: dev-help@spark.apache.org
>> >
>> >
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Tathagata Das <ta...@gmail.com>.
Hey all,

I found a major issue where JobProgressListener (a listener used to keep
track of jobs for the web UI) never forgets stages in one of its data
structures. This is a blocker for long-running applications.
https://issues.apache.org/jira/browse/SPARK-5967
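
To illustrate the failure mode, here is a stripped-down sketch (not the
actual JobProgressListener code) of a per-stage map that only ever grows:

    import scala.collection.mutable

    class LeakyListener {
      // One entry per completed stage, never pruned: in a long-running
      // application this grows without bound and eats driver memory.
      val stageIdToUIData = mutable.HashMap.empty[Int, String]

      def onStageCompleted(stageId: Int): Unit = {
        stageIdToUIData(stageId) = s"UI data for stage $stageId"
        // Missing step: evict entries beyond a retention limit, e.g. drop
        // the oldest stages once the map size exceeds spark.ui.retainedStages.
      }
    }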

I am testing a fix for this right now.

TD

On Mon, Feb 23, 2015 at 7:23 PM, Soumitra Kumar <ku...@gmail.com>
wrote:

> +1 (non-binding)
>
> For: https://issues.apache.org/jira/browse/SPARK-3660
>
> . Docs OK
> . Example code is good
>
> -Soumitra.
>
>
> On Mon, Feb 23, 2015 at 10:33 AM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>
> > Hi Tom, are you using an sbt-built assembly by any chance? If so, take
> > a look at SPARK-5808.
> >
> > I haven't had any problems with the maven-built assembly. Setting
> > SPARK_HOME on the executors is a workaround if you want to use the sbt
> > assembly.
> >
> > On Fri, Feb 20, 2015 at 2:56 PM, Tom Graves
> > <tg...@yahoo.com.invalid> wrote:
> > > Trying to run pyspark on yarn in client mode with basic wordcount
> > example I see the following error when doing the collect:
> > > Error from python worker:  /usr/bin/python: No module named sql
> > > PYTHONPATH was: /grid/3/tmp/yarn-local/usercache/tgraves/filecache/20/spark-assembly-1.3.0-hadoop2.6.0.1.1411101121.jar
> > java.io.EOFException
> >       at java.io.DataInputStream.readInt(DataInputStream.java:392)
> >       at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
> >       at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
> >       at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
> >       at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
> >       at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
> >       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >       at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:308)
> >       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> >       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> >       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> >       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> >       at org.apache.spark.scheduler.Task.run(Task.scala:64)
> >       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
> >       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >       at java.lang.Thread.run(Thread.java:722)
> > > any ideas on this?
> > > Tom
> > >
> > >      On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell <
> > pwendell@gmail.com> wrote:
> > >
> > >
> > >  Please vote on releasing the following candidate as Apache Spark
> > version 1.3.0!
> > >
> > > The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2
> > >
> > > The release files, including signatures, digests, etc. can be found at:
> > > http://people.apache.org/~pwendell/spark-1.3.0-rc1/
> > >
> > > Release artifacts are signed with the following key:
> > > https://people.apache.org/keys/committer/pwendell.asc
> > >
> > > The staging repository for this release can be found at:
> > >
> https://repository.apache.org/content/repositories/orgapachespark-1069/
> > >
> > > The documentation corresponding to this release can be found at:
> > > http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/
> > >
> > > Please vote on releasing this package as Apache Spark 1.3.0!
> > >
> > > The vote is open until Saturday, February 21, at 08:03 UTC and passes
> > > if a majority of at least 3 +1 PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Spark 1.3.0
> > > [ ] -1 Do not release this package because ...
> > >
> > > To learn more about Apache Spark, please see
> > > http://spark.apache.org/
> > >
> > > == How can I help test this release? ==
> > > If you are a Spark user, you can help us test this release by
> > > taking a Spark 1.2 workload and running it on this release candidate,
> > > then reporting any regressions.
> > >
> > > == What justifies a -1 vote for this release? ==
> > > This vote is happening towards the end of the 1.3 QA period,
> > > so -1 votes should only occur for significant regressions from 1.2.1.
> > > Bugs already present in 1.2.X, minor regressions, or bugs related
> > > to new features will not block this release.
> > >
> > > - Patrick
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > > For additional commands, e-mail: dev-help@spark.apache.org
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > Marcelo
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > For additional commands, e-mail: dev-help@spark.apache.org
> >
> >
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Soumitra Kumar <ku...@gmail.com>.
+1 (non-binding)

For: https://issues.apache.org/jira/browse/SPARK-3660

. Docs OK
. Example code is good

-Soumitra.


On Mon, Feb 23, 2015 at 10:33 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> Hi Tom, are you using an sbt-built assembly by any chance? If so, take
> a look at SPARK-5808.
>
> I haven't had any problems with the maven-built assembly. Setting
> SPARK_HOME on the executors is a workaround if you want to use the sbt
> assembly.
>
> On Fri, Feb 20, 2015 at 2:56 PM, Tom Graves
> <tg...@yahoo.com.invalid> wrote:
> > Trying to run pyspark on yarn in client mode with basic wordcount
> example I see the following error when doing the collect:
> > Error from python worker:  /usr/bin/python: No module named sql
> > PYTHONPATH was: /grid/3/tmp/yarn-local/usercache/tgraves/filecache/20/spark-assembly-1.3.0-hadoop2.6.0.1.1411101121.jar
> java.io.EOFException
>       at java.io.DataInputStream.readInt(DataInputStream.java:392)
>       at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
>       at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
>       at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
>       at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
>       at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>       at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:308)
>       at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>       at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>       at org.apache.spark.scheduler.Task.run(Task.scala:64)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:722)
> > any ideas on this?
> > Tom
> >
> >      On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell <
> pwendell@gmail.com> wrote:
> >
> >
> >  Please vote on releasing the following candidate as Apache Spark
> version 1.3.0!
> >
> > The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
> >
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-1.3.0-rc1/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1069/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/
> >
> > Please vote on releasing this package as Apache Spark 1.3.0!
> >
> > The vote is open until Saturday, February 21, at 08:03 UTC and passes
> > if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.3.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see
> > http://spark.apache.org/
> >
> > == How can I help test this release? ==
> > If you are a Spark user, you can help us test this release by
> > taking a Spark 1.2 workload and running it on this release candidate,
> > then reporting any regressions.
> >
> > == What justifies a -1 vote for this release? ==
> > This vote is happening towards the end of the 1.3 QA period,
> > so -1 votes should only occur for significant regressions from 1.2.1.
> > Bugs already present in 1.2.X, minor regressions, or bugs related
> > to new features will not block this release.
> >
> > - Patrick
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> > For additional commands, e-mail: dev-help@spark.apache.org
> >
> >
> >
> >
>
>
>
> --
> Marcelo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Marcelo Vanzin <va...@cloudera.com>.
Hi Tom, are you using an sbt-built assembly by any chance? If so, take
a look at SPARK-5808.

I haven't had any problems with the maven-built assembly. Setting
SPARK_HOME on the executors is a workaround if you want to use the sbt
assembly.
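
For example, a submission along these lines (a sketch; the path and script
name are illustrative, and SPARK_HOME should point at the Spark install on
the cluster nodes):

./bin/spark-submit --master yarn --deploy-mode client \
  --conf spark.executorEnv.SPARK_HOME=/opt/spark \
  wordcount.py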

On Fri, Feb 20, 2015 at 2:56 PM, Tom Graves
<tg...@yahoo.com.invalid> wrote:
> Trying to run pyspark on yarn in client mode with basic wordcount example I see the following error when doing the collect:
> Error from python worker:  /usr/bin/python: No module named sql
> PYTHONPATH was:  /grid/3/tmp/yarn-local/usercache/tgraves/filecache/20/spark-assembly-1.3.0-hadoop2.6.0.1.1411101121.jar
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
>         at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
>         at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
>         at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
>         at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:308)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
> any ideas on this?
> Tom
>
>      On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell <pw...@gmail.com> wrote:
>
>
>  Please vote on releasing the following candidate as Apache Spark version 1.3.0!
>
> The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc1/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1069/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/
>
> Please vote on releasing this package as Apache Spark 1.3.0!
>
> The vote is open until Saturday, February 21, at 08:03 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == How can I help test this release? ==
> If you are a Spark user, you can help us test this release by
> taking a Spark 1.2 workload and running it on this release candidate,
> then reporting any regressions.
>
> == What justifies a -1 vote for this release? ==
> This vote is happening towards the end of the 1.3 QA period,
> so -1 votes should only occur for significant regressions from 1.2.1.
> Bugs already present in 1.2.X, minor regressions, or bugs related
> to new features will not block this release.
>
> - Patrick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>
>
>



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
Trying to run pyspark on yarn in client mode with basic wordcount example I see the following error when doing the collect:
Error from python worker:  /usr/bin/python: No module named sql
PYTHONPATH was:  /grid/3/tmp/yarn-local/usercache/tgraves/filecache/20/spark-assembly-1.3.0-hadoop2.6.0.1.1411101121.jar
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.spark.api.python.PythonWorkerFactory.startDaemon(PythonWorkerFactory.scala:163)
        at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:86)
        at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
        at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:105)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:69)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:308)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
any ideas on this?
Tom 

     On Wednesday, February 18, 2015 2:14 AM, Patrick Wendell <pw...@gmail.com> wrote:
   

 Please vote on releasing the following candidate as Apache Spark version 1.3.0!

The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.3.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1069/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.3.0!

The vote is open until Saturday, February 21, at 08:03 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.3.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.2 workload and running it on this release candidate,
then reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.3 QA period,
so -1 votes should only occur for significant regressions from 1.2.1.
Bugs already present in 1.2.X, minor regressions, or bugs related
to new features will not block this release.

- Patrick

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org



   

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Timothy Chen <tn...@gmail.com>.
+1 (non-binding)

Tested Mesos coarse- and fine-grained modes on a 4-node Mesos cluster with
a simple shuffle/map task.

Will be testing with a more complete suite (i.e., spark-perf) once the
infrastructure is set up to do so.
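
(For reference, switching between the two modes is a single config flag:
fine-grained is the default, and coarse-grained mode is enabled by passing
--conf spark.mesos.coarse=true to spark-submit.)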

Tim

On Thu, Feb 19, 2015 at 12:50 PM, Krishna Sankar <ks...@gmail.com> wrote:
> Excellent. Explicit toDF() works.
> a) employees.toDF().registerTempTable("Employees") - works
> b) Also affects saveAsParquetFile - orders.toDF().saveAsParquetFile
>
> Adding to my earlier tests:
> 4.0 SQL from Scala and Python
> 4.1 result = sqlContext.sql("SELECT * from Employees WHERE State = 'WA'") OK
> 4.2 result = sqlContext.sql("SELECT
> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
> 4.3 result = sqlContext.sql("SELECT ShipCountry, Sum(OrderDetails.UnitPrice
> * Qty * Discount) AS ProductSales FROM Orders INNER JOIN OrderDetails ON
> Orders.OrderID = OrderDetails.OrderID GROUP BY ShipCountry") OK
> 4.4 saveAsParquetFile OK
> 4.5 Read and verify the 4.4 save - sqlContext.parquetFile,
> registerTempTable, sql OK
>
> Cheers & thanks Michael
> <k/>
>
>
>
> On Thu, Feb 19, 2015 at 12:02 PM, Michael Armbrust <mi...@databricks.com>
> wrote:
>
>> P.S: For some reason replacing  "import sqlContext.createSchemaRDD" with "
>>> import sqlContext.implicits._" doesn't do the implicit conversions.
>>> registerTempTable
>>> gives a syntax error. I will dig deeper tomorrow. Has anyone seen this?
>>
>>
>> We will write up a whole migration guide before the final release, but I
>> can quickly explain this one.  We made the implicit conversion
>> significantly less broad to avoid the chance of confusing conflicts.
>> However, now you have to call .toDF in order to force RDDs to become
>> DataFrames.
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Krishna Sankar <ks...@gmail.com>.
Excellent. Explicit toDF() works.
a) employees.toDF().registerTempTable("Employees") - works
b) Also affects saveAsParquetFile - orders.toDF().saveAsParquetFile

Adding to my earlier tests:
4.0 SQL from Scala and Python
4.1 result = sqlContext.sql("SELECT * from Employees WHERE State = 'WA'") OK
4.2 result = sqlContext.sql("SELECT
OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
4.3 result = sqlContext.sql("SELECT ShipCountry, Sum(OrderDetails.UnitPrice
* Qty * Discount) AS ProductSales FROM Orders INNER JOIN OrderDetails ON
Orders.OrderID = OrderDetails.OrderID GROUP BY ShipCountry") OK
4.4 saveAsParquetFile OK
4.5 Read and verify the 4.4 save - sqlContext.parquetFile,
registerTempTable, sql OK

Cheers & thanks Michael
<k/>



On Thu, Feb 19, 2015 at 12:02 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> P.S: For some reason replacing  "import sqlContext.createSchemaRDD" with "
>> import sqlContext.implicits._" doesn't do the implicit conversions.
>> registerTempTable
>> gives a syntax error. I will dig deeper tomorrow. Has anyone seen this?
>
>
> We will write up a whole migration guide before the final release, but I
> can quickly explain this one.  We made the implicit conversion
> significantly less broad to avoid the chance of confusing conflicts.
> However, now you have to call .toDF in order to force RDDs to become
> DataFrames.
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Michael Armbrust <mi...@databricks.com>.
>
> P.S: For some reason replacing  "import sqlContext.createSchemaRDD" with "
> import sqlContext.implicits._" doesn't do the implicit conversions.
> registerTempTable
> gives a syntax error. I will dig deeper tomorrow. Has anyone seen this?


We will write up a whole migration guide before the final release, but I
can quickly explain this one.  We made the implicit conversion
significantly less broad to avoid the chance of confusing conflicts.
However, now you have to call .toDF in order to force RDDs to become
DataFrames.
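
Concretely, the 1.3 version of the pattern looks like this (a minimal
sketch; the class and table names are illustrative):

    import sqlContext.implicits._

    case class Employee(name: String, state: String)
    val employees = sc.parallelize(Seq(Employee("Alice", "WA")))

    // 1.2.x converted the RDD implicitly; in 1.3 you call .toDF()
    // explicitly to get a DataFrame before registering it.
    employees.toDF().registerTempTable("Employees")
    sqlContext.sql("SELECT * FROM Employees WHERE state = 'WA'").show()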

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Krishna Sankar <ks...@gmail.com>.
+1 (non-binding, of course)

1. Compiled OSX 10.10 (Yosemite) OK Total time: 14:50 min
     mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests -Dscala-2.11
2. Tested pyspark, mlib - running as well as compare results with 1.1.x &
1.2.x
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK

But MSE has increased from 40.81 to 105.86. Has some refactoring happened
on SGD/linear models? Or do we have some extra parameters, or a change
of defaults?

2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK
       Center And Scale OK
       WSSSE has come down slightly
2.5. rdd operations OK
      State of the Union Texts - MapReduce, Filter, sortByKey (word count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
       Model evaluation/optimization (rank, numIter, lmbda) with itertools
OK
3. Scala - MLlib
3.1. statistics (min,max,mean,Pearson,Spearman) OK
3.2. LinearRegressionWIthSGD OK
3.3. Decision Tree OK
3.4. KMeans OK
3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
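
The statistics items above boil down to a few lines each; for example, a
sketch of the 3.1 check (illustrative, not the exact script used):

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.Statistics

    val data = sc.parallelize(Seq(
      Vectors.dense(1.0, 10.0),
      Vectors.dense(2.0, 20.0),
      Vectors.dense(3.0, 30.0)))

    val summary = Statistics.colStats(data)  // per-column min, max, mean
    println(summary.mean)
    println(Statistics.corr(data, "pearson"))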

Cheers
<k/>
P.S: For some reason replacing "import sqlContext.createSchemaRDD" with
"import sqlContext.implicits._" doesn't do the implicit conversions.
registerTempTable gives a syntax error. I will dig deeper tomorrow. Has
anyone seen this?

On Wed, Feb 18, 2015 at 3:25 PM, Sean Owen <so...@cloudera.com> wrote:

> On Wed, Feb 18, 2015 at 6:13 PM, Patrick Wendell <pw...@gmail.com>
> wrote:
> >> Patrick this link gives a 404:
> >> https://people.apache.org/keys/committer/pwendell.asc
> >
> > Works for me. Maybe it's some ephemeral issue?
>
> Yes, works now; I swear it didn't before! That's all set now. The
> signing key is in that file.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Corey Nolet <cj...@gmail.com>.
+1 (non-binding)

- Verified signatures using [1]
- Built on MacOSX Yosemite
- Built on Fedora 21

Each build was run with the Hadoop 2.4 version and the yarn, hive, and
hive-thriftserver profiles.

I am having trouble getting all the tests passing in a single run on both
machines, but we have this same problem on other projects as well.

[1] https://github.com/cjnolet/nexus-staging-gpg-verify
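
For anyone verifying by hand, the manual equivalent is roughly (a sketch;
file names follow the staging directory layout):

    gpg --import pwendell.asc
    gpg --verify spark-1.3.0.tgz.asc spark-1.3.0.tgz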


On Wed, Feb 18, 2015 at 6:25 PM, Sean Owen <so...@cloudera.com> wrote:

> On Wed, Feb 18, 2015 at 6:13 PM, Patrick Wendell <pw...@gmail.com>
> wrote:
> >> Patrick this link gives a 404:
> >> https://people.apache.org/keys/committer/pwendell.asc
> >
> > Works for me. Maybe it's some ephemeral issue?
>
> Yes, works now; I swear it didn't before! That's all set now. The
> signing key is in that file.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Sean Owen <so...@cloudera.com>.
On Wed, Feb 18, 2015 at 6:13 PM, Patrick Wendell <pw...@gmail.com> wrote:
>> Patrick this link gives a 404:
>> https://people.apache.org/keys/committer/pwendell.asc
>
> Works for me. Maybe it's some ephemeral issue?

Yes, works now; I swear it didn't before! That's all set now. The
signing key is in that file.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Patrick Wendell <pw...@gmail.com>.
> UISeleniumSuite:
> *** RUN ABORTED ***
>   java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
> ...

This is a newer test suite. There is something flaky about it and we
should definitely fix it, but IMO it's not a blocker.

>
> Patrick this link gives a 404:
> https://people.apache.org/keys/committer/pwendell.asc

Works for me. Maybe it's some ephemeral issue?

> Finally, I already realized I failed to get the fix for
> https://issues.apache.org/jira/browse/SPARK-5669 correct, and that has
> to be correct for the release. I'll patch that up straight away,
> sorry. I believe the result of the intended fix is still as I
> described in SPARK-5669, so there is no bad news there. A local test
> seems to confirm it and I'm waiting on Jenkins. If it's all good I'll
> merge that fix. So, that much will need a new release, I apologize.

Thanks for finding this. I'm going to leave this open for continued testing...

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Sean Owen <so...@cloudera.com>.
On OS X and Ubuntu I see the following test failure in the source
release for 1.3.0-RC1:

UISeleniumSuite:
*** RUN ABORTED ***
  java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
...


Patrick this link gives a 404:
https://people.apache.org/keys/committer/pwendell.asc


Finally, I already realized I failed to get the fix for
https://issues.apache.org/jira/browse/SPARK-5669 correct, and that has
to be correct for the release. I'll patch that up straight away,
sorry. I believe the result of the intended fix is still as I
described in SPARK-5669, so there is no bad news there. A local test
seems to confirm it and I'm waiting on Jenkins. If it's all good I'll
merge that fix. So, that much will need a new release, I apologize.


Please keep testing anyway!


Otherwise, I verified the signatures are correct, licenses are
correct, compiles on OS X and Ubuntu 14.


On Wed, Feb 18, 2015 at 8:12 AM, Patrick Wendell <pw...@gmail.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.3.0!
>
> The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc1/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1069/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/
>
> Please vote on releasing this package as Apache Spark 1.3.0!
>
> The vote is open until Saturday, February 21, at 08:03 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == How can I help test this release? ==
> If you are a Spark user, you can help us test this release by
> taking a Spark 1.2 workload and running it on this release candidate,
> then reporting any regressions.
>
> == What justifies a -1 vote for this release? ==
> This vote is happening towards the end of the 1.3 QA period,
> so -1 votes should only occur for significant regressions from 1.2.1.
> Bugs already present in 1.2.X, minor regressions, or bugs related
> to new features will not block this release.
>
> - Patrick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Cheng Lian <li...@gmail.com>.
My bad, I had once fixed all Hive 12 test failures in PR #4107, but didn't
get time to get it merged.

Considering the release is close, I can cherry-pick those Hive 12 fixes 
from #4107 and open a more surgical PR soon.

Cheng

On 2/24/15 4:18 AM, Michael Armbrust wrote:
> On Sun, Feb 22, 2015 at 11:20 PM, Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
>> So what are we expecting of Hive 0.12.0 builds with this RC?  I know not
>> every combination of Hadoop and Hive versions, etc., can be supported, but
>> even an example build from the "Building Spark" page isn't looking too good
>> to me.
>>
> I would definitely expect this to build and we do actually test that for
> each PR.  We don't yet run the tests for both versions of Hive and thus
> unfortunately these do get out of sync.  Usually these are just problems
> diff-ing golden output or cases where we have added a test that uses a
> feature not available in hive 12.
>
> Have you seen problems with using Hive 12 outside of these test failures?
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Nothing that I can point to, so this may only be a problem in test scope.
I am looking at a problem where some UDFs that run with 0.12 fail with
0.13; but that problem is already present in Spark 1.2.x, so it's not a
blocking regression for 1.3.  (Very likely a HiveFunctionWrapper serde
problem, but I haven't run it to ground yet.)

On Mon, Feb 23, 2015 at 12:18 PM, Michael Armbrust <mi...@databricks.com>
wrote:

> On Sun, Feb 22, 2015 at 11:20 PM, Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
>> So what are we expecting of Hive 0.12.0 builds with this RC?  I know not
>> every combination of Hadoop and Hive versions, etc., can be supported, but
>> even an example build from the "Building Spark" page isn't looking too
>> good
>> to me.
>>
>
> I would definitely expect this to build and we do actually test that for
> each PR.  We don't yet run the tests for both versions of Hive and thus
> unfortunately these do get out of sync.  Usually these are just problems
> diff-ing golden output or cases where we have added a test that uses a
> feature not available in hive 12.
>
> Have you seen problems with using Hive 12 outside of these test failures?
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Michael Armbrust <mi...@databricks.com>.
On Sun, Feb 22, 2015 at 11:20 PM, Mark Hamstra <ma...@clearstorydata.com>
wrote:

> So what are we expecting of Hive 0.12.0 builds with this RC?  I know not
> every combination of Hadoop and Hive versions, etc., can be supported, but
> even an example build from the "Building Spark" page isn't looking too good
> to me.
>

I would definitely expect this to build and we do actually test that for
each PR.  We don't yet run the tests for both versions of Hive and thus
unfortunately these do get out of sync.  Usually these are just problems
diff-ing golden output or cases where we have added a test that uses a
feature not available in hive 12.

Have you seen problems with using Hive 12 outside of these test failures?

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Patrick Wendell <pw...@gmail.com>.
It's only been reported on this thread by Tom, so far.

On Mon, Feb 23, 2015 at 10:29 AM, Marcelo Vanzin <va...@cloudera.com> wrote:
> Hey Patrick,
>
> Do you have a link to the bug related to Python and Yarn? I looked at
> the blockers in Jira but couldn't find it.
>
> On Mon, Feb 23, 2015 at 10:18 AM, Patrick Wendell <pw...@gmail.com> wrote:
>> So actually, the list of blockers on JIRA is a bit outdated. These
>> days I won't cut RC1 unless there are no known issues that I'm aware
>> of that would actually block the release (that's what the snapshot
>> ones are for). I'm going to clean those up and push others to do so
>> also.
>>
>> The main issues I'm aware of that came about post RC1 are:
>> 1. Python submission broken on YARN
>> 2. The license issue in MLlib [now fixed].
>> 3. Varargs broken for Java Dataframes [now fixed]
>>
>> Re: Corey - yeah, as it stands now I try to wait if there are things
>> that look like implicit -1 votes.
>
> --
> Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Marcelo Vanzin <va...@cloudera.com>.
Hey Patrick,

Do you have a link to the bug related to Python and Yarn? I looked at
the blockers in Jira but couldn't find it.

On Mon, Feb 23, 2015 at 10:18 AM, Patrick Wendell <pw...@gmail.com> wrote:
> So actually, the list of blockers on JIRA is a bit outdated. These
> days I won't cut RC1 unless there are no known issues that I'm aware
> of that would actually block the release (that's what the snapshot
> ones are for). I'm going to clean those up and push others to do so
> also.
>
> The main issues I'm aware of that came about post RC1 are:
> 1. Python submission broken on YARN
> 2. The license issue in MLlib [now fixed].
> 3. Varargs broken for Java Dataframes [now fixed]
>
> Re: Corey - yeah, as it stands now I try to wait if there are things
> that look like implicit -1 votes.

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Patrick Wendell <pw...@gmail.com>.
So actually, the list of blockers on JIRA is a bit outdated. These
days I won't cut RC1 unless there are no known issues that I'm aware
of that would actually block the release (that's what the snapshot
ones are for). I'm going to clean those up and push others to do so
also.

The main issues I'm aware of that came about post RC1 are:
1. Python submission broken on YARN
2. The license issue in MLlib [now fixed].
3. Varargs broken for Java Dataframes [now fixed]

Re: Corey - yeah, as it stands now I try to wait if there are things
that look like implicit -1 votes.

On Mon, Feb 23, 2015 at 6:13 AM, Corey Nolet <cj...@gmail.com> wrote:
> Thanks Sean. I glossed over the comment about SPARK-5669.
>
> On Mon, Feb 23, 2015 at 9:05 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>> Yes, my understanding from Patrick's comment is that this RC will not
>> be released, but that we should keep testing. There's an implicit -1
>> out of the gates there, I believe, and so the vote won't pass, so
>> perhaps that's why there weren't further binding votes. I'm sure that
>> will be formalized shortly.
>>
>> FWIW here are 10 issues still listed as blockers for 1.3.0:
>>
>> SPARK-5910 DataFrame.selectExpr("col as newName") does not work
>> SPARK-5904 SPARK-5166 DataFrame methods with varargs do not work in Java
>> SPARK-5873 Can't see partially analyzed plans
>> SPARK-5546 Improve path to Kafka assembly when trying Kafka Python API
>> SPARK-5517 SPARK-5166 Add input types for Java UDFs
>> SPARK-5463 Fix Parquet filter push-down
>> SPARK-5310 SPARK-5166 Update SQL programming guide for 1.3
>> SPARK-5183 SPARK-5180 Document data source API
>> SPARK-3650 Triangle Count handles reverse edges incorrectly
>> SPARK-3511 Create a RELEASE-NOTES.txt file in the repo
>>
>>
>> On Mon, Feb 23, 2015 at 1:55 PM, Corey Nolet <cj...@gmail.com> wrote:
>> > This vote was supposed to close on Saturday but it looks like no PMCs
>> > voted
>> > (other than the implicit vote from Patrick). Was there a discussion
>> > offline
>> > to cut an RC2? Was the vote extended?
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Corey Nolet <cj...@gmail.com>.
Thanks Sean. I glossed over the comment about SPARK-5669.

On Mon, Feb 23, 2015 at 9:05 AM, Sean Owen <so...@cloudera.com> wrote:

> Yes, my understanding from Patrick's comment is that this RC will not
> be released, but that we should keep testing. There's an implicit -1
> out of the gates there, I believe, and so the vote won't pass, so
> perhaps that's why there weren't further binding votes. I'm sure that
> will be formalized shortly.
>
> FWIW here are 10 issues still listed as blockers for 1.3.0:
>
> SPARK-5910 DataFrame.selectExpr("col as newName") does not work
> SPARK-5904 SPARK-5166 DataFrame methods with varargs do not work in Java
> SPARK-5873 Can't see partially analyzed plans
> SPARK-5546 Improve path to Kafka assembly when trying Kafka Python API
> SPARK-5517 SPARK-5166 Add input types for Java UDFs
> SPARK-5463 Fix Parquet filter push-down
> SPARK-5310 SPARK-5166 Update SQL programming guide for 1.3
> SPARK-5183 SPARK-5180 Document data source API
> SPARK-3650 Triangle Count handles reverse edges incorrectly
> SPARK-3511 Create a RELEASE-NOTES.txt file in the repo
>
>
> On Mon, Feb 23, 2015 at 1:55 PM, Corey Nolet <cj...@gmail.com> wrote:
> > This vote was supposed to close on Saturday but it looks like no PMCs
> voted
> > (other than the implicit vote from Patrick). Was there a discussion
> offline
> > to cut an RC2? Was the vote extended?
>

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Sean Owen <so...@cloudera.com>.
Yes, my understanding from Patrick's comment is that this RC will not
be released, but that we should keep testing. There's an implicit -1
out of the gates there, I believe, and so the vote won't pass, so
perhaps that's why there weren't further binding votes. I'm sure that
will be formalized shortly.

FWIW here are 10 issues still listed as blockers for 1.3.0:

SPARK-5910 DataFrame.selectExpr("col as newName") does not work
SPARK-5904 SPARK-5166 DataFrame methods with varargs do not work in Java
SPARK-5873 Can't see partially analyzed plans
SPARK-5546 Improve path to Kafka assembly when trying Kafka Python API
SPARK-5517 SPARK-5166 Add input types for Java UDFs
SPARK-5463 Fix Parquet filter push-down
SPARK-5310 SPARK-5166 Update SQL programming guide for 1.3
SPARK-5183 SPARK-5180 Document data source API
SPARK-3650 Triangle Count handles reverse edges incorrectly
SPARK-3511 Create a RELEASE-NOTES.txt file in the repo


On Mon, Feb 23, 2015 at 1:55 PM, Corey Nolet <cj...@gmail.com> wrote:
> This vote was supposed to close on Saturday but it looks like no PMCs voted
> (other than the implicit vote from Patrick). Was there a discussion offline
> to cut an RC2? Was the vote extended?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Corey Nolet <cj...@gmail.com>.
This vote was supposed to close on Saturday but it looks like no PMCs voted
(other than the implicit vote from Patrick). Was there a discussion offline
to cut an RC2? Was the vote extended?

On Mon, Feb 23, 2015 at 6:59 AM, Robin East <ro...@xense.co.uk> wrote:

> Running ec2 launch scripts gives me the following error:
>
> ssl.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL
> routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
>
> Full stack trace at
> https://gist.github.com/insidedctm/4d41600bc22560540a26
>
> I’m running OSX Mavericks 10.9.5
>
> I’ll investigate further but wondered if anyone else has run into this.
>
> Robin

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Robin East <ro...@xense.co.uk>.
Running ec2 launch scripts gives me the following error:

ssl.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Full stack trace at
https://gist.github.com/insidedctm/4d41600bc22560540a26

I’m running OSX Mavericks 10.9.5

I’ll investigate further but wondered if anyone else has run into this.

Robin

Re: [VOTE] Release Apache Spark 1.3.0 (RC1)

Posted by Mark Hamstra <ma...@clearstorydata.com>.
So what are we expecting of Hive 0.12.0 builds with this RC?  I know not
every combination of Hadoop and Hive versions, etc., can be supported, but
even an example build from the "Building Spark" page isn't looking too good
to me.

Working from f97b0d4, the example build command works: mvn -Pyarn
-Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-0.12.0
-Phive-thriftserver -DskipTests clean package
...but then running the tests results in multiple failures in the Hive and
Hive Thrift Server sub-projects.


On Wed, Feb 18, 2015 at 12:12 AM, Patrick Wendell <pw...@gmail.com>
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 1.3.0!
>
> The tag to be voted on is v1.3.0-rc1 (commit f97b0d4a):
>
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=f97b0d4a6b26504916816d7aefcf3132cd1da6c2
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc1/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1069/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.3.0-rc1-docs/
>
> Please vote on releasing this package as Apache Spark 1.3.0!
>
> The vote is open until Saturday, February 21, at 08:03 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/
>
> == How can I help test this release? ==
> If you are a Spark user, you can help us test this release by
> taking a Spark 1.2 workload and running it on this release candidate,
> then reporting any regressions.
>
> == What justifies a -1 vote for this release? ==
> This vote is happening towards the end of the 1.3 QA period,
> so -1 votes should only occur for significant regressions from 1.2.1.
> Bugs already present in 1.2.X, minor regressions, or bugs related
> to new features will not block this release.
>
> - Patrick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>