Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2016/12/16 05:16:41 UTC

[VOTE] Apache Spark 2.1.0 (RC5)

Please vote on releasing the following candidate as Apache Spark version
2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.0-rc5
(cd0a08361e2526519e7c131c42116bf56fa62c76)

The list of resolved JIRA tickets is:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
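
To sanity-check a download against the posted digests, a minimal Python sketch (the artifact filename is a placeholder, and md5/sha512 are assumptions about which digest files are posted; verifying the .asc signatures additionally needs gpg and the signing key below):

    import hashlib

    # Recompute digests for a downloaded artifact and compare them against
    # the digest files posted alongside it. The filename is a placeholder.
    def digest(path, algo):
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        return h.hexdigest()

    for algo in ("md5", "sha512"):
        print(algo, digest("spark-2.1.0-bin-hadoop2.7.tgz", algo))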

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1223/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/


*FAQ*

*How can I help test this release?*

If you are a Spark user, you can help us test this release by taking an
existing Spark workload, running it on this release candidate, and
reporting any regressions.
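
As a quick first pass before a real workload, a minimal PySpark smoke-test sketch (the input path and app name are placeholders; run it with the release candidate's bin/spark-submit, or drop the SparkContext setup inside its bin/pyspark shell, where sc already exists):

    from operator import add
    from pyspark import SparkContext

    # Word count over any local text file; README.md is just a placeholder.
    sc = SparkContext(appName="rc5-smoke-test")
    counts = (sc.textFile("README.md")
              .flatMap(lambda line: line.split())
              .map(lambda word: (word, 1))
              .reduceByKey(add))
    print(counts.take(10))
    sc.stop()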

*What should happen to JIRA tickets still targeting 2.1.0?*

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. Please retarget everything else to 2.1.1 or 2.2.0.

*What happened to RC3/RC4?*

They had issues with the release packaging and as a result were skipped.

Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Felix Cheung <fe...@hotmail.com>.
0/+1

Tested a bunch of R package/install cases.
Unfortunately we are still working on SPARK-18817, which looks to be a behavior change going from Spark 1.6 to 2.0. In that case it won't be a blocker for this release.


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by vaquar khan <va...@gmail.com>.
+1 (non-binding)

Regards,
vaquar khan


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Adam Roberts <AR...@uk.ibm.com>.
+1 (non-binding)

Functional: looks good, tested with OpenJDK 8 (1.8.0_111) and IBM's latest 
SDK for Java (8 SR3 FP21).

Tests run clean on Ubuntu 16.04 and 14.04, SUSE 12, and CentOS 7.2 on x86 and
IBM-specific platforms including big-endian. On slower machines I see the
following tests failing due to timeouts, but nothing to be concerned about:

org.apache.spark.DistributedSuite.caching on disk
org.apache.spark.rdd.LocalCheckpointSuite.missing checkpoint block fails with informative message
org.apache.spark.sql.streaming.StreamingAggregationSuite.prune results by current_time, complete mode
org.apache.spark.sql.streaming.StreamingAggregationSuite.prune results by current_date, complete mode
org.apache.spark.sql.hive.HiveSparkSubmitSuite.set hive.metastore.warehouse.dir

Performance vs 2.0.2: lots of improvements seen using the HiBench and
SparkSqlPerf benchmarks, tested on a 48-core Intel machine with the Kryo
serializer in a controlled test environment. These are all open-source
benchmarks anyone can use and experiment with. Elapsed times were measured;
positive scores are an improvement (that much percent faster) and negative
scores are regressions I'm seeing.

K-means: Java API +22% (100 sec to 78 sec), Scala API +30% (34 seconds to 24 seconds), Python API unchanged
PageRank: minor improvement from 40 seconds to 38 seconds, +5%
Sort: minor improvement, 10.8 seconds to 9.8 seconds, +10%
WordCount: unchanged
Bayes: mixed bag, sometimes much slower (95 sec to 140 sec), which is -47%; other times marginally faster by 15%, something to keep an eye on
Terasort: +18% (39 seconds to 32 seconds) with the Java/Scala APIs

For TPC-DS SQL queries the results are a mixed bag again: I see >10% boosts
for q9, q68, q75, q96 and >10% slowdowns for q7, q39a, q43, q52, q57, q89.
Five iterations, average times compared, changing only which version of
Spark we're using.
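
For anyone reproducing this kind of comparison, a minimal PySpark sketch of the methodology described above (Kryo serializer enabled, several iterations, elapsed times averaged); the app name, iteration count, and workload body are placeholders, not the actual HiBench/SparkSqlPerf harness:

    from time import perf_counter
    from pyspark import SparkConf, SparkContext

    # Enable the Kryo serializer, as in the runs above.
    conf = (SparkConf()
            .setAppName("rc5-benchmark")
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))
    sc = SparkContext(conf=conf)

    def average_seconds(workload, iterations=5):
        # Average elapsed wall-clock time of `workload` over `iterations` runs.
        times = []
        for _ in range(iterations):
            start = perf_counter()
            workload()
            times.append(perf_counter() - start)
        return sum(times) / iterations

    # Placeholder workload; substitute the benchmark under test.
    avg = average_seconds(lambda: sc.parallelize(range(10 ** 6)).map(lambda x: x * 2).count())
    print("average elapsed: %.1f s" % avg)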




Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Holden Karau <ho...@pigscanfly.ca>.
+1 (non-binding) - checked Python artifacts with virtual env.
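
For reference, a sketch of that kind of check; the install path is a placeholder, the layout assumes POSIX (use Scripts\ on Windows), and it assumes the RC's python/ directory is pip-installable:

    import subprocess
    import venv

    # Create a clean virtualenv and install the staged PySpark package into it.
    env_dir = "rc5-env"
    venv.create(env_dir, with_pip=True)
    pip = env_dir + "/bin/pip"
    python = env_dir + "/bin/python"
    subprocess.check_call([pip, "install", "/path/to/spark-2.1.0/python"])

    # Smoke test: the import should resolve from inside the clean env.
    subprocess.check_call([python, "-c", "import pyspark; print(pyspark.__file__)"])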


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Denny Lee <de...@gmail.com>.
+1 (non-binding)



Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Liwei Lin <lw...@gmail.com>.
+1

Cheers,
Liwei


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Yuming Wang <wg...@gmail.com>.
I hope https://github.com/apache/spark/pull/16252 can be merged before the
2.1.0 release. It fixes a case where a broadcast cannot fit in memory.


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Joseph Bradley <jo...@databricks.com>.
+1


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Herman van Hövell tot Westerflier <hv...@databricks.com>.
+1


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Xiao Li <ga...@gmail.com>.
+1

Xiao Li


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Felix Cheung <fe...@hotmail.com>.
For R we have a license field in the DESCRIPTION, and this is standard practice (and a requirement) for R packages.

https://cran.r-project.org/doc/manuals/R-exts.html#Licensing

________________________________
From: Sean Owen <so...@cloudera.com>
Sent: Friday, December 16, 2016 9:57:15 AM
To: Reynold Xin; dev@spark.apache.org
Subject: Re: [VOTE] Apache Spark 2.1.0 (RC5)

(If you have a template for these emails, maybe update it to use https links. They work for apache.org domains. After all, we are asking people to verify the integrity of release artifacts, so it might as well be secure.)

(Also the new archives use .tar.gz instead of .tgz like the others. No big deal, my OCD eye just noticed it.)

I don't see an Apache license / notice for the Pyspark or SparkR artifacts. It would be good practice to include this in a convenience binary. I'm not sure if it's strictly mandatory, but something to adjust in any event. I think that's all there is to do for SparkR. For Pyspark, which packages a bunch of dependencies, it does include the licenses (good) but I think it should include the NOTICE file.

This is the first time I recall getting 0 test failures off the bat!
I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.

I think I'd therefore +1 this unless someone knows that the license issue above is real and a blocker.
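
As a quick way to check the LICENSE/NOTICE point above, a small Python sketch; the tarball name is a placeholder for the PySpark artifact shipped with the RC:

    import tarfile

    # List any LICENSE/NOTICE entries bundled in the PySpark artifact.
    with tarfile.open("pyspark-2.1.0.tar.gz", "r:gz") as tar:
        for name in tar.getnames():
            if "LICENSE" in name or "NOTICE" in name:
                print(name)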



Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Dongjoon Hyun <do...@apache.org>.
RC5 is also tested on CentOS 6.8, OpenJDK 1.8.0_111, R 3.3.2 with profiles `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr`.

BTW, the following issues are still open in JIRA (with target version 2.1.0).

1. SPARK-16845  org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
2. SPARK-18669 Update Apache docs regard watermarking in Structured Streaming
3. SPARK-18894 Event time watermark delay threshold specified in months or years gives incorrect results
4. SPARK-18899 append data to a bucketed table with mismatched bucketing should fail

+1 with known issues for now.

Bests,
Dongjoon.




Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Sean Owen <so...@cloudera.com>.
(If you have a template for these emails, maybe update it to use https
links. They work for apache.org domains. After all we are asking people to
verify the integrity of release artifacts, so it might as well be secure.)

(Also the new archives use .tar.gz instead of .tgz like the others. No big
deal, my OCD eye just noticed it.)

I don't see an Apache license / notice for the PySpark or SparkR artifacts.
It would be good practice to include these in a convenience binary. I'm not
sure if it's strictly mandatory, but it's something to adjust in any event. I
think that's all there is to do for SparkR. For PySpark, which packages a
bunch of dependencies, it does include the licenses (good), but I think it
should also include the NOTICE file.

This is the first time I recall getting 0 test failures off the bat!
I'm using Java 8 / Ubuntu 16 and yarn/hive/hadoop-2.7 profiles.

Therefore I think I'd +1 this, unless someone knows that the license issue
above is real and a blocker.


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Holden Karau <ho...@pigscanfly.ca>.
Thanks for the specific mention of the new PySpark packaging, Shivaram.

For *nix (Linux, Unix, OS X, etc.) Python users interested in helping test
the new artifacts, you can do as follows:

Set up PySpark with pip by following these steps:

1. Download the artifact from
http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/pyspark-2.1.0+hadoop2.7.tar.gz
2. (Optional) Create a virtualenv (e.g. virtualenv /tmp/pysparktest;
source /tmp/pysparktest/bin/activate)
3. (Possibly required, depending on your pip version) Upgrade pip to a
recent version (e.g. pip install --upgrade pip)
4. Install the package with pip install pyspark-2.1.0+hadoop2.7.tar.gz
5. If you have SPARK_HOME set to any specific path, unset it to force the
pip-installed PySpark to run with its provided jars (a quick Python check
is sketched below)
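
To double-check step 5 before testing, here is a minimal sketch (the assert
message and the print are purely illustrative):

    import os
    import pyspark

    # SPARK_HOME should be unset so the pip-provided jars are picked up.
    assert os.environ.get("SPARK_HOME") is None, "unset SPARK_HOME first"

    # This should point into your (virtual) env's site-packages,
    # not into a local Spark checkout or download.
    print(pyspark.__file__)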

In the future we hope to publish to PyPI, allowing you to skip the download
step, but there just wasn't a chance to get that part included for this
release. If everything goes smoothly, hopefully we can add that soon (see
SPARK-18128 <https://issues.apache.org/jira/browse/SPARK-18128>) :)

Some things to verify:
1) Verify you can start the PySpark shell (e.g. run pyspark)
2) Verify you can start PySpark from python (e.g. run python, verify you
can import pyspark and construct a SparkContext; see the sketch after this
list)
3) Verify your PySpark programs work with pip-installed PySpark as well as
with regular Spark (e.g. spark-submit my-workload.py)
4) Have a different version of Spark downloaded locally as well? Verify
that it launches and runs correctly and that the pip-installed PySpark is
not taking precedence (make sure to use the fully qualified path when
executing).
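
For item 2, something like the following should run cleanly (a minimal
sketch; the app name and the toy job are arbitrary):

    from pyspark import SparkContext

    sc = SparkContext(master="local[2]", appName="rc5-pip-smoke-test")
    # A trivial job to confirm the pip-installed jars actually execute.
    assert sc.parallelize(range(100)).sum() == 4950
    sc.stop()
    print("pip-installed PySpark looks OK")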

Some things that are explicitly not supported in pip-installed PySpark:
1) Starting a new standalone cluster with pip-installed PySpark (connecting
to an existing standalone cluster is expected to work; see the sketch after
this list)
2) non-Python Spark interfaces (e.g. don't pip install pyspark for SparkR;
use the SparkR packaging instead :))
3) PyPI - if things go well, coming in a future release (track the progress
on https://issues.apache.org/jira/browse/SPARK-18128)
4) Python versions prior to 2.7
5) Full Windows support - a later follow-up task (if you're interested in
this, please chat with me or see
https://issues.apache.org/jira/browse/SPARK-18136)
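
For item 1, connecting from pip-installed PySpark to an already-running
standalone cluster would look roughly like this (a sketch;
spark://master-host:7077 is a placeholder for your real master URL):

    from pyspark import SparkContext

    # The master URL below is hypothetical; substitute your cluster's.
    sc = SparkContext(master="spark://master-host:7077",
                      appName="pip-pyspark-cluster-test")
    print(sc.parallelize([1, 2, 3]).count())
    sc.stop()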

Post-verification cleanup:
1. Uninstall the pip-installed PySpark, since it is just an RC and you don't
want it getting in the way later (e.g. pip uninstall pyspark)
2. (Optional) Deactivate your pip environment

If anyone has any questions about the new PySpark packaging I'm more than
happy to chat :)

Cheers,

Holden :)





-- 
Twitter: https://twitter.com/holdenkarau

Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Reynold Xin <rx...@databricks.com>.
I'm going to start this with a +1!



Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
In addition to the usual binary artifacts, this is the first release where
we have installable packages for Python [1] and R [2] as part of
the release. I'm including instructions to test the R package below.
Holden / other Python developers can chime in if there are special
instructions to test the pip package.

To test the R source package, follow these steps:
1. Download the SparkR source package from
http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/SparkR_2.1.0.tar.gz
2. Install the source package with R CMD INSTALL SparkR_2.1.0.tar.gz
3. As the SparkR package doesn't contain the Spark JARs (due to
package size limits on CRAN), we'll need to run [3]
export SPARKR_RELEASE_DOWNLOAD_URL="http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/spark-2.1.0-bin-hadoop2.6.tgz"
4. Launch R. You can now load SparkR with `library(SparkR)` and
test it with your applications.
5. Note that the first time a SparkSession is created, the binary
artifacts will be downloaded.

Thanks
Shivaram

[1] https://issues.apache.org/jira/browse/SPARK-18267
[2] https://issues.apache.org/jira/browse/SPARK-18590
[3] Note that this isn't required once 2.1.0 has been released as
SparkR can automatically resolve and download releases.

On Thu, Dec 15, 2016 at 9:16 PM, Reynold Xin <rx...@databricks.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.1.0. The vote is open until Sun, December 18, 2016 at 21:30 PT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.0
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.0-rc5
> (cd0a08361e2526519e7c131c42116bf56fa62c76)
>
> List of JIRA tickets resolved are:
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.0
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1223/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc5-docs/
>
>
> FAQ
>
> How can I help test this release?
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> What should happen to JIRA tickets still targeting 2.1.0?
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.1.1 or 2.2.0.
>
> What happened to RC3/RC5?
>
> They had issues withe release packaging and as a result were skipped.
>



Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Nicholas Chammas <ni...@gmail.com>.
Since it's not a regression from 2.0 (I believe the same issue affects both
2.0 and 2.1), it doesn't merit a -1 vote according to the voting guidelines.

Of course, it would be nice if we could fix the various optimizer issues
that all seem to have a workaround involving persist() (another one is
SPARK-18492 <https://issues.apache.org/jira/browse/SPARK-18492>), but I
don't think this should block the release.
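
For anyone hitting one of these in the meantime, the workaround pattern
looks roughly like the sketch below (hypothetical PySpark code; the toy
DataFrame and transformations merely stand in for whatever query triggers
the bug):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("persist-workaround").getOrCreate()

    df = spark.range(1000).withColumnRenamed("id", "x")
    # Materializing the intermediate result with persist() sidesteps the
    # optimizer path that triggers these bugs, at the cost of caching.
    intermediate = df.filter("x % 2 = 0").persist()
    intermediate.count()  # force materialization
    result = intermediate.join(df, "x")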


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Franklyn D'souza <fr...@shopify.com>.
-1. https://issues.apache.org/jira/browse/SPARK-18589 hasn't been resolved
by this release and is a blocker for our adoption of Spark 2.0. I've updated
the issue with some steps to reproduce the error.


Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Sean Owen <so...@cloudera.com>.
PS, here are the open issues still targeting 2.1.0, which I forgot to
include earlier. No Blockers, but one "Critical":

SPARK-16845
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering"
grows beyond 64 KB

SPARK-18669 Update Apache docs regard watermarking in Structured Streaming

SPARK-18894 Event time watermark delay threshold specified in months or
years gives incorrect results

SPARK-18899 append data to a bucketed table with mismatched bucketing
should fail

SPARK-18909 The error message in `ExpressionEncoder.toRow` and `fromRow` is
too verbose

SPARK-18912 append to a non-file-based data source table should detect
columns number mismatch

SPARK-18913 append to a table with special column names should work

SPARK-18921 check database existence with Hive.databaseExists instead of
getDatabase



Re: [VOTE] Apache Spark 2.1.0 (RC5)

Posted by Reynold Xin <rx...@databricks.com>.
The vote passed with the following +1 and -1:


+1

Reynold Xin*
Sean Owen*
Dongjoon Hyun
Xiao Li
Herman van Hövell tot Westerflier
Joseph Bradley*
Liwei Lin
Denny Lee
Holden Karau
Adam Roberts
vaquar khan


0/+1 (not sure what this means but putting it here just in case)
Felix Cheung

-1
Franklyn D'souza (due to a bug that's not a regression)


I will work on packaging the release.