Posted to dev@spark.apache.org by sryza <gi...@git.apache.org> on 2014/02/24 08:33:51 UTC
[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN
GitHub user sryza opened a pull request:
https://github.com/apache/incubator-spark/pull/640
SPARK-1004: PySpark on YARN
Make PySpark work in yarn-client mode. This builds on Josh's work. I verified that it works on a 5-node cluster.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/incubator-spark sandy-spark-1004
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-spark/pull/640.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #640
----
commit e752a6a1c8a9d7cbc31d7b911800e22db6fcb2b0
Author: Josh Rosen <jo...@apache.org>
Date: 2014-01-24T18:19:58Z
Automatically set Yarn env vars in PySpark (SPARK-1030).
commit 0adcaa971086853b254baf32748811561bb6e209
Author: Josh Rosen <jo...@apache.org>
Date: 2014-01-25T23:28:56Z
WIP towards PySpark on YARN:
- Remove reliance on SPARK_HOME on the workers. Only the driver
should know about SPARK_HOME. On the workers, we ensure that the
PySpark Python libraries are added to the PYTHONPATH.
- Add a Makefile for generating a "fat zip" that contains PySpark's
Python dependencies. This is a bit of a hack and I'd be open to
better packaging tools, but this doesn't require any extra Python
libraries. This use case doesn't seem to be well-addressed by the
existing Python packaging tools: there are plenty of tools to package
complete Python environments (such as pyinstaller and virtualenv) or
to bundle *individual* libraries (e.g. distutils), but few to generate
portable fat zips or eggs.
This hasn't been tested with YARN and may not actually compile.
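The worker-side idea in the commit above (workers need no SPARK_HOME; PySpark's Python code is shipped as a "fat zip" and placed on PYTHONPATH) can be sketched roughly as follows. This is an illustration under assumptions, not the Makefile from the commit: it uses only the standard library's zipfile and subprocess modules, and `build_fat_zip`, `run_with_zip_on_pythonpath` and the `demo_pkg` package are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def build_fat_zip(package_dir: str, zip_path: str) -> str:
    """Zip every .py file under package_dir (hypothetical helper), storing paths
    relative to the package's parent so `import <package>` works from the zip."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".py"):
                    full = os.path.join(root, name)
                    zf.write(full, os.path.relpath(full, parent))
    return zip_path


def run_with_zip_on_pythonpath(zip_path: str, code: str) -> str:
    """Run `code` in a fresh interpreter whose PYTHONPATH starts with the zip,
    mimicking a worker that only receives the zip and no SPARK_HOME."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = zip_path if not existing else zip_path + os.pathsep + existing
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # demo_pkg is a hypothetical stand-in for the pyspark/ package directory.
        pkg = os.path.join(tmp, "src", "demo_pkg")
        os.makedirs(pkg)
        with open(os.path.join(pkg, "__init__.py"), "w") as f:
            f.write("GREETING = 'hello from the zip'\n")
        zp = build_fat_zip(pkg, os.path.join(tmp, "demo_pkg.zip"))
        print(run_with_zip_on_pythonpath(zp, "import demo_pkg; print(demo_pkg.GREETING)"))
```

Python's built-in zipimport support is what makes a zip on PYTHONPATH importable, which is why this approach needs no extra Python libraries on the workers.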
commit d4a71d0495d072d5b5364601e7cd0dc9a7c9c9b9
Author: Josh Rosen <jo...@apache.org>
Date: 2014-02-19T06:27:21Z
Add missing setup.py file for PySpark.
commit dcda63863a41414ba5e410092dc4fbab2e353543
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-02-24T07:06:42Z
Improvements
commit 38546d4f282727f3ae112f1e564df72443b726f5
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-02-24T07:26:01Z
Don't set SPARK_JAR
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/640#issuecomment-35863413
Merged build triggered.
---
[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/640#issuecomment-35865040
One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12825/
---
[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/640#issuecomment-35865039
Merged build finished.
---
[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN
Posted by jyotiska <gi...@git.apache.org>.
Github user jyotiska commented on a diff in the pull request:
https://github.com/apache/incubator-spark/pull/640#discussion_r9983532
--- Diff: python/setup.py ---
@@ -0,0 +1,30 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from distutils.core import setup
+
+
+setup(
+ name='pyspark',
+ version='0.9.0-incubating-SNAPSHOT',
+ description='Python API for Spark',
+ author='The Apache Software Foundation',
+ author_email='user@spark.incubator.apache.org',
+ license='Apache License 2.0',
+ url='spark-project.org',
+ packages=['pyspark'],
+)
--- End diff --
Should we specify any other packages, such as numpy, in a separate install_requires field?
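For reference, the reviewer's suggestion might look roughly like the hypothetical sketch that follows. Assumptions: numpy is only the example dependency named in the question, and the file would switch from distutils to setuptools, since install_requires is a setuptools keyword that distutils.core.setup does not act on. The name PYSPARK_SETUP_KWARGS is invented for illustration.

```python
# Hypothetical sketch of python/setup.py with an install_requires entry.
# Assumption: switching to setuptools, because distutils.core.setup only warns
# about install_requires as an unknown distribution option and ignores it.
import sys

PYSPARK_SETUP_KWARGS = dict(
    name="pyspark",
    version="0.9.0-incubating-SNAPSHOT",
    description="Python API for Spark",
    author="The Apache Software Foundation",
    author_email="user@spark.incubator.apache.org",
    license="Apache License 2.0",
    url="spark-project.org",
    packages=["pyspark"],
    install_requires=["numpy"],  # hypothetical runtime dependency
)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only hand off to setuptools when a command (e.g. "sdist") is given.
    from setuptools import setup

    setup(**PYSPARK_SETUP_KWARGS)
```

With setuptools, pip would then install the listed packages alongside pyspark; with the plain distutils call in the diff, the field would have no effect.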
---
[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/incubator-spark/pull/640#issuecomment-35863414
Merged build started.
---