Posted to dev@spark.apache.org by sryza <gi...@git.apache.org> on 2014/02/24 08:33:51 UTC

[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN

GitHub user sryza opened a pull request:

    https://github.com/apache/incubator-spark/pull/640

    SPARK-1004: PySpark on YARN

    Make pyspark work in yarn-client mode.  This builds on Josh's work.  I tested it and verified that it works on a 5-node cluster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sryza/incubator-spark sandy-spark-1004

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/640.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #640
    
----
commit e752a6a1c8a9d7cbc31d7b911800e22db6fcb2b0
Author: Josh Rosen <jo...@apache.org>
Date:   2014-01-24T18:19:58Z

    Automatically set Yarn env vars in PySpark (SPARK-1030).

commit 0adcaa971086853b254baf32748811561bb6e209
Author: Josh Rosen <jo...@apache.org>
Date:   2014-01-25T23:28:56Z

    WIP towards PySpark on YARN:
    
    - Remove reliance on SPARK_HOME on the workers.  Only the driver
      should know about SPARK_HOME.  On the workers, we ensure that the
      PySpark Python libraries are added to the PYTHONPATH.
    
    - Add a Makefile for generating a "fat zip" that contains PySpark's
      Python dependencies.  This is a bit of a hack and I'd be open to
      better packaging tools, but this doesn't require any extra Python
      libraries.  This use case doesn't seem to be well-addressed by the
      existing Python packaging tools: there are plenty of tools to package
      complete Python environments (such as pyinstaller and virtualenv) or
      to bundle *individual* libraries (e.g. distutils), but few to generate
      portable fat zips or eggs.
    
    This hasn't been tested with YARN and may not actually compile.

commit d4a71d0495d072d5b5364601e7cd0dc9a7c9c9b9
Author: Josh Rosen <jo...@apache.org>
Date:   2014-02-19T06:27:21Z

    Add missing setup.py file for PySpark.

commit dcda63863a41414ba5e410092dc4fbab2e353543
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-02-24T07:06:42Z

    Improvements

commit 38546d4f282727f3ae112f1e564df72443b726f5
Author: Sandy Ryza <sa...@cloudera.com>
Date:   2014-02-24T07:26:01Z

    Don't set SPARK_JAR

----
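
For readers unfamiliar with the "fat zip" idea in commit 0adcaa9, here is a minimal sketch of the general technique. It is not the Makefile from this pull request. It packs a Python package into a single zip and imports it straight from the archive, which is the same effect as putting the zip on the workers' PYTHONPATH. The helper names (build_fat_zip, demo) and the throwaway package demo_fatzip_pkg are hypothetical and exist only for illustration.

```python
import os
import sys
import tempfile
import zipfile


def build_fat_zip(package_dirs, zip_path):
    """Write the .py files of each package directory into one zip archive.

    Paths inside the archive are kept relative to each package's parent
    directory, so `import <package>` works once the zip is on sys.path.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for pkg_dir in package_dirs:
            pkg_dir = os.path.abspath(pkg_dir)
            parent = os.path.dirname(pkg_dir)
            for root, _dirs, files in os.walk(pkg_dir):
                for name in files:
                    if name.endswith(".py"):
                        full = os.path.join(root, name)
                        zf.write(full, os.path.relpath(full, parent))
    return zip_path


def demo():
    """Build a zip from a throwaway package and import it from the zip."""
    workdir = tempfile.mkdtemp()
    pkg = os.path.join(workdir, "demo_fatzip_pkg")  # hypothetical package
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("ANSWER = 42\n")

    zip_path = build_fat_zip([pkg], os.path.join(workdir, "fat.zip"))

    # Equivalent to listing the zip in PYTHONPATH before the interpreter starts.
    sys.path.insert(0, zip_path)
    import demo_fatzip_pkg
    return demo_fatzip_pkg.ANSWER
```

Python's built-in zipimport support handles pure-Python modules only, so a zip built this way would not cover dependencies that ship compiled extensions.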


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/640#issuecomment-35863413
  
     Merged build triggered.



[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/640#issuecomment-35865040
  
    One or more automated tests failed
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12825/



[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/640#issuecomment-35865039
  
    Merged build finished.



[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN

Posted by jyotiska <gi...@git.apache.org>.
Github user jyotiska commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/640#discussion_r9983532
  
    --- Diff: python/setup.py ---
    @@ -0,0 +1,30 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from distutils.core import setup
    +
    +
    +setup(
    +    name='pyspark',
    +    version='0.9.0-incubating-SNAPSHOT',
    +    description='Python API for Spark',
    +    author='The Apache Software Foundation',
    +    author_email='user@spark.incubator.apache.org',
    +    license='Apache License 2.0',
    +    url='spark-project.org',
    +    packages=['pyspark'],
    +)
    --- End diff --
    
    Should we specify any other packages, such as numpy, inside a separate `install_requires` field?
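
As a rough illustration of what this suggestion could look like, the sketch below rebuilds the keyword arguments from the diff and adds an install_requires entry. Two things here are assumptions, not part of the pull request: that numpy is the dependency worth declaring, and the helper names pyspark_setup_kwargs and main. Note that install_requires is a setuptools keyword. The distutils.core.setup used in the diff does not act on it, so the sketch imports setup from setuptools instead.

```python
def pyspark_setup_kwargs():
    """Return the setup() arguments from the diff plus a declared dependency."""
    return dict(
        name='pyspark',
        version='0.9.0-incubating-SNAPSHOT',
        description='Python API for Spark',
        author='The Apache Software Foundation',
        author_email='user@spark.incubator.apache.org',
        license='Apache License 2.0',
        url='spark-project.org',
        packages=['pyspark'],
        # Assumption: numpy is the extra dependency to declare, with no
        # version constraint chosen here.
        install_requires=['numpy'],
    )


def main():
    # setuptools understands install_requires; plain distutils does not.
    from setuptools import setup
    setup(**pyspark_setup_kwargs())

# In a real setup.py, main() would be called at module level.
```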



[GitHub] incubator-spark pull request: SPARK-1004: PySpark on YARN

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/640#issuecomment-35863414
  
    Merged build started.

