Posted to reviews@spark.apache.org by dusktreader <gi...@git.apache.org> on 2017/08/17 23:09:38 UTC

[GitHub] spark pull request #18981: Fixed pandoc dependency issue in python/setup.py

GitHub user dusktreader opened a pull request:

    https://github.com/apache/spark/pull/18981

    Fixed pandoc dependency issue in python/setup.py

    ## Problem Description
    
    When pyspark is listed as a dependency of another package, installing
    the other package causes the pyspark install to fail. While the other
    package is being installed, pyspark's setup_requires requirements,
    including pypandoc, are installed. The exception handling at
    setup.py:152 therefore never triggers, because the pypandoc module is
    in fact available. However, the pypandoc.convert() function fails if
    pandoc itself is not installed (in our use cases it is not). This
    raises an OSError that is not handled, and setup fails.
    
    The following is a sample failure:
    ```
    $ which pandoc
    $ pip freeze | grep pypandoc
    pypandoc==1.4
    $ pip install pyspark
    Collecting pyspark
      Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
        100% |████████████████████████████████| 188.3MB 16.8MB/s
        Complete output from command python setup.py egg_info:
        Maybe try:
    
            sudo apt-get install pandoc
        See http://johnmacfarlane.net/pandoc/installing.html
        for installation options
        ---------------------------------------------------------------
    
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
            long_description = pypandoc.convert('README.md', 'rst')
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
            outputfile=outputfile, filters=filters)
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
            _ensure_pandoc_path()
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
            raise OSError("No pandoc was found: either install pandoc and add it\n"
        OSError: No pandoc was found: either install pandoc and add it
        to your PATH or or call pypandoc.download_pandoc(...) or
        install pypandoc wheels with included pandoc.
    
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mfnizcwa/pyspark/
    ```
    
    ## What changes were proposed in this pull request?
    
    This change simply adds an additional exception handler for the OSError
    that is raised. This allows pyspark to be installed client-side without requiring pandoc to be installed.
    
    ## How was this patch tested?
    
    I tested this by building a wheel package of pyspark with the change applied. Then, in a clean virtual environment with pypandoc installed but pandoc not available on the system, I installed pyspark from the wheel.
    
    Here is the output:
    
    ```
    $ pip freeze | grep pypandoc
    pypandoc==1.4
    $ which pandoc
    $ pip install --no-cache-dir ../spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl 
    Processing /home/tbeck/work/spark/python/dist/pyspark-2.3.0.dev0-py2.py3-none-any.whl
    Requirement already satisfied: py4j==0.10.6 in /home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages (from pyspark==2.3.0.dev0)
    Installing collected packages: pyspark
    Successfully installed pyspark-2.3.0.dev0
    ```
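
A minimal sketch of the handler described under "What changes were proposed", not the literal patch. The `pypandoc.convert('README.md', 'rst')` call is taken from the traceback above; the function names, the fallback string, and the messages are hypothetical. It also assumes the existing handler at setup.py:152 catches ImportError, which the description implies but does not state.

```python
import sys

# Hypothetical placeholder; the real setup.py may use a different fallback.
FALLBACK_DESCRIPTION = "long description unavailable"


def convert_readme_with_pypandoc():
    """Convert README.md to reST using the call shown in the traceback."""
    import pypandoc  # raises ImportError when pypandoc is not installed
    return pypandoc.convert("README.md", "rst")  # raises OSError without pandoc


def load_long_description(convert=convert_readme_with_pypandoc):
    """Return the converted README, or a fallback if conversion is impossible."""
    try:
        return convert()
    except ImportError:
        # Assumed pre-existing case: pypandoc itself is missing.
        print("Could not import pypandoc", file=sys.stderr)
    except OSError:
        # Case added by this change: pypandoc is importable but pandoc is absent.
        print("Could not convert README: pandoc is not installed", file=sys.stderr)
    return FALLBACK_DESCRIPTION
```

With this shape, any other exception type (for example a ValueError) would still propagate and fail the build, so only the two anticipated conditions are downgraded to a warning.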


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dusktreader/spark dusktreader/fix-pandoc-dependency-issue-in-setup_py

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18981.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18981
    
----
commit edd53828a23561144b535582430c0805fe54b355
Author: Tucker Beck <tu...@rentrakmail.com>
Date:   2017-08-17T22:20:47Z

    Fixed pandoc dependency issue in python/setup.py
    
    When pyspark is listed as a dependency of another package, installing
    the other package will cause an install failure in pyspark. When the
    other package is being installed, pyspark's setup_requires requirements
    are installed including pypandoc. Thus, the exception handling on
    setup.py:152 does not work because the pypandoc module is indeed
    available. However, the pypandoc.convert() function fails if pandoc
    itself is not installed (in our use cases it is not). This raises an
    OSError that is not handled, and setup fails.
    
    This change simply adds an additional exception handler for the OSError
    that is raised.
    
    The following is a sample failure:
    
    $ which pandoc
    $ pip freeze | grep pypandoc
    pypandoc==1.4
    $ pip install pyspark
    Collecting pyspark
      Downloading pyspark-2.2.0.post0.tar.gz (188.3MB)
        100% |████████████████████████████████| 188.3MB 16.8MB/s
        Complete output from command python setup.py egg_info:
        Maybe try:
    
            sudo apt-get install pandoc
        See http://johnmacfarlane.net/pandoc/installing.html
        for installation options
        ---------------------------------------------------------------
    
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-mfnizcwa/pyspark/setup.py", line 151, in <module>
            long_description = pypandoc.convert('README.md', 'rst')
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 69, in convert
            outputfile=outputfile, filters=filters)
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 260, in _convert_input
            _ensure_pandoc_path()
          File "/home/tbeck/.virtualenvs/cem/lib/python3.5/site-packages/pypandoc/__init__.py", line 544, in _ensure_pandoc_path
            raise OSError("No pandoc was found: either install pandoc and add it\n"
        OSError: No pandoc was found: either install pandoc and add it
        to your PATH or or call pypandoc.download_pandoc(...) or
        install pypandoc wheels with included pandoc.
    
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-mfnizcwa/pyspark/

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Merged to master and branch-2.2


---



[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Thanks for fixing this @dusktreader :) OSError seems to mostly occur when pandoc is not installed.
    
    I think automatically attempting to install pandoc is more likely to lead us into difficulty than skipping it (since otherwise pypandoc would, in part, just do this automatically).
    
    Jenkins, OK to Test.


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Looks like something went wrong with the Jenkins command.


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Also, to be clear, pandoc is pretty optional: we want to use it for packaging, but in local mode it doesn't really matter. So LGTM pending Jenkins.


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    retest this please


---



[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80904/
    Test PASSed.


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    **[Test build #81459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81459/testReport)** for PR 18981 at commit [`edd5382`](https://github.com/apache/spark/commit/edd53828a23561144b535582430c0805fe54b355).


---



[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81459/
    Test PASSed.


---



[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    **[Test build #81459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81459/testReport)** for PR 18981 at commit [`edd5382`](https://github.com/apache/spark/commit/edd53828a23561144b535582430c0805fe54b355).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by dusktreader <gi...@git.apache.org>.
Github user dusktreader commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    @srowen you could call download_pandoc(). However, having pandoc installed is of no use to a pyspark user. It's possible that the convert() function could raise an OSError under other circumstances, but this condition is most likely the only source of that exception.
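
One way to tell the missing-pandoc case apart from other OSErrors is to check for the executable after the failure. The sketch below is illustrative only: `explain_oserror` is a hypothetical helper, and it assumes a PATH lookup is a good enough proxy, whereas pypandoc may also search locations outside PATH.

```python
import shutil


def explain_oserror(err, which=shutil.which):
    """Classify an OSError raised during conversion (hypothetical helper)."""
    # shutil.which returns None when no executable with that name is on PATH.
    if which("pandoc") is None:
        return "pandoc not found on PATH"
    # pandoc exists, so the error came from something else (e.g. a missing file).
    return "pandoc present; other OS error: {}".format(err)
```

The `which` parameter exists only so the lookup can be replaced in tests; callers would normally leave it at its default.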


---


[GitHub] spark pull request #18981: Fixed pandoc dependency issue in python/setup.py

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18981


---



[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    **[Test build #80904 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80904/testReport)** for PR 18981 at commit [`edd5382`](https://github.com/apache/spark/commit/edd53828a23561144b535582430c0805fe54b355).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    Is this the only thing that needs to change, or does some other script/doc need to change to make sure pandoc is installed? Does OSError occur in other situations, or mostly only when pandoc isn't installed here?


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    ok to test


---


[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    LGTM too.


---



[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18981
  
    **[Test build #80904 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80904/testReport)** for PR 18981 at commit [`edd5382`](https://github.com/apache/spark/commit/edd53828a23561144b535582430c0805fe54b355).


---