Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/17 03:51:47 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

HyukjinKwon opened a new pull request #29779:
URL: https://github.com/apache/spark/pull/29779


   ### What changes were proposed in this pull request?
   
   This PR:
   - Rephrases some wording in the installation guide to avoid potentially ambiguous terms such as "different flavors"
   - Documents the extra dependency installation `pip install pyspark[sql]`
   - Uses the link that corresponds to the released version, e.g. https://spark.apache.org/docs/latest/building-spark.html vs https://spark.apache.org/docs/3.0.0/building-spark.html
   - Add some more details
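
As a rough illustration of the `pip install pyspark[sql]` item above, the following Python sketch checks whether optional modules are importable after installation. It assumes the `sql` extra pulls in `pandas` and `pyarrow`; that module list and the helper name `missing_modules` are assumptions made for illustration, not something this PR states.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed (not stated in this PR): the `sql` extra installs pandas and pyarrow.
SQL_EXTRA_MODULES = ["pandas", "pyarrow"]

if __name__ == "__main__":
    missing = missing_modules(["pyspark"] + SQL_EXTRA_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("PySpark and the assumed SQL extras are importable.")
```

The check only looks up top-level module names, so it does not import PySpark or start a JVM.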
   
   ### Why are the changes needed?
   
   To improve installation guide.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it updates the user-facing installation guide.
   
   ### How was this patch tested?
   
   Manually built the doc and tested.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693854205


   **[Test build #128794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128794/testReport)** for PR 29779 at commit [`bd35323`](https://github.com/apache/spark/commit/bd35323c39fdc66029d8be6768560e4b70a71fb3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HyukjinKwon commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695485214


   Merged to master.
   
   Thank you @srowen, @dongjoon-hyun and @viirya for your time!




[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695404599








[GitHub] [spark] SparkQA removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693792818


   **[Test build #128792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128792/testReport)** for PR 29779 at commit [`8d6634b`](https://github.com/apache/spark/commit/8d6634b50f06ab9259a470a5e3a3de46e616ed3f).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693805704








[GitHub] [spark] viirya commented on a change in pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #29779:
URL: https://github.com/apache/spark/pull/29779#discussion_r491235435



##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -0,0 +1,138 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+============
+Installation
+============
+
+PySpark is included in the official releases of Spark available in the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for local usage or as
+a client to connect to a cluster instead of setting up a cluster itself.
+ 
+This page includes instructions for installing PySpark by using pip, Conda, downloading manually,
+and building from the source.
+
+
+Python Version Supported
+------------------------
+
+Python 3.6 and above.
+
+
+Using PyPI
+----------
+
+PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_ is as follows:
+
+.. code-block:: bash
+
+    pip install pyspark
+
+If you want to install extra dependencies for a specific component, you can install it as below:
+
+.. code-block:: bash
+
+    pip install pyspark[sql]
+
+
+Using Conda
+-----------
+
+Conda is an open-source package management and environment management system which is a part of
+the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
+language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
+`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Create a new virtual environment from your terminal as shown below:
+
+.. code-block:: bash
+
+    conda create -n pyspark_env
+
+After the virtual environment is created, it should be visible under the list of Conda environments
+which can be seen using the following command:
+
+.. code-block:: bash
+
+    conda env list
+
+Now activate the newly created environment with the following command:
+
+.. code-block:: bash
+
+    conda activate pyspark_env
+
+You can install PySpark by `Using PyPI <#using-pypi>`_ in the newly created
+environment, for example as below. It will install PySpark under the new virtual environment
+``pyspark_env`` created above.
+
+.. code-block:: bash
+
+    pip install pyspark
+
+Alternatively, you can install PySpark from Conda itself as below:
+
+.. code-block:: bash
+
+    conda install pyspark
+
+However, note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is not necessarily
+synced with PySpark release cycle because it is maintained by the community separately.
+
+
+Manually Downloading
+--------------------
+
+PySpark is included in the distributions available at the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+You can download a distribution you want from the site. After that, uncompress the tar file into the directory where you want
+to install Spark as below:
+
+.. code-block:: bash
+
+    tar xzvf spark-3.0.0-bin-hadoop2.7.tgz
+
+Ensure the ``SPARK_HOME`` environment variable points to the directory where the code has been extracted. 

Review comment:
       `where the tar file has been extracted.`?
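
A minimal Python sketch of the step under review, for readers who want to verify it: it derives the directory that extracting the tar file would create and checks whether `SPARK_HOME` points there. It assumes the archive unpacks into a single top-level directory named after the archive; the helper names and the `/opt` location are hypothetical.

```python
import os
from pathlib import PurePosixPath

ARCHIVE_SUFFIXES = (".tar.gz", ".tgz")


def extracted_dir(tarball, dest):
    """Guess the directory created by extracting `tarball` inside `dest`.

    Assumes the archive unpacks into one top-level directory named after
    the archive itself, minus its suffix.
    """
    name = PurePosixPath(tarball).name
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return str(PurePosixPath(dest) / name)


def spark_home_matches(expected, environ=None):
    """Return True if SPARK_HOME in `environ` points at `expected`."""
    environ = os.environ if environ is None else environ
    return environ.get("SPARK_HOME", "").rstrip("/") == expected.rstrip("/")


if __name__ == "__main__":
    # "/opt" is a hypothetical install location.
    expected = extracted_dir("spark-3.0.0-bin-hadoop2.7.tgz", "/opt")
    print("expected SPARK_HOME:", expected)
    print("SPARK_HOME is set accordingly:", spark_home_matches(expected))
```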






[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693793104








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29779:
URL: https://github.com/apache/spark/pull/29779#discussion_r491637342



##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -0,0 +1,138 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+============
+Installation
+============
+
+PySpark is included in the official releases of Spark available in the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for local usage or as
+a client to connect to a cluster instead of setting up a cluster itself.
+ 
+This page includes instructions for installing PySpark by using pip, Conda, downloading manually,
+and building from the source.
+
+
+Python Version Supported
+------------------------
+
+Python 3.6 and above.
+
+
+Using PyPI
+----------
+
+PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_ is as follows:
+
+.. code-block:: bash
+
+    pip install pyspark
+
+If you want to install extra dependencies for a specific component, you can install it as below:
+
+.. code-block:: bash
+
+    pip install pyspark[sql]
+
+
+Using Conda
+-----------
+
+Conda is an open-source package management and environment management system which is a part of
+the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
+language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
+`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Create a new virtual environment from your terminal as shown below:
+
+.. code-block:: bash
+
+    conda create -n pyspark_env
+
+After the virtual environment is created, it should be visible under the list of Conda environments
+which can be seen using the following command:
+
+.. code-block:: bash
+
+    conda env list
+
+Now activate the newly created environment with the following command:
+
+.. code-block:: bash
+
+    conda activate pyspark_env
+
+You can install PySpark by `Using PyPI <#using-pypi>`_ in the newly created
+environment, for example as below. It will install PySpark under the new virtual environment
+``pyspark_env`` created above.
+
+.. code-block:: bash
+
+    pip install pyspark
+
+Alternatively, you can install PySpark from Conda itself as below:
+
+.. code-block:: bash
+
+    conda install pyspark
+
+However, note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is not necessarily
+synced with PySpark release cycle because it is maintained by the community separately.
+
+
+Manually Downloading
+--------------------
+
+PySpark is included in the distributions available at the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+You can download a distribution you want from the site. After that, uncompress the tar file into the directory where you want
+to install Spark as below:
+
+.. code-block:: bash
+
+    tar xzvf spark-3.0.0-bin-hadoop2.7.tgz

Review comment:
       Oh, I thought I clarified that this is just an example. Let me fix it.






[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693792818


   **[Test build #128792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128792/testReport)** for PR 29779 at commit [`8d6634b`](https://github.com/apache/spark/commit/8d6634b50f06ab9259a470a5e3a3de46e616ed3f).




[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694679748








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695414590








[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693805704








[GitHub] [spark] HyukjinKwon closed pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #29779:
URL: https://github.com/apache/spark/pull/29779


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693855631








[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693855631








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693817842








[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694667239








[GitHub] [spark] SparkQA removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693803882


   **[Test build #128794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128794/testReport)** for PR 29779 at commit [`bd35323`](https://github.com/apache/spark/commit/bd35323c39fdc66029d8be6768560e4b70a71fb3).




[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693803882


   **[Test build #128794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128794/testReport)** for PR 29779 at commit [`bd35323`](https://github.com/apache/spark/commit/bd35323c39fdc66029d8be6768560e4b70a71fb3).




[GitHub] [spark] SparkQA removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694666900


   **[Test build #128856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128856/testReport)** for PR 29779 at commit [`a79b48e`](https://github.com/apache/spark/commit/a79b48eb95f3199a6cff4db6217db562d4c3f62d).




[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694679410


   **[Test build #128856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128856/testReport)** for PR 29779 at commit [`a79b48e`](https://github.com/apache/spark/commit/a79b48eb95f3199a6cff4db6217db562d4c3f62d).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695414590








[GitHub] [spark] srowen commented on a change in pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
srowen commented on a change in pull request #29779:
URL: https://github.com/apache/spark/pull/29779#discussion_r490342386



##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -19,71 +19,95 @@
 Installation
 ============
 
-Official releases are available from the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
-Alternatively, you can install it via ``pip`` from PyPI.  PyPI installation is usually for standalone
-locally or as a client to connect to a cluster instead of setting a cluster up.  
+PySpark is included in the official releases of Spark available in the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for standalone
+locally or as a client to connect to a cluster instead of setting a cluster itself up.

Review comment:
       standalone locally -> local usage?
   "instead of setting up a cluster itself" maybe

##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -19,71 +19,95 @@
 Installation
 ============
 
-Official releases are available from the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
-Alternatively, you can install it via ``pip`` from PyPI.  PyPI installation is usually for standalone
-locally or as a client to connect to a cluster instead of setting a cluster up.  
+PySpark is included in the official releases of Spark available in the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for standalone
+locally or as a client to connect to a cluster instead of setting a cluster itself up.
  
-This page includes the instructions for installing PySpark by using pip, Conda, downloading manually, and building it from the source.
+This page includes the instructions for installing PySpark by using pip, Conda, downloading manually,
+and building it from the source.
+
 
 Python Version Supported
 ------------------------
 
 Python 3.6 and above.
 
+
 Using PyPI
 ----------
 
-PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_
+PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_ is as follows:
 
 .. code-block:: bash
 
     pip install pyspark
-	
-Using Conda  
+
+If you want to install extra dependencies for a specific component, you can install it as below:
+
+.. code-block:: bash
+
+    pip install pyspark[sql]
+
+
+
+Using Conda
 -----------
 
-Conda is an open-source package management and environment management system which is a part of the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and language agnostic.
-  
-Conda can be used to create a virtual environment from terminal as shown below:
+Conda is an open-source package management and environment management system which is a part of
+the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
+language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
+`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Create new virtual environment from your terminal as shown below:
 
 .. code-block:: bash
 
-    conda create -n pyspark_env 
+    conda create -n pyspark_env
 
-After the virtual environment is created, it should be visible under the list of Conda environments which can be seen using the following command:
+After the virtual environment is created, it should be visible under the list of Conda environments
+which can be seen using the following command:
 
 .. code-block:: bash
 
     conda env list
 
-The newly created environment can be accessed using the following command:
+Now activate the the newly created environment by the following command:

Review comment:
       "the the" -> the
   by -> with

##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -19,71 +19,95 @@
 Installation
 ============
 
-Official releases are available from the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
-Alternatively, you can install it via ``pip`` from PyPI.  PyPI installation is usually for standalone
-locally or as a client to connect to a cluster instead of setting a cluster up.  
+PySpark is included in the official releases of Spark available in the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for standalone
+locally or as a client to connect to a cluster instead of setting a cluster itself up.
  
-This page includes the instructions for installing PySpark by using pip, Conda, downloading manually, and building it from the source.
+This page includes the instructions for installing PySpark by using pip, Conda, downloading manually,

Review comment:
       "includes instructions"
   "building from source"






[GitHub] [spark] HyukjinKwon commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694651249


   Thank you @srowen.




[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693817842








[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695447153


   **[Test build #128896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128896/testReport)** for PR 29779 at commit [`ccfebbe`](https://github.com/apache/spark/commit/ccfebbed10b91e6b9d81f5b97e0436769a4c500c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695412880


   **[Test build #128896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128896/testReport)** for PR 29779 at commit [`ccfebbe`](https://github.com/apache/spark/commit/ccfebbed10b91e6b9d81f5b97e0436769a4c500c).




[GitHub] [spark] HyukjinKwon commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695484206


   Let me merge this. It's very unlikely that adding some comments to `setup.py` would cause a build failure on JDK 11.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695448253








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694667239








[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693793104








[GitHub] [spark] HyukjinKwon commented on a change in pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #29779:
URL: https://github.com/apache/spark/pull/29779#discussion_r490714909



##########
File path: python/docs/source/getting_started/installation.rst
##########
@@ -1,114 +0,0 @@
-..  Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       Ah, it looks like there are too many diffs, so the file isn't detected as renamed.






[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-693816448


   **[Test build #128792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128792/testReport)** for PR 29779 at commit [`8d6634b`](https://github.com/apache/spark/commit/8d6634b50f06ab9259a470a5e3a3de46e616ed3f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695448253








[GitHub] [spark] SparkQA commented on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694666900


   **[Test build #128856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128856/testReport)** for PR 29779 at commit [`a79b48e`](https://github.com/apache/spark/commit/a79b48eb95f3199a6cff4db6217db562d4c3f62d).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695404599








[GitHub] [spark] SparkQA removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-695412880


   **[Test build #128896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128896/testReport)** for PR 29779 at commit [`ccfebbe`](https://github.com/apache/spark/commit/ccfebbed10b91e6b9d81f5b97e0436769a4c500c).




[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #29779:
URL: https://github.com/apache/spark/pull/29779#discussion_r491192048



##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -0,0 +1,138 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+============
+Installation
+============
+
+PySpark is included in the official releases of Spark available in the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for local usage or as
+a client to connect to a cluster instead of setting up a cluster itself.
+ 
+This page includes instructions for installing PySpark by using pip, Conda, downloading manually,
+and building from the source.
+
+
+Python Version Supported
+------------------------
+
+Python 3.6 and above.
+
+
+Using PyPI
+----------
+
+PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_ is as follows:
+
+.. code-block:: bash
+
+    pip install pyspark
+
+If you want to install extra dependencies for a specific component, you can install it as below:
+
+.. code-block:: bash
+
+    pip install pyspark[sql]
+
+
+Using Conda
+-----------
+
+Conda is an open-source package management and environment management system which is a part of
+the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
+language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
+`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Create new virtual environment from your terminal as shown below:
+
+.. code-block:: bash
+
+    conda create -n pyspark_env
+
+After the virtual environment is created, it should be visible under the list of Conda environments
+which can be seen using the following command:
+
+.. code-block:: bash
+
+    conda env list
+
+Now activate the newly created environment with the following command:
+
+.. code-block:: bash
+
+    conda activate pyspark_env
+
+You can install pyspark by `Using PyPI <#using-pypi>`_ to install PySpark in the newly created
+environment, for example as below. It will install PySpark under the new virtual environment
+``pyspark_env`` created above.
+
+.. code-block:: bash
+
+    pip install pyspark
+
+Alternatively, you can install PySpark from Conda itself as below:
+
+.. code-block:: bash
+
+    conda install pyspark
+
+However, note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is not necessarily
+synced with PySpark release cycle because it is maintained by the community separately.
+
+
+Manually Downloading
+--------------------
+
+PySpark is included in the distributions available at the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+You can download a distribution you want from the site. After that, uncompress the tar file into the directory where you want
+to install Spark as below:
+
+.. code-block:: bash
+
+    tar xzvf spark-3.0.0-bin-hadoop2.7.tgz

Review comment:
       Is `3.0.0` used because we need to land this to `branch-3.0`?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29779: [SPARK-32180][PYTHON][DOCS][FOLLOW-UP] Rephrase and add some more information in installation guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #29779:
URL: https://github.com/apache/spark/pull/29779#issuecomment-694679748





