Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/19 02:51:59 UTC

[GitHub] [spark] HyukjinKwon commented on a change in pull request #34315: [SPARK-37050][PYTHON] Update Conda installation instructions

HyukjinKwon commented on a change in pull request #34315:
URL: https://github.com/apache/spark/pull/34315#discussion_r731453369



##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -83,46 +83,54 @@ Note that this installation way of PySpark with/without a specific Hadoop versio
 Using Conda
 -----------
 
-Conda is an open-source package management and environment management system which is a part of
-the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
-language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
-`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
-
-Create new virtual environment from your terminal as shown below:
-
-.. code-block:: bash
-
-    conda create -n pyspark_env
-
-After the virtual environment is created, it should be visible under the list of Conda environments
-which can be seen using the following command:
+Conda is an open-source package management and environment management system (developed by
+`Anaconda <https://www.anaconda.com/>`_), which is best installed through
+`Miniconda <https://docs.conda.io/en/latest/miniconda.html/>`_ or `Miniforge <https://github.com/conda-forge/miniforge/>`_.
+The tool is both cross-platform and language agnostic, and in practice, conda can replace both
+`pip <https://pip.pypa.io/en/latest/>`_ and `virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Conda uses so-called channels to distribute packages, and together with the default channels by
+Anaconda itself, the most important channel is `conda-forge <https://conda-forge.org/>`_, which
+is the community-driven packaging effort that is the most extensive & the most current (and also
+serves as the upstream for the Anaconda channels in most cases).
+
+Generally, it is recommended to use *as few channels as possible*. Conda-forge & Anaconda put a

Review comment:
       While I agree with this, let's keep the instructions as simple as possible and stick to the default settings in conda.

##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -83,46 +83,54 @@ Note that this installation way of PySpark with/without a specific Hadoop versio
 Using Conda
 -----------
 
-Conda is an open-source package management and environment management system which is a part of
-the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
-language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
-`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
-
-Create new virtual environment from your terminal as shown below:
-
-.. code-block:: bash
-
-    conda create -n pyspark_env
-
-After the virtual environment is created, it should be visible under the list of Conda environments
-which can be seen using the following command:
+Conda is an open-source package management and environment management system (developed by
+`Anaconda <https://www.anaconda.com/>`_), which is best installed through
+`Miniconda <https://docs.conda.io/en/latest/miniconda.html/>`_ or `Miniforge <https://github.com/conda-forge/miniforge/>`_.
+The tool is both cross-platform and language agnostic, and in practice, conda can replace both
+`pip <https://pip.pypa.io/en/latest/>`_ and `virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Conda uses so-called channels to distribute packages, and together with the default channels by
+Anaconda itself, the most important channel is `conda-forge <https://conda-forge.org/>`_, which
+is the community-driven packaging effort that is the most extensive & the most current (and also
+serves as the upstream for the Anaconda channels in most cases).
+
+Generally, it is recommended to use *as few channels as possible*. Conda-forge & Anaconda put a
+lot of effort in guaranteeing binary compatibility between packages (e.g. by using compatible
+compilers for all packages and tracking which packages are ABI-relevant). Needlessly mixing in
+other channels can end up breaking those guarantees, which is why conda-forge even recommends
+so-called "strict channel priority":
 
 .. code-block:: bash
 
-    conda env list
+    conda config --add channels conda-forge
+    conda config --set channel_priority strict

Review comment:
       So I would remove this one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


