Posted to commits@spark.apache.org by do...@apache.org on 2023/02/23 01:09:37 UTC
[spark] branch branch-3.4 updated: [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new 3477d14d802 [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide
3477d14d802 is described below
commit 3477d14d802a0b45970f2f99330dd4ddb9e6fefc
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Wed Feb 22 17:09:17 2023 -0800
[SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide
### What changes were proposed in this pull request?
This PR aims to remove `Hadoop 2` from PySpark installation guide.
### Why are the changes needed?
Since Apache Spark 3.4.0, Hadoop 2 binaries are no longer provided.
### Does this PR introduce _any_ user-facing change?
No. This is a documentation-only fix to keep the guide consistent with the binaries actually published for 3.4.0.
### How was this patch tested?
Manual review.
Closes #40127 from dongjoon-hyun/SPARK-42530.
Authored-by: Dongjoon Hyun <do...@apache.org>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
(cherry picked from commit 295617c5d8913fc1afc78fa9647d2f99b925ceaf)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
python/docs/source/getting_started/install.rst | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index be2a1eae66d..3db6b278403 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -57,7 +57,7 @@ For PySpark with/without a specific Hadoop version, you can install it by using
.. code-block:: bash
- PYSPARK_HADOOP_VERSION=2 pip install pyspark
+ PYSPARK_HADOOP_VERSION=3 pip install pyspark
The default distribution uses Hadoop 3.3 and Hive 2.3. If users specify different versions of Hadoop, the pip installation automatically
downloads a different version and uses it in PySpark. Downloading it can take a while depending on
@@ -65,18 +65,17 @@ the network and the mirror chosen. ``PYSPARK_RELEASE_MIRROR`` can be set to manu
.. code-block:: bash
- PYSPARK_RELEASE_MIRROR=http://mirror.apache-kr.org PYSPARK_HADOOP_VERSION=2 pip install
+ PYSPARK_RELEASE_MIRROR=http://mirror.apache-kr.org PYSPARK_HADOOP_VERSION=3 pip install
It is recommended to use ``-v`` option in ``pip`` to track the installation and download status.
.. code-block:: bash
- PYSPARK_HADOOP_VERSION=2 pip install pyspark -v
+ PYSPARK_HADOOP_VERSION=3 pip install pyspark -v
Supported values in ``PYSPARK_HADOOP_VERSION`` are:
- ``without``: Spark pre-built with user-provided Apache Hadoop
-- ``2``: Spark pre-built for Apache Hadoop 2.7
- ``3``: Spark pre-built for Apache Hadoop 3.3 and later (default)
Note that this installation of PySpark with/without a specific Hadoop version is experimental. It can change or be removed between minor releases.
@@ -132,7 +131,7 @@ to install Spark, for example, as below:
.. code-block:: bash
- tar xzvf spark-3.3.0-bin-hadoop3.tgz
+ tar xzvf spark-3.4.0-bin-hadoop3.tgz
Ensure the ``SPARK_HOME`` environment variable points to the directory where the tar file has been extracted.
Update ``PYTHONPATH`` environment variable such that it can find the PySpark and Py4J under ``SPARK_HOME/python/lib``.
@@ -140,7 +139,7 @@ One example of doing this is shown below:
.. code-block:: bash
- cd spark-3.3.0-bin-hadoop3
+ cd spark-3.4.0-bin-hadoop3
export SPARK_HOME=`pwd`
export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
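The ``PYTHONPATH`` one-liner in the patched guide is dense; the sketch below unpacks what it does, using a throwaway directory with placeholder zip names (the real archives ship inside a Spark distribution, so nothing here is downloaded, and the py4j version in the filename is illustrative only). It collects every ``*.zip`` under ``$SPARK_HOME/python/lib`` into a bash array and joins the entries with ``:`` via ``IFS``.

```shell
#!/usr/bin/env bash
# Sketch only: mimic a Spark layout with empty placeholder files
# instead of a real spark-3.4.0-bin-hadoop3 distribution.
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/python/lib"
touch "$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip" \
      "$SPARK_HOME/python/lib/pyspark.zip"

# Same construction as in install.rst:
#   1. glob all zip files under python/lib into the array ZIPS
#   2. set IFS=: so "${ZIPS[*]}" joins the entries with colons
#   3. append the pre-existing PYTHONPATH (may be empty)
PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
echo "$PYTHONPATH"
```

Note that the array syntax requires bash (not POSIX ``sh``), and because the expansion runs in a ``$( ... )`` subshell, the temporary ``ZIPS`` and ``IFS`` assignments do not leak into the caller's environment.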
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org