Posted to commits@spark.apache.org by do...@apache.org on 2023/02/23 01:09:37 UTC
[spark] branch branch-3.4 updated: [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new 3477d14d802 [SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide
3477d14d802 is described below
commit 3477d14d802a0b45970f2f99330dd4ddb9e6fefc
Author: Dongjoon Hyun <do...@apache.org>
AuthorDate: Wed Feb 22 17:09:17 2023 -0800
[SPARK-42530][PYSPARK][DOCS] Remove Hadoop 2 from PySpark installation guide
### What changes were proposed in this pull request?
This PR aims to remove `Hadoop 2` from PySpark installation guide.
### Why are the changes needed?
Since Apache Spark 3.4.0, Hadoop 2 binaries are no longer provided.
### Does this PR introduce _any_ user-facing change?
No. This is a documentation-only fix to keep the guide consistent with the binaries actually published for 3.4.0.
### How was this patch tested?
Manual review.
Closes #40127 from dongjoon-hyun/SPARK-42530.
Authored-by: Dongjoon Hyun <do...@apache.org>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
(cherry picked from commit 295617c5d8913fc1afc78fa9647d2f99b925ceaf)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
python/docs/source/getting_started/install.rst | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/python/docs/source/getting_started/install.rst b/python/docs/source/getting_started/install.rst
index be2a1eae66d..3db6b278403 100644
--- a/python/docs/source/getting_started/install.rst
+++ b/python/docs/source/getting_started/install.rst
@@ -57,7 +57,7 @@ For PySpark with/without a specific Hadoop version, you can install it by using
.. code-block:: bash
- PYSPARK_HADOOP_VERSION=2 pip install pyspark
+ PYSPARK_HADOOP_VERSION=3 pip install pyspark
The default distribution uses Hadoop 3.3 and Hive 2.3. If users specify different versions of Hadoop, the pip installation automatically
downloads a different version and uses it in PySpark. Downloading it can take a while depending on
@@ -65,18 +65,17 @@ the network and the mirror chosen. ``PYSPARK_RELEASE_MIRROR`` can be set to manu
.. code-block:: bash
- PYSPARK_RELEASE_MIRROR=http://mirror.apache-kr.org PYSPARK_HADOOP_VERSION=2 pip install
+ PYSPARK_RELEASE_MIRROR=http://mirror.apache-kr.org PYSPARK_HADOOP_VERSION=3 pip install
It is recommended to use ``-v`` option in ``pip`` to track the installation and download status.
.. code-block:: bash
- PYSPARK_HADOOP_VERSION=2 pip install pyspark -v
+ PYSPARK_HADOOP_VERSION=3 pip install pyspark -v
Supported values in ``PYSPARK_HADOOP_VERSION`` are:
- ``without``: Spark pre-built with user-provided Apache Hadoop
-- ``2``: Spark pre-built for Apache Hadoop 2.7
- ``3``: Spark pre-built for Apache Hadoop 3.3 and later (default)
Note that this installation of PySpark with/without a specific Hadoop version is experimental. It can change or be removed between minor releases.
@@ -132,7 +131,7 @@ to install Spark, for example, as below:
.. code-block:: bash
- tar xzvf spark-3.3.0-bin-hadoop3.tgz
+ tar xzvf spark-3.4.0-bin-hadoop3.tgz
Ensure the ``SPARK_HOME`` environment variable points to the directory where the tar file has been extracted.
Update ``PYTHONPATH`` environment variable such that it can find the PySpark and Py4J under ``SPARK_HOME/python/lib``.
@@ -140,7 +139,7 @@ One example of doing this is shown below:
.. code-block:: bash
- cd spark-3.3.0-bin-hadoop3
+ cd spark-3.4.0-bin-hadoop3
export SPARK_HOME=`pwd`
export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
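The ``PYTHONPATH`` one-liner in the patched guide is dense; the sketch below unpacks what it does, using a throwaway directory with placeholder zip names (the real archives ship inside a Spark distribution, so nothing here is downloaded, and the py4j version in the filename is illustrative only). It collects every ``*.zip`` under ``$SPARK_HOME/python/lib`` into a bash array and joins the entries with ``:`` via ``IFS``.

```shell
#!/usr/bin/env bash
# Sketch only: mimic a Spark layout with empty placeholder files
# instead of a real spark-3.4.0-bin-hadoop3 distribution.
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/python/lib"
touch "$SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip" \
      "$SPARK_HOME/python/lib/pyspark.zip"

# Same construction as in install.rst:
#   1. glob all zip files under python/lib into the array ZIPS
#   2. set IFS=: so "${ZIPS[*]}" joins the entries with colons
#   3. append the pre-existing PYTHONPATH (may be empty)
PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH
echo "$PYTHONPATH"
```

Note that the array syntax requires bash (not POSIX ``sh``), and because the expansion runs in a ``$( ... )`` subshell, the temporary ``ZIPS`` and ``IFS`` assignments do not leak into the caller's environment.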
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org