Posted to commits@spark.apache.org by do...@apache.org on 2023/03/22 01:00:57 UTC

[spark] branch branch-3.4 updated: [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new ca260cccb15 [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11
ca260cccb15 is described below

commit ca260cccb15d5c28b0a25cf0423723700d343d3c
Author: Chris Nauroth <cn...@apache.org>
AuthorDate: Tue Mar 21 18:00:35 2023 -0700

    [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11
    
    ### What changes were proposed in this pull request?
    
    Upgrade the [GCS Connector](https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs) bundled in the Spark distro from version 2.2.7 to 2.2.11.
    
    ### Why are the changes needed?
    
    The new release contains multiple bug fixes and enhancements discussed in the [Release Notes](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md). Notable changes include:
    * Improved socket timeout handling.
    * Trace logging capabilities.
    * A fix for a bug that prevented using GCS as a [Hadoop Credential Provider](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
    * Dependency upgrades.
    * Support for OAuth2-based client authentication.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Distributions built with `-Phadoop-cloud` now include GCS connector 2.2.11 instead of 2.2.7.
    
    ```
    cnauroth@cnauroth-2-1-m:~/spark-3.5.0-SNAPSHOT-bin-custom-spark$ ls -lrt jars/gcs*
    -rw-r--r-- 1 cnauroth cnauroth 36497606 Mar 21 00:42 jars/gcs-connector-hadoop3-2.2.11-shaded.jar
    ```
    
    ### How was this patch tested?
    
    **Build**
    
    I built a custom distro with `-Phadoop-cloud`:
    
    ```
    ./dev/make-distribution.sh --name custom-spark --pip --tgz -Phadoop-3 -Phadoop-cloud -Pscala-2.12
    ```
    
    **Run**
    
    I ran a PySpark job that successfully reads and writes using GCS:
    
    ```
    from pyspark.sql import SparkSession
    
    def main() -> None:
      # Create SparkSession.
      spark = (SparkSession.builder
               .appName('copy-shakespeare')
               .getOrCreate())
    
      # Read.
      df = spark.read.text('gs://dataproc-datasets-us-central1/shakespeare')
    
      # Write.
      df.write.text('gs://cnauroth-hive-metastore-proxy-dist/output/copy-shakespeare')
    
      spark.stop()
    
    if __name__ == '__main__':
      main()
    ```
    
    Closes #40511 from cnauroth/SPARK-42888.
    
    Authored-by: Chris Nauroth <cn...@apache.org>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
    (cherry picked from commit f9017cbe521f7696128b8c9edcb825c79f16768b)
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml                               | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)
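
The two `dev/deps` manifests in this diffstat list one bundled jar per line. Judging from the entries in the diff below, each line has the shape `artifact/version/classifier/jar-file`, with an empty classifier written as `//` (for example `derby/10.14.2.0//derby-10.14.2.0.jar`). The following is a minimal Python sketch, assuming that four-field layout, that parses such lines and reports which artifacts changed version between two manifest snapshots. `DepEntry`, `parse_dep_line` and `changed_versions` are hypothetical helpers, not part of Spark's build tooling.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class DepEntry(NamedTuple):
    # Assumed layout, inferred from this diff: artifact/version/classifier/jar
    artifact: str
    version: str
    classifier: str  # "" when the line contains "//"
    jar: str


def parse_dep_line(line: str) -> DepEntry:
    """Parse one manifest line into its four slash-separated fields."""
    parts = line.strip().split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected manifest line: {line!r}")
    return DepEntry(*parts)


def _versions(lines: Iterable[str]) -> Dict[str, str]:
    # Map artifact name -> version, skipping blank lines.
    return {e.artifact: e.version for e in (parse_dep_line(ln) for ln in lines if ln.strip())}


def changed_versions(old: Iterable[str], new: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Return {artifact: (old_version, new_version)} for artifacts present in
    both manifests whose version differs."""
    before, after = _versions(old), _versions(new)
    return {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if before[name] != after[name]
    }


# Sample lines copied from the hadoop-3 manifest hunk in this commit.
OLD = [
    "derby/10.14.2.0//derby-10.14.2.0.jar",
    "gcs-connector/hadoop3-2.2.7/shaded/gcs-connector-hadoop3-2.2.7-shaded.jar",
]
NEW = [
    "derby/10.14.2.0//derby-10.14.2.0.jar",
    "gcs-connector/hadoop3-2.2.11/shaded/gcs-connector-hadoop3-2.2.11-shaded.jar",
]

if __name__ == "__main__":
    print(changed_versions(OLD, NEW))
    # {'gcs-connector': ('hadoop3-2.2.7', 'hadoop3-2.2.11')}
```

Run against the full before/after manifests, the same comparison should report only the `gcs-connector` entry, since that is the only line each file changes.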

diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 01e6c814fea..ec382099d24 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -63,7 +63,7 @@ datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
 derby/10.14.2.0//derby-10.14.2.0.jar
 dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
 flatbuffers-java/1.12.0//flatbuffers-java-1.12.0.jar
-gcs-connector/hadoop2-2.2.7/shaded/gcs-connector-hadoop2-2.2.7-shaded.jar
+gcs-connector/hadoop2-2.2.11/shaded/gcs-connector-hadoop2-2.2.11-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4fd81b5f43b..4c834bed077 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -60,7 +60,7 @@ datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
 derby/10.14.2.0//derby-10.14.2.0.jar
 dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
 flatbuffers-java/1.12.0//flatbuffers-java-1.12.0.jar
-gcs-connector/hadoop3-2.2.7/shaded/gcs-connector-hadoop3-2.2.7-shaded.jar
+gcs-connector/hadoop3-2.2.11/shaded/gcs-connector-hadoop3-2.2.11-shaded.jar
 gmetric4j/1.0.10//gmetric4j-1.0.10.jar
 gson/2.2.4//gson-2.2.4.jar
 guava/14.0.1//guava-14.0.1.jar
diff --git a/pom.xml b/pom.xml
index 6d94e06ca78..4f1f93f3492 100644
--- a/pom.xml
+++ b/pom.xml
@@ -160,7 +160,7 @@
     <aws.java.sdk.version>1.11.655</aws.java.sdk.version>
     <!-- the producer is used in tests -->
     <aws.kinesis.producer.version>0.12.8</aws.kinesis.producer.version>
-    <gcs-connector.version>hadoop3-2.2.7</gcs-connector.version>
+    <gcs-connector.version>hadoop3-2.2.11</gcs-connector.version>
     <!--  org.apache.httpcomponents/httpclient-->
     <commons.httpclient.version>4.5.14</commons.httpclient.version>
     <commons.httpcore.version>4.4.16</commons.httpcore.version>
@@ -3514,7 +3514,7 @@
         <hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
         <hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
         <hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
-        <gcs-connector.version>hadoop2-2.2.7</gcs-connector.version>
+        <gcs-connector.version>hadoop2-2.2.11</gcs-connector.version>
         <!-- SPARK-36547: Please don't upgrade the version below, otherwise there will be an error on building Hadoop 2.7 package -->
         <scala-maven-plugin.version>4.3.0</scala-maven-plugin.version>
       </properties>
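
In the `pom.xml` hunks above, `gcs-connector.version` defaults to `hadoop3-2.2.11` and is overridden to `hadoop2-2.2.11` in a Hadoop 2 profile (the profile id lies outside the hunk shown), so both builds share connector release 2.2.11 and differ only in the Hadoop prefix. The Python sketch below, offered as an illustration rather than Spark tooling, derives the expected jar name from that property value and checks it against a `jars/` directory listing like the `ls -lrt jars/gcs*` output in the commit message. The helper names are hypothetical, and the naming rule `gcs-connector-<property>-shaded.jar` is an assumption inferred from the manifest entries and listing in this email.

```python
import re
from pathlib import Path
from typing import Iterable

# Assumed shape of the pom.xml property value: "hadoop<major>-<release>",
# e.g. "hadoop3-2.2.11" (default) or "hadoop2-2.2.11" (Hadoop 2 profile).
_VERSION_RE = re.compile(r"hadoop\d+-\d+(?:\.\d+)*")


def expected_jar_name(gcs_connector_version: str) -> str:
    """Derive the shaded jar file name from a gcs-connector.version value."""
    if not _VERSION_RE.fullmatch(gcs_connector_version):
        raise ValueError(f"unexpected gcs-connector.version: {gcs_connector_version!r}")
    # Naming rule read off the manifests and `ls` output in this email (assumption).
    return f"gcs-connector-{gcs_connector_version}-shaded.jar"


def is_expected_bundle(jar_names: Iterable[str], gcs_connector_version: str) -> bool:
    """True if exactly one GCS connector jar is present and it matches the version."""
    gcs_jars = [n for n in jar_names if n.startswith("gcs-connector-")]
    return gcs_jars == [expected_jar_name(gcs_connector_version)]


def check_distribution(dist_root: Path, gcs_connector_version: str) -> bool:
    # Rough equivalent of `ls jars/gcs*` inside an unpacked distribution (hypothetical helper).
    names = [p.name for p in (dist_root / "jars").glob("gcs*")]
    return is_expected_bundle(names, gcs_connector_version)


if __name__ == "__main__":
    listing = ["gcs-connector-hadoop3-2.2.11-shaded.jar", "gson-2.2.4.jar"]
    print(is_expected_bundle(listing, "hadoop3-2.2.11"))  # True
    print(is_expected_bundle(listing, "hadoop3-2.2.7"))   # False
```

Checking for one exact name, rather than accepting any `gcs-connector-*` jar, also flags a distribution that accidentally bundles two connector versions side by side.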

