Posted to commits@spark.apache.org by do...@apache.org on 2023/03/22 01:00:57 UTC
[spark] branch branch-3.4 updated: [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new ca260cccb15 [SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11
ca260cccb15 is described below
commit ca260cccb15d5c28b0a25cf0423723700d343d3c
Author: Chris Nauroth <cn...@apache.org>
AuthorDate: Tue Mar 21 18:00:35 2023 -0700
[SPARK-42888][BUILD] Upgrade `gcs-connector` to 2.2.11
### What changes were proposed in this pull request?
Upgrade the [GCS Connector](https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/v2.2.11/gcs) bundled in the Spark distro from version 2.2.7 to 2.2.11.
### Why are the changes needed?
The new release contains multiple bug fixes and enhancements discussed in the [Release Notes](https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v2.2.11/gcs/CHANGES.md). Notable changes include:
* Improved socket timeout handling.
* Trace logging capabilities.
* Fix bug that prevented usage of GCS as a [Hadoop Credential Provider](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
* Dependency upgrades.
* Support OAuth2 based client authentication.
### Does this PR introduce _any_ user-facing change?
Distributions built with `-Phadoop-cloud` now include GCS connector 2.2.11 instead of 2.2.7.
```
cnauroth@cnauroth-2-1-m:~/spark-3.5.0-SNAPSHOT-bin-custom-spark$ ls -lrt jars/gcs*
-rw-r--r-- 1 cnauroth cnauroth 36497606 Mar 21 00:42 jars/gcs-connector-hadoop3-2.2.11-shaded.jar
```
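One way to confirm which connector version a built distribution carries is to parse it out of the jar file name. The following is a minimal sketch, assuming the `gcs-connector-hadoop<N>-<version>-shaded.jar` naming seen in the listing above; `parse_gcs_connector_jar` and `gcs_connector_versions` are hypothetical helper names, not part of Spark or the connector.

```python
import re
from pathlib import Path

# Naming pattern assumed from the listing above, e.g.
# gcs-connector-hadoop3-2.2.11-shaded.jar
_JAR_RE = re.compile(r"^gcs-connector-hadoop(\d+)-(\d+(?:\.\d+)*)-shaded\.jar$")


def parse_gcs_connector_jar(filename: str):
    """Return (hadoop_major, connector_version), or None if the name does not match."""
    match = _JAR_RE.match(filename)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def gcs_connector_versions(jars_dir: str):
    """Collect (hadoop_major, connector_version) for each connector jar in jars_dir."""
    found = []
    for path in sorted(Path(jars_dir).glob("gcs-connector-*.jar")):
        parsed = parse_gcs_connector_jar(path.name)
        if parsed is not None:
            found.append(parsed)
    return found
```

Run from the root of a distribution built with `-Phadoop-cloud`, `gcs_connector_versions('jars')` would be expected to report a single Hadoop 3 entry with version `2.2.11` after this change.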
### How was this patch tested?
**Build**
I built a custom distro with `-Phadoop-cloud`:
```
./dev/make-distribution.sh --name custom-spark --pip --tgz -Phadoop-3 -Phadoop-cloud -Pscala-2.12
```
**Run**
I ran a PySpark job that successfully reads and writes using GCS:
```
from pyspark.sql import SparkSession


def main() -> None:
    # Create SparkSession.
    spark = (SparkSession.builder
             .appName('copy-shakespeare')
             .getOrCreate())

    # Read.
    df = spark.read.text('gs://dataproc-datasets-us-central1/shakespeare')

    # Write.
    df.write.text('gs://cnauroth-hive-metastore-proxy-dist/output/copy-shakespeare')

    spark.stop()


if __name__ == '__main__':
    main()
```
Authored-by: Chris Nauroth <cnauroth@apache.org>
Closes #40511 from cnauroth/SPARK-42888.
Authored-by: Chris Nauroth <cn...@apache.org>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
(cherry picked from commit f9017cbe521f7696128b8c9edcb825c79f16768b)
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
dev/deps/spark-deps-hadoop-2-hive-2.3 | 2 +-
dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
pom.xml | 4 ++--
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/dev/deps/spark-deps-hadoop-2-hive-2.3 b/dev/deps/spark-deps-hadoop-2-hive-2.3
index 01e6c814fea..ec382099d24 100644
--- a/dev/deps/spark-deps-hadoop-2-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -63,7 +63,7 @@ datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
derby/10.14.2.0//derby-10.14.2.0.jar
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
flatbuffers-java/1.12.0//flatbuffers-java-1.12.0.jar
-gcs-connector/hadoop2-2.2.7/shaded/gcs-connector-hadoop2-2.2.7-shaded.jar
+gcs-connector/hadoop2-2.2.11/shaded/gcs-connector-hadoop2-2.2.11-shaded.jar
gmetric4j/1.0.10//gmetric4j-1.0.10.jar
gson/2.2.4//gson-2.2.4.jar
guava/14.0.1//guava-14.0.1.jar
diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 4fd81b5f43b..4c834bed077 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -60,7 +60,7 @@ datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
derby/10.14.2.0//derby-10.14.2.0.jar
dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
flatbuffers-java/1.12.0//flatbuffers-java-1.12.0.jar
-gcs-connector/hadoop3-2.2.7/shaded/gcs-connector-hadoop3-2.2.7-shaded.jar
+gcs-connector/hadoop3-2.2.11/shaded/gcs-connector-hadoop3-2.2.11-shaded.jar
gmetric4j/1.0.10//gmetric4j-1.0.10.jar
gson/2.2.4//gson-2.2.4.jar
guava/14.0.1//guava-14.0.1.jar
diff --git a/pom.xml b/pom.xml
index 6d94e06ca78..4f1f93f3492 100644
--- a/pom.xml
+++ b/pom.xml
@@ -160,7 +160,7 @@
<aws.java.sdk.version>1.11.655</aws.java.sdk.version>
<!-- the producer is used in tests -->
<aws.kinesis.producer.version>0.12.8</aws.kinesis.producer.version>
- <gcs-connector.version>hadoop3-2.2.7</gcs-connector.version>
+ <gcs-connector.version>hadoop3-2.2.11</gcs-connector.version>
<!-- org.apache.httpcomponents/httpclient-->
<commons.httpclient.version>4.5.14</commons.httpclient.version>
<commons.httpcore.version>4.4.16</commons.httpcore.version>
@@ -3514,7 +3514,7 @@
<hadoop-client-api.artifact>hadoop-client</hadoop-client-api.artifact>
<hadoop-client-runtime.artifact>hadoop-yarn-api</hadoop-client-runtime.artifact>
<hadoop-client-minicluster.artifact>hadoop-client</hadoop-client-minicluster.artifact>
- <gcs-connector.version>hadoop2-2.2.7</gcs-connector.version>
+ <gcs-connector.version>hadoop2-2.2.11</gcs-connector.version>
<!-- SPARK-36547: Please don't upgrade the version below, otherwise there will be an error on building Hadoop 2.7 package -->
<scala-maven-plugin.version>4.3.0</scala-maven-plugin.version>
</properties>
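
The `dev/deps` manifest entries changed above appear to follow an `artifact/version/classifier/filename` layout, with an empty classifier for most jars. The following is a minimal sketch, assuming that layout, of looking up the connector entry in a manifest; `DepEntry`, `parse_dep_line`, and `find_artifact` are hypothetical names, not part of Spark's build tooling.

```python
from typing import NamedTuple, Optional


class DepEntry(NamedTuple):
    artifact: str
    version: str
    classifier: str  # empty string when the entry has no classifier
    filename: str


def parse_dep_line(line: str) -> Optional[DepEntry]:
    """Split one manifest line into its four slash-separated fields."""
    parts = line.strip().split("/")
    if len(parts) != 4:
        return None  # blank or unexpected line; skip rather than guess
    return DepEntry(*parts)


def find_artifact(manifest_text: str, artifact: str) -> Optional[DepEntry]:
    """Return the first manifest entry whose artifact name matches, if any."""
    for line in manifest_text.splitlines():
        entry = parse_dep_line(line)
        if entry is not None and entry.artifact == artifact:
            return entry
    return None
```

Reading `dev/deps/spark-deps-hadoop-3-hive-2.3` and calling `find_artifact(text, 'gcs-connector')` would then give an entry whose `version` field can be compared against the `gcs-connector.version` property in `pom.xml`.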