Posted to issues@spark.apache.org by "Miles Granger (Jira)" <ji...@apache.org> on 2023/10/26 08:51:00 UTC

[jira] [Created] (SPARK-45676) Upgrade to PySpark 3.5.0 gives Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

Miles Granger created SPARK-45676:
-------------------------------------

             Summary: Upgrade to PySpark 3.5.0 gives Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
                 Key: SPARK-45676
                 URL: https://issues.apache.org/jira/browse/SPARK-45676
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.0
            Reporter: Miles Granger


With PySpark 3.4.1 and the following dependencies, reading files from S3 works fine:

hadoop-client:3.3.4
hadoop-common:3.3.4
hadoop-aws:3.3.4
aws-java-sdk-bundle:1.12.262
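
For reference, a minimal sketch of one common way to hand dependencies like the ones above to a PySpark session, through the `spark.jars.packages` setting. The report does not say how the dependencies were supplied, so the Maven group IDs, the helper names `maven_coordinates` and `build_session`, and the application name are all assumptions made for illustration.

```python
# Artifact names and versions come from the list above; the Maven group IDs
# are assumptions, since the report gives only artifact:version pairs.
DEPENDENCIES = [
    ("org.apache.hadoop", "hadoop-client", "3.3.4"),
    ("org.apache.hadoop", "hadoop-common", "3.3.4"),
    ("org.apache.hadoop", "hadoop-aws", "3.3.4"),
    ("com.amazonaws", "aws-java-sdk-bundle", "1.12.262"),
]


def maven_coordinates(deps=DEPENDENCIES):
    """Join (group, artifact, version) triples into the comma-separated
    group:artifact:version string that spark.jars.packages expects."""
    return ",".join(f"{group}:{artifact}:{version}" for group, artifact, version in deps)


def build_session(app_name="s3a-read-check"):
    """Hypothetical helper: create a session that resolves the packages above.

    pyspark is imported lazily so the coordinate helper works without it.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars.packages", maven_coordinates())
        .getOrCreate()
    )
```

With a session from `build_session()`, a read such as `spark.read.parquet("s3a://example-bucket/path/")` (placeholder bucket and path) is the kind of call that exercises the S3A file system.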

Simply upgrading to PySpark 3.5.0 (which, as far as I know, still uses Hadoop 3.3.4) makes reading the same S3 files fail with:

```
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2688)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
	at org.apache.parquet.hadoop.util.HadoopInputFile.fromStatus(HadoopInputFile.java:44)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:76)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:450)
	... 14 more
```
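
Since the trace above reports a class that normally lives in the `hadoop-aws` JAR, one possible check is to list which Hadoop and AWS SDK JARs sit in a given directory and at which versions. The sketch below is only a diagnostic aid and does not identify the root cause; the helper names `bundled_hadoop_jars` and `pyspark_jars_dir` are hypothetical, and it assumes a pip-style PySpark install that keeps its bundled JARs in a `jars` directory inside the package.

```python
import re
from pathlib import Path

# Matches file names such as "hadoop-aws-3.3.4.jar" or
# "aws-java-sdk-bundle-1.12.262.jar" and captures (artifact, version).
JAR_PATTERN = re.compile(r"^(hadoop-[a-z0-9-]+?|aws-java-sdk-bundle)-(\d[\w.]*)\.jar$")


def bundled_hadoop_jars(jars_dir):
    """Return {artifact: version} for Hadoop and AWS SDK bundle JARs in jars_dir."""
    found = {}
    for jar in sorted(Path(jars_dir).glob("*.jar")):
        match = JAR_PATTERN.match(jar.name)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def pyspark_jars_dir():
    """Locate the jars directory of the installed pyspark package
    (assumption: a pip-style install layout)."""
    import pyspark

    return Path(pyspark.__file__).parent / "jars"
```

Calling `bundled_hadoop_jars(pyspark_jars_dir())` under both PySpark versions and comparing the results would show whether the Hadoop version really is unchanged and whether `hadoop-aws` is present in that directory at all. JARs resolved at runtime through `spark.jars.packages` are stored elsewhere and would not appear in this listing.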
