Posted to issues@iceberg.apache.org by "kukayiyi (via GitHub)" <gi...@apache.org> on 2023/04/21 11:17:54 UTC

[GitHub] [iceberg] kukayiyi opened a new issue, #7396: Failed to initialize S3FileIO when writing to minio using spark

kukayiyi opened a new issue, #7396:
URL: https://github.com/apache/iceberg/issues/7396

   ### Query engine
   
   spark
   
   ### Question
   
   I encountered the following error when reading and writing with Iceberg + Spark + MinIO:
   ```
   Exception in thread "main" java.lang.IllegalArgumentException: Cannot initialize FileIO, missing no-arg constructor: org.apache.iceberg.aws.s3.S3FileIO
   	at org.apache.iceberg.CatalogUtil.loadFileIO(CatalogUtil.java:312)
   	at org.apache.iceberg.hadoop.HadoopCatalog.initialize(HadoopCatalog.java:118)
   	at org.apache.iceberg.CatalogUtil.loadCatalog(CatalogUtil.java:239)
   	at org.apache.iceberg.CatalogUtil.buildIcebergCatalog(CatalogUtil.java:284)
   	at org.apache.iceberg.spark.SparkCatalog.buildIcebergCatalog(SparkCatalog.java:135)
   	at org.apache.iceberg.spark.SparkCatalog.initialize(SparkCatalog.java:537)
   	at org.apache.iceberg.spark.SparkSessionCatalog.buildSparkCatalog(SparkSessionCatalog.java:77)
   	at org.apache.iceberg.spark.SparkSessionCatalog.initialize(SparkSessionCatalog.java:307)
   	at org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:60)
   	at org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$catalog$1(CatalogManager.scala:53)
   	at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
   	at org.apache.spark.sql.connector.catalog.CatalogManager.catalog(CatalogManager.scala:53)
   	at org.apache.spark.sql.connector.catalog.CatalogManager.currentCatalog(CatalogManager.scala:122)
   	at org.apache.spark.sql.connector.catalog.LookupCatalog.currentCatalog(LookupCatalog.scala:34)
   	at org.apache.spark.sql.connector.catalog.LookupCatalog.currentCatalog$(LookupCatalog.scala:34)
   	at org.apache.spark.sql.catalyst.analysis.Analyzer.currentCatalog(Analyzer.scala:188)
   	at org.apache.spark.sql.connector.catalog.LookupCatalog$CatalogAndIdentifier$.unapply(LookupCatalog.scala:125)
   	at org.apache.spark.sql.connector.catalog.LookupCatalog$NonSessionCatalogAndIdentifier$.unapply(LookupCatalog.scala:72)
   	at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:565)
   	at com.yisa.iceberg.MinioIcebergExample.main(MinioIcebergExample.java:77)
   ```
   My config is:
   ```
   ("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
   ("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkSessionCatalog")
   ("spark.sql.catalog.demo.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
   ("spark.sql.catalog.demo.warehouse", "s3a://iceberg")
   ("spark.sql.catalog.demo.s3.endpoint", "http://127.0.0.19000")
   ("spark.sql.defaultCatalog", "demo")
   ("spark.sql.catalogImplementation", "in-memory")
   ("spark.sql.catalog.demo.type", "hadoop")
   ("spark.executor.heartbeatInterval", "300000")
   ("spark.network.timeout", "400000");
   ("spark.hadoop.fs.s3a.access.key", "minioadmin");
   ("spark.hadoop.fs.s3a.secret.key", "minioadmin");
   ("spark.hadoop.fs.s3a.endpoint",  "127.0.0.1:9000");
   ("spark.hadoop.fs.s3a.connection.ssl.enabled", "false");
   ("spark.hadoop.fs.s3a.path.style.access", "true");
   ("spark.hadoop.fs.s3a.attempts.maximum", "1");
   ("spark.hadoop.fs.s3a.connection.establish.timeout", "5000");
   ("spark.hadoop.fs.s3a.connection.timeout", "10000");
   ```
   This config is based on: https://blog.min.io/manage-iceberg-tables-with-spark/
   I used the following dependencies:
   ```
   <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-core_2.12</artifactId>
       <version>3.3.2</version>
   </dependency>
   <dependency>
       <groupId>org.apache.spark</groupId>
       <artifactId>spark-sql_2.12</artifactId>
       <version>3.3.2</version>
   </dependency>
   <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-aws</artifactId>
       <version>3.3.2</version>
   </dependency>
   <dependency>
       <groupId>com.amazonaws</groupId>
       <artifactId>aws-java-sdk-bundle</artifactId>
       <version>1.12.452</version>
   </dependency>
   <dependency>
       <groupId>org.apache.iceberg</groupId>
       <artifactId>iceberg-spark-runtime-3.3_2.12</artifactId>
       <version>1.2.1</version>
   </dependency>
   ```
   All I did was read a CSV file from MinIO and convert it to an Iceberg table for storage, just like in the link above. I also tried writing a regular dataset as an Iceberg table and got the same error. By the way, I have tried it with Java, Python, and spark-sql, and got the same error each time.
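
   Roughly, the failing code looks like this (a simplified sketch: the bucket path and table name below are placeholders, and the SparkSession is built with the config shown above):
   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;

   // Build the session with the Iceberg/S3 settings listed above.
   SparkSession spark = SparkSession.builder()
           .appName("MinioIcebergExample")
           // .config(...) for each of the properties listed above
           .getOrCreate();

   // Read a CSV file from MinIO.
   Dataset<Row> df = spark.read()
           .option("header", "true")
           .csv("s3a://iceberg/input/data.csv");

   // The exception above is thrown here, while the "demo" catalog initializes S3FileIO.
   df.write().saveAsTable("demo.db.csv_table");
   ```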




[GitHub] [iceberg] Guaniuzzt commented on issue #7396: Failed to initialize S3FileIO when writing to minio using spark

Posted by "Guaniuzzt (via GitHub)" <gi...@apache.org>.
Guaniuzzt commented on issue #7396:
URL: https://github.com/apache/iceberg/issues/7396#issuecomment-1646648551

   ("spark.sql.catalog.demo.s3.endpoint", "http://127.0.0.9000")
   =========================================
   I think your s3 endpoint should be http://127.0.0.1:9000
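
   For example, the relevant part of the config could look like this (the endpoint assumes MinIO is listening on localhost:9000; the path-style and credential properties are the ones typically set on S3FileIO for MinIO and are added here for completeness):
   ```
   ("spark.sql.catalog.demo.s3.endpoint", "http://127.0.0.1:9000")
   ("spark.sql.catalog.demo.s3.path-style-access", "true")
   ("spark.sql.catalog.demo.s3.access-key-id", "minioadmin")
   ("spark.sql.catalog.demo.s3.secret-access-key", "minioadmin")
   ```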




[GitHub] [iceberg] nastra commented on issue #7396: Failed to initialize S3FileIO when writing to minio using spark

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on issue #7396:
URL: https://github.com/apache/iceberg/issues/7396#issuecomment-1689722501

   > Had the same problem using the REST catalog, S3, Iceberg 1.3.1, and Spark 3.4.1. The problem was fixed by removing the following option:
   > 
   > --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   
   People might search for this issue and think that this is a viable solution. This isn't the case and I don't see how this would fix it.
   
   The problem is caused by either
   a) mixed AWS SDK versions on the classpath, or
   b) missing AWS SDK dependencies on the classpath.
   
   The solution is to make sure you're using the AWS SDK version that matches your Iceberg version, as described in https://iceberg.apache.org/docs/latest/aws/#spark.
   
   Going forward (starting with Iceberg 1.4.0), Iceberg will provide an `org.apache.iceberg:iceberg-aws-bundle` artifact, which will contain the correct version of the AWS SDK dependencies.
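
   Once that artifact is available, a Maven setup could pull it in roughly like this (sketch only; the version shown is illustrative, since 1.4.0 has not been released at the time of writing):
   ```xml
   <!-- Sketch: replaces the manually managed AWS SDK dependencies once Iceberg 1.4.0+ is available.
        The version number is illustrative. -->
   <dependency>
       <groupId>org.apache.iceberg</groupId>
       <artifactId>iceberg-aws-bundle</artifactId>
       <version>1.4.0</version>
   </dependency>
   ```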
   




[GitHub] [iceberg] kukayiyi closed issue #7396: Failed to initialize S3FileIO when writing to minio using spark

Posted by "kukayiyi (via GitHub)" <gi...@apache.org>.
kukayiyi closed issue #7396: Failed to initialize S3FileIO when writing to minio using spark
URL: https://github.com/apache/iceberg/issues/7396




[GitHub] [iceberg] andreanerla commented on issue #7396: Failed to initialize S3FileIO when writing to minio using spark

Posted by "andreanerla (via GitHub)" <gi...@apache.org>.
andreanerla commented on issue #7396:
URL: https://github.com/apache/iceberg/issues/7396#issuecomment-1620336655

   I actually had the same issue when writing to the AWS Glue Data Catalog. The issue was that I hadn't specified a location URI for the Glue database.




[GitHub] [iceberg] nastra commented on issue #7396: Failed to initialize S3FileIO when writing to minio using spark

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on issue #7396:
URL: https://github.com/apache/iceberg/issues/7396#issuecomment-1517691659

   Iceberg 1.2.1 uses AWS Bundle 2.20.18. Can you try with that version?
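
   In the Maven setup above, that would roughly mean dropping the v1 `com.amazonaws:aws-java-sdk-bundle` and using the AWS SDK v2 artifacts instead, for example (a sketch; `url-connection-client` is included on the assumption that the setup follows the Spark section of the AWS docs, and the versions should be double-checked against the Iceberg 1.2.1 documentation):
   ```xml
   <!-- Sketch: AWS SDK v2 artifacts matching Iceberg 1.2.1 (version per the comment above). -->
   <dependency>
       <groupId>software.amazon.awssdk</groupId>
       <artifactId>bundle</artifactId>
       <version>2.20.18</version>
   </dependency>
   <dependency>
       <groupId>software.amazon.awssdk</groupId>
       <artifactId>url-connection-client</artifactId>
       <version>2.20.18</version>
   </dependency>
   ```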




[GitHub] [iceberg] mwullink commented on issue #7396: Failed to initialize S3FileIO when writing to minio using spark

Posted by "mwullink (via GitHub)" <gi...@apache.org>.
mwullink commented on issue #7396:
URL: https://github.com/apache/iceberg/issues/7396#issuecomment-1689595566

   Had the same problem using the REST catalog, S3, Iceberg 1.3.1, and Spark 3.4.1.
   The problem was fixed by removing the following option:
   
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   
   
   




[GitHub] [iceberg] mwullink commented on issue #7396: Failed to initialize S3FileIO when writing to minio using spark

Posted by "mwullink (via GitHub)" <gi...@apache.org>.
mwullink commented on issue #7396:
URL: https://github.com/apache/iceberg/issues/7396#issuecomment-1691087976

   Thanks for the info, an iceberg-aws-bundle artifact will make things much easier.
   
   

