Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/01 05:42:30 UTC

[GitHub] [iceberg] harini-venkataraman opened a new issue, #6089: Issue with Creation of Database using Spark

harini-venkataraman opened a new issue, #6089:
URL: https://github.com/apache/iceberg/issues/6089

   ### Apache Iceberg version
   
   0.14.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   With the configuration below, I tried to create a database through the Hive Metastore and write it to MinIO.
   
   **Configuration :**
   spark.sql.catalog.ibdemo               org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.ibdemo.type          hive
   spark.sql.catalog.ibdemo.uri           thrift://hive-metastore:9083
   spark.sql.catalog.ibdemo.io-impl       org.apache.iceberg.aws.s3.S3FileIO
   spark.sql.catalog.ibdemo.warehouse     s3://warehouse
   spark.sql.catalog.ibdemo.s3.endpoint   http://minio:9000
   spark.sql.defaultCatalog               ibdemo
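
For anyone trying to reproduce this without editing `spark-defaults.conf`, the same properties can be passed on the command line. The following is a minimal sketch, not part of the original report: it parses the key/value lines above and renders them as `--conf key=value` arguments for `spark-sql`. The helper names `parse_properties` and `to_conf_args` are hypothetical, and the sketch only builds the command string; it does not start Spark.

```python
import shlex

# Catalog properties copied from the report above
# (spark-defaults.conf style: key, whitespace, value).
SPARK_DEFAULTS = """
spark.sql.catalog.ibdemo               org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.ibdemo.type          hive
spark.sql.catalog.ibdemo.uri           thrift://hive-metastore:9083
spark.sql.catalog.ibdemo.io-impl       org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.ibdemo.warehouse     s3://warehouse
spark.sql.catalog.ibdemo.s3.endpoint   http://minio:9000
spark.sql.defaultCatalog               ibdemo
"""


def parse_properties(text):
    """Parse 'key<whitespace>value' lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        props[key] = value.strip()
    return props


def to_conf_args(props):
    """Render each property as a '--conf key=value' pair (hypothetical helper)."""
    args = []
    for key, value in props.items():
        args += ["--conf", f"{key}={value}"]
    return args


props = parse_properties(SPARK_DEFAULTS)
command = ["spark-sql"] + to_conf_args(props)

# Print a shell-safe command line that can be pasted into a terminal.
print(shlex.join(command))
```

Keeping the properties in one dict also makes it easy to diff the Hive Metastore setup against the PostgreSQL one mentioned below.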
   
   Please note the following:
   1. The classpath is mapped to the right location inside the running container.
   2. Spark is able to access data from existing schemas created via Trino.
   3. It works seamlessly with PostgreSQL, unlike with the Hive Metastore.
   4. Tried with both Hive Metastore v3.1.3 and 4.0.0-alpha1.
   5. Spark version - 2.4.4, Iceberg - 0.14.1, iceberg-spark-runtime - 0.13.2
   
   Response when creating a database from spark-sql using the PostgreSQL metastore:
   ```
   Spark master: local[*], Application Id: local-1667223782694
   spark-sql> CREATE DATABASE test;
   Time taken: 1.342 seconds
   spark-sql> CREATE DATABASE warehouse_demo;
   Time taken: 0.029 seconds 
   ```
   
   With the **Hive Metastore**, however, the following exception is thrown while writing the schema:
   ```
   java.lang.RuntimeException: Metastore operation failed for warehouse.taxi
     at org.apache.iceberg.hive.HiveCatalog.defaultWarehouseLocation(HiveCatalog.java:466)
     at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.createTransaction(BaseMetastoreCatalog.java:187)
     at org.apache.iceberg.CachingCatalog$CachingTableBuilder.createTransaction(CachingCatalog.java:268)
     at org.apache.iceberg.spark.SparkCatalog.stageCreate(SparkCatalog.java:204)
     at org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:130)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
     at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
     at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] harini-venkataraman closed issue #6089: Issue with Creation of Database using Spark

Posted by GitBox <gi...@apache.org>.

harini-venkataraman closed issue #6089: Issue with Creation of Database using Spark
URL: https://github.com/apache/iceberg/issues/6089



[GitHub] [iceberg] harini-venkataraman commented on issue #6089: Issue with Creation of Database using Spark

Posted by GitBox <gi...@apache.org>.

harini-venkataraman commented on issue #6089:
URL: https://github.com/apache/iceberg/issues/6089#issuecomment-1303312171

   **RCA :**
   Spark introduced a new configuration, `spark.sql.warehouse.dir`:
   https://issues.apache.org/jira/browse/SPARK-15034
   I changed the key in the configuration file from `spark.sql.catalog.catalog_name.warehouse` to `spark.sql.catalog.catalog_name.warehouse.dir` and rebuilt the image.
   With this new configuration, database creation works seamlessly.
   **PS :** This relative path issue has been reported to Spark before, and we encountered the same exception after updating Spark from 3.3.0 to 3.3.1.
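
The key change described in this comment can be sketched as a small property rewrite. This is only an illustration of the reporter's edit, assuming the catalog name `ibdemo` from the original report; the function `rename_warehouse_key` is hypothetical, and the thread does not confirm whether Iceberg itself reads the `.warehouse.dir` key or whether the change works for other reasons.

```python
def rename_warehouse_key(props, catalog):
    """Return a copy of props with '<catalog>.warehouse' renamed to '<catalog>.warehouse.dir'.

    Hypothetical helper mirroring the edit described in the comment above.
    All other properties are passed through unchanged, in their original order.
    """
    old_key = f"spark.sql.catalog.{catalog}.warehouse"
    new_key = f"spark.sql.catalog.{catalog}.warehouse.dir"
    updated = {}
    for key, value in props.items():
        updated[new_key if key == old_key else key] = value
    return updated


# Subset of the properties from the original report (catalog name 'ibdemo').
before = {
    "spark.sql.catalog.ibdemo": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.ibdemo.type": "hive",
    "spark.sql.catalog.ibdemo.warehouse": "s3://warehouse",
}

after = rename_warehouse_key(before, "ibdemo")

# Emit the result in spark-defaults.conf style (key, whitespace, value).
for key, value in after.items():
    print(f"{key:<45}{value}")
```

The rewrite leaves the input dict untouched, so the old and new configurations can be compared side by side before rebuilding an image.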

