You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by "Nandakumar-M (via GitHub)" <gi...@apache.org> on 2023/06/15 11:24:16 UTC

[GitHub] [iceberg] Nandakumar-M opened a new issue, #7847: Iceberg does not work with Spark's default hive metastore (embedded Derby database)

Nandakumar-M opened a new issue, #7847:
URL: https://github.com/apache/iceberg/issues/7847

   ### Apache Iceberg version
   
   main (development)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   When using Spark shell or spark sql in local mode, it uses the embedded derby database as the hive metastore.
   
   I am using SparkSessionCatalog as described in the getting started documentation. 
   
   `$ ./spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.3.0 \
     --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
     --conf spark.sql.catalog.spark_catalog.type=hive`
   
   While creating a non-iceberg table with this, it works fine.
   
   `spark-sql> create table spark1 as select 1 as  a;
   Time taken: 3.297 seconds
   spark-sql> show tables;
   spark1
   Time taken: 0.157 seconds, Fetched 1 row(s)`
   
   However, when trying to create iceberg table, it throws up an error.
   
   `spark-sql> create table iceberg1 using iceberg as select 1 as  a;
   23/06/13 13:01:54 WARN MetastoreLock: Failed to create lock LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:default, tablename:iceberg1, operationType:UNSET)], user:nandakumar, hostname:nandakumar-Latitude-3410, agentInfo:Iceberg-956e7540-eaa6-46f2-97f6-4c8d6586c140)
   MetaException(message:Unable to update transaction database java.sql.SQLSyntaxErrorException: Table/View 'NEXT_LOCK_ID' does not exist.
   	at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
   	at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
   	at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
   	at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
   	at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
   	at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
   	at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
   	at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(Unknown Source)
   	at com.jolbox.bonecp.StatementHandle.executeQuery(StatementHandle.java:464)
   	at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:950)
   	at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:872)
   	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:6375)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
   	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
   	at com.sun.proxy.$Proxy16.lock(Unknown Source)
   	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2153)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
   	at com.sun.proxy.$Proxy17.lock(Unknown Source)
   	at org.apache.iceberg.hive.MetastoreLock.lambda$createLock$3(MetastoreLock.java:305)
   	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:58)
   	at org.apache.iceberg.ClientPoolImpl.run(ClientPoolImpl.java:51)
   	at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:122)
   	at org.apache.iceberg.hive.MetastoreLock.lambda$createLock$4(MetastoreLock.java:305)
   	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
   	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
   	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
   	at org.apache.iceberg.hive.MetastoreLock.createLock(MetastoreLock.java:302)
   	at org.apache.iceberg.hive.MetastoreLock.acquireLock(MetastoreLock.java:185)
   	at org.apache.iceberg.hive.MetastoreLock.lock(MetastoreLock.java:146)
   	at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:194)
   	at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135)
   	at org.apache.iceberg.BaseTransaction.commitCreateTransaction(BaseTransaction.java:311)
   	at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:290)
   	at org.apache.iceberg.spark.source.StagedSparkTable.commitStagedChanges(StagedSparkTable.java:34)
   	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:484)
   	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
   	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:468)
   	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:463)
   	at org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:106)
   	at org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:127)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:93)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:80)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:78)
   	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
   	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
   	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
   	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
   	at scala.collection.Iterator.foreach(Iterator.scala:943)
   	at scala.collection.Iterator.foreach$(Iterator.scala:943)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
   	at scala.collection.IterableLike.foreach(IterableLike.scala:74)
   	at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
   	at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:287)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:966)
   	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:191)
   	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:214)
   	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
   	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1054)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1063)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: ERROR 42X05: Table/View 'NEXT_LOCK_ID' does not exist.
   	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
   	at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
   	at org.apache.derby.impl.sql.compile.FromBaseTable.bindTableDescriptor(Unknown Source)
   	at org.apache.derby.impl.sql.compile.FromBaseTable.bindNonVTITables(Unknown Source)
   	at org.apache.derby.impl.sql.compile.FromList.bindTables(Unknown Source)
   	at org.apache.derby.impl.sql.compile.SelectNode.bindNonVTITables(Unknown Source)
   	at org.apache.derby.impl.sql.compile.DMLStatementNode.bindTables(Unknown Source)
   	at org.apache.derby.impl.sql.compile.DMLStatementNode.bind(Unknown Source)
   	at org.apache.derby.impl.sql.compile.CursorNode.bindStatement(Unknown Source)
   	at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
   	at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
   	at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source)
   	... 98 more`
   
   Similar [issue]( https://github.com/apache/iceberg/issues/370) was reported in 2019 -- just the exception message is a bit different this time.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

Re: [I] Iceberg does not work with Spark's default hive metastore (embedded Derby database) [iceberg]

Posted by "nsucheendran (via GitHub)" <gi...@apache.org>.

nsucheendran commented on issue #7847:
URL: https://github.com/apache/iceberg/issues/7847#issuecomment-2027409843

   > I set the below config to switch to hadoop. "spark.sql.extensions" -> "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions", "spark.sql.catalog.iceberg_catalog" -> "org.apache.iceberg.spark.SparkCatalog", "spark.sql.catalog.iceberg_catalog.type" -> "hadoop", "spark.sql.catalog.iceberg_catalog.warehouse" -> warehouseFile, "spark.sql.catalog.iceberg_catalog.cache-enabled" -> "false"
   
   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] vinitamaloo-asu commented on issue #7847: Iceberg does not work with Spark's default hive metastore (embedded Derby database)

Posted by "vinitamaloo-asu (via GitHub)" <gi...@apache.org>.

vinitamaloo-asu commented on issue #7847:
URL: https://github.com/apache/iceberg/issues/7847#issuecomment-1741621342

   I used a hadoop catalog for my testing. Worked fine! Thanks @RussellSpitzer .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] chiennht commented on issue #7847: Iceberg does not work with Spark's default hive metastore (embedded Derby database)

Posted by "chiennht (via GitHub)" <gi...@apache.org>.

chiennht commented on issue #7847:
URL: https://github.com/apache/iceberg/issues/7847#issuecomment-1694346473

   Same for my:
   Spark: 3.3.3
   Hive:4.0.0-beta1
   `%spark.pyspark
   import logging
   import os
   from pyspark import SparkConf
   from pyspark import SparkContext
   from pyspark.sql import SparkSession
   from pyspark.sql.types import StructType, StructField, FloatType, LongType, DoubleType, StringType
   conf = (
       SparkConf()
       .set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
       .set("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
       .set("spark.sql.catalog.spark_catalog.type","hive")
       )
   S = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()`
   
   `S.sql("""CREATE TABLE company_vn.listed_index_master
   (
       index_code string,
       short_index_code string,
       index_name string,
       index_description string,
       creation_date date,
       establish_date date,
       city string,
       country string
       ) --partitioned BY SPEC (country)
       USING ICEBERG
       TBLPROPERTIES ('format-version'='2','engine.hive.enabled'=TRUE);
   """)`
   
   `Py4JJavaError: An error occurred while calling o138.create.
   : org.apache.spark.SparkException: Writing job aborted
   	at org.apache.spark.sql.errors.QueryExecutionErrors$.writingJobAbortedError(QueryExecutionErrors.scala:767)
   	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:409)
   	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:353)
   	at org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeWithV2(WriteToDataSourceV2Exec.scala:108)
   	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:503)
   	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
   	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:491)
   	at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:486)
   	at org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:108)
   	at org.apache.spark.sql.execution.datasources.v2.AtomicCreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:131)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
   	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
   	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
   	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
   	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
   	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
   	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
   	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
   	at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:116)
   	at org.apache.spark.sql.DataFrameWriterV2.runCommand(DataFrameWriterV2.scala:195)
   	at org.apache.spark.sql.DataFrameWriterV2.create(DataFrameWriterV2.scala:125)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 4) (ddbe96d75e11 executor 2): java.lang.ClassCastException: org.apache.parquet.schema.MessageType cannot be cast to org.apache.iceberg.shaded.org.apache.parquet.schema.MessageType
   	at org.apache.iceberg.parquet.ParquetWriter.<init>(ParquetWriter.java:106)
   	at org.apache.iceberg.parquet.Parquet$WriteBuilder.build(Parquet.java:305)
   	at org.apache.iceberg.parquet.Parquet$DataWriteBuilder.build(Parquet.java:672)
   	at org.apache.iceberg.data.BaseFileWriterFactory.newDataWriter(BaseFileWriterFactory.java:133)
   	at org.apache.iceberg.io.RollingDataWriter.newWriter(RollingDataWriter.java:52)
   	at org.apache.iceberg.io.RollingDataWriter.newWriter(RollingDataWriter.java:32)
   	at org.apache.iceberg.io.RollingFileWriter.openCurrentWriter(RollingFileWriter.java:108)
   	at org.apache.iceberg.io.RollingDataWriter.<init>(RollingDataWriter.java:47)
   	at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.<init>(SparkWrite.java:686)
   	at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.<init>(SparkWrite.java:676)
   	at org.apache.iceberg.spark.source.SparkWrite$WriterFactory.createWriter(SparkWrite.java:660)
   	at org.apache.iceberg.spark.source.SparkWrite$WriterFactory.createWriter(SparkWrite.java:638)
   	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:430)
   	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:381)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:136)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:750)
   
   Driver stacktrace:
   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2668)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2604)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2603)
   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2603)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1178)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1178)
   	at scala.Option.foreach(Option.scala:407)
   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1178)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2856)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2798)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2787)
   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2238)
   	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:377)
   	... 45 more
   Caused by: java.lang.ClassCastException: org.apache.parquet.schema.MessageType cannot be cast to org.apache.iceberg.shaded.org.apache.parquet.schema.MessageType
   	at org.apache.iceberg.parquet.ParquetWriter.<init>(ParquetWriter.java:106)
   	at org.apache.iceberg.parquet.Parquet$WriteBuilder.build(Parquet.java:305)
   	at org.apache.iceberg.parquet.Parquet$DataWriteBuilder.build(Parquet.java:672)
   	at org.apache.iceberg.data.BaseFileWriterFactory.newDataWriter(BaseFileWriterFactory.java:133)
   	at org.apache.iceberg.io.RollingDataWriter.newWriter(RollingDataWriter.java:52)
   	at org.apache.iceberg.io.RollingDataWriter.newWriter(RollingDataWriter.java:32)
   	at org.apache.iceberg.io.RollingFileWriter.openCurrentWriter(RollingFileWriter.java:108)
   	at org.apache.iceberg.io.RollingDataWriter.<init>(RollingDataWriter.java:47)
   	at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.<init>(SparkWrite.java:686)
   	at org.apache.iceberg.spark.source.SparkWrite$UnpartitionedDataWriter.<init>(SparkWrite.java:676)
   	at org.apache.iceberg.spark.source.SparkWrite$WriterFactory.createWriter(SparkWrite.java:660)
   	at org.apache.iceberg.spark.source.SparkWrite$WriterFactory.createWriter(SparkWrite.java:638)
   	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:430)
   	at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:381)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:136)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	... 1 more`
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

Re: [I] Iceberg does not work with Spark's default hive metastore (embedded Derby database) [iceberg]

Posted by "nsucheendran (via GitHub)" <gi...@apache.org>.

nsucheendran commented on issue #7847:
URL: https://github.com/apache/iceberg/issues/7847#issuecomment-2007824814

   > I used a hadoop catalog for my testing. Worked fine! Thanks @RussellSpitzer .
   
   Hi @vinitamaloo-asu 
   My team is facing the same issue. Is there any code you can share on how you switched to hadoop catalog instead of derby?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] Nandakumar-M commented on issue #7847: Iceberg does not work with Spark's default hive metastore (embedded Derby database)

Posted by "Nandakumar-M (via GitHub)" <gi...@apache.org>.

Nandakumar-M commented on issue #7847:
URL: https://github.com/apache/iceberg/issues/7847#issuecomment-1592863202

   Will create a PR for this and share here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

Re: [I] Iceberg does not work with Spark's default hive metastore (embedded Derby database) [iceberg]

Posted by "vinitamaloo-asu (via GitHub)" <gi...@apache.org>.

vinitamaloo-asu commented on issue #7847:
URL: https://github.com/apache/iceberg/issues/7847#issuecomment-2008290040

   I set the below config to switch to hadoop.
         "spark.sql.extensions" -> "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
         "spark.sql.catalog.iceberg_catalog" -> "org.apache.iceberg.spark.SparkCatalog",
         "spark.sql.catalog.iceberg_catalog.type" -> "hadoop",
         "spark.sql.catalog.iceberg_catalog.warehouse" -> warehouseFile,
         "spark.sql.catalog.iceberg_catalog.cache-enabled" -> "false"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] vinitamaloo-asu commented on issue #7847: Iceberg does not work with Spark's default hive metastore (embedded Derby database)

Posted by "vinitamaloo-asu (via GitHub)" <gi...@apache.org>.

vinitamaloo-asu commented on issue #7847:
URL: https://github.com/apache/iceberg/issues/7847#issuecomment-1672116633

   Following


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] RussellSpitzer commented on issue #7847: Iceberg does not work with Spark's default hive metastore (embedded Derby database)

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.

RussellSpitzer commented on issue #7847:
URL: https://github.com/apache/iceberg/issues/7847#issuecomment-1694428156

   The embedded catalog doesn't support the interfaces (or concurrent access) required for iceberg to work. If you are just locally testing use a hadoop catalog or a full embeded metastore like this repository's integration tests. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org