Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/10 21:01:05 UTC

[GitHub] [iceberg] XAZAD opened a new issue, #6172: rewriteDataFiles throws exception in spark 3.2

XAZAD opened a new issue, #6172:
URL: https://github.com/apache/iceberg/issues/6172

   ### Apache Iceberg version
   
   0.13.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   The rewriteDataFiles method throws
   `org.apache.spark.sql.connector.catalog.CatalogNotFoundException: Catalog 'ice' plugin class not found: spark.sql.catalog.ice is not defined.`
   
   My code:
   
   ```
   import org.apache.iceberg.Table
   import org.apache.iceberg.spark.Spark3Util
   import org.apache.iceberg.spark.actions.SparkActions

   val icebergTable: Table = Spark3Util.loadIcebergTable(spark, "ice.someDb.someSchema")
   println(icebergTable.schema.asStruct.fields())

   SparkActions
       .get(spark)
       .rewriteDataFiles(icebergTable)
       .option("target-file-size-bytes", (1024L * 1024L * 1024L).toString)
       .option("min-input-files", "2")
       .execute()
   ```
   
   My configuration:

   ```
   {
       "hive.metastore.uris": "thrift://*****",
       "iceberg.engine.hive.enabled": "true",
       "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
       "spark.sql.catalog.ice.uri": "thrift://*****",
       "spark.sql.catalog.ice": "org.apache.iceberg.spark.SparkCatalog",
       "spark.sql.catalog.ice.type": "hive"
   }
   ```
   My previous configuration was:

   ```
   {
       "spark.sql.catalog.spark_catalog.uri": "***",
       "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
       "spark.sql.catalog.spark_catalog.type": "hive"
   }
   ```
   The error then was:
   `java.lang.RuntimeException: org.apache.spark.sql.connector.catalog.CatalogNotFoundException: Catalog 'default_iceberg' plugin class not found: spark.sql.catalog.default_iceberg is not defined`
   Also, while using org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions I got the exception
   "Table %s is not an Iceberg table" from Spark3Util.loadIcebergTable;
   changing the catalog class solved that.

   I'm using Spark on Kubernetes, but I don't think that has much bearing on the problem.

   Could you please advise where to look next? Thanks.



[GitHub] [iceberg] XAZAD commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by GitBox <gi...@apache.org>.
XAZAD commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1311676676

   Some more details (screenshot):
   ![image](https://user-images.githubusercontent.com/13484463/201346541-cf319f21-dd02-4781-956b-7ba98e64b599.png)
   Full stack trace:
   ```
   java.lang.RuntimeException: org.apache.spark.sql.connector.catalog.CatalogNotFoundException: Catalog 'ice' plugin class not found: spark.sql.catalog.ice is not defined
     org.apache.iceberg.util.ExceptionUtil.castAndThrow(ExceptionUtil.java:41)
     org.apache.iceberg.util.Tasks.throwOne(Tasks.java:589)
     org.apache.iceberg.util.Tasks.access$100(Tasks.java:42)
     org.apache.iceberg.util.Tasks$Builder.runParallel(Tasks.java:387)
     org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
     org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
     org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.doExecute(BaseRewriteDataFilesSparkAction.java:270)
     org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.execute(BaseRewriteDataFilesSparkAction.java:176)
     org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.execute(BaseRewriteDataFilesSparkAction.java:75)
     ammonite.$sess.cmd0$Helper.<init>(cmd0.sc:15)
     ammonite.$sess.cmd0$.<init>(cmd0.sc:7)
     ammonite.$sess.cmd0$.<clinit>(cmd0.sc:-1)
   org.apache.spark.sql.connector.catalog.CatalogNotFoundException: Catalog 'ice' plugin class not found: spark.sql.catalog.ice is not defined
     org.apache.spark.sql.errors.QueryExecutionErrors$.catalogPluginClassNotFoundError(QueryExecutionErrors.scala:1443)
     org.apache.spark.sql.connector.catalog.Catalogs$.load(Catalogs.scala:51)
     org.apache.spark.sql.connector.catalog.CatalogManager.$anonfun$catalog$1(CatalogManager.scala:52)
     scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
     org.apache.spark.sql.connector.catalog.CatalogManager.catalog(CatalogManager.scala:52)
     org.apache.spark.sql.connector.catalog.CatalogV2Util$.$anonfun$getTableProviderCatalog$1(CatalogV2Util.scala:372)
     scala.Option.map(Option.scala:230)
     org.apache.spark.sql.connector.catalog.CatalogV2Util$.getTableProviderCatalog(CatalogV2Util.scala:372)
     org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:229)
     scala.Option.map(Option.scala:230)
     org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
     org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
     org.apache.iceberg.spark.actions.SparkBinPackStrategy.rewriteFiles(SparkBinPackStrategy.java:69)
     org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.lambda$rewriteFiles$2(BaseRewriteDataFilesSparkAction.java:234)
     org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:112)
     org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.rewriteFiles(BaseRewriteDataFilesSparkAction.java:232)
     org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.lambda$doExecute$4(BaseRewriteDataFilesSparkAction.java:271)
     org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
     org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:70)
     org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:310)
     java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
     java.util.concurrent.FutureTask.run(FutureTask.java:264)
     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
     java.lang.Thread.run(Thread.java:829)
   ```



[GitHub] [iceberg] nastra commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by GitBox <gi...@apache.org>.
nastra commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1311340587

   The error message mentions `default_iceberg`/`ice`, so you'd have to make sure that this catalog is properly set up: `spark.sql.catalog.(catalog_name): ...`. Additional details can be found at https://iceberg.apache.org/spark-quickstart/#adding-a-catalog
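   For orientation, here is a minimal sketch of wiring up such a named catalog on the session builder; the catalog name `ice` and the placeholder thrift URI are taken from this issue and are assumptions, not a verified setup:

   ```
   import org.apache.spark.sql.SparkSession

   // Register an Iceberg catalog named "ice" backed by a Hive metastore
   // (the URI below is a placeholder).
   val spark = SparkSession.builder()
     .appName("iceberg-rewrite")
     .config("spark.sql.extensions",
       "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
     .config("spark.sql.catalog.ice", "org.apache.iceberg.spark.SparkCatalog")
     .config("spark.sql.catalog.ice.type", "hive")
     .config("spark.sql.catalog.ice.uri", "thrift://<metastore-host>:9083")
     .getOrCreate()
   ```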



[GitHub] [iceberg] dmgcodevil commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by "dmgcodevil (via GitHub)" <gi...@apache.org>.
dmgcodevil commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1428806091

   @RussellSpitzer it worked after I specified the `default_iceberg` catalog:

   ```
   .config("spark.sql.catalog.default_iceberg", "org.apache.iceberg.spark.SparkCatalog")
   .config("spark.sql.catalog.default_iceberg.type", "hive")
   .config("spark.sql.catalog.default_iceberg.uri", conf.metastore)
   ```



[GitHub] [iceberg] XAZAD commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by GitBox <gi...@apache.org>.
XAZAD commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1311770953

   I'll try, but it will take some time because the infrastructure I'm using is huge, and deploying a new version of the library will take a while even in dev.



[GitHub] [iceberg] nastra commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by GitBox <gi...@apache.org>.
nastra commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1311769354

   Could you maybe check whether the issue exists on the latest Iceberg version (1.0.0)? 
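   As a side note, newer releases also expose compaction as a stored procedure through the SQL extensions; a hedged sketch, assuming the `ice` catalog resolves and reusing the table name and options from the original report:

   ```
   // Hedged sketch: invoke the rewrite_data_files procedure (available in newer
   // Iceberg releases); values mirror the reporter's rewriteDataFiles options.
   spark.sql(
     """CALL ice.system.rewrite_data_files(
       |  table => 'someDb.someSchema',
       |  options => map('min-input-files', '2',
       |                 'target-file-size-bytes', '1073741824'))""".stripMargin)
   ```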



[GitHub] [iceberg] RussellSpitzer commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1428815806

   @dmgcodevil That would be a sign of what I'm talking about being broken. That catalog is always made by default. See
   
   https://github.com/apache/iceberg/blob/c07f2aabc0a1d02f068ecf1514d2479c0fbdd3b0/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java#L216-L231
   
   So your configuration would change the active session, but rewriteDataFiles is still relying on the catalog that is set up by default.
   
   If you could give me more info about your env, I'd like to see if there is a common thread, especially if you could check the SQLConf like I mentioned.
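   For illustration, a rough sketch of what explicitly registering that fallback catalog on the running session looks like; the property names follow dmgcodevil's snippet above and the metastore URI is a placeholder, so this is a workaround sketch rather than the built-in behavior:

   ```
   // Hedged sketch: manually mirror the default_iceberg fallback catalog on the
   // active session's runtime conf so catalog lookups through SQLConf can see it.
   spark.conf.set("spark.sql.catalog.default_iceberg", "org.apache.iceberg.spark.SparkCatalog")
   spark.conf.set("spark.sql.catalog.default_iceberg.type", "hive")
   spark.conf.set("spark.sql.catalog.default_iceberg.uri", "thrift://<metastore-host>:9083")
   ```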



[GitHub] [iceberg] dmgcodevil commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by "dmgcodevil (via GitHub)" <gi...@apache.org>.
dmgcodevil commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1428771738

   same with `1.0.0`



[GitHub] [iceberg] RussellSpitzer commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1428795920

   The CatalogNotFoundException implies that a different catalog manager is being used by the RewriteManager than by the normal REPL, assuming that you are able to access the catalog plugin normally. If you have a chance, can you check
   ```
   import org.apache.spark.sql.internal.SQLConf

   val sqlConf = SQLConf.get
   sqlConf.getConfString("spark.sql.catalog.ice") // or whatever your catalog is named
   ```
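   A slightly broader check, as a hedged sketch, probes both the named catalog and the implicit fallback without throwing when a key is missing:

   ```
   // Hedged sketch: print both catalog entries, using a marker string when a key
   // is not set in the current SQLConf.
   import org.apache.spark.sql.internal.SQLConf

   val sqlConf = SQLConf.get
   Seq("spark.sql.catalog.ice", "spark.sql.catalog.default_iceberg").foreach { key =>
     println(s"$key -> ${sqlConf.getConfString(key, "<not set>")}")
   }
   ```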



[GitHub] [iceberg] XAZAD commented on issue #6172: rewriteDataFiles throws exception in spark 3.2

Posted by GitBox <gi...@apache.org>.
XAZAD commented on issue #6172:
URL: https://github.com/apache/iceberg/issues/6172#issuecomment-1311380547

   As mentioned before, the catalogs are configured and working properly:

   ```
   {
       "hive.metastore.uris": "thrift://*****",
       "iceberg.engine.hive.enabled": "true",
       "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
       "spark.sql.catalog.ice.uri": "thrift://*****",
       "spark.sql.catalog.ice": "org.apache.iceberg.spark.SparkCatalog",
       "spark.sql.catalog.ice.type": "hive"
   }
   ```

   In both cases I'm able to read from and write to the catalog tables with spark.table and writeTo.

   Also, in the first case

   `Spark3Util.loadIcebergTable(spark, "ice.someDb.someSchema")`

   works fine, and as a precheck I can see the table schema with:

   `icebergTable.schema.asStruct.fields()`
   

