Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/10 08:49:23 UTC

[GitHub] [iceberg] 0xffmeta opened a new issue, #5946: Not able to run spark procedure rewrite_data_files

0xffmeta opened a new issue, #5946:
URL: https://github.com/apache/iceberg/issues/5946

   ### Apache Iceberg version
   
   0.13.1
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   When I tried to run the Spark procedure
   ```
   CALL spark_catalog.system.rewrite_data_files(table => 'xxxx', where => 'dt="2022-10-01" and hh="00"')
   ```
   Spark returned this error:
   ```
   Caused by: java.lang.ClassNotFoundException:
   Failed to find data source: iceberg. Please find packages at
   http://spark.apache.org/third-party-projects.html
   
   	at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:443)
   	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:670)
   	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:720)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
   	at org.apache.iceberg.spark.actions.Spark3BinPackStrategy.rewriteFiles(Spark3BinPackStrategy.java:68)
   	at org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.lambda$rewriteFiles$2(BaseRewriteDataFilesSparkAction.java:232)
   	at org.apache.iceberg.spark.actions.BaseSparkAction.withJobGroupInfo(BaseSparkAction.java:98)
   	at org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.rewriteFiles(BaseRewriteDataFilesSparkAction.java:230)
   	at org.apache.iceberg.spark.actions.BaseRewriteDataFilesSparkAction.lambda$doExecute$4(BaseRewriteDataFilesSparkAction.java:269)
   	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
   	at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:70)
   	at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:310)
   	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
   	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   	at java.base/java.lang.Thread.run(Unknown Source)
   Caused by: java.lang.ClassNotFoundException: iceberg.DefaultSource
   	at java.base/java.net.URLClassLoader.findClass(Unknown Source)
   	at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
   	at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
   	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:656)
   	at scala.util.Try$.apply(Try.scala:213)
   	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:656)
   	at scala.util.Failure.orElse(Try.scala:224)
   	at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:656)
   	... 16 more
   ```
   
   However, the procedure below ran successfully:
   ```
   CALL spark_catalog.system.rewrite_data_files('xxxx')
   ```
   
   Is there anything I missed here? I have already added this conf:
   ```
   spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] 0xffmeta closed issue #5946: Not able to run spark procedure rewrite_data_files

Posted by GitBox <gi...@apache.org>.
0xffmeta closed issue #5946: Not able to run spark procedure rewrite_data_files
URL: https://github.com/apache/iceberg/issues/5946




[GitHub] [iceberg] 0xffmeta commented on issue #5946: Not able to run spark procedure rewrite_data_files

Posted by GitBox <gi...@apache.org>.
0xffmeta commented on issue #5946:
URL: https://github.com/apache/iceberg/issues/5946#issuecomment-1278674526

   To resolve the issue, we need to use the Maven Shade plugin to merge the `META-INF/services/org.apache.spark.sql.sources.DataSourceRegister` service files into the fat jar, so that Spark can recognize the Iceberg data source.
   ```
     <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-shade-plugin</artifactId>
         <version>3.1.1</version>
         <executions>
             <execution>
                 <goals>
                     <goal>shade</goal>
                 </goals>
                 <configuration>
                     <transformers>
                         <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                     </transformers>
                 </configuration>
             </execution>
         </executions>
     </plugin>
   ```
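
One way to check whether a fat jar actually carries the merged service file is to read it straight out of the archive. The sketch below is illustrative only: the helper names `registered_sources` and `has_iceberg_source` are made up for this example, and the check assumes the Iceberg provider class sits under the `org.apache.iceberg.` package prefix.

```python
import zipfile

# Service file that Spark consults to map short names such as "iceberg" to classes.
SERVICE_FILE = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"


def registered_sources(jar_path):
    """Return the provider class names listed in the jar's service file.

    Returns an empty list when the jar has no such file, which is the
    situation a fat jar ends up in when service files are not merged.
    """
    with zipfile.ZipFile(jar_path) as jar:
        if SERVICE_FILE not in jar.namelist():
            return []
        text = jar.read(SERVICE_FILE).decode("utf-8")

    providers = []
    for line in text.splitlines():
        # Service files allow '#' comments and blank lines; skip both.
        entry = line.split("#", 1)[0].strip()
        if entry:
            providers.append(entry)
    return providers


def has_iceberg_source(jar_path):
    """True if any registered provider is under org.apache.iceberg (assumed prefix)."""
    return any(p.startswith("org.apache.iceberg.") for p in registered_sources(jar_path))
```

Calling `has_iceberg_source("path/to/your-fat.jar")` (the path is a placeholder) before and after adding the `ServicesResourceTransformer` should show whether the Iceberg entry survived shading.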




[GitHub] [iceberg] Gschiavon commented on issue #5946: Not able to run spark procedure rewrite_data_files

Posted by GitBox <gi...@apache.org>.
Gschiavon commented on issue #5946:
URL: https://github.com/apache/iceberg/issues/5946#issuecomment-1367851261

   A quick workaround is to pass the package via `--packages` when running `spark-submit`, like this:
   `--packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.13:1.1.0`
   
   I also had to add `--conf "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp"`, in case someone runs into this error on k8s:
   `Exception in thread "main" java.io.FileNotFoundException: /opt/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-785919a8-9943-4aa2-8694-d58267a93470-1.0.xml (No such file or directory)`
   
   Another workaround would be to download the Iceberg package into the Docker image.
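
The `spark.driver.extraJavaOptions` value contains a space, so it has to reach `spark-submit` as a single argument. The sketch below uses Python's `shlex` to show how a POSIX shell would tokenize the option with and without quotes, and builds an argument list from the flags mentioned in this thread. The function name `spark_submit_args` and the `app.py` file name are hypothetical.

```python
import shlex

IVY_OPTS = "spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp"

# Without quotes the shell splits the value at the space, so spark-submit
# would see "-Divy.home=/tmp" as a separate, unrelated argument.
unquoted_tokens = shlex.split("--conf " + IVY_OPTS)

# With quotes the whole key=value pair stays together as one argument.
quoted_tokens = shlex.split('--conf "' + IVY_OPTS + '"')


def spark_submit_args(app, package):
    """Build a spark-submit argv list; list elements are never re-split."""
    return [
        "spark-submit",
        "--packages", package,
        "--conf", IVY_OPTS,
        app,
    ]


args = spark_submit_args(
    "app.py",  # hypothetical application file
    "org.apache.iceberg:iceberg-spark-runtime-3.3_2.13:1.1.0",
)

# shlex.join adds the quoting needed to paste the command into a shell.
print(shlex.join(args))
```

Passing `args` to `subprocess.run` as a list avoids shell quoting altogether, since each element is delivered to the process as one argument.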




Re: [I] Not able to run spark procedure rewrite_data_files [iceberg]

Posted by "suryaprabhakark (via GitHub)" <gi...@apache.org>.
suryaprabhakark commented on issue #5946:
URL: https://github.com/apache/iceberg/issues/5946#issuecomment-1951663572

   I faced the same issue with Spark 3.3, but Spark 3.2 works fine. I'm not sure what the cause is. I had to apply the workaround @Gschiavon suggested. Tested locally and on Dataproc as well.

