Posted to commits@hudi.apache.org by "FredXu (Jira)" <ji...@apache.org> on 2022/06/15 07:32:00 UTC
[jira] [Commented] (HUDI-3696) ORC dependency conflicts between hudi and spark
[ https://issues.apache.org/jira/browse/HUDI-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554429#comment-17554429 ]
FredXu commented on HUDI-3696:
------------------------------
Excuse me, miomiocat,
I ran into the same issue when writing ORC with Hudi:
{code:java}
WARN storage.BlockManager: Putting block rdd_96_0 failed due to exception java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.orc.TypeDescription.createRowBatch()Lorg/apache/orc/storage/ql/exec/vector/VectorizedRowBatch;.
{code}
Then I deleted orc-core-xxx.jar and used orc-core-xxx-nohive.jar instead, and it fails with :(:
{code:java}
org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/apache/orc/impl/KeyProvider$Factory. {code}
I can't find anything that solves this. Could you share how you fixed it?
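(In case it helps, a quick way to check which ORC variant actually wins on the classpath is to look up the method by reflection. This is a diagnostic sketch, not part of Hudi; the class and method names come from the stack traces in this issue:)

{code:java}
import java.lang.reflect.Method;
import java.net.URL;

public class OrcProbe {
    // Reports the return type of org.apache.orc.TypeDescription.createRowBatch().
    // Plain orc-core resolves it to org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,
    // while the nohive classifier relocates it to org.apache.orc.storage.ql.exec.vector.
    static String probe() {
        try {
            Class<?> td = Class.forName("org.apache.orc.TypeDescription");
            for (Method m : td.getMethods()) {
                if (m.getName().equals("createRowBatch") && m.getParameterCount() == 0) {
                    return m.getReturnType().getName();
                }
            }
            return "no zero-arg createRowBatch found";
        } catch (ClassNotFoundException e) {
            return "org.apache.orc.TypeDescription not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
        // Which jar actually provided the class (null if ORC is absent):
        URL src = OrcProbe.class.getClassLoader()
                .getResource("org/apache/orc/TypeDescription.class");
        System.out.println(src);
    }
}
{code}

Running this on the driver with the same classpath as the failing job shows which return type (and which jar) won the conflict.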
> ORC dependency conflicts between hudi and spark
> -----------------------------------------------
>
> Key: HUDI-3696
> URL: https://issues.apache.org/jira/browse/HUDI-3696
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: miomiocat
> Priority: Major
>
> Hudi uses orc-core-xxx-nohive.jar to access the ORC storage API (classes like VectorizedRowBatch), while Spark ships orc-core-xxx.jar. The nohive classifier relocates the Hive vector classes under org.apache.orc.storage, so the two jars expose the same methods with different signatures, and the dependencies conflict.
>
> {code:java}
> df.write.format("hudi")
> .option("hoodie.table.name", "spark_shell_created_cow_nopartition_type")
> .option("hoodie.datasource.write.precombine.field", "id")
> .option("hoodie.datasource.write.recordkey.field", "id")
> .option("hoodie.datasource.write.partitionpath.field", "id")
> .option("hoodie.datasource.write.hive_style_partitioning", "true")
> .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")
> .option("hoodie.table.base.file.format", "ORC")
> .mode(Overwrite)
> .save(basePath) {code}
>
> For example, when I try to write an ORC-format Hudi table with the commands above, the following exception is thrown:
>
> {code:java}
> Caused by: java.lang.NoSuchMethodError: org.apache.orc.TypeDescription.createRowBatch()Lorg/apache/orc/storage/ql/exec/vector/VectorizedRowBatch;
> at org.apache.hudi.io.storage.HoodieOrcWriter.<init>(HoodieOrcWriter.java:84)
> at org.apache.hudi.io.storage.HoodieFileWriterFactory.newOrcFileWriter(HoodieFileWriterFactory.java:102)
> at org.apache.hudi.io.storage.HoodieFileWriterFactory.getFileWriter(HoodieFileWriterFactory.java:59)
> at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:100)
> at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:73)
> at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:46)
> at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:83)
> at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
> at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
> at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:134)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ... 3 more {code}
>
> I replaced orc-core-xxx-nohive with orc-core-xxx, fixed some import statements, and it worked successfully.
> So I think more compatibility work is needed between the ORC format support and Spark.
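> The swap described above would look roughly like this in the Hudi build (a hypothetical sketch; the actual module, version, and packaging depend on the Hudi release):
>
> {code:xml}
> <!-- Hypothetical: depend on plain orc-core instead of the nohive
>      classifier, so Hudi links against the same ORC API Spark ships. -->
> <dependency>
>   <groupId>org.apache.orc</groupId>
>   <artifactId>orc-core</artifactId>
>   <!-- was: <classifier>nohive</classifier> -->
> </dependency>
> {code}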
--
This message was sent by Atlassian Jira
(v8.20.7#820007)