Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2022/01/24 13:25:00 UTC
[jira] [Closed] (HUDI-3262) Not able to query hudi tables(spark datasource read) from bundles (utilities, integ test suite)
[ https://issues.apache.org/jira/browse/HUDI-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan closed HUDI-3262.
-------------------------------------
Assignee: sivabalan narayanan (was: leesf)
Resolution: Fixed
> Not able to query hudi tables(spark datasource read) from bundles (utilities, integ test suite)
> -----------------------------------------------------------------------------------------------
>
> Key: HUDI-3262
> URL: https://issues.apache.org/jira/browse/HUDI-3262
> Project: Apache Hudi
> Issue Type: Bug
> Components: tests-ci
> Reporter: Raymond Xu
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: pull-request-available, sev:normal
> Fix For: 0.11.0
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> We likely need to fix the way our bundles are packaged. For example, querying a hudi table using the hudi-utilities bundle succeeds with 0.10.1 but fails with master. The same root cause should explain why the integ test suite bundle fails to query hudi tables.
> {code:java}
> ./bin/spark-shell \
> --packages org.apache.spark:spark-avro_2.11:2.4.4 \
> --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars ~/Documents/personal/projects/apache_hudi_dec/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.10.1-rc2.jar
> scala> val df = spark.read.format("hudi").load("/tmp/hudi-deltastreamer-ny/")
> scala> df.count
> {code}
>
> {code:java}
> ./bin/spark-shell \
> --packages org.apache.spark:spark-avro_2.11:2.4.4 \
> --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' --jars ~/Documents/personal/projects/nov26/hudi/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0-SNAPSHOT.jar
> scala> val df = spark.read.format("hudi").load("/tmp/hudi-deltastreamer-ny/")
> java.lang.ClassNotFoundException: Failed to find data source: hudi. Please find packages at http://spark.apache.org/third-party-projects.html
> at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:675)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:213)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:197)
> ... 49 elided
> Caused by: java.lang.ClassNotFoundException: hudi.DefaultSource
> at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:652)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20$$anonfun$apply$12.apply(DataSource.scala:652)
> at scala.util.Try$.apply(Try.scala:192)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:652)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$20.apply(DataSource.scala:652)
> at scala.util.Try.orElse(Try.scala:84)
> at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:652)
> ... 51 more {code}
>
>
> Originally reported via GitHub issue: [https://github.com/apache/hudi/issues/4621]
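For context, the shape of the stack trace can be reproduced with a minimal sketch (this is not Spark's actual source, just the documented fallback behaviour): when the short name is neither registered via ServiceLoader nor loadable as a class, Spark tries `"<name>.DefaultSource"`, which is why the cause reads `ClassNotFoundException: hudi.DefaultSource`.

```java
// Minimal sketch of DataSource.lookupDataSource's fallback behaviour.
public class LookupSketch {
    static Class<?> lookupDataSource(String name) throws ClassNotFoundException {
        try {
            // A fully-qualified provider class would resolve here.
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            // Fallback: for short name "hudi" this tries "hudi.DefaultSource".
            return Class.forName(name + ".DefaultSource");
        }
    }

    public static void main(String[] args) {
        try {
            lookupDataSource("hudi");
        } catch (ClassNotFoundException e) {
            // Mirrors the cause in the stack trace above.
            System.out.println("Failed to find data source: " + e.getMessage());
        }
    }
}
```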
--
This message was sent by Atlassian Jira
(v8.20.1#820001)