Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/16 11:12:41 UTC

[GitHub] [hudi] pete91z commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

pete91z commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-899427179


   I am seeing this issue with MOR tables using Apache Spark 3.1.2 (not on AWS EMR) and Hudi 0.7.0. Is it possible to re-open it, please? Or is it fixed in 0.8.0?
   
   Context: I created a table using DeltaStreamer. DeltaStreamer appears to work fine, but later, when I try to create a DataFrame to read the table in PySpark, I get the following error:
   
   Traceback (most recent call last):
     File "./init_hudi_for_billing_ds", line 45, in <module>
       billingDF=sqlContext.read.format("hudi").load(basePath+"/*/*")
     File "/home/spark_311/py1/lib64/python3.6/dist-packages/pyspark/sql/readwriter.py", line 204, in load
       return self._df(self._jreader.load(path))
     File "/home/spark_311/py1/lib64/python3.6/dist-packages/py4j/java_gateway.py", line 1305, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/home/spark_311/py1/lib64/python3.6/dist-packages/pyspark/sql/utils.py", line 111, in deco
       return f(*a, **kw)
     File "/home/spark_311/py1/lib64/python3.6/dist-packages/py4j/protocol.py", line 328, in get_return_value
       format(target_id, ".", name), value)
   py4j.protocol.Py4JJavaError: An error occurred while calling o30.load.
   : java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(Lorg/apache/spark/sql/SparkSession;Lscala/collection/Seq;Lscala/collection/immutable/Map;Lscala/Option;Lorg/apache/spark/sql/execution/datasources/FileStatusCache;)V
   	at org.apache.hudi.HoodieSparkUtils$.createInMemoryFileIndex(HoodieSparkUtils.scala:89)
   	at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:127)
   	at org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:72)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:89)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:53)
   	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
   	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
   	at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
   	at scala.Option.getOrElse(Option.scala:189)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:239)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   
   I've tried setting table.type to MERGE_ON_READ in the Hudi options, but it has no effect. These errors do not occur with COPY_ON_WRITE tables.
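
A NoSuchMethodError generally means the calling code was compiled against a different version of a class than the one found on the classpath at runtime. To make the missing InMemoryFileIndex constructor easier to compare against the Spark version in use, the JVM method descriptor from the error can be decoded into readable type names. The following is only a minimal sketch: decode_descriptor is a hypothetical helper written for illustration, not part of Hudi, Spark, or py4j.

```python
# Hypothetical helper: decode a JVM method descriptor (as printed in a
# NoSuchMethodError) into readable parameter and return type names.

# JVM primitive type codes used in method descriptors.
_PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _parse_type(desc, pos):
    """Parse one type starting at desc[pos]; return (name, next_pos)."""
    dims = 0
    while desc[pos] == "[":  # each leading '[' is one array dimension
        dims += 1
        pos += 1
    code = desc[pos]
    if code == "L":  # object type: L<binary/class/name>;
        end = desc.index(";", pos)
        name = desc[pos + 1:end].replace("/", ".")
        pos = end + 1
    else:  # single-character primitive code
        name = _PRIMITIVES[code]
        pos += 1
    return name + "[]" * dims, pos


def decode_descriptor(desc):
    """Split a descriptor such as '(ILjava/lang/String;)V' into (params, return_type)."""
    if not desc.startswith("("):
        raise ValueError("descriptor must start with '('")
    pos = 1
    params = []
    while desc[pos] != ")":
        name, pos = _parse_type(desc, pos)
        params.append(name)
    ret, _ = _parse_type(desc, pos + 1)
    return params, ret


# Descriptor copied from the NoSuchMethodError in the traceback above.
DESCRIPTOR = (
    "(Lorg/apache/spark/sql/SparkSession;Lscala/collection/Seq;"
    "Lscala/collection/immutable/Map;Lscala/Option;"
    "Lorg/apache/spark/sql/execution/datasources/FileStatusCache;)V"
)

if __name__ == "__main__":
    params, ret = decode_descriptor(DESCRIPTOR)
    for p in params:
        print(p)
    print("returns", ret)
```

The decoded list is the five-argument constructor that the Hudi bundle expects. Comparing it with the InMemoryFileIndex constructor in the Spark version on the classpath should show whether the two signatures differ.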
   
   
   
   
   
   

