You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/13 13:03:01 UTC

[GitHub] [hudi] parisni commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

parisni commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-942282671


   same issue here:
   ```
   # This fails with the above error
   spark.sql("select * from my_table_rt").show() 
   # also this fails with the same error
   spark.read.format("hudi").load(my_table_path).show()
   # this works 
   spark.sql("select * from my_table_ro").show()
   ```
   
   Using our own spark 2.4.4 build compiled with glue metastore on emr 5. 
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 380, in show
       print(self._jdf.showString(n, 20, vertical))
     File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
     File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
       return f(*a, **kw)
     File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
   py4j.protocol.Py4JJavaError: An error occurred while calling o138.showString.
   : java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.PartitionedFile.<init>(Lorg/apache/spark/sql/catalyst/InternalRow;Ljava/lang/String;JJ[Ljava/lang/String;)V
           at org.apache.hudi.MergeOnReadSnapshotRelation$$anonfun$7.apply(MergeOnReadSnapshotRelation.scala:217)
           at org.apache.hudi.MergeOnReadSnapshotRelation$$anonfun$7.apply(MergeOnReadSnapshotRelation.scala:209)
           at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
           at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
           at scala.collection.immutable.List.foreach(List.scala:392)
           at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
           at scala.collection.immutable.List.map(List.scala:296)
           at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:209)
           at org.apache.hudi.MergeOnReadSnapshotRelation.buildScan(MergeOnReadSnapshotRelation.scala:110)
           at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:309)
           at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:309)
           at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:342)
           at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:341)
           at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:419)
           at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:337)
           at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:305)
           at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
           at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
           at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
           at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
           at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
           at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
           at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
           at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
           at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
           at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
           at scala.collection.Iterator$class.foreach(Iterator.scala:891)
           at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
           at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
           at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
           at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
           at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
           at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
           at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
           at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
           at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
           at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
           at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
           at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
           at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3365)
           at org.apache.spark.sql.Dataset.head(Dataset.scala:2550)
           at org.apache.spark.sql.Dataset.take(Dataset.scala:2764)
           at org.apache.spark.sql.Dataset.getRows(Dataset.scala:254)
           at org.apache.spark.sql.Dataset.showString(Dataset.scala:291)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
           at py4j.Gateway.invoke(Gateway.java:282)
           at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
           at py4j.commands.CallCommand.execute(CallCommand.java:79)
           at py4j.GatewayConnection.run(GatewayConnection.java:238)
           at java.lang.Thread.run(Thread.java:748)
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org