Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/15 14:56:14 UTC

[GitHub] [hudi] rahulpoptani opened a new issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

rahulpoptani opened a new issue #2180:
URL: https://github.com/apache/hudi/issues/2180


   **Describe the problem you faced**
   Hi Team,
   I've written a MERGE ON READ type table and made a few upserts and deletes as described in the documentation.
   When reading the hudi output using the Spark Datasource, I'm able to read the older version of the data with the "hoodie.datasource.query.type=read_optimized" option. But when reading the same dataset with "hoodie.datasource.query.type=snapshot", I get the error attached below.
   
   Here is the statement I'm executing:
   import org.apache.hudi.DataSourceReadOptions

   val hudiDF = spark.read.format("org.apache.hudi")
     .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
     .load("abfss://xx@xxx.dfs.core.windows.net/path/hudi_output_folder/*/")
   
   Here is my hoodie.properties file from the .hoodie folder:
   hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
   hoodie.table.name=hudi_mor
   hoodie.archivelog.folder=archived
   hoodie.table.type=MERGE_ON_READ
   hoodie.table.version=1
   hoodie.timeline.layout.version=1
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Write a hudi-format dataset to ADLS Gen2
   2. Perform upserts/deletes and write to the same location as step 1 (a sketch of the write path follows this list)
   3. Read the data using the Spark Datasource API
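
   A minimal sketch of the write path in steps 1-2, loosely following the Hudi quickstart; the input DataFrame and the record-key/precombine/partition field names ("id", "ts", "region") are illustrative assumptions, not taken from the actual job:

   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.hudi.config.HoodieWriteConfig
   import org.apache.spark.sql.SaveMode

   // df: an input DataFrame assumed to carry id, ts, and region columns.
   // Step 1 is this same call with SaveMode.Overwrite; step 2 upserts with
   // SaveMode.Append (a delete pass swaps in DELETE_OPERATION_OPT_VAL).
   df.write.format("org.apache.hudi")
     .option(HoodieWriteConfig.TABLE_NAME, "hudi_mor")
     .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
     .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "id")
     .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
     .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "region")
     .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
     .mode(SaveMode.Append)
     .save("abfss://xx@xxx.dfs.core.windows.net/path/hudi_output_folder")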
   
   **Expected behavior**
   
   Ideally, I should be able to read back the dataset with all the changes applied when reading with the snapshot option.
   
   **Environment Description**
   
   * Hudi version : 0.6.0
   
   * Spark version : 2.4.5
   
   * Hive version : N/A
   
   * Hadoop version : Databricks 6.4 (Apache Spark 2.4.5, Scala 2.11)
   
   * Storage (HDFS/S3/GCS..) : ABFS (ADLS Gen2)
   
   * Running on Docker? (yes/no) : No
   
   **Stacktrace**
   
   ERROR Uncaught throwable from user code: java.lang.ArrayStoreException: org.apache.spark.sql.execution.datasources.SerializableFileStatus
   	at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:90)
   	at scala.collection.IterableLike$class.copyToArray(IterableLike.scala:254)
   	at scala.collection.AbstractIterable.copyToArray(Iterable.scala:54)
   	at scala.collection.TraversableOnce$class.copyToArray(TraversableOnce.scala:278)
   	at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:104)
   	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:286)
   	at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
   	at org.apache.hudi.MergeOnReadSnapshotRelation.buildFileIndex(MergeOnReadSnapshotRelation.scala:138)
   	at org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:72)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:87)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
   	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:351)
   	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:311)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:297)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:214)
   	at some.internal.app.path.HudiReader.read(HudiReader.java:29)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command--1:1)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw$$iw.<init>(command--1:44)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw$$iw.<init>(command--1:46)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw$$iw.<init>(command--1:48)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw$$iw.<init>(command--1:50)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read$$iw.<init>(command--1:52)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read.<init>(command--1:54)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read$.<init>(command--1:58)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$read$.<clinit>(command--1)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$eval$.$print$lzycompute(<notebook>:7)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$eval$.$print(<notebook>:6)
   	at linea9ed2ae11338492abc70c94b9b10ce2025.$eval.$print(<notebook>)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:793)
   	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1054)
   	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:645)
   	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:644)
   	at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
   	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
   	at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:644)
   	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:576)
   	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:572)
   	at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
   	at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:202)
   	at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
   	at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:202)
   	at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
   	at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
   	at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
   	at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:396)
   	at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$9.apply(DriverLocal.scala:373)
   	at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:238)
   	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
   	at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:233)
   	at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
   	at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:275)
   	at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
   	at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
   	at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
   	at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:644)
   	at scala.util.Try$.apply(Try.scala:192)
   	at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:639)
   	at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:485)
   	at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:597)
   	at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:390)
   	at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
   	at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
   	at java.lang.Thread.run(Thread.java:748)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] rahulpoptani commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
rahulpoptani commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-719191806


   @bvaradar I was able to successfully perform read and write operations with OSS Spark and HDFS, but I'm facing issues only in the Databricks environment. Are there any plans to add a guide for the Databricks environment and ADLS Gen2 storage?
   





[GitHub] [hudi] vinothchandar commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-715656656


   @rahulpoptani sorry, then I misread. I know we fixed some bundle issues in jars built on master; is that worth trying? (Those were related to jetty use, though.)





[GitHub] [hudi] rahulpoptani commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
rahulpoptani commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-714236090


   @vinothchandar We are actually trying to read using Spark 2.x, so I'm not sure how it's linked to #1760. We're getting this error only in Databricks, and not in other environments.





[GitHub] [hudi] bvaradar commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-710699046


   @rahulpoptani: Would it be possible to test with an OSS Spark version and read the snapshot to verify?
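
   For reference, a typical way to run such a check with OSS Spark (the versions here are assumptions matching the report: Hudi 0.6.0 on Spark 2.4.x with Scala 2.11, per the quickstart):

   spark-shell \
     --packages org.apache.hudi:hudi-spark-bundle_2.11:0.6.0,org.apache.spark:spark-avro_2.11:2.4.5 \
     --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'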





[GitHub] [hudi] rahulpoptani commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
rahulpoptani commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-717699035


   Yes @bvaradar 





[GitHub] [hudi] rahulpoptani commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
rahulpoptani commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-712585912


   In a different environment running Spark 2.4.5 with Scala 2.12, I was able to successfully perform Insert/Upsert/Delete and Read operations on the Merge-On-Read table type.





[GitHub] [hudi] bvaradar commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-720540158


   Opened a Jira to track this: https://issues.apache.org/jira/browse/HUDI-1368
   This is currently not being actively worked on.
   





[GitHub] [hudi] bvaradar closed issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #2180:
URL: https://github.com/apache/hudi/issues/2180


   





[GitHub] [hudi] rahulpoptani commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
rahulpoptani commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-717046074


   @vinothchandar I'm not facing the jetty issue, but I gave the master build a try and hit another issue while reading (with both snapshot and read-optimized).
   Here is the error:
   java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
   	at org.apache.hudi.common.bootstrap.index.HFileBootstrapIndex.<clinit>(HFileBootstrapIndex.java:92)
   	at java.lang.Class.forName0(Native Method)
   	at java.lang.Class.forName(Class.java:264)
   	at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:53)
   	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:87)
   	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:98)
   	at org.apache.hudi.common.bootstrap.index.BootstrapIndex.getBootstrapIndex(BootstrapIndex.java:159)
   	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:100)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:95)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:89)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:80)
   	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:130)
   	at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:167)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:353)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$16.apply(InMemoryFileIndex.scala:353)
   	at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
   	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
   	at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
   	at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
   	at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.org$apache$spark$sql$execution$datasources$InMemoryFileIndex$$listLeafFiles(InMemoryFileIndex.scala:353)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$bulkListLeafFiles$2.apply(InMemoryFileIndex.scala:264)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$bulkListLeafFiles$2.apply(InMemoryFileIndex.scala:263)
   	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
   	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
   	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
   	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
   	at scala.collection.AbstractTraversable.map(Traversable.scala:104)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$.bulkListLeafFiles(InMemoryFileIndex.scala:263)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.listLeafFiles(InMemoryFileIndex.scala:130)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.refresh0(InMemoryFileIndex.scala:94)
   	at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init>(InMemoryFileIndex.scala:70)
   	at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$createInMemoryFileIndex(DataSource.scala:589)
   	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:419)
   	at org.apache.hudi.DefaultSource.getBaseFileOnlyView(DefaultSource.scala:172)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:93)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:51)
   	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:351)
   	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:311)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:297)
   	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:214)
   





[GitHub] [hudi] umehrot2 commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-709480582


   This is strange. The exception appears to be happening at https://github.com/apache/hudi/blob/master/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala#L138 while doing `fileStatuses.toArray`.
   
   Now it is a `java.lang.ArrayStoreException`, which indicates that it is possibly trying to store the wrong type of object, `org.apache.spark.sql.execution.datasources.SerializableFileStatus`, in an array of `FileStatus`. This would mean that Spark itself is returning `Seq[SerializableFileStatus]` instead of `Seq[FileStatus]`, which is not possible on open-source Spark.
   
   In open-source Spark they always convert `SerializableFileStatus` to `FileStatus` before returning, and that's the contract (see https://github.com/apache/spark/blob/v2.4.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala#L246). `SerializableFileStatus` is private to that class. So my guess is that the Databricks Spark implementation differs here and is possibly returning `Seq[SerializableFileStatus]`, which is why this is happening. Not sure whether this is something we want to consider fixing, and how. @bvaradar @garyli1019 @vinothchandar
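
   For intuition, a minimal Scala sketch (hypothetical stand-in types, not Hudi or Databricks code) of how an element whose runtime class doesn't match the array's component type surfaces as `ArrayStoreException` inside `toArray`:

   class FileStatusLike          // stands in for hadoop's FileStatus
   class SerializableStatusLike  // stands in for SerializableFileStatus; NOT a FileStatusLike subclass

   // A sequence whose static type lies about its elements, as can happen
   // across an erased or forked API boundary:
   val statuses: Seq[FileStatusLike] =
     Seq(new SerializableStatusLike).asInstanceOf[Seq[FileStatusLike]]

   // toArray allocates an Array[FileStatusLike] via the implicit ClassTag and
   // copies the elements in; the JVM's array-store check rejects the element:
   // java.lang.ArrayStoreException at ScalaRunTime.array_update
   val arr: Array[FileStatusLike] = statuses.toArray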
   





[GitHub] [hudi] bvaradar commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-709457283


   @garyli1019 @umehrot2: Can you help with this?





[GitHub] [hudi] bvaradar commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-717377743


   @rahulpoptani: Are you using the hudi-spark-bundle jar?
   
   cc @rmpifer @umehrot2 @bhasudha, who have handled hbase shading recently.





[GitHub] [hudi] vinothchandar commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-713185450


   This could be related to #1760, which is being worked on, i.e. Hudi not working with Spark 3.





[GitHub] [hudi] rahulpoptani commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
rahulpoptani commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-711553274


   @bvaradar The jar I'm using is an uber jar with the Spark dependency included. It looks like the Databricks environment may be the cause of these errors; I tried running locally on my laptop and it worked fine.





[GitHub] [hudi] bvaradar commented on issue #2180: [SUPPORT] Unable to read MERGE ON READ table with Snapshot option using Databricks.

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #2180:
URL: https://github.com/apache/hudi/issues/2180#issuecomment-718167111


   @rahulpoptani: The integration tests that go through this path using the spark bundle have been working fine (https://travis-ci.org/github/apache/hudi/jobs/737883357#L13336). This could be related to your setup. The steps are also documented in the Docker demo guide (https://hudi.apache.org/docs/docker_demo.html). Can you compare this with your setup to see what is missing?
   
   Thanks,
   Balaji.V

