Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/06/03 17:18:13 UTC

[GitHub] [hudi] prashanthpdesai edited a comment on issue #1695: [SUPPORT] : Global Bloom Index config issue

prashanthpdesai edited a comment on issue #1695:
URL: https://github.com/apache/hudi/issues/1695#issuecomment-638335917


   @nsivabalan: Thank you, I was able to write successfully with the global index after pointing to the newer version of the jar, but I see the exception below while reading the parquet files.
   Could you please check whether this is something you can help with?
   
   spark.read.parquet(basepath+"/*").show(false)
   
   **Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:**
     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
     at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
     at org.apache.spark.scheduler.Task.run(Task.scala:123)
     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.org$apache$spark$executor$Executor$TaskRunner$$anonfun$$res$1(Executor.scala:412)
     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:419)
     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1359)
     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:430)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at java.lang.Thread.run(Thread.java:748)
   **Caused by: java.io.IOException: Could not read footer for file: FileStatus{path=maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit;** isDirectory=false; length=4366; replication=0; blocksize=0; modification_time=0; access_time=0; owner=; group=; permission=rw-rw-rw-; isSymlink=false}
     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:551)
     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:538)
     at org.apache.spark.util.ThreadUtils$$anonfun$3$$anonfun$apply$1.apply(ThreadUtils.scala:287)
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
     at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
   **Caused by: java.lang.RuntimeException: maprfs:///datalake/globalndextest0604/.hoodie/20200603115556.commit is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [32, 48, 10, 125]**
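   
   For context, the read fails because the glob basepath + "/*" also matches files under Hudi's .hoodie metadata directory, whose commit files are JSON rather than Parquet (hence the magic-number mismatch). A minimal sketch of one possible way around this, assuming the hudi-spark bundle is on the classpath and a glob depth that matches the table's partitioning:
   
   // Sketch only (assumed setup): read through the Hudi datasource so its
   // path filter skips the .hoodie metadata files instead of treating them
   // as Parquet. An unpartitioned table is assumed here; adjust the glob to
   // the table's partition depth.
   val df = spark.read
     .format("org.apache.hudi")
     .load(basepath + "/*")
   df.show(false)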
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org