Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/26 14:41:41 UTC

[GitHub] [hudi] wqwl611 opened a new issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

wqwl611 opened a new issue #4690:
URL: https://github.com/apache/hudi/issues/4690


   * Hudi version : 0.10
   
   * Spark version : 3.2
   
   * Hive version :
   
   * Hadoop version : 2.7
   
   * Storage (HDFS/S3/GCS..) : hdfs
   
   
   Caused by: org.apache.hudi.exception.HoodieIOException: Failed to read footer for parquet hdfs://ns/user/.../1e4957b3-efff-47f3-b993-ef7c56538a3d-0_8-526-10774_20220126193401513.parquet
   	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:185)
   	at org.apache.hudi.common.util.ParquetUtils.readFooter(ParquetUtils.java:201)
   	at org.apache.hudi.common.util.BaseFileUtils.readMinMaxRecordKeys(BaseFileUtils.java:109)
   	at org.apache.hudi.io.storage.HoodieParquetReader.readMinMaxRecordKeys(HoodieParquetReader.java:49)
   	at org.apache.hudi.io.HoodieRangeInfoHandle.getMinMaxKeys(HoodieRangeInfoHandle.java:39)
   	at org.apache.hudi.index.bloom.HoodieBloomIndex.lambda$loadInvolvedFiles$4cbadf07$1(HoodieBloomIndex.java:149)
   	at org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1070)
   	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
   	at scala.collection.Iterator.foreach(Iterator.scala:943)
   	at scala.collection.Iterator.foreach$(Iterator.scala:943)
   	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
   	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
   	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
   	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
   	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
   	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
   	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
   	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
   	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
   	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
   	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
   	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
   	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
   	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
   	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2254)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.run(Task.scala:131)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1489)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
   	... 3 more
   Caused by: java.io.FileNotFoundException: File does not exist: hdfs://ns/.../1e4957b3-efff-47f3-b993-ef7c56538a3d-0_8-526-10774_20220126193401513.parquet
   	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1289)
   	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1281)
   	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1297)
   	at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39)
   	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:469)
   	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:454)
   	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:183)
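
When triaging a trace like the one above, it can help to split the missing base file's name into its parts, since the trailing timestamp identifies the instant that wrote the file. The following is a minimal sketch, assuming the usual Hudi base file layout `<fileId>_<writeToken>_<instantTime>.<extension>`; the helper name `parse_base_file_name` is hypothetical and not part of Hudi.

```python
from datetime import datetime, timedelta


def parse_base_file_name(name):
    """Split '<fileId>_<writeToken>_<instantTime>.<ext>' into its parts.

    Assumes the instant time is yyyyMMddHHmmss, optionally followed by
    three millisecond digits (as in the file name from the trace).
    """
    stem, _, extension = name.rpartition(".")
    file_id, write_token, instant_time = stem.split("_")
    # Parse the second-granularity prefix, then add milliseconds if present.
    written_at = datetime.strptime(instant_time[:14], "%Y%m%d%H%M%S")
    if len(instant_time) > 14:
        written_at += timedelta(milliseconds=int(instant_time[14:17]))
    return {
        "file_id": file_id,
        "write_token": write_token,
        "instant_time": instant_time,
        "extension": extension,
        "written_at": written_at,
    }


info = parse_base_file_name(
    "1e4957b3-efff-47f3-b993-ef7c56538a3d-0_8-526-10774_20220126193401513.parquet"
)
print(info["instant_time"], info["written_at"].isoformat())
```

The extracted instant time can then be compared against the instants still present under `.hoodie` to see whether the write that produced the file was rolled back or cleaned.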
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] wqwl611 commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
wqwl611 commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1031233093


   @nsivabalan
   I have tried disabling the metadata table, but the issue still occurs.
   With Spark 2.4.5, my job has been running for about 3 days and I have not seen the issue.
   





[GitHub] [hudi] nsivabalan commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1032674381


   thanks!





[GitHub] [hudi] nsivabalan commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1039623431


   Cool, thanks! Would you prefer to keep this open, or to close it and open a new one once you have more updates?
   





[GitHub] [hudi] wqwl611 commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
wqwl611 commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1041125964


   OK, I will close it and reopen it if I find something.
   @nsivabalan 





[GitHub] [hudi] wqwl611 commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
wqwl611 commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1032188207


   @nsivabalan OK, I will reproduce the issue and then share the contents.





[GitHub] [hudi] nsivabalan commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1032002981


   Hmm. If you are facing it with the metadata table disabled, we should definitely look into it. Do you have the table state saved? If not, the next time you encounter the issue, could you share the contents of .hoodie, if you don't mind? That would help us triage the issue.
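
Before sharing a listing of `.hoodie`, it may be useful to group the timeline files by instant so that pending or cleaning actions stand out. This is a rough sketch, assuming timeline file names of the form `<instantTime>.<action>[.<state>]`; the function name `summarize_timeline` and the sample file names are hypothetical, and other naming variants are not handled.

```python
from collections import defaultdict


def summarize_timeline(file_names):
    """Map each instant time to the (action, state) pairs found for it.

    Entries whose first dot-separated part is not numeric (for example
    hoodie.properties) are skipped. A name without a state suffix is
    treated as a completed action.
    """
    timeline = defaultdict(list)
    for name in file_names:
        parts = name.split(".")
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        instant, action = parts[0], parts[1]
        state = parts[2] if len(parts) > 2 else "completed"
        timeline[instant].append((action, state))
    # Sort by instant time so the output reads chronologically.
    return dict(sorted(timeline.items()))


# Hypothetical listing, for illustration only.
listing = [
    "hoodie.properties",
    "20220126200000000.clean.requested",
    "20220126193401513.deltacommit",
    "20220126193401513.deltacommit.inflight",
]
for instant, actions in summarize_timeline(listing).items():
    print(instant, actions)
```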
    








[GitHub] [hudi] wqwl611 commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
wqwl611 commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1033430210


   @nsivabalan 
   I'm trying to reproduce the issue, but my job has been running for over 24 hours and the issue has not shown up.
   So weird.
   I will share the contents once I see the issue again.





[GitHub] [hudi] nsivabalan commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1030879149


   @wqwl611: with Spark 2.4.5, you are not seeing the issue even after running the pipeline for a long time, is that right?
   The issue above may only show up after some time, for example if a file has been cleaned up while your write or read is still trying to access it.
   Do you have the metadata table enabled by any chance? Can you disable it and see if you still encounter the issue?
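
The suggestion above could be tried with writer options along these lines. This is only a sketch: the helper `hudi_write_options` and the table name are hypothetical, and including `hoodie.cleaner.commits.retained` is an assumption about a setting worth inspecting when a file is cleaned while still in use, not something confirmed in this thread.

```python
def hudi_write_options(table_name, metadata_enabled=False, commits_retained=10):
    """Build a dict of Hudi writer options for a merge-on-read table.

    All values are strings, as is usual for Spark datasource options.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        # Disable the metadata table to rule it out as the cause.
        "hoodie.metadata.enable": str(metadata_enabled).lower(),
        # Assumption: how many commits the cleaner keeps may matter here.
        "hoodie.cleaner.commits.retained": str(commits_retained),
    }


options = hudi_write_options("my_table")  # hypothetical table name
print(options["hoodie.metadata.enable"])
```

In a PySpark job such a dict would typically be passed to the writer via `.options(**options)`, together with the record key, precombine field, and other settings the job already uses.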
   





[GitHub] [hudi] wqwl611 closed issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
wqwl611 closed issue #4690:
URL: https://github.com/apache/hudi/issues/4690


   





[GitHub] [hudi] xushiyan commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1025238050


   @wqwl611 We need more details, such as the code you are trying to execute and the environment, to help reproduce the issue. Also, Spark 3.2 is not supported in Hudi 0.10.0. Please try Spark 3.1 or 3.0.





[GitHub] [hudi] wqwl611 commented on issue #4690: write hudi mor table always encounter FileNotFoundException hdfs://ns/...0220126193401513.parquet

Posted by GitBox <gi...@apache.org>.
wqwl611 commented on issue #4690:
URL: https://github.com/apache/hudi/issues/4690#issuecomment-1027770359


   @xushiyan I write a DataFrame into Hudi from Spark Streaming with just the regular Hudi config. I also tried Spark 2.4.5, and this error does not show up there,
   so maybe it is just a Spark 3.2 problem.
   The weird thing is that with 3.2 the job does not fail immediately; it fails after several batches have succeeded.
   
   

