Posted to commits@hudi.apache.org by "xccui (via GitHub)" <gi...@apache.org> on 2023/04/20 13:57:19 UTC

[GitHub] [hudi] xccui opened a new issue, #8516: [SUPPORT] Mismatched write token for parquet files

xccui opened a new issue, #8516:
URL: https://github.com/apache/hudi/issues/8516

   We use a Flink streaming job to write MoR tables. Compaction for a number of tables was blocked by the exception below. The parquet file name recorded in the compaction plan differs from the actual file name on S3 only in the write-token segment.
   
   The plan references write token `1-5-23` (see the path in the stack trace), while the actual file on S3 is `55078b57-488a-4be1-87ac-204548d3ec66_1-5-24_20230420023427524.parquet`, i.e. write token `1-5-24`. A small sketch comparing the two tokens follows the stack trace.
   ```
   2023-04-20 13:35:10 [pool-31-thread-1] ERROR org.apache.hudi.sink.compact.CompactOperator                 [] - Executor executes action [Execute compaction for instant 20230420041145422 from task 1] error
   org.apache.hudi.exception.HoodieIOException: Failed to read footer for parquet s3a://path-to-table/dt=2023-01-20/hr=19/55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet
   	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:95) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.common.util.ParquetUtils.readSchema(ParquetUtils.java:208) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:230) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.io.storage.HoodieAvroParquetReader.getSchema(HoodieAvroParquetReader.java:104) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:91) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdateInternal(HoodieFlinkCopyOnWriteTable.java:374) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.table.HoodieFlinkCopyOnWriteTable.handleUpdate(HoodieFlinkCopyOnWriteTable.java:365) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.table.action.compact.CompactionExecutionHelper.writeFileAndGetWriteStats(CompactionExecutionHelper.java:64) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:231) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:144) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.sink.compact.CompactOperator.doCompaction(CompactOperator.java:133) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.sink.compact.CompactOperator.lambda$processElement$0(CompactOperator.java:116) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
   	at java.lang.Thread.run(Unknown Source) [?:?]
   Caused by: java.io.FileNotFoundException: No such file or directory: s3a://path-to-table/dt=2023-01-20/hr=19/55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3866) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3688) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getFileStatus$24(S3AFileSystem.java:3556) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:444) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2337) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2356) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3554) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at promoted.ai.org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at promoted.ai.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:469) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at promoted.ai.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:454) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	at org.apache.hudi.common.util.ParquetUtils.readMetadata(ParquetUtils.java:93) ~[blob_p-abdf98cc6fdb80521c5886e97d0250884f55321b-e6c0beee736c7301690a2ba078cc0a0f:?]
   	... 15 more
   ```
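   
   For clarity, the only difference is in the write-token segment of the base-file name. Below is a minimal standalone sketch (not Hudi code; it assumes the usual `<fileId>_<writeToken>_<instantTime>.parquet` naming, and the class name is only illustrative) that extracts and compares the token from the planned and the actual file names:
   
   ```java
   // Hypothetical helper, for illustration only (not part of Hudi).
   // A Hudi base file is named <fileId>_<writeToken>_<instantTime>.parquet,
   // so the write token is the second underscore-separated segment.
   public class WriteTokenCheck {
       static String writeToken(String baseFileName) {
           String stem = baseFileName.substring(0, baseFileName.lastIndexOf('.'));
           return stem.split("_")[1]; // [0]=fileId, [1]=writeToken, [2]=instantTime
       }
   
       public static void main(String[] args) {
           String planned = "55078b57-488a-4be1-87ac-204548d3ec66_1-5-23_20230420023427524.parquet";
           String onS3    = "55078b57-488a-4be1-87ac-204548d3ec66_1-5-24_20230420023427524.parquet";
           System.out.println("compaction plan: " + writeToken(planned)); // 1-5-23
           System.out.println("file on S3     : " + writeToken(onS3));    // 1-5-24
       }
   }
   ```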
   
   **Environment Description**
   
   * Hudi version : bdb50ddccc9631317dfb06a06abc38cbd3714ce8
   
   * Flink version : 1.16.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   The job initially ran with the metadata table enabled. I disabled the metadata table when restarting the job from a checkpoint; a sketch of that toggle is shown below.
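   
   For reference, a hedged sketch of how that toggle is typically expressed when the writer options are passed as a plain key/value map to the Flink job. The `metadata.enabled` key is taken from Hudi's Flink options and should be verified against the build from the commit above; the class name is purely illustrative, not the actual job code:
   
   ```java
   import java.util.HashMap;
   import java.util.Map;
   
   // Illustrative only: shows the option flip described above, not the real pipeline.
   public class WriterOptionsSketch {
       public static void main(String[] args) {
           Map<String, String> options = new HashMap<>();
           options.put("table.type", "MERGE_ON_READ");
           // Before the restart the job ran with the metadata table on:
           // options.put("metadata.enabled", "true");
           // After restoring from the checkpoint it was switched off:
           options.put("metadata.enabled", "false");
           options.forEach((k, v) -> System.out.println(k + " = " + v));
       }
   }
   ```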
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xccui commented on issue #8516: [SUPPORT] Mismatched write token for parquet files

Posted by "xccui (via GitHub)" <gi...@apache.org>.
xccui commented on issue #8516:
URL: https://github.com/apache/hudi/issues/8516#issuecomment-1517073005

   > Which version of Hudi are you using?
   
   I built a snapshot version based on the commit in the description.


[GitHub] [hudi] nsivabalan commented on issue #8516: [SUPPORT] Mismatched write token for parquet files

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #8516:
URL: https://github.com/apache/hudi/issues/8516#issuecomment-1517068687

   Which version of Hudi are you using?
   


[GitHub] [hudi] danny0405 commented on issue #8516: [SUPPORT] Mismatched write token for parquet files

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #8516:
URL: https://github.com/apache/hudi/issues/8516#issuecomment-1516517604

   Guess there was some inconsistency with the metadata table (MDT) enabled during the generation of the compaction plan.

