Posted to commits@hudi.apache.org by "Danny Chen (Jira)" <ji...@apache.org> on 2021/11/27 05:40:00 UTC

[jira] [Commented] (HUDI-2576) flink do checkpoint error because parquet file is missing

    [ https://issues.apache.org/jira/browse/HUDI-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449747#comment-17449747 ] 

Danny Chen commented on HUDI-2576:
----------------------------------

Thanks for the feedback [~liyuanzhao435]. The file is deleted by the marker-based file cleaning that runs before the metadata commit,
in the #finalizeWrite step of the {{FlinkHoodieWriteClient}}.

I guess some metadata exception prevents the written files from being reported correctly, so the coordinator eventually diffs and cleans up the file.
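
To illustrate the kind of diff meant here, below is a minimal sketch. It assumes the cleaning step compares the files that have markers against the files reported by the write tasks, and deletes whatever is left over. The class and method names are hypothetical and are not Hudi's actual API.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; these are not Hudi's actual classes or method names.
final class MarkerReconcileSketch {

    // Assumption: a file that has a marker but was not reported by any write task
    // is treated as a leftover and selected for deletion.
    static Set<String> filesToClean(Set<String> markedFiles, Set<String> reportedFiles) {
        Set<String> leftovers = new TreeSet<>(markedFiles);
        leftovers.removeAll(reportedFiles);
        return leftovers;
    }

    public static void main(String[] args) {
        Set<String> marked = Set.of("aa/file-1.parquet", "aa/file-2.parquet");
        // Suppose the report for file-2 was lost before the commit.
        Set<String> reported = Set.of("aa/file-1.parquet");
        System.out.println(filesToClean(marked, reported));
    }
}
```

Under this assumption, a file that is still open for writing but missing from the reported set would be deleted, and the later close would fail with a FileNotFoundException like the one in the log.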

Do you use the append write mode? I saw that you use the {{HoodieRowDataCreateHandle}}.

> flink do  checkpoint error because parquet file is missing
> ----------------------------------------------------------
>
>                 Key: HUDI-2576
>                 URL: https://issues.apache.org/jira/browse/HUDI-2576
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Flink Integration
>    Affects Versions: 0.10.0
>            Reporter: liyuanzhao435
>            Priority: Major
>              Labels: flink, hudi
>             Fix For: 0.11.0
>
>         Attachments: error.txt
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Hudi 0.10.0, Flink 1.13.1
> Sometimes when Flink performs a checkpoint, an error occurs. The error shows that a Hudi parquet file is missing (it says the file does not exist):
> *2021-10-19 09:20:03,796 INFO org.apache.hudi.io.storage.row.HoodieRowDataCreateHandle [] - start close hoodie row data*
> *2021-10-19 09:20:03,800 WARN org.apache.hadoop.hdfs.DataStreamer [] - DataStreamer Exception*
> *java.io.FileNotFoundException: File does not exist: /tmp/test_liyz2/aa/2ff301cc-8db2-478e-b707-e8f2327ba38f-0_0-1-4_20211019091917.parquet (inode 32234795) Holder DFSClient_NONMAPREDUCE_633610786_99 does not have any open files.*
>  *at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2815)*
>  
> For details, see the attachment.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)