Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/27 06:13:09 UTC

[GitHub] [iceberg] chenwyi2 opened a new issue, #6066: flink restore failed with filenotfound

chenwyi2 opened a new issue, #6066:
URL: https://github.com/apache/iceberg/issues/6066

   ### Apache Iceberg version
   
   0.14.1 (latest release)
   
   ### Query engine
   
   Flink
   
   ### Please describe the bug 🐞
   
   We have a Flink job that writes an upsert stream into a partitioned Iceberg V2 table. When the job failed, we restarted it from the latest checkpoint, but the restore failed with this exception: FileNotFoundException: File does not exist: /rbf/warehouse/cupid_bi.db/ads_qixiao_olap_1min/metadata/c918a379b3cc15d7a8193cf27eb8b473-00000-1-38851-10287.avro
   In the task log I saw:
   2022-10-21 16:57:19,188 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committing append with 2 data files and 0 delete files to table icebergCatalog.xxx
   2022-10-21 16:57:19,573 INFO org.apache.iceberg.BaseMetastoreTableOperations [] - Successfully committed to table icebergCatalog.xx
   2022-10-21 16:57:19,573 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 7536746147835307981 (MergeAppend)
   2022-10-21 16:57:19,594 INFO org.apache.iceberg.BaseMetastoreTableOperations [] - Refreshing table metadata from new version: qbfs://online01/warehouse/xx/metadata/99171-45858db6-5917-4397-afcc-76c10ea80305.metadata.json
   2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms
   without  flushing snapshot state to state backend
   Is the reason that the checkpoint information in the table metadata is different from the snapshot state, since the job failed without flushing the snapshot state to the state backend?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on issue #6066: flink restore failed with filenotfound

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #6066:
URL: https://github.com/apache/iceberg/issues/6066#issuecomment-1294422350

   >  Start to flush snapshot state to state backend
   
   This happens in `IcebergFilesCommitter#snapshotState`. If checkpoint N didn't complete successfully, the manifest file written for the incomplete checkpoint won't be used, because the last completed checkpoint is N-1.
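
   A toy model of this behavior, not the actual `IcebergFilesCommitter` code: it assumes the committer state can be pictured as a map from checkpoint ID to the manifest written for that checkpoint. The class name, the `restore` helper, and the manifest file names are hypothetical; 8227 is the checkpoint ID reported in this thread, and 8226 stands in for N-1.

   ```java
   import java.util.NavigableMap;
   import java.util.TreeMap;

   public class CheckpointRestoreSketch {

       // Hypothetical helper: returns the manifests that are visible after
       // restoring from the last completed checkpoint. Entries written for
       // later (incomplete) checkpoints are not part of the restored state.
       static NavigableMap<Long, String> restore(
               NavigableMap<Long, String> written, long lastCompleted) {
           return new TreeMap<>(written.headMap(lastCompleted, true));
       }

       public static void main(String[] args) {
           NavigableMap<Long, String> written = new TreeMap<>();
           written.put(8226L, "manifest-for-8226.avro"); // checkpoint N-1 completed
           written.put(8227L, "manifest-for-8227.avro"); // checkpoint N was interrupted

           NavigableMap<Long, String> restored = restore(written, 8226L);
           System.out.println(restored.keySet()); // prints [8226]
       }
   }
   ```

   In this model the manifest for checkpoint 8227 is never referenced after the restore, which is the sense in which it "won't be used".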




[GitHub] [iceberg] chenwyi2 commented on issue #6066: flink restore failed with filenotfound

Posted by GitBox <gi...@apache.org>.
chenwyi2 commented on issue #6066:
URL: https://github.com/apache/iceberg/issues/6066#issuecomment-1294294691

   > `.avro` might be a manifest file. do you have the complete stack trace? Which Flink version?
   > 
   > I couldn't find this log line in 1.13 (or 1.14 and 1.15).
   > 
   > ```
   > 2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms
   > without flushing snapshot state to state backend
   > ```
   > 
   > 1.13 has log line without the part after `Committed in 562 ms`. https://github.com/apache/iceberg/blob/master/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
   
   That was my mistake; the correct log is:
   "2022-10-21 16:57:19,188 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committing append with 2 data files and 0 delete files to table icebergCatalog.xxx
   2022-10-21 16:57:19,573 INFO org.apache.iceberg.BaseMetastoreTableOperations [] - Successfully committed to table icebergCatalog.xx
   2022-10-21 16:57:19,573 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 7536746147835307981 (MergeAppend)
   2022-10-21 16:57:19,594 INFO org.apache.iceberg.BaseMetastoreTableOperations [] - Refreshing table metadata from new version: qbfs://online01/warehouse/xx/metadata/99171-45858db6-5917-4397-afcc-76c10ea80305.metadata.json
   2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms
   2022-10-21 16:57:20,090 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Start to flush snapshot state to state backend, table: icebergCatalog.cupid_bi.ads_qixiao_tracking_hawkeye_no_filter_1min, checkpointId: 8227"
   But in my case, the task exited without logging that it was flushing snapshot state to the state backend, because the NodeManager restarted. When I then restarted the Flink job, it failed with FileNotFoundException.
   In the HDFS audit log, I never saw that Avro file being created.
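
   A minimal sketch of how the log check described above could be automated, assuming the task log is available as a list of lines. The class and method names are hypothetical, and the pattern is based only on the "Start to flush snapshot state" message format quoted in this comment; it may differ in other versions.

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.OptionalLong;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   public class FlushLogCheck {

       // Matches the flush message quoted in this thread and captures the checkpoint ID.
       private static final Pattern FLUSH = Pattern.compile(
               "Start to flush snapshot state to state backend, table: (\\S+), checkpointId: (\\d+)");

       // Returns the checkpoint ID of the last flush message, or empty if none was logged.
       static OptionalLong lastFlushedCheckpoint(List<String> lines) {
           OptionalLong last = OptionalLong.empty();
           for (String line : lines) {
               Matcher m = FLUSH.matcher(line);
               if (m.find()) {
                   last = OptionalLong.of(Long.parseLong(m.group(2)));
               }
           }
           return last;
       }

       public static void main(String[] args) {
           List<String> lines = Arrays.asList(
                   "2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms");
           // No flush message in these lines, so the result is empty.
           System.out.println(lastFlushedCheckpoint(lines).isPresent()); // prints false
       }
   }
   ```

   An empty result for the failed attempt would be consistent with the task exiting before `snapshotState` ran for the next checkpoint.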
   




[GitHub] [iceberg] stevenzwu commented on issue #6066: flink restore failed with filenotfound

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #6066:
URL: https://github.com/apache/iceberg/issues/6066#issuecomment-1293769967

   `.avro` might be a manifest file. do you have the complete stack trace? Which Flink version? 
   
   I couldn't find this log line in 1.13 (or 1.14 and 1.15).  
   ```
   2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms
   without flushing snapshot state to state backend
   ```
   
   1.13 has log line without the part after `Committed in 562 ms`.
   https://github.com/apache/iceberg/blob/master/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
   




[GitHub] [iceberg] congd123 commented on issue #6066: flink restore failed with filenotfound

Posted by "congd123 (via GitHub)" <gi...@apache.org>.
congd123 commented on issue #6066:
URL: https://github.com/apache/iceberg/issues/6066#issuecomment-1506854606

   @stevenzwu if this scenario happens
   
   > if checkpoint N didn't complete successfully, the written manifest file for the incomplete checkpoint won't be used because last completed checkpoint is N-1.
   
   What is the best approach to recover the job?
   
   I'm seeing similar behavior with Flink 1.14.1 and Iceberg 1.0.0 (V2).

