You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/27 06:13:09 UTC
[GitHub] [iceberg] chenwyi2 opened a new issue, #6066: flink restore failed with filenotfound
chenwyi2 opened a new issue, #6066:
URL: https://github.com/apache/iceberg/issues/6066
### Apache Iceberg version
0.14.1 (latest release)
### Query engine
Flink
### Please describe the bug 🐞
We have a flink job that write upsert stream into a partitioned icebergV2 table . When that job get failed, we restart it from the latest checkPoint. But we got that exception: Files does not exists.FileNotFoundException: File does not exist: /rbf/warehouse/cupid_bi.db/ads_qixiao_olap_1min/metadata/c918a379b3cc15d7a8193cf27eb8b473-00000-1-38851-10287.avro
and i saw task.log
2022-10-21 16:57:19,188 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committing append with 2 data files and 0 delete files to table icebergCatalog.xxx
2022-10-21 16:57:19,573 INFO org.apache.iceberg.BaseMetastoreTableOperations [] - Successfully committed to table icebergCatalog.xx
2022-10-21 16:57:19,573 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 7536746147835307981 (MergeAppend)
2022-10-21 16:57:19,594 INFO org.apache.iceberg.BaseMetastoreTableOperations [] - Refreshing table metadata from new version: qbfs://online01/warehouse/xx/metadata/99171-45858db6-5917-4397-afcc-76c10ea80305.metadata.json
2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms
without flushing snapshot state to state backend
the reason is metadata checkpoint information is diffrent from snapshotstate? since the jobfailed without flush snapshot state to state backend
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] stevenzwu commented on issue #6066: flink restore failed with filenotfound
Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #6066:
URL: https://github.com/apache/iceberg/issues/6066#issuecomment-1294422350
> Start to flush snapshot state to state backend
This happens in `IcebergFilesCommitter#snapshotState`. if checkpoint N didn't complete successfully, the written manifest file for the incomplete checkpoint won't be used because last completed checkpoint is N-1.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] chenwyi2 commented on issue #6066: flink restore failed with filenotfound
Posted by GitBox <gi...@apache.org>.
chenwyi2 commented on issue #6066:
URL: https://github.com/apache/iceberg/issues/6066#issuecomment-1294294691
> `.avro` might be a manifest file. do you have the complete stack trace? Which Flink version?
>
> I couldn't find this log line in 1.13 (or 1.14 and 1.15).
>
> ```
> 2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms
> without flushing snapshot state to state backend
> ```
>
> 1.13 has log line without the part after `Committed in 562 ms`. https://github.com/apache/iceberg/blob/master/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
it is my mistake, the right log should be
"2022-10-21 16:57:19,188 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committing append with 2 data files and 0 delete files to table icebergCatalog.xxx
2022-10-21 16:57:19,573 INFO org.apache.iceberg.BaseMetastoreTableOperations [] - Successfully committed to table icebergCatalog.xx
2022-10-21 16:57:19,573 INFO org.apache.iceberg.SnapshotProducer [] - Committed snapshot 7536746147835307981 (MergeAppend)
2022-10-21 16:57:19,594 INFO org.apache.iceberg.BaseMetastoreTableOperations [] - Refreshing table metadata from new version: qbfs://online01/warehouse/xx/metadata/99171-45858db6-5917-4397-afcc-76c10ea80305.metadata.json
2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms
2022-10-21 16:57:20,090 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Start to flush snapshot state to state backend, table: icebergCatalog.cupid_bi.ads_qixiao_tracking_hawkeye_no_filter_1min, checkpointId: 8227"
but in my situation, the task existed without showing flushing snapshot state to state backend, because of nodemanager restart, then i restart fllink job, the job failed with FileNotFoundException
in the hdfs audit log, i never saw that avro file has been created.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] stevenzwu commented on issue #6066: flink restore failed with filenotfound
Posted by GitBox <gi...@apache.org>.
stevenzwu commented on issue #6066:
URL: https://github.com/apache/iceberg/issues/6066#issuecomment-1293769967
`.avro` might be a manifest file. do you have the complete stack trace? Which Flink version?
I couldn't find this log line in 1.13 (or 1.14 and 1.15).
```
2022-10-21 16:57:19,750 INFO org.apache.iceberg.flink.sink.IcebergFilesCommitter [] - Committed in 562 ms
without flushing snapshot state to state backend
```
1.13 has log line without the part after `Committed in 562 ms`.
https://github.com/apache/iceberg/blob/master/flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] congd123 commented on issue #6066: flink restore failed with filenotfound
Posted by "congd123 (via GitHub)" <gi...@apache.org>.
congd123 commented on issue #6066:
URL: https://github.com/apache/iceberg/issues/6066#issuecomment-1506854606
@stevenzwu if this scenario happens
> if checkpoint N didn't complete successfully, the written manifest file for the incomplete checkpoint won't be used because last completed checkpoint is N-1.
What is the best approach to recover the job?
I have a similar behavior using Flink 1.14.1 and Iceberg 1.0.0 (V2)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org