Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/23 16:23:27 UTC

[GitHub] [iceberg] rdblue commented on issue #4194: Delete Orphan Files makes metadata inconsistent and table unusable

rdblue commented on issue #4194:
URL: https://github.com/apache/iceberg/issues/4194#issuecomment-1048962292


   @jotarada, do you have any information about the run that deleted the file? Were there concurrent writes? And what length of time did you use for `older_than`?
   
   The interval you use matters if you have jobs that run for a long time. When this happens, the usual cause is that the `older_than` timestamp allows removing files that haven't been committed yet. For example, if you have `older_than` set to 3 hours ago and a job writes files for 4 hours, then the files written in the job's first hour can get caught as orphan files because they're older than the limit and not (yet) committed to the table.
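
To make the timing concrete, here is a minimal sketch of that race in plain Python. It does not use the Iceberg API; `is_orphan_candidate`, `safe_older_than`, and the one-hour margin are hypothetical names and values chosen only for illustration, and the check is a simplification of what an orphan-file cleanup does (unreferenced by table metadata and older than the cutoff).

```python
from datetime import datetime, timedelta


def is_orphan_candidate(last_modified, older_than, committed):
    """Simplified model: a file is a removal candidate when no table
    metadata references it and it was written before the cutoff."""
    return (not committed) and last_modified < older_than


def safe_older_than(now, longest_job, margin=timedelta(hours=1)):
    """Hypothetical helper: pick a cutoff older than the longest-running
    writer plus a margin, so in-flight files are never candidates."""
    return now - (longest_job + margin)


now = datetime(2022, 2, 23, 16, 0, 0)
job_duration = timedelta(hours=4)
job_start = now - job_duration

# A data file written 5 minutes into the 4-hour job; the commit has not happened yet.
early_file_mtime = job_start + timedelta(minutes=5)

# Cutoff of "3 hours ago": the uncommitted file is older than the cutoff.
risky_cutoff = now - timedelta(hours=3)
risky = is_orphan_candidate(early_file_mtime, risky_cutoff, committed=False)

# Cutoff derived from the longest job: the same file is left alone.
safe_cutoff = safe_older_than(now, job_duration)
safe = is_orphan_candidate(early_file_mtime, safe_cutoff, committed=False)

print(risky, safe)
```

Under these assumptions the first check flags the in-flight file and the second does not, which is why the cutoff should be comfortably older than the longest write job.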
   
   Another possibility is that your file system listing doesn't match the table listing, but that's rarer; we've only seen it with HDFS alternate name nodes so far. It seems unlikely that would happen with GCS.
   
   If you can share some of the logs from the orphan files run, that would be helpful!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


