Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/18 19:23:14 UTC

[GitHub] [iceberg] dramaticlly commented on issue #4168: Flink S3Fileio incorrect commit

dramaticlly commented on issue #4168:
URL: https://github.com/apache/iceberg/issues/4168#issuecomment-1045055044


   So essentially, what we observed is that the Flink S3 FileIO failed to upload a data file, so the file does not exist in S3, yet the missing data file is still tracked in the Iceberg manifest. As a result, subsequent queries against the partition fail because the referenced data file cannot be found.
   
   Our Iceberg table is set up on AWS S3 with a versioned bucket, and we can confirm that:
   - the data file was never uploaded; no version exists for the given path
   - the data file is tracked in Iceberg, and we do see this nonexistent data file in Iceberg metadata table queries
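For reference, the check in the list above amounts to a set difference between what Iceberg tracks and what S3 holds. This is a minimal sketch, assuming you have already exported the tracked paths (for example, the `file_path` column of the table's `files` metadata table) and a listing of the objects that actually exist in the bucket, normalized to the same `s3://bucket/key` form. The helper `find_dangling_files` and the sample paths are hypothetical.

```python
from typing import Iterable, List


def find_dangling_files(tracked_paths: Iterable[str], existing_paths: Iterable[str]) -> List[str]:
    """Return tracked data file paths that have no matching object in storage.

    Hypothetical helper: `tracked_paths` would come from the `file_path`
    column of the table's `files` metadata table, `existing_paths` from an
    S3 listing normalized to the same s3://bucket/key form.
    """
    existing = set(existing_paths)
    # dict.fromkeys drops duplicate paths while keeping metadata order.
    return [p for p in dict.fromkeys(tracked_paths) if p not in existing]


if __name__ == "__main__":
    tracked = [
        "s3://bucket/warehouse/db/tbl/data/dt=2022-02-17/00000-0.parquet",
        "s3://bucket/warehouse/db/tbl/data/dt=2022-02-17/00001-0.parquet",
    ]
    listed = ["s3://bucket/warehouse/db/tbl/data/dt=2022-02-17/00000-0.parquet"]
    # Reports the 00001-0.parquet path: referenced by metadata, absent from S3.
    print(find_dangling_files(tracked, listed))
```

A non-empty result means the metadata references files that any read of that partition will fail on, which is the symptom described above.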
   
   This results in some nasty behavior: we would need to reconcile the manifest state with what actually exists in the underlying file system, which is not possible. So we had to drop the partition as a temporary workaround.
   
   It would be helpful to understand why we run into this issue: why does Iceberg commit before S3 confirms that the data file has been uploaded?
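To make the question concrete, here is a toy model (not Iceberg's actual Flink sink code) of the ordering the question assumes: a file path is handed to the commit only after its upload has returned successfully, so a failed upload aborts the checkpoint instead of producing a manifest entry for a missing object. `write_then_commit`, `UploadError`, and the fake uploader are hypothetical.

```python
from typing import Callable, Dict, List


class UploadError(Exception):
    """Raised by the (hypothetical) uploader when an object could not be written."""


def write_then_commit(
    files: Dict[str, bytes],
    upload: Callable[[str, bytes], None],
    commit: Callable[[List[str]], None],
) -> List[str]:
    """Commit only files whose upload has returned successfully.

    If any upload raises, the exception propagates and `commit` is never
    called, so no metadata can reference a file that was never written.
    """
    confirmed: List[str] = []
    for path, data in files.items():
        upload(path, data)      # assumed to return only once the object is durable
        confirmed.append(path)  # recorded only after a successful upload
    commit(confirmed)
    return confirmed


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    committed: List[List[str]] = []

    def flaky_upload(path: str, data: bytes) -> None:
        # Simulate the reported failure for one of the two files.
        if path.endswith("00001-0.parquet"):
            raise UploadError(f"upload failed: {path}")
        store[path] = data

    try:
        write_then_commit(
            {"data/00000-0.parquet": b"a", "data/00001-0.parquet": b"b"},
            flaky_upload,
            committed.append,
        )
    except UploadError as err:
        print("checkpoint failed, nothing committed:", err)
    print(committed)
```

In the failure case the first file is left in the bucket as an orphan object, which is harmless and can be cleaned up later; the reported bug is the opposite outcome, where metadata points at an object that does not exist.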
   
   In the meantime, we are trying to put together a simple repro of the issue.
   
   CC @szehon-ho 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org