Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/29 08:09:32 UTC
[GitHub] [iceberg] ConeyLiu opened a new pull request #2890: Fixes RemoveOrphanFiles delete files unexpected
ConeyLiu opened a new pull request #2890:
URL: https://github.com/apache/iceberg/pull/2890
`RemoveOrphanFiles` uses `actualFileDF leftanti join validFileDF` to determine which files should be removed. It lists all files under the provided location (or the table location) with `FileSystem.listStatus` to build `actualFileDF`. `validFileDF` is built by indexing the metadata files and the files they reference.
However, `FileSystem.listStatus` qualifies the given path. For example, the path `hdfs:/path` is qualified to `hdfs://host:port/path`. If the `warehouse` is set to `hdfs:/path`, the two DataFrames contain:
`validFileDF`:
hdfs:/path/file1
hdfs:/path/file2
hdfs:/path/file3
....
`actualFileDF`:
hdfs://host:port/path/file1
hdfs://host:port/path/file2
hdfs://host:port/path/file3
....
As a result, no row in `actualFileDF` matches a row in `validFileDF`, so all the files in `actualFileDF` are treated as invalid.
In this patch, we compare only the path component (with the scheme and authority removed) when doing the `leftanti join`.
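The mismatch and the path-only comparison can be illustrated with a small standalone sketch. This is plain Python with no Spark or Hadoop dependency, not the code in the patch: the helpers `path_only` and `left_anti` are hypothetical names, `namenode:8020` is an assumed stand-in for `host:port`, and the `orphan` file is made up for the example.

```python
from urllib.parse import urlparse


def path_only(location: str) -> str:
    """Return only the path component, dropping the scheme and authority."""
    return urlparse(location).path


def left_anti(actual, valid, key=lambda s: s):
    """Return the entries of `actual` whose key has no match in `valid`."""
    valid_keys = {key(v) for v in valid}
    return [a for a in actual if key(a) not in valid_keys]


# Locations as recorded in table metadata (unqualified warehouse path).
valid_files = ["hdfs:/path/file1", "hdfs:/path/file2", "hdfs:/path/file3"]

# Locations as returned by a listing that qualifies paths.
# "namenode:8020" is an assumed placeholder for host:port.
actual_files = [
    "hdfs://namenode:8020/path/file1",
    "hdfs://namenode:8020/path/file2",
    "hdfs://namenode:8020/path/file3",
    "hdfs://namenode:8020/path/orphan",  # hypothetical unreferenced file
]

# Comparing full strings: nothing matches, so every file looks orphaned.
orphans_by_string = left_anti(actual_files, valid_files)

# Comparing only the path component: only the unreferenced file remains.
orphans_by_path = left_anti(actual_files, valid_files, key=path_only)

print(orphans_by_string)
print(orphans_by_path)
```

One trade-off of comparing only paths is that two files with the same path under different schemes or authorities would be treated as the same file.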
Updated the existing unit tests to cover this case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] aokolnychyi commented on pull request #2890: Fixes RemoveOrphanFiles delete files unexpected
Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2890:
URL: https://github.com/apache/iceberg/pull/2890#issuecomment-891355561
There have been multiple discussions around this. I'll try to fetch the old thread on Slack later today.
[GitHub] [iceberg] ConeyLiu commented on pull request #2890: Fixes RemoveOrphanFiles delete files unexpected
Posted by GitBox <gi...@apache.org>.
ConeyLiu commented on pull request #2890:
URL: https://github.com/apache/iceberg/pull/2890#issuecomment-918940938
Gentle ping @rdblue @aokolnychyi, could you help review this? Thanks a lot.
[GitHub] [iceberg] ConeyLiu commented on pull request #2890: Fixes RemoveOrphanFiles delete files unexpected
Posted by GitBox <gi...@apache.org>.
ConeyLiu commented on pull request #2890:
URL: https://github.com/apache/iceberg/pull/2890#issuecomment-897545328
Hi @aokolnychyi, could you help review this? Thanks a lot.