You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/13 10:23:09 UTC

[GitHub] [iceberg] alexkingli opened a new pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

alexkingli opened a new pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730


   Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore. Then we don't need to delete the table files by hadoop command. It will fail for security if the table exists. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994707964


   > > Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. Please take a look at this problem , thanks again.
   > 
   > Does the delete run into some errors, or it just does not try to remove files? Would #3622 help here?
   Even if this method is successfully executed, some directories and some meta files are not cleaned in hdfs。For example:/home/$lakehouse_dir/$database.db/$table/、/home/$lakehouse_dir/$database.db/$table/data、/home/$lakehouse_dir/$database.db/$table/metadata/*.metadata.json . Then if I create a table with the same name , i need to delete these directories by and files by hadoop command first . 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994477484


   I would be interested in:
   - How would you try drop the table after the first failure?
   - Who would call the new API method?
   
   I think if we know the location of the former table, then we can use it to remove the directory by hand or through a FileSystem API. I am not sure we really need a new method for it in the Iceberg API, especially that it does not do anything Iceberg specific, just removes the directory recursively (which could be wrong in case of an Iceberg table as it could contain data files from anywhere).
   
   I still might miss some points, so correct me if I am wrong in my assumptions/logic above.
   
   Thanks,
   Peter
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick edited a comment on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-992825671


   For your case, where the wrong user was used, there's also a Spark action that can potentially help this. RemoveReachableFiles.
   
   It's already usable, but here's a PR to add it as a stored-procedure: https://github.com/apache/iceberg/pull/3719
   
   Would that possibly help as well? My understanding has always been that situations like the 2nd was the reason for having that action.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-992825671


   For your case, where the wrong user was used, there's also a Spark action that can potentially help this. RemoveReachableFiles.
   
   It's already usable, but here's a PR to add it as a stored-procedure: https://github.com/apache/iceberg/pull/3719
   
   Would that possibly help as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli commented on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
alexkingli commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994332602


   > @alexkingli: Could you please explain exactly what happened? What I understand is:
   > 
   > 1. Drop table (wrong user) - metadata dropped, deleting data failed
   > 2. Drop table (correct user) - drop table failed
   > 
   > What I do no understand is that at the 2nd try we do not have metadata for the table. How is it possible to drop a table in this case?
   > 
   > #2583 is somewhat related. Could it help you?
   > 
   > Thanks, Peter
   
   Yes, the situation you described is same with me. And I think this cleanup method is necessary in hivecatalog. We don't  need to clean up these files directly with hadoop commands every time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994576835


   > I think I understand the design of dropTable. Thanks a lot. Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. Please take a look at this problem , thanks again.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-995754059


   So basically this is the same discussion as in #1839.
   Were you successful finding out why the HiveMetaStore does not delete the table directory in Hive 3.1.2?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli commented on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
alexkingli commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994498804


   Ok, I submitted a new commit. I change the logic of dropTable in HiveCatalog. I think it should remove the data file and then clean the metadata in metastore. The problem at present is the dropTable can't ensure the transaction for deleting metadata in hivemetastore and deleting data in S3 or HDFS. So It's a good idea to put the logic of deleting data first,it will ensure deleting data  successfully as far as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994522770


   > The problem at present is the dropTable can't ensure the transaction for deleting metadata in hivemetastore and deleting data in S3 or HDFS. So It's a good idea to put the logic of deleting data first,it will ensure deleting data successfully as far as possible.
   
   We had this discussion in Hive before and the conclusion was that it is better to drop the metadata first. This way we might keep some unnecessary files but we always have a consistent Hive table structure. Also it is quite easy to delete the files by hand. If we go the other way around, then we might end up in a situation where we drop the files, but keep the metadata. This way we will have corrupted Hive table structure which the community decided that is a much worse situation for the Hive users than the former.
   
   I think the Iceberg tables are not special in this way, and I think the argument above still holds true.
   
   What do you think?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli removed a comment on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
alexkingli removed a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994559694


   > So It's a good idea to put the logic of deleting data first,it will ensure deleting data successfully as far as possible.
   
   I think I understand the design of dropTable. Thanks a lot.  Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. Please take a look at this problem , thanks again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli commented on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
alexkingli commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994569184


   I think I understand the design of dropTable. Thanks a lot. Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. Please take a look at this problem , thanks again.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994736784


   > I think #3622 should help by removing the /home/$lakehouse_dir/$database.db/$table/metadata/*.metadata.json files. If I understand correctly the reason for that change was that the old metadata files were kept, and after the PR they will be removed. I would guess that with #3622 the only remaining problems are the /home/$lakehouse_dir/$database.db/$table/, /home/$lakehouse_dir/$database.db/$table/ and the /home/$lakehouse_dir/$database.db/$table/metadata/ dirs.
   > 
   > It might be useful to remove them at the end of the `dropTableData` method, but we can do it only if they are empty (as other tables can use these dirs "legally" 😄). This would solve the issue for every engine.
   > 
   > If the drop happens from Hive then we might get away by adding it to the `HiveIcebergMetaHook.commitDropTable` method, but this is a Hive only solution, so the above one might be better, if the community agrees. https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java#L222-L241
   > 
   > Thanks, Peter
   
   Thanks a lot.  I think it is necessary to delete these directories without any files. In fact, there may be some partition directories that need to be deleted. It is not convenient to delete these directories in hdfs or S3 by command. Or we can make a tool to solve this problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli commented on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
alexkingli commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994559694


   > So It's a good idea to put the logic of deleting data first,it will ensure deleting data successfully as far as possible.
   
   I think I understand the design of dropTable. Thanks a lot.  Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli edited a comment on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli edited a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994736784


   > I think #3622 should help by removing the /home/$lakehouse_dir/$database.db/$table/metadata/*.metadata.json files. If I understand correctly the reason for that change was that the old metadata files were kept, and after the PR they will be removed. I would guess that with #3622 the only remaining problems are the /home/$lakehouse_dir/$database.db/$table/, /home/$lakehouse_dir/$database.db/$table/ and the /home/$lakehouse_dir/$database.db/$table/metadata/ dirs.
   > 
   > It might be useful to remove them at the end of the `dropTableData` method, but we can do it only if they are empty (as other tables can use these dirs "legally" 😄). This would solve the issue for every engine.
   > 
   > If the drop happens from Hive then we might get away by adding it to the `HiveIcebergMetaHook.commitDropTable` method, but this is a Hive only solution, so the above one might be better, if the community agrees. https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java#L222-L241
   > 
   > Thanks, Peter
   
   Thanks a lot.  I think it is necessary to delete these directories without any files. In fact, there may be some partition directories that need to be deleted. It is not convenient to delete these directories in hdfs or S3 by command every time.  Or we can make a tool to solve this problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-992485770


   @alexkingli: Could you please explain exactly what happened?
   What I understand is:
   1. Drop table (wrong user) - metadata dropped, deleting data failed
   2. Drop table (correct user) - drop table failed
   
   What I do no understand is that at the 2nd try we do not have metadata for the table. How is it possible to drop a table in this case?
   
   #2583 is somewhat related. Could it help you?
   
   Thanks,
   Peter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli closed pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli closed pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994718603


   > #3622
   
   
   
   > > Does the delete run into some errors, or it just does not try to remove files? Would #3622 help here?
   > 
   > Even if this method is successfully executed, some directories and some meta files are not cleaned in hdfs。For example:/home/$lakehouse_dir/$database.db/$table/、/home/$lakehouse_dir/$database.db/$table/、/home/$lakehouse_dir/$database.db/$table/metadata/*.metadata.json . Then if I create a table with the same name , i need to delete these directories and files by hadoop command first .
   
   I think #3622 should help by removing the /home/$lakehouse_dir/$database.db/$table/metadata/*.metadata.json files. If I understand correctly the reason for that change was that the old metadata files were kept, and after the PR they will be removed. I would guess that with #3622 the only remaining problems are the /home/$lakehouse_dir/$database.db/$table/,  /home/$lakehouse_dir/$database.db/$table/ and the /home/$lakehouse_dir/$database.db/$table/metadata/ dirs.
   
   It might be useful to remove them at the end of the `dropTableData` method, but we can do it only if they are empty (as other tables can use these dirs "legally" 😄). This would solve the issue for every engine.
   
   If the drop happens from Hive then we might get away by adding it to the `HiveIcebergMetaHook.commitDropTable` method, but this is a Hive only solution, so the above one might be better, if the community agrees.
   https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java#L222-L241
   
   Thanks,
   Peter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli edited a comment on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli edited a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994707964


   > >Does the delete run into some errors, or it just does not try to remove files? Would #3622 help here?
   
   Even if this method is successfully executed, some directories and some meta files are not cleaned in hdfs。For example:/home/$lakehouse_dir/$database.db/$table/、/home/$lakehouse_dir/$database.db/$table/data、/home/$lakehouse_dir/$database.db/$table/metadata/*.metadata.json . Then if I create a table with the same name , i need to delete these directories by and files by hadoop command first . 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-1015661786


   It looks like there's quite a bit of discussion about this being the right direction. For now, I'm going to close this PR until we figure out how to handle the case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli removed a comment on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli removed a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994576835


   > I think I understand the design of dropTable. Thanks a lot. Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. Please take a look at this problem , thanks again.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli edited a comment on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli edited a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994569184


   I think I understand the design of dropTable. Thanks a lot. Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. Please take a look at this problem , thanks again.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick edited a comment on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
kbendick edited a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-992825671


   For your case, where the wrong user was used, there's also a Spark action that can potentially help this. RemoveReachableFiles.
   
   It's already usable, but here's a PR to add it as a stored-procedure: https://github.com/apache/iceberg/pull/3719
   
   Would that possibly help as well? My understanding has always been that situations like this were the reason for having that action.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli edited a comment on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli edited a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994707964


   >Does the delete run into some errors, or it just does not try to remove files? Would #3622 help here?
   
   Even if this method is successfully executed, some directories and some meta files are not cleaned in hdfs。For example:/home/$lakehouse_dir/$database.db/$table/、/home/$lakehouse_dir/$database.db/$table/data、/home/$lakehouse_dir/$database.db/$table/metadata/*.metadata.json . Then if I create a table with the same name , i need to delete these directories and files by hadoop command first . 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli edited a comment on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli edited a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994707964


   >Does the delete run into some errors, or it just does not try to remove files? Would #3622 help here?
   
   Even if this method is successfully executed, some directories and some meta files are not cleaned in hdfs。For example:/home/$lakehouse_dir/$database.db/$table/、/home/$lakehouse_dir/$database.db/$table/data、/home/$lakehouse_dir/$database.db/$table/metadata/*.metadata.json . Then if I create a table with the same name , i need to delete these directories by and files by hadoop command first . 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994751861


   The `FileIO` API is quite narrow by design. I would not think that the community would agree to add new method to it.
   
   I think the easiest way forward would be to convince Hive to drop the table data.
   It should be possible to do it based on the value of `ifPurge`
   https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2548
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli edited a comment on pull request #3730: Add the cleanTable in HiveCatalog to slove the problem that unable to delete table files when the table is already deleted in metastore.

Posted by GitBox <gi...@apache.org>.
alexkingli edited a comment on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994559694


   > So It's a good idea to put the logic of deleting data first,it will ensure deleting data successfully as far as possible.
   
   I think I understand the design of dropTable. Thanks a lot.  Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. Please take a look at this problem , thanks again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-994677029


   > Another problem is that the dropTable can't delete directories and some metadata file in hdfs if we use the hdfs as underlying storage. This problem may cause the failure of rebuilding table with same table name. Please take a look at this problem , thanks again.
   
   Does the delete run into some errors, or it just does not try to remove files?
   Would #3622 help here?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexkingli commented on pull request #3730: slove the problem that unable to delete table files by hivecatalog

Posted by GitBox <gi...@apache.org>.
alexkingli commented on pull request #3730:
URL: https://github.com/apache/iceberg/pull/3730#issuecomment-995520817


   > The `FileIO` API is quite narrow by design. I would not think that the community would agree to add new method to it.
   > 
   > I think the easiest way forward would be to convince Hive to drop the table data. It should be possible to do it based on the value of `ifPurge` https://github.com/apache/hive/blob/rel/release-3.1.2/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L2548
   
   I think we can slove this problem  by merging pr1839., but it was made long time ago and wasn't merged into the release.  Maybe you are right,it is not a big problem for the moment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org