Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/05 21:15:27 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

RussellSpitzer opened a new issue #2554:
URL: https://github.com/apache/iceberg/issues/2554


   If a user is working with the HiveIcebergStorageHandler, they can get into a tricky situation. Say they delete their metadata.json, or alter the table property that points to it so that it no longer references a valid file. Any subsequent DDL or other operation will fail, because Hive must first load metadata.json before it can modify the table. So if the file is deleted, the table is undroppable. This may also be an issue if the table property is changed to point to a bad metadata.json path.
   
   I'm not sure if this is something we can guard against, but I thought I'd make a note of it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] KarlManong edited a comment on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
KarlManong edited a comment on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-964910390


   > Having the issue on 0.12.0 using Spark to drop the tables with the Hive Metastore. Is there a way to do it without Spark?
   
   Drop or recreate the table in the Hive service. That's the only way I know of. You should also clean up the orphan files.




[GitHub] [iceberg] marton-bod commented on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
marton-bod commented on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-833449999


   This is a good point, and I think we can and should guard against that. The table currently becomes undroppable only because we try to load the Iceberg table prior to dropping the HMS table. We do this because Hive2 and Hive3 do not clean up the metadata folder completely so we need to cache the io to perform the deletion later. 
   
   What we can do instead is try to load the Iceberg table as it is, but if it fails, just log a warning message that the table was unloadable for whatever reason, but otherwise proceed with the HMS table drop. This might leave some undeleted metadata files behind but it's still much better than leaving the table undroppable.
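
A minimal sketch of this fallback, written in Python for brevity rather than the Java of the actual storage handler. The dict `hms_tables` stands in for the HMS, and `TableLoadError`, `drop_table`, and `broken_loader` are hypothetical names introduced only for illustration; none of them are Iceberg or Hive APIs.

```python
import logging

logger = logging.getLogger("iceberg_drop_sketch")


class TableLoadError(Exception):
    """Hypothetical error raised when metadata.json is missing or unreadable."""


def drop_table(hms_tables, name, load_iceberg_table):
    """Drop `name` from `hms_tables`, a dict standing in for the HMS.

    Returns the loaded Iceberg table so the caller can delete leftover
    metadata files afterwards, or None when the table could not be loaded.
    """
    try:
        table = load_iceberg_table(name)
    except TableLoadError as exc:
        # Do not block the drop: warn and continue without the cached table.
        logger.warning(
            "Table %s was unloadable, metadata files may be left behind: %s",
            name,
            exc,
        )
        table = None
    del hms_tables[name]  # the HMS drop itself always proceeds
    return table


def broken_loader(name):
    # Simulates a table whose metadata.json has been deleted.
    raise TableLoadError(f"metadata.json not found for {name}")


hms_tables = {"db.events": {"storage_handler": "HiveIcebergStorageHandler"}}
leftover = drop_table(hms_tables, "db.events", broken_loader)
```

In this model the table entry is removed even though loading failed, and the `None` return value tells the caller there is nothing cached to use for metadata cleanup.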




[GitHub] [iceberg] fcomuniz commented on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
fcomuniz commented on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-907651938


   Having the issue on 0.12.0 using Spark to drop the tables with the Hive Metastore. Is there a way to do it without Spark?
   




[GitHub] [iceberg] pvary commented on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-836603671


   @RussellSpitzer: How likely is this situation based on your experience?
   
   In Hive we have faced similar situations a few times before, for example:
   - Dropping the table and failing to remove the files
    
   Sadly, we still do not have a way in Hive to return a warning message along with reporting the success of the drop table operation. What we did in these cases:
   - Log a warning message in the logs, and silently (for the user) drop the metadata and skip the file removal.
   
   This looks like the same solution suggested by @marton-bod, and I think it is much easier to recover from a problem if we follow this approach. It is easier to drop the remaining data from the Iceberg table later than to recreate the appropriate files just so the metadata can be deleted as well.
   
   Thanks,
   Peter






[GitHub] [iceberg] marton-bod edited a comment on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
marton-bod edited a comment on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-836682523


   We do have the post-drop hook available though to convey information to the user after-the-fact, so for example we can throw an exception to draw the user's attention to this anomaly, despite the success of the core DROP operation. 
   
   So, I see two options:
   
   - we try to load the Iceberg table in the pre-drop hook, save in a boolean whether it succeeded, DROP the table as per Hive, and if the load failed earlier then throw a descriptive exception in the post-drop hook explaining what happened (i.e. the drop succeeded, but there was this anomaly, so please look into it, there might be undeleted metadata files that need manual intervention, etc.)
   - we try to load the Iceberg table in the pre-drop hook, log a descriptive error msg if it fails, DROP the table as per Hive, nothing else (i.e. what I described a couple days ago)
   
   I would prefer the first one, since I think it's more explicit (the drawback is that it fails the operation from a return code point of view, but not from a functional pov). The second one is silent and we can't expect the user to inspect the logs after each DROP operation looking for warning signs.
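
The two options can be compared with a small model, sketched here in Python rather than the Java of the real hook. `DropHookSketch`, `DropAnomalyError`, `run_drop`, and the `fail_after_drop` flag are hypothetical names for illustration, and the dict passed to `run_drop` stands in for the HMS; this is not Hive's hook interface.

```python
class DropAnomalyError(Exception):
    """Hypothetical error reported after a drop that skipped metadata cleanup."""


class DropHookSketch:
    """Models the pre-drop / post-drop pair; not Hive's actual hook interface."""

    def __init__(self, load_table, fail_after_drop):
        self._load_table = load_table
        # True models the first option (throw), False the second (log only).
        self._fail_after_drop = fail_after_drop
        self._loaded = False
        self.warnings = []

    def pre_drop(self, name):
        # Remember whether the Iceberg table could be loaded before the drop.
        try:
            self._load_table(name)
            self._loaded = True
        except Exception as exc:
            self._loaded = False
            self.warnings.append(f"could not load {name}: {exc}")

    def post_drop(self, name):
        if self._loaded:
            return  # normal path: leftover metadata files would be deleted here
        if self._fail_after_drop:
            raise DropAnomalyError(
                f"{name} was dropped, but its metadata could not be loaded; "
                "undeleted metadata files may need manual cleanup"
            )


def run_drop(hms_tables, name, hook):
    hook.pre_drop(name)
    del hms_tables[name]  # the core DROP succeeds in both options
    hook.post_drop(name)


def missing_metadata(name):
    # Simulates a table whose metadata.json has been deleted.
    raise FileNotFoundError(f"metadata.json for {name}")
```

With `fail_after_drop=True` the caller sees an exception even though the table is already gone; with `fail_after_drop=False` the call returns normally and the only trace is the recorded warning.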






[GitHub] [iceberg] RussellSpitzer commented on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-836618503


   So for now it's 1/1 with users who set this property 😊. They were setting up a framework for tons of other users and use cases so they created a lot of entities




[GitHub] [iceberg] KarlManong edited a comment on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
KarlManong edited a comment on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-964910390


   > Having the issue on 0.12.0 using Spark to drop the tables with the Hive Metastore. Is there a way to do it without Spark?
   
   We hit the same issue. I dropped the table in the Hive service; that's the only way I know of. I also deleted the data files manually.




[GitHub] [iceberg] RussellSpitzer commented on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-833012392


   @marton-bod + @pvary Just in case you had some thoughts




[GitHub] [iceberg] RussellSpitzer commented on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-836811980


   @pvary: I opened #2578 for the issue where the engine is enabled but there are no jars on the class-path. Thanks y'all for helping out with this.




[GitHub] [iceberg] marton-bod commented on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
marton-bod commented on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-836785417


   > .. who might expect the table to remain if the DROP TABLE command is not successful
   
   This is a good point! Let's go with the first idea then - I'll put together a PR soon.




[GitHub] [iceberg] pvary edited a comment on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
pvary edited a comment on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-836762324


   @marton-bod: I would prefer your first suggestion:
   - Log a warning message
   - `DROP TABLE` should be successful
   
   There are automated clients that might expect the table to remain if the `DROP TABLE` command is not successful, so this could be confusing. We should either not drop the table metadata and return an error, or drop the table metadata and return successfully. At least that was the reasoning behind our decision in the aforementioned case.
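
To illustrate the concern, here is a hedged Python sketch of such an automated client. `drop_all` and `drop_then_raise` are hypothetical names, and the dict `hms_tables` stands in for the HMS; the point is only to show what a client concludes when a drop removes the table and then reports an error.

```python
def drop_all(drop_table, names):
    """Hypothetical automated client: returns the tables it believes remain."""
    believed_remaining = []
    for name in names:
        try:
            drop_table(name)
        except Exception:
            # A failed DROP TABLE is read as "the table is still there".
            believed_remaining.append(name)
    return believed_remaining


hms_tables = {"db.a": {}, "db.b": {}}


def drop_then_raise(name):
    # Models a drop that removes the table and then reports an error.
    del hms_tables[name]
    raise RuntimeError(f"{name} dropped, but metadata cleanup was skipped")


believed = drop_all(drop_then_raise, ["db.a", "db.b"])
actually_remaining = sorted(hms_tables)
```

Here the client's view (`believed`) and the catalog's real state (`actually_remaining`) diverge, which is the confusion a successful return code avoids.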
   
   > The next issue we kept hitting is that there were no Iceberg jars on the Hive class-path, which is a problem because you can always set the storage handler, but once set, you can't remove the property without the jars on the class-path
   
   This is another story that might be worth a separate issue, if we are not able to drop the table even from Spark or Java clients.






[GitHub] [iceberg] RussellSpitzer edited a comment on issue #2554: Cannot Drop Table Created with HiveIcebergStorageHandler Enabled but Metadata.json is Missing

Posted by GitBox <gi...@apache.org>.
RussellSpitzer edited a comment on issue #2554:
URL: https://github.com/apache/iceberg/issues/2554#issuecomment-836617396


   So far this has never cropped up intentionally, but I had a user who saw the engine.hive.enabled property and set it on a large number of multiuser tables.
   
   This caused a bit of havoc. The deletion issue happened first, as they were used to standard Hive tables and removed the data files from some tables before dropping them from the catalog.
   
   The next issue we kept hitting is that there were no Iceberg jars on the Hive class-path, which is a problem because you can always set the storage handler, but once set, you can't remove the property without the jars on the class-path.



