You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Adam Szita <sz...@cloudera.com> on 2017/01/31 09:19:11 UTC

Review Request 56118: DROP TABLE in hive doesn't Throw Error

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56118/
-----------------------------------------------------------

Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.


Bugs: HIVE-14181
    https://issues.apache.org/jira/browse/HIVE-14181


Repository: hive-git


Description
-------

Failure during table drop doesn't throw errors and results in success - some times data resides in warehouse, but table (meta data) is removed from metastore resulting in incosistency


Diffs
-----

  itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java af125c38236582ba532f5e3de3d2ba724f38b101 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 13d0aab40eaf344869b8230614b55115f64857e6 

Diff: https://reviews.apache.org/r/56118/diff/


Testing
-------

-Added test case
-Tested on cluster


Thanks,

Adam Szita


Re: Review Request 56118: DROP TABLE in hive doesn't Throw Error

Posted by Adam Szita <sz...@cloudera.com>.

> On Feb. 3, 2017, 2:42 p.m., Aihua Xu wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 1679
> > <https://reviews.apache.org/r/56118/diff/2/?file=1623348#file1623348line1679>
> >
> >     With this approach, I feel it will place Hive in worse situation since now it could fail during the data deletion and it failed, metastore is rolled back but part of the data is removed. The table is in unusable conditition.
> >     
> >     Without this change, we roll back when the metadata drop has issues. After roll back, the table is still consistent.
> >     
> >     If the metadata drop succeeds, then we continue deleting the data. It's not that good that the data could not be purged, but it won't cause any inconsistency since metadata is gone and the data is not visiable to the caller.

Okay, I see, let's stick to removing the metadata as well then.
Although I still suggest we _do not return success_ to the client - if some file/dir couldn't have been deleted then the drop_table command clearly didn't succeeded entirely and the user should know about this.

So I think the exception throwing changes should be kept (along with the configurability of this strict mode). Also, it should be logged additionally what path could not have been deleted, and what (partition) paths remained on disk because of this.
This all will ultimately give the user: (1) notification that due to an error some data is remained; (2) what paths have to be considered for retrying to delete them manually (can't retry from Hive itself, since we drop the metadata regardless).

So - having much more experience in Hive, what do you think about this approach?


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56118/#review164121
-----------------------------------------------------------


On Feb. 3, 2017, 2:22 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56118/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 2:22 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-14181
>     https://issues.apache.org/jira/browse/HIVE-14181
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Failure during table drop doesn't throw errors and results in success - some times data resides in warehouse, but table (meta data) is removed from metastore resulting in incosistency
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 53b9b0c6962c9b1cd2eef1cb71687ec0245cfac3 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java af125c38236582ba532f5e3de3d2ba724f38b101 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java f8c3c4e48db0df9d6c18801bcd61f9e5dc6eb7c2 
> 
> Diff: https://reviews.apache.org/r/56118/diff/
> 
> 
> Testing
> -------
> 
> -Added test case
> -Tested on cluster
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 56118: DROP TABLE in hive doesn't Throw Error

Posted by Aihua Xu <ax...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56118/#review164121
-----------------------------------------------------------




metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java (line 1679)
<https://reviews.apache.org/r/56118/#comment235712>

    With this approach, I feel it will place Hive in worse situation since now it could fail during the data deletion and it failed, metastore is rolled back but part of the data is removed. The table is in unusable conditition.
    
    Without this change, we roll back when the metadata drop has issues. After roll back, the table is still consistent.
    
    If the metadata drop succeeds, then we continue deleting the data. It's not that good that the data could not be purged, but it won't cause any inconsistency since metadata is gone and the data is not visiable to the caller.


- Aihua Xu


On Feb. 3, 2017, 2:22 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56118/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 2:22 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-14181
>     https://issues.apache.org/jira/browse/HIVE-14181
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Failure during table drop doesn't throw errors and results in success - some times data resides in warehouse, but table (meta data) is removed from metastore resulting in incosistency
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 53b9b0c6962c9b1cd2eef1cb71687ec0245cfac3 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java af125c38236582ba532f5e3de3d2ba724f38b101 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java f8c3c4e48db0df9d6c18801bcd61f9e5dc6eb7c2 
> 
> Diff: https://reviews.apache.org/r/56118/diff/
> 
> 
> Testing
> -------
> 
> -Added test case
> -Tested on cluster
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 56118: DROP TABLE in hive doesn't Throw Error

Posted by Aihua Xu <ax...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56118/#review165125
-----------------------------------------------------------




metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java (line 1679)
<https://reviews.apache.org/r/56118/#comment236939>

    Actually it still has the issue here. 
    
    With this change, if the data deletion fails, now Hive will roll back the metastore DB change, but the actual data may not be recovered. That will still put the table in inconsistent state. 
    
    I guess this part doesn't need change. But even that, it will cause issue since if we return deleting fails, from the user point of view, the table should be intact but actually it's already deleted.
    
    So I'm rethinking of this jira. Maybe we have to keep like this, since the current behaivor seems to be the best with those nonatomic operations. 
    
    For hive, if metastore info is removed, then it's considered table deleted. For external table, actually the data would not get removed.



metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java (line 1811)
<https://reviews.apache.org/r/56118/#comment236940>

    Now when we throw the exception here, if there is one partition we can't delete, it will stop rather than continuing deleting the remaining. 
    
    I would suggest not to throw exception but return a true/false flag to indicate deletion success or failure. Same for deleteTableData.


- Aihua Xu


On Feb. 3, 2017, 2:22 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56118/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 2:22 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-14181
>     https://issues.apache.org/jira/browse/HIVE-14181
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Failure during table drop doesn't throw errors and results in success - some times data resides in warehouse, but table (meta data) is removed from metastore resulting in incosistency
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 53b9b0c6962c9b1cd2eef1cb71687ec0245cfac3 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java af125c38236582ba532f5e3de3d2ba724f38b101 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java f8c3c4e48db0df9d6c18801bcd61f9e5dc6eb7c2 
> 
> Diff: https://reviews.apache.org/r/56118/diff/
> 
> 
> Testing
> -------
> 
> -Added test case
> -Tested on cluster
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 56118: DROP TABLE in hive doesn't Throw Error

Posted by Adam Szita <sz...@cloudera.com>.

> On Feb. 10, 2017, 7:32 p.m., Vihang Karajgaonkar wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java, line 1679
> > <https://reviews.apache.org/r/56118/diff/2/?file=1623348#file1623348line1679>
> >
> >     I agree with Aihua here. As long as the table metadata is dropped, from the client's point of view the table does not exist. The filesystem will have stale data because it could not be deleted successfully, but that stale data unusable anyways without the metadata. If we want to notify such cases to the client, I think it should be a warning at best and not an error.

Let's consider the following situation:
-The user creates a table, fills it with some data, then drops it (which fails silently leaving data behind on disk).
-Then the user decides to recreate the table with a different serde, e.g. Avro format. (or even another user could create a table with the same name)
-A simple _select * from table_ will fail with the following: _"Error: java.io.IOException: java.io.IOException: Not a data file. (state=,code=0)"_
-User will get quite confused since they don't know that a previous drop table failure has caused this 

So as I see it, we should either:
-Remove table from HMS, and give back a very simple exception e.g. "Table definition is deleted, but some data files remained on disk, please clean up manually" (we either succeed or throw an exception, in order to signal warning thrift contract would have to be amended which is an overkill for this issue :) )
-Leave this functionality as is, but add a feature to one of the existing tools (e.g. schematool, metatool) that can detect orphaned data leftovers on disk by comparing HDFS content with HMS (obviously this should go to a separate jira)

I would've chosen the first option since it's really simple - just a notification to the user - and we could even make it configurable e.g. hive.metastore.droptable.verbose=true/false


- Adam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56118/#review165150
-----------------------------------------------------------


On Feb. 3, 2017, 2:22 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56118/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 2:22 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-14181
>     https://issues.apache.org/jira/browse/HIVE-14181
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Failure during table drop doesn't throw errors and results in success - some times data resides in warehouse, but table (meta data) is removed from metastore resulting in incosistency
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 53b9b0c6962c9b1cd2eef1cb71687ec0245cfac3 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java af125c38236582ba532f5e3de3d2ba724f38b101 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java f8c3c4e48db0df9d6c18801bcd61f9e5dc6eb7c2 
> 
> Diff: https://reviews.apache.org/r/56118/diff/
> 
> 
> Testing
> -------
> 
> -Added test case
> -Tested on cluster
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 56118: DROP TABLE in hive doesn't Throw Error

Posted by Vihang Karajgaonkar <vi...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56118/#review165150
-----------------------------------------------------------




metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java (line 1679)
<https://reviews.apache.org/r/56118/#comment236965>

    I agree with Aihua here. As long as the table metadata is dropped, from the client's point of view the table does not exist. The filesystem will have stale data because it could not be deleted successfully, but that stale data unusable anyways without the metadata. If we want to notify such cases to the client, I think it should be a warning at best and not an error.


- Vihang Karajgaonkar


On Feb. 3, 2017, 2:22 p.m., Adam Szita wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56118/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2017, 2:22 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.
> 
> 
> Bugs: HIVE-14181
>     https://issues.apache.org/jira/browse/HIVE-14181
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Failure during table drop doesn't throw errors and results in success - some times data resides in warehouse, but table (meta data) is removed from metastore resulting in incosistency
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 53b9b0c6962c9b1cd2eef1cb71687ec0245cfac3 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java af125c38236582ba532f5e3de3d2ba724f38b101 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java f8c3c4e48db0df9d6c18801bcd61f9e5dc6eb7c2 
> 
> Diff: https://reviews.apache.org/r/56118/diff/
> 
> 
> Testing
> -------
> 
> -Added test case
> -Tested on cluster
> 
> 
> Thanks,
> 
> Adam Szita
> 
>


Re: Review Request 56118: DROP TABLE in hive doesn't Throw Error

Posted by Adam Szita <sz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56118/
-----------------------------------------------------------

(Updated Feb. 3, 2017, 2:22 p.m.)


Review request for hive, Aihua Xu, Peter Vary, and Sergio Pena.


Changes
-------

Added configuration option to turn on/off the stricter drop_table apporach.


Bugs: HIVE-14181
    https://issues.apache.org/jira/browse/HIVE-14181


Repository: hive-git


Description
-------

Failure during table drop doesn't throw errors and results in success - some times data resides in warehouse, but table (meta data) is removed from metastore resulting in incosistency


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 53b9b0c6962c9b1cd2eef1cb71687ec0245cfac3 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java af125c38236582ba532f5e3de3d2ba724f38b101 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java f8c3c4e48db0df9d6c18801bcd61f9e5dc6eb7c2 

Diff: https://reviews.apache.org/r/56118/diff/


Testing
-------

-Added test case
-Tested on cluster


Thanks,

Adam Szita