Posted to issues@hive.apache.org by "rameshkrishnan muthusamy (Jira)" <ji...@apache.org> on 2020/07/08 14:05:00 UTC

[jira] [Updated] (HIVE-23816) Concurrent access of metastore dynamic partition registration API resulting in data loss due to HDFS dir deletion

     [ https://issues.apache.org/jira/browse/HIVE-23816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rameshkrishnan muthusamy updated HIVE-23816:
--------------------------------------------
    Description: 
During partition registration via the Thrift API, we are noticing that the associated HDFS directory is being deleted even though it was not created by the same process.

This results in loss of the data in that directory. In the example below, 3 threads try to create the same directory and only one of them succeeds in registering the partition; the other 2 threads then delete the directory that was created and registered by the successful thread.


hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,307 INFO org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379217]: Creating directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,308 INFO org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-386717]: Creating directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,308 INFO org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379074]: Creating directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,314 INFO hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: deleting hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,315 INFO hive.metastore.hivemetastoressimpl: [pool-5-thread-379217]: deleting hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,321 INFO org.apache.hadoop.fs.TrashPolicyDefault: [pool-5-thread-386717]: Moved: 'hdfs://test_path/dt=2020-07-02/hhmm-0850' to trash at: hdfs://user/test/.Trash/Current/test/dt=2020-07-02/hhmm=0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,321 INFO hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: Moved to trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,323 ERROR hive.log: [pool-5-thread-379217]: Got exception: java.io.IOException Failed to move to trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:java.io.IOException: Failed to move to trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,328 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-379217]: MetaException(message:Got exception: java.io.IOException Failed to move to trash: hdfs://test_path/dt=2020-07-02/hhmm-0850)
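The interleaving in the log can be replayed with a minimal Java sketch. It assumes that each thread decides whether it "made" the directory from an existence check taken before the create, which fits the log (all three threads log "Creating directory if it doesn't exist" before any deletion), but this has not been confirmed against the metastore source. All names here (PartitionDirRace, unsafeRegister, saferRegister, the in-memory registered set) are hypothetical; java.nio.file on the local filesystem stands in for HDFS, and a concurrent set stands in for the metastore database. saferRegister shows one possible mitigation, a per-partition lock around check, create, register and cleanup. It only helps within a single metastore JVM and is not the fix adopted for this issue.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class PartitionDirRace {
    // Hypothetical stand-in for the metastore partition table: add() is false for duplicates.
    static final Set<String> registered = ConcurrentHashMap.newKeySet();
    // Hypothetical per-partition locks for the safer variant (single JVM only).
    static final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    // Unsafe pattern: madeDir comes from an existence check taken earlier, so several
    // callers can all believe they created the same directory and all "clean it up".
    static boolean unsafeRegister(Path dir, boolean sawMissing) throws IOException {
        boolean madeDir = false;
        if (sawMissing) {
            Files.createDirectories(dir); // no-op if another caller already created it
            madeDir = true;
        }
        boolean ok = registered.add(dir.toString());
        if (!ok && madeDir) {
            deleteRecursively(dir); // may remove a directory another caller registered
        }
        return ok;
    }

    // Safer sketch: check, create, register and cleanup run under one per-partition lock,
    // so madeDir is accurate and no other caller can register the partition in between.
    static boolean saferRegister(Path dir) throws IOException {
        synchronized (locks.computeIfAbsent(dir.toString(), k -> new Object())) {
            boolean madeDir = false;
            if (!Files.isDirectory(dir)) {
                Files.createDirectories(dir);
                madeDir = true;
            }
            boolean ok = registered.add(dir.toString());
            if (!ok && madeDir) {
                deleteRecursively(dir); // only reached if this caller really created dir
            }
            return ok;
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths = new ArrayList<>();
        try (Stream<Path> s = Files.walk(root)) {
            s.sorted(Comparator.reverseOrder()).forEach(paths::add); // children before parents
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Replays the logged interleaving: all three callers check before anyone creates.
    static boolean unsafeDemo(Path root) throws IOException {
        Path dir = root.resolve("unsafe").resolve("hhmm-0850");
        boolean a = !Files.isDirectory(dir), b = !Files.isDirectory(dir), c = !Files.isDirectory(dir);
        unsafeRegister(dir, a);                              // wins the registration
        Files.writeString(dir.resolve("part-0000"), "rows"); // winner writes data
        unsafeRegister(dir, b);                              // loses and deletes the directory
        unsafeRegister(dir, c);                              // loses too
        return Files.isDirectory(dir);                       // false: the data is gone
    }

    // Runs three real threads against the safer variant; exactly one should win.
    static boolean saferDemo(Path root) throws Exception {
        Path dir = root.resolve("safer").resolve("hhmm-0850");
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            results.add(pool.submit(() -> saferRegister(dir)));
        }
        int winners = 0;
        for (Future<Boolean> f : results) {
            if (f.get()) winners++;
        }
        pool.shutdown();
        return winners == 1 && Files.isDirectory(dir);
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("hive23816");
        System.out.println("unsafe: dir survives = " + unsafeDemo(root));
        System.out.println("safer: one winner, dir survives = " + saferDemo(root));
    }
}
```

In the unsafe replay the directory written by the winning caller is deleted by a losing caller, mirroring the "deleting ..." lines above. In the locked variant the losers observe the existing directory, so they never consider themselves its creator and leave it in place.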

 

  was:
During partition registration via the Thrift API, we are noticing that the associated HDFS directory is being deleted even though it was not created by the same process.

This results in loss of the data in that directory.

 


>  Concurrent access of metastore dynamic partition registration API resulting in data loss due to HDFS dir deletion 
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-23816
>                 URL: https://issues.apache.org/jira/browse/HIVE-23816
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: rameshkrishnan muthusamy
>            Assignee: rameshkrishnan muthusamy
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)