You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/02/24 20:23:58 UTC

[GitHub] [hive] zchovan opened a new pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

zchovan opened a new pull request #2017:
URL: https://github.com/apache/hive/pull/2017


   Change-Id: Ie989ced8a1ef88e397d4d310628143cdb53ee8ca
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://cwiki.apache.org/confluence/display/Hive/HowToContribute
     2. Ensure that you have created an issue on the Hive project JIRA: https://issues.apache.org/jira/projects/HIVE/summary
     3. Ensure you have added or run the appropriate tests for your PR: 
     4. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]HIVE-XXXXX:  Your PR title ...'.
     5. Be sure to keep the PR description updated to reflect all changes.
     6. Please write your PR title to summarize what this PR proposes.
     7. If possible, provide a concise example to reproduce the issue for a faster review.
   
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   This changes the drop partition operation to asynchronous. The data files of transactional tables will not be deleted, but a truncated basefile will be written, which is going to be later cleaned up by the Compactor/Cleaner. 
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   This along with a few other changes will enable us to not use read locks, which provides perf boost to the transactional tables.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description, screenshot and/or a reproducable example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Hive versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Added tests to the TestTxnCommands
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zchovan commented on a change in pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
zchovan commented on a change in pull request #2017:
URL: https://github.com/apache/hive/pull/2017#discussion_r582638123



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -4865,10 +4871,26 @@ private boolean drop_partition_common(RawStore ms, String catName, String db_nam
 
           if (isArchived) {
             assert (archiveParentDir != null);
-            wh.deleteDir(archiveParentDir, true, mustPurge, needsCm);
+            if (writeTruncatedBase) {
+              try {
+                addTruncateBaseFile(archiveParentDir, writeId, archiveParentDir.getFileSystem(getConf()));
+              } catch (Exception e) {
+                throw newMetaException(e);
+              }
+            } else {
+              wh.deleteDir(archiveParentDir, true, mustPurge, needsCm);
+            }
           } else {
             assert (partPath != null);
-            wh.deleteDir(partPath, true, mustPurge, needsCm);
+            if (writeTruncatedBase) {
+              try {
+                addTruncateBaseFile(partPath, writeId, archiveParentDir.getFileSystem(getConf()));

Review comment:
       good idea, will do




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zchovan commented on pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
zchovan commented on pull request #2017:
URL: https://github.com/apache/hive/pull/2017#issuecomment-785350797


   cc @deniskuzZ @pvargacl 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
pvargacl commented on pull request #2017:
URL: https://github.com/apache/hive/pull/2017#issuecomment-785691173


   I missing some features, should these part of this PR?
   
   -  shouldn't the lock type changed from exclusive to excl_write if we use this kind of drop?
   -  shouldn't he Cleaner later delete the whole partition directory if it was not recreated?
   -  I think you should add many test, around drop and recreate, what happens if you recreate in different scenarios (while there is an old read still running, when you overlap with the cleaner, when it was already cleaned up ...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #2017:
URL: https://github.com/apache/hive/pull/2017#discussion_r582609667



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/PartitionDropOptions.java
##########
@@ -27,6 +27,8 @@
   public boolean ifExists = false;
   public boolean returnResults = true;
   public boolean purgeData = false;
+  public String validWriteIds;

Review comment:
       Is the validWriteIds used anywhere?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zchovan commented on a change in pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
zchovan commented on a change in pull request #2017:
URL: https://github.com/apache/hive/pull/2017#discussion_r582632897



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -687,6 +687,8 @@ public static ConfVars getMetaConf(String name) {
         "hive-metastore/_HOST@EXAMPLE.COM",
         "The service principal for the metastore Thrift server. \n" +
             "The special string _HOST will be replaced automatically with the correct host name."),
+    LOCKLESS_READS_ENABLED("metastore.lockless.reads.enabled", "metastore.lockless.reads.enabled",

Review comment:
       do you mean there should be one main config (e.g. this one) and additional one for each related operation (dropTable/partition/etc) or no main one and one for each op?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zchovan commented on pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
zchovan commented on pull request #2017:
URL: https://github.com/apache/hive/pull/2017#issuecomment-785731823


   @pvargacl 
   The Cleaner changes are planned for a separate commit. I agree that the scenarios you've mentioned have to be tested, but without the final cleaner changes they don't really make sense here. 
   The way I see it, the Cleaner first checks if the partition still exists in the HMS, if it doesn't, then the partition has not been yet recreated and the whole location dir can be deleted, no compaction needed.
   If the partition exists that means that between the dropPartition and the compaction's start the partition was recreated and should be compacted, e.g the files created before the truncated/deleted base file was written can be compacted/deleted.
   This still leaves the last scenario where the Cleaner is already running and the partition is recreated, so yeah that should be checked and tested.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #2017:
URL: https://github.com/apache/hive/pull/2017#discussion_r582602457



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
##########
@@ -687,6 +687,8 @@ public static ConfVars getMetaConf(String name) {
         "hive-metastore/_HOST@EXAMPLE.COM",
         "The service principal for the metastore Thrift server. \n" +
             "The special string _HOST will be replaced automatically with the correct host name."),
+    LOCKLESS_READS_ENABLED("metastore.lockless.reads.enabled", "metastore.lockless.reads.enabled",

Review comment:
       I think there should be a separate config just for the drop partition




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #2017:
URL: https://github.com/apache/hive/pull/2017#discussion_r582603575



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/AcidEventListener.java
##########
@@ -72,6 +77,22 @@ public void onDropPartition(DropPartitionEvent partitionEvent)  throws MetaExcep
       txnHandler = getTxnHandler();
       txnHandler.cleanupRecords(HiveObjectType.PARTITION, null, partitionEvent.getTable(),
           partitionEvent.getPartitionIterator());
+
+      if (MetastoreConf.getBoolVar(conf, ConfVars.LOCKLESS_READS_ENABLED)) {
+        CompactionRequest rqst = new CompactionRequest(partitionEvent.getTable().getDbName(), partitionEvent.getTable().getTableName(),

Review comment:
       Is this going to work? If the partition record is dropped, I think the compaction will automatically fail, there should be a test for this
   I think you should just create a compaction record that is in ready_for_cleaning state. And since there is a base file, I think I would prefer a major compaction, but it does not really matter.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] closed pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #2017:
URL: https://github.com/apache/hive/pull/2017


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] commented on pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #2017:
URL: https://github.com/apache/hive/pull/2017#issuecomment-855489220


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl edited a comment on pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
pvargacl edited a comment on pull request #2017:
URL: https://github.com/apache/hive/pull/2017#issuecomment-785691173


   I missing some features, should these part of this PR?
   
   -  shouldn't the lock type changed from exclusive to excl_write if we use this kind of drop?
   -  shouldn't he Cleaner later delete the whole partition directory if it was not recreated?
   -  I think you should add many test, around drop and recreate, what happens if you recreate in different scenarios (while there is an old read still running, when you overlap with the cleaner, when it was already cleaned up ...
   - there will be a race condition in the cleaner and the create partition I think, what happens if Cleaner starts to run, checks if the partition was dropped, so decides it has to delete the partition directory, but now a create partition comes and a new query starts to write a new delta. This should be handled. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #2017:
URL: https://github.com/apache/hive/pull/2017#discussion_r582604658



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##########
@@ -4865,10 +4871,26 @@ private boolean drop_partition_common(RawStore ms, String catName, String db_nam
 
           if (isArchived) {
             assert (archiveParentDir != null);
-            wh.deleteDir(archiveParentDir, true, mustPurge, needsCm);
+            if (writeTruncatedBase) {
+              try {
+                addTruncateBaseFile(archiveParentDir, writeId, archiveParentDir.getFileSystem(getConf()));
+              } catch (Exception e) {
+                throw newMetaException(e);
+              }
+            } else {
+              wh.deleteDir(archiveParentDir, true, mustPurge, needsCm);
+            }
           } else {
             assert (partPath != null);
-            wh.deleteDir(partPath, true, mustPurge, needsCm);
+            if (writeTruncatedBase) {
+              try {
+                addTruncateBaseFile(partPath, writeId, archiveParentDir.getFileSystem(getConf()));

Review comment:
       The metadata file has a type information, I think you should not use truncate type, rather create a drop type.
   That info might be useful when you do the cleanup and have to decide whether to delete the whole partition directory or not




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zchovan commented on a change in pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
zchovan commented on a change in pull request #2017:
URL: https://github.com/apache/hive/pull/2017#discussion_r582631700



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/PartitionDropOptions.java
##########
@@ -27,6 +27,8 @@
   public boolean ifExists = false;
   public boolean returnResults = true;
   public boolean purgeData = false;
+  public String validWriteIds;

Review comment:
       no, that was left in by mistake




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on pull request #2017: HIVE-24753: Non blocking DROP PARTITION implementation

Posted by GitBox <gi...@apache.org>.
pvargacl commented on pull request #2017:
URL: https://github.com/apache/hive/pull/2017#issuecomment-785744497


   > @pvargacl
   > The Cleaner changes are planned for a separate commit. I agree that the scenarios you've mentioned have to be tested, but without the final cleaner changes they don't really make sense here.
   > The way I see it, the Cleaner first checks if the partition still exists in the HMS, if it doesn't, then the partition has not been yet recreated and the whole location dir can be deleted, no compaction needed.
   > If the partition exists that means that between the dropPartition and the compaction's start the partition was recreated and should be compacted, e.g the files created before the truncated/deleted base file was written can be compacted/deleted.
   > This still leaves the last scenario where the Cleaner is already running and the partition is recreated, so yeah that should be checked and tested.
   
   I don't think this can go in with some basic Cleaner change, even if it does not delete the partition directory, you have to handle if the partition record is missing, otherwise the Cleaner will just fail.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org