Posted to gitbox@hive.apache.org by "akshat0395 (via GitHub)" <gi...@apache.org> on 2023/03/31 12:58:59 UTC

[GitHub] [hive] akshat0395 opened a new pull request, #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

akshat0395 opened a new pull request, #4181:
URL: https://github.com/apache/hive/pull/4181

   
   ### What changes were proposed in this pull request?
   Improve Qtest coverage for compaction use cases for ACID tables:
   
   Partitioned Tables (Major & Minor)
   Insert-Only Clustered (Major & Minor)
   Insert-Only Partitioned (Major & Minor)
   Insert-Only Clustered and Partitioned (Major & Minor)
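   
   As a rough illustration of the last combination (not the PR's actual test file), an insert-only table that is both partitioned and clustered could be set up and compacted as sketched below. The table name `mm_part_clustered`, the partition column `c`, and the partition value `p1` are hypothetical.
   
   ```sql
   -- Hypothetical insert-only ACID table, partitioned and bucketed.
   create table mm_part_clustered (a int, b string)
     partitioned by (c string)
     clustered by (a) into 3 buckets
     stored as orc
     TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
   
   -- Several small inserts leave several delta directories to compact.
   insert into mm_part_clustered partition (c='p1') values (1, 'text1');
   insert into mm_part_clustered partition (c='p1') values (2, 'text2');
   
   -- Here the compaction is requested for a single partition.
   alter table mm_part_clustered partition (c='p1') compact 'MAJOR' and wait;
   ```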
   
   
   ### Why are the changes needed?
   To improve test coverage for compaction.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Qtest


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] rkirtir commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "rkirtir (via GitHub)" <gi...@apache.org>.
rkirtir commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1155589934


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_partitioned_clustered.q:
##########


Review Comment:
   Isn't it better to have major and minor compactions in one file?





[GitHub] [hive] SourabhBadhya commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "SourabhBadhya (via GitHub)" <gi...@apache.org>.
SourabhBadhya commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1158174819


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;

Review Comment:
   @zratkai OK, that makes sense. In that case we can use `analyze table compute statistics` if the automatic stats update causes delay issues.





[GitHub] [hive] akshat0395 commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "akshat0395 (via GitHub)" <gi...@apache.org>.
akshat0395 commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1155606740


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_partitioned_clustered.q:
##########


Review Comment:
   Thanks for the comment @rkirtir. The reason for having major and minor compaction in separate tests is to test each compaction type in isolation for different scenarios.
   This pattern has been followed in other compaction-related qtests as well.
   Here are some references that follow the same pattern:
   
   1. ql/src/test/queries/clientpositive/compaction_query_based.q
   2. ql/src/test/queries/clientpositive/compaction_query_based_clustered.q
   3. ql/src/test/queries/clientpositive/compaction_query_based_clustered_minor.q
   4. ql/src/test/queries/clientpositive/compaction_query_based_insert_only.q
   5. ql/src/test/queries/clientpositive/compaction_query_based_insert_only_minor.q
   6. ql/src/test/queries/clientpositive/compaction_query_based_minor.q
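   
   A minimal sketch of the split being described, assuming two hypothetical test files that share the same setup and differ only in the compaction type. The table name `t` and the file names in the comments are illustrative and not taken from the PR.
   
   ```sql
   -- Hypothetical file example_major.q: exercises major compaction only.
   alter table t compact 'MAJOR' and wait;
   describe extended t;
   
   -- Hypothetical file example_minor.q: exercises minor compaction only.
   alter table t compact 'MINOR' and wait;
   describe extended t;
   ```
   
   Keeping the two in separate files means a failure in one compaction type does not mask or disturb the expected output of the other.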





[GitHub] [hive] akshat0395 commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "akshat0395 (via GitHub)" <gi...@apache.org>.
akshat0395 commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1158031774


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;

Review Comment:
   @SourabhBadhya As far as I know, `hive.compactor.gather.stats` is used for compaction stats.
   The `analyze table` command is for table stats, and `hive.stats.autogather=true` is used to generate those stats automatically.
   Ref: https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/performance-tuning/content/hive_generate_hive_statistics.html.
   Please correct me if I'm wrong.





[GitHub] [hive] sonarcloud[bot] commented on pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4181:
URL: https://github.com/apache/hive/pull/4181#issuecomment-1492391823

   Kudos, SonarCloud Quality Gate passed! Dashboard: https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4181
   
   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   0 Security Hotspots (rating A)
   0 Code Smells (rating A)
   
   No Coverage information
   No Duplication information
   
   




[GitHub] [hive] SourabhBadhya commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "SourabhBadhya (via GitHub)" <gi...@apache.org>.
SourabhBadhya commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1158012483


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;

Review Comment:
   The stats update in query-based compaction depends on this config:
   `hive.compactor.gather.stats`
   Source: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java#L103
   https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java#L444
   
   After taking a deeper look, this config seems to have been disabled in MiniLlapLocalCompactorCliDriver as part of HIVE-26802:
   https://github.com/apache/hive/blob/master/itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CliConfigs.java#L271
   
   @zratkai Is there any reason why this was disabled as part of HIVE-26802? I tried some tests locally and was able to see correct stats after removing the line in CliConfigs that disables this config.





[GitHub] [hive] SourabhBadhya commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "SourabhBadhya (via GitHub)" <gi...@apache.org>.
SourabhBadhya commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1155722596


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;
+
+describe extended orc_bucketed;

Review Comment:
   nit: Add a new line at the end of file.



##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;

Review Comment:
   `drop table if exists`?
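   
   The suggestion amounts to the following one-line change, which states explicitly that the table may not exist yet when the test starts:
   
   ```sql
   drop table if exists orc_bucketed;
   ```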



##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;

Review Comment:
   Is the `analyze table compute statistics` command required here?
   The stats update usually happens within the compaction cycle, so I think recomputing the stats is extra effort.
   
   The stats update in compaction happens here: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/StatsUpdater.java





[GitHub] [hive] SourabhBadhya commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "SourabhBadhya (via GitHub)" <gi...@apache.org>.
SourabhBadhya commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1158051755


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;

Review Comment:
   @akshat0395 `hive.stats.autogather` is already true by default. This config is used by insert/delete/update statements to compute stats once records are written.
   
   Compaction checks a different config, `hive.compactor.gather.stats`, to decide whether it has to compute stats for the table that has been compacted. Since it is turned off by MiniLlapLocalCompactorCliDriver, stats computation during compaction is disabled.
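   
   One way to read this distinction, sketched as a q-file fragment. It assumes a test driver in which `hive.compactor.gather.stats` is turned off, as described above; the table name `t` is hypothetical.
   
   ```sql
   -- Write path: with hive.stats.autogather=true (the default mentioned above),
   -- the insert statement itself updates the table stats.
   insert into t values (1, 'text1');
   
   -- Compaction path: the compactor only recomputes stats when
   -- hive.compactor.gather.stats is enabled, which is assumed to be off here.
   alter table t compact 'MAJOR' and wait;
   
   -- So the stats are recomputed explicitly before they are inspected.
   analyze table t compute statistics;
   describe extended t;
   ```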





[GitHub] [hive] akshat0395 commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "akshat0395 (via GitHub)" <gi...@apache.org>.
akshat0395 commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1157976783


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;

Review Comment:
   Thanks for the review @SourabhBadhya. The reason this is required here is to make sure the test runs independently of the Hive config. The automatic stats update happens only when `hive.stats.autogather=true`.
   The analyze statement is used to ensure we have the data in q.out regardless of the config. Let me know if you have any thoughts on this.





[GitHub] [hive] zratkai commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "zratkai (via GitHub)" <gi...@apache.org>.
zratkai commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1158152919


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;

Review Comment:
   I do not remember exactly why it was necessary to do it this way. With these steps you can force the analyze to happen exactly when it is needed. If I remember correctly, the autogather is async and it caused issues: sometimes it had not happened by the time the test needed it, at the last describe table, so the test failed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] sonarcloud[bot] commented on pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "sonarcloud[bot] (via GitHub)" <gi...@apache.org>.
sonarcloud[bot] commented on PR #4181:
URL: https://github.com/apache/hive/pull/4181#issuecomment-1497146300

   Kudos, SonarCloud Quality Gate passed! [Quality Gate passed](https://sonarcloud.io/dashboard?id=apache_hive&pullRequest=4181)

   [0 Bugs](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4181&resolved=false&types=BUG) (rating A)
   [0 Vulnerabilities](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4181&resolved=false&types=VULNERABILITY) (rating A)
   [0 Security Hotspots](https://sonarcloud.io/project/security_hotspots?id=apache_hive&pullRequest=4181&resolved=false&types=SECURITY_HOTSPOT) (rating A)
   [0 Code Smells](https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=4181&resolved=false&types=CODE_SMELL) (rating A)

   [No Coverage information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4181&metric=coverage&view=list)
   [No Duplication information](https://sonarcloud.io/component_measures?id=apache_hive&pullRequest=4181&metric=duplicated_lines_density&view=list)




[GitHub] [hive] akshat0395 commented on a diff in pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "akshat0395 (via GitHub)" <gi...@apache.org>.
akshat0395 commented on code in PR #4181:
URL: https://github.com/apache/hive/pull/4181#discussion_r1158028978


##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;
+
+create table orc_bucketed (a int, b string) clustered by (a) into 3 buckets stored as orc TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');
+
+insert into orc_bucketed values('1', 'text1');
+insert into orc_bucketed values('2', 'text2');
+insert into orc_bucketed values('3', 'text3');
+insert into orc_bucketed values('4', 'text4');
+insert into orc_bucketed values('5', 'text5');
+insert into orc_bucketed values('6', 'text6');
+insert into orc_bucketed values('7', 'text7');
+insert into orc_bucketed values('8', 'text8');
+insert into orc_bucketed values('9', 'text9');
+insert into orc_bucketed values('10', 'text10');
+
+describe extended orc_bucketed;
+alter table orc_bucketed compact 'MAJOR' and wait;
+analyze table orc_bucketed compute statistics;
+
+describe extended orc_bucketed;

Review Comment:
   Added 



##########
ql/src/test/queries/clientpositive/compaction_query_based_insert_only_clustered.q:
##########
@@ -0,0 +1,33 @@
+--! qt:replace:/createTime:(\d+)/#Masked#/
+--! qt:replace:/location:(\S+)/#Masked#/
+--! qt:replace:/lastAccessTime:(\d+)/#Masked#/
+--! qt:replace:/ownerType:(\S*)/#Masked#/
+--! qt:replace:/owner:(\S*)/#Masked#/
+--! qt:replace:/skewedColValueLocationMaps:(\S*)/#Masked#/
+--! qt:replace:/transient_lastDdlTime=(\d+)/#Masked#/
+--! qt:replace:/totalSize=(\d+)/#Masked#/
+--! qt:replace:/rawDataSize=(\d+)/#Masked#/
+--! qt:replace:/writeId:(\d+)/#Masked#/
+--! qt:replace:/bucketing_version=(\d+)/#Masked#/
+--! qt:replace:/id:(\d+)/#Masked#/
+
+drop table orc_bucketed;

Review Comment:
   Updated





[GitHub] [hive] veghlaci05 merged pull request #4181: HIVE-27203: Add compaction Qtest for Insert-only, Partitioned, Clustered, and combination ACID Tables

Posted by "veghlaci05 (via GitHub)" <gi...@apache.org>.
veghlaci05 merged PR #4181:
URL: https://github.com/apache/hive/pull/4181

