You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/11/09 08:42:03 UTC

[GitHub] [hive] HarshitGupta11 opened a new pull request #2771: [Draft]HIVE-24367 Do not review yet

HarshitGupta11 opened a new pull request #2771:
URL: https://github.com/apache/hive/pull/2771


   For Testing Only


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #2771:
URL: https://github.com/apache/hive/pull/2771#discussion_r783012583



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -1230,12 +1232,16 @@ else if(statementId != parsedDelta.statementId) {
    * that for any dir, either all files are acid or all are not.
    */
   public static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs) throws IOException {
-    return parsedDelta(deltaDir, fs, null);
+    return parsedDelta(deltaDir, fs, null, false, -1);
   }
 
-  private static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs, HdfsDirSnapshot dirSnapshot)
+  private static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs, HdfsDirSnapshot dirSnapshot,
+      boolean canTrim, long highWaterMark)
       throws IOException {
     ParsedDeltaLight deltaLight = ParsedDeltaLight.parse(deltaDir);
+    if(canTrim && !(deltaLight.minWriteId >= highWaterMark)){

Review comment:
       please comment this part code, as this is the point of this patch as I understood
   
   so according to some circumstances (canTrim + watermark thing, about which I have no idea :) ), getAcidState -> getChildState -> parsedDelta can short-circuit, so I'm assuming this will prevent processing some deltas, leading to performance improvement




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #2771:
URL: https://github.com/apache/hive/pull/2771#issuecomment-1007307171


   @HarshitGupta11: please provide a description about what's actually implemented here, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #2771:
URL: https://github.com/apache/hive/pull/2771#discussion_r783006992



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##########
@@ -499,6 +499,34 @@ public static void populateQuickStats(List<FileStatus> fileStatus, Map<String, S
     params.put(StatsSetupConst.NUM_ERASURE_CODED_FILES, Integer.toString(numErasureCodedFiles));
   }
 
+  public static void populateQuickStatsWithPrevStats(List<FileStatus> fileStatus, Map<String,String> params){

Review comment:
       this method has exactly the same logic as populateQuickStats except that in this case, initial integer values are set from parameters
   we should find a way to have the same logic written once: iteration on FileStatus + setting integers back to parameters, otherwise there is a chance that the two methods will diverge




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] HarshitGupta11 closed pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
HarshitGupta11 closed pull request #2771:
URL: https://github.com/apache/hive/pull/2771


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #2771:
URL: https://github.com/apache/hive/pull/2771#discussion_r783007686



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -136,6 +136,11 @@ public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf,
       }
     }
 
+    public boolean canTrim(){

Review comment:
       please javadoc on this method
   I struggle to understand what does "trim" mean in this context, also, a huge amount of conditions inside need an explanation




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] HarshitGupta11 commented on a change in pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
HarshitGupta11 commented on a change in pull request #2771:
URL: https://github.com/apache/hive/pull/2771#discussion_r783023540



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -1230,12 +1232,16 @@ else if(statementId != parsedDelta.statementId) {
    * that for any dir, either all files are acid or all are not.
    */
   public static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs) throws IOException {
-    return parsedDelta(deltaDir, fs, null);
+    return parsedDelta(deltaDir, fs, null, false, -1);
   }
 
-  private static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs, HdfsDirSnapshot dirSnapshot)
+  private static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs, HdfsDirSnapshot dirSnapshot,
+      boolean canTrim, long highWaterMark)
       throws IOException {
     ParsedDeltaLight deltaLight = ParsedDeltaLight.parse(deltaDir);
+    if(canTrim && !(deltaLight.minWriteId >= highWaterMark)){

Review comment:
       Yeah, it will not read the deltas that are not part of the current transaction or were written before the current transaction.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] HarshitGupta11 commented on a change in pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
HarshitGupta11 commented on a change in pull request #2771:
URL: https://github.com/apache/hive/pull/2771#discussion_r783011556



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/stats/BasicStatsTask.java
##########
@@ -136,6 +136,11 @@ public BasicStatsProcessor(Partish partish, BasicStatsWork work, HiveConf conf,
       }
     }
 
+    public boolean canTrim(){

Review comment:
       The stats update will work only for tables that are non-partitioned and in the queries where there is a single stats stage(otherwise it will lead to overcounting) and some other conditions. So here it checks for those conditions.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #2771:
URL: https://github.com/apache/hive/pull/2771#issuecomment-1010978261


   left some minor comments
   could you please share what did you find in terms of performance improvement, with the example query like:
   ```
   CREATE TABLE IF NOT EXISTS test (name String, value int);
   INSERT INTO test VALUES('K1',1);
   INSERT INTO test VALUES('K2',2);
   ..
   ..
   ..
   INSERT INTO test VALUES('K20000',2)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #2771:
URL: https://github.com/apache/hive/pull/2771#discussion_r783012583



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -1230,12 +1232,16 @@ else if(statementId != parsedDelta.statementId) {
    * that for any dir, either all files are acid or all are not.
    */
   public static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs) throws IOException {
-    return parsedDelta(deltaDir, fs, null);
+    return parsedDelta(deltaDir, fs, null, false, -1);
   }
 
-  private static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs, HdfsDirSnapshot dirSnapshot)
+  private static ParsedDelta parsedDelta(Path deltaDir, FileSystem fs, HdfsDirSnapshot dirSnapshot,
+      boolean canTrim, long highWaterMark)
       throws IOException {
     ParsedDeltaLight deltaLight = ParsedDeltaLight.parse(deltaDir);
+    if(canTrim && !(deltaLight.minWriteId >= highWaterMark)){

Review comment:
       please comment this part in the code, as this is the point of this patch as I understood
   
   so according to some circumstances (canTrim + watermark thing, about which I have no idea :) ), getAcidState -> getChildState -> parsedDelta can short-circuit, so I'm assuming this will prevent processing some deltas, leading to performance improvement




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on a change in pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
abstractdog commented on a change in pull request #2771:
URL: https://github.com/apache/hive/pull/2771#discussion_r783006992



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
##########
@@ -499,6 +499,34 @@ public static void populateQuickStats(List<FileStatus> fileStatus, Map<String, S
     params.put(StatsSetupConst.NUM_ERASURE_CODED_FILES, Integer.toString(numErasureCodedFiles));
   }
 
+  public static void populateQuickStatsWithPrevStats(List<FileStatus> fileStatus, Map<String,String> params){

Review comment:
       this method has exactly the same logic as populateQuickStats except that in this case, initial integer values are set from parameters
   we should find a way to have the same logic written once: iteration on FileStatus + setting integers back to parameters, otherwise there is a chance that the two methods will diverge
   
   maybe the new method will fit for both cases, you just have to call clearQuickStats first when you don't want to use previous stats




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] abstractdog commented on pull request #2771: HIVE-24367: Explore whether HiveAlterHandler::alterTable can be optimised for non-partitioned tables

Posted by GitBox <gi...@apache.org>.
abstractdog commented on pull request #2771:
URL: https://github.com/apache/hive/pull/2771#issuecomment-1007307171


   @HarshitGupta11: please provide a description about what's actually implemented here, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org