You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/07/30 00:39:37 UTC

[GitHub] [hive] achennagiri opened a new pull request, #3487: HIVE-26438: Remove unnecessary optimization in canHandleQbForCbo()

achennagiri opened a new pull request, #3487:
URL: https://github.com/apache/hive/pull/3487

   
   
   ### What changes were proposed in this pull request?
   This ticket is an improvement on https://issues.apache.org/jira/browse/HIVE-26426.  The canHandleQbForCbo() checks whether Calcite can handle the query or not and it returns null if the query can be handled; non-null reason string if it cannot be. 
   
   But currently, it returns an empty string if INFO Log is not enabled which should not be the case.
   Also, this is probably a performance optimization that is not needed and can be simplified. 
   
   ### Why are the changes needed?
   Simplify the code and improve the correctness.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak closed pull request #3487: HIVE-26438: Remove unnecessary optimization in canHandleQbForCbo()

Posted by GitBox <gi...@apache.org>.
zabetak closed pull request #3487: HIVE-26438: Remove unnecessary optimization in canHandleQbForCbo()
URL: https://github.com/apache/hive/pull/3487


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] achennagiri commented on a diff in pull request #3487: HIVE-26438: Remove unnecessary optimization in canHandleQbForCbo()

Posted by GitBox <gi...@apache.org>.
achennagiri commented on code in PR #3487:
URL: https://github.com/apache/hive/pull/3487#discussion_r934940068


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -974,44 +972,40 @@ Pair<Boolean, String> canCBOHandleAst(ASTNode ast, QB qb, PreCboCtx cboCtx) {
    *         Query<br>
    *         2. Nested Subquery will return false for qbToChk.getIsQuery()
    */
-  private static String canHandleQbForCbo(QueryProperties queryProperties, HiveConf conf,
-      boolean topLevelQB, boolean verbose) {
-
-    if (!queryProperties.hasClusterBy() && !queryProperties.hasDistributeBy()
-        && !(queryProperties.hasSortBy() && queryProperties.hasLimit())
-        && !queryProperties.hasPTF() && !queryProperties.usesScript()
-        && queryProperties.isCBOSupportedLateralViews()) {
-      // Ok to run CBO.
-      return null;
-    }
-
+  private static String canHandleQbForCbo(QueryProperties queryProperties,
+                                          HiveConf conf, boolean topLevelQB) {
+    List reasons = new ArrayList();
     // Not ok to run CBO, build error message.
-    String msg = "";
-    if (verbose) {
-      if (queryProperties.hasClusterBy()) {
-        msg += "has cluster by; ";
-      }
-      if (queryProperties.hasDistributeBy()) {
-        msg += "has distribute by; ";
-      }
-      if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
-        msg += "has sort by with limit; ";
-      }
-      if (queryProperties.hasPTF()) {
-        msg += "has PTF; ";
-      }
-      if (queryProperties.usesScript()) {
-        msg += "uses scripts; ";
-      }
-      if (queryProperties.hasLateralViews()) {
-        msg += "has lateral views; ";
-      }
-      if (msg.isEmpty()) {
-        msg += "has some unspecified limitations; ";
-      }
-      msg = msg.substring(0, msg.length() - 2);
+    String errorMsg = "";
+    if (queryProperties.hasClusterBy()) {
+      errorMsg = "has cluster by";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasDistributeBy()) {
+      errorMsg = "has distribute by";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
+      errorMsg = "has sort by with limit";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasPTF()) {
+      errorMsg = "has PTF";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.usesScript()) {
+      errorMsg = "uses scripts";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasLateralViews()) {
+      errorMsg = "has lateral views";
+      reasons.add(errorMsg);
+    }
+    if (!queryProperties.isCBOSupportedLateralViews()) {

Review Comment:
   Trying to keep the error messaging the same. If isCBOSupportedLateralViews() returns False, only then the error message will be "has some unspecified limitations".
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] achennagiri commented on a diff in pull request #3487: HIVE-26438: Remove unnecessary optimization in canHandleQbForCbo()

Posted by GitBox <gi...@apache.org>.
achennagiri commented on code in PR #3487:
URL: https://github.com/apache/hive/pull/3487#discussion_r934943226


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -987,30 +985,29 @@ private static String canHandleQbForCbo(QueryProperties queryProperties, HiveCon
 
     // Not ok to run CBO, build error message.
     String msg = "";
-    if (verbose) {
-      if (queryProperties.hasClusterBy()) {
-        msg += "has cluster by; ";
-      }
-      if (queryProperties.hasDistributeBy()) {
-        msg += "has distribute by; ";
-      }
-      if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
-        msg += "has sort by with limit; ";
-      }
-      if (queryProperties.hasPTF()) {
-        msg += "has PTF; ";
-      }
-      if (queryProperties.usesScript()) {
-        msg += "uses scripts; ";
-      }
-      if (queryProperties.hasLateralViews()) {
-        msg += "has lateral views; ";
-      }
-      if (msg.isEmpty()) {
-        msg += "has some unspecified limitations; ";
-      }
-      msg = msg.substring(0, msg.length() - 2);
+    if (queryProperties.hasClusterBy()) {

Review Comment:
   @jfsii I have made the changes accordingly. Let me know if there are any issues



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a diff in pull request #3487: HIVE-26438: Remove unnecessary optimization in canHandleQbForCbo()

Posted by GitBox <gi...@apache.org>.
zabetak commented on code in PR #3487:
URL: https://github.com/apache/hive/pull/3487#discussion_r935321551


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -974,44 +972,40 @@ Pair<Boolean, String> canCBOHandleAst(ASTNode ast, QB qb, PreCboCtx cboCtx) {
    *         Query<br>
    *         2. Nested Subquery will return false for qbToChk.getIsQuery()
    */
-  private static String canHandleQbForCbo(QueryProperties queryProperties, HiveConf conf,
-      boolean topLevelQB, boolean verbose) {
-
-    if (!queryProperties.hasClusterBy() && !queryProperties.hasDistributeBy()
-        && !(queryProperties.hasSortBy() && queryProperties.hasLimit())
-        && !queryProperties.hasPTF() && !queryProperties.usesScript()
-        && queryProperties.isCBOSupportedLateralViews()) {
-      // Ok to run CBO.
-      return null;
-    }
-
+  private static String canHandleQbForCbo(QueryProperties queryProperties,
+                                          HiveConf conf, boolean topLevelQB) {
+    List reasons = new ArrayList();

Review Comment:
   ```suggestion
       List<String> reasons = new ArrayList<>();
   ```
   Check "Effective Java, Item 23: Don’t use raw types in new code" for more info regarding the suggestion.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -974,44 +972,40 @@ Pair<Boolean, String> canCBOHandleAst(ASTNode ast, QB qb, PreCboCtx cboCtx) {
    *         Query<br>
    *         2. Nested Subquery will return false for qbToChk.getIsQuery()
    */
-  private static String canHandleQbForCbo(QueryProperties queryProperties, HiveConf conf,
-      boolean topLevelQB, boolean verbose) {
-
-    if (!queryProperties.hasClusterBy() && !queryProperties.hasDistributeBy()
-        && !(queryProperties.hasSortBy() && queryProperties.hasLimit())
-        && !queryProperties.hasPTF() && !queryProperties.usesScript()
-        && queryProperties.isCBOSupportedLateralViews()) {
-      // Ok to run CBO.
-      return null;
-    }
-
+  private static String canHandleQbForCbo(QueryProperties queryProperties,
+                                          HiveConf conf, boolean topLevelQB) {
+    List reasons = new ArrayList();
     // Not ok to run CBO, build error message.
-    String msg = "";
-    if (verbose) {
-      if (queryProperties.hasClusterBy()) {
-        msg += "has cluster by; ";
-      }
-      if (queryProperties.hasDistributeBy()) {
-        msg += "has distribute by; ";
-      }
-      if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
-        msg += "has sort by with limit; ";
-      }
-      if (queryProperties.hasPTF()) {
-        msg += "has PTF; ";
-      }
-      if (queryProperties.usesScript()) {
-        msg += "uses scripts; ";
-      }
-      if (queryProperties.hasLateralViews()) {
-        msg += "has lateral views; ";
-      }
-      if (msg.isEmpty()) {
-        msg += "has some unspecified limitations; ";
-      }
-      msg = msg.substring(0, msg.length() - 2);
+    String errorMsg = "";
+    if (queryProperties.hasClusterBy()) {
+      errorMsg = "has cluster by";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasDistributeBy()) {
+      errorMsg = "has distribute by";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
+      errorMsg = "has sort by with limit";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasPTF()) {
+      errorMsg = "has PTF";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.usesScript()) {
+      errorMsg = "uses scripts";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasLateralViews()) {
+      errorMsg = "has lateral views";
+      reasons.add(errorMsg);
+    }
+    if (!queryProperties.isCBOSupportedLateralViews()) {

Review Comment:
   If I am not wrong there is an important change in behavior with the proposed refactoring. Going quickly over the code it seems that `hasLateralViews` always returns `true` when there are such views in the query. We do not want to raise an error for every lateral view but only for those that CBO cannot handle. I think the following refactoring is valid:
   ```java
   if (!queryProperties.isCBOSupportedLateralViews()) {
     reasons.add("has lateral views");
   }
   ```
   We could include `queryProperties.hasLateralViews()` in the condition but I think it is redundant.
   
   Moreover, I get the impression that `"has some unspecified limitations"` is not reachable so can be omitted completely.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -974,44 +972,40 @@ Pair<Boolean, String> canCBOHandleAst(ASTNode ast, QB qb, PreCboCtx cboCtx) {
    *         Query<br>
    *         2. Nested Subquery will return false for qbToChk.getIsQuery()
    */
-  private static String canHandleQbForCbo(QueryProperties queryProperties, HiveConf conf,
-      boolean topLevelQB, boolean verbose) {
-
-    if (!queryProperties.hasClusterBy() && !queryProperties.hasDistributeBy()
-        && !(queryProperties.hasSortBy() && queryProperties.hasLimit())
-        && !queryProperties.hasPTF() && !queryProperties.usesScript()
-        && queryProperties.isCBOSupportedLateralViews()) {
-      // Ok to run CBO.
-      return null;
-    }
-
+  private static String canHandleQbForCbo(QueryProperties queryProperties,
+                                          HiveConf conf, boolean topLevelQB) {
+    List reasons = new ArrayList();
     // Not ok to run CBO, build error message.
-    String msg = "";
-    if (verbose) {
-      if (queryProperties.hasClusterBy()) {
-        msg += "has cluster by; ";
-      }
-      if (queryProperties.hasDistributeBy()) {
-        msg += "has distribute by; ";
-      }
-      if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
-        msg += "has sort by with limit; ";
-      }
-      if (queryProperties.hasPTF()) {
-        msg += "has PTF; ";
-      }
-      if (queryProperties.usesScript()) {
-        msg += "uses scripts; ";
-      }
-      if (queryProperties.hasLateralViews()) {
-        msg += "has lateral views; ";
-      }
-      if (msg.isEmpty()) {
-        msg += "has some unspecified limitations; ";
-      }
-      msg = msg.substring(0, msg.length() - 2);
+    String errorMsg = "";

Review Comment:
   The local variable `errorMsg` is redundant and can be removed. You can directly do `reasons.add("has cluster by")` etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] achennagiri commented on a diff in pull request #3487: HIVE-26438: Remove unnecessary optimization in canHandleQbForCbo()

Posted by GitBox <gi...@apache.org>.
achennagiri commented on code in PR #3487:
URL: https://github.com/apache/hive/pull/3487#discussion_r935841364


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -974,44 +972,40 @@ Pair<Boolean, String> canCBOHandleAst(ASTNode ast, QB qb, PreCboCtx cboCtx) {
    *         Query<br>
    *         2. Nested Subquery will return false for qbToChk.getIsQuery()
    */
-  private static String canHandleQbForCbo(QueryProperties queryProperties, HiveConf conf,
-      boolean topLevelQB, boolean verbose) {
-
-    if (!queryProperties.hasClusterBy() && !queryProperties.hasDistributeBy()
-        && !(queryProperties.hasSortBy() && queryProperties.hasLimit())
-        && !queryProperties.hasPTF() && !queryProperties.usesScript()
-        && queryProperties.isCBOSupportedLateralViews()) {
-      // Ok to run CBO.
-      return null;
-    }
-
+  private static String canHandleQbForCbo(QueryProperties queryProperties,
+                                          HiveConf conf, boolean topLevelQB) {
+    List reasons = new ArrayList();
     // Not ok to run CBO, build error message.
-    String msg = "";
-    if (verbose) {
-      if (queryProperties.hasClusterBy()) {
-        msg += "has cluster by; ";
-      }
-      if (queryProperties.hasDistributeBy()) {
-        msg += "has distribute by; ";
-      }
-      if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
-        msg += "has sort by with limit; ";
-      }
-      if (queryProperties.hasPTF()) {
-        msg += "has PTF; ";
-      }
-      if (queryProperties.usesScript()) {
-        msg += "uses scripts; ";
-      }
-      if (queryProperties.hasLateralViews()) {
-        msg += "has lateral views; ";
-      }
-      if (msg.isEmpty()) {
-        msg += "has some unspecified limitations; ";
-      }
-      msg = msg.substring(0, msg.length() - 2);
+    String errorMsg = "";
+    if (queryProperties.hasClusterBy()) {
+      errorMsg = "has cluster by";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasDistributeBy()) {
+      errorMsg = "has distribute by";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
+      errorMsg = "has sort by with limit";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasPTF()) {
+      errorMsg = "has PTF";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.usesScript()) {
+      errorMsg = "uses scripts";
+      reasons.add(errorMsg);
+    }
+    if (queryProperties.hasLateralViews()) {
+      errorMsg = "has lateral views";
+      reasons.add(errorMsg);
+    }
+    if (!queryProperties.isCBOSupportedLateralViews()) {

Review Comment:
   I see! Thanks a lot for this review. 
   I agree `"has some unspecified limitations"` is not reachable.  I will omit it. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jfsii commented on a diff in pull request #3487: HIVE-26438: Remove unnecessary optimization in canHandleQbForCbo()

Posted by GitBox <gi...@apache.org>.
jfsii commented on code in PR #3487:
URL: https://github.com/apache/hive/pull/3487#discussion_r933840592


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -987,30 +985,29 @@ private static String canHandleQbForCbo(QueryProperties queryProperties, HiveCon
 
     // Not ok to run CBO, build error message.
     String msg = "";
-    if (verbose) {
-      if (queryProperties.hasClusterBy()) {
-        msg += "has cluster by; ";
-      }
-      if (queryProperties.hasDistributeBy()) {
-        msg += "has distribute by; ";
-      }
-      if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
-        msg += "has sort by with limit; ";
-      }
-      if (queryProperties.hasPTF()) {
-        msg += "has PTF; ";
-      }
-      if (queryProperties.usesScript()) {
-        msg += "uses scripts; ";
-      }
-      if (queryProperties.hasLateralViews()) {
-        msg += "has lateral views; ";
-      }
-      if (msg.isEmpty()) {
-        msg += "has some unspecified limitations; ";
-      }
-      msg = msg.substring(0, msg.length() - 2);
+    if (queryProperties.hasClusterBy()) {
+      msg += "has cluster by; ";
+    }
+    if (queryProperties.hasDistributeBy()) {
+      msg += "has distribute by; ";
+    }
+    if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
+      msg += "has sort by with limit; ";
+    }
+    if (queryProperties.hasPTF()) {
+      msg += "has PTF; ";
     }
+    if (queryProperties.usesScript()) {
+      msg += "uses scripts; ";
+    }
+    if (queryProperties.hasLateralViews()) {
+      msg += "has lateral views; ";
+    }
+    if (msg.isEmpty()) {
+    msg += "has some unspecified limitations; ";

Review Comment:
   indenting is incorrect  -
   one positive change from my suggestions above is that it enforces anyone who modifies this section of code with new conditions - would be forced to also provide a valid error message (so the only time msgs/reason would/should be empty is if it is a valid QB for  CBO).



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -987,30 +985,29 @@ private static String canHandleQbForCbo(QueryProperties queryProperties, HiveCon
 
     // Not ok to run CBO, build error message.
     String msg = "";
-    if (verbose) {
-      if (queryProperties.hasClusterBy()) {
-        msg += "has cluster by; ";
-      }
-      if (queryProperties.hasDistributeBy()) {
-        msg += "has distribute by; ";
-      }
-      if (queryProperties.hasSortBy() && queryProperties.hasLimit()) {
-        msg += "has sort by with limit; ";
-      }
-      if (queryProperties.hasPTF()) {
-        msg += "has PTF; ";
-      }
-      if (queryProperties.usesScript()) {
-        msg += "uses scripts; ";
-      }
-      if (queryProperties.hasLateralViews()) {
-        msg += "has lateral views; ";
-      }
-      if (msg.isEmpty()) {
-        msg += "has some unspecified limitations; ";
-      }
-      msg = msg.substring(0, msg.length() - 2);
+    if (queryProperties.hasClusterBy()) {

Review Comment:
   I would maybe change this to something like (but this is more of an opinion):
   List<String> reasons = new ArrayList();
   and in each if () { reason.add(errorMsg); }
   and at the end do:
   return Strings.join("; ", reasons);
   
   You could technically combine the top if check for validity with the error message processing to force each case to always have a valid error message by deleting the top if (conditions) { return null },
   add the missing condition to the error messages (queryProperties.isCBOSupportedLateralViews())) and change the last return to be something like:
   return reasons.isEmpty() ? null : Strings.join("; ",  reasons);
   
   Hopefully that makes sense? It is also a matter of style - feel free to ignore if you do not think the above suggestions make the code more readable or less fragile. (I'm not 100% sure they make it better - but in my head I think they do).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org